Downloading and installing software for the rMLST comparison
Posted on October 7, 2022

These are some notes on how to install the software and fetch the data required for the rMLST comparison in Acinetobacter.
Setting up conda env (to install software)
Here are the steps for setting up conda to manage your software installations.
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
chmod +x ./Miniconda3-py38_4.12.0-Linux-x86_64.sh
./Miniconda3-py38_4.12.0-Linux-x86_64.sh
~/miniconda3/bin/conda init
source ~/.bashrc
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda create -y -c conda-forge -n rmlst mamba
conda activate rmlst
Installing software
Using conda makes it easy to install bioinformatics software.
mamba install -y -c bioconda rapidnj cgmlst-dists mashtree
mamba install -y -c conda-forge pip notebook nb_conda_kernels jupyter_contrib_nbextensions
pip install grapetree
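Before moving on, it can be worth checking that the tools actually landed on your PATH inside the rmlst environment. This is a minimal sketch: the helper name find_tools is hypothetical, and I am assuming each executable has the same name as the package installed above.

```python
import shutil

# Assumption: each executable is named after the package installed above.
TOOLS = ["rapidnj", "cgmlst-dists", "mashtree", "grapetree"]


def find_tools(tools=TOOLS, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, location in find_tools().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```

Run it with the rmlst environment activated; anything reported as NOT FOUND either failed to install or uses a different executable name.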
Download and 'install' NCBI datasets
You can fetch genome assemblies from NCBI using the datasets tool, which is available at https://www.ncbi.nlm.nih.gov/datasets/docs/v1/download-and-install/
To use it, as I have done below, you need a text file of all the accession codes you wish to fetch (I have called it get_ass.txt).
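The accession file is simply one accession per line. If your accessions start out in a Python list (for example, pulled from a spreadsheet), a small sketch like the following could write the file and flag anything that does not look like an assembly accession. The helper name write_accession_file and the regular expression are my own assumptions, not part of the datasets tool.

```python
import re

# Assumed accession pattern: GCA_ or GCF_, nine digits, then a version number.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def write_accession_file(accessions, out_path="get_ass.txt"):
    """Write valid, de-duplicated accessions to out_path, one per line.

    Returns (written, rejected) so malformed entries can be inspected.
    """
    written, rejected, seen = [], [], set()
    for raw in accessions:
        acc = raw.strip()
        if not acc or acc in seen:
            continue  # skip blanks and repeats
        seen.add(acc)
        if ACCESSION_RE.match(acc):
            written.append(acc)
        else:
            rejected.append(acc)
    with open(out_path, "w") as handle:
        handle.writelines(f"{acc}\n" for acc in written)
    return written, rejected
```

Anything returned in the rejected list is worth fixing by hand before running the download.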
wget https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets
chmod +x ./datasets
./datasets download genome accession --inputfile get_ass.txt --exclude-protein --exclude-rna --include-gbff --exclude-genomic-cds --exclude-seq
unzip ncbi_dataset.zip
For the Acinetobacter dataset I am using, some of the assemblies are not available ... for reasons.
Some of the assemblies provided ('GCA_000580355.1', 'GCA_000580435.1') are valid NCBI Assembly Accessions,
but are not in scope for NCBI Datasets.
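To see exactly which accessions did not come down, you can compare the request list against what was extracted. This is only a sketch: the function name missing_accessions is hypothetical, and it assumes each downloaded assembly sits in a folder named after its accession under ncbi_dataset/data.

```python
from os import listdir, path


def missing_accessions(list_file="get_ass.txt", data_dir="ncbi_dataset/data"):
    """Return requested accessions that have no folder under data_dir."""
    with open(list_file) as handle:
        requested = [line.strip() for line in handle if line.strip()]
    # Assumption: one sub-directory per downloaded assembly, named by accession.
    downloaded = {
        name for name in listdir(data_dir)
        if path.isdir(path.join(data_dir, name))
    }
    return [acc for acc in requested if acc not in downloaded]
```

Calling missing_accessions() after unzipping returns the accessions you will need to fetch some other way.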
You can copy the assemblies out of the extracted zip file to wherever you want. By default, they will be in ncbi_dataset/data.
from os import listdir, makedirs, path
import shutil

# Create the output directory if it does not already exist
makedirs('gen_fasta', exist_ok=True)
data_dir = 'ncbi_dataset/data'
# Each downloaded assembly has its own GCA_* sub-directory
for name in listdir(data_dir):
    if not name.startswith('GCA'):
        continue
    fasta_path = path.join(data_dir, name)
    fasta_file = [path.join(fasta_path, x) for x in listdir(fasta_path) if x.endswith('.fna')]
    # Copy the first .fna file found, renamed after the accession
    if fasta_file:
        shutil.copy(fasta_file[0], f'gen_fasta/{name}.fasta')
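As a quick sanity check on the copied files, you could count the sequence records in each one. This is a rough sketch rather than a proper FASTA parser: the function names are hypothetical, and it simply counts lines starting with '>' in every .fasta file under gen_fasta.

```python
from os import listdir, path


def count_fasta_records(fasta_path):
    """Count sequence records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarise_fasta_dir(fasta_dir="gen_fasta"):
    """Return {file name: record count} for every .fasta file in fasta_dir."""
    return {
        name: count_fasta_records(path.join(fasta_dir, name))
        for name in sorted(listdir(fasta_dir))
        if name.endswith(".fasta")
    }
```

A file with zero records, or an accession missing from the summary altogether, suggests that the copy step did not find a .fna file for that assembly.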