Nabil-Fareed Alikhan

Bioinformatics · Microbial Genomics · Software Development

Episode 29: Mashtree software deep dive

📅10 September 2020
⏱️00:19:12
🎙️Microbial Bioinformatics

In this microbinfie podcast episode, Lee Katz discusses the development and implementation of MASHtree, a tool that builds on MASH and the MinHash algorithm to rapidly compare microbial genomes and generate approximate phylogenetic trees.

MASHtree reduces each genome to a compact sketch of its k-mers (e.g., 15 MB of raw data becomes an 8 KB sketch file), which significantly accelerates genomic comparisons.

The core of the algorithm involves hashing each k-mer to an integer, sorting those integers, and retaining only the 1,000 smallest, which yields a much smaller dataset suited to rapid computation.
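The sketching step can be illustrated with a minimal Python sketch. This is not the MASH source code: the function names are hypothetical, `hashlib.blake2b` stands in for the MurmurHash3 function that MASH uses, and canonical (strand-independent) k-mers are omitted for brevity. The defaults `k=21` and `size=1000` mirror the values discussed for MASH.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer.

    Assumption: blake2b is a stand-in here; MASH itself uses MurmurHash3.
    """
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=1000):
    """Return the `size` smallest distinct k-mer hashes, in ascending order."""
    hashes = {kmer_hash(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:size]


if __name__ == "__main__":
    toy = "ACGTACGTTGCAACGTAGCTAGCTAGGATCCA"
    print(sketch(toy, k=5, size=4))
```

Under these assumptions, 1,000 hashes of 8 bytes each occupy about 8 KB, which is in line with the sketch file size quoted in the episode.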

MASH distances generated by the tool correlate well with Average Nucleotide Identity (ANI), providing a good approximation of phylogenetic relationships without requiring fully assembled genomes.
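To make the link between sketches and distances concrete, the following Python sketch estimates the Jaccard index from two sketches and converts it with the published Mash distance formula, D = -(1/k) ln(2j / (1 + j)). The function names are hypothetical and the code is an illustration under those assumptions, not the tool's implementation. A distance D corresponds only roughly to an ANI of 1 - D.

```python
import math


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate the Jaccard index from two bottom-k sketches.

    Takes the `size` smallest hashes of the merged sketches and counts
    how many of them occur in both inputs.
    """
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(j, k=21):
    """Convert a Jaccard estimate j into a Mash distance, clamped to [0, 1]."""
    if j <= 0:
        return 1.0
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))


if __name__ == "__main__":
    j = jaccard_estimate([1, 2, 3], [2, 3, 4], size=3)
    print(j, mash_distance(j))
```

Identical sketches give a Jaccard estimate of 1 and a distance of 0, while sketches with no shared hashes give the maximum distance of 1.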

MASHtree uses a neighbor-joining algorithm to build the tree. The result is an approximation rather than a full phylogeny: it clusters genomes by proximity and similarity without inferring evolutionary ancestry.
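For readers unfamiliar with neighbor joining, here is a compact textbook version in Python that turns a pairwise distance table into a Newick string. It is a sketch for illustration only: the function name and the input layout (a dict keyed by `frozenset` pairs) are assumptions, it expects at least two uniquely named genomes, and it is not the code MASHtree runs.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build a Newick tree by neighbor joining.

    names: list of unique labels (at least two).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node to i and j.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        rest = [x for x in nodes if x not in (i, j)]
        # Distances from the new node to every remaining node.
        for x in rest:
            d[frozenset((u, x))] = (
                d[frozenset((i, x))] + d[frozenset((j, x))] - dij
            ) / 2
        nodes = rest + [u]
    a, b = nodes
    # Place the whole remaining distance on one side of the final join.
    return f"({a}:{d[frozenset((a, b))]:.4f},{b}:0.0000);"


if __name__ == "__main__":
    pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
             ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
    table = {frozenset(p): v for p, v in pairs.items()}
    print(neighbor_joining(["A", "B", "C", "D"], table))
```

The tree depends only on the distance table, so it groups genomes by how close they are rather than modelling how their sequences evolved.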

Tools and Methodologies:

Challenges and Limitations:

Future Directions and Improvements: