Nabil-Fareed Alikhan

Bioinformatics · Microbial Genomics · Software Development

My Research & Impact

Pathogens respond to the environments we create. When we introduce a vaccine into poultry production, other lineages fill the niche. When we use antibiotics at scale, resistance genes spread across food chains and into clinical settings. When trade routes open, diseases travel with livestock and people. We have been shaping microbial evolution for millennia without realising it, and microbial genomes have been recording everything.

That is the thread that runs through my work. I recovered Salmonella Paratyphi C from the bones of a 12th-century girl buried in Trondheim. The same lineage turns up in 16th-century Mexico, almost certainly introduced from the Old World with European contact. The genome is largely stable across 800 years, but the regions that govern pathogenicity change over time. That kind of evidence does not come from a single outbreak investigation. It comes from reading pathogen genomes across deep time and connecting them to what humans were doing: where they travelled, what they farmed, what medicines they used.

I work at the intersection of microbial population genomics, comparative genomics, and data science. The core question is: what does genomic variation mean, and how do we make it usable? Answering it means working at two scales at once, the macro (agricultural transitions, centuries of trade) and the micro (a vaccination programme, a change in antibiotic policy), and building the tools that make large, heterogeneous datasets interpretable. The bottleneck in modern microbial genomics is no longer sequencing; it is making data explorable, comparable, and defensible. Across EnteroBase, Pathogenwatch, AMRwatch, COG-UK, and PATH-SAFE, I have contributed to the infrastructure that makes that possible.

Research vision

My research vision is to connect genomic variation to real-world dynamics: what changes, why it changes, how it spreads, and what it means in context. I pursue this as a single programme that links evolutionary understanding to practical interpretation, so methods, platforms, and biological insight reinforce each other.

To do this, I work across three connected themes:

  1. Population structure and evolution: understanding how lineages emerge, diversify, persist, and move through hosts and environments.
  2. Comparative genomics at scale: developing approaches that make differences and relationships visible and interpretable across very large datasets, because interpretation is often limited by what we can reliably compare.
  3. Infrastructure and interfaces: building resources that make large genomic datasets usable for the people who need them most: researchers, public health teams, and students learning how to reason with genomics.

My goal is work that explains biology, supports decisions, and leaves behind tools that others can build on.

Why this matters in the real world

Microbial evolution is a record of our changing world. The pressures we create (medical, agricultural, cultural, and ecological) leave traces in pathogen genomes. When we read those traces carefully, we can start to understand our long back-and-forth relationship with microbes: why some lineages emerge when they do, how resistance spreads, how the same species can be commensal in one context and lethal in another. Those are the questions I find most compelling, and the reason I keep coming back to large collections and long timescales.

The same understanding that clarifies evolutionary biology also improves how we respond to real problems. In public health, that means detecting outbreaks, identifying transmission links, and separating genuine signals from sequencing noise. In food safety, it means tracing contamination routes and linking the same organism across farms, processing chains, and clinical settings. Through the PATH-SAFE Consortium, I contributed to national recommendations for genomic surveillance of foodborne pathogens published by the Food Standards Agency in 2025. For antimicrobial resistance, it means connecting resistance mechanisms to lineage spread and selective pressures, so surveillance becomes something you can act on rather than just monitor.

These questions have immediate stakes. In 2026, a MenB outbreak centred on a Canterbury nightclub (23 cases and 2 deaths) was investigated using whole-genome sequencing, cgMLST phylogenetics, and vaccine antigen analysis. I am a named member of the UKHSA TARZET Invasive Meningococcal Disease Technical Group convened to respond to that outbreak. Ultimately, I want microbial genomics to be something we can rely on operationally: faster interpretation, clearer uncertainty, and results that remain comparable across labs, countries, and years.

What's next?

Over the next few years I’m focusing on a clear goal: to make microbial genomics more trustworthy, comparable, and usable at the point where it meets real-world decisions, from public health surveillance to research at population scale. That means strengthening the foundations (quality and standards), pushing methods that scale to truly massive collections, and investing in training so capability grows alongside data.

  • Quality and standards for public health bioinformatics. I’m developing practical best-practice guidance and quality-control standards for bacterial genomics workflows, with an emphasis on approaches that are auditable and deployable across organisations. A key strand of this work is QualiBact (https://happykhan.github.io/qualibact/), aimed at making QC expectations explicit so results are more reliable and comparable across labs, time, and platforms.

  • Next-generation population genomics for hundreds of thousands of genomes. Building on large-scale comparative frameworks (including EnteroBase, cgMLST, and GrapeTree), I’m working towards hierarchical approaches and new visual strategies that keep analyses interpretable as collections grow. The aim is to move beyond “one giant tree” towards methods that support fast navigation, stable clustering, and clear communication of uncertainty, so scale does not come at the cost of biological meaning.

  • Reclassifying E. coli pathogenicity at population scale. The current pathovar system (STEC, EPEC, UPEC, ExPEC, and others) was built before whole-genome sequencing existed and is no longer adequate. Virulence genes sit on mobile elements that move between lineages, producing hybrid strains that current categories miss. Public databases now hold close to one million E. coli genomes. I want to use that resource to build a population-level genomic framework for E. coli pathogenicity that replaces pre-genomic assumptions with evidence grounded in population structure, and to turn that framework into an open resource for food safety surveillance, clinical diagnostics, and public health.

  • Browser-based bioinformatics. Through GenomicX, I’m exploring what becomes possible when established bioinformatics tools (minimap2, Mash, samtools) run entirely in the browser via WebAssembly. No installation, no server uploads, no command line. This is partly a technical experiment and partly a practical question: how much of the access barrier in computational biology can be eliminated by moving computation to where the data already is?

  • Capacity building that makes genomics sustainable. I’m expanding training materials and curricula that take people from fundamentals to real-world analysis: reproducible workflows, version control, defensible interpretation, and good metadata practice. The goal is practical competence for the next generation of microbial genomicists and public health practitioners, so expertise scales, not just sequencing.

Selected Contributions

A selection of tools, platforms, and projects that make large-scale microbial genomics usable in research, public health, and teaching.

Interested in collaborating?

  • If you are building national genomic surveillance systems...
  • If you need to interpret & visualise complex population datasets...
  • If you are looking for experienced bioinformatics leadership...
Get in touch