My Research & Impact

Microbes are everywhere, and most of the time they live alongside us without causing problems. So why do some bacteria cause disease—and why can the same species behave differently in different people, places, or years? The short answer is evolution: microbial genomes change fast, and small genetic differences can shift traits like virulence, transmission, persistence, and antimicrobial resistance. My work is about understanding what that variation means in practice, and building the tools needed to turn genomic change into reliable, usable insight.

I work at the intersection of microbial population genomics, comparative genomics, and data science. I focus on variation that matters operationally: genetic changes that shift virulence, transmission, persistence, or antimicrobial resistance in clinical, food, and environmental contexts.

A major strand of my work is about how we do genomics at scale. Microbial genomics is now firmly in the “oodles of data” era: millions of genomes, messy and inconsistent metadata, uneven sampling, and constant pressure to interpret results quickly and correctly. A lot of the bottleneck isn’t sequencing anymore—it’s making the data explorable, comparable, and defensible. That’s where I put a lot of energy: building methods, visualisations, and software platforms that help people move from raw sequences to clear conclusions they can stand behind—and that continue to work under operational pressure.

Research vision

My research vision is to connect genomic variation to real-world dynamics: what changes, why it changes, how it spreads, and what it means in context. I pursue this as a single programme that links evolutionary understanding to practical interpretation—so methods, platforms, and biological insight reinforce each other.

To do this, I work across three connected themes:

Population structure and evolution: understanding how lineages emerge, diversify, persist, and move through hosts and environments.
Comparative genomics at scale: developing approaches that make differences and relationships visible and interpretable across very large datasets—because interpretation is often limited by what we can reliably compare.
Infrastructure and interfaces: building resources that make large genomic datasets usable for the people who need them most, including researchers, public health teams, and students learning how to reason with genomics.

Overall, my goal is science that is both conceptually strong and operationally useful: work that explains biology, supports decisions, and leaves behind tools that others can build on.

Why this matters in the real world

Genomics matters most when it improves how we respond to real problems. In public health, that can mean detecting and tracking outbreaks, identifying transmission links, or separating meaningful signals from sequencing noise. In food and environmental microbiology, it can mean tracing contamination routes, understanding persistence and adaptation, and linking the same organism across farms, processing chains, and clinical settings. For antimicrobial resistance, genomics can connect resistance mechanisms to lineage spread and selective pressures—turning surveillance into actionable understanding rather than just monitoring.

More broadly, microbial evolution is a record of our changing world. When we link microbial genome change to human history and environmental change, we can start to see how the pressures we create—medical, cultural, and ecological—shape microbial populations over time. Ultimately, I want microbial genomics to be something we can rely on operationally: faster interpretation, clearer uncertainty, and results that remain comparable across labs, countries, and years.

What's next?

Over the next few years I’m focusing on a clear goal: make microbial genomics more trustworthy, comparable, and usable at the point it meets real-world decisions—from public health surveillance to research at population scale. That means strengthening the foundations (quality and standards), pushing methods that scale to truly massive collections, and investing in training so capability grows alongside data.

Quality and standards for public health bioinformatics. I’m developing practical best-practice guidance and quality-control standards for bacterial genomics workflows, with an emphasis on approaches that are auditable and deployable across organisations. A key strand of this work is QualiBact (https://happykhan.github.io/qualibact/), aimed at making QC expectations explicit so results are more reliable and comparable across labs, time, and platforms.
Next-generation population genomics for hundreds of thousands of genomes. Building on large-scale comparative frameworks (including EnteroBase, cgMLST, and GrapeTree), I’m working towards hierarchical approaches and new visual strategies that keep analyses interpretable as collections grow. The aim is to move beyond “one giant tree” towards methods that support fast navigation, stable clustering, and clear communication of uncertainty, so that scale doesn’t come at the cost of biological meaning.
Capacity building that makes genomics sustainable. I’m expanding training materials and curricula that take people from fundamentals to real-world analysis: reproducible workflows, version control, defensible interpretation, and good metadata practice. The goal is practical competence for the next generation of microbial genomicists and public health practitioners—so expertise scales, not just sequencing.

Selected Contributions

A selection of tools, platforms, and projects that make large-scale microbial genomics usable in research, public health, and teaching.

BRIG

Comparative genomics visualisation tool

Making complex genomic comparisons understandable at a glance for biologists and multidisciplinary teams.

Evidence: 3,000+ citations; widely used in publications and teaching.

EnteroBase

Global population genomics platform

Turning enteric pathogen genomics into a shared, comparable resource for research and surveillance.

Evidence: Co-developed platform for big-data analysis of enteric pathogens; enables global, contextual comparisons across large collections.

cgMLST + GrapeTree

Scalable population structure analysis

Making relationships across large pathogen collections explorable without losing interpretability.

Evidence: Used to explore cgMLST-based population structure at scale and communicate it clearly (research, surveillance, and training contexts).

RonaQC

SARS-CoV-2 sequencing QC

Improving trust in high-throughput sequencing by turning “raw runs” into clear, actionable QC outcomes.

Evidence: Co-PI (UKRI/COG-UK, 2021); developed for rapid, operational feedback during pandemic-scale sequencing.

Scaling Infrastructure

HPC & Cloud Operations

Enabling large-scale genomics to run reliably under real-world constraints (volume, time pressure, messy data).

Evidence: Interim Head of Informatics (2018–2023); PI on £1.49M cloud renewal (BBSRC, 2022).

Ancient + Modern Genomes

Integrating diverse timescales

Using population-genomic frameworks to connect microbial evolution across centuries to millennia.

Evidence: Integrated ancient metagenomes into EnteroBase; analysed 10,000 modern S. enterica genomes as comparative context.

Teaching & Supervision

Capacity building

Raising the baseline by helping researchers and public health teams become confident, reproducible genomic analysts.

Evidence: Bioinformatics lead & board member (MMBDTP, 2021–present); ongoing supervision across projects and placements.

Community & Standards

Field-building leadership

Making pathogen genomics easier to share, compare, and reuse across labs, countries, and time.

Evidence: PHA4GE Infrastructure Working Group (since 2019); co-organised community events including a funded hackathon (2022).