My Research & Impact
Microbes are everywhere, and most of the time they live alongside us without causing problems. So why do some bacteria cause disease—and why can the same species behave differently in different people, places, or years? The short answer is evolution: microbial genomes change fast, and small genetic differences can shift traits like virulence, transmission, persistence, and antimicrobial resistance. My work is about understanding what that variation means in practice, and building the tools needed to turn genomic change into reliable, usable insight.
I work at the overlap between microbial population genomics, comparative genomics, and data science. My focus is variation that matters operationally, meaning the genetic changes that alter those traits, and I study it across clinical, food, and environmental contexts.
A major strand of my work is about how we do genomics at scale. Microbial genomics is now firmly in the “oodles of data” era: millions of genomes, messy and inconsistent metadata, uneven sampling, and constant pressure to interpret results quickly and correctly. A lot of the bottleneck isn’t sequencing anymore—it’s making the data explorable, comparable, and defensible. That’s where I put a lot of energy: building methods, visualisations, and software platforms that help people move from raw sequences to clear conclusions they can stand behind—and that continue to work under operational pressure.
Research vision
My research vision is to connect genomic variation to real-world dynamics: what changes, why it changes, how it spreads, and what it means in context. I pursue this as a single programme that links evolutionary understanding to practical interpretation—so methods, platforms, and biological insight reinforce each other.
To do this, I work across three connected themes:
- Population structure and evolution: understanding how lineages emerge, diversify, persist, and move through hosts and environments.
- Comparative genomics at scale: developing approaches that make differences and relationships visible and interpretable across very large datasets—because interpretation is often limited by what we can reliably compare.
- Infrastructure and interfaces: building resources that make large genomic datasets usable for the people who need them most: researchers, public health teams, and students learning how to reason with genomics.
Overall, my goal is science that is both conceptually strong and operationally useful: work that explains biology, supports decisions, and leaves behind tools that others can build on.
Why this matters in the real world
Genomics matters most when it improves how we respond to real problems. In public health, that can mean detecting and tracking outbreaks, identifying transmission links, or separating meaningful signals from sequencing noise. In food and environmental microbiology, it can mean tracing contamination routes, understanding persistence and adaptation, and linking the same organism across farms, processing chains, and clinical settings. For antimicrobial resistance, genomics can connect resistance mechanisms to lineage spread and selective pressures, so that surveillance yields actionable understanding rather than simply recording what is circulating.
More broadly, microbial evolution is a record of our changing world. When we link microbial genome change to human history and environmental change, we can start to see how the pressures we create (medical, cultural, and ecological) shape microbial populations over time. Ultimately, I want microbial genomics to be something we can rely on operationally: faster interpretation, clearly stated uncertainty, and results that remain comparable across labs, countries, and years.
What's next?
Over the next few years I’m focusing on a clear goal: make microbial genomics more trustworthy, comparable, and usable at the point it meets real-world decisions—from public health surveillance to research at population scale. That means strengthening the foundations (quality and standards), pushing methods that scale to truly massive collections, and investing in training so capability grows alongside data.
- Quality and standards for public health bioinformatics. I’m developing practical best-practice guidance and quality-control standards for bacterial genomics workflows, with an emphasis on approaches that are auditable and deployable across organisations. A key strand of this work is QualiBact (https://happykhan.github.io/qualibact/), which aims to make QC expectations explicit so that results are more reliable and comparable across labs, time, and platforms.
- Next-generation population genomics for hundreds of thousands of genomes. Building on large-scale comparative frameworks (including EnteroBase, cgMLST, and GrapeTree), I’m working towards hierarchical approaches and new visual strategies that keep analyses interpretable as collections grow. The aim is to move beyond “one giant tree” towards methods that support fast navigation, stable clustering, and clear communication of uncertainty, so that scale does not come at the cost of biological meaning.
- Capacity building that makes genomics sustainable. I’m expanding training materials and curricula that take people from fundamentals to real-world analysis: reproducible workflows, version control, defensible interpretation, and good metadata practice. The goal is practical competence for the next generation of microbial genomicists and public health practitioners, so that expertise scales alongside sequencing rather than lagging behind it.
Selected Contributions
A selection of tools, platforms, and projects that make large-scale microbial genomics usable in research, public health, and teaching.