Industry CV

Email: nabil@happykhan.com · Website: happykhan.com · GitHub: happykhan

Senior bioinformatician and software engineer with 15 years of experience building production pipelines and genomic data platforms used by researchers and public health agencies worldwide. Proven track record of delivering at national scale: 80,000+ pathogen genomes processed for COVID-19 surveillance, 620,000+ for global AMR monitoring. Fluent in Python, JavaScript, Nextflow/nf-core, cloud (AWS/GCP), and HPC.

Citations: 11,369

Software Downloads: 94,150

Education

PhD in Microbiology

University of Queensland, Australia

2010–2015

Thesis: Escherichia coli virulence: a genomic approach

Supervisor: Scott Beatson

BSc (Hons, 1st Class) in Microbiology

University of Queensland

2009

Thesis: Comparative genome analysis of Escherichia coli VR50

BSc in Biochemistry & Bachelor of Information Technology

University of Queensland

2004–2008

Experience

2024 – Present

Senior Bioinformatician

Centre for Genomic Pathogen Surveillance, University of Oxford

Lead development of PathogenWatch, AMRwatch, and vaccines.watch: web platforms integrating over 620,000 pathogen genomes for global AMR surveillance, used by public health agencies in 90+ countries. Build and maintain production ETL pipelines that process genomic data, epidemiological data, and metadata from heterogeneous sources. Architect systems for FAIR data delivery.

2018 – 2023

Bioinformatics Scientific Programmer / Interim Head of Informatics

Quadram Institute Bioscience

Ran computational infrastructure for a team of 20+ scientists. Built high-throughput pipelines for COG-UK: released 80,000+ SARS-CoV-2 genomes through infrastructure I designed and maintained. Developed CoronaHiT (Genome Medicine 2021) and RonaQC for national surveillance. Managed grants and infrastructure totalling over £5M.

2014 – 2018

Senior Research Fellow / Research Fellow in Pathogen Bioinformatics

University of Warwick

Built comparative genomics pipelines for Salmonella, E. coli, and Campylobacter at population scale. Co-developed EnteroBase: analytical infrastructure for 400,000+ bacterial genomes. Co-developed GrapeTree (Genome Research 2018), a visualisation tool for large-scale population structure.

Key Projects

Project	Scale	Stack	Outcome
COG-UK pipeline (CoronaHiT / RonaQC)	80,000+ genomes	Python, Nextflow, Docker, HPC	National SARS-CoV-2 surveillance
AMRwatch	620,000+ genomes	Python, JS, PostgreSQL, cloud	Used by WHO/ECDC-adjacent agencies
EnteroBase	400,000+ genomes	Python, HPC, web	Standard tool in molecular epidemiology
BRIG	94,000+ downloads	Java	3,000+ citations, taught in universities
PathogenWatch	Multi-pathogen, cloud	JS, Python, cloud	Production platform, CGPS flagship

Technical Skills

Pipeline Development

Nextflow / nf-core

Snakemake

Shell scripting

Languages

Python

JavaScript

Bash

Java

Infrastructure

HPC (SLURM)

Docker / Singularity

AWS

GCP

Linux admin

Data & Dev Practices

PostgreSQL / ETL

REST APIs

Git / GitHub Actions

pytest / CI

Selected Publications

Alikhan et al. CoronaHiT: high-throughput sequencing of SARS-CoV-2 genomes. Genome Medicine 2021
Zhou, Alikhan et al. GrapeTree: visualisation of core genomes at scale. Genome Research 2018
Alikhan et al. BRIG: BLAST Ring Image Generator. BMC Genomics 2011 (3,000+ citations)

Full publication list: Google Scholar