Episode 26: SARS-CoV-2 contextual data specification for open genomic epidemiology
👥Guests
In this episode, the microbinfie podcast explores the Public Health Alliance for Genomic Epidemiology (PHA4GE) and its work developing standardized approaches to SARS-CoV-2 genomic data sharing during the COVID-19 pandemic.
In this conversation, we talk with Dr. Emma Griffiths (UBC), Dr. Ruth Timme (FDA), and Dr. Duncan MacCannell (CDC) about the PHA4GE SARS-CoV-2 contextual data specification, which was designed for open genomic epidemiology.
Resources
- Paper: PHA4GE SARS-CoV-2 Contextual Data Specification
- Specification: GitHub Repository
- Protocols: Protocols.io Workspace
About PHA4GE
The Public Health Alliance for Genomic Epidemiology (PHA4GE) is a global coalition dedicated to:
- Establishing consensus standards
- Documenting and sharing best practices
- Improving the availability of critical bioinformatic tools and resources
- Advocating for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics
Significance of the Specification
With the pandemic ongoing, PHA4GE recognized an urgent need for a specifically tailored, open-source SARS-CoV-2 contextual data standard. In response, the team developed an extension of the INSDC pathogen package: a SARS-CoV-2 contextual data specification based on harmonizable, publicly available community standards.
Implementation
- The specification can be implemented via a collection template.
- It is accompanied by protocols and tools designed to:
  - Support the harmonization and submission of sequence data
  - Facilitate the submission of contextual information to public repositories
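To make the collection-template idea concrete, here is a minimal Python sketch of how a single sample record might be checked before submission. It assumes a heavily abbreviated, hypothetical template: the field names, pick-list values, and the `validate_record` helper are illustrative only and are not the authoritative PHA4GE field list. The null-value terms are modeled on INSDC missing-value reporting conventions, which let submitters state explicitly why a value is absent instead of leaving it blank.

```python
from datetime import date

# Hypothetical, abbreviated template. Field names and pick-list values are
# illustrative only, not the authoritative PHA4GE specification.
REQUIRED_FIELDS = [
    "specimen_collector_sample_id",
    "sample_collection_date",
    "geo_loc_name_country",
    "organism",
    "purpose_of_sampling",
]

# Controlled vocabularies (pick lists) for some fields (illustrative values).
PICK_LISTS = {
    "organism": {"Severe acute respiratory syndrome coronavirus 2"},
    "purpose_of_sampling": {"Diagnostic testing", "Surveillance", "Research"},
}

# Explicit null values (modeled on INSDC missing-value terms).
NULL_VALUES = {"Not Applicable", "Missing", "Not Collected", "Not Provided", "Restricted Access"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one sample record (empty if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = str(record.get(field, "")).strip()
        if not value:
            problems.append(f"{field}: required field is empty")
            continue
        if value in NULL_VALUES:
            continue  # an explicit null value is acceptable
        allowed = PICK_LISTS.get(field)
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: {value!r} is not in the pick list")
        if field == "sample_collection_date":
            try:
                date.fromisoformat(value)  # expect ISO 8601, e.g. 2020-03-19
            except ValueError:
                problems.append(f"{field}: {value!r} is not an ISO 8601 date")
    return problems


record = {
    "specimen_collector_sample_id": "sample_001",  # hypothetical ID
    "sample_collection_date": "19/03/2020",        # day/month/year, not ISO 8601
    "geo_loc_name_country": "Canada",
    "organism": "Severe acute respiratory syndrome coronavirus 2",
    "purpose_of_sampling": "Not Provided",         # explicit null value
}
print(validate_record(record))
# ["sample_collection_date: '19/03/2020' is not an ISO 8601 date"]
```

Catching a non-standard date or a free-text value at collection time is far cheaper than reconciling it after the data have been pooled with other submitters' records.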
Benefits of Rich Contextual Data
- Adds Value: Well-structured data increases usefulness and reusability.
- Promotes Reuse: Easier integration of disparate datasets.
- Enhances Interoperability: Improves consistency and utility across datasets and systems.
- Facilitates Discoveries: Enables novel insights in the study of SARS-CoV-2 and COVID-19.
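The "easier integration" point can be illustrated with a small sketch: two labs that record the same information under different local column names become combinable once each local name is mapped onto a shared field. The lab column names, the mappings, and the `harmonize` helper below are hypothetical, and the target field names are illustrative stand-ins for specification fields rather than the official list.

```python
# Hypothetical local column names from two labs, mapped onto shared field names.
LAB_A_MAP = {
    "SampleID": "specimen_collector_sample_id",
    "CollDate": "sample_collection_date",
    "Country": "geo_loc_name_country",
}
LAB_B_MAP = {
    "sample": "specimen_collector_sample_id",
    "date_collected": "sample_collection_date",
    "location": "geo_loc_name_country",
}


def harmonize(rows: list[dict], column_map: dict) -> list[dict]:
    """Rename each row's columns to shared field names; columns without a mapping are dropped."""
    return [{column_map[k]: v for k, v in row.items() if k in column_map} for row in rows]


lab_a = [{"SampleID": "A-1", "CollDate": "2020-04-02", "Country": "Canada", "Ct": "24"}]
lab_b = [{"sample": "B-7", "date_collected": "2020-04-05", "location": "USA"}]

combined = harmonize(lab_a, LAB_A_MAP) + harmonize(lab_b, LAB_B_MAP)
for row in combined:
    print(row["specimen_collector_sample_id"], row["sample_collection_date"], row["geo_loc_name_country"])
```

This sketch silently drops unmapped columns such as `Ct`; a real pipeline would more likely map them to a corresponding specification field or keep them aside for review.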
Adoption of the proposed standard is expected to substantially improve data interoperability, helping to advance the understanding and control of SARS-CoV-2.
Key Points
1. PHA4GE Origins and Mission
- Established to improve bioinformatics infrastructure and interoperability in public health
- Developed eight working groups focusing on data standards, infrastructure, and ethical data sharing
- Modeled after the Global Alliance for Genomics and Health (GA4GH)
2. Data Structures and Contextual Information
- Addresses critical challenges in integrating and sharing genomic surveillance data
- Focuses on developing standardized data models and specifications for SARS-CoV-2
- Aims to improve transparency and reproducibility of public health sequencing workflows
3. COVID-19 Genomic Surveillance
- Developed a contextual data specification for SARS-CoV-2 genome submissions
- Supports international collaboration across multiple national genomic surveillance networks
- Enables more comprehensive analysis of viral spread and characteristics
Take-Home Messages
- Standardized data structures are crucial for effective public health response
- Contextual metadata is essential for meaningful genomic research
- International collaboration can overcome bioinformatics infrastructure challenges