This time on the Micro Binfie Podcast, we come from the ARTIC Network and CLIMB Big Data joint workshop on COVID-19 data analysis, held on the 14th and 15th of January 2021.
My name is Sam Sheppard. I work at the University of Bath. I'm a professor of microbial genomics, bioinformatics and microbiology, and also a member of the CLIMB consortium. It is a great pleasure to be here today and to be able to talk to people around the world who share an enthusiasm for using genomics to understand this most obnoxious pathogen that is coronavirus. So I'm going to set the scene and provide some background about molecular epidemiology and how molecular epidemiology transitioned into what we might now call genomic epidemiology. And I'll briefly mention how this is applied to the coronavirus.
So when we're thinking about disease, obviously, we want to understand the pathogen that caused the disease. OK, this is very simple. That's where we start. And many of you will be familiar with this kind of scene of a clinical microbiology lab. Here's a microbiologist working with primary samples, typically blood, urine, sputum, cerebrospinal fluid, etc. OK, and from these primary samples from an infected person, we want to work out which organism is causing the infection. Confirming the actual species of the pathogen is often not all that difficult, but it's not as simple as that. As, of course, many of you know, in some cases the pathogen may also be found as a commensal organism. Now, this is not specifically related to COVID, but in many cases it might be that a commensal organism is causing the disease. An obvious example would be an infection caused by E. coli, which is found commensally in human guts but in certain circumstances can cause serious disease. So as well as understanding the species, what we really need to understand is the strain of that organism that is causing the disease.
For many, many years, differentiating, that is to say, telling the difference between a good strain and a bad strain, has preoccupied molecular epidemiologists, microbiologists and virologists all over the world. OK, and the way that we differentiate those strains is by using typing. And so I want to introduce that term, typing. So there are lots of different typing schemes. Over the last 20 or 30 years, various schemes have been introduced that help us to tell the difference between strains of the same species. There are antibody-based approaches like serotyping. There are phage sensitivity approaches, phage typing. There are lots of approaches that extract the DNA from a potential pathogen and then run it on a gel, for example restriction or amplified fragment length polymorphism, or pulsed-field gel electrophoresis, which is still widely used for strain typing all over the world. And there are other methods. These are just some examples, really, and one that I certainly used a lot was multi-locus sequence typing. So in this case, you sequence usually seven genes around the genome of your pathogen and you use these profiles to tell the difference between strains.
OK, so these typing approaches have underpinned what we would describe as molecular epidemiology for some years. They've been used in numerous ways for various bacterial and viral pathogens. One of the most important things is to try to relate the pathogen strains that are seen in the population, because if you can relate those strains and say that they are the same strain, then you can potentially categorise your disease as an outbreak. And of course, if other people have other strains, then their infection may have come from another source. However, and this is the central idea, the ability to relate strains depends on how well the typing approach can differentiate them, can tell them apart. So in some cases, it might be very easy to say this is strain one and this is strain two.
But in other cases, that may be difficult because the typing method may lack the resolution to differentiate the strains. So, you know, straight away we think to ourselves, well, why don't we just sequence the genome of everything? And we've been thinking about this for many years, as probably have many of you. And in the past, the challenge has always been cost. Really, it's come down to the cost of sequencing a genome, and that's been prohibitive. So these are just some estimates of the average cost of sequencing DNA per megabase in US dollars. So way back in 2001, it might have been around $5,000 to sequence a megabase of DNA. But this has declined rapidly, and it's way less than a dollar to sequence a megabase now and has been for some time.
OK, so as sequencing costs have fallen, sequencing the whole genome has really become the typing approach of choice. It's no longer a terribly good idea to bother with any other typing method when it's so inexpensive to just sequence the genome. Some of the analyses can be the same, and we can still reference data that were collected, and publications that used, other typing approaches. But actually now, this world of molecular epidemiology that was based on these older typing approaches has largely been replaced by what we now might describe as genomic epidemiology. And that's sequencing the whole genome of our pathogens and comparing the strains.
Unsurprisingly, this is improving understanding in lots of different ways, and I'm just going to run through a few of those now. One important thing you could consider is the shift from centralized genome sequencing services to sequencing in the community. So when I began sequencing genomes, I don't know, 15 or so years ago, most of my sequencing would have been done in a large specialist central laboratory. I would send my DNA and I would receive my sequences back. This is still arguably the best way to do things.
However, it is now possible to buy your own sequencer, and by far the most common sequencing instrument at the moment, certainly in terms of the amount of data produced, is the Illumina platform. The MiSeq is actually very common in laboratories all around the world, with other, larger instruments tending to be in more specialist laboratories. So I've called that the democratization of the technology.
Another thing is what you might consider the mobilization of sequencing. So here I'm talking about a different technology. This largely depends on the Oxford Nanopore instrument, which is this small, mobile-phone-sized instrument. The technology is different, and there are advantages to using it because it produces long reads. But one of the great things about these kinds of physically smaller technologies is that you can take them anywhere. OK, and you can actually put the whole laboratory, really, or a miniaturized laboratory, into a suitcase. You can take this out to an area where an outbreak may be occurring and you can do your sequencing effectively in the field or with relatively small laboratory space. OK, so that's made a big shift from these old-fashioned technologies.
And finally, and perhaps most importantly, the rise of genome sequencing as a tool for molecular epidemiology has allowed us to standardize the analyses. OK, so really what I mean here is that it's very easy to share DNA sequence data between laboratories. It's much harder to share a gel from a pulsed-field electrophoresis tank. You can introduce controls, as those of you that use these techniques will know, but it's very hard to share that between labs. The great thing about a string of letters that represents a DNA sequence is that it's easy to share. OK, and what this has allowed is the creation of these data archives. You know, these are databases, open access databases, where you can download, access and compare genome sequences from lots of different organisms.
The other thing that these databases support, so not only do they hold the data, but they also provide a kind of common language for working with it. Perhaps not in every case, but certainly some way of comparing the differences found between different genome sequences. And that, of course, has allowed us, or made it possible, to collaborate across different settings and to understand the genomes of lots of different organisms, including the coronavirus.
So, obviously, these technological advances and the associated support for working with different genomes have brought challenges. And the biggest challenge is, how do we analyse this volume of data? How do we store it? How do we look after it? How do we deal with what we might describe as big data? OK? And this was the kind of thinking that was behind the CLIMB consortium back, about seven years ago, when we were thinking to ourselves, and lots of people were thinking, we don't have the capacity in individual labs, or maybe some of us did, but certainly we don't have the capacity to deal with more and more data like this. So the idea behind the CLIMB consortium, which has now been taken forward by CLIMB Big Data, is to provide the computing power to store and analyse these enormous data sets. Also to create an environment where we can share the analytical tools, so this might be different algorithms or pipelines that are needed to analyse genomes.
The way CLIMB works is really, well, a lot of that is down to great people like Nick and Tom Connor, who have played a major role in setting up the systems that underlie CLIMB and CLIMB Big Data, but for the user, it's easy. Basically, you connect to the cloud, shared computers that are distributed across different UK institutions.
You connect to this from your desktop computer, your laptop, and you create what you might call a virtual machine, or what we call a virtual machine. And this virtual machine, this virtual computer within the CLIMB environment that you connect to through the internet, gives you access to a much bigger computer than the laptop you would otherwise have. So, lots of processing power, lots of storage, and importantly, in some cases, they are large-RAM machines. So, some of the analyses that you might use for genome assembly can be very demanding.
OK, so back to genomic epidemiology. So, I think I've said that we have the technology, OK. We have the analysis tools, that knowledge of the analysis, OK. And we have the computing power to be able to really use genomic epidemiology to understand disease. So, in general terms, we can consider the epidemiological relationship between strains, OK. And we can use this for surveillance, and for understanding disease across strains. And also, interestingly and relevant to the coronavirus, we can look at where infections come from. So we can look outside, perhaps, the human population, sometimes looking at other hosts for the different pathogen strains.
We can also use genomic epidemiology as part of a hypothesis-testing framework. OK. So, often, we want to think about evolution and we want to think about the timing of the emergence of different strains. We want to think about pathogen strains. We want to think about whether a lineage is changing or giving rise to some kind of new variant. And we can think about whether different variants are associated with infection in humans. And then, at the, you know, at a slightly more general level, we can think about ideas around the population structure of pathogens. OK, so we can think about the pathogen strains. We can think about phylogeography: are lineages found in particular places?
We can [unclear], perhaps. So we can think about how new variants arise and how any one strain spreads. And obviously, that feeds into our understanding of transmission and how it relates to the growth of an epidemic. OK? So there are lots of levels of genomic epidemiology that we can use.
Once again, what I'd like to say relates to our question: why sequence genomes in an epidemic? The answer, of course, would be that we can, at one level, understand strain variation and produce trees that show the relatedness of our isolates. We can also look at the genome and look at genomic differences that may be associated with the strains and the clusters of strains on our trees. And obviously, I'm thinking of the emergence of mutations that are associated with increased transmission or [unclear]. And we can also look at these over time. At another level, we can see how some strains are present and then go on to become more common, and we can see the emergence of new strains.
So I'm going to finish by saying that the global level is, I hope, what we're talking about specifically today. In our own countries, or perhaps with programmes in our own countries, we can think about sequencing coronavirus, but really the strength and the power of these efforts comes when we bring people from around the world together. Using the same methods, or at least the same analyses, we can start to understand global trends in transmission, emergence and epidemiology.
Thank you, Sam. So could you say a little bit more about the difference between, I'm going to use the, describe the difference between strains and lineages and when we would use them? Perhaps it's a difficult question, but if you could comment on that.
The answer to that is that strain is perhaps the more general term compared with lineage.
Strain is a term that comes out of microbiology, and it describes a variant of an organism that has distinct phenotypes or genotypes in a descriptive sense. I think strain is a useful term. I don't think it's a very exact term. Lineage, on the other hand, has an evolutionary connotation, and that makes it very valuable to use lineage, because it makes us think of a tree, and it makes us think of organisms that are related to one another in some form of clustering, and usually we're thinking about sequence clusters. So, to some extent, I would say that these terms are interchangeable. I certainly wouldn't want to be prescriptive and tell you that there are multiple strains in a lineage or multiple lineages in a strain, because both could potentially be true, but the real difference is that lineage is a much more evolutionary term that makes us think of a phylogeny.
Thank you all so much for listening to us at home. If you like this podcast, please subscribe and like us on iTunes, Spotify, SoundCloud or the platform of your choice. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group and edited by Nick Waters. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Quadram Institute.