This time on the MicroBinfie podcast, we come from the ARTIC Network and CLIMB
Big Data joint workshop on COVID-19 data analysis held on the 14th and 15th of
January 2021.

My name is Sam Sheppard. I work at the University of Bath. I'm a professor of
microbial genomics, bioinformatics and microbiology, and also a member of the
CLIMB consortium. It is a great pleasure to be here today and to
be able to talk to people around the world who share an enthusiasm for using
genomics to understand this most obnoxious pathogen that is coronavirus. So I'm
going to set the scene and provide some background about molecular epidemiology,
how molecular epidemiology transitioned into what we might now call genomic
epidemiology. And I'll briefly mention how this is applied to the coronavirus.
So when we're thinking about disease, obviously, we want to understand the
pathogen that caused the disease. OK, this is very simple. That's where we
start. And many of you will be familiar with this kind of scene of a clinical
microbiology lab. Here's a microbiologist working with primary samples,
typically blood, urine, sputum, cerebrospinal fluid, etc. OK, and from these
primary samples from an infected person, we want to work out which organism is
causing the disease, is causing the infection. Confirming the actual species of
the pathogen is not actually all that difficult often, but it's not as simple as
that. As, of course, many of you know, in some cases, the pathogen may also be
found as a commensal organism. Now, this is not specifically related to COVID,
but in many cases, it might be the case that a commensal organism is causing the
disease. An obvious example of this would be an infection caused by E. coli,
which is found commensally in the human gut. But in certain
circumstances, it can cause serious disease. So as well as understanding the
species, what we really need to understand is the strain of that organism that
is causing the disease. For many, many years, differentiating, that is to say,
telling the difference between a good strain and a bad strain has preoccupied
molecular epidemiologists and microbiologists and virologists all over the
world. OK, and the way that we differentiate those strains is by using typing.
And so I want to introduce that term, typing. So there are lots of different
typing schemes. Over the last 20, 30 years, various schemes have been introduced
that help us to tell the difference between strains of the same species. There
are antibody-based approaches like serotyping. There are phage sensitivity
approaches, phage typing, and lots of approaches that extract the DNA from a
potential pathogen and then run it on a gel, for example, restriction or
amplified fragment length polymorphism or pulsed-field gel electrophoresis,
which is still widely used for strain typing all over the world. These are just
some examples, really, and one that I certainly used a lot was multi-locus
sequence typing. So in this case, you sequence usually seven genes around the
genome of your pathogen and you use these profiles to tell the difference
between strains. OK, so these typing
approaches have underpinned what we would describe as molecular epidemiology for
some years. They've been used in numerous ways for various bacterial pathogens
and viral pathogens. One of the most important things is to try and relate the
pathogen strains that are seen in the population, because if you can relate
those strains and say that they are the same strain, then you can potentially
categorise your disease as an outbreak. And of course, other people have
other strains, so this may have come from another source. However, and this is
the sort of central idea, the ability to relate strains depends on how well the
typing approach can differentiate them, can tell them apart. So in some cases,
it might be very easy to say this is strain one and this is strain two. But in
other cases, that may be difficult because the typing method may lack the
resolution to differentiate the strains. So, you know, straight away we think to
ourselves, well, why don't we just sequence the genome of everything? And we've
been thinking about this for many years, as probably have many of you. And in
the past, the challenge always has been cost. Really, it's come down to the cost
of sequencing a genome. And that's been prohibitive. So these are just
some estimates of the average cost of sequencing DNA per megabase in US dollars.
So way back in 2001, it might have been around $5,000 to sequence a megabase of
DNA. But this has declined rapidly and it's way less than a dollar to sequence a
megabase now and has been for some time. OK, so as sequencing costs have fallen,
sequencing the whole genome has become really the typing approach of choice.
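
As a rough illustration of the resolution point made above, here is a minimal
Python sketch of how two isolates can share an identical seven-locus profile
and still differ elsewhere in the genome. Everything in it is an assumption
made for illustration: the sequences are invented, the locus coordinates are
hypothetical, and real multi-locus sequence typing assigns allele numbers from
a curated database rather than comparing raw substrings.

```python
# Toy illustration only: the sequences and locus coordinates are invented.
from typing import List, Tuple

Locus = Tuple[int, int]  # half-open (start, end) slice into the genome string

# Seven hypothetical loci, echoing the seven genes of a typical MLST scheme.
LOCI: List[Locus] = [
    (0, 4), (8, 12), (16, 20), (24, 28), (32, 36), (40, 44), (48, 52),
]


def locus_profile(genome: str, loci: List[Locus]) -> Tuple[str, ...]:
    """Return the sequence at each typed locus (a stand-in for an allelic profile)."""
    return tuple(genome[start:end] for start, end in loci)


def snp_distance(a: str, b: str) -> int:
    """Count positions that differ between two equal-length, aligned genomes."""
    if len(a) != len(b):
        raise ValueError("genomes must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


genome_a = "ACGT" * 14  # 56 bases
# genome_b differs at positions 5 and 46, both outside the typed loci.
genome_b = genome_a[:5] + "T" + genome_a[6:46] + "A" + genome_a[47:]

same_profile = locus_profile(genome_a, LOCI) == locus_profile(genome_b, LOCI)
print("Same seven-locus profile:", same_profile)
print("Whole-genome differences:", snp_distance(genome_a, genome_b))
```

In this toy case the locus-based comparison calls the two isolates the same
strain, while the whole-genome comparison still tells them apart.
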
It's no longer a terribly good idea to bother with any other typing method when
it's actually so inexpensive to just sequence the genome. Some
of the analyses can be the same and we can still reference data that was
collected and publications that use other typing approaches. But actually now,
this world of molecular epidemiology that was based on these older typing
approaches has largely been replaced by what we now might describe as genomic
epidemiology. And that's sequencing the whole genome of our pathogens and
comparing the strains. Unsurprisingly, this is improving understanding in lots
of different ways and I'm just going to sort of run through a few of those now.
One important thing you could consider is a shift from centralized genome
sequencing services to sequencing in the community. So when I
began sequencing genomes, I don't know, 15 or so years ago, most of my
sequencing would have been done in a large specialist central laboratory. And I
would send my DNA and I would receive my sequences back. This is still arguably
the best way to do things. However, it is now possible to buy your own sequencer
and by far the most common sequencing instrument at the moment, in terms of
certainly the amount of data produced, is the Illumina platform. The MiSeq is
actually very common in laboratories all around the world, with other larger
instruments tending to be in more specialist laboratories. So I've called that
the democratization of the technology. Another thing is what you might consider
the mobilization of sequencing. So here I'm talking about a different
technology. So this largely depends on the Oxford Nanopore instrument, which is
this small mobile phone sized instrument. But the technology is different. There
are advantages to using this technology because it produces long reads. But one
of the great things about these kind of smaller, physically smaller technologies
is that you can take them anywhere. OK, and you can actually put the whole
laboratory really or a miniaturized laboratory into a suitcase. You can take
this out to an area where an outbreak may be occurring and you can do your
sequencing effectively in the field or with relatively small laboratory space.
OK, so that's made a big shift from these old-fashioned technologies. And
finally, and perhaps most importantly, the rise of genome sequencing as a
tool for molecular epidemiology has allowed us to standardize the analyses. OK,
so really what I mean here is that it's very easy to share DNA sequence data
between laboratories. It's much harder to share a gel from a pulsed-field gel
electrophoresis tank. You can introduce controls, as those of you that use these
techniques will know, but it's very hard to share that between labs. The great
thing about a string of letters that represents a DNA sequence is that it's
easy to share. OK, and what this has allowed is the creation of these data
archives.
You know, these are databases, open access databases where you can download,
access and compare genome sequences from lots of different organisms. The other
thing these databases support, so not only do they host the data, but they also
provide some kind of common nomenclature for typing. Typing nomenclature,
perhaps, in many respects, but certainly some way of communicating about the
different variants produced from the different genome sequences. And in turn, of
course, this has allowed, or made it possible, to collaborate across different
settings and understand the genomic epidemiology of many different organisms,
including the coronavirus.
So, obviously, these technological advances and this associated support for
genomic epidemiology have brought challenges. And the biggest challenge is, how
do we analyse this amount of data? How do we store it? How do we look after it?
How do we deal with what we might describe as big data? OK? And this was the
kind of thinking that was behind the CLIMB consortium, around seven years ago,
when we were thinking to ourselves, and lots of people were thinking, we don't
have the capacity in individual labs, or maybe some of us did, but certainly we
don't have the capacity to deal with more data like this.
So the idea behind CLIMB, which has now been taken forward by CLIMB Big Data,
is to provide the computing power to store and analyse these enormous data sets.
Also to create an environment where we can share analytical pipelines, so this
might be different algorithms or pipelines that need to be applied to genomes.
The way CLIMB works is actually, well, there's a lot of complexity, involving
great people like Nick and Tom Connor, who have played a major role in setting
up the systems that underpin CLIMB and CLIMB Big Data, but for the user, it's
simple. Essentially, you connect to the cloud, shared computers that are
distributed across different UK institutions. You connect to this from your
desktop computer, your laptop, and you create what you might call a virtual
machine, or what we call a virtual machine. And this virtual machine, this
virtual computer within the CLIMB environment that you connect to via the
internet, effectively gives you access to a much more powerful machine than you
could otherwise get. So, lots of processing power, lots of resource, and
importantly, in some respects, they are large RAM machines. So some of the
analyses that you might use for genome assembly can be very demanding. OK, so
back to genomic epidemiology.
So, I think I've said that we have the technology, OK. We have the analysis
frameworks, that knowledge of the analysis, OK. And we have the computing power
to be able to really use genomic epidemiology to understand disease. So, in
general terms, we can consider the epidemiological relationship between strains,
OK. And we can use this to follow transmission and to understand disease across
strains. And also, interestingly and relevant to the coronavirus, we can look
beyond humans. So we can look outside, perhaps, of human populations, sometimes
looking at other sources for the different pathogen strains. We can also use
genomic epidemiology as part of a hypothesis-testing framework. OK.
So, often, we want to think about transmission and we want to think about the
timing of the emergence of different strains. We want to think about pathogen
strains. We want to think about whether a strain is spreading differently or
producing some kind of new outbreak. And we can think about whether different
strains are associated with disease in humans. And then, at a, you know, a
slightly more general level, we can think about ideas around the population
structure of pathogens. OK, so we can think about the pathogen lineages.
We can think about phylogeography: are lineages found in particular places? We
can trace the spread of lineages, perhaps. So we can think about how new
lineages emerge and how any given strain spreads. And obviously, that feeds into
our understanding of how lineages spread and how they contribute to the growth
of an epidemic. OK? So there are many levels of genomic epidemiology that we can
use. Once again, what I'd like to do is relate this to our question: why
sequence genomes in an epidemic? The answer, of course, would be that we can, at
the global scale, understand strain variation and produce phylogenies that show
the relatedness of our isolates. We can also look at the genome and look at
genetic differences that can be associated with the strains and clusters of
strains on our phylogenies. And obviously, I'm thinking about the emergence of
mutations that are associated with increased transmission or virulence. And we
can also look at these time scales. At the global scale we can see how some
strains are common and then go on to become more common, and we can see the
emergence of new strains. So I'm going to end by saying that the global scale, I
hope, is what we're talking about specifically today. In our own countries, or
perhaps with programmes in our own countries, we can think about the spread of
coronavirus, but really the strength and the power of these efforts is when we
bring people from around the world together. Using the same infrastructure, or
at least the same analyses, we can begin to understand global trends in
transmission, emergence and epidemiology.

Thank you, Sam. So can you say a little more about the difference between
strains and lineages, and when we would use each? It may be a difficult
question, but if you could comment on that.

The short answer is that strain is a looser term than lineage, perhaps. Strain
is a term that comes out of microbiology and it describes a variant of an
organism that has distinct phenotypes or genotypes in a descriptive sense. I
think strain is a useful term. I don't think it's a very precise term. Lineage,
on the other hand, has an evolutionary connotation, and it's very valuable to
use lineage because it makes us think of a tree and it makes us think of things
impacting each other, and they are related to one another in some form of
clustering, and usually we're thinking about sequence clusters. So, to some
extent, I would say that these terms are interchangeable. I certainly wouldn't
want to be prescriptive and tell you that there are multiple strains in a
lineage or multiple lineages in a strain, because both could potentially be
true, but the real difference is that lineage is a much more evolutionary term
that makes us think of a phylogeny.

Thank you all so much for listening to us at home. If
you like this podcast, please subscribe and like us on iTunes, Spotify,
SoundCloud or the platform of your choice. And if you don't like this podcast,
please don't do anything. This podcast was recorded by the Microbial
Bioinformatics Group and edited by Nick Waters. The opinions expressed here are
our own and do not necessarily reflect the views of CDC or the Quadram
Institute.