Hello, and thank you for listening to the Micro Binfie podcast. Here, we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There is so much information we all know from working in the field, but nobody writes it down. There is no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil-Fareed Alikhan and Dr. Andrew Page. I am Dr. Lee Katz. Both Andrew and Nabil work at the Quadram Institute in Norwich, UK, where they work on microbes in food and the impact on human health. I work at the Centers for Disease Control and Prevention and am an adjunct member at the University of Georgia in the US.
Hello and welcome to the Micro Binfie podcast. Today we're doing a rapid roundup of the latest information you need to know about SARS-CoV-2. Today in the booth, we're joined by a special guest, Leo Martins, who's head of phylogenomics at the Quadram Institute and our Arborist-in-Residence. And then there's always Lee, Andrew, and myself. And guess what? Everyone's working on SARS-CoV-2 now. And so that's what we're talking about today. We'll be focusing on some of the latest and greatest resources available for SARS-CoV-2 analysis. So let's kick off with some news. Andrew, you ran into something interesting.
Yeah, so I thought it'd be a very quiet week, because a week ago all the focus was on things other than genomics. I thought, great, you know, we can actually get some work done in peace and quiet and everything will be fine. You know, there were the vaccine wars, which were kicking off only a week ago. Late on a Friday night, the European Union drunkenly kind of wandered into this minefield of Northern Ireland politics. And my God, if any Irish person had been anywhere near that, they would have told them that it was a very bad idea. But of course, that didn't happen. Then, you know, there was a huge hoopla over all of that.
And you know, we had vaccine wars kicking off. And great. But unfortunately, there has been a huge amount of progress now in genomics. And everyone wants genomics, SARS-CoV-2 genomes, for all their work. And everything is about variants. And my God, it's been a very busy week. So let's talk about all the new resources and stuff that have come out just in the past week or two, because it's been very, very busy.
Yeah, it definitely seems like there's been a massive switch over to trying to capture these variants and track them as quickly as possible. I mean, I think we can kick off with the CoVariants.org page, which for me is super interesting. Emma Hodcroft is leading the charge on this one, but there are a lot of people behind it, like the Nextstrain group and people from GISAID and so on, who are contributing to this. This is CoVariants.org, and we'll have a link to that in the show notes. It's a very pretty website that catalogs all of the different variants and all these different mutations. You can go to each of the major ones and have a look at which mutations are there and what these mutations may be conferring, like what the previous literature said about them. So this one might alter recognition by antibodies, or this one might change some structure of the spike protein, or this one might do something else. And so they've got a breakdown of all of the variants and what all of those are possibly doing, which is excellent, because now I don't have to look it up every time my mom asks me, like, is this one more dangerous or not? I can just go here and look it up. This one does this. Yeah, this one is more infective, or this one is more deadly, or something like that. So it's an excellent thing to have a look at.
I guess there are so many different variants out there, you know, thousands upon thousands. We usually don't know which ones are going to be a problem until long after they've emerged and have spread everywhere.
Websites like this do really help because they allow you to see the signals earlier.
Yeah, and definitely, once one of these variants is flagged, they have a breakdown of its distribution per country, or the distribution of the different variants in a country, which is all backed by the data that's coming into GISAID. And it's just graphing that for you, so you can quickly jump in and see how quickly it's spreading once there is one of interest on the website.
And it's good to see that lots of countries are now contributing more than they had in the past. I don't know whether they were hiding data away in the basement or if they just weren't doing sequencing, but it's good to see that sequencing is more widely distributed around the world, because, you know, the UK has dominated a lot. And if you look at some of these signals, you might think, oh, God, the UK is like the center of death in the world, you know, and plague for COVID. But actually, they're just the ones doing the sequencing, so a lot of stuff is biased towards them.
Maybe it's also the case that some countries or some groups were worried about, you know, research parasites. So if you are the first to add data to the database, you might feel that people are going to use the resource without proper acknowledgement. But now that there is so much data, I think the priorities change. You're not so worried about that anymore.
Yeah, it still is a big problem. People complain about it all the time. But, well, you know, it's that trade-off between being really open and getting stuff out there quickly and, you know, maybe getting scooped. It's an unfortunate bit, but public health is more important than retrospective academic research and credit.
So another really cool report, in the same kind of vein as the CoVariants stuff: Áine O'Toole has a fantastic daily report on cov-lineages.org.
And so she's gone through the major variants of concern. I'm not going to use the place names because that's super controversial. But you know, you have B.1.1.7, which was originally associated with the United Kingdom. And then you have B.1.351, which was originally associated with South Africa. And then you have P.1, which is associated with northern Brazil. So she's been tracking things like, say, passenger numbers out of countries during the time when those lineages were expanding, and which countries they have been identified in. So you can kind of predict where the holes are in the world and which countries are probably most affected. So you know, for a while there, there was one particular country, I think Poland, which had huge numbers of passengers from the UK, but zero recorded cases of the UK variant. I was like, that's a bit of a problem there. The surveillance there hasn't picked that up when it should have. And you can see that these data reports do give a kind of early warning and an indication of the spread. And it's quite interesting, you know, some of the places where these variants have been found. P.1 has turned up in the Faroe Islands. Now, the Faroe Islands, if you don't know, are these teeny tiny little islands in the northern Atlantic, just kind of, I suppose, north of Scotland, you know, that's how far away they are. And they have a tiny, tiny population, the population of a town, more or less. And somehow they have a variant popping up there. But you know, they've obviously got a strong surveillance system there that picked it up.
Oh, just a stupid thing. I just learned that my geography sucks. And wow, Faroe Islands. But you actually pronounce it correctly.
It's interesting because sometimes there are football matches, international football matches, you know, between local countries, particularly in the run-up to the World Cup.
Now, Lee, you probably don't understand any of this as an American, but you know, football as in soccer. So, yeah, there are occasionally football matches where, you know, you'll have England playing, like, the Faroe Islands. And it's a population of, what, 60 or 70 million versus a population of probably 50,000, you know. And it makes for a very, very uneven game. But, you know, they're a country, so they get to be represented in that way.
There's a joke from my university, Emory University, that our football team is undefeated, and that's American football. And there is no football team. That's as much as I understand of football or American football or soccer or whatever you want to call it.
It's like sequencing, you know. No sequencing, sure, you've got no problems.
So, yeah, I really like the cov-lineages stuff from Áine. I mean, it's her and the rest of the guys up in Edinburgh putting this together. It's fantastic stuff that they've got. It's not just the spread of the lineages; they've actually merged it with sort of anecdotal reports of a lineage moving, you know, reputable people mentioning this stuff on Twitter, or local newspapers, and so on. And they've even pulled in the number of passengers traveling from different countries, just because that data on who travels where is available. Now, even if you don't know that a particular variant is in a given country, it's quite likely that it's going to jump between countries that have a lot of passengers flying back and forth. We're starting to get into this Minority Report thing where we can pre-cog, like possibly predict where it's going to spread next, because we're aggregating all of this different information. And it's presented really nicely; a lot of these plots are the kind of thing that you can just copy and paste straight into a newspaper article about what's going on. And I think it was interesting to see for B.1.351, was it that
one that wasn't reported in the US, and then it was reported? I first saw it on this website, because this is just running automatically in the background, pulling in new genomes and generating the figures.
Is that the South Carolina one?
Yeah, I think it was the South Carolina one. It just flagged up on the website, and that was the first place I saw it.
So following on from that, there have been some recent updates to Microreact. Now, Microreact is another visualization program that shows you the tree, the timeline, and the spatial positioning of all the different samples. What's really interesting now is that when you go to the SARS-CoV-2 section, there are simple links that, out of all of the SARS-CoV-2 genomes out there, jump immediately to these new lineages of interest, the new variants of interest, and then you can see exactly how those are spreading on their own. That was one of the new updates, and that's from Anthony Underwood and the rest of the guys at CGPS. The other thing that they've changed recently is that they now show the frequency of a particular lineage as a proportion of the total number of samples, rather than just the absolute counts. You wound up with this issue where you would see a lineage apparently dropping when it wasn't actually dropping; it's just that you don't have the same absolute number of samples over time. So it looked like the numbers were dropping when they weren't, and it needed to be shown as a proportion of the total. That's also been added now as an option. Both of those make it really easy to jump straight to particular variants and see exactly what's going on really quickly.
And of course they get, like, a live data feed from COG-UK, so their data is usually more up to date than what you can get from GISAID.
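The point about showing a lineage as a share of all samples, rather than as a raw count, can be sketched in a few lines of Python. This is only an illustration: the function name, the `(week, lineage)` data layout, and the numbers are made up for the example and are not Microreact's code or real COG-UK data.

```python
from collections import Counter


def lineage_share_by_week(samples, lineage):
    """Return {week: (count, total, share)} for one lineage.

    `samples` is an iterable of (week, lineage) pairs. This layout is a
    hypothetical one chosen for the example, not Microreact's data format.
    """
    totals = Counter()  # all genomes sequenced per week
    hits = Counter()    # genomes of the lineage of interest per week
    for week, name in samples:
        totals[week] += 1
        if name == lineage:
            hits[week] += 1
    return {
        week: (hits[week], totals[week], hits[week] / totals[week])
        for week in sorted(totals)
    }


# Made-up numbers: half as many genomes were sequenced in week 2.
samples = (
    [("week1", "B.1.1.7")] * 40 + [("week1", "other")] * 160
    + [("week2", "B.1.1.7")] * 30 + [("week2", "other")] * 70
)

for week, (count, total, share) in lineage_share_by_week(samples, "B.1.1.7").items():
    print(week, count, total, f"{share:.0%}")
```

In this toy data the raw count falls from 40 to 30 while the share rises from 20% to 30%, which is the kind of misleading drop that plotting proportions avoids.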
Now, they're not accessing secret data or anything. That's the data that's on the COG website; it's just that they're pulling it in more regularly. That's from David Aanensen's group, and it's fantastic, really. You can actually use this yourself as well. With Microreact you can just drag and drop in your trees and metadata and it'll give you really pretty pictures. They've made that super easy, and it just works really, really well for coronavirus. I'm really impressed, actually, by how it has scaled now to hundreds upon hundreds of thousands of genomes. We're up to, what, nearly half a million now? It's insane. It's a lot.
I think there's some pretty aggressive pruning in the tree or something, just to let the rendering keep up, but yeah, it's super intense. If you open up the SARS-CoV-2 section, you've got the map, with pie charts of the different variants in circles all over it, and you've got this sort of admixture plot at the bottom showing the different variants coming and going as a share of the total samples, and the tree. I don't remember whether, in Contagion, in the film, they had a nice little dashboard like that with all of the samples ticking over, but it is very much like that.
I think they only had, like, a floating piece of DNA double helix or something.
Maybe they had a protein structure that was just constantly spinning. I can't remember.
What you're saying is that Microreact is great for doing these background pictures of scientists, right, looking busy in front of a screen.
Well, yeah, I mean, Microreact has been featured with heads of state now, I think.
The New Zealand premier?
Yeah, yeah, the New Zealand PM's had a look. I think the UK prime minister's also stood in front of it, but don't quote me on that.
I think there are a few pictures of that. So, you know, it looks nice, but I think the most important thing is that they've changed aspects of it very quickly to adapt to this new life cycle where we're pulling out individual variants. Now you can not only have pretty pictures, but you can actually find out what you want to know very quickly.
So another thing that's arisen this week is different variants of concern, or lineages of concern, picking up mutations seen in other lineages of concern, or those mutations arising independently. And it's now emerged that some people have made up special names for some of these mutations. N501Y is now Nelly, as in Nelly the elephant. That's the one that is present in the UK, South African, and Brazilian lineages, in fact in all of the variants of concern. And then you also have Eek, which is E484K, which is another very interesting one. That is in B.1.351, originally discovered in South Africa, and also in P.1 and P.2, originally from Brazil, and that, I think, is associated with some vaccines that may not be working as well as they should or as was hoped. And that has now been found in the UK in our own homegrown UK variant, B.1.1.7, which is quite interesting because these mutations are arising independently, and unfortunately they're probably going to make things slightly worse.
So I've heard Eek a couple of times, and you explained it, but it still went over my head. Can you say one more time how Eek got its name?
Eek is E484K, and when you look at it written down you think, oh, okay, yeah, that is Eek. I don't know if it's a thing in America, but in the UK people have personalized number plates, and they have to have letters and numbers in particular combinations, so people kind of make words out of these things, you know.
That's the same thing as with Doug, D614G?
Yes. D614G.
My God, now we need another list of all of the colloquial names for these variants.
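The nicknames work because each mutation label has a fixed shape: the original amino acid, the position in the protein, then the replacement amino acid. A minimal Python sketch of that reading follows. The nickname table is just the names mentioned in the conversation, not an official list; the Doug entry assumes the hosts mean D614G, and the helper name `parse_substitution` is made up for the example.

```python
import re

# Nicknames mentioned in the episode. Illustrative only, not an official
# list; "Doug" for D614G is an assumption about what the hosts meant.
NICKNAMES = {"N501Y": "Nelly", "E484K": "Eek", "D614G": "Doug"}

# One letter (original amino acid), digits (position), one letter (replacement).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'E484K' into (original, position, replacement)."""
    match = SUBSTITUTION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


for label, nickname in NICKNAMES.items():
    original, position, replacement = parse_substitution(label)
    print(f"{nickname}: {original} at position {position} replaced by {replacement}")
```

So E484K reads as glutamic acid (E) at position 484 replaced by lysine (K), and squinting at the written label is what turns it into "Eek". Deletions and insertions use different notation and are deliberately rejected by this sketch.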
Yeah, so there are a few really useful resources out now. One of them is the online resources from the CLIMB ARTIC workshop. That's a course that was run for, like, 133 people from more than 30 countries, and there are YouTube videos up, and there's homework and assignments you can do, just to give you a quick introduction to how to analyze your ARTIC sequencing data. So you've got really great people like Josh Quick, who developed ARTIC, and you've got Áine O'Toole and Verity Hill, who've been developing Pangolin and civet and llama, and all the other people, you know, Nick Loman. It's like all the rock stars of the SARS-CoV-2 world have come together, and that's super handy. And I know, Lee, you said you have some resources as well?
Yeah, CDC recently came out with a COVID-19 genomic epidemiology toolkit. Several people were involved with it: Greg Armstrong, Nancy Chow, Shatavia Morrison, Mike Weigand. I'm going to stop reading names off the website, but it's a bunch of videos giving a nice primer, and it even helps epidemiologists get into it.
Yeah, some of them are really nice. It's straightforward stuff like how to read a phylogenetic tree, and there's one about genomic applications in Arizona listed here as well. So this looks like a super good primer, especially for folks who haven't really dug into genomic epi that much, to kind of, you know, try and get their head around it. So that's really nice, and I think if you use it as a primer for the CLIMB-BIG-DATA workshop, which is more practical, then you'll be able to turn out some pretty well-rounded analysts pretty quickly with these sorts of resources.
I guess that training hasn't caught up, because, you know, things change every week, it seems, but also all the people doing this are super busy coping with a million different samples coming through the door that they have to get out and analyze. It's constant firefighting, so, you know, sitting down for a few weeks to fine-tune a training course is going to be very difficult for a lot of these experts. So it's great that the resources are right there, but I'm sure things will get refined as we go along, and we'll get more and more resources, and it'll be more coordinated and more coherent.
Yeah, I can see a really nice set of materials popping up that we can probably synthesize together to make a nice, you know, big aggregate course for people to get introduced to it.
So I've seen some PCR methods coming out which try to type the variants of concern that are currently concerning, and my first thought is that these are probably going to be out of date in a week or two, unless they flip over to looking at mutations of concern like Nelly and Eek rather than at the lineages themselves. But then you lose the whole genomic epidemiology part of it. I guess you could also go to an extreme and only sequence part of the spike protein, you know, with Sanger sequencing or something like that. What are your thoughts on that, Nabil?
Yeah, I think it's baby steps with this one. I mean, we're coming from over Christmas, when we were using the spike gene target failure as the main way of detecting B.1.1.7. And that's a real dirty hack: we're just seeing, oh, the spike gene doesn't show up too well with this particular assay, so we're going to go back to that data. So now we've got people coming up with protocols. They're putting them up on protocols.io. We're seeing some sharing.
Maybe this might culminate in Taskforce turning this around quickly, or maybe in some new technologies that aren't so rigid and are a bit more flexible. But this, to me, is a great first couple of steps, that there is an open community trying to get on top of this.
And it does help, I suppose, countries which don't have the same resources. You know, PCR is a little bit more straightforward than genome sequencing.
Yeah, definitely. I think that's always a consideration, that we're going to be using RT-qPCR for a very long time as the primary tracking method in a lot of situations.
Although I suppose Denmark has shown that if you really need to get a sequencing project off the ground, you can do it very, very quickly. And you can see Mads Albertsen has got a fantastic lab. It seems to be shoved into a teeny tiny little room, but they have about 20 or 30 MinIONs running simultaneously, with wires everywhere and computers everywhere. And, you know, Denmark is now one of the top countries in the world for generating genomes per head of population. Actually, they might be the top, and they're just doing an amazing job. And what they've been able to track now, with genomics rather than just PCR, is the percentage increase of B.1.1.7. Obviously Denmark is quite close to the UK physically; there's a little bit of water in between, but a lot of passengers going back and forth. And that, I think, is a good proxy for what other European countries are facing, because they're up to, what, 19% now. And you can kind of see it going up very, very rapidly, you know, 2%, 4%, and so on every week. And they're tracking that progress with just genome sequencing. So there you go. Stop whinging and set up your lab like Mads.
Yeah. It's good to see that they're getting, I don't know if they're handing it over, or if they're getting more support from SSI on this as well.
So that's really going to formalize and, you know, just amplify their efforts, which is great. And I think, just as a closing final bit of new development, GISAID has launched a CLI API submission tool, something I got an email about the other day. Previously, if you wanted to put your genomes up on GISAID, you had to make this Excel sheet or CSV file and have all your sequences in another little multi-FASTA file. And then you had to submit it through a webpage. It was good because you could do batches in one go, so, you know, you could do a couple of hundred at a time, but it wasn't programmatic. It wasn't something where, as you generated consensus sequences, you could just flick them over. But now they have provided an API to do exactly that. And that's going to be excellent for the turnaround of genomic information. So all of the resources that we've been talking about, except for the PCR resource, are going to be that much faster at picking up what's going on, because people will be able to submit the data that much more easily.
That's all the time we have for today. We've been talking about some of the cutting-edge SARS-CoV-2 resources, and yeah, special thanks to Leo for joining us, and we'll see you all next time.
Thank you all so much for listening to us at home. If you liked this podcast, please subscribe and like us on iTunes, Spotify, SoundCloud, or the platform of your choice. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group and edited by Nick Waters. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Quadram Institute.