Hello and thank you for listening to the MicroBinFeed podcast. Here we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There is so much information we all know from working in the field, but nobody writes it down. There is no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My co-hosts are Dr. Nabil Ali Khan and Dr. Andrew Page. I am Dr. Lee Katz. Both Andrew and Nabil work at the Quadram Institute in Norwich, UK, where they work on microbes in food and their impact on human health. I work at the Centers for Disease Control and Prevention and am an adjunct member at the University of Georgia in the US. Hello and welcome to the MicroBinFeed podcast. I am Nabil Ali Khan, and I'll be your host today. In this episode, we'll be doing a deep dive into bioinformatics for antimicrobial resistance. Today, I'm joined by a very special guest, Dr. Kate Baker. Dr. Baker is a Wellcome Trust Clinical Research Career Development Fellow and Honorary Senior Lecturer at the Department of Functional and Comparative Genomics in the Institute of Integrative Biology at the University of Liverpool. She is interested in the genomic epidemiology of infectious diseases, with a particular interest in what drives the emergence and persistence of disease. Her work focuses largely on Shigella and antimicrobial resistance. She's really interested in picking apart the independent epidemiology of antimicrobial resistance determinants and how they shape bacterial populations and disease outbreaks. She works in collaboration with public health agencies and across both high- and low-income settings. Thank you, Kate, for joining me today. Thanks for having me on, Nabil. I wanted to start off by asking: who are you and what do you do? What is a typical day in the life of Kate Baker? There aren't many typical days. There's not much routine.
Basically, I'm a research group leader at the University of Liverpool. I spend quite a large amount of my time managing my team. I've got really, really excellent people: a couple of postdocs and two and a half PhD students. Obviously, a lot of my time is spent supervising them and helping them grow their projects. Yes. What happened to the other half? The other half? The other half is supervised by someone else. Hopefully, you've got the better half. Yes. Then quite a lot of time goes into steering the ship of the group. That means not just delivering the current science, but also planning what we're going to do next, writing grants, and maintaining relationships and communicating with collaborators. I've got an embarrassingly small teaching load and a few other roles in the university, as well as some broader community roles. I'm an editor for Microbial Genomics, and I attend conferences, give talks, and those kinds of things. A few different hats. What about you in terms of dealing with antimicrobial resistance? How much of your work has focused on that? I'm mostly interested in emerging infectious diseases, really. As soon as you switch to bacteria, antimicrobial resistance becomes a massive part of that, because it's so prevalent in shaping bacterial populations at the moment. And so you become kind of an accidental expert. Expert's a strong word, but you become very involved in looking at antimicrobial resistance because of your interest in other things. So would antimicrobial resistance be the key determinant globally for emerging infectious diseases? Not necessarily. It's certainly well funded at the moment, which I guess is another reason I was working on it. I think it's one of the most obvious things, because it's this really artificial selection pressure that we've placed on bacterial populations.
It's very measurable. It's very changeable. The other changes in populations are a bit more subtle and happen over time. It's not that they're not happening. It's just that AMR is this great global experiment that we've been running on bacterial populations, and we're now in a position to measure its impact. When you approach a project, what would be your favorite bioinformatics tools for detecting and annotating this kind of resistance? I don't have a favorite tool, because I think you have to pick the tool that's fit for purpose. It depends on the dataset you're working on and what you're trying to do, right? The tools I use most often are ResFinder and PointFinder, because they do a very good job on the pathogen I work on. And I'm almost always interested in the whole genomic context of a gene, so I would very rarely not want assembled genomes alongside my resistance gene predictions. It's not a big deal for me to wait for assembly and then go on to use those tools. Other than having that genomic context, do you notice any differences in the outright quality of detection from assemblies versus reads, for instance? To be fair, I don't have a huge amount of experience detecting resistance in reads. You want to get straight to the point, so why bother, I suppose, right? Well, I'm not in metagenomics, where assembly is complicated. I'm not in bedside clinical diagnostics, where you need to be working with the raw data as it comes off the machine. For the projects I work on, it's not a big deal to set up the assemblies, come back the next day, and then get the resistance results. I'm not working to a timescale where I need the results immediately. Okay, but often with antimicrobial resistance, a lot of these genes are on fairly complex cassettes with a lot of repetitive elements.
Do you find that you can actually recover these to the extent that you'd like? Right, sorry, I now understand your question. Of course not. It's horribly limited, and Shigella is the worst pathogen for that. I was just checking myself, because I know it's being recorded: the worst pathogen. I'm pretty sure it's up there in terms of fragmentation. What would be the expected number of contigs from a Shigella genome compared to E. coli, for instance? Of course, it all depends how you sequence it. With standard Illumina pipelines, using libraries with fragments several hundred base pairs long and paired-end reads, you're still going to get something like 300 to 400 contigs for Shigella. That's because they have loads of insertion sequences littered throughout their genome. There's a really good paper by Jane Hawkey, Kat Holt and co., in preprint at the moment, which shows just how extensive the IS diversity in Shigella is. E. coli, by comparison, is hovering in the tens of IS or repeat elements that break the assembly up. So it's something like 50 to 100 contigs rather than several hundred, and it's really a problem in Shigella. Following on from that, for people who don't know the species very well, where is the antimicrobial resistance in Shigella? Is a lot of it driven by point mutations on the chromosome? Is a lot of it driven by plasmids? Where is it actually coming from? I guess it depends on the phenotype you're looking at. Ciprofloxacin-resistant Shigella is a really big problem. It's on the World Health Organization's list of the top dozen priority pathogens for new antimicrobial agents. That's because ciprofloxacin is the recommended treatment for Shigella, we don't have a licensed vaccine, and it causes a massive global burden of disease. So that's a really big problem.
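The contig counts mentioned here are easy to check on your own assemblies. As a minimal sketch, assuming plain FASTA text as input, the code below counts contigs and computes N50, a common summary of assembly contiguity. The function names are illustrative and not taken from any particular tool.

```python
def contig_lengths(fasta_text: str) -> dict[str, int]:
    """Map each contig ID (first word of the header line) to its sequence length."""
    lengths: dict[str, int] = {}
    name = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited word as the ID; empty headers become "".
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    toy = ">c1 toy contig\nACGTA\n>c2\nACG\n>c3\nAC\n"
    lens = contig_lengths(toy)
    print(len(lens), "contigs, N50 =", n50(list(lens.values())))  # prints: 3 contigs, N50 = 5
```

Suppose two assemblies have a similar total genome size. The one broken into several hundred contigs by insertion sequences will usually have a lower N50 than one in 50 to 100 pieces.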
It's caused by point mutations, but there's really only one kind of genetic context to it. We get an accumulation of point mutations in the quinolone resistance-determining region. But in addition to that really important, chromosomally mediated phenotype, there's a huge number of acquired resistance gene phenotypes in Shigella. A lot of them are on quite commonly found mobile genetic elements. So there are integrated chromosomal islands, as well as a number of plasmids that have found quite a comfortable home in Shigella. And so I think quantifying antimicrobial resistance is something that's really hard to do. So we've touched on a couple of different elements that introduce or mobilize antimicrobial resistance. So far, we've had point mutations on the chromosome, ICEs, and plasmids, so mobile genetic elements. Are there other mobile genetic elements? Is it ever mobilized by a phage or something else? What are some of the other mechanisms out there? Obviously, you get smaller mobile genetic units, things like integrons and transposons. I know that phage-mediated resistance can happen. It's certainly not a big part of what happens in Shigella, and I can't think of a reference where someone has demonstrated it for Shigella. That's not to say it couldn't happen. But yeah, that's the list in my head. All right, that sounds like a rather heady list of things to keep track of. Are there any particular tools other than ResFinder and PointFinder that you use to detect these? What approaches would you use to identify these different systems? Obviously, there's PlasmidFinder, which is based around replicon typing and Inc typing of genetic sequences. And I certainly have used that.
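To make the point-mutation mechanism concrete, here is a minimal sketch that reports amino acid substitutions at chosen positions of a protein already aligned to a reference. It assumes gap-free, equal-length alignments and 1-based positions. The positions in `EXAMPLE_QRDR_POSITIONS` (GyrA 83 and 87, ParC 80, E. coli numbering) are commonly reported in the literature and serve only as an illustration. Tools such as PointFinder rely on their own curated mutation databases, and this is not how they work internally.

```python
# Illustrative QRDR positions (E. coli numbering); not a curated mutation database.
EXAMPLE_QRDR_POSITIONS = {"GyrA": [83, 87], "ParC": [80]}


def find_substitutions(ref: str, query: str, positions: list[int]) -> list[str]:
    """Return substitutions such as 'S83L' at 1-based positions of aligned proteins."""
    if len(ref) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    calls = []
    for pos in positions:
        if not 1 <= pos <= len(ref):
            raise ValueError(f"position {pos} is outside the alignment")
        ref_aa, query_aa = ref[pos - 1], query[pos - 1]
        if ref_aa != query_aa:
            calls.append(f"{ref_aa}{pos}{query_aa}")
    return calls


if __name__ == "__main__":
    # Toy 90-residue "reference" with S at 83 and D at 87; the query carries S83L.
    ref = "A" * 82 + "S" + "AAA" + "D" + "AAA"
    query = ref[:82] + "L" + ref[83:]
    print(find_substitutions(ref, query, EXAMPLE_QRDR_POSITIONS["GyrA"]))  # ['S83L']
```

One simple way to represent the stepwise accumulation in this region is to count how many such substitutions an isolate carries across GyrA and ParC.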
And I like to couple it with ResFinder by pulling out the AMR gene-containing contigs and then putting them through PlasmidFinder. In my experience, these tools are very good, but they've got the same problem everything else has, right? Database limitation. So you quite often get things that don't type. Or, because of the fragmentation issue, you might have an integron on a plasmid, but by the time you get the contigs they're fragmented. You've just got the integron, so obviously you can't plasmid-type that. You can look at associations and so on, but really, for this kind of work, we need to be moving towards long-read sequencing to pull out that level of detail. So, based on your experience, there is no way to resolve this from short reads, and you need the extra sequencing information from long reads? I guess it depends on the pathogen, but certainly for my group, working mainly with Shigella, that's the direction we're going in. It's interesting you pick up on merging ResFinder and PlasmidFinder. A lot of people keep asking me how to identify the Inc type and then also figure out which resistance genes are associated with that plasmid. To my knowledge there isn't a tool that does it. I just say, well, you're going to have to get your hands dirty with sequences and BLAST to figure it out. Sort of. I think you can do quite a lot of statistical magic to associate the presence of a particular Inc type with the presence of a particular resistance gene. One way to approach that problem is a statistical analysis of the presence of particular Inc types and plasmid groups alongside particular resistance genes. You can always use that kind of approach to then target long-read sequencing.
Most of us still aren't in a position to long-read sequence the kinds of datasets we've been working with using short-read sequencing. We've been working with hundreds of genomes at a time. Long-read sequencing at that scale is still not financially viable, and the tools to process it aren't available either. So you can use those kinds of statistical approaches to target your long-read sequencing. Then, once you have a reference scaffold, you can use your short-read data in the ways we're all familiar with: mapping back, looking at how the coverage is across that region, and so on. I'm sure your group also does a lot of wet lab work, taking these predictions and testing them empirically. For Shigella in particular, how accurate have these predictions been compared to the phenotype you actually see? Actually, it's quite variable in terms of preps: intra-sample variation, intra-colony variation, even variation between different genomic preps that we've done. There's a lot of variation in gene content and phenotype at those sorts of scales. I don't know if it's just Shigella or if everyone else's pathogens are doing this as well, but it's going to become an important part of how we do our studies. That sounds familiar from my experience. I sometimes see mixed colonies on the same plate: you take multiple picks and see different resistance patterns within something that's supposed to be clonal. I don't know if you're seeing something like that yourself. No, that's exactly it.
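Here is a sketch of the step where short reads are mapped back to a long-read reference and coverage is checked across a region. It assumes the reads have already been mapped and that per-base depth has been written with `samtools depth` for a single BAM file. That output is tab-separated: contig name, 1-based position, depth. Positions absent from the output are treated as zero depth, which is what happens when the file was made without `-a`. The function name and the region coordinates in the example are hypothetical.

```python
from typing import Iterable


def region_coverage(depth_lines: Iterable[str], contig: str, start: int, end: int,
                    min_depth: int = 1) -> tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth)
    over start..end (1-based, inclusive) on one contig."""
    if end < start:
        raise ValueError("end must be >= start")
    depths: dict[int, int] = {}
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != contig:
            continue
        pos, depth = int(fields[1]), int(fields[2])
        if start <= pos <= end:
            depths[pos] = depth
    length = end - start + 1
    # Missing positions count as zero depth.
    total = sum(depths.get(p, 0) for p in range(start, end + 1))
    covered = sum(1 for p in range(start, end + 1) if depths.get(p, 0) >= min_depth)
    return total / length, covered / length


if __name__ == "__main__":
    toy = ["plasmid_1\t1\t10", "plasmid_1\t2\t10", "plasmid_1\t4\t20", "chrom\t1\t99"]
    print(region_coverage(toy, "plasmid_1", 1, 4))  # (10.0, 0.75)
```

In practice you would pass an open file handle instead of a list. A region whose depth in a given isolate is much lower than the rest of the reference may mean the element is absent or rearranged there, and that is worth confirming rather than assuming.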
We see quite a lot of variation in genome content, including those crucial mobile genetic elements. We're just starting to get to the bottom of some of that in more focused studies, but I guess it worries me how things are interpreted. Part of the problem is that some resistance phenotypes are more obvious than others. Obviously, if you have a functional beta-lactamase gene, then that is going to confer the phenotype. Other things are much harder to detect, like changes in how the expression of resistance genes is modulated, which might cause more subtle shifts in minimum inhibitory concentrations. I think a lot of the time we work in this space of resistant or susceptible, and that's actually not the reality of what these bugs are doing. This is actually continuous data, which we force to be discrete for the convenience of analysis. So you mentioned that some are easier to call than others. For me, things like sulfonamide resistance are brain-dead easy, and others are a lot more cryptic. Do you agree with that? Which ones would you say are really easy to detect genomically and then in the lab? We don't do a huge amount of phenotype-genome correlation in my group, but here's what my experience to date tells me. Beta-lactam resistance is a really complex trait. If it were a human trait, we'd probably call it polygenic. Enterobacteriaceae have AmpC genes encoded on the chromosome, and small variations in their promoter region can give some strains higher resistance than others. And then, obviously, there are other genes that can come in by horizontal gene transfer, like blaOXA or blaTEM genes.
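The point that MICs are continuous data forced into categories can be shown in a few lines. This sketch uses an EUCAST-style convention. An isolate is susceptible if its MIC is at or below the S breakpoint, resistant if it is above the R breakpoint, and otherwise I. The breakpoint values here are hypothetical and do not correspond to any real drug-organism pair.

```python
import math

# Hypothetical breakpoints in mg/L, for illustration only.
S_BREAKPOINT = 0.25  # susceptible if MIC <= this
R_BREAKPOINT = 0.5   # resistant if MIC > this


def categorise(mic: float, s_bp: float = S_BREAKPOINT, r_bp: float = R_BREAKPOINT) -> str:
    """Collapse a continuous MIC into S / I / R."""
    if mic <= s_bp:
        return "S"
    if mic > r_bp:
        return "R"
    return "I"


def doubling_dilutions(mic: float, reference: float) -> int:
    """Number of two-fold dilution steps from reference to mic (negative if lower)."""
    return round(math.log2(mic / reference))


if __name__ == "__main__":
    mics = [0.0625, 0.25, 0.5, 1.0, 64.0]
    print([categorise(m) for m in mics])  # ['S', 'S', 'I', 'R', 'R']
    # The last two isolates are both 'R', yet they are six doubling dilutions apart.
    print(doubling_dilutions(64.0, 1.0))  # 6
```

Working with log2 MICs directly, rather than with categories, keeps the information that the categorisation throws away.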
Treating gene presence as equal to phenotype presence is just kind of artificial. Obviously, for a lot of these there are efflux pumps and other things involved as well. Take azithromycin resistance: if you have an mph(A) gene and an erm(B) gene, you will have high-level resistance to azithromycin. But then again, there are all these shades of grey at the lower levels that are harder to pick apart. Is it always a cascade of multiple events in play? It sounds like even the simplest ones are fairly difficult to call, so you wouldn't necessarily say that the predictions you get straight out of these databases are reliable. I'm probably getting too caught up in the grey, right? Some of them are very obvious. If you have mph(A) and erm(B), you will have high-level azithromycin resistance. Those are cases with an obvious, massive increase in MIC and a very specific mode of action. But it's important not to lose sight of what's below them. There's a lot happening underneath to do with regulation and with small changes in the expression of a bunch of normal bacterial defense mechanisms, which we are not yet classifying as resistance genes. You touched on mode of action. How much do we understand about mode of action? I suppose at the moment we have a good idea of the low-hanging fruit, the stuff that's really obvious, but what proportion do we simply not understand? I really don't think there's a quantitation for that. Do you mean, for every resistance gene, do we know its function? Or do you mean, relative to the volume or burden of disease, how much of the resistance do we understand?
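The "gene presence equals phenotype" rule discussed here can be written as a tiny rule-based predictor, which also shows what such a rule leaves out. This is a toy sketch. The gene pairing comes from the conversation: mph(A) and erm(B) together give high-level azithromycin resistance. The handling of a single gene and the wording of the outputs are assumptions made for illustration, not an interpretive standard.

```python
# Toy rule based on the pairing described in the conversation; not a validated standard.
HIGH_LEVEL_AZM_GENES = frozenset({"mph(A)", "erm(B)"})


def predict_azithromycin(detected_genes: set[str]) -> str:
    """Map detected acquired genes to a coarse azithromycin call."""
    found = HIGH_LEVEL_AZM_GENES & detected_genes
    if found == HIGH_LEVEL_AZM_GENES:
        return "high-level resistance expected"
    if found:
        # Assumption: a single determinant is treated as uncertain rather than called.
        return "indeterminate: confirm by MIC"
    # Regulatory and low-level mechanisms are invisible to a presence/absence rule.
    return "no high-level determinant detected"


if __name__ == "__main__":
    for genes in ({"mph(A)", "erm(B)", "blaTEM-1"}, {"mph(A)"}, set()):
        print(sorted(genes), "->", predict_azithromycin(genes))
```

The last branch is the important caveat: "no high-level determinant detected" is not the same as "susceptible".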
No, I'm talking more in the sense that we've got this obvious threshold, where we can figure out the very obvious cases. Then there are probably much more oblique ones, which are grey. What's the range of that? Are the grey-area ones 5% of the total cases we could expect, or 90%? You mean correlation of genotype and phenotype? No, just being able to predict it reliably with these tools, bioinformatically or even just in the lab. Okay, so it depends on what you're working on, right? For Shigella the prediction is actually pretty good, though that varies with antimicrobial class. That's probably for the reasons we were discussing before: a lot of phenotypes have more than one thing involved and might have other mechanisms. But Shigella is really closely related to everyone's favorite pathogen, E. coli, and it's a big human pathogen. So there have been a lot of studies into its antimicrobial resistance and the mechanisms that underpin it. Whereas you might be working with some wild and beautiful, heretofore unknown pathogen from environmental samples. Then your chances of being able to correlate phenotype and genotype are probably much lower. It's a system biased towards human clinical pathogens. So you're saying I can't write one universal tool that just solves everyone's problems? Well, if you can, you'll never have to work again. There's so much money in microbial bioinformatics. So that tool would be very unlikely? Yeah, I think unlikely, because, well, here's the thing: antimicrobial resistance is a bit like how everyone used to talk about cancer.
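How well genotypic prediction works for a given antimicrobial class is usually summarised by comparing predicted and observed categories. The sketch below computes categorical agreement plus the two error types commonly reported in this setting. A very major error is an observed-resistant isolate predicted susceptible, taken as a fraction of observed-resistant isolates. A major error is an observed-susceptible isolate predicted resistant, taken as a fraction of observed-susceptible isolates. The input pairs are made up for illustration.

```python
def concordance(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Summarise (predicted, observed) calls, each 'R' or 'S'."""
    if not pairs:
        raise ValueError("need at least one isolate")
    agree = sum(pred == obs for pred, obs in pairs)
    preds_when_r = [pred for pred, obs in pairs if obs == "R"]
    preds_when_s = [pred for pred, obs in pairs if obs == "S"]
    nan = float("nan")
    return {
        "categorical_agreement": agree / len(pairs),
        # Very major error: resistant isolate predicted susceptible.
        "very_major_error": preds_when_r.count("S") / len(preds_when_r) if preds_when_r else nan,
        # Major error: susceptible isolate predicted resistant.
        "major_error": preds_when_s.count("R") / len(preds_when_s) if preds_when_s else nan,
    }


if __name__ == "__main__":
    toy = [("R", "R"), ("R", "R"), ("S", "R"), ("S", "S"),
           ("R", "S"), ("S", "S"), ("S", "S"), ("S", "S")]
    print(concordance(toy))
```

Computing these metrics separately for each antimicrobial class is one way to see the class-to-class variation described here.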
It's such a complex, multifaceted thing, involving so many different mechanisms, forms of mobility and methods of detection. Some things are all in the regulation, and that won't be detected by most of the tools we have now. We are starting to break it down a bit, in terms of talking about point mutations versus horizontally acquired genes. So we are picking it apart a bit. But I don't think there will ever be a one-size-fits-all tool, because it's such a massive, complex set of phenotypes that we're trying to capture with this catch-all term. We used to talk about cancer as one thing, and people talked about curing cancer. At least now that has matured a bit, and people talk about breast cancer and melanoma and testicular cancer. To be fair, I think antimicrobial resistance needs the same kind of thing, instead of being talked about as one thing that we will be able to solve in one way. That's not the case. It's far too complex a phenomenon for that. On that note of trying to correlate "the gene is present, therefore it's resistant": one thing that really gets me, as we move into genomics for predicting resistance in surveillance, is that the gold standard is MIC breakpoints. Genomic prediction is judged, obviously, by whether it agrees with the MIC data. And yet that's a laboratory measure. We don't know how the MIC data relate to clinical outcome, as opposed to how the bugs behave in the lab. I don't know how we'll get out of that, but it's one thing that bothers me about the whole of genomic prediction.
It's saying, oh, right, we've got this MIC for the bug in the lab. But is that really what we'd see in the patient? We don't know. I think it's hard to get that kind of clinical outcome information. I say that from my own background, as someone who has to write study designs and try to link in clinical outcome information. So clinical outcome would be amazing, and having that as a gold standard would be good, but I don't know how we'd measure it. It's multifactorial, too. Whether someone with a clinical infection gets better isn't only to do with the antibiotics they're getting, especially by that point. So I feel that linking to clinical outcome isn't really something we can achieve. But that would be interesting as a much larger study. That comes back to the cancer question of setting up studies for a specific situation. We would have to follow people worldwide over time, record everything, and have lots of people infected with whatever. Then we'd go back and look at what they were treated with and how that related to the outcome. After that, we'd look at which genetic elements affected it one way or the other. Yes. That's interesting. That would keep us in work forever. Yes, I was going to say, and look globally, right? So we'd be sequencing lots of samples of whatever and, yes, it's going to be interesting. But then, well, let's step back a little.
I think maybe you were looking at the numbers on how much it would take to do something about this. Yes, I'm going to get political on this. I think quantifying AMR is just really broad, because we don't know what we're looking at. And, I think, you know... [unclear] ... We don't have observations of it in a systematic way. [unclear] ... we need to understand those complex systems and, you know, ultimately, I'm thinking about the isolate-based work, sure, for human clinical pathogens. I think for Shigella the effort is linked to a lot of the existing resources and so on, but it's going to take a lot of work to move forward, and that's interesting. We've come to the end of our time, which is always hard, but any final comments from you? Not really, but I had a short story if you wanted to include that. Oh, I'd love to hear a short story, yes. It's a short story, but I wanted to share it. In the early days of my time working on human clinical pathogens, I did an analysis using ARDB annotation. I knew ARDB wasn't being maintained, but I told myself, it's fine, it does what it says on the tin.
It was a close call. It was a very important analysis of human isolates, and luckily colleagues caught the problem before it was published. But I think the general point is that this stuff matters. How a database is designed matters, and how it is developed and maintained matters. So I'd ask everyone to know what they're doing and which databases they're using. Know, too, what biases come with the tools you're running. Right, well, that's a valuable lesson for everyone listening. And so, with that, I'd like to wrap this up. Thank you very much, Kate, for coming on today. It's been a lot of fun. I think we've covered a lot of different topics, which I'm sure will get a lot of interest on the Twitterverse. So, thank you very much for coming on today. Thanks, Nabil. Thank you all so much for listening to us here. If you like this podcast, please subscribe and like us on iTunes, Spotify, SoundCloud, or the platform of your choice. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group and edited by Nick Waters. The opinions expressed here are our own and do not necessarily reflect the views of CDC or the Microbial Bioinformatics Group.