[Speaker A]: Hello, and thank you for listening to the Micro Binfie podcast. Here, we will be discussing topics in microbial bioinformatics. We hope that we can give you some insights, tips, and tricks along the way. There is so much information we all know from working in the field, but nobody really writes it down. There's no manual, and it's assumed you'll pick it up. We hope to fill in a few of these gaps. I am Dr. Lee Katz. My co-hosts are Dr. Nabil-Fareed Alikhan and Professor Andrew Page. Nabil is a senior bioinformatician at the Centre for Genomic Pathogen Surveillance at the University of Oxford. Andrew is the CTO at Origin Sciences and Visiting Professor at the University of East Anglia. Hi and welcome to the Micro Binfie podcast. Nabil and I are here. Andrew is still detained in his real job, and in his place we still have a friend and colleague, Clint. He's an expert in SARS-CoV-2 and has a background in virology. And just a reminder that the opinions expressed here are our own and not those of our employers. We're still talking about Pathoplexus. It's been some time, so we're letting our special guests from Pathoplexus out of the basement and back into the recording. So welcome back, Dr. Emma Hodcroft, Dr. Theo Sanderson and Mr. Arthur Shem. And how do you guys feel after all this time in the basement? Are you okay? Do you need any water?

[Speaker B]: I mean, it would have been great if you'd fed us, but I think we can struggle through.

[Speaker A]: Water only. So last time we talked about interoperability, how data is handled, and how you guys put stuff in INSDC. I thought maybe we could talk about governance and maybe your future directions. Do you want to talk about how Pathoplexus is sort of a special place because you guys have thought so much about the governance behind it?

[Speaker B]: I can start, but I think it would be great to also get Theo's and maybe even Arthur's thoughts about the governance, because this was something that was really close to our hearts as we were starting out with this crazy idea of making another pathogen database. We thought it was so important for defining how that database works and how it's attached to the community. So for us, it was really critical that if we were going to start a database, it is part of the community that it's aiming to serve. So Pathoplexus, you know, all of us that have founded it, all of the members that are part of it, we're all people that are at the coalface of pathogen genomics and public health. We work with this data, we use this data, we generate this data, we need this data to work with governments to come up with interventions to try and mitigate the impact that diseases have. And so it's very important that the database, Pathoplexus, is always working to help realize those goals. And the best way that it can do that is by ensuring that it's getting feedback from the community. So Pathoplexus is something that is designed to be held accountable by its members. We do that through a legal structure that's available here in Switzerland called a Verein, which is an association in English, but it's a legally defined structure. And it means that all of our members have real power. So they can, for example, vote out the executive board if they're not happy with how the executive board is doing things and vote in a new executive board. And this was really important to us because, it sounds a little extreme, but it means that if it ever feels to the community like Pathoplexus is not achieving its purpose, if it is going in a wrong direction, if it is doing things that the community doesn't think are helpful or good, then they can change that direction. The members and the community can change that direction.
And I think we're a little bit unique in that regard. It also does sometimes pose some challenges because, for example, setting up a legal organization where our executive board is from five different continents around the world is not what most associations in any country are used to accommodating. But it's something that we're really proud that we're able to say, and we're proud that we're able to have that structure. But yeah, I also think it would be great to hear Theo's and Arthur's thoughts as well.

[Speaker C]: Yeah, no, I mean, I think Emma said most of it. But yeah, I just think it is really important that, you know, this is a tool for the community, which needs to serve that community in the best way that it possibly can, and setting up transparent and accountable governance is really important to make that possible.

[Speaker D]: Yeah, just to add on that, it's the transparency in how decisions are made, even how the feedback is collected, not only from the community users, actually all users because everyone uses both the platforms, but also from other experts in the field. And we do have these monthly meetings where we're able to follow up and track their issues, and all of this is documented and monitored.

[Speaker C]: I guess the other thing that sort of intersects with that is that, of course, everything's open source. So if there's a feature you want, you can raise an issue on GitHub, and you can even implement it yourself and make a pull request and ask us to add that. So that's all part of this sort of open ethos, and that is just the most efficient way to do things because it means that you become aware of bugs that you wouldn't have otherwise, and so on.

[Speaker A]: Wait, so do you get... Tell us about any interesting issues or stuff that people put onto GitHub. That's an interesting avenue, actually.

[Speaker C]: Yeah, I mean, I think Andrew Rambaut has got a lot of requests about how we do our checkboxes: whether we should be able to do a search and then select everything, but then unselect like 10 different sequences. So if you, the listener, desperately want that feature, then chime in on the issue and we can prioritize that.

[Speaker A]: That's incredible. So as a user, Arthur, have you done this before also? Have you added your own opinions or requests on there?

[Speaker D]: Unfortunately not, but I initially, I think, participated in documenting this. At that time, there was a suggestion from some of the meetings that we hold weekly. One of the other groups had suggested to us that we could have another Excel sheet where we have these issues compiled from GitHub. I helped start documenting that, but personally I didn't raise any issues there.

[Speaker C]: A sort of related point there is that you can raise issues about our software, and you can also raise issues about our sequences. So any of the sequences have a button to say "report an issue with this sequence", and that actually, at the moment, again creates a GitHub issue. And so you can say, look, you know, I don't think this collection date makes sense, like that lineage wasn't circulating at that point. And then again, with this sort of open and transparent approach, there's a team of curators who can go in and either raise with the original sequence submitters that this is a problem, or, if it's an INSDC sequence and one can't get hold of the original sequence submitters, we can again add a curation to update the data to correct things that are clearly incorrect.

[Speaker A]: Wow.

[Speaker D]: Yeah, but maybe just on top of what you said, I think the corrections for the directly submitted sequences, that's the mandate of the submitters, the groups. It can be flagged by the curators, but the ultimate correction is done by the submitters, who can decline, by the way, but the issue remains flagged until they're able to respond to it.

[Speaker E]: So I had one question. How do we feel about commercial use?

[Speaker B]: Yeah, so this is always a bit of a touchy subject. So when you upload sequences to Pathoplexus, you allow them to be used within whatever the data use terms state. So for restricted use sequences, obviously this does restrict how you can use those for publications and for preprints, but it doesn't restrict someone from using them for commercial use. This was something that we talked about a lot because, as I'm sure many listeners are aware, this is a hot topic in all of the infectious disease world right now, especially when it comes to pathogen access benefits. So the idea is that many countries, or in some cases one country, if that's the country where an outbreak is happening, contribute this data, contribute viral genomic data about a pathogen, and then companies can use this to develop, for example, a vaccine that may then not be affordable or even available in the country that needs it. And unfortunately, we saw this in the pandemic, where we had sequences, you know, for new variants that were being uploaded and shared by some countries who then were not able to afford or weren't able to access vaccines that came from those. And this is a real problem because obviously that does not encourage data sharing, and it highlights some big inequality and lack of equity in the public health world and in the pharmaceutical world as well. But it's a really big problem and it has not been solved by joint efforts of many global organizations. And so at Pathoplexus, we didn't think so highly of ourselves that we thought we could solve the problem that has stumped many, many countries and many global experts for years now. What we did do, though, is acknowledge that this is a problem, and we hope that there will be a solution someday. And so what we do have in our statutes is that when there is a globally accepted solution to pathogen access benefits, Pathoplexus will comply with this.
So we want to be part of the solution, but we didn't think that we are in a place to decide what that solution is because it is a big, complex problem. The last remark I want to make on that front is also just to say that when you upload sequences to Pathoplexus, you don't explicitly give any rights away. So for example, if a company did decide to use your sequence to create a vaccine, you have not given away the rights for them to do that in an explicit sense, which you do with some other databases. And so you could, in theory, have a legal challenge. Obviously the space here between theory and reality is fairly big, how likely it is that some scientists can go up against a pharmaceutical company, but in theory you have not given away your right to make a legal challenge about how that sequence could be used or what benefits you should get from the use of that sequence. It's a very small thing, but for us it was also an important thing.

[Speaker F]: And so that would apply to that up to one year embargo period. And after that, what happens?

[Speaker B]: This is separate from the restricted use. This is just your rights as someone who generated that sequence. I mean, you know, legally it's very complex and I don't want to imply that necessarily this would be a successful challenge, but in some other databases you explicitly say that companies can develop vaccines from your sequences. At Pathoplexus,

[Speaker F]: Okay.

[Speaker B]: you do not give away that right and so you could in theory challenge that. And that's forever, not just for a year.

[Speaker A]: So moving on a little bit, I know that you guys are approaching your one year anniversary. And I guess at this time, we want to say that today's date is August 22nd when we're recording. And this is going to be, you know, released in the future. So I imagine we're going to be past the one year anniversary. Do you guys want to bring up any highlights after your one year?

[Speaker B]: Yeah, I think that one of our big highlights, so yeah, I'll just name one because I think it would be great for Theo and Arthur to jump in as well. But I think one that I'm quite proud of is that we did have more mpox sequences shared to Pathoplexus than shared to INSDC directly. So more uploaded to us directly than to GenBank or to ENA. Of course, very importantly, most of those are now in GenBank and ENA because we sent them there. But I think it really showed that making an interface that's really easy to use matters, and in some cases people did use the restricted use terms when they first uploaded those sequences so that they could prepare a publication, even though they're now open. It shows that there's a need for that, and it shows that it can attract people to share sequences that they otherwise wouldn't have shared that early or may not have shared at all. And especially in the face of the ongoing mpox crisis, to me that really speaks to, hopefully, well, I think it really speaks to the fact that we have built something that addresses one of the problems that we wanted to address.

[Speaker C]: From my side, the code is constantly changing. This is what I spend a fair amount of my time doing. So we've had thousands of commits to the code base. I think the code base is now on sort of 3000 commits. And you know, those are new features that we're adding all the time to just try to make things a bit smoother, a bit better. And so that's been exciting. Yeah, to see it really getting used and actually empowering real sharing of data from important outbreaks is really exciting. And to really have expanded the number of organisms is great. And the fact that we're continuing to do that with this launch of measles: by the time you're listening, you will be able to submit measles sequences to Pathoplexus, so do come and do that. Arthur, sorry.

[Speaker D]: That's good. This is something that I'd love to not miss, because the sequencing initiatives here in Uganda worked so hard on it, and also the collaborators across the continent as well. It's the first Ebola sequence that was shared with Pathoplexus. That was a monumental moment for us, but for Pathoplexus as well. So it's something worth highlighting.

[Speaker B]: I guess maybe if I can add one last highlight, we've also seen that people are starting to use Pathoplexus in, for example, training workshops and tutorials and online materials. We've seen that people are including Pathoplexus as a way of getting data because it's really easy, and you can do this both ways. You can, you know, for a kind of slower ramp up, have them go to the website, filter for particular countries or mutations or whatever, download that data set and get going with phylogenetics or whatever you're trying to teach. Or of course, for a higher level workshop, you can have them calling the API and learning how to get those sequences, filter them, and make different requests. And it's been super, super cool to see this being incorporated into teaching material. Sometimes we stumble across it on GitHub or online, and that's like a little celebration, to see people finding this so useful for teaching others.
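
As a rough illustration of the two teaching routes described above (filtering a downloaded data set versus requesting filtered data from an API), here is a minimal Python sketch. Everything specific in it is an assumption made for illustration: the base URL is a placeholder, the parameter names `country` and `organism` are hypothetical, and the toy records are invented. It does not reproduce the real Pathoplexus API, so check the project's own documentation before adapting it.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): NOT the real Pathoplexus API address.
BASE_URL = "https://example.org/api/sequences"


def build_query_url(filters, base_url=BASE_URL):
    """Build a GET URL whose query string encodes the given metadata filters.

    Empty values are dropped and keys are sorted so the URL is reproducible.
    """
    params = {k: v for k, v in sorted(filters.items()) if v not in (None, "")}
    return f"{base_url}?{urlencode(params)}" if params else base_url


def filter_records(records, **criteria):
    """Keep only the records whose metadata match every criterion exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Invented toy metadata standing in for a data set downloaded from a website.
records = [
    {"accession": "TOY_1", "country": "Uganda", "organism": "ebola"},
    {"accession": "TOY_2", "country": "Uganda", "organism": "mpox"},
    {"accession": "TOY_3", "country": "Switzerland", "organism": "mpox"},
]

# Route 1: filter the downloaded table locally.
uganda_mpox = filter_records(records, country="Uganda", organism="mpox")

# Route 2: express the same filter as an API request URL (hypothetical parameters).
url = build_query_url({"organism": "mpox", "country": "Uganda"})
```

In a beginner workshop the first route keeps the focus on the downstream analysis, while the second introduces how the same filter can be written as a reproducible request.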

[Speaker A]: That's amazing. So with all of your accomplishments, with all the governance, I'm going to go into one other topic, and that is funding. So I guess this is a lot of community-supported effort, and you guys have done a lot. How do people donate to you? Is that your preferred thing? Do you want to take a moment to express how people can support you?

[Speaker C]: Yeah, I mean, I guess we should highlight firstly that this has been the work of a lot of people beyond the people on this call. So you can go on the Pathoplexus website and see the development team, which is an expansive set of people in Switzerland, in the UK and across the world. But yeah, if people want to donate, they'd be extremely welcome to. There's the funding page on our website, and that provides details of all sorts of different ways that you can provide money. And yeah, it's generally sort of interesting that it's surprisingly challenging. If you think about how much the world spends on sequencing viruses, you know, I mean, just for SARS-CoV-2 it was so many millions of dollars. And I think we should use some resources to make sure that those sequence data that we've spent so much money generating actually get shared effectively and get analysed well. And I guess we found it surprisingly tricky to attract institutional support. And so yeah, if you're in touch with a rich philanthropist, do also send them our way. We'd be more than happy to chat to anyone.

[Speaker E]: I just checked on the website. You can donate in Bitcoin. Something I didn't expect.

[Speaker F]: And Ethereum.

[Speaker B]: Open source from top to bottom, man. I guess if I can maybe make one more point, I mean, it's really just doubling down on what Theo said, but I think it's worth emphasizing. There are so many funding agencies right now that are, rightfully so, emphasizing what increased sequencing could be doing for public health and for research, because we saw how game-changing it was in the pandemic. We could be doing this for so many other pathogens out there. And so they're investing a lot in training, they're investing a lot in upscaling sequencing, in having cheaper and more efficient sequencing, in having sequencing be more available, in getting sequencers to more countries. This is all fantastic and I'm so happy they're doing it. And yet what happens to those sequences once they fall off the end of that sequencer is still something that is seen as kind of somebody else's problem. I think it's the classic case of everyone kind of pointing a finger and being like, well, someone else will take care of this bit. And I think there's a chance this becomes a real lost data situation where we've invested so much to get that data. Hopefully there were still health outcomes, for example, for interventions or for understanding, even if that data stays private, but we could get so much more out of it if we invested to make sure that people have easy ways to share that data, ways to share it quickly while feeling protected, ways to share it where they feel comfortable, and also that it ends up being truly open, because we also don't want to be in a situation where that data can disappear forever. And one of the things that's really important about Pathoplexus is that the data can be redistributed. So you have to do that with rules. You have to include the data use terms. You have to make it very clear which data is restricted if you're redistributing the restricted data.
But it means that if the bottom fell out tomorrow and, you know, Pathoplexus exploded, someone could take our data set and they could create a way that that data doesn't die, a way that that data is still available. And for us, that was critical because we don't want there to be one point of failure where thousands or even millions of sequences could get lost because there was one link in the chain that had control over that or one link in the chain that could fail. And so we really hope that not only is this something that makes data sharing easier and more equitable, but also it means that data is shared forever. So it's a permanent sharing, where that data is open and available.

[Speaker A]: That's incredible.

[Speaker E]: What? I think that's a great place to end.

[Speaker F]: And that answers a bunch of my questions too.

[Speaker A]: Okay, so we talked a lot about governance and your amazing one year, the one year anniversary of Pathoplexus. We talked about all sorts of things on here. I mean, you've made some excellent points. I really appreciate it. What happens at the end of sequencing? Whose problem is it? So thank you all so much, Emma, Theo, Arthur. Thank you, thank you, thank you. Thank you for all the work you do.

[Speaker C]: Thanks for having us.

[Speaker D]: Thank you.

[Speaker B]: It's been a real pleasure to be here. Thank you, Lee, Nabil and Clint.

[Speaker A]: Thank you so much for listening to our podcast. If you liked this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. This podcast was recorded by the Microbial Bioinformatics Group. For more information, go to microbinfie.github.io. The opinions expressed here are our own and do not necessarily reflect the views of Origin Sciences, the Centre for Genomic Pathogen Surveillance, or CDC.