Hi, and welcome to the MicroBINFI podcast. I am Nabeel, your host for today, and I'm once again joined by our favorite arborists, Dr. Conor Meehan and Dr. Leo Martins. Dr. Conor Meehan is a lecturer in molecular microbiology at the University of Bradford. He specializes in whole genome sequencing and molecular epidemiology of pathogens, particularly Mycobacterium tuberculosis, and genome-based bacterial taxonomy. Also with me is Dr. Leo Martins, who works with me and is head of phylogenomics at the Quadram Institute Bioscience. He enjoys developing and implementing tree-based models and has written a number of different software packages, such as BioMC Squared, Konomu, and TreeSignal. He previously worked with viruses, eukaryotes, and so on, and has recently switched to working with bacteria. Today we will delve into the dark arts of Bayesian inference as it applies to the microbial bioinformatics that we know and love. Thank you both for joining me again. Thank you. Thanks for inviting us. Yeah, thanks for having us. All right, so let's get right into it. What is Bayesian inference? Why do we need it? Bayesian methods are a class of models. They are not a philosophy; they're not a cult or a religion or a system of rational thinking. They're just a class of models based on conditional probabilities. From the point of view of phylogenetics, we can talk about the difference between maximum likelihood and Bayesian approaches, and that's really what we'll focus on today. So Bayesian inference is a set of models that can be used for a variety of different things. It is a mathematical framework inside which you can undertake a lot of hypothesis testing. If we talk about it from a phylogenetics point of view, we are trying to integrate across the entirety of the data that we have, and this data will be the alignment, but also a lot of other things, which we can talk about later.
And here you're trying to ask: what is the probability of the model given the data? That is equal to the probability of the data given the model, multiplied by the probability of the model, divided by the probability of the data. I think I got that right. So it allows you to basically integrate and find out what your phylogeny is, but also a lot of other things on top of that, through the incorporation of different inputs, which we call priors. The priors that we put on it are things like the model of evolution, which we talked about in the previous episode, and a lot of other information on top of the alignment, which is our underlying sequence data. Some examples of that extra information are: when was this sample taken? Where was this sample taken? Do we know that a sample is actually the ancestor of another sample? And other things like this. So for me, when we talk about microbial bioinformatics and how Bayesian phylogenetics relates to it, it's about integrating multiple different data sources together in order to do some kind of inference. Yeah, and I would just like to add that Bayesian models are particularly interesting for phylogenetics, since you mentioned it, because one of the parameters is the tree. And trees are a parameter for which we cannot calculate an average or a derivative. You know, they're not like a double, not like a real number. With Bayesian models, you can handle this level of complexity, treating the trees as one of the variables in your model. So what comes out at the end of a basic maximum likelihood analysis, say one you would do in RAxML, is a phylogeny.
And what comes out at the end of a Bayesian analysis is also a phylogeny, but that can be just a portion of the analysis you want to do, or can even be the input to a further Bayesian phylogenetic analysis in order to do other things, especially in something like molecular epidemiology. Yeah. And in Bayesian statistics in general, you are always interested in the uncertainty, in the distribution of variables. So depending on your data, you can have more precise output or less precise output. I think I am particularly attracted to Bayesian models because the uncertainty and the hypothesis testing are not an afterthought. You have this from the start, right? Yes. And the statistical support for the answer is incorporated into the process. So unlike maximum likelihood, where generally we will run it, get a tree, and then do something like bootstrapping in order to look at our support for that tree, the Bayesian process itself incorporates the uncertainty and gives you the statistical support for the answer at the end, all in one package. Okay. And in terms of talking about ends, what actually is a posterior? To talk about the posterior, you have to talk about the prior. The prior is your prior belief about the parameters, in the absence of data; this is before looking at any data, or at least before looking at your particular data. For instance, what do you expect the trees to look like? That is your prior. And then once you have the data, you incorporate it to get the posterior distribution of trees, the posterior distribution given your data. And in the same way as in other models, you still have the likelihood there, but now the likelihood is just one of the terms, combined with your prior, to give the posterior. So I think in statistics in general, you always split reality into data, the things that you see and that are known, and parameters, things that have a distribution or that you can assume to know.
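The prior, likelihood, and posterior that Leo describes here are tied together by the formula Conor quoted a moment ago. A tiny numerical sketch, with entirely made-up numbers for two hypothetical competing models, shows the mechanics:

```python
# Bayes' rule on a discrete toy example (all numbers invented for illustration):
# P(model | data) = P(data | model) * P(model) / P(data)

priors = {"model_A": 0.5, "model_B": 0.5}        # P(model): prior beliefs
likelihoods = {"model_A": 0.8, "model_B": 0.2}   # P(data | model)

# P(data) is the normalising constant: a sum over all candidate models.
p_data = sum(priors[m] * likelihoods[m] for m in priors)

posteriors = {m: priors[m] * likelihoods[m] / p_data for m in priors}
print(posteriors)  # the posterior beliefs after seeing the data
```

The same arithmetic underlies the phylogenetic case; the only difference is that the "model" there includes a tree and the sums become integrals over many parameters.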
And so in Bayesian statistics, everything that is not data is a parameter. So, for instance, if you have your normal distribution, you have the mean, but you don't assume that the mean is something that, once you look at the data, you know with infinite precision. It's something that comes from a prior, and then, after looking at the data, it has a posterior distribution. So you still have this uncertainty in the parameters. Now, in Bayesian analysis there is sometimes an assumption that you are an expert in the underlying data, and that from that you are able to give it sensible priors in order to guide the analysis in the right direction. So I'll give you a small example. A lot of the work that we do in microbial Bayesian phylogenetics would be something like: I want to estimate the mutation rate of the pathogen or the strain that I'm working on. Say I wanted to work on Mycobacterium kansasii, and maybe we don't know the mutation rate of that. But we know the mutation rate of Mycobacterium tuberculosis; it's around 10 to the minus seven. So I would put in a prior that says, for the mutation rate, which is the parameter, I think the value at the end is probably going to fall somewhere between 10 to the minus five and 10 to the minus nine. It's not going to be 10 to the minus one or two; it's not a virus. So I know as a bacteriologist that it's going to be around here. So I put in a prior with a distribution saying the mean is probably 10 to the minus seven, but it could vary somewhere up to 10 to the minus four and down to 10 to the minus nine. That's the prior that goes in on that parameter. I run my data through, et cetera, and at the end of it I'm able to look back at the distribution, and it will say: even though you told me the prior was probably around here, actually the data strongly pushed it towards 10 to the minus eight, or 10 to the minus five, or something.
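Conor's mutation-rate prior could be sketched like this. A lognormal distribution is one common choice for a strictly positive rate; the centre and spread below are purely illustrative, not recommendations for any real analysis:

```python
import math
import random

# A prior on the mutation rate centred near 10^-7 but allowing values over a
# few orders of magnitude, as described above. Illustrative numbers only.
mu = math.log(1e-7)   # centre of the prior in log space: 10^-7
sigma = 2.0           # spread in log space: covers several orders of magnitude

random.seed(1)
draws = [random.lognormvariate(mu, sigma) for _ in range(10_000)]

# Most of the prior mass sits within a few orders of magnitude of 10^-7,
# and essentially none of it near 10^-1 ("it's not a virus").
inside = sum(1e-9 < x < 1e-4 for x in draws) / len(draws)
print(f"fraction of prior draws between 1e-9 and 1e-4: {inside:.2f}")
```

In a real analysis this distribution would be declared in the tool's configuration (BEAST, MrBayes, and so on) rather than sampled by hand, but the shape of the belief being encoded is the same.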
So that's the posterior distribution that comes out at the end. So I think that maybe helps show how these three things link together. That sounds a lot like when I'm trying to find a flat to rent: you go onto the website and put in the range of the price you want to pay, and you say, oh, I think it's about 500 to 700 pounds a month, I don't know, whatever. And then you look at all the results and you go, these are terrible, I don't want to live here, or there are no results or something. And then you change the number and keep updating until you sort of fix on your final price of what you think it should actually be. Basically you're just refining that range over and over again as you keep searching. So you're refining it with data, exactly. With data, yeah, with the existing prices of the properties available. So when I moved from Dublin, where I had been renting for a long time, to East Canada, I put in around what I thought it cost to rent an apartment in Dublin. I found I was basically going to be able to get a mansion, but then understood that the income was lower there. So the new data was: this is the income, and then you ask, what can you get from that? So as the data comes in from a lot of different sources, what you can afford, what it's like to live there, what the city is like, that's all your prior information that you put into the search. And then, as you said, the parameter is price, and the posterior distribution can be vastly different from what you thought it would be. Yeah, it's the same, but more formalized. More formalized, okay. As in, we do those calculations in our head, but here it's actual formulas that say: how do we really put all that data together?
We're integrating it into a model in our own heads. People get scared when they hear the word integration, but that is what we do. But is this kind of process, this intuitive thing that we're doing, an inherently Bayesian way of approaching a problem? Yes and no. So I... Or is Bayesian inference a subset of this kind of thinking? Yeah, I think we use Bayesian thinking, but maybe in a very similar way to how we use non-Bayesian thinking as we accumulate data. The thing is, I think it's a bit dangerous to assume that Bayesian inference is a way of seeing the world, because it's very tempting, right, to treat it as a belief system. You know, even the names try to lure you into thinking that you can update your beliefs. But you can also be a frequentist or a likelihoodist, where everything comes from the likelihood, and still reach a similar conclusion. I think the problem is that in our minds we do a lot of model selection. We change the models all the time in our minds. And this is something that, well, maybe Bayesian models are a bit better at; you can incorporate more complex models. So maybe in this sense we are more Bayesian than not. But still, there's a lot of model selection going on in our minds. You change your assumptions, not all the time, but you keep revising them. Science is a way, right, of changing the models that you work with. And this is something that, when we are doing a statistical analysis, we usually forget: all the possible models, and how we restrict ourselves to a small set of possible models. Hmm. So in this example, we've used the term frequentist a few times already. In the scope of finding a flat, how would that proceed if you're a frequentist in terms of picking the place? Would it be something like: you take the distribution of all of the prices and you just find the mode of it, or the median value of it?
Yeah, you look at the distribution of all the prices, and because it's a multivariate distribution, the price is one dimension, but then you have how many rooms there are, and the location, and everything like this as the other dimensions. So we do that as well, I think; you make that calculation: yes, roughly around this price I can get two bedrooms, roughly around this price I can get a studio, but it's a better location, it's closer to work, and so on. So that's a more frequentist way of approaching the problem, just weighing these up. Yeah, I think so. But I have some difficulty sometimes in discriminating. Of course, I'm Bayesian at heart, because Bayesian models are easy to implement, but anyway, there is a school of thought that says that actually, if you have uniform priors for everything, so if you could write a model in which you don't put any extra information in your priors, then the Bayesian and the frequentist approaches are equivalent. And the problem is that when you write a Bayesian model, when you write models in general, it's very hard to have completely flat priors, to pretend that you know nothing about any of the parameters. You always end up having some prior distribution, some prior knowledge about the parameters, and that's why the approaches become different. By the way, this is called an objective prior: an objective prior is an attempt at making all the information come from the data. So both of you were extolling as a virtue of Bayesian inference that you have to walk into it with a set of beliefs, with a set of hypotheses. That's what a hypothesis is; it's a belief. And so in Bayesian inference, that would not be sort of contradictory or counterintuitive to… It's not necessarily a belief, and I think that word is difficult, especially when we try to talk about science; we don't like to talk about beliefs.
A lot of the time when we run this, somebody else has already run a similar analysis. They have gotten a result that is well supported, and it's about allowing yourself to use that result in order to inform your data, instead of going in completely blind and just saying, I have an alignment, and it's a fresh set of data that has never been seen before. It's going in and saying, this is a set of data, somebody else has done very similar work over here, so I'm just going to give you a better starting point based on what I think, or what other people's work has shown, to be more likely to be correct. An example of this is just models of evolution. Models of evolution are built upon lots of different testing: we know which transitions occur more often, or, sorry, which rates occur more often, or that the frequencies of the nucleotides are important. These are all just models that we put in in order to guide the analysis better, so Bayesian inference is just a larger extension of that, allowing you to put a lot of different things in to guide the analysis. Okay, anything to add, Leo, on that? Yeah, no, I think it's a very good point. In science, we already have models, we already know how things should look, and so you can use this knowledge when you have new data. And if not, then you can do model selection. So you have two competing hypotheses, two competing models, and then you can compare them with the same data, or with repeated experiments, and see what they look like after seeing the data, so which models are better for particular situations. A strength I often see with Bayesian methods is that when we work on phylogenetics, we often think our end goal is to get the tree. But you can do a lot of Bayesian analyses which include phylogenetics but where you're looking for something else, a different part of the model.
So an example of this is a fantastic paper Tanja Stadler did on the Ebola outbreak, where she used a tree that had already been generated by somebody else to look at the reproductive number, as in how many secondary cases you get from one primary case, and the incubation time, and all this kind of stuff. It's a very good paper that has no tree in it, but it's all Bayesian statistics still. Mm-hmm. So it allows you to explore your data in a completely different way, and not just have the tree as the end. The tree is not the result; the tree is part of the model, and you can put in, take out, examine, or fix any different part of that model. So you can put in the mutation rate and all these other things, and then figure out when a sample occurred, as in trying to date things, or you can put in all those dates and find out a different part of the model. Everything is one large model, and then you have to decide what part you want to fix and what part you want to examine. Okay. Well, let's change tack a little bit for this first part, where we're just sort of introducing the topic. Often when you're reading the literature, people talk about Bayesian inference, and then maybe in the same breath talk about MCMC, Markov chain Monte Carlo. Is there a difference? What exactly is going on there? So, yeah, I think there has to be a clear distinction, because you can have a Bayesian model where you know the answer, where you can calculate the solution analytically, and so you don't need to use the computer at all; you already know what the posterior probability is going to be. For instance, if you have conjugate priors, the posterior distribution will come from the same family as the prior distribution. But then, in practice, one of the powers of Bayesian statistics is that you can write very complex models.
So, what they call hierarchical models, where you describe how the priors are interconnected, how parts of your data connect to other parts of the data. You can write the model very easily, but then, once you have the data, how do you calculate an answer from it? So MCMC is like a numerical optimization, but for extracting the posterior information from the data. And there are a lot of terms that you hear when we talk about MCMC, for instance convergence, burn-in, Metropolis-Hastings, Gibbs sampling; all these terms are related to the algorithm, to the optimization, let's say, and not to the model itself. So they are basically independent problems, but especially in phylogenetics, where we have very complex models, it's very hard to do one without the other. By the way, there's a recent discussion on Phyloseminar, which I think we mentioned before, about what they call variational inference. Variational inference is an alternative to MCMC methods. With MCMC, you try to sample values from the posterior: you have your priors, you have your data, and it tries to simulate samples from the posterior distribution. Variational inference instead tries to find a distribution that is very similar to the posterior distribution but easier to calculate. We're just starting to see these methods in phylogenetics; I think there is one paper, in one of the most recent Phyloseminar talks, where they touched upon the topic of sampling trees, of looking at the trees in this context. Variational inference is becoming quite popular now because of artificial intelligence; it is, let's say, a modern Bayesian framework. But still, for phylogenetics, MCMC is the workhorse. So I can give you a very quick overview of how MCMC works in a very basic manner.
Okay, so let me give you a short, simple explanation of how MCMC works in the broadest terms. We call it a Markov chain Monte Carlo, and it's really a chain of individual cycles, kind of like a bike chain. What we do is we start off with our starting model. This model will probably include a starting tree, which we have either given it, or which we generate through parsimony, or distance, or any of the methods we talked about in the previous episode. So we have that, along with the other aspects of the model, which can be whatever other parameters we have: the mutation rate, the sampling times, how the population grows, and we can talk about some of these later. So we set these as our current tree and model. We then go into the cycle, and basically we have a basket, and in that basket we are collecting all of our samples, like pieces of paper, where each sample is all of the different aspects of the model. So this is our current one, and we add it to the basket. We want to do three cycles, so we continue on into the second cycle, because we're only on cycle one. Here we randomly propose a new tree and model; maybe we change some branch lengths, or maybe we change the mutation rate. We calculate the posterior for this proposal. If the posterior is better than the one we had before, we accept this proposal (and even if it's worse, we sometimes accept it anyway, with a probability given by the ratio of the posteriors, which is what lets the chain escape local optima); it becomes our current tree, and we add that tree to the basket. So the basket contains tree and model one, and tree and model two. Now, do we continue? Yes, we asked for three cycles. So we continue into our next cycle and randomly propose a new tree. Let's say this time we reject that tree. What we do then is keep the one we had in cycle two, and we add it to the basket again. So now the basket has one copy of tree one and two copies of tree two. And since we only wanted three cycles, we exit out of our chain.
We then gather all these up at the end, and we basically count up how often we see each value of the different parameters. So here we saw the first tree once, and the second one twice. And we may continue to run this over and over, thousands upon thousands of times. And eventually we'll start to converge, hopefully, to a single answer. Convergence means the chain has settled down: most new proposals get rejected, and we're adding essentially the same trees and the same models over and over to the basket. So right at the end, we have, let's say, 10,000 samples. We've gathered them all up, and then we apply what's called a burn-in. This means we remove some of the samples from the early cycles, from while we were just finding our feet, because we had a starting tree which maybe wasn't very good, and we hadn't explored the data very well. So we remove them by taking maybe 10% off the top and throwing them away. We then count along all of the rest of the chain to see how often we see each model, and we create distributions from that data for each of the different parameters. And these are our posterior distributions for the parameters, which we can then go and explore. This sounds a lot like trying to organize a group of people to go to a restaurant, where you just start picking: do you want Thai food, do you want Chinese food, and eventually you start to converge on, okay, we want a pub, but which pub? And then when you keep hearing the same reply, this place, this place, this place, you go, okay, yeah, all right, we're done. I'm fed up, I've done enough cycles of this, I'm going to pick this restaurant. And the difficulty with Bayesian analysis is that it takes about the same amount of time to do that as it does to gather those people into one location, as in several days.
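The "basket" cycle Conor walks through can be sketched in a few lines of code. This is a toy Metropolis sampler with a single real-valued parameter standing in for the tree-plus-model state, and an invented target distribution (a normal centred at 3), not a phylogenetic model:

```python
import math
import random

# A toy Metropolis sampler mirroring the "basket" cycle described above.
# The target is an illustrative unnormalised log posterior, nothing more.
def log_posterior(theta):
    return -0.5 * (theta - 3.0) ** 2

random.seed(42)
current = 0.0              # the starting state (like the starting tree)
basket = []                # the basket we collect samples into

for _ in range(20_000):    # the cycles of the chain
    proposal = current + random.gauss(0.0, 1.0)  # randomly propose a new state
    log_ratio = log_posterior(proposal) - log_posterior(current)
    # Accept if better; if worse, accept with probability equal to the
    # posterior ratio, so the chain can still escape local optima.
    if random.random() < math.exp(min(0.0, log_ratio)):
        current = proposal
    basket.append(current)  # accepted or not, the current state joins the basket

burn_in = len(basket) // 10         # discard the first 10% as burn-in
samples = basket[burn_in:]
mean = sum(samples) / len(samples)
print(f"posterior mean estimate: {mean:.2f}")  # should land near 3
```

Real phylogenetic samplers like MrBayes or BEAST do exactly this in spirit, but with trees, many parameters, and far more sophisticated proposal moves.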
But essentially, it is that same process, where you have a scattergun of options, and then you're slowly trying to converge on one particular range of solutions, or one solution. So, as you explained it, if you look at neighboring states in the chain, they're probably going to be very similar, because in some cases you might reject many of the proposals, and even when you propose a new state, a new tree or new branch lengths, it is still correlated with the previous state, because otherwise you would reject almost all proposals. So what you have is a correlated chain. That's why you do what they call thinning. With thinning, instead of looking at all the samples after the burn-in, you look at, say, every 1,000th point: you take one point, exclude 999 points, and then look at another one, because the points you chose are more likely to be uncorrelated. And that's why, when you run convergence tests, one of the things you look at is the ESS, the effective sample size. For instance, you simulated for 10,000 steps, but of those 10,000 steps, how many are independent? If you exclude all of those that are correlated, how many independent samples are left? That's the effective sample size. And then the magic number is 100. There's no point asking the same person, or the same group of people, for a restaurant over and over again. You want to get a distribution. You wait until they discuss, yeah. Yeah, that's actually a good example. When you move to a new city and you make one friend, you just end up going to the restaurants that they like. You need to independently go and test; maybe they are the best, but let's make sure that I've tested a fuller range.
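Leo's points about correlated chains, thinning, and ESS can be illustrated with a simulated chain. Real tools such as Tracer use the full autocorrelation function to estimate ESS; this sketch uses only the lag-1 autocorrelation under an AR(1) assumption, which is a simplification:

```python
import random

# Rough illustration of autocorrelation, thinning, and effective sample size.
random.seed(0)

# Simulate a correlated chain: each state is mostly the previous one plus
# noise, much like consecutive MCMC samples.
chain = [0.0]
for _ in range(50_000):
    chain.append(0.95 * chain[-1] + random.gauss(0.0, 1.0))

def lag1_autocorr(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var

rho = lag1_autocorr(chain)
ess = len(chain) * (1 - rho) / (1 + rho)  # ESS under an AR(1) approximation
print(f"lag-1 autocorrelation: {rho:.2f}, approximate ESS: {ess:.0f}")

# Thinning: keep every 100th sample; neighbours are now nearly uncorrelated.
thinned = chain[::100]
print(f"thinned lag-1 autocorrelation: {lag1_autocorr(thinned):.2f}")
```

Tens of thousands of raw samples here collapse to an effective sample size in the low thousands, which is why ESS, not chain length, is the number to check.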
And if you have very, very tight, strict priors on your data, as in you've said the date of isolation of this sample can only be within this hour, that is very informative, but it might mean that you don't explore the data in the same way. The restaurant that your friend picked may be the best restaurant, the most fantastic restaurant ever, but you don't know that unless you've explored all the other ones and found out that they were terrible. But then you'll know for the next person. So a Bayesian way of thinking is: when a new person comes to the city, you go to them and say, trust me, I've actually explored all of the data, and I'm telling you that this one is the best. So that's how the priors become more informative as analyses go on and on. So Bayesian inference sounds perfect. And we've touched on a couple of these as we've gone through these mock examples, but what are really the negative points of this kind of approach? You've already pointed out the runtime of this sort of analysis. It's complex. So, I taught on a course on molecular evolution in Woods Hole, where Paul Lewis teaches every year, and I've seen his lecture six times, and every time I understand something a little bit better. You can see it even with myself and Leo; I would count both of us as experts in this, and it can still be difficult to convey that information. It takes a lot of experience to be able to run these kinds of analyses. So there was a lot of time invested in order to learn how to do Bayesian inference, which can really, really pay off. I think it's fantastic. It's great work. But sometimes you don't need to invest all that; you just need to collaborate with people who know what they're doing. Yeah, yeah. Summarizing the information as well. Yeah, I've definitely had to know less about other things in my job so I could know more about this.
Yeah, so for me, what I'm always worried about when I do an MCMC analysis is whether it's converging. Convergence, I think, is an issue, especially now that we have genomic, tree-of-life-scale data sets, thousands of species and whole genomes. Did it actually converge? And how do you check for that? This is one of the problems, because even realizing that it's not converging is hard. For instance, if you run the thing three times and the results are a bit different, the mode might be the same, your best tree might be the same, but you're not interested only in the best tree, you're interested in the distribution of trees. And if this distribution is a bit different between runs, it means it's not converging. Why is it not converging? Because you don't have enough data, or because there's some problem with the MCMC, and instead of running the thing for one week, you have to run it for one month. So I think this is, not only in my experience but in what many people complain about with Bayesian models, that it's a bit hard to interpret the output, to make sure that what you're seeing is actually what you expect to see. I think a lot of bioinformatics especially has moved towards automation and speed. So we're definitely becoming less patient, and the people that we work with tend not to be very patient. There's no such thing as, let me just do a quick Bayesian analysis and I'll see if it's right. You can't do that. It takes time and effort. This is the main part of the paper; a lot of the time, this is not just the intro to the rest of the data. This is your four-month experiment in the lab; this is our equivalent of that. So it takes a lot of effort to run, and that can put a lot of people off. So sometimes you kind of have to just show them the benefits. But the more benefits you want, sometimes the more effort it is.
I mean, for the true biologists out there, this really is a mutagenesis experiment, isn't it? Where you have that clone and you are putting it into that condition over and over again to make it converge, and then you're looking at what are the features that make it suited for that condition. You have to be quite an expert on your pathogen in that way, like I said earlier. You have to know roughly how long it's going to take to run. I worked on HIV for my PhD, and you could probably get it done with a short enough MCMC chain, you know, maybe 10,000 or 20,000 steps, but for tuberculosis we normally have to run it for 40 million. And then at the end of it, you go, oh, that parameter probably wasn't specified correctly; I have to rerun it with all of these parameters changed. That's why these papers can take a long time. And you have to know what needs to go into the model at the start. There are a lot of different aspects to the model, which I think we'll talk about in the next episode, but you have to test all these different parts and see which one is appropriate. Is it this kind of distribution or that kind of distribution? Is it a Laplace distribution, or should I be using a gamma distribution? All this kind of stuff needs to be worked out beforehand, and that can be a little bit more difficult. Yeah. And especially when, after you run the thing for one week, you say, oh, this is the tree, and then the guy says, well, this is the same as the parsimony tree, so in the end you're overcomplicating things. Yeah, that must feel real good when that comes up, but at least you know, exactly. So then that becomes one line in the paper: oh yeah, we validated this using a Bayesian framework, and it's the same. One week on the cluster. Yeah, exactly. So I'll talk about one. This was a funny question that you often see on websites.
So I picked up a website that talks about intro to Bayesian statistics while preparing for this, and it talked about the gambler's fallacy and how that would cause a problem if you approached the problem in a Bayesian way. What the gambler's fallacy basically means is this: you're playing roulette, and everyone knows what roulette is, right? Right. You have a ball that randomly picks a number from a set of states. If you don't know it, read the Wikipedia entry and come back to us. But please gamble responsibly. If you did 10 spins on roulette and every time it came up black, then, the claim goes, in a Bayesian world that would mean you should always bet on black, like Wesley Snipes. And obviously we know that this makes absolutely no sense. Because we can see the game state, we know that the odds of coming up black should be slightly less than 50 percent. But, supposedly, we would not reach that conclusion under a Bayesian approach: we would keep seeing this data, keep feeding it back in, and just say, yeah, it's always going to be black, always bet on black. Yeah, when I saw this discussion, I didn't understand the problem, because whether you're a Bayesian, a frequentist, or a likelihoodist, you write your model as being from independent samples. I don't know roulette, so let me talk about a coin. If you flip a coin and you always get heads, I would say, as a Bayesian or a frequentist or a likelihoodist, that it's not a fair coin. And I would say, with our confidence intervals or credible intervals or whatever, that this coin is biased. I think anybody would do that. So it's not a problem with being a Bayesian.
The problem is that the gambler's fallacy is only a fallacy when the events really are independent, because when you start, you assume that the events are independent. And what's an independent event? Each one of the coin tosses, or in the case of the roulette, each spin where the ball lands on black. And the gambler, I think, starts seeing too many of those and starts seeing patterns. Yeah, they start seeing patterns. And then maybe they'll think, well, the next one is going to be black, or they can think in the opposite sense: if they've all been black until now, then the next one is going to be red. Yeah. And it's a very seductive idea, right? You keep seeing black over and over again and you go, the next one's got to be red. Yeah. But I couldn't see this as a problem with thinking Bayesian. Maybe they mention it because, under a Bayesian model, whether you update your posterior at every new draw, say at each of the 10 spins, or you first observe all 10 and then update your posterior once using the sample of 10, the posterior should be the same. Maybe that's why they say it. But yes, I think it comes down to independence, like Leo was saying. If you have 10 spins that come directly after each other, they actually influence each other, because the ball is dropped in from where it stopped the run before. So if it stops on black in front of the dealer every single time, then the state of the table is actually influenced by what happened in the run before.
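The sequential-versus-batch updating point can be checked directly with a conjugate Beta prior on a coin. This is a hypothetical worked example, not something from the episode: folding in ten heads one at a time or all at once yields the identical posterior.

```python
# Ten tosses that all came up heads (1 = heads, 0 = tails).
tosses = [1] * 10

# Start from a flat Beta(1, 1) prior on the heads probability.
a, b = 1.0, 1.0

# Sequential updating: fold in one toss at a time.
for t in tosses:
    a, b = a + t, b + (1 - t)

# Batch updating: fold in all ten tosses at once.
a_batch = 1.0 + sum(tosses)
b_batch = 1.0 + len(tosses) - sum(tosses)

# Same posterior either way: Beta(11, 1).
assert (a, b) == (a_batch, b_batch)
print(a / (a + b))  # posterior mean ≈ 0.92: the coin looks loaded
```

Either way the model concludes the coin is heavily biased, which is the correct answer if the tosses are independent, and the gambler's mistake if they are not.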
If you saw 10 spins where you went in every third day, or every five hours, or with a different dealer, and you randomly looked at the table, then maybe you could start to say that. But you have to make sure that those 10 samples were truly independent. That's the proper Bayesian way of doing it. It's not about, oh, I just looked 10 times over five minutes and I'm done. Yeah. Or the proper frequentist way. As I said, this is a fallacy, and what does it mean that it's a fallacy? It means that it's wrong somewhere, and the problem is not in assuming that the model is this or that. There's nothing special about the prior: you can have a completely non-informative, flat prior, in which case, as I mentioned, Bayesian and frequentist become equivalent, they become the same. Yeah, I think this comes down more to the fact that humans are built to look for patterns and the mathematics is not. The mathematics is just there to detect a pattern if it's truly there. Yeah, and maybe the gambler starts changing the model as they go: no, maybe the model I was thinking of was wrong, maybe this model is better. I don't know. But then the gambler might be a frequentist as well; we all like to see patterns. So that gets on to a higher-level question that I wanted to ask both of you. The roulette example is really just a poorly constructed experiment. So how does this apply in science? When you're reviewing a paper, how would you assess whether an analysis is robust? We've already touched on checking that the data points are independent and whether the chain is converging. What other things would you look for in a publication to feel confident in its result?
So the methods sections for a maximum likelihood analysis and for a Bayesian analysis are vastly different in size. When I teach maximum likelihood, I often say: we're going to talk for three hours so that you can understand one line of a paper, because it's just going to say "we used maximum likelihood with GTR plus G plus I with 100 bootstraps", and there's a lot to unpack in that. Bayesian is almost the opposite. You should be setting out all of the different parts of the model that you put in, as in: we set the prior to be this distribution with this mean, et cetera, so that it can be redone. And why did you set it to that? Either you need to have tested it by doing what's called a marginal likelihood analysis, where you run a Bayesian analysis to estimate the marginal likelihood under each candidate model and then compare the models and their effect on the final answer; so you should either run that and say this was the most appropriate prior for this parameter, or you should be referencing another paper that did. I need to be sure that the analysis you did was informed correctly, so that your priors neither constrain the model too much nor let it wander too far. That's what should be in the methods. I don't think I've ever seen a paper that actually described all that; they just say "we used this tool" and give a citation. Yeah. It comes down to a similar thing as with statistics: the vast majority of these papers are written by biologists, and that's great, they can do the rest of the paper, but it's not a Bayesian researcher going through it saying you need all of these things, because there aren't a lot of us, because it's a lot of work to really understand how these things work. It's like saying "I used R". So how did you reach this conclusion? I used R. No, no, no.
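To give a flavour of what a marginal likelihood comparison does, here is a toy, analytically tractable version using coin flips rather than trees. This is a hypothetical illustration only; real phylogenetic marginal likelihoods are intractable integrals estimated with methods such as path sampling or stepping-stone, but the logic of comparing models by how well they predict the data as a whole is the same.

```python
from math import exp, lgamma

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(heads, tails, a, b):
    """Log marginal likelihood of coin-flip data under a Beta(a, b)
    prior: the binomial likelihood integrated over the coin's bias."""
    return log_beta_fn(a + heads, b + tails) - log_beta_fn(a, b)

h, t = 9, 1  # observed data: 9 heads, 1 tail
m_vague = log_marginal(h, t, 1, 1)    # vague, flat prior on the bias
m_fair = log_marginal(h, t, 50, 50)   # prior tightly peaked at 0.5

bayes_factor = exp(m_vague - m_fair)
print(bayes_factor)  # > 1: the data favour the vague-prior model
```

The Bayes factor penalises the "fair coin" prior because it concentrates its probability where the data are unlikely, which is exactly the kind of evidence a reviewer should expect behind the sentence "this was the most appropriate prior for this parameter".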
It's a perfectly legitimate analytical tool: I did my work using an in-house custom Python script. Yes, that is the explanation of how I achieved this answer. Available on request. Reasonable request. What would be nice is, if you're using something like BEAST, depending on the program, there is a file that has all of the different bits of the model set out, and you should put that somewhere like Figshare for the reviewer and for other people to be able to run it. It's the same as putting a standard operating procedure online so that people can do your experiment again. If you want to be open and transparent, just put it up online so other people can look at it. Yeah. And for me, besides seeing the convergence analysis and, as Connor mentioned, whether you explored the possible models and gave a legitimate description of them, I want to see the distribution of values, because otherwise it's not Bayesian. There is no point in coming up with a Bayesian model, running a Bayesian analysis, and then not showing the uncertainty on the estimates. People say you always have to put the bootstrap values on the tree; I don't agree completely with that, but in this sense I agree with the feeling. You have to pay attention to the uncertainty, and in Bayesian analysis it's at the front of the model. You get a lot of papers which say, oh, we dated when humans evolved as a separate species, and it was blah million years ago. In Bayesian there is no "blah million years ago"; it's that plus-minus something. Yeah. So in Bayesian you should always have an interval around it, what's called the HPD, the highest posterior density interval. So are you both going to be happy if I just give you the date and the confidence interval, or do you want more from me?
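For listeners who want to see what an HPD interval actually is, a minimal sketch over posterior samples follows. This is a hypothetical illustration with simulated draws standing in for MCMC output; in practice tools such as Tracer report the 95% HPD for real BEAST runs.

```python
import random

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples: the highest
    posterior density (HPD) interval for a unimodal posterior."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round(mass * n)))  # number of points in the window
    # Slide a k-point window along the sorted draws; keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Pretend these are posterior draws of a divergence date (in Myr).
rng = random.Random(7)
draws = [rng.gauss(5.0, 1.0) for _ in range(20_000)]

lo, hi = hpd_interval(draws)
mean = sum(draws) / len(draws)
print(f"{mean:.2f} Myr (95% HPD: {lo:.2f}-{hi:.2f})")
```

The point estimate on its own hides how wide that interval is, which is exactly the "blah million years ago, plus-minus" complaint above.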
Well, they're called credible intervals in a Bayesian analysis. They're the same, but yeah. But don't tell them we said they're the same. This is on the record. They know. They're going to get you in your sleep. Is this recorded? There is always a credible interval around the estimate, and it tends to be much larger than people think it is. Oh yeah, with some of the dating stuff it can be pretty ridiculous. Yeah. So for me, if we were even able to just get to that stage, I'd be much happier, because we're just not there yet. But beyond that, you want to make sure that the chain converged, and then you want to do some kind of testing to make sure that the model was correct and that the data was informative. Just like we talked about in the previous phylogenetics episode: if you have an alignment that's almost all invariant and only a couple of sites are driving the tree structure, the same can happen with Bayesian analysis. The data can be very uninformative. You may give it sampling dates, but they may not actually help in any way. So you want to make sure that the model is not just driving the answer, because you will get an answer. If you run MCMC for three cycles and put it through all the different steps, you'll get a tree and you'll get a distribution. It'll be a terrible distribution, but you will get an answer. So it's about being confident in that answer. Okay, that sounds like I haven't been caught out on my latest manuscript; I've got everything in place. Just to finish up, when is Bayesian inference inappropriate? Inappropriate Bayesian inference. I would actually talk more about necessity: do you really need to do all this work to build this? I often say to people, because there's always this camp question, are you a frequentist or a cladist or a Bayesian? I'm whatever is required for the job. If you just want to know the tree that comes out at the end, do a really good maximum likelihood analysis.
But if you want to do more than that, if you want to explore your data and get more answers, then you should do Bayesian. So don't put in all that work just to get a tree, I don't think. So you're a pragmatist. Yeah, I've got other things to do. Yeah, my answer would be pretty similar. I would say that if the model starts to become too abstract for you to describe, because you have to defend this against reviewer number three. Oh yeah, reviewer number three. So if you don't feel confident in the model, then you should go for a simpler model. Intuitively, I am always suspicious of a publication that presents the world's most complicated model, Bayesian or not, and doesn't have a primitive sanity check. I like seeing papers where they talk about all these sorts of population structure things and then just say, by the way, this clade is X SNPs away from that one, and you go, okay, that makes sense. Just something really simple to help you along. Like that bot that replaces "artificial intelligence" with "logistic regression". Actually, I thought of something where it's not appropriate: if the data underneath violates the assumptions of certain models, it's not appropriate. For example, a lot of people do population size analyses, and there are very specific boxes you need to tick before you can actually apply that Bayesian model to your data, such as it being a single population that has been randomly sampled. So if you want to do Bayesian molecular epidemiology, you need to do good epidemiology, or else it's not appropriate. Yeah. Oh, but that sounds like actual work. I know, it's the worst, but that's how we get employed. All right, and on that bombshell, I think we'll draw this episode to a close. I'd like to thank both Leo and Connor for joining me today. Thanks very much. Thank you. Yep.
And I'll see everyone else back on the next exciting episode of the MicroBinfy podcast. Thank you all so much for listening at home. If you like this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group and edited by Nick Waters. The opinions expressed here are our own and do not necessarily reflect the views of the CDC or the Quadram Institute.