Hi, and welcome to the MicroBINFI podcast. I am Nabeel, your host for today, and I'm once again joined by our favorite arborists, Dr. Conor Meehan and Dr. Leo Martins. Dr. Conor Meehan is a lecturer in molecular microbiology at the University of Bradford. He specializes in whole genome sequencing and molecular epidemiology of pathogens, particularly Mycobacterium tuberculosis, and genome-based bacterial taxonomy. Also with me is Dr. Leo Martins, who works with me and is head of phylogenomics at the Quadram Institute Bioscience. He enjoys developing and implementing tree-based models and has written a number of different software packages, such as BioMC Squared, Konomu, and TreeSignal. He previously worked with viruses, eukaryotes, and so on, and has recently switched to working with bacteria. Today we will delve into the dark arts of Bayesian inference as it applies to the microbial bioinformatics that we know and love. Thank you both for joining me again. Thank you. Thanks for inviting us. Yeah, thanks for having us. All right, so let's get right into it. What is Bayesian inference? Why do we need it? Bayesian methods are a class of models. They are not a philosophy; they're not a cult or a religion or a system of rational thinking. They're just a class of models based on conditional probabilities. From the point of view of phylogenetics, we can talk about the difference between maximum likelihood and Bayesian approaches, and that's really what we'll focus on today. So Bayesian inference is a set of models that can be used for a variety of different things. It is a mathematical framework inside which you can undertake a lot of hypothesis testing. If we talk about it from a phylogenetics point of view, we are trying to integrate across the entirety of the data that we have, and this data will be the alignment, but also a lot of other things, which we can talk about later.
And here you're trying to ask: what is the probability of the model given the data? That is equal to the probability of the data given the model, multiplied by the probability of the model, divided by the probability of the data. I think I got that right. So it allows you to basically integrate and find out what your phylogeny is, but also a lot of other things on top of that, through the incorporation of different inputs, which we call priors. The priors that we put on it are things like the model of evolution, which we talked about in the previous episode, and a lot of other information on top of the alignment, which is our underlying sequence data. Some examples of that extra information are: when was this sample taken? Where was this sample taken? Do we know that a sample is actually the ancestor of another sample? And other things like this. So for me, when we talk about microbial bioinformatics and how Bayesian phylogenetics relates to it, it's about integrating multiple different data sources together in order to do some kind of inference. Yeah, and I would just like to add that Bayesian models are particularly interesting for phylogenetics, since you mentioned it, because one of the parameters is the tree. And trees are a parameter for which we cannot calculate an average or a derivative. You know, they're not like a double, not like a real number. With Bayesian models, you can handle this level of complexity, treating the trees as one of the variables in your model. So what comes out at the end of a basic maximum likelihood analysis, say one you would do in RAxML, is a phylogeny.
And what comes out at the end of a Bayesian analysis is also a phylogeny, but that can be just a portion of the analysis you want to do, or can even be the input to a further Bayesian phylogenetic analysis in order to do other things, especially in something like molecular epidemiology. Yeah. And in Bayesian statistics in general, you are always interested in the uncertainty, in the distribution of variables. So depending on your data, you can have more precise output or less precise output. I think I am particularly attracted to Bayesian models because the uncertainty and the hypothesis testing are not an afterthought. You have this from the start, right? Yes. And the statistical support for the answer is incorporated into the process. So unlike maximum likelihood, where generally we will run it, get a tree, and then do something like bootstrapping in order to look at our support for that tree, the Bayesian process itself incorporates the uncertainty and gives you the statistical support for the answer at the end, all in one package. Okay. And in terms of talking about ends, what actually is a posterior? To talk about the posterior, you have to talk about the prior. The prior is your prior belief about the parameters, in the absence of data; this is before looking at any data, or at least before looking at your particular data. For instance, what do you expect the trees to look like? That is your prior. And then once you have the data, you incorporate it to get the posterior distribution of trees, the posterior distribution given your data. And in the same way as in other models, you still have the likelihood there, but now the likelihood is just one of the terms, combined with your prior, to give the posterior. So I think in statistics in general, you always split reality into data, the things that you see and that are known, and parameters, things that have a distribution or that you can assume to know.
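The prior, likelihood, and posterior that Leo describes here are tied together by the formula Conor quoted a moment ago. A tiny numerical sketch, with entirely made-up numbers for two hypothetical competing models, shows the mechanics:

```python
# Bayes' rule on a discrete toy example (all numbers invented for illustration):
# P(model | data) = P(data | model) * P(model) / P(data)

priors = {"model_A": 0.5, "model_B": 0.5}        # P(model): prior beliefs
likelihoods = {"model_A": 0.8, "model_B": 0.2}   # P(data | model)

# P(data) is the normalising constant: a sum over all candidate models.
p_data = sum(priors[m] * likelihoods[m] for m in priors)

posteriors = {m: priors[m] * likelihoods[m] / p_data for m in priors}
print(posteriors)  # the posterior beliefs after seeing the data
```

The same arithmetic underlies the phylogenetic case; the only difference is that the "model" there includes a tree and the sums become integrals over many parameters.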
And so in Bayesian statistics, everything that is not data is a parameter. So, for instance, if you have your normal distribution, you have the mean, but you don't assume that the mean is something that, once you look at the data, you know with infinite precision. It's something that comes from a prior, and then, after looking at the data, it has a posterior distribution. So you still have this uncertainty in the parameters. Now, in Bayesian analysis there is sometimes an assumption that you are an expert in the underlying data, and that from that you are able to give it sensible priors in order to guide the analysis in the right direction. So I'll give you a small example. A lot of the work that we do in microbial Bayesian phylogenetics would be something like: I want to estimate the mutation rate of the pathogen or the strain that I'm working on. Say I wanted to work on Mycobacterium kansasii, and maybe we don't know the mutation rate of that. But we know the mutation rate of Mycobacterium tuberculosis; it's around 10 to the minus seven. So I would put in a prior that says, for the mutation rate, which is the parameter, I think the value at the end is probably going to fall somewhere between 10 to the minus five and 10 to the minus nine. It's not going to be 10 to the minus one or two; it's not a virus. So I know as a bacteriologist that it's going to be around here. So I put in a prior with a distribution saying the mean is probably 10 to the minus seven, but it could vary somewhere up to 10 to the minus four and down to 10 to the minus nine. That's the prior that goes in on that parameter. I run my data through, et cetera, and at the end of it I'm able to look back at the distribution, and it will say: even though you told me the prior was probably around here, actually the data strongly pushed it towards 10 to the minus eight, or 10 to the minus five, or something.
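Conor's mutation-rate prior could be sketched like this. A lognormal distribution is one common choice for a strictly positive rate; the centre and spread below are purely illustrative, not recommendations for any real analysis:

```python
import math
import random

# A prior on the mutation rate centred near 10^-7 but allowing values over a
# few orders of magnitude, as described above. Illustrative numbers only.
mu = math.log(1e-7)   # centre of the prior in log space: 10^-7
sigma = 2.0           # spread in log space: covers several orders of magnitude

random.seed(1)
draws = [random.lognormvariate(mu, sigma) for _ in range(10_000)]

# Most of the prior mass sits within a few orders of magnitude of 10^-7,
# and essentially none of it near 10^-1 ("it's not a virus").
inside = sum(1e-9 < x < 1e-4 for x in draws) / len(draws)
print(f"fraction of prior draws between 1e-9 and 1e-4: {inside:.2f}")
```

In a real analysis this distribution would be declared in the tool's configuration (BEAST, MrBayes, and so on) rather than sampled by hand, but the shape of the belief being encoded is the same.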
So that's the posterior distribution that comes out at the end. So I think that maybe helps show how these three things link together. That sounds a lot like when I'm trying to find a flat to rent: you go onto the website and put in the range of the price you want to pay, and you say, oh, I think it's about 500 to 700 pounds a month, I don't know, whatever. And then you look at all the results and you go, these are terrible, I don't want to live here, or there are no results or something. And then you change the number and keep updating until you sort of fix on your final price of what you think it should actually be. Basically you're just refining that range over and over again as you keep searching. So you're refining it with data, exactly. With data, yeah, with the existing prices of the properties available. So when I moved from Dublin, where I had been renting for a long time, to East Canada, I put in around what I thought it cost to rent an apartment in Dublin. I found I was basically going to be able to get a mansion, but then understood that the income was lower there. So the new data was: this is the income, and then you ask, what can you get from that? So as the data comes in from a lot of different sources, what you can afford, what it's like to live there, what the city is like, that's all your prior information that you put into the search. And then, as you said, the parameter is price, and the posterior distribution can be vastly different from what you thought it would be. Yeah, it's the same, but more formalized. More formalized, okay. As in, we do those calculations in our head, but here it's actual formulas that say: how do we really put all that data together?
We're integrating it into a model in our own heads. People get scared when they hear the word integration, but that is what we do. But is this kind of process, this intuitive thing that we're doing, an inherently Bayesian way of approaching a problem? Yes and no. So I... Or is Bayesian inference a subset of this kind of thinking? Yeah, I think we use Bayesian thinking, but maybe in a very similar way to how we use non-Bayesian thinking as we accumulate data. The thing is, I think it's a bit dangerous to assume that Bayesian inference is a way of seeing the world, because it's very tempting, right, to treat it as a belief system. You know, even the names try to lure you into thinking that you can update your beliefs. But you can also be a frequentist or a likelihoodist, where everything comes from the likelihood, and still reach a similar conclusion. I think the problem is that in our minds we do a lot of model selection. We change the models all the time in our minds. And this is something that, well, maybe Bayesian models are a bit better at; you can incorporate more complex models. So maybe in this sense we are more Bayesian than not. But still, there's a lot of model selection going on in our minds. You change your assumptions, not all the time, but you keep revising them. Science is a way, right, of changing the models that you work with. And this is something that, when we are doing a statistical analysis, we usually forget: all the possible models, and how we restrict ourselves to a small set of possible models. Hmm. So in this example, we've used the term frequentist a few times already. In the scope of finding a flat, how would that proceed if you're a frequentist in terms of picking the place? Would it be something like: you take the distribution of all of the prices and you just find the mode of it, or the median value of it?
Yeah, you look at the distribution of all the prices, and because it's a multivariate distribution, the price is one dimension, but then you have how many rooms there are, and the location, and everything like this as the other dimensions. So we do that as well, I think; you make that calculation: yes, roughly around this price I can get two bedrooms, roughly around this price I can get a studio, but it's a better location, it's closer to work, and so on. So that's a more frequentist way of approaching the problem, just weighing these up. Yeah, I think so. But I have some difficulty sometimes in discriminating. Of course, I'm Bayesian at heart, because Bayesian models are easy to implement, but anyway, there is a school of thought that says that actually, if you have uniform priors for everything, so if you could write a model in which you don't put any extra information in your priors, then the Bayesian and the frequentist approaches are equivalent. And the problem is that when you write a Bayesian model, when you write models in general, it's very hard to have completely flat priors, to pretend that you know nothing about any of the parameters. You always end up having some prior distribution, some prior knowledge about the parameters, and that's why the approaches become different. By the way, this is called an objective prior: an objective prior is an attempt at making all the information come from the data. So both of you were extolling as a virtue of Bayesian inference that you have to walk into it with a set of beliefs, with a set of hypotheses. That's what a hypothesis is; it's a belief. And so in Bayesian inference, that would not be sort of contradictory or counterintuitive to… It's not necessarily a belief, and I think that word is difficult, especially when we try to talk about science; we don't like to talk about beliefs.
A lot of the time when we run this, somebody else has already run a similar analysis. They have gotten a result that is well supported, and it's about allowing yourself to use that result in order to inform your data, instead of going in completely blind and just saying, I have an alignment, and it's a fresh set of data that has never been seen before. It's going in and saying, this is a set of data, somebody else has done very similar work over here, so I'm just going to give you a better starting point based on what I think, or what other people's work has shown, to be more likely to be correct. An example of this is just models of evolution. Models of evolution are built upon lots of different testing: we know which transitions occur more often, or, sorry, which rates occur more often, or that the frequencies of the nucleotides are important. These are all just models that we put in in order to guide the analysis better, so Bayesian inference is just a larger extension of that, allowing you to put a lot of different things in to guide the analysis. Okay, anything to add, Leo, on that? Yeah, no, I think it's a very good point. In science, we already have models, we already know how things should look, and so you can use this knowledge when you have new data. And if not, then you can do model selection. So you have two competing hypotheses, two competing models, and then you can compare them with the same data, or with repeated experiments, and see what they look like after seeing the data, so which models are better for particular situations. A strength I often see with Bayesian methods is that when we work on phylogenetics, we often think our end goal is to get the tree. But you can do a lot of Bayesian analyses which include phylogenetics but where you're looking for something else, a different part of the model.
So an example of this is a fantastic paper Tanja Stadler did on the Ebola outbreak, where she used a tree that had already been generated by somebody else to look at the reproductive number, as in how many secondary cases you get from one primary case, and the incubation time, and all this kind of stuff. It's a very good paper that has no tree in it, but it's all Bayesian statistics still. Mm-hmm. So it allows you to explore your data in a completely different way, and not just have the tree as the end. The tree is not the result; the tree is part of the model, and you can put in, take out, examine, or fix any different part of that model. So you can put in the mutation rate and all these other things, and then figure out when a sample occurred, as in trying to date things, or you can put in all those dates and find out a different part of the model. Everything is one large model, and then you have to decide what part you want to fix and what part you want to examine. Okay. Well, let's change tack a little bit for this first part, where we're just sort of introducing the topic. Often when you're reading the literature, people talk about Bayesian inference, and then maybe in the same breath talk about MCMC, Markov chain Monte Carlo. Is there a difference? What exactly is going on there? So, yeah, I think there has to be a clear distinction, because you can have a Bayesian model where you know the answer, where you can calculate the solution analytically, and so you don't need to use the computer at all; you already know what the posterior probability is going to be. For instance, if you have conjugate priors, the posterior distribution will come from the same family as the prior distribution. But then, in practice, one of the powers of Bayesian statistics is that you can write very complex models.
So, what they call hierarchical models, where you describe how the priors are interconnected, how parts of your data connect to other parts of the data. You can write the model very easily, but then, once you have the data, how do you calculate an answer from it? So MCMC is like a numerical optimization, but for extracting the posterior information from the data. And there are a lot of terms that you hear when we talk about MCMC, for instance convergence, burn-in, Metropolis-Hastings, Gibbs sampling; all these terms are related to the algorithm, to the optimization, let's say, and not to the model itself. So they are basically independent problems, but especially in phylogenetics, where we have very complex models, it's very hard to do one without the other. By the way, there's a recent discussion on Phyloseminar, which I think we mentioned before, about what they call variational inference. Variational inference is an alternative to MCMC methods. With MCMC, you try to sample values from the posterior: you have your priors, you have your data, and it tries to simulate samples from the posterior distribution. Variational inference instead tries to find a distribution that is very similar to the posterior distribution but easier to calculate. We're just starting to see these methods in phylogenetics; I think there is one paper, in one of the most recent Phyloseminar talks, where they touched upon the topic of sampling trees, of looking at the trees in this context. Variational inference is becoming quite popular now because of artificial intelligence; it is, let's say, a modern Bayesian framework. But still, for phylogenetics, MCMC is the workhorse. So I can give you a very quick overview of how MCMC works in a very basic manner.
Okay, so let me give you a short, simple explanation of how MCMC works in the broadest terms. We call it a Markov chain Monte Carlo, and it's really a chain of individual cycles, kind of like a bike chain. What we do is we start off with our starting model. This model will probably include a starting tree, which we have either given it, or which we generate through parsimony, or distance, or any of the methods we talked about in the previous episode. So we have that, along with the other aspects of the model, which can be whatever other parameters we have: the mutation rate, the sampling times, how the population grows, and we can talk about some of these later. So we set these as our current tree and model. We then go into the cycle, and basically we have a basket, and in that basket we are collecting all of our samples, like pieces of paper, where each sample is all of the different aspects of the model. So this is our current one, and we add it to the basket. We want to do three cycles, so we continue on into the second cycle, because we're only on cycle one. Here we randomly propose a new tree and model; maybe we change some branch lengths, or maybe we change the mutation rate. We calculate the posterior for this proposal. If the posterior is better than the one we had before, we accept this proposal (and even if it's worse, we sometimes accept it anyway, with a probability given by the ratio of the posteriors, which is what lets the chain escape local optima); it becomes our current tree, and we add that tree to the basket. So the basket contains tree and model one, and tree and model two. Now, do we continue? Yes, we asked for three cycles. So we continue into our next cycle and randomly propose a new tree. Let's say this time we reject that tree. What we do then is keep the one we had in cycle two, and we add it to the basket again. So now the basket has one copy of tree one and two copies of tree two. And since we only wanted three cycles, we exit out of our chain.
We then gather all these up at the end, and we basically count up how often we see each value of the different parameters. So here we saw the first tree once, and the second one twice. And we may continue to run this over and over, thousands upon thousands of times. And eventually we'll start to converge, hopefully, to a single answer. Convergence means the chain has settled down: most new proposals get rejected, and we're adding essentially the same trees and the same models over and over to the basket. So right at the end, we have, let's say, 10,000 samples. We've gathered them all up, and then we apply what's called a burn-in. This means we remove some of the samples from the early cycles, from while we were just finding our feet, because we had a starting tree which maybe wasn't very good, and we hadn't explored the data very well. So we remove them by taking maybe 10% off the top and throwing them away. We then count along all of the rest of the chain to see how often we see each model, and we create distributions from that data for each of the different parameters. And these are our posterior distributions for the parameters, which we can then go and explore. This sounds a lot like trying to organize a group of people to go to a restaurant, where you just start picking: do you want Thai food, do you want Chinese food, and eventually you start to converge on, okay, we want a pub, but which pub? And then when you keep hearing the same reply, this place, this place, this place, you go, okay, yeah, all right, we're done. I'm fed up, I've done enough cycles of this, I'm going to pick this restaurant. And the difficulty with Bayesian analysis is that it takes about the same amount of time to do that as it does to gather those people into one location, as in several days.
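The "basket" cycle Conor walks through can be sketched in a few lines of code. This is a toy Metropolis sampler with a single real-valued parameter standing in for the tree-plus-model state, and an invented target distribution (a normal centred at 3), not a phylogenetic model:

```python
import math
import random

# A toy Metropolis sampler mirroring the "basket" cycle described above.
# The target is an illustrative unnormalised log posterior, nothing more.
def log_posterior(theta):
    return -0.5 * (theta - 3.0) ** 2

random.seed(42)
current = 0.0              # the starting state (like the starting tree)
basket = []                # the basket we collect samples into

for _ in range(20_000):    # the cycles of the chain
    proposal = current + random.gauss(0.0, 1.0)  # randomly propose a new state
    log_ratio = log_posterior(proposal) - log_posterior(current)
    # Accept if better; if worse, accept with probability equal to the
    # posterior ratio, so the chain can still escape local optima.
    if random.random() < math.exp(min(0.0, log_ratio)):
        current = proposal
    basket.append(current)  # accepted or not, the current state joins the basket

burn_in = len(basket) // 10         # discard the first 10% as burn-in
samples = basket[burn_in:]
mean = sum(samples) / len(samples)
print(f"posterior mean estimate: {mean:.2f}")  # should land near 3
```

Real phylogenetic samplers like MrBayes or BEAST do exactly this in spirit, but with trees, many parameters, and far more sophisticated proposal moves.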
But essentially, it is that same process, where you have a scattergun of options, and then you're slowly trying to converge on one particular range of solutions, or one solution. So, as you explained it, if you look at neighboring states in the chain, they're probably going to be very similar, because in some cases you might reject many of the proposals, and even when you propose a new state, a new tree or new branch lengths, it is still correlated with the previous state, because otherwise you would reject almost all proposals. So what you have is a correlated chain. That's why you do what they call thinning. With thinning, instead of looking at all the samples after the burn-in, you look at, say, every 1,000th point: you take one point, exclude 999 points, and then look at another one, because the points you chose are more likely to be uncorrelated. And that's why, when you run convergence tests, one of the things you look at is the ESS, the effective sample size. For instance, you simulated for 10,000 steps, but of those 10,000 steps, how many are independent? If you exclude all of those that are correlated, how many independent samples are left? That's the effective sample size. And then the magic number is 100. There's no point asking the same person, or the same group of people, for a restaurant over and over again. You want to get a distribution. You wait until they discuss, yeah. Yeah, that's actually a good example. When you move to a new city and you make one friend, you just end up going to the restaurants that they like. You need to independently go and test; maybe they are the best, but let's make sure that I've tested a fuller range.
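Leo's points about correlated chains, thinning, and ESS can be illustrated with a simulated chain. Real tools such as Tracer use the full autocorrelation function to estimate ESS; this sketch uses only the lag-1 autocorrelation under an AR(1) assumption, which is a simplification:

```python
import random

# Rough illustration of autocorrelation, thinning, and effective sample size.
random.seed(0)

# Simulate a correlated chain: each state is mostly the previous one plus
# noise, much like consecutive MCMC samples.
chain = [0.0]
for _ in range(50_000):
    chain.append(0.95 * chain[-1] + random.gauss(0.0, 1.0))

def lag1_autocorr(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var

rho = lag1_autocorr(chain)
ess = len(chain) * (1 - rho) / (1 + rho)  # ESS under an AR(1) approximation
print(f"lag-1 autocorrelation: {rho:.2f}, approximate ESS: {ess:.0f}")

# Thinning: keep every 100th sample; neighbours are now nearly uncorrelated.
thinned = chain[::100]
print(f"thinned lag-1 autocorrelation: {lag1_autocorr(thinned):.2f}")
```

Tens of thousands of raw samples here collapse to an effective sample size in the low thousands, which is why ESS, not chain length, is the number to check.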
And if you have very, very tight, strict priors on your data, as in you've said the date of isolation of this sample can only be within this hour, that is very informative, but it might mean that you don't explore the data in the same way. The restaurant that your friend picked may be the best restaurant, the most fantastic restaurant ever, but you don't know that unless you've explored all the other ones and found out that they were terrible. But then you'll know for the next person. So a Bayesian way of thinking is: when a new person comes to the city, you go to them and say, trust me, I've actually explored all of the data, and I'm telling you that this one is the best. So that's how the priors become more informative as analyses go on and on. So Bayesian inference sounds perfect. And we've touched on a couple of these as we've gone through these mock examples, but what are really the negative points of this kind of approach? You've already pointed out the runtime of this sort of analysis. It's complex. So, I taught on a course on molecular evolution in Woods Hole, where Paul Lewis teaches every year, and I've seen his lecture six times, and every time I understand something a little bit better. You can see it even with myself and Leo; I would count both of us as experts in this, and it can still be difficult to convey that information. It takes a lot of experience to be able to run these kinds of analyses. So there was a lot of time invested in order to learn how to do Bayesian inference, which can really, really pay off. I think it's fantastic. It's great work. But sometimes you don't need to invest all that; you just need to collaborate with people who know what they're doing. Yeah, yeah. Summarizing the information as well. Yeah, I've definitely had to know less about other things in my job so I could know more about this.
Yeah, so for me, what I'm always worried about when I do an MCMC analysis is whether it's converging. Convergence, I think, is an issue, especially now that we have genomic, tree-of-life-scale data sets, thousands of species and whole genomes. Did it actually converge? And how do you check for that? This is one of the problems, because even realizing that it's not converging is hard. For instance, if you run the thing three times and the results are a bit different, the mode might be the same, your best tree might be the same, but you're not interested only in the best tree, you're interested in the distribution of trees. And if this distribution is a bit different between runs, it means it's not converging. Why is it not converging? Because you don't have enough data, or because there's some problem with the MCMC, and instead of running the thing for one week, you have to run it for one month. So I think this is, not only in my experience but in what many people complain about with Bayesian models, that it's a bit hard to interpret the output, to make sure that what you're seeing is actually what you expect to see. I think a lot of bioinformatics especially has moved towards automation and speed. So we're definitely becoming less patient, and the people that we work with tend not to be very patient. There's no such thing as, let me just do a quick Bayesian analysis and I'll see if it's right. You can't do that. It takes time and effort. This is the main part of the paper; a lot of the time, this is not just the intro to the rest of the data. This is your four-month experiment in the lab; this is our equivalent of that. So it takes a lot of effort to run, and that can put a lot of people off. So sometimes you kind of have to just show them the benefits. But the more benefits you want, sometimes the more effort it is.
I mean, for the true biologists out there, this really is a mutagenesis experiment, isn't it? Where you have that clone and you are putting it into that condition over and over again to make it converge, and then you're looking at what are the features that make it suited for that condition. You have to be quite an expert on your pathogen in that way, like I said earlier. You have to know roughly how long it's going to take to run. I worked on HIV for my PhD, and you could probably get it done with a short enough MCMC chain, you know, maybe 10,000 or 20,000 steps, but for tuberculosis we normally have to run it for 40 million. And then at the end of it, you go, oh, that parameter probably wasn't specified correctly; I have to rerun it with all of these parameters changed. That's why these papers can take a long time. And you have to know what needs to go into the model at the start. There are a lot of different aspects to the model, which I think we'll talk about in the next episode, but you have to test all these different parts and see which one is appropriate. Is it this kind of distribution or that kind of distribution? Is it a Laplace distribution, or should I be using a gamma distribution? All this kind of stuff needs to be worked out beforehand, and that can be a little bit more difficult. Yeah. And especially when, after you run the thing for one week, you say, oh, this is the tree, and then the guy says, well, this is the same as the parsimony tree, so in the end you're overcomplicating things. Yeah, that must feel real good when that comes up, but at least you know, exactly. So then that becomes one line in the paper: oh yeah, we validated this using a Bayesian framework, and it's the same. One week on the cluster. Yeah, exactly. So I'll talk about one. This was a funny question that you often see on websites.
So I picked up a website that talks about intro to Bayesian statistics while preparing for this, and it talked about the gambler's fallacy and how that would cause a problem if you approached the problem in a Bayesian way. What the gambler's fallacy basically means is this: you're playing roulette, and everyone knows what roulette is, right? Right. You have a ball that randomly picks a number from a set of states. If you don't know it, read the Wikipedia entry and come back to us. But please gamble responsibly. If you did 10 spins on roulette and every time it came up black, then, the claim goes, in a Bayesian world that would mean you should always bet on black, like Wesley Snipes. And obviously we know that this makes absolutely no sense. Because we can see the game state, we know that the odds of coming up black should be slightly less than 50 percent. But, supposedly, we would not reach that conclusion under a Bayesian approach: we would keep seeing this data, keep feeding it back in, and just say, yeah, it's always going to be black, always bet on black. Yeah, when I saw this discussion, I didn't understand the problem, because whether you're a Bayesian, a frequentist, or a likelihoodist, you write your model as being from independent samples. I don't know roulette, so let me talk about a coin. If you flip a coin and you always get heads, I would say, as a Bayesian or a frequentist or a likelihoodist, that it's not a fair coin. And I would say, with our confidence intervals or credible intervals or whatever, that this coin is biased. I think anybody would do that. So it's not a problem with being a Bayesian.
The problem is that the gambler's fallacy is only a fallacy when the events really are independent, because when you start, you assume that the events are independent. And what's an independent event? Each one of the coin tosses, or in the case of the roulette, each spin where the ball lands on black. And the gambler, I think, starts seeing too many of those and starts seeing patterns. Yeah, they start seeing patterns. And then maybe they'll think, well, the next one is going to be black, or they can think in the opposite sense: if they've all been black until now, then the next one is going to be red. Yeah. And it's a very seductive idea, right? You keep seeing black over and over again and you go, the next one's got to be red. Yeah. But I couldn't see this as a problem with thinking Bayesian. Maybe they mention it because, under a Bayesian model, whether you update your posterior at every new draw, say at each of the 10 spins, or you first observe all 10 and then update your posterior once using the sample of 10, the posterior should be the same. Maybe that's why they say it. But yes, I think it comes down to independence, like Leo was saying. If you have 10 spins that come directly after each other, they actually influence each other, because the ball is dropped in from where it stopped the run before. So if it stops on black in front of the dealer every single time, then the state of the table is actually influenced by what happened in the run before.
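The sequential-versus-batch updating point can be checked directly with a conjugate Beta prior on a coin. This is a hypothetical worked example, not something from the episode: folding in ten heads one at a time or all at once yields the identical posterior.

```python
# Ten tosses that all came up heads (1 = heads, 0 = tails).
tosses = [1] * 10

# Start from a flat Beta(1, 1) prior on the heads probability.
a, b = 1.0, 1.0

# Sequential updating: fold in one toss at a time.
for t in tosses:
    a, b = a + t, b + (1 - t)

# Batch updating: fold in all ten tosses at once.
a_batch = 1.0 + sum(tosses)
b_batch = 1.0 + len(tosses) - sum(tosses)

# Same posterior either way: Beta(11, 1).
assert (a, b) == (a_batch, b_batch)
print(a / (a + b))  # posterior mean ≈ 0.92: the coin looks loaded
```

Either way the model concludes the coin is heavily biased, which is the correct answer if the tosses are independent, and the gambler's mistake if they are not.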
If you saw 10 spins where you went in every third day, or every five hours, or with a different dealer, and you randomly looked at the table, then maybe you could start to say that. But you have to make sure that those 10 samples were truly independent. That's the proper Bayesian way of doing it. It's not about, oh, I just looked 10 times over five minutes and I'm done. Yeah. Or the proper frequentist way. As I said, this is a fallacy, and what does it mean that it's a fallacy? It means that it's wrong somewhere, and the problem is not in assuming that the model is this or that. There's nothing special about the prior: you can have a completely non-informative, flat prior, in which case, as I mentioned, Bayesian and frequentist become equivalent, they become the same. Yeah, I think this comes down more to the fact that humans are built to look for patterns and the mathematics is not. The mathematics is just there to detect a pattern if it's truly there. Yeah, and maybe the gambler starts changing the model as they go: no, maybe the model I was thinking of was wrong, maybe this model is better. I don't know. But then the gambler might be a frequentist as well; we all like to see patterns. So that gets on to a higher-level question that I wanted to ask both of you. The roulette example is really just a poorly constructed experiment. So how does this apply in science? When you're reviewing a paper, how would you assess whether an analysis is robust? We've already touched on checking that the data points are independent and whether the chain is converging. What other things would you look for in a publication to feel confident in its result?
So the methods sections for a maximum likelihood analysis and for a Bayesian analysis are vastly different in size. When I teach maximum likelihood, I often say: we're going to talk for three hours so that you can understand one line of a paper, because it's just going to say "we used maximum likelihood with GTR plus G plus I with 100 bootstraps", and there's a lot to unpack in that. Bayesian is almost the opposite. You should be setting out all of the different parts of the model that you put in, as in: we set the prior to be this distribution with this mean, et cetera, so that it can be redone. And why did you set it to that? Either you need to have tested it by doing what's called a marginal likelihood analysis, where you run a Bayesian analysis to estimate the marginal likelihood under each candidate model and then compare the models and their effect on the final answer; so you should either run that and say this was the most appropriate prior for this parameter, or you should be referencing another paper that did. I need to be sure that the analysis you did was informed correctly, so that your priors neither constrain the model too much nor let it wander too far. That's what should be in the methods. I don't think I've ever seen a paper that actually described all that; they just say "we used this tool" and give a citation. Yeah. It comes down to a similar thing as with statistics: the vast majority of these papers are written by biologists, and that's great, they can do the rest of the paper, but it's not a Bayesian researcher going through it saying you need all of these things, because there aren't a lot of us, because it's a lot of work to really understand how these things work. It's like saying "I used R". So how did you reach this conclusion? I used R. No, no, no.
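To give a flavour of what a marginal likelihood comparison does, here is a toy, analytically tractable version using coin flips rather than trees. This is a hypothetical illustration only; real phylogenetic marginal likelihoods are intractable integrals estimated with methods such as path sampling or stepping-stone, but the logic of comparing models by how well they predict the data as a whole is the same.

```python
from math import exp, lgamma

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(heads, tails, a, b):
    """Log marginal likelihood of coin-flip data under a Beta(a, b)
    prior: the binomial likelihood integrated over the coin's bias."""
    return log_beta_fn(a + heads, b + tails) - log_beta_fn(a, b)

h, t = 9, 1  # observed data: 9 heads, 1 tail
m_vague = log_marginal(h, t, 1, 1)    # vague, flat prior on the bias
m_fair = log_marginal(h, t, 50, 50)   # prior tightly peaked at 0.5

bayes_factor = exp(m_vague - m_fair)
print(bayes_factor)  # > 1: the data favour the vague-prior model
```

The Bayes factor penalises the "fair coin" prior because it concentrates its probability where the data are unlikely, which is exactly the kind of evidence a reviewer should expect behind the sentence "this was the most appropriate prior for this parameter".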
It's a perfectly legitimate analytical tool: I did my work using an in-house custom Python script. Yes, that is the explanation of how I achieved this answer. Available on request. Reasonable request. What would be nice is, if you're using something like BEAST, depending on the program, there is a file that has all of the different bits of the model set out, and you should put that somewhere like Figshare for the reviewer and for other people to be able to run it. It's the same as putting a standard operating procedure online so that people can do your experiment again. If you want to be open and transparent, just put it up online so other people can look at it. Yeah. And for me, besides seeing the convergence analysis and, as Connor mentioned, whether you explored the possible models and gave a legitimate description of them, I want to see the distribution of values, because otherwise it's not Bayesian. There is no point in coming up with a Bayesian model, running a Bayesian analysis, and then not showing the uncertainty on the estimates. People say you always have to put the bootstrap values on the tree; I don't agree completely with that, but in this sense I agree with the feeling. You have to pay attention to the uncertainty, and in Bayesian analysis it's at the front of the model. You get a lot of papers which say, oh, we dated when humans evolved as a separate species, and it was blah million years ago. In Bayesian there is no "blah million years ago"; it's that plus-minus something. Yeah. So in Bayesian you should always have an interval around it, what's called the HPD, the highest posterior density interval. So are you both going to be happy if I just give you the date and the confidence interval, or do you want more from me?
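For listeners who want to see what an HPD interval actually is, a minimal sketch over posterior samples follows. This is a hypothetical illustration with simulated draws standing in for MCMC output; in practice tools such as Tracer report the 95% HPD for real BEAST runs.

```python
import random

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples: the highest
    posterior density (HPD) interval for a unimodal posterior."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round(mass * n)))  # number of points in the window
    # Slide a k-point window along the sorted draws; keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Pretend these are posterior draws of a divergence date (in Myr).
rng = random.Random(7)
draws = [rng.gauss(5.0, 1.0) for _ in range(20_000)]

lo, hi = hpd_interval(draws)
mean = sum(draws) / len(draws)
print(f"{mean:.2f} Myr (95% HPD: {lo:.2f}-{hi:.2f})")
```

The point estimate on its own hides how wide that interval is, which is exactly the "blah million years ago, plus-minus" complaint above.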
Well, they're called credible intervals in a Bayesian analysis. They're the same, but yeah. But don't tell them we said they're the same. This is on the record. They know. They're going to get you in your sleep. Is this recorded? There is always a credible interval around the estimate, and it tends to be much larger than people think it is. Oh yeah, with some of the dating stuff it can be pretty ridiculous. Yeah. So for me, if we were even able to just get to that stage, I'd be much happier, because we're just not there yet. But beyond that, you want to make sure that the chain converged, and then you want to do some kind of testing to make sure that the model was correct and that the data was informative. Just like we talked about in the previous phylogenetics episode: if you have an alignment that's almost all invariant and only a couple of sites are driving the tree structure, the same can happen with Bayesian analysis. The data can be very uninformative. You may give it sampling dates, but they may not actually help in any way. So you want to make sure that the model is not just driving the answer, because you will get an answer. If you run MCMC for three cycles and put it through all the different steps, you'll get a tree and you'll get a distribution. It'll be a terrible distribution, but you will get an answer. So it's about being confident in that answer. Okay, that sounds like I haven't been caught out on my latest manuscript; I've got everything in place. Just to finish up, when is Bayesian inference inappropriate? Inappropriate Bayesian inference. I would actually talk more about necessity: do you really need to do all this work to build this? I often say to people, because there's always this camp question, are you a frequentist or a cladist or a Bayesian? I'm whatever is required for the job. If you just want to know the tree that comes out at the end, do a really good maximum likelihood analysis.
But if you want to do more than that, if you want to explore your data and get more answers, then you should do Bayesian. So don't put in all that work just to get a tree, I don't think. So you're a pragmatist. Yeah, I've got other things to do. Yeah, my answer would be pretty similar. I would say that if the model starts to become too abstract for you to describe, because you have to defend this against reviewer number three. Oh yeah, reviewer number three. So if you don't feel confident in the model, then you should go for a simpler model. Intuitively, I am always suspicious of a publication that presents the world's most complicated model, Bayesian or not, and doesn't have a primitive sanity check. I like seeing papers where they talk about all these sorts of population structure things and then just say, by the way, this clade is X SNPs away from that one, and you go, okay, that makes sense. Just something really simple to help you along. Like that bot that replaces "artificial intelligence" with "logistic regression". Actually, I thought of something where it's not appropriate: if the data underneath violates the assumptions of certain models, it's not appropriate. For example, a lot of people do population size analyses, and there are very specific boxes you need to tick before you can actually apply that Bayesian model to your data, such as it being a single population that has been randomly sampled. So if you want to do Bayesian molecular epidemiology, you need to do good epidemiology, or else it's not appropriate. Yeah. Oh, but that sounds like actual work. I know, it's the worst, but that's how we get employed. All right, and on that bombshell, I think we'll draw this episode to a close. I'd like to thank both Leo and Connor for joining me today. Thanks very much. Thank you. Yep.
And I'll see everyone else back on the next exciting episode of the MicroBinfy podcast. Thank you all so much for listening at home. If you like this podcast, please subscribe and rate us on iTunes, Spotify, SoundCloud, or the platform of your choice. And if you don't like this podcast, please don't do anything. This podcast was recorded by the Microbial Bioinformatics Group and edited by Nick Waters. The opinions expressed here are our own and do not necessarily reflect the views of the CDC or the Quadram Institute.