Hello, and thank you for listening to the MicroBinfie podcast. Here, we will be
discussing topics in microbial bioinformatics. We hope that we can give you some
insights, tips, and tricks along the way. There is so much information we all
know from working in the field, but nobody writes it down. There is no manual,
and it's assumed you'll pick it up. We hope to fill in a few of these gaps. My
co-hosts are Dr. Nabil Ali Khan and Dr. Andrew Page. I am Dr. Lee Katz. Both
Andrew and Nabil work in the Quadram Institute in Norwich, UK, where they work
on microbes in food and the impact on human health. I work at the Centers for
Disease Control and Prevention, and am an adjunct member at the University of
Georgia in the U.S. Hello and welcome to the Microbial Bioinformatics podcast.
Lee, Andrew, and myself, Nabil, are your hosts for today, and today we are
joined by Professor Mark Pallen, who is a Professor of Microbial Genomics at the
University of East Anglia and Research Group Leader at the Quadram Institute.
Today we're taking a trip down memory lane to reflect on some of the most
exciting bioinformatics moments in Mark's very long and very illustrious career.
So good day to you, Mark, good to have you on the podcast yet again. Thank you
for inviting me, and thank you for giving me a chance to go down memory lane and
reminisce. Let's start chronologically, at the beginning, and meander through
time as we go along. So what was perhaps the earliest, most exciting
bioinformatics moment for you? It's kind of ironic. 1977, as you probably know,
was a great year, because that was the year in which Fred Sanger described his
method of sequencing, which took over the world. But at that time, protein
sequencing was far more prevalent than DNA
sequencing. And I applied to go to the University of Cambridge. Interestingly,
when I looked at the biology textbook I used at school, there was no mention of
DNA in the index at all. So we weren't taught about DNA, and there was almost
nothing about DNA sequences at that time. But I had to do the Cambridge entrance
exam, which was a special thing the University of Cambridge did, where they set
you some questions. If you passed that, they would let you in with very low
grades in your normal school exams. And the irony was that one of the questions
in that entrance exam gave me a lot of peptide sequences and said, assemble
these into a larger protein sequence. And I did it, and obviously did it well
enough, because I passed the entrance exam and got in. But it was a slightly
prophetic moment: hang on, I was asked way back in 1977 to assemble protein
sequences by hand under exam conditions. So that's my first recollection of what
we might now call bioinformatics; at the time, it didn't have a name, I don't
suppose. And then I got into Cambridge. I remember having conversations with
some of my fellow students about all the things that were going on. Richard
Dawkins had just released The Selfish Gene, so new ways of thinking about the
importance of genes in evolution were coming to the fore. But very little was
known at that stage; hardly any genes were even known, let alone how similar
they were to each other. I remember, in my final year at Cambridge, one of the
lecturers coming in and saying, oh, we've just discovered these strange things
called introns. We don't know why they're there or what they do, but it's an
interesting new finding. Oh, okay, that's kind of weird. Actually, I was
training as a medic. And so I then
went off and completed my medical training. I did my three clinical years and my
house jobs. But I was always interested in getting back into research, and in
fact, when I finished my house jobs, I spent six months using what were then
very cutting-edge techniques: polyacrylamide gel electrophoresis and Western
blotting. I did some molecular biology and got a paper out of that work. But I
decided, no, I've got to keep going with the clinical stuff. So I did that, and
I qualified as a medical microbiologist. And I got lucky, because while I was
working as a jobbing medical microbiologist, I applied for a job at the medical
school at Barts Hospital in London. It was a clinical lecturer position, and I
didn't think I'd have a chance of getting it, because there was already an
internal candidate there doing his MD thesis with the professor. But I just
turned up at the interview thinking, well, this is interview practice, and I
wasn't at all nervous. Then they talked about molecular biology, and I said, oh
yeah, I kind of grew up with this, I can do this stuff. And so I got offered the
job, and that was a bit of a shock. Wow. OK, but it turned
out to be one of the best things that happened to me, because the lab I was
working in was headed up by a quite remarkable Iraqi-British lady called Soad
Tabaqchali. She had latched onto the idea that molecular biology was the next
happening thing, and she'd written a first grant that brought in a guy called
Brendan Wren, who might be known to some of you, to work there. He had then
helped her write the second grant and the third grant and so forth. So she
built up a small team, even though she was a jobbing medic herself and didn't
have a great grasp of molecular biology. We had a small team of people there
working at what was, at the time, the cutting edge. Brendan Wren's job was to
clone the toxin from C. difficile, which turned out to be far harder than anyone
ever imagined, so he didn't quite get that fully done. But one of the other guys
there, Chris Clayton, was working on Helicobacter pylori, which had only been
discovered a few years previously, and he'd been trying to clone the urease gene
from Helicobacter pylori. Around that time, after I'd arrived there, I managed
to get in with the molecular biology guys; I could just about talk their
language, and I won their trust. And the other thing that was happening was that
computers were coming in in a big way: personal computers. When I was a medical
student, and just after that, I bought myself a BBC Micro and learnt a little
BBC BASIC. But then the Apple Mac came in, and I managed to persuade them to buy
an Apple Mac that we put into the lab. I started mucking about with sequences
and software, and I think most of them were saying, that guy, he's all right,
but he's a bit odd, isn't he? Spends all that time mucking about with the bloody
computer. What use is that? And then one day Chris came to me and explained that
he thought he might have cloned the urease genes from H. pylori, but it was all
hanging by a bit of a thread. He'd got a rather crude antibody preparation that
had been raised against a rather crude urease prep from H. pylori. He'd made a
lambda expression library from H. pylori and used his antibody to probe the
plaques, and he said he got this very faint glow around one of the plaques. But
there was no urease activity associated with the clone that he subcultured from
it. So it wasn't clear whether he was really traveling in the right direction,
whether he had the right thing cloned, or whether it was all just a red herring.
And I said to him, well, have you got some sequences? Yeah, I've been sequencing
it. And in
those days, and this shows my age, you used sequencing gels and radioactivity.
It was a strange old business, where people had to be very skilled in the art
of pouring gels without bubbles and making sure they ran them correctly. They
used to run the gels overnight, and I remember one day one of the guys in the
lab came in early in the morning and caught a gel that had caught fire, just
before it went up in flames and set the whole lab on fire. So it was a strange
old time. But he had all these sequences, and he said I could have a look at
them. And what we did, and again this is just laughable looking back, was that
he would sit there holding his autoradiograph up against the light, and I would
be at the computer, and he would call out the bases as he read them from his
autoradiograph. In that way, we digitized his autorads and got these sequences
from them. I had just managed to lay my hands on a program for the Apple Mac. I
think it was called FASTA or FASTP; I know FASTA is also the name of a format,
but it was the name of a program at the time as well. And there was this library
called the PIR library, which is a protein database. I said to him, well, I can
search this library, but are there any other urease sequences known? And he
said, well, there is this plant one, the jack bean urease. I said, that's a bit
of a leap, isn't it, to think that we might find any similarity between a
bacterial urease and a plant urease? At that time, you know, we had so few
sequences that comparisons across domains weren't done very much anyway. But I
said, well, we'll have a go. And I remember the day, the 15th of September 1989,
when I sat down in front of the Mac, took his sequences and ran what was
effectively a translated BLAST search, a translated homology search, to see
whether his sequences encoded
anything that looked like the urease from the jack bean. And it was one of those
things where you were just knocked off your chair, because there was around 50%
amino acid identity coming up from the various sequences. Interestingly, it was
all on one strand, but it was wavering from one reading frame to another. You'd
get a great block of similarity in one reading frame, then nothing, and then
another reading frame would light up. So I went back to him and said, there is
no question: you have cloned the H. pylori urease genes. And not only that, but
they show this remarkable sequence homology to the plant urease, and this kind
of similarity between a bacterial protein and a plant protein is, I think,
unprecedented. That was one of my moments. There's a poem by Keats that goes,
"Then felt I like some watcher of the skies when a new planet swims into his
ken; or like stout Cortez when with eagle eyes he stared at the Pacific, and all
his men looked at each other with a wild surmise, silent, upon a peak in
Darien." To me, that's one of those moments where you're on that peak in Darien,
looking out at something completely new. It was just amazing to see this. But
then I said, well, actually, you've got the sequences, but there must be
mistakes, because they're coming in and out of frame. Let's go back and look at
the autorads and see if we can correct this. So we went back through each of the
bits where it went out of frame. I said, oh, you've got six A's here. He said, oh
no, there's only five. Oh, you've got four G's here. Oh, it's five. And we
managed to correct it, so we got everything in frame, and in fact that still
stands true, apart from one final frameshift at the end. I think there were five
amino acids at the end that were out of frame and that we didn't sort out by
this process. We immediately wrote that up. We tried to get it into Nature, but
the editor at Nature didn't like it or didn't think it was exciting enough. But
it did get published in Nucleic Acids Research. And that was, for me, the
beginning of a
voyage, because I convinced myself and my colleagues that mucking about with
sequences and playing around on your computer could actually have a decisive
influence on the outcome of lab-based research and inform the way forward. It
wasn't just some weird activity that you did in a corner when you couldn't be
bothered to do anything useful. So that was a great step forward. Within the
same context, we found other things as well; several other similar findings came
out of our work at the time. We found a group II intron that included a reverse
transcriptase in a Gram-positive bacterium. That was the first in a
Gram-positive, though it wasn't quite the first in a bacterium. We also found a
cluster of genes in Clostridium difficile that were homologues of the set of
genes in Clostridium acetobutylicum responsible for butanol fermentation. And I
got very excited about that, because I said, well, these are the genes that gave
rise to the state of Israel. A bit of knight's-move thinking there. But it turned
out that Chaim Weizmann, who made the case to the British government that led to
the so-called Balfour Declaration, made his reputation by being able to produce
butanol by fermentation during the First World War, which made a decisive
contribution to the British war effort. And it was this particular genetic
pathway in Clostridium acetobutylicum that he'd used to do that. So that was an
example where I perhaps got more excited than I should have about what we found.
But we did publish a paper on that.
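The frame-wavering signal in the urease search is what a single miscalled base inside a homopolymer run does to a translated sequence. As a minimal Python sketch (the helper names translate and homopolymer_runs and the toy sequence are hypothetical, made up for illustration; the codon table is the standard genetic code), dropping one A from a run of six shifts everything downstream into a different reading frame:

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... ; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate one DNA strand starting at offset 0, 1 or 2.

    Incomplete trailing codons are dropped; unknown codons become 'X'.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def homopolymer_runs(seq: str, min_len: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every single-base run >= min_len."""
    runs = []
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # extend the run while the base stays the same
        if j - i >= min_len:
            runs.append((i, seq[i], j - i))
        i = j
    return runs


if __name__ == "__main__":
    true_seq = "ATGAAAAAATGGCATTGG"  # hypothetical toy sequence: six A's
    read = "ATGAAAAATGGCATTGG"      # same sequence with one A dropped

    print(translate(true_seq, 0))   # MKKWHW
    for frame in range(3):
        print(frame, translate(read, frame))
    print(homopolymer_runs(read))   # [(3, 'A', 5)]
```

On the toy read, frame 0 translates as MKNGI and frame 2 as EKWHW, so the WHW that ends the true protein (MKKWHW) only reappears two frames over; restoring the sixth A puts everything back in frame. Listing long runs first is one plausible way to decide which stretches of a trace to re-read.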
So that was a very exciting time back then. You know, that little phrase "we
few, we happy few, we band of brothers" springs to mind as well, because it was
a great time to be doing science. It's interesting, as you go through it, that
some of the problems you were talking about are the same ones I still have to
field today, where someone will come to me with a sequence they've generated in
the lab, from some mutant they made under some condition, and ask what other
organisms have something similar. And interesting that you had to deal with
homopolymer errors, probably more than a decade before people had to deal with
them in pyrosequencing. Yeah, indeed. That's always been the bane of any
bioinformatician's existence: homopolymeric tracts. So many of these problems go
back a long time. In some ways, there's nothing new under the sun; the same
problems have cropped up repeatedly. I did have a question about how you
actually did the alignments, considering BLAST was published in 1990. It wasn't
BLAST. Let me correct myself: it was a forerunner of BLAST. I think the program
at the time was called FASTA, and it allowed you to do homology searches against
FASTA-formatted sequences. I suspect that was simpler, just a local alignment.
Yeah, and I'm sure it wasn't as elegant or as quick or as efficient, but the
database probably only had about 1,000 sequences in it at the time, so it didn't
have to be massively elegant and efficient. BLAST, as you say, came a little
later, Altschul et al. It's one of the most highly cited papers, and later in my
career I almost became a BLAST symbiont, if you like, in that I spent so much
time, you know, fused to a BLAST terminal, but we can go on to that in a moment.
You
said that in 1977 you were handed sequences and had to assemble them by hand.
Did you have anything more on that? This is a bioinformatics podcast, so I
wanted to get a little bit into that. Did you have a way that you did it? I
wouldn't say an algorithm, because you were obviously doing it by hand, but how
did you do it back then? Well, it's one of those things where you look back and
think, well, I did that, but how did I know how to do that? I must have had an
aptitude for these things, because it seemed fairly straightforward to me: you
just line them up and look for the overlaps, and where they overlap you join
them together until you've got the whole thing. The good thing about protein
sequences, of course, is that they're all effectively on one strand, if you
like, whereas with DNA sequences you have to reverse complement, and it gets
more complicated. But yeah, it was just like doing a jigsaw, but with sequences
instead, and I guess I had some kind of affinity for that. For language, I
suppose: I'd done Latin and French at school, and I taught myself Esperanto. So
playing around with strings of letters wasn't that unusual to me. If anyone's
ever done Latin, where you have to parse a Latin unseen, it's a similar process
of working out what the parts are and how they fit together; it's not that
different. Yes, we shouldn't forget that most of our most fundamental algorithms
in bioinformatics are essentially derived from text searching and linguistics.
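The procedure described here, lining fragments up, finding overlaps and joining them until one sequence remains, is essentially greedy overlap assembly. A minimal Python sketch, assuming exact overlaps and an arbitrary minimum overlap of three residues (the function names and the example peptides are hypothetical, made up for illustration); because peptide fragments are effectively on one strand, there is no reverse-complement step:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len residues exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap.

    Returns the resulting contigs; more than one comes back if some
    fragments cannot be joined to the rest.
    """
    # Drop exact duplicates (keeping order) and fragments contained in others.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps
        # Join the best pair, writing the shared residues only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    peptides = ["AKQRQ", "MKTAYI", "AYIAKQ"]  # hypothetical fragments
    print(greedy_assemble(peptides))          # ['MKTAYIAKQRQ']
```

Greedy merging is the simplest strategy and can be misled by repeated segments; modern assemblers instead build overlap or de Bruijn graphs, which this sketch does not attempt.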
Indeed, yeah. There are many parallels. Jumping forward quite a long way, I did
actually persuade one of my colleagues to apply similar approaches to the Origin
of Species, treating its editions like a series of genomes and looking across
all the editions for the additions, the deletions and the edits, and they came
up with an online variorum of the Origin of Species by thinking like a
bioinformatician. Later in my career, I also rubbed shoulders with a theologian
who was using these kinds of approaches to analyze manuscripts of the Gospel of
Luke: how do you align the texts as you look for changes, how do you work out
whether something is an insertion in one or a deletion in the other, all those
kinds of issues. There are many parallels between these ways of thinking in
bioinformatics and in the digital humanities and elsewhere. And one more
question from before. You said that in your textbook, back when you started,
there was no mention of DNA at all. I thought that was really startling. So back
then, did you ever get to rub shoulders with Rosalind Franklin or Crick or
anybody?
Not back then. You know, British people at the time had this warm glow of
satisfaction that these two guys working in Cambridge had solved it, and I did
read The Double Helix as well, Jim Watson's memoir of the time, which is
probably not very politically correct these days, but it's an entertaining read
if you allow for that. And Jim Watson: the MRC got him to come along, I think in
Cheltenham, for the hundredth anniversary of the MRC, and had him talking there.
That was a very strange business, because he had been saying all those stupid
things about black people not being the same as white people that got him
sacked from all those advisory boards. The MRC had been very brave, actually, in
giving him a platform, and one felt the whole time he was talking that it was
like treading on eggshells. I didn't speak to him, but I took a photograph of
him with my daughter, so she's got that in her archive somewhere; she wanted to
meet him because she'd heard of him. Yeah. So how did you get into developing
bigger grants, I suppose, and taking your own research forward as a medic? So
what happened
after, when I was at Barts, was that I was making a contribution to the
research, and the other thing that had come in at that time was the polymerase
chain reaction, PCR, and we spent a lot of time playing around with PCR. In
fact, there was one of those moments where I had a flash of inspiration which
then fizzled out, and I look back and think that was a missed opportunity. I
remember saying to someone in the tea room, look, the way we type bacteria at
the moment is that we cut them with restriction enzymes and we get all these
bands. I wonder if we could do something similar with PCR. If we made PCR
primers that had the restriction enzyme sites at their ends, maybe we could
replicate those bands. And he said, yeah, yeah. And I said, but actually we
don't need to do that. We could put any old sequence in the primers, so they
could be completely arbitrary, because as long as we take the annealing
temperature low enough, something's going to get amplified. So why don't we just
do a PCR with a very, very low annealing temperature on some bacteria with any
old primer and see what happens? And we did it, and it did work: you took
different strains of E. coli and you could see different banding patterns. I
said, oh, we've invented a new typing method, and we started to write it up. We
tried to get it published in Nucleic Acids Research, and they said, oh, this is
a bit niche, it's only for bacteria, we're not really interested. And at the
time I wasn't that skilled in the lab, so my colleague who was doing the work, a
guy called Ian Lamb, continued with it. Then he came back and said, look, I'm a
bit worried, because I'm not getting reproducible profiles. When I run the
experiment a second time, I'm getting a few extra bands. I don't think we should
press on with this. And I said, well, you're more of an expert than me, and we
put it to one side. And then about 18 months later, a paper was published in NAR
on exactly the same technique, applied to both plants and bacteria. So I keep a
picture of a gel from those experiments on the wall in my office, to remind
myself not to let opportunities pass me by. Looking back, what had probably
happened, with what's now called the RAPD approach, is that from one round to
the next some of the amplicons had hung around and contaminated subsequent runs,
and that's why they weren't giving consistent results. If you ran the same
experiment under perfect conditions, you would probably get the same results.
And as I say, people later did adopt this RAPD approach, and it became very
widely used. So that's one of those missed opportunities. But anyway,
getting back to Andrew's point, I did an MD thesis at the time, which is a kind
of thesis that medics do in the UK, and that was looking at using PCR to detect
the diphtheria toxin gene. I realised there was a gap in the repertoire: if we
wanted to determine whether a strain of Corynebacterium diphtheriae was
toxigenic or not, the conventional approach was very laborious and required very
complex reagents, such as antisera, that were not necessarily reproducible. So I
said, well, we could just do it by PCR, and I did manage to get it to work. I got
the MD, and I then got a senior lecturer position, where I became an honorary
consultant, and I thought, well, I can be a clinical academic, I've got some
credentials here. But then people said, well, you know, medics in the lab...
Basically, there was this idea that all medics are jokers in the lab, and that
the MD thesis was a kind of noddy PhD for medics. It had become a debased
coinage, in a sense. People said, if you want to be co-equal with your medical
colleagues, you have to do a PhD. And around the same time, Gordon Dougan, whom I
knew through Brendan Wren, had come up with the same idea. He basically said, oh
yeah, medical microbiology in the UK as a profession is just rubbish. I was
like, this is my whole profession, that's a bit harsh, but yeah, you've got a
point. And so he managed to persuade the Wellcome Trust to launch a scheme to
allow medics, medical microbiologists, to go and do a PhD. And I said, oh, how do
I get onto this scheme? Where would I go? Could I do it at Barts? He said no, if
you stay where you are, that's not how fellowships work; you've got to go
somewhere else. I said, well, where shall I go? He said, well, I'll take you on.
And so I went and did a three-year PhD with Gordon Dougan. It was an
interesting experience, because I went from being an important, or at least
self-important, person (I was a consultant, you know, the youngest consultant in
my year at medical school) to being a lab idiot once I started the PhD. They
were just joking about this guy: his gels are rubbish, he can't make mutants,
and all this sort of stuff. But it was three years of hard graft, and in that
time I did change completely into a jobbing molecular biologist, and I also
extended my bioinformatics skills. I continued to play around with computers,
and interestingly, at that time the internet had fully arrived. The internet had
been a niche thing for many years, but with the arrival of the World Wide Web and
the increasing use of email, it was coming into public consciousness, and I
started mucking around on the internet. I built my own web pages, and I
persuaded the British Medical Journal to allow me to write a series of articles
introducing the medical profession to the internet. So I wrote these three
articles about the internet. My daughter read them recently and just laughed at
me. She said, oh well, I suppose at the time this was all very pioneering, but
it says here that getting email is so much more fun than getting snail mail
because there's no such thing as junk email. And that's what we believed at the
time. We believed we were going into the promised land, everything would be
wonderful, and the internet was only a force for good. Of course, it's been very
much a double-edged sword since then. But anyway, I got the PhD, and then I was
in a place to start writing grant proposals. I was tooled up, and I wrote, I
think, eight proposals: four to the Wellcome Trust, none of which got funded, and
four to the BBSRC, three of which did get funded. So I started off with three
project grants, and that was the launch, or relaunch if you like, of my career.
All right, so let's take some questions. I
can't help myself: do you remember when your opinion changed about spam? No, I
think that's more like the frog being boiled by the temperature going up slowly,
isn't it? The amount of spam slowly increased over the years, until now we have
spam filters and all that sort of stuff. Yeah, I'm just looking at Google
Scholar, and you've got hundreds of citations for these guides to the internet.
Yeah, exactly. I look back and think, well, I did make a difference. The other
thing I did while I was doing the PhD was that I went on University Challenge
and captained a team. University Challenge is a UK version of College Bowl, for
those in America. Yeah, we have that. So that was a bit of fun on the side as
well. The other thing that happened, and I think this is the theme of what I'm
saying to you, is that, as we say, nobody remembers all those days in the lab,
apart from a few days that you do remember. And there was one particular moment
I remember when I was in Dougan's lab,
remember and there was one particular moment I remember when I was in Duke's lab
where a guy called Duncan Maskell who is friends with Ian Charles our director
he's now running the University of Melbourne in the US but he was on a
secondment in Duke's lab but he'd been to a conference in America and he
returned to this conference and we all saw him in the lab we all stood around
him with bated breath listening to what he was saying he said oh he just met
this guy called Craig Venter and and this other guy Ham Smith and they had just
sequenced a bacterial genome and and they they'd done one but then they'd done
Haemophilus influenzae but just to show they weren't a one-trick pony they'd
then done a second one a mycoplasma genome as well and and it was one of those
moments where wow they've actually done it and it was one of those things where
Craig Venter you know had a a mixed reputation in the field I guess slightly a
bit of a troublemaker but he certainly stirred the pot and you have to hand it
to him what was going on at the time was that there were efforts going on at the
Sanger and people were talking about oh yes we can sequence bacterial genomes
but it was going through this very laborious process of making this top-down
approach where they made a a very large library and then they subcultured bits
from the large library and then and then subculture bits and then sequenced all
that and put it all together in this very slow and laborious you had to have a
total genome map before you could even start doing sequencing and Craig Venter
and Ham Smith just upturned the uppercut because they just said no you just
sequence all this stuff uh in lots and lots of small fragments and you let the
bioinformatics do the heavy lifting of assembly  And that was one of those
moments again: wow, okay, that's incredible. You said shotgun approach. Yeah,
the shotgun approach: using a shotgun approach rather than a top-down mapping
approach. That was a breakthrough. Yeah, and oddly, we still use that
nomenclature, even though no one uses BAC clones anymore. Well, not for genomes
in that way. No, no, exactly. So that was very exciting, and I got interested in
that area. And the Sanger very quickly decided they were going to do the same
thing. At the time they were sequencing Mycobacterium tuberculosis using the
top-down approach, and they did complete that genome. But the next one they did
was Campylobacter jejuni, and I got involved in that project. What happened next
was one of those strange serendipitous moments where something unexpected
happened. Because I'd been working with the British Medical Journal on those
articles, and they'd been well received, they asked me to do a regular column in
the British Medical Journal on the internet. It was called Netlines, and I did
that
for a few months. There's one question I don't know if I'll ever get the answer
to. You mentioned that they said you give all the data to the bioinformaticians
and they can sort it all out. And the question I'll never get the answer to, I
think, is: when was the term bioinformatics first used? I've heard maybe 1997,
when the journal renamed itself Bioinformatics, but I don't really know. Well,
while I was doing the PhD with Gordon Dougan, I remember him using the term
bioinformatics, saying, oh, you're interested in bioinformatics, this is a
growth area. And that was the first time I'd heard the term, in the mid-90s,
1995 or 1996. I'd been calling it sequence analysis up till then, but he used
the term bioinformatics, and that's when the term came into my consciousness;
obviously it became much more widely used as the field grew. Now, what happened
next was one of those strange moments of serendipity. I'd written those articles
for the BMJ, the British Medical Journal, and I did this regular column called
Netlines, discussing innovations in the use of the Internet in medicine, new
websites and interesting new developments. And after I'd been doing that for
a few months, they said, oh, we have this student British Medical Journal as
well, maybe you could do a column for that. And I said, yeah, right. They said,
the thing is, there's this other guy that also wants to do that. And would you
be happy to work with him? And I said, I suppose so. I mean, I don't need to
have this other guy. I could do it myself. And who is it? And they said, oh,
it's a guy called Nick Loman, and you might want to meet him. So I had a chat
with him. And I said, it turned out he was at school. He was in his final year
at school. He was applying to get into medical school. And he had, from a very
young age, he taught himself the program. And so we started writing this column
in a student BMJ together, which I think we called Net Files with a PH. And I
said to him, why don't we run a little workshop to introduce people to the
Internet for medics, to show them how to use the Internet, how to use the web,
how to send emails and do stuff. And he said, yeah, all right. And so we got
together. And it was one of those experiences where you forget how much you've
already picked up, because we had people turning up to that workshop who didn't
know how to use a mouse, a computer mouse. You know, they were just looking at
it. What's this? How does it work? And all that. And so we ran that and it was a
qualified success. I should point out that we have had workshops recently where
you still run into the same stumbling blocks. The same issues
recur year on year. You think you're running a workshop with a
basic expectation of what people know, and they fail you. They can be
less skilled than you could ever imagine. Anyway, as a result of that, I
thought, well, Nick, you're in this kind of programming space. And I was still
working with Soad Tabaqchali at the time, my mentor. And I said, look, there's
this bright guy here. He's going to do a gap year. Maybe we can employ him in
that gap year to do some kind of programming for us. And she said, yeah, all right.
So I took Nick on. And around the same time, the Sanger had started sequencing
Campylobacter jejuni. And the Sanger had said what they were going to do
was sequence the whole genome. And only when they had finished it and got all of the
reads that they needed would they assemble it. And once they had assembled
it, they would annotate it one gene at a time. And then when they got to
the end of that process, they would start creating vignettes, trying to draw out
any kind of biological lessons from the genome. And through Brendan Wren, I was
involved with the Campylobacter research community. And he said, well,
the research community want to get access to this data straight away. And they
don't want to wait for 18 months or two years or however long it's going to take
for the whole thing to be finished. Around the same time, there had been this
thing called the Bermuda Accords, under which any sequencing project funded by
public or charity money was obliged to
release the data straight away. So the Sanger were releasing all their shotgun
data, but they just weren't analysing it. And so what I said to Nick Loman is
what we need to do is basically assemble that. We'll take those reads and
BLAST them against the proteins in the E. coli genome, which was in a fairly
good state. And then see if we can map them onto the various metabolic pathways,
so people can see the Krebs cycle or glycolysis or whatever and see that those
things are present in the genome. And they can do their own searches against the
proteins as we go along. So could you make a kind of genome browser? And he
said, yeah, yeah, I can do that. I
can make a database, a relational database, and then I can build a web front
end. So I said, can you go away and do that? We were going to employ him for the
best part of a year. I thought it might take him three months or six months. I
think it took him about three days to do it. And it was just amazing. And it was
at that stage that I had started thinking, well, maybe I should become a
bioinformatician. I'd spoken to Bart Barrell, who was one of Fred Sanger's
disciples at the Sanger and was involved in running this genome sequencing. He
said, oh, if you're interested in this, you should learn Perl. And I started
trying to learn a bit of Perl, and I was OK. But then when I saw Nick Loman, at
the age of 18, come along and do this work in three days, when it would have
taken me three months, I just thought, I can't compete with you. You know, I'll just let him do
it. What happened then was that we managed to do all these analyses. People were
very grateful. I did some of my own analyses on stuff in the genome, and I
managed to get myself onto the genome paper, which appeared in Nature. I'm
looking at your Nature paper on Campylobacter now, and Figure 1, which I'll
put in the show notes later for everyone to see, looks like a modern figure.
So the Sanger had their own bioinformaticians, and
the Sanger did a perfectly good job of analysing the genome. It's just that they
had this particular way of working, which was that we don't start one stage
until the previous stage is completed. So we're not going to start analysing
this data and interrogating it and basically iteratively looking at the
biological significance of it and refining hypotheses and whatever, until we get
to the end of not just the shotgun, not just the finishing, but the end of the
annotation process. And back in those days, annotation
was done manually. So an individual looked at every protein-coding gene in the
genome, looked at the homology search results for that gene, and made
an annotation: made a call to say this is an ortholog of that, so it can be called
that, or this is distantly related to something, so we can say it's a sugar transport
protein, but we can't say which sugar. They made all those kinds of calls manually.
There was none of this business of just running it through a pipeline in five
minutes. It took months and months to actually do those annotations. And so I
got onto the paper because we could actually help people make sense of the data in
advance, and some of that work got written up in the Nature paper. People had
been working on those things even before the Sanger said it's now finished and
we're ready to do the next part. We'd already been laying the
groundwork for it. On that cliffhanger, I want to thank our guest,
Professor Mark Pallen, for joining us today. We've been talking about some of the
bioinformatics highlights from his career, gleaning some advice and thoughts for
the rest of us. We've gone from the start of his career to the dawn of the
millennium, with exciting events like next-generation sequencing on the
horizon, so we're going to cut back to that in the next exciting episode of the
MicroBinfie podcast. See you then. Thank you so much for listening to us at home.
If you like this podcast, please subscribe and rate us on iTunes, Spotify,
SoundCloud, or the platform of your choice. Follow us on Twitter @microbinfie.
And if you don't like this podcast, please don't do anything. This podcast was
recorded by the Microbial Bioinformatics Group. The opinions expressed here are
our own and do not necessarily reflect the views of CDC or the Quadram
Institute.