NHGRI's Oral History Collection: Interview with Bob Waterston and Jane Rogers
>> Christopher Donohue: Okay. To start, could you each tell me your name and your most recent position? >> Robert Waterston: I'm Bob Waterston. I'm currently a professor of Genome Sciences at the University of Washington, School of Medicine.
>> Jane Rogers: And I'm Jane Rogers. I've now retired but my most recent position was working as a consultant with the International Wheat Genome Sequencing Consortium to deliver a sequence for hexaploid wheat. >> Christopher Donohue: Could you each talk about your respective roles in the Human Genome Project? What are you each most proud of and what do you consider to be, in hindsight, the most significant roles? >> Jane Rogers: I worked with John Sulston at the Sanger Center. John was director of the Sanger Center at
that time and we worked on building up the sequencing facility so that it was a large-scale facility. I took over responsibility for delivery of the Sanger contribution to the Human Genome Project, the generation of the data, that was then integrated with the other centers. And the Sanger, looking back at the history, I think played a number of significant roles in the actual project ranging from helping to develop the strategy for the whole project based on the maps and the clones. And I think we also played a significant role in keeping the project as an international consortium project. We took the
lead with Wash U, Bob's group at Wash U, in making data freely available as soon as possible after the sequence was generated. And we also made a very significant contribution to the finished sequence. And I think it's probably -- I think all of those are really very important roles. For me, the contribution that we made to the finished sequence in generating a very high-quality product that has provided an excellent foundation for human biology since then is, you know, a major, major contribution. [talking simultaneously] >> Christopher Donohue: Go ahead, Bob.
>> Robert Waterston: Yeah. Well, as you know, I led the Genome Center at Wash U which along with the Sanger represented the two arms of an international consortium. And I think together, as Jane said, we worked to make the project truly international. And Washington played a leading role in several
areas. We'll get into them later, I think. And I did everything from leading the center to making the subclone libraries for a long time. I think most of those libraries I had a hand in. [laughter] Anyway, what am I most proud of? Obviously, our contribution to the sequence and the finished sequence, as Jane said, was really critical. And remarkably that was the basis for I think HG-37. I can't remember the exact acronym ahead of time but
37 is the version number. And it's still being used. It's the primary source for most people today. It's remarkable, 20 years gone and still -- it's going to be replaced very soon by the telomere-to-telomere sequence. The telomere-to-telomere sequence is really a finished whole genome sequence, but it held up very well. But I
think the thing I'm most proud of is the -- our effort, our successful effort to get the sequence in the public domain quickly. And it -- the Bermuda rules came out I think from basically the history of the worm. John and I advocated strongly that the sequence had to get out there quickly and without patents.
And that was based on our very successful experience with the worm sequence. And I don't think people, other groups, certainly the human genetics community is -- that kind of thing is very foreign to them. And the Alberts Committee I think thought of sharing but they thought of sharing as sharing materials and sharing things between centers. They were not thinking of this very rapid public release of the information. And they didn't -- I don't think they talked about patenting at all. And with the Bayh-Dole Act and things like that, I think it was very important that the HGP took a stand on that. So, I think again, I mean, together with the
Sanger, that was something that I'm very proud of, and it has had an impact on other science projects today. >> Christopher Donohue: Now, I think the whole data release is one of the essential legacies of the Human Genome Project, which really continues to be impactful to this day. >> Robert Waterston: I agree. And I think you really can trace it right back to the worm. And we did it because the worm community was a community of sharing. There is the Worm
Breeder's Gazette that got published and we put in one-page abstracts of what we were doing on unpublished stuff. There was an agreement basically that nobody would take advantage of that. And it was a good community and we shared. And John, when he started the map, made it an explicitly community-linked endeavor. And we continued that with the sequencing. >> Christopher Donohue: So, could you both give me a sense of the significance of the initial sequencing and analysis paper and do either of you recall your initial impressions about the significance of sort of these efforts in 2000 and 2001? >> Robert Waterston: Well, I mean, there's the paper and there's the sequence. Those are both important items. I mean, and the sequence was
really spectacular. I mean, it was a draft sequence. It was crappy. We were missing lots of things. There were holes in it and everything, but it provided a view of the landscape, the whole landscape for the first time. And I can remember looking at sequence coming off the machines and popping it up on my computer and thinking, you know, "This is four billion years in the making and I'm the first person to ever look at it." [laughs] I couldn't make sense of it. It was just A, G, C, and T but there it was, I was looking at it.
And to be reading our genome and have it all there, I mean, well, not all but almost all there, was spectacular. And the paper gave us a -- you know, it was the first pass at trying to understand what's in there. It was -- I was awed by it in the sense of 65 pages and Nature took it all [laughs]. And, you know, we'll come to it maybe. You know, there were mistakes and so forth, but it was a serious effort at trying to understand what was there with what -- with all its limitations. >> Jane Rogers: I can remember also being so impressed at how many analyses could be done with the draft sequence. I mean, it really
was, you know, quite amazing to come up with what we did. And, as Bob said, to have the global view and also to have tools that began to allow you to access different parts of the genome, the genome sequence, and to be able to view it at different levels that are sort of, you know, on a whole genome scale down to the individual sequence data. >> Robert Waterston: Yeah. On top of it was already trying to give us access to the sequence in a convenient way. >> Jane Rogers: Yeah.
>> Christopher Donohue: Both of those are really, really interesting expositions of the significance and I think the sense of it as imperfect but still meaningful according to a number of registers, it's really well backed by posterity in many ways. It's also reflected in the materials that we have in the archives as well. >> Robert Waterston: But, you know, I went back for this interview and looked at the paper and I looked at the list of major conclusions. And they're not very impressive, actually [laughs]. We talked about GC rich islands and -- well, that was already known and we got the number of genes wrong [laughs]. And we had bacterial
contamination that we called transfer, but at the time, it was really awesome. And I think it was -- I think it really was not the specifics but the -- but this idea of a global view of really knowing where we are in the world. >> Christopher Donohue: I think that's a really wonderful way of putting it. Jane, any further thoughts?
>> Jane Rogers: No. I think we're -- you know, I agree with what Bob said. And I suppose the other thing that really had an impact was the fact that we were interested. Everybody had an interest in this potentially because we all have a genome.
We've all got multiple genomes but it's relevant to all of us as human beings. And I think this was -- probably, this is the first time that a genome sequence has had, you know, this much relevance to, you know, our curiosity about who we are and where we've come from. >> Christopher Donohue: So, my next question is around both of your involvement in the actual development and writing of the paper. I suppose we could start with, you know, your thoughts about how the paper took shape and your sense of who decided to do what sections and your reminiscences about the discussion of the division of labor and the writing and the conceptualization of the piece. >> Jane Rogers: This one I think Bob was far more involved than I was. I was much more involved on the
generation of the data side of things. And at the Sanger, the people who were deeply involved in putting the genome together and certainly doing gene calling and developing the algorithms for that were the bioinformaticians. So, Tim Hubbard, Michele Clamp, Ewan Birney, and Richard were involved in the discussions as part of the analysis group. But Bob, you are a curator so --
>> Christopher Donohue: Yeah, you were a manuscript curator. >> Robert Waterston: But let's step back from the paper that -- you asked for how it took shape. There was controversy about what the paper should -- what kind of paper there should be. For the worm, we had done -- we had gotten the sequence out there. We've done a fairly brief analysis of things and then we enlisted others to write papers and we made the sequence available to them. And they wrote papers on
different aspects of things that they could find. And John was -- John Sulston was of the view that something like that should be done with the human and what's more, it was just a draft. And he wanted to get it over with, get it out there quickly. He wanted the September publication and have it be fairly brief and then let the community have at it. And that's the nature of it. That's how John viewed these things. And Eric, on the other hand, was like a kid in a candy store. He
wanted to get his hands on it. He wanted to see what he could see in it. And so, there was some back and forth about what kind of paper it would be. And John eventually conceded he could see that with Celera doing their thing and so forth we had to be there with a decent job of analysis. We had to know what
was in our product and we had to know if the product was good, good enough to do this kind of stuff. And so, with that, Eric took the lead. I think there's something famous -- there's some quip about John telling him, "If you want to do it that way, Eric, you write it." [laughs] And I think that's right. I think I actually remember that conversation. And so, Eric organized what became known as the analysis working groups. And we had a phone call, I think, every week. And somebody would be tasked to look at, you know, the people that Jane mentioned from the Sanger, people from Wash U, people from all over the place. Eric invited people and I think
it was -- I don't think there was an open call for people, but people -- anybody who thought they could contribute, I think, was welcome. And then they were tasked with presenting a figure of what they wanted to talk about and a paragraph or two about what they had found. And so, Arian Smit talked about -- delved into the repeat sequences, and people wrote -- Ewan was very involved in gene predictions. And we had -- we went through each of these topics, the GC-rich regions. And each of these topics came up and they would get discussed. And Eric led the calls. He was
a vigorous critic. I think I had a good input on some of those things, too. If I couldn't understand it, the reader wasn't going to understand it. And so, this went back and forth for months with things getting refined. They would be presented once and then they'd get critiqued and then they -- "Oh, this figure looks crappy, you should do this, how about this kind of representation of your data instead of a bar graph," and things went on like that. And so -- and I don't know. In terms of how the specific tasks
I think it comes in maybe in one of your other questions but we'll deal with it here. Francis, I think was tasked with writing the introduction. I think that's right. I was tasked with writing the -- I don't know, the methods and summarizing just what the sequence was. And Eric took over -- took the analysis and I don't know what John's responsibility was. Do you remember, Jane? >> Jane Rogers: I don't. And I think it was probably -- >> Robert Waterston: Have you known what John's responsibility was? >> Jane Rogers: Probably overall editing I would think.
>> Robert Waterston: Yeah I mean, John is a very good writer. And anyway, so things were divvied up that way. And then Eric took all these different vignettes and stitched them together into a paper. And then I edited heavily what Eric wrote and Francis had a go at it too and so forth. And then Phil Green reviewed it and told us
how we should have done the paper [laughs]. >> Christopher Donohue: I think most reviewers are like that actually. >> Jane Rogers: And this was -- >> Robert Waterston: Oh, no. >> Jane Rogers: 14 pages or something. >> Robert Waterston: I think it was 17 single-spaced pages. >> Jane Rogers: All right.
>> Robert Waterston: I don't know. It was a very long and detailed critique of -- and told us, you know, "On this analysis, you should have done this statistic instead of that." I mean, it was very, very detailed. >> Christopher Donohue: So, that kind of critique. I wonder if you still have a copy of it because that would be -- >> Robert Waterston: I probably do.
>> Christopher Donohue: Then actually that would be really valuable. I would -- if you don't -- >> Robert Waterston: You haven't heard of that before? >> Christopher Donohue: Not that explicitly. >> Robert Waterston: Or, you know, Phil had deliberately not partaken in the analysis and so forth because I don't know. He likes to work alone partly. He's not very good. He has a history with Eric that might have been part
of it. I don't know. But -- so, he was an objective reviewer and we were lucky to have him. He had very good suggestions for us. >> Christopher Donohue: Could you just, you know -- as Bob's been talking about the manuscript, could you just describe what it was like in general just working with Bob, John Sulston, Eric Lander, and Francis, and other leaders on the HGP during this time? >> Jane Rogers: The time was not -- I mean, I think the time that you really are referring to is the time when the project really came together and it accelerated. So, this was stimulated by Celera making their announcement; the HGP agreed on a strategy and how we would go forward. But at that point, the
time we had to deliver it was -- we had a short time to deliver it, and we had a number of major obstacles. Generating maps for all the chromosomes was one significant hurdle. Scaling up the sequencing was another significant hurdle. Applied Biosystems had come up with new sequencers that would make the job easier, and those were the sequencers that Celera was going to use. But for those of us who were already sequencing, this presented a challenge in having to change over all of our technologies to adapt to the new sequencers.
And I have to say this because it was always my beef and the sequencers as they came did not work properly. So, there was always the challenge of actually having to get them working efficiently at the same time as working up everything else to match with them. So, you know, as far as what would you call it now, you'd call it a perfect storm now of, you know, lots and lots of pressure to get things working, deliver, and the sequence had to be very, you know, efficient. We had targets to meet. We met on phone calls once a week. And we had to report amongst the -- what we deliberately amongst ourselves called the G5s. So,
this is the five largest sequencing centers that we had to report on what our progress was. And you know what was it like? Well, it was obviously -- it was stimulating. It was challenging but at times we used to have these calls timed for Friday afternoons. And certainly, over the summer months in the Sanger on a Friday afternoon, because we're very close to the -- what is now the Imperial War Museum in Duxford, if there were air shows over the weekend, then we would have the sounds of Spitfires, Lancaster bombers, you know, the old World War 2 planes in the background of these calls.
And sometimes the negotiations with our collaborators in the United States, I have to say that the, you know, way that commanders during the Second World War must have felt at times when they were planning the advance across Europe certainly came into mind because it was a tricky situation at times that we had to handle and make sure that we're all pointing in the same direction and, you know sort of going on the same course. And there were, you know, times when it looked as though the course would deviate. And at that point, they had to -- you know, another reckoning, a lot of meetings, a lot of email exchanges. And then we work to
get back on course and go ahead again. But it was very, you know -- it was challenging to get it done. >> Christopher Donohue: And one issue that comes up pretty frequently is the quality assessments and making sure that the sequence is accurate. And you -- could either of you go
through a discussion about how there was all of this quality assessment going on and how that process went and how you decided essentially what the draft sequence standards were and how you would present that? >> Jane Rogers: Bob, you talk about this. It was a new metric every meeting. >> Robert Waterston: New metric, that's right. And certainly, there was game playing to make sure that whatever metric came up was one that favored what you were doing. And some of us are better at that than others [laughs]. And I don't know. It was important. We had different groups. And we were proceeding in different ways. We had to
come up with metrics. We had to come up with fair estimates of costs so that we really knew what things were doing. You know, the Sanger was blessedly outside of this and they were forced to listen to us hash through this stuff among the NHGRI folks in particular because NHGRI wanted to know if its grantees were performing. And so, we had to get, you know -- we had to do these endless reports. I don't know how much more sequence we could have done if we didn't have to do the reports. But maybe -- and so, we had to figure out what the base quality, you know, how much -- what would count toward your production. And I don't know. It was endless.
>> Jane Rogers: One of the final assessments in terms of the quality of the overall sequence that we were producing, we would assess how many contigs each of the BACs were assembled into once we put them together. And I think we -- I mean, ideally, you know, you'd aim for, I don't know, maybe up to 10 gaps in a shotgun sequence of a BAC that was generally a pretty good sequence. Some didn't go together that well at all and maybe you needed some more. But the -- another metric was the depth of the sequence. We were
aiming at fourfold coverage on the draft sequences. Sometimes if a clone was small, we'd generate a bit more. If the clone was big, sometimes, you know, it got thinner. And again, ideally, you topped it up and put a bit more in there. But I think we measured all of those criteria. And then when we came to generate an assembly of the clones, we looked at the overlaps in between clones, and that would also give us an idea of the quality of the sequence, you know, how well those went together. But it was
largely dependent on the sequence depth. And I know John was very impressed with the idea that, you know, we have to count the cost all the time. So, we have to generate the same cost measures. And also, NHGRI was very interested in how we measured that. So, we had to report as well, Bob.
>> Robert Waterston: Oh no, I know. [laughter] >> Christopher Donohue: No -- thanks for the detailed assessment. That's really helpful because, in particular, some historians are very interested in how things like quality assessments were done and how various metrics were agreed upon, and what scientifically and community-wise went into those types of shared scientific standards at this phase in the Human Genome Project.
>> Robert Waterston: One of the advantages we had, I should get this in here I think, is that Phil Green had developed his base-calling algorithm and that produced a quality metric for each base. And that was automatic and objective. And that was enormously useful and valuable in these quality assessments. We could -- we
could rely on that. And it was used by -- it was used across all centers. If we'd all been doing different quality metrics, you know, it would have been a mess. Everybody would have been accused of fiddling the quality measures to boost up their numbers and so forth. The fact that we had this common and very reliable, very robust metric to use was very good.
>> Christopher Donohue: And some of those metrics like, for example, the error rate had been in common conversation for a number of years. So, that also, I think, probably helped as well. So, this was built on for a number of years before this huge scale-up finishing project. >> Robert Waterston: Well, you know, that was the value of doing things like the worm and yeast and so forth. So, you knew what the answers needed to be.
>> Christopher Donohue: So, one thing we haven't talked about at all are some of the companion papers. So, for both of you, what were the more significant papers that were published alongside the initial sequencing and analysis paper? And again, you know, your thoughts on how those pieces of the major issue came together and your thoughts on their importance? >> Jane Rogers: Well, I suppose that -- I would say that -- I mean, the other big companion paper that stands out is the SNP Consortium paper. >> Christopher Donohue: Sure.
>> Jane Rogers: It's reporting 1.4 million SNPs that have been mapped across the human genome. And the data were public. And the SNP Consortium had started up in, I think, the spring of 1999. And this was a consortium of pharmaceutical companies who had agreed that a public resource single database with this data in there would benefit everybody. And it was pretty competitive. So, Michael Morgan in Wellcome Trust did a good job in, you know, getting everybody together and organizing the project. And
the goal was to identify 300,000 SNPs across the genome. That was what the project aimed at. But by the time that the draft sequence was there, we not only were able to call SNPs on the restriction digest fragments that were the original focus of the project, but we could align that to the clone sequences and call SNPs there and call SNPs in the overlaps. So, the -- you know, putting that first map of variation across the human genome, I think that represented a huge achievement. And I know, you know, it's an -- it was an important, you know, companion to have there. I'm not quite
sure when it was decided that it would actually be able to -- you know, sort of the data were there to come to fruition at the right time, but probably the time was just about right. You know, the sequences were being generated in parallel with the clone sequences. So -- >> Christopher Donohue: Yeah. It was always the one of
the -- at least from Francis' outlines, there was always supposed to be a companion paper on SNPs. And that's fairly early. So -- >> Jane Rogers: Yeah. >> Christopher Donohue: -- the significance was realized immediately -- >> Jane Rogers: Yeah.
>> Christopher Donohue: -- in consortium. Well, go ahead, Bob. >> Robert Waterston: Well, I was going to say. It
formed the -- it was the impetus for the HapMap. >> Christopher Donohue: HapMap, yeah. >> Robert Waterston: And, you know, it really does start to reveal our evolutionary history. It -- it's valuable for human disease, but it's also -- it starts telling us about where we came from. And it was funny. The -- you know, the consortium, the pharmaceutical companies were in on the decisions about how all this should proceed. It was their money. And they wanted this restriction -- limited restriction digest limited set, because it didn't depend on us having a draft sequence. That
was its strength. But its weakness was that it didn't acknowledge that we were going to have a draft sequence. [laughter] And so, it was very inefficient. We had to do lots of sequence to get a few SNPs. And -- but finally, we were able to convince them as the draft was coming along that we could just do random sequences from this panel of individuals. And we could -- every read would reveal a new SNP. And so, suddenly, we
were -- when we were able to convince them that that was a viable strategy, we were able to discover lots of more SNPs very quickly. >> Christopher Donohue: And so, just one follow-up question I've always wondered is, who came up with the phrase, "golden path?" Was it Jim? Or was it somebody else? I know it's not the preferred name, but -- >> Robert Waterston: [laughs] We got -- that got squelched in the writing. I think it was Jim. >> Jane Rogers: I think it was Jim too. >> Robert Waterston: I think it was. >> Jane Rogers: Yeah.
>> Robert Waterston: Because he really did have to choose his way through all these clones. >> Christopher Donohue: I think another very early rejected name for it was "yellow brick road." So, it's -- [laughter] >> Jane Rogers: That was probably a bad day name [laughs]. >> Christopher Donohue: You know, there's a very interesting history of what things are named before there is some consensus on what to name it. >> Robert Waterston: Yeah. >> Christopher Donohue: The HapMap being a great example of SIMMap and other things.
>> Robert Waterston: Yeah. >> Christopher Donohue: Shared Inheritance for Medicine map and all the other things. But one question that I wanted to ask both of you is to say that there's been a -- at least to me, there's been a discussion about, you know, whether -- in order to have a good sequence, you always need good maps. Is that -- is -- even with current technologies, is that the case for more recent technology? >> Robert Waterston: So, that -- I -- that was that -- I should have mentioned that when I was talking about the value of the map, because the map provided a way to get the finished sequence. The map had many fewer gaps in it than the sequence. And so, you could look at the -- where you had a clone, you could pick the next clone automatically, or almost automatically. I mean, it was
there for you. The information was there. And to get high-quality sequence, you need to have the territory refined enough to a small area. So, you're not dealing with all the complexities. You can target your efforts to the problems in that clone. And they're different. That's the -- that's why finishing was so hard to automate. It was because every
clone had its own problem. You know, they -- you could -- you did enough of them, you could recognize common problems, and you could start to devise semi-automated ways of addressing those, and then something new would come up. [laughter] But the clones let you do that. And I -- until recently, there has been no high quality genome that didn't -- wasn't a clone-based map-based genome. You know, the fly was -- the sequence was highly touted for being the first shotgun sequence, but they spent five years, I think it was, finishing it. On a clone to clone basis, they had to go back and build the map.
The mouse was the same way. We did a lot of clone-based sequencing to start with, but then we shifted to the whole genome shotgun. And I think it wasn't until, what, 2008 or '09, something like that --when the mouse was finally finished, whereas the shotgun was out right away. And so --
>> Jane Rogers: Or, no, they've given up. Because the mouse just finished when I was still at Sanger. So, it must have been 2005 to 2006 -- [talking simultaneously] >> Robert Waterston: And -- I don't know. Although then the high-quality genome stopped coming [laughs]. >> Jane Rogers: Yeah. >> Robert Waterston: Because Solexa-Illumina took over. And it was just too cheap to do whole genome, and nobody
finished. And that's -- it's interesting. That's going to change, because the long reads are basically a substitute for the BAC clones. You suddenly have a large segment of DNA that takes -- isolates itself from all the other problems in the genome, and you can focus in on that. And you can assemble that. And they've gotten -- they've actually -- with that kind of stuff, they've been able to do things we couldn't do with the BACs. They've actually gotten through with the centromeres. Yeah. Anyway, I think that's right. The
clones until now have been essential to high-quality genomes of any kind. >> Jane Rogers: The other thing that the clone sequence allows you to do is actually check it as a finished sequence. You can do an in silico digest and compare that with the actual restriction digest that you do as part of the set loading process and actually compare the two. So,
you can check and see that the sequence is assembled correctly with the clones. >> Robert Waterston: Yeah. And that's a step that we did on all our finished clones. >> Jane Rogers: Yeah, we did too. And yeah, it was very nice to produce a high-quality product that people could use. I'm going to -- so many of the other research communities who have had the -- had genomes of the organism of interest sequenced. Nothing -- you know, the cow and the pig,
and so on, you know, there were shotgun sequences. And a lot of the communities don't -- people in the communities don't realize that when they actually want to look at a biological problem that the sequence quality may not be good enough to actually allow them just to extricate what they need and not do any further work before they start. >> Robert Waterston: Well, or they are faced with doing that extra work -- >> Jane Rogers: Yes. >> Robert Waterston: -- if they recognize that they have to turn around in the work that we did on the Human Genome Project. >> Jane Rogers: Yeah.
>> Robert Waterston: There's a lot of deferred cost in unfinished genomes. >> Christopher Donohue: Yeah. And it's a very interesting discussion when you have a -- you have different communities that have -- that are looking at sequence and have different sort of quality metrics for each -- you know, a sort of comparative grade sequence or human grade sequence and -- >> Robert Waterston: Yeah. [laughter] >> Christopher Donohue: -- how -- right -- and how -- and for -- a draft sequence may not be very useful if you want to look at evolutionary questions, but it may be useful for other -- for some other questions that another community could ask. >> Robert Waterston: And unfortunately, you know, the effort was really never made to fully automate finishing -- >> Robert Waterston: -- or automate clone maps and so forth. It was just swept away. And so, it's true that if you're working on a -- on an organism that only three other groups are doing, it's too expensive to do a finished sequence, or it has been too expensive. And it's just economics. You have
to -- you -- >> Jane Rogers: Yeah. >> Robert Waterston: You can't get the whole genome. You just have to put the effort into where you need it most.
>> Christopher Donohue: And it's always fascinating to look at the sequence finishing process having been described as artisanal. And other adage is it really is a very different kind of work and a very different kind of specialization. >> Robert Waterston: Yeah. And like I said, the problems are varied. And there's lots of different reasons why a sequence can fail and why assemblies can fail. Just lots of different ones.
[laughter] You think you've heard and seen them all, and then there are some others that come up. [laughter] >> Christopher Donohue: Jane, my next question is -- to you first, is when the initial sequence paper was published, what sense did you have of kind of the work left to be done and with finishing the sequence? And what were kind of the more -- most pressing sort of issues to be addressed since, you know, 2001 -- going from 2001 to 2003? >> Jane Rogers: So, I think you have to -- I mean, the -- because publication of a paper comes sometime after you've actually, you know, said you're going to halt the data collection, because that's -- you know, you halt it, you then have your data for analysis, it was really at that stage when we finished generating draft sequence that we, you know, thought, "Okay, done that. Now, we have to get onto the job of generating the sequence for -- the finished sequence." So, by the time the paper was published, we had already been on to the -- you know, the finishing problems for six months.
But it's the same -- you know, the same feeling of, "Oh, now, we've got to go and sort things out." So, the problems that we were facing: first of all, completion of the maps. Bob's group produced the whole genome map, but at Sanger we had individual chromosome maps, and we worked on finishing the maps on the basis of using the sequence. So, you use that to go in and probe for more clones to complete the maps as you go along. We had a shotgun sequence for a high proportion of the clones. We hadn't actually stopped our finishing process through the whole of the generation of the draft sequence, much to the annoyance of Eric on more than one occasion on the G5 calls.
You know, Sanger were sequencing 2D, usually by sequencing more clones. But we had to go back and complete the shotgun sequences. And as part of the scale-up process, we thought we could store all the subclones and go back to those and use them for finishing. So, we invested in
three porta cabins. Does that make sense in American -- three porta cabins essentially of -- three support cabins for which we had to generate a clone tracking system, an informatics problem, so that we could then go and find these things all stored at minus 20, the sort of size of fridge that we were told we'd normally use for frozen peas or something like that. So, we had three of these stacked outside of the Sanger Center. So, working out how to do all of that, how to retrieve the clones, then we were off shotgunning again, and then doing the hardcore finishing, and -- we had a lot to do. And in finishing, you have the problems with the actual technical problems, you know, dealing with the data, getting the -- everything done to get your sequence together. But if
you have -- as we did -- several groups of finishers who spent their days in front of computers, we also started to have problems -- we had to deal with people having problems with repetitive strain injury and, you know -- >> Christopher Donohue: I see. >> Jane Rogers: -- sort of health type of things. So, you had to, you know, make sure that people, you know, had breaks, didn't work too long, and so on. So, it -- you know, all of these things sort of just added to, you know, what you had to deal with in terms of getting the sequence done. But it was six months before the actual publication of the paper. And, of course, when the paper was actually published, a lot of people thought that was it, that was the sequence. And to then say, "No, we're -- you know, sort of we're
-- we got another two years of working on the finishing," I think there were a lot of people saying, "Well, you know, what are you doing? You know, it's done." In the end, you know, I think, you know, we -- the proof came in the high quality. The -- you know, it stood the test of time. But -- yeah, keeping people motivated, it also became, you know, something of a challenge sometimes. But the -- you know, Bob, there were other -- have I covered the main issues? I mean, sorting out the maps, and then, you know, sort of filling the gaps and dealing with the tricky problems? Because some of the things that were challenging in the draft sequence were not only repeats within clones but also then large repetitive elements in the genome that -- and duplications that were collapsed. >> Robert Waterston: Yes. >> Jane Rogers: So, that took a lot of sorting out.
>> Robert Waterston: And Evan Eichler is still sorting them out [laughs]. >> Jane Rogers: Yes [laughs]. >> Christopher Donohue: These Eichler repeats… [laughter] >> Robert Waterston: That's right.
Well, I think you covered it pretty well. I mean, when -- that was certainly -- you know, we knew we had to finish this sequence. There was a lot of concern that having captured as much information as we had that the drive for finishing would be diminished. And we had to push people
upfront and -- when we agreed to do the draft that people were committed to doing the finishing. And so, like Jane said, I mean, you know, we'd done the draft shotgun. We stored all -- we stored all the subclones to go back and use them. We didn't have to resort to porta cabins. But we had a massive storage and access issues. And so, we did all that. I would say the other thing facing us was the mouse.
>> Jane Rogers: Yes. >> Robert Waterston: Because Celera had quit doing their shotgun at -- and I can't remember, 4X or 5X or something like that -- because they were trying to augment it with the public data, and they figured they had enough between the two, they had turned all their shotgun capacity to the mouse earlier than they had planned. And they were -- I think by the time of the publication, they were already selling access to the mouse sequence. I don't know when that was formally available, but it was around that time that they were advertising anyway. They were promising it and recruiting buyers.
And I remember colleagues at WashU who worked on the mouse and -- saying, "Oh, you know, I feel like a traitor. But I'm -- we're going to have to buy it." And again, this was a real threat to what -- just this whole ethos of biology, of science, I would say, not just to the genome, because if you can do it here, you can do it elsewhere. And so, we agreed that we had to make a significant and rapid stand here. And together, the Sanger and we and the [unintelligible] forged ahead at breakneck speed, basically turned over the capacity that we had developed in the shotgun. And Eric convinced us that in this case we did have to do the whole genome shotgun. And we went along with that, because anyway, they were -- I don't even know if they were good. I
don't know. I can't remember what the status of the libraries was even. But we weren't going to do another effort -- a crash effort. And so, we did the whole genome, shotgun, and did a remarkably good job on it, I think. But I -- we talked at the same meeting and something, and he came up to me afterwards and complimented me on the mouse sequence. Because I -- it was -- you know, Celera had
published -- by that time, they had published the sequence of one chromosome, because they were clearly not going to make any promises for availability for anything else. So, they published this one chromosome and made that data available. And ours was much better than that. It just was. We had the -- we already had a fair amount of finished sequence for the mouse or clone-based sequence anyway. >> Jane Rogers: I think we were generating that, because we had a map. >> Robert Waterston: Did you? >> Jane Rogers: We made the map, because -- >> Robert Waterston: Oh, no, you -- >> Jane Rogers: It was made comparatively.
>> Robert Waterston: That's right. We did that together, because we realized you could -- that's right. I have forgotten that. You could use the BAC-end sequences -- >> Jane Rogers: Right.
>> Robert Waterston: -- and their homology to human to put them together. >> Jane Rogers: Yeah. And -- >> Robert Waterston: Yeah. >> Jane Rogers: Yeah.
>> Robert Waterston: Yeah. So, we did -- I'm wrong about the clone map. That's right. But we didn't use it. We did the whole genome and -- but we had combined stuff and we had -- we did have the BAC ends and so forth. So, that was a big, immediate challenge to get that out there. And we did. We
got it out in a remarkable time. And this colleague at WashU came up to me a few months later chagrined that he'd spent the money. [laughter] I warned him that it was going to be -- you know, he wasn't -- it wasn't going to be good money, or it wasn't going to be money well spent, but he did it anyway. [laughter] >> Christopher Donohue: Now, I should clarify that. And I think it was maybe April of 2000 when there was the initial freeze for the draft -- or the data freeze that already there were very intense discussions about finishing the sequence. So -- >> Robert Waterston: Yeah. No -- it -- we knew we
had to do it. Yeah. Well, there were intense discussions about finishing. You know, it goes all the way back to our crazy proposal in 1994. We, John and I, put forward a proposal in '94 to basically do a draft sequence and follow up with finishing. We were going to do a better draft sequence,
but we were going to do it more deeply. So, we didn't have to go back and do more shotgun for the individual clones. >> Christopher Donohue: This is finishing the sequence five years early or something? >> Robert Waterston: Yeah. And one of the big criticisms that we had, Maynard Olson in particular said, you know, "If you did this, it would never get finished." So, the
idea of this tension between capturing the information quickly and not completely versus more expensively and with more effort. And I don't know, Maynard had this saying -- or I know a little story or whatever. He said -- you know, he was a -- Maynard -- this is -- Maynard also came in from chemistry, where things are on a much more solid footing than they are in biology. And his impression of biology was that, you know, when
he got into this, it was like quicksand. [laughter] You just kept getting more and more complexity and more and more exceptions. And to him, the genome was a rock at the -- underneath the quicksand that if you had the genome, you could only sink so far. And so, to him, it was important. It was vitally important that that
bedrock be solid and not full of fissures and cracking and falling apart and letting you sink further and further. And so, he was a very strong advocate of finishing. Yeah. >> Jane Rogers: Yeah. >> Christopher Donohue: So, just because we're a little over time -- I want to be careful with everybody's time -- the final question I'm going to ask both of you is just your concluding thoughts about the significance of the sequence and the paper for genome biology and, you know, genome science in general looking back. >> Jane Rogers: I think it was the first time that the -- it's not the general public. It's the general scientific community could see the value of having genome information. And that -- and, I mean, on the sort of minor level
you can say that in -- you know, in an instant PhD projects in human genetics were transformed. Instead of spending three years to clone a gene and possibly sequence it, you went to the database, and you started your PhD project with the sequence of the gene. You could then start to do biology. So, I think it's -- it was transformative, you know, with that sort of example.
>> Robert Waterston: I would agree. I mean, you know, the yeast and the worm sequence convinced those communities early on about the value of genomics. But for the wider community, it needed the human genome. And then what was
gratifying and surprising to me, I think, was how rapidly that created demand. You know, the zebrafish community had to have their genome sequence. The -- you know, Richard Gibbs had to do the cow for the Texas farmer -- ranchers.
[laughter] And all of a sudden, the revelations that -- the power of having a genome sequence was obvious to so many people. And it just created an immense demand, which has, you know, fueled -- >> Jane Rogers: [affirmative]. >> Robert Waterston: -- the field. And it drove Solexa. It created a market for them. And they could get investors. They could bring their product to the market and have buyers. I don't know, and I think it also let people see
the value of doing some things at scale. You know, biology is a cottage industry, for the most part. >> Jane Rogers: [affirmative]. >> Robert Waterston: This data-driven science was sort of foreign to biology before all this. And it became apparent how much more you could do with a data-driven approach. And so, I think in biology, I don't think there is the dichotomy that there was. When the Genome Project started, there was a lot of
antagonism towards the Genome Project, because it wasn't hypothesis driven. And today, I think people, I don't know, they've incorporated it. It's part of the fabric. And so, you do both. You collect data at massive scales and it's -- well, you know, with Illumina technology, small labs can generate more data than they can analyze pretty quickly.
[laughter] And so, you do both, and you're able to collect huge amounts of data to drive your hypothesis. And I think it's been very good for biology. It's changed things. And I think that really did come from the obvious value of the human genome. >> Jane Rogers: Yeah. >> Robert Waterston: It just -- like Jane said, PhD students could all of a sudden zero in on what they needed to study. And -- I don't know, and all kinds of other things, the -- you know, like the HapMap followed, and now we have sequencing of ancient DNA. And
it's just the amazing thing to -- >> Jane Rogers: The sequencing of individuals to look for COVID susceptibilities, you know. >> Robert Waterston: Yeah. The amazing thing to me is how pervasive genomics became so rapidly. I just didn't -- I
had -- I guess I was too schooled by the opposition that we'd faced in the early days that there were -- that people would be won over so quickly and change the way things are doing -- things were done. It's been amazing. >> Jane Rogers: Yeah. >> Christopher Donohue: The -- yeah, the democratization of sequencing technology really has been remarkable, you know. Any kind of -- many, many labs
produce much more data than, you know, they can easily analyze -- >> Robert Waterston: Yeah. >> Jane Rogers: Yeah. >> Christopher Donohue: -- the current sequencing technology. >> Robert Waterston: Oh, and, you know, we're doing a lot of single cell analysis. And, you know, we throw a sequence at it, because the other part is more expensive. [laughter] So, we try -- we use sequence to extract all the information we can out of the data. And it's just so -- it's so available.
It's not as cheap as I want it, but it's really available [laughs].