NHGRI's Oral History Collection Interview with Bob Waterston and Jane Rogers

>> Music >> Christopher Donohue: Okay. To start, could you each tell me your name and your most recent position? >> Robert Waterston: I'm Bob Waterston. I'm currently a professor of Genome Sciences at the University of Washington, School of Medicine.

>> Jane Rogers: And I'm Jane Rogers. I've now  retired but my most recent position was working as a  consultant with the International Wheat Genome  Sequencing Consortium to deliver a sequence for hexaploid wheat. >> Christopher Donohue: Could you each talk about your  respective roles in the Human Genome Project? What are you  each most proud of and what do you consider to be, in  hindsight, the most significant roles? >> Jane Rogers: I worked with John Sulston at  the Sanger Center. John was director of the Sanger Center at 

that time and we worked on building up the sequencing  facility so that it was a large-scale facility. I took  over responsibility for delivery of the Sanger contribution to  the Human Genome Project, the generation of the data, that was  then integrated with the other centers. And the Sanger,  looking back at the history, I think played a number of  significant roles in the actual project ranging from helping to  develop the strategy for the whole project based on the maps  and the clones. And I think we also played a significant role in keeping the  project as an international consortium project. We took the 

lead with Wash U, Bob's group at Wash U, in making data freely available as soon as possible after the sequence was generated. And we also made a very significant amount of contribution to the finished sequence. And I think it's probably -- I think all of those are really very important roles. For me, the contribution that we made to the finished sequence in generating a very high-quality product that has provided an excellent foundation for human biology since then is, you know, a major, major contribution. [talking simultaneously] >> Christopher Donohue: Go ahead, Bob.

>> Robert Waterston: Yeah. Well, as you know, I led  the Genome Center at Wash U which along with the Sanger  represented the two arms of an international consortium. And I  think together, as Jane said, we worked to make the project truly  international. And Washington played a leading role in several 

areas. We'll get into them later, I think. And I did everything from lead the center to make the subclone libraries for a long time. I think most of those libraries I had a hand in. [laughter] Anyway, what am I most proud of? Obviously, our contribution to the sequence and the finished sequence, as Jane said, was really critical. And remarkably that was the basis for I think HG-37. I can't remember the exact acronym ahead of time but 

37 is the version number. And it's still being used. It's the primary source for most people today. It's remarkable, 20 years gone and still -- it's going to be replaced very soon by the telomere-to-telomere. The telomere-to-telomere sequence is really a finished whole genome sequence, but it held up very well. But I 

think the thing I'm most proud of is the -- our effort, our  successful effort to get the sequence in the public   domain quickly. And it -- the Bermuda rules came out I think from basically the  history of the worm. John and I advocated strongly that the  sequence had to get out there quickly and without patents. 

And that was based on our very successful experience with the  worm sequence. And I don't think people,   other groups, certainly the human genetics  community is -- that kind of thing is very foreign to them. And the Alberts Committee I think thought of sharing   but they thought of sharing as  sharing materials and sharing things between centers. They  were not thinking of this very rapid public release of the  information. And they didn't -- I don't think they talked about  patenting at all. And with the Bayh-Dole Act and things like  that, I think it was very important that the HGP took a  stand on that. So, I think again, I mean, together with the 

Sanger that was something that I'm very proud of and that has had an impact on other science projects today. >> Christopher Donohue: Now, I think the whole data release is one of the essential legacies of the Human Genome Project, which really continues to be impactful to this day. >> Robert Waterston: I agree. And I think you really can trace it right back to the worm. And we did it because the worm community was a community of sharing. There is the Worm 

Breeder's Gazette that got published and we put in one-page abstracts of what we were doing on unpublished stuff. There was an agreement basically that nobody would take advantage of that. And it was a good community and we shared. And John, when he started the map, made it an explicitly community-linked endeavor. And we continued that with the sequencing. >> Christopher Donohue: So, could you both give me a sense of the significance of the initial sequencing and analysis paper and do either of you recall your initial impressions about the significance of sort of these efforts in 2000 and 2001? >> Robert Waterston: Well, I mean, there's the paper and there's the sequence. Those are both important items. I mean, and the sequence was 

really spectacular. I mean, it was a draft sequence. It was  crappy. We were missing lots of things. There were holes in it  and everything, but it provided a view of the landscape, the  whole landscape for the first time. And I can remember  looking at sequence coming off the machines and popping it up  on my computer and thinking, you know, "This is four billion  years in the making and I'm the first person to ever look at  it." [laughs] I couldn't make sense of it. It was just A, G,  C, and T but there it was, I was looking at it.

And to be reading our genome and have it all there, I mean, well,  not all but almost all there, was spectacular. And the paper  gave us a -- you know, it was the first pass at trying to  understand what's in there. It was -- I was awed by it in the  sense of 65 pages and Nature took it all [laughs]. And, you  know, we'll come to it maybe. You know,   there were mistakes and so forth, but it was a  serious effort at trying to understand what was there with  what -- with all its limitations. >> Jane Rogers:  I can remember also being so impressed at how many analyses  could be done with the draft sequence. I mean, it really 

was, you know, quite amazing to come up with what we did. And,  as Bob said, to have the global view and also to have tools that  began to allow you to access different parts of the genome,  the genome sequence, and to be able to view it at different  levels that are sort of, you know, on a whole genome scale  down to the individual sequence data. >> Robert Waterston: Yeah. On top of it was already  trying to give us access to the sequence in a convenient way. >> Jane Rogers: Yeah.

>> Christopher Donohue: Both of those are really, really  interesting expositions of the significance and I think the  sense of it as imperfect but  still meaningful according to a number of registers, it's really  well backed by posterity in many ways. It's also reflected in  the materials that we have in the archives as well. >> Robert Waterston: But, you know, I went back for  this interview and looked at the paper and I looked at the list  of major conclusions. And they're not very impressive,  actually [laughs]. We talked  about GC rich islands and -- well, that was already known and  we got the number of genes wrong [laughs]. And we had bacterial 

contamination that we all transferred but at the time, it  was really awesome. And I think it was -- I think it really was  not the specifics but the -- but this idea of a global view of  really knowing where we are in the world. >> Christopher Donohue: I think that's a really  wonderful way of putting it. Jane, any further thoughts?

>> Jane Rogers: No. I think we're -- you know, I agree with what Bob said. And I suppose the other thing that really had an impact was the fact that we were interested. Everybody had an interest in this potentially because we all have a genome. 

We've all got multiple genomes but it's relevant to all of us as human beings. And I think this was -- probably, this is the first time that a genome sequence has had, you know, this much relevance to, you know, our curiosity about who we are and where we've come from. >> Christopher Donohue: So, my next question is around both of your involvement in the actual development and writing of the paper. I suppose we could start with, you know, your thoughts about how the paper took shape and your sense of who decided to do what sections and your reminiscences about the discussion of the division of labor and the writing and the conceptualization of the piece. >> Jane Rogers: This one I think Bob was far more involved than I was. I was much more involved on the 

generation of the data side of things. And at the Sanger, the  people who were deeply involved in putting the genome together  and certainly doing gene calling and developing the algorithms  for that were the bioinformaticians. So, Tim  Hubbard, Michele Clamp, Ewan Birney, and Richard were  involved in the discussions as part of   the analysis group. But Bob, you are a curator so --

>> Christopher Donohue: Yeah, you were a   manuscript curator. >> Robert Waterston: But let's step back from the  paper that -- you asked for how it took shape. There was  controversy about what the paper should -- what kind of paper  there should be. For the worm, we had done -- we had gotten the  sequence out there. We've done a fairly brief analysis of  things and then we enlisted others to write papers and we  made the sequence available to them. And they wrote papers on 

different aspects of things that they could find. And John was  -- John Sulston was of the view that something like that should  be done with the human and what's more, it was just a  draft. And he wanted to get it over with, get it out there  quickly. He wanted the September publication and have  it be fairly brief and then let the community have at it. And  that's the nature of it. That's how John viewed these things. And Eric, on the other hand, was like a kid in a candy store. He 

wanted to get his hands on it. He wanted to see what he could  see in it. And so, there was some back and forth about what  kind of paper it would be. And John eventually conceded he  could see that with Celera doing their thing and so forth we had  to be there with a decent job of analysis. We had to know what 

was in our product and we had to know if the product was good, good enough to do this kind of stuff. And so, with that, Eric took the lead. I think there's something famous -- there's some quip about John telling him, "If you want to do it that way, Eric, you write it." [laughs] And I think that's right. I think I actually remember that conversation. And so, Eric organized what became known as the analysis working groups. And we had a phone call, I think, every week. And somebody would be tasked to look at, you know, the people that Jane mentioned from the Sanger, people from Wash U, people from all over the place. Eric invited people and I think 

it was -- I don't think there was an open call for people, but people -- anybody who thought they could contribute, I think, was welcome. And then they were tasked with presenting a figure of what they wanted to talk about and a paragraph or two about what they had found. And so, Arian Smit talked about -- delved into the repeat sequences, and Ewan was very involved in gene predictions. And we had -- we went through each of these topics, the GC-rich regions. And each of these topics came up and they would get discussed. And Eric led the calls. He was 

a vigorous critic. I think I had a good input on some of those things, too. If I couldn't understand it, the reader wasn't going to understand it. And so, this went back and forth for months with things getting refined. They would be presented once and then they get critiqued and then they -- "Oh, this figure looks crappy, you should do this, how about this kind of representation of your data instead of a bar graph," and things went on like that. And so -- and I don't know. In terms of how the specific tasks 

I think it comes in maybe in one of your other questions but  we'll deal with it here. Francis, I think was tasked with  writing the introduction. I think that's right. I was  tasked with writing the -- I don't know, the methods and  summarizing just what the sequence was. And Eric took  over -- took the analysis and I don't know what John's  responsibility was. Do you remember, Jane? >> Jane Rogers: I don't. And I   think it was probably -- >> Robert Waterston: Have you known what John's  responsibility was? >> Jane Rogers:  Probably overall editing I would think.

>> Robert Waterston: Yeah I mean, John is a very good  writer. And anyway, so things were divvied up that way.   And then Eric took all these  different vignettes and stitched them together into a paper. And  then I edited heavily what Eric wrote and Francis had a go at it  too and so forth. And then Phil Green reviewed it and told us 

how we should have done the paper [laughs]. >> Christopher Donohue: I think most reviewers are like that actually. >> Jane Rogers: And this was -- >> Robert Waterston: Oh, no. >> Jane Rogers: 14 pages or something. >> Robert Waterston: I think it was 17 single-spaced pages. >> Jane Rogers: All right.

>> Robert Waterston: I don't know. It was a very long and detailed critique of -- and told us, you know, "On this analysis, you should have done this statistic instead of that." I mean, it was very, very detailed. >> Christopher Donohue: So, that kind of critique. I wonder if you still have a copy of it because that would be -- >> Robert Waterston: I probably do.

>> Christopher Donohue: Then actually that would be  really valuable. I would -- if you don't -- >> Robert Waterston: You haven't heard   of that before? >> Christopher Donohue: Not that explicitly. >> Robert Waterston: Or, you know, Phil had  deliberately not partaken in the analysis and so forth because I  don't know. He likes to work alone partly. He's not very  good. He has a history with Eric that might have been part 

of it. I don't know. But -- so, he was an objective reviewer and we were lucky to have him. He had very good suggestions for us. >> Christopher Donohue: Could you just, you know -- as Bob's been talking about the manuscript, could you just describe what it was like in general just working with Bob, John Sulston, Eric Lander, and Francis, and other leaders on the HGP during this time? >> Jane Rogers: The time was not -- I mean, I think the time that you really are referring to is the time when the project really came together and it accelerated. So, this was stimulated by Celera making their announcement, the HGP agreed on a strategy and how we would go forward. But at that point, the 

time we had to deliver it was -- we had a short time to deliver it and we had a number of major obstacles. Generating maps for all the chromosomes was one significant hurdle. Scaling up the sequencing was another significant hurdle. Applied Biosystems had come up with new sequencers that would make the job easier and those were the sequencers that Celera was going to use. But for those of us who were already sequencing, this presented a challenge in having to change over all of our technologies to adapt to the new sequencers. 

And I have to say this because it was always my beef and the  sequencers as they came did not work properly. So, there was  always the challenge of actually having to get them working  efficiently at the same time as working up everything else to  match with them. So, you know, as far as what would you call it now, you'd  call it a perfect storm now of, you know, lots and lots of  pressure to get things working, deliver, and the sequence had to  be very, you know, efficient. We had targets to meet. We met  on phone calls once a week. And we had to report amongst the --  what we deliberately amongst ourselves called the G5s. So, 

this is the five largest sequencing centers that we had to report on what our progress was. And you know what was it like? Well, it was obviously -- it was stimulating. It was challenging but at times we used to have these calls timed for Friday afternoons. And certainly, over the summer months in the Sanger on a Friday afternoon because we're very close to the -- what is now the Imperial War Museum in Duxford, if there were air shows over the weekend, then we would have the sounds of Spitfires, Lancaster bombers, you know, the old World War II planes in the background of these calls.

And sometimes the negotiations with our collaborators in the  United States, I have to say that the, you know, way that  commanders during the Second World War must have felt at  times when they were planning the advance across Europe  certainly came into mind because it was a tricky situation at  times that we had to handle and make sure that we're all  pointing in the same direction and, you know sort of going on  the same course. And there were, you know, times when it  looked as though the course would deviate. And at that  point, they had to -- you know, another reckoning, a lot of  meetings, a lot of email exchanges. And then we work to 

get back on course and go ahead again. But it was very, you  know -- it was challenging to get it done. >> Christopher Donohue: And one issue that comes up  pretty frequently is the quality assessments and making sure that  the sequence is accurate. And you -- could either of you go 

through a discussion about how there was all of this quality  assessment going on and how that process went and how you decided  essentially what the draft sequence standards were   and how you would present that? >> Jane Rogers: Bob, you talk about this.   It was a new metric every meeting. >> Robert Waterston: New metric, that's right. And  certainly, there was game playing to make sure that  whatever metric came up was one that favored what you were  doing. And some of us are better at that than others  [laughs]. And I don't know. It was important. We had different  groups. And we were proceeding in different ways. We had to 

come up with metrics. We had to come up with fair estimates of  costs so that we really knew what things were doing. You know, the Sanger was blessedly outside of this and  they were forced to listen to us  hash through this stuff among the NHGRI folks in particular  because NHGRI wanted to know if its grantees were performing.  And so, we had to get, you know -- we had to do these endless  reports. I don't know how much more sequence we could have done  if we didn't have to do the reports. But   maybe -- and so, we had to figure out what the  base quality, you know, how much -- what would count toward your  production. And I don't know. It was endless.

>> Jane Rogers: One of the final assessments in terms of the quality of the overall sequence that we were producing, we would assess how many contigs each of the BACs were assembled into once we put them together. And I think we -- I mean, ideally, you know, you'd aim for, I don't know, maybe up to 10 gaps in a shotgun sequence of a BAC that was generally a pretty good sequence. Some didn't go together that well at all and maybe you needed some more. But the -- another metric was the depth of the sequence. We were 

aiming at fourfold coverage on the draft sequences. Sometimes if a clone was small, we generate a bit more. If the clone was big, sometimes, you know, it got thinner. And again, ideally, you topped it up and put a bit more in there. But I think we measured all of those criteria. And then when we came to generate an assembly of the clones, so we looked at the overlaps in between clones, and that would also give us an idea of the quality of the sequence, you know, how well those went together. But it was 

largely dependent on the sequence depth. And I know John was very impressed with the idea that, you know, we have to count the cost all the time. So, we have to generate the same cost measures. And also, NHGRI was very interested in how we measured that. So, we had to report as well, Bob.

>> Robert Waterston: Oh no, I know. [laughter] >> Christopher Donohue: No -- thanks for the detailed  assessment. That's really helpful because, in particular,  some historians are very interested in how things like  quality assessments were done and how various metrics were  agreed upon, and what scientifically and  community-wise went into those types of shared scientific  standards at this phase in the Human Genome Project.

>> Robert Waterston:  One of the advantages we had, I should get this in here I think,  is that Phil Green had developed his base-calling algorithm and  that produced a quality metric for each base. And that was  automatic and objective. And that was enormously useful and  valuable in these quality assessments. We could -- we 

could rely on that. And it was used by -- it was used across  all centers. If we'd all been doing different quality metrics,  you know, it would have been a mess. Everybody would have been  accused of fiddling the quality measures to boost up their  numbers and so forth. The fact that we had this common and very  reliable, very robust metric to use was very good.

>> Christopher Donohue: And some of those metrics like,  for example, the error rate had been in common conversation for  a number of years. So, that also, I think, probably helped  as well. So, this was built on for a number of years before  this huge scale-up finishing project. >> Robert Waterston: Well, you know, that was the  value of doing things like the worm and yeast and so forth.  So, you knew what the answers needed to be.

>> Christopher Donohue:  So, one thing we haven't talked about at all are some of the  companion papers. So, for both of you, what were the more  significant papers that were published alongside the initial  sequencing and analysis paper? And again, you know, your  thoughts on how those pieces of the major issue came together  and your thoughts on their importance? >> Jane Rogers: Well,   I suppose that -- I would say that -- I mean, the other  big companion paper that stands out is the SNP Consortium paper. >> Christopher Donohue: Sure.

>> Jane Rogers: It's reporting 1.4 million SNPs  that have been mapped across the  human genome. And the data were public. And the SNP Consortium  had started up in, I think, the spring of 1999. And this was a  consortium of pharmaceutical companies who had agreed that a  public resource single database with this data in there would  benefit everybody. And it was pretty competitive. So, Michael Morgan in Wellcome Trust did a good job in, you  know, getting everybody together and organizing the project. And 

the goal was to identify 300,000 SNPs across the genome. That  was what the project aimed at. But by the time that the draft  sequence was there, we not only were able to call SNPs on the  restriction digest fragments that were the original focus of  the project, but we could align that to the clone sequences and  call SNPs there and call SNPs in the overlaps. So, the -- you know, putting that first map of variation  across the human genome, I think that represented a huge  achievement. And I know, you know, it's an -- it was an  important, you know, companion to have there. I'm not quite 

sure when it was decided that it would actually be able to -- you know, sort of the data were there to come to fruition at the right time, but probably the time was just about right. You know, the sequences were being generated in parallel with the clone sequences. So -- >> Christopher Donohue: Yeah. It was always one of 

the -- at least from Francis' outlines, there was always  supposed to be a companion paper on SNPs. And that's fairly  early. So -- >> Jane Rogers: Yeah. >> Christopher Donohue: -- the significance was   realized immediately -- >> Jane Rogers: Yeah.

>> Christopher Donohue: -- in consortium. Well,   go ahead, Bob. >> Robert Waterston: Well, I was going to say. It 

formed the -- it was the impetus for the HapMap. >> Christopher Donohue: HapMap, yeah. >> Robert Waterston: And, you know, it really does  start to reveal our evolutionary history. It -- it's valuable  for human disease, but it's also -- it starts telling us about  where we came from. And it was funny. The -- you know, the  consortium, the pharmaceutical companies   were in on the decisions about how all this  should proceed. It was their money. And they wanted this  restriction -- limited restriction digest limited set,  because it didn't depend on us having a draft sequence. That 

was its strength. But its weakness was that it didn't  acknowledge that we were going to have a draft sequence. [laughter] And so, it was very inefficient. We had to do lots of sequence  to get a few SNPs. And -- but finally, we were able to  convince them as the draft was coming along that we could just  do random sequences from this panel of individuals. And we  could -- every read would reveal a new SNP. And so, suddenly, we 

were -- when we were able to convince them that that was a viable strategy, we were able to discover lots more SNPs very quickly. >> Christopher Donohue: And so, just one follow-up question I've always wondered is, who came up with the phrase, "golden path?" Was it Jim? Or was it somebody else? I know it's not the preferred name, but -- >> Robert Waterston: [laughs] We got -- that got squelched in the writing. I think it was Jim. >> Jane Rogers: I think it was Jim too. >> Robert Waterston: I think it was. >> Jane Rogers: Yeah.

>> Robert Waterston: Because he really did have to choose his way through all these clones. >> Christopher Donohue: I think another very early rejected name for it was "yellow brick road." So, it's -- [laughter] >> Jane Rogers: That was probably a bad day name [laughs]. >> Christopher Donohue: You know, there's a very interesting history of what things are named before there is some consensus of -- on what to name it. >> Robert Waterston: Yeah. >> Christopher Donohue: The HapMap being a great example of SIMMap and other things.

>> Robert Waterston: Yeah. >> Christopher Donohue: Shared Inheritance for Medicine  map and all the other things.  But one question that I wanted to ask both of you is to say  that there's been a -- at least to me, there's been a discussion  about, you know, whether -- in order to have a good sequence,  you always need good maps. Is that -- is -- even with current  technologies, is that the case for more recent technology? >> Robert Waterston: So,   that -- I -- that was that -- I should have mentioned that  when I was talking about the value of the map, because the  map provided a way to get the finished sequence. The map had  many fewer gaps in it than the sequence. And so, you could  look at the -- where you had a clone, you could pick the next  clone automatically, or almost automatically. I mean, it was 

there for you. The information was there. And to get  high-quality sequence, you need to have the territory refined  enough to a small area. So, you're not dealing with all the  complexities. You can target your efforts to the problems in  that clone. And they're different. That's the -- that's why  finishing was so hard to automate. It was because every 

clone had its own problem. You know, they -- you could -- you  did enough of them, you could recognize common problems, and  you could start to devise semi-automated ways of  addressing those, and then something new would come up. [laughter] But the clones let you do that.  And I -- until recently, there has been no high quality genome  that didn't -- wasn't a clone-based map-based genome.  You know, the fly was -- the sequence was highly touted for  being the first shotgun sequence, but they spent five  years, I think it was, finishing it. On a clone to clone basis,  they had to go back and build the map.

The mouse was the same way. We did a lot of clone-based  sequencing to start with, but then we shifted to the whole  genome shotgun. And I think it wasn't until, what, 2008 or '09,  something like that --when the mouse was finally finished,  whereas the shotgun was out right away. And so --

>> Jane Rogers: Or, no, they've given up.  Because the mouse just finished when I was still at Sanger. So,  it must have been 2005 to 2006 -- [talking simultaneously] >> Robert Waterston: And -- I don't know. Although  then the high-quality genome stopped coming [laughs]. >> Jane Rogers: Yeah. >> Robert Waterston: Because Solexa-Illumina took  over. And it was just too cheap to do whole genome, and nobody 

finished. And that's -- it's interesting. That's going to  change, because the long reads are basically a substitute for  the BAC clones. You suddenly have a large segment of DNA that  takes -- isolates itself from all the other problems in the  genome, and you can focus in on that. And you can assemble  that. And they've gotten -- they've actually -- with that  kind of stuff, they've been able to do things we couldn't do with  the BACs. They've actually gotten through with the  centromeres. Yeah. Anyway, I think that's right. The  

clones until now have been essential to  high-quality genomes of any kind. >> Jane Rogers: The other thing that the clone  sequence allows you to do is actually check it as a finished  sequence. You can do an in silico digest and compare that  with the actual restriction digest that you do as part of  the set loading process and actually compare the two. So, 

you can check and see that the  sequence is assembled correctly with the clones. >> Robert Waterston: Yeah.   And that's a step that we did on all our finished clones. >> Jane Rogers: Yeah, we did too. And yeah, it  was very nice to produce a high-quality product that people  could use. I'm going to -- so many of the other research  communities who have had the -- had genomes of the organism of  interest sequenced. Nothing -- you know, the cow and the pig, 

and so on, you know, there were shotgun sequences.   And a lot of the communities don't -- people  in the communities don't realize that when they actually want to  look at a biological problem that the sequence quality may  not be good enough to actually allow them just to extricate  what they need and not do any further work before they start. >> Robert Waterston: Well, or they are faced with  doing that extra work -- >> Jane Rogers: Yes. >> Robert Waterston: -- if they recognize that they  have to turn around in the work  that we did on the Human Genome Project. >> Jane Rogers: Yeah.

>> Robert Waterston: There's a lot of deferred cost in unfinished genomes. >> Christopher Donohue: Yeah. And it's a very interesting discussion when you have a -- you have different communities that have -- that are looking at sequence and have different sort of quality metrics for each -- you know, a sort of comparative grade sequence or human grade sequence and -- >> Robert Waterston: Yeah. [laughter] >> Christopher Donohue: -- how -- right -- and how -- and for -- a draft sequence may not be very useful if you want to look at evolutionary questions, but it may be useful for other -- for some other questions that another community could ask. >> Robert Waterston: And unfortunately, you know, the effort was really never made to fully automate finishing -- or automate clone maps and so forth. It was just swept away. And so, it's true that if you're working on a -- on an organism that only three other groups are doing, it's too expensive to do a finished sequence, or it has been too expensive. And it's just economics. You have 

to -- you -- >> Jane Rogers: Yeah. >> Robert Waterston: You can't get the whole genome.  You just have to put the effort into where you need it most.

>> Christopher Donohue: And it's always fascinating to look at the sequence finishing process having been described as artisanal. And another adage is it really is a very different kind of work and a very different kind of specialization. >> Robert Waterston: Yeah. And like I said, the problems are varied. And there are lots of different reasons why a sequence can fail and why assemblies can fail. Just lots of different ones.

[laughter] You think you've heard and seen them all, and then there are some others that come up. [laughter] >> Christopher Donohue: Jane, my next question is -- to you first, is when the initial sequence paper was published, what sense did you have of kind of the work left to be done and with finishing the sequence? And what were kind of the more -- most pressing sort of issues to be addressed since, you know, 2001 -- going from 2001 to 2003? >> Jane Rogers: So, I think you have to -- I mean, the -- because publication of a paper comes sometime after you've actually, you know, said you're going to hold the data collection, because that's -- you know, you hold it, you then have your data for analysis, it was really at that stage when we finished generating draft sequence that we, you know, thought, "Okay, done that." Now, we have to get onto the job of generating the sequence for -- the finished sequence. So, by the time the paper was published, we had already been on to the -- you know, the finishing problems for six months.

But it's the same -- you know, the same feeling of, "Oh, now, we've got to go and sort things out." So, the problems that we were facing, first of all, completion of the maps, Bob -- Bob's group produced the whole genome map, but at Sanger we had individual chromosome maps, and we worked on finishing the maps on the basis of using the sequence. So, you use that to go in and probe for more clones to complete the maps as you go along. We had a shotgun sequence for a high proportion of the clones. We hadn't actually stopped our finishing process through the whole of the generation of the draft sequence, much to the annoyance of Eric on more than one occasion on the G5 calls.

You know, Sanger were sequencing  2D, usually by sequencing more clones. But we had to go back  and complete the shotgun sequences. And as part of the scale-up process, we thought we could  store all the subclones and go back to those and   use them for finishing. So, we invested in 

three Portakabins. Does that make sense in American -- three Portakabins essentially of -- three support cabins for which we had to generate a clone tracking system and informatics so that we could then go and find these things all stored at minus 20, the sort of size of fridge that we were told you'd normally use for frozen peas or something like that. So, we had three of these stacked outside of the Sanger Center. So, working out how to do all of that, how to retrieve the clones, then we were off shotgunning again, and then doing the hardcore finishing, and -- we had a lot to do. And in finishing, you have the problems with the actual technical problems, you know, dealing with the data, getting the -- everything done to get your sequence together. But if

you have -- as we did, several groups of finishers who spent their days in front of computers, we also started to have problems -- we had to deal with people having problems with repetitive strain injury and, you know -- >> Christopher Donohue: I see. >> Jane Rogers: -- sort of health type of things. So, you had to, you know, make sure that people, you know, had breaks, didn't work too long, and so on. So, it -- you know, all of these things sort of just added to, you know, what you had to deal with in terms of getting the sequence done. But it was six months before the actual publication of the paper. And, of course, when the paper was actually published, a lot of people thought that was it, that was the sequence. And to then say, "No, we're -- you know, sort of we're

-- we got another two years of working on the finishing," I  think there were a lot of people saying, "Well, you know, what  are you doing? You know, it's done." In the end, you know, I think, you know, we -- the proof came  in the high quality. The -- you know, it stood the test of time.  But -- yeah, keeping people motivated, it also became, you  know, something of a challenge sometimes.   But the -- you know, Bob, there were other -- have I  covered the main issues? I mean, sorting out the maps, and  then, you know, sort of filling the gaps and dealing with the  tricky problems? Because some of the things that were  challenging in the draft sequence were not only repeats  within clones but also then large repetitive elements in the  genome that -- and duplications that were collapsed. >> Robert Waterston: Yes. >> Jane Rogers: So, that took a lot   of sorting out.

>> Robert Waterston:  And Evan Eichler is still sorting them out [laughs]. >> Jane Rogers: Yes [laughs]. >> Christopher Donohue: These Eichler repeats… [laughter] >> Robert Waterston: That's right.  

Well, I think you covered it pretty well. I mean,  when -- that was certainly -- you know, we knew we had to  finish this sequence. There was a lot of concern that having  captured as much information as we had that the drive for  finishing would be diminished. And we had to push people 

upfront and -- when we agreed to do the draft that people were committed to doing the finishing. And so, like Jane said, I mean, you know, we'd done the draft shotgun. We stored all -- we stored all the subclones to go back and use them. We didn't have to resort to Portakabins. But we had massive storage and access issues. And so, we did all that. I would say the other thing facing us was the mouse.

>> Jane Rogers: Yes. >> Robert Waterston: Because Celera had quit doing  their shotgun -- and I can't remember 4X or 5X or something  like that, because they were trying to augment it with the  public data, and they figured they had enough between the two,  they had turned all their shotgun capacity to the mouse  earlier than they had planned. And they were -- I think by the  time of the publication, they were already selling access to  the mouse sequence. I don't know when that was formally  available, but it was around that time that they were  advertising anyway. They were promising it and recruiting  buyers.

And I remember colleagues at WashU who worked on the mouse and -- saying, "Oh, you know, I feel like a traitor. But I'm -- we're going to have to buy it." And again, this is a real threat to what -- just this whole ethos of biology, of science, I would say, not just to the genome, because if you can do it here, you can do it elsewhere. And so, we agreed that we had to make a significant and rapid stand here. And together, the Sanger and we and the [unintelligible] forged ahead at breakneck speed, basically turned over the capacity that we had developed in the shotgun. And Eric convinced us that in this case we did have to do the whole genome shotgun. And we went along with that, because anyway, they were -- I don't even know if they were good. I

don't know. I can't remember what the status of the libraries was even. But we weren't going to do another effort -- a crash effort. And so, we did the whole genome shotgun, and did a remarkably good job on it, I think. But I -- we talked at the same meeting and something, and he came up to me afterwards and complimented me on the mouse sequence. Because I -- it was -- you know, Celera had

published -- by that time, they had published the sequence of one chromosome, because they were clearly not going to make any promises for availability for anything else. So, they published this one chromosome and made that data available. And ours was much better than that. It just was. We had the -- we already had a fair amount of finished sequence for the mouse or clone-based sequence anyway. >> Jane Rogers: I think we were generating that, because we had a map. >> Robert Waterston: Did you? >> Jane Rogers: We made the map, because -- >> Robert Waterston: Oh, no, you -- >> Jane Rogers: It was made comparatively.

>> Robert Waterston: That's right. We did that  together, because we realized you could -- that's right. I  have forgotten that. You could use the BAC-end sequences -- >> Jane Rogers: Right.

>> Robert Waterston: -- and their homology to human  to put them together. >> Jane Rogers: Yeah. And -- >> Robert Waterston: Yeah. >> Jane Rogers: Yeah.

>> Robert Waterston: Yeah. So, we did -- I'm wrong  about the clone map. That's right. But we didn't use it.  We did the whole genome and -- but we had combined stuff and we  had -- we did have the BAC ends and so forth. So, that was a  big, immediate challenge to get that out there. And we did. We 

got it out in a remarkable time. And this colleague at WashU came up to me a few months later chagrined that he'd spent the money. [laughter] I warned him that it was going to be -- you know, he wasn't -- it wasn't going to be a good money, or it wasn't going to be a money well spent, but he did it anyway. [laughter] >> Christopher Donohue: Now, I should clarify that. And I think it was maybe April of 2000 when there was the initial freeze for the draft -- or the data freeze that already there were very intense discussions about finishing the sequence. So -- >> Robert Waterston: Yeah. No -- it -- we knew we

had to do it. Yeah. Well, there were intense discussions  about finishing. You know, it goes all the way back to our  crazy proposal in 1994. We, John and I, put forward a  proposal in '94 to basically do a draft sequence and follow up  with finishing. We were going to do a better draft sequence, 

but we were going to do it more deeply. So, we didn't have to  go back and do more shotgun for the individual clones. >> Christopher Donohue: This is finishing the sequence  five years early or something? >> Robert Waterston: Yeah. And one of the big  criticisms that we had, Maynard Olson in particular said, you  know, "If you did this, it would never get finished." So, the 

idea of this tension between capturing the information  quickly and not completely versus more expensively and with  more effort. And I don't know, Maynard had this saying -- or I  know a little story or whatever. He said -- you know, he was a  -- Maynard -- this is -- Maynard also came in from chemistry,  where things are on a much more solid footing than they are in  biology. And his impression of biology was that, you know, when 

he got into this, it was like quicksand. [laughter] You just kept getting more and more complexity and more and  more exceptions. And to him, the genome was a   rock at the -- underneath the quicksand that if  you had the genome, you could only sink so far. And so, to  him, it was important. It was vitally important that that 

bedrock be solid and not full of fissures and cracking and  falling apart and letting you sink further and further. And  so, he was a very strong advocate of finishing. Yeah. >> Jane Rogers: Yeah. >> Christopher Donohue: So, just because we're a little  over time -- I want to be  careful with everybody's time -- the final question I'm going to  ask both of you is just your concluding thoughts about the  significance of the sequence and the paper for genome biology  and, you know, genome science in general looking back. >> Jane Rogers: I think it was the first time  that the -- it's not the general public. It's the general  scientific community could see the value of having genome  information. And that -- and, I mean, on the sort of minor level 

you can say that in -- you know, in an instant PhD projects in human genetics were transformed. Instead of spending three years to clone a gene and possibly sequence it, you went to the database, and you started your PhD project with the sequence of the gene. You could then start to do biology. So, I think it's -- it was transformative, you know, with that sort of example.

>> Robert Waterston: I would agree. I mean, you  know, the yeast and the worm sequence convinced those  communities early on about the value of genomics. But for the  wider community, it needed the human genome. And then what was 

gratifying and surprising to me, I think,   was how rapidly that created demand. You know, the zebrafish community had to have their  genome sequence. The -- you know, Richard Gibbs had to do  the cow for the Texas farmer -- ranchers.

[laughter] And all of a sudden, the revelations that -- the power of  having a genome sequence was obvious to so many people. And  it just created an immense demand, which has, you know,  fueled -- >> Jane Rogers: [affirmative]. >> Robert Waterston: -- the field. And it drove  Solexa. It created a market for them. And they could get  investors. They could bring their product to the market and  have buyers. I don't know, and I think it also let people see 

the value of doing some things at scale. You know, biology is a cottage industry, for the most part. >> Jane Rogers: [affirmative]. >> Robert Waterston: This data-driven science was sort of foreign to biology before all this. And it became apparent how much more you could do with a data-driven approach. And so, I think biology, I don't think there is the dichotomy that there was. When the Genome Project started, there was a lot of

antagonism towards the Genome Project, because it wasn't  hypothesis driven. And today, I think people, I don't know,  they've incorporated it. It's part of maverick. And so, you  do both. You collect data at massive scales and it's -- well,  you know, with Illumina technology, small labs can  generate more data than they can analyze pretty quickly.

[laughter] And so, you do both, and you're able to collect huge amounts of  data to drive your hypothesis. And I think it's been very good  for biology. It's changed things. And I think that really  did come from the obvious value of the human genome. >> Jane Rogers: Yeah. >> Robert Waterston: It just -- like Jane said,   PhD students could all of a sudden  zero in on what they needed to study. And -- I don't know, and  all kinds of other things, the -- you know, like the HapMap  followed, and now we have sequencing of ancient DNA. And 

it's just the amazing thing to -- >> Jane Rogers: The sequencing of individuals to  look for COVID susceptibilities, you know. >> Robert Waterston: Yeah. The amazing thing to me  is how pervasive genomics became so rapidly. I just didn't -- I 

had -- I guess I was too  schooled by the opposition that we'd faced in the early days  that there were -- that people would be won over so quickly and  change the way things are doing -- things were done.   It's been amazing. >> Jane Rogers: Yeah. >> Christopher Donohue:  The -- yeah, the democratization of sequencing technology really  has been remarkable, you know. Any kind of -- many, many labs 

produce much more data than, you know, they can easily analyze -- >> Robert Waterston: Yeah. >> Jane Rogers: Yeah. >> Christopher Donohue: -- the current sequencing  technology. >> Robert Waterston: Oh, and, you know, we're doing a  lot of single cell analysis. And, you know, we throw a  sequence at it, because the other part is more expensive. [laughter] So, we try -- we use sequence to extract all the information we  can out of the data. And it's just so -- it's so available. 

It's not as cheap as I want it, but it's really available  [laughs].

2021-02-16
