john hawks weblog

paleoanthropology, genetics and evolution

genomics

  • The ENCODE project and function in the human genome

    Wed, 2012-09-05 15:13 -- John Hawks

    I wanted to find out more about today's publication of the ENCODE catalog and data, and so I turned right away to lead bioinformatician Ewan Birney, who has an excellent blog post about it: "ENCODE: My own thoughts".

    I recommend the whole thing, which is written in an extended Q-and-A format like the one I often use. The part most interesting to people reading science news stories will probably be his discussion of the claim that a very large proportion of the genome (up to 80%) is functional. Birney's comments put that number into context:

    Q. So remind me which one do you think is “functional”?

    A. Back to that word “functional”: There is no easy answer to this. In ENCODE we present this hierarchy of assays with cumulative coverage percentages, ending up with 80%. As I’ve pointed out in presentations, you shouldn’t be surprised by the 80% figure. After all, 60% of the genome with the new detailed manually reviewed (GenCode) annotation is either exonic or intronic, and a number of our assays (such as PolyA- RNA, and H3K36me3/H3K79me2) are expected to mark all active transcription. So seeing an additional 20% over this expected 60% is not so surprising.

    However, on the other end of the scale – using very strict, classical definitions of “functional” like bound motifs and DNaseI footprints; places where we are very confident that there is a specific DNA:protein contact, such as a transcription factor binding site to the actual bases – we see a cumulative occupation of 8% of the genome. With the exons (which most people would always classify as “functional” by intuition) that number goes up to 9%. Given what most people thought earlier this decade, that the regulatory elements might account for perhaps a similar amount of bases as exons, this is surprisingly high for many people – certainly it was to me!

    Even at 8%, the amount of potential regulatory activity in the genome is very large, and this should factor into the way we study recent human evolution. Birney discusses purifying ("negative") selection as one criterion for identifying functional DNA, but of course substantial functional variation might emerge under random genetic drift of such elements in human populations.
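
    The "cumulative coverage" idea in Birney's answer can be illustrated with a toy calculation: take each class of annotation in order, union its intervals with everything counted so far, and report the fraction of the genome covered. This is only a sketch under invented assumptions. The 100-base "genome", the interval coordinates, and the function names are all hypothetical, chosen so the totals echo the 8%, 9%, 60% and 80% figures in the quote; it is not ENCODE's pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single toy chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def cumulative_coverage(
    layers: List[Tuple[str, List[Interval]]], genome_length: int
) -> List[Tuple[str, float]]:
    """For each layer, in order, return the fraction of the genome covered
    by the union of that layer and every earlier layer."""
    seen: List[Interval] = []
    out: List[Tuple[str, float]] = []
    for name, intervals in layers:
        seen = merge(seen + intervals)
        covered = sum(end - start for start, end in seen)
        out.append((name, covered / genome_length))
    return out


# Hypothetical toy data: a 100-base genome with made-up intervals.
layers = [
    ("bound motifs and DNaseI footprints", [(10, 18)]),
    ("exons", [(0, 1)]),
    ("exons plus introns", [(0, 60)]),
    ("transcription-associated marks", [(40, 80)]),
]

for name, fraction in cumulative_coverage(layers, 100):
    print(f"{name}: {fraction:.0%}")
```

    The point of the union step is that overlapping annotations are not double counted, which is why a layer that is mostly inside earlier layers adds little to the running total.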

    Also, he writes about the process of inventing a new kind of publication -- "threads" -- which highlight related tracks across a large set of publications. With 30 papers in the current ENCODE publication release, in multiple journals, tracking a single subject would be complicated for anyone. So they tried to help out:

    Threads offer an alternative, lighting up a path through the assembled papers, pointing out the figures and paragraphs most relevant to any of 13 topics and taking you all the way through to the original data. The threads are there to help you discover more about the science we’ve done, and about the ENCODE data. Interestingly, this is something that’s only achievable in the digital form, and for the first time I found myself being far more interested in how the digital components work than in the print components.

    The post has a lot of interesting background information about the ENCODE project, the process of coordinating a project with hundreds of scientists, and the conflicts that arose between ENCODE and groups targeting smaller, narrower subjects related to DNA function.

    UPDATE (2012-09-05): Dan MacArthur has further thoughts about the influence of the publication model in the paper, with its innovative threaded e-structure and the inclusion of a virtual machine which archives many of the computational approaches: "The ENCODE project: lessons for scientific publication". But he adds an additional note related to openness:

    At the same time, it is worth noting the constraints that the standard embargo model of scientific publication have still imposed on the project. Much of the ENCODE data was mature and ready for use 12 months ago, and for those in the know has been a valuable component of functional annotation pipelines. Many of us in the genomics community were aware of the progress the project had been making via conference presentations and hallway conversations with participants. However, many other researchers who might have benefited from early access to the ENCODE data simply weren’t aware of its existence until today’s dramatic announcement – and as a result, these people are 6-12 months behind in their analyses.

    Even though the ENCODE project followed very open data release policies, we still have much progress to make on disseminating information rapidly enough to make a difference to researchers outside these big projects.

    Synopsis: 
    A giant project for cataloguing functional gene variation publishes its results.
  • The fused chromosome 2 was in Denisova

    Sat, 2012-09-01 23:16 -- John Hawks

    In my post on the new Denisova paper the other day ("Denisova at high coverage"), I forgot to mention one interesting detail in the new paper by Matthias Meyer and colleagues [1].

    Sometime in our evolution, two separate chromosomes fused into one, giving us a karyotype of 46 chromosomes where chimpanzees, bonobos and gorillas have 48 chromosomes. The high-coverage genome was sufficient to show that Denisova shared the human fusion:

    Of more relevance may be examination of aspects of the Denisovan karyotype. The great apes have 24 pairs of chromosomes while humans have 23. This difference is caused by a fusion of two acrocentric chromosomes that formed the metacentric human chromosome 2 (25), and resulted in the unique head-to-head joining of the telomeric hexameric repeat GGGGTT. A difference in karyotype would likely have reduced the fertility of any offspring of Denisovans and modern humans. We searched all DNA fragments sequenced from the Denisovan individual and identified twelve fragments containing joined repeats. By contrast, reads from several chimpanzees and bonobos failed to yield any such fragments (8). We conclude that Denisovans and modern humans (and presumably Neandertals) shared a karyotype consisting of 46 chromosomes.

    We still have no idea whether this fusion made any difference to any phenotype in ancient humans.

    Many, many people have written me over the years to ask whether this fusion of two ancestral chromosomes might have been important to our evolution. Perhaps, many suggested, if Neandertals had a chromosomal incompatibility with us, that would explain why they became extinct. I have always doubted this, but without information it was impossible to be certain.

    It's nice to now have the information in hand: This fusion happened earlier in our evolution, before the Denisovan lineage separated from ours.
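
    The search described in the quoted passage, looking through sequenced fragments for joined telomeric repeats, can be sketched in a few lines. This is a minimal illustration, not the method Meyer and colleagues used. It assumes the canonical vertebrate telomeric repeat TTAGGG, models a head-to-head junction as copies of that repeat followed directly by copies of its reverse complement, and requires exact matches; the repeat is a parameter, so a different hexamer can be substituted. The function names and the toy reads are hypothetical. Real ancient DNA reads are short and damaged, and real junctions are degenerate, so a practical search would need to tolerate mismatches.

```python
import re
from typing import Iterable, List


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def junction_pattern(repeat: str = "TTAGGG", min_copies: int = 2):
    """Compile a regex for `min_copies` or more copies of `repeat`
    followed immediately by `min_copies` or more copies of its reverse complement."""
    forward = f"(?:{repeat}){{{min_copies},}}"
    backward = f"(?:{reverse_complement(repeat)}){{{min_copies},}}"
    return re.compile(forward + backward)


def reads_with_junction(
    reads: Iterable[str], repeat: str = "TTAGGG", min_copies: int = 2
) -> List[str]:
    """Return the reads that contain an exact head-to-head repeat junction."""
    pattern = junction_pattern(repeat, min_copies)
    # The junction is its own reverse complement, so scanning one strand suffices.
    return [read for read in reads if pattern.search(read.upper())]


# Hypothetical toy reads, built by concatenation to keep the repeats explicit.
reads = [
    "ACGT" + "TTAGGG" * 2 + "CCCTAA" * 2 + "GT",  # contains a junction
    "TTAGGG" * 4,                                  # ordinary telomeric repeat, no junction
    "ACGTACGTACGT",                                # unrelated sequence
]

print(reads_with_junction(reads))
```

    Ordinary telomeres contain long runs of the repeat in one orientation only, which is why the second toy read is not reported: only a read spanning the fusion point shows the repeat switching to its reverse complement.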


    References

  • Grasping the genomic palantir

    Sun, 2012-08-26 22:22 -- John Hawks

    Gina Kolata writes in the New York Times about the conundrum faced by research scientists who inadvertently discover the health risks of their research participants: "Genes Now Tell Doctors Secrets They Can’t Utter". The first case described, which is the clearest in many ways, was one in which the participant was discovered to be free of a mutation that had caused breast cancer in her female relatives:

    [T]he woman, terrified by her family history, also intended to have her breasts removed prophylactically.

    Her consent form said she would not be contacted by the researchers. Consent forms are typically written this way because the purpose of such studies is not to provide medical care but to gain new insights. The researchers are not the patients’ doctors.

    But in this case, the researchers happened to know about the woman’s plan, and they also knew that their study indicated that she did not have her family’s breast cancer gene. They were horrified.

    That case is ethically straightforward compared to others, because the researchers could make a difference to an immediate medical decision. On the other hand, how many risk-free research participants went ahead with prophylactic mastectomies because researchers didn't know about their plans?

    I think the article will be a good one for prompting student discussions in my courses, and I'll likely assign it widely. But I think the central ethical problem discussed in the article is temporary.

    Basically, the problem is that researchers are coming into knowledge about simple, high-penetrance Mendelian variants, where the information about disease risk is very clear, but they are restricted in various ways by privacy agreements related to their research. There is, in other words, an information asymmetry between researchers and their subjects. The article also mentions the problems faced by researchers studying dead research subjects, who may nonetheless have surviving family members who might benefit from knowledge about the deceased's genotypes. The problem arises because genetic sequencing is expensive and rare.

    There will be a time soon when genetic sequencing is cheap and universal, and research participants will be very unlikely to have unknown Mendelian disease alleles. Non-Mendelian risks are much less actionable -- some complex statistical combination of different genotypes may be interesting to a researcher, but is pretty unlikely to give rise to a specific "You must treat this NOW" ethical problem. When the actionable information available to a researcher is already part of a subject's medical file, the information asymmetry that gives rise to the ethical problem will be gone.

    In the medium term, immediacy of results makes a tremendous difference in this ethical situation. The article is pointing at researchers who are making new discoveries about 20-year-old samples. Take a look at fMRI research, another area where research participants could potentially receive information that is directly relevant to health -- maybe at worst, a previously undetected tumor. Many research studies provide their subjects with an MRI image of their brain, as a routine "reward" of participation. What makes this model work is that it is done at the point of participation. An fMRI is not a cheap or easy test, but an image print can be done immediately and given to the participant. It would be super easy to do the same with genotyping data, including routine reports on ancestry and health risks as provided today by 23andMe and other providers, if the genotyping were immediate.

    In fact, a 23andMe-like readout for research subjects would pretty much end the "ethical problem" of this article.

    Since the ethical problem itself arises from the (relatively rare) cases where genetics give rise to actionable predictions, and actionable predictions are one plausible goal of "personalized genomics"...it is interesting to ponder whether the end of ethical problems may also be the end of productive research in this area.

    Synopsis: 
    What to do when you discover your anonymous research subject is going to die?
  • Into Africa

    Fri, 2012-07-27 00:59 -- John Hawks

    I have a lot to say about the new study of African genomes by Joseph Lachance and colleagues [1], which I think is tremendously exciting, along with the new preprint from Joseph Pickrell and colleagues on the arXiv, which includes some similar analyses with SNP data. But I'm on my way to Africa myself today for a week, and don't have time to post all my thoughts about the new papers until I arrive there. So I'll try to post these over the weekend.


    References

  • Modern humans in with a whimper

    Fri, 2012-07-20 16:10 -- John Hawks

    A short, open access review paper by Isabel Alves and colleagues [1] registers two important points:

    Until recently, the out-of-Africa model of human evolution was favoured by most genetic analyses, but this model collapsed when the sequencing of the Neanderthal genome revealed that 1%–3% of the genome of Eurasians was of Neanderthal origin. At the same time, refined analyses of modern human genomic data [1]–[3] have changed our view of evolutionary forces acting on our genome. While most people assumed that the out-of-Africa expansion had been characterized by a series of adaptations to new environments [4]–[6] leading to recurrent selective sweeps [7], our genome actually contains little trace of recent complete sweeps [2], [3], [8] and the genetic differentiation of human population has been very progressive over time, probably without major adaptive episodes [9].

    I disagree slightly with the latter point about selection -- in fact, we have abundant signs of recent positive selection in the genome, but those signs are nearly all very recent partial sweeps in different human populations. Complete sweeps and near-complete sweeps are indeed few, suggesting that there was relatively little directional adaptive evolution associated with the "origin of modern humans." Measuring by genetic change, agriculture was many times more important than the appearance of modern humans throughout the world. The important point with respect to archaic humans is that there are precious few genetic changes shared by all (or even most) humans today, that are not also shared with Neandertals, Denisovans, or plausible other archaic human groups (such as archaic Africans).

    That of course follows from the fact that a fraction of today's gene pool actually comes from those ancient groups. Their variation is (by and large) human variation.

    Most anthropologists do not yet fully understand this genetic picture. We cannot presently define "human" in a genetic sense without including Neandertals.

    Alves and colleagues discuss some important corollaries of the two key observations above. An important one:

    Even though our simulated scenario is unrealistically simple, it is likely that differential admixture should affect population genetic affinities under more complex models of population differentiation. The proper interpretation of human genetic affinities should thus probably be re-evaluated in the light of these results.

    A lot of studies of human genetic variation have assumed no mixture with archaic humans. Such studies are now obsolete. Whole-genome evidence is coming online, and with that evidence we must apply new analytical methods that incorporate more complex demographic hypotheses. These more complex models will require greater attention from anthropologists and population geneticists, but they should give us a more accurate picture of the causes and background of human diversity.


    References

  • Phylo, the genomics game

    Thu, 2012-06-14 09:30 -- John Hawks

    NOVA describes how some genomics problems are being solved using computer gaming: "Gaming and genomics".

    "When a computer tries to solve the problem, it will always try to solve it the same way – the way it has been programmed to solve it," says co-creator Mathieu Blanchette. "Whereas humans, because we don't tell them how to solve it, they'll have different strategies. That will provide us with a variety of different solutions, some of which will turn out to be better than those found by the computer."

    Therefore, asking a lot of people to solve the same problem often gives the best results, and that is what Waldispühl and Blanchette have done. They've crowdsourced the game. They've put it up on the web for anyone to play. About 500 people a day are playing it from all over the world, according to the scientists.

    The challenge for many of us is to find ways to break down problems into these small parts that can be distributed.

  • Sequencing FTL neutrinos

    Sat, 2012-03-17 11:13 -- John Hawks

    A well-written blog account of a current controversy in human genetics, by Joe Pickrell: "Questioning the evidence for non-canonical RNA editing in humans".

    The observation that I personally found most convincing is displayed in the plot at the beginning of this post. What I’m showing is that mismatches to the genome at RDD sites occur almost exclusively at the ends of sequencing reads. All three technical comments include this observation. Importantly, Lin/Piskol et al. take this analysis one step further. They show (in their Figure 2) that this effect is driven by the fact that mismatches to the genome at RDD sites tend to occur at the beginning of sequencing reads that go in the opposite direction of transcription (this effect is masked in my plot).

    It's a bad sign when 90% of your observations may result from sequencing errors. That's something we spend a lot of time trying to understand and work around in the archaic human genomes. We frequently find that, while the genetics ought to follow a mathematical model perfectly well, the sequence data are noisy in ways that interfere substantially with our predictions.

    It's the same thing as using bad wiring in a neutrino experiment, really. If you know about it, you can work around it. Otherwise, it's liable to mislead you.
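
    The diagnostic Pickrell describes, checking whether mismatches to the genome pile up at the ends of reads, is easy to sketch. The version below is a simplified illustration under stated assumptions: each read is already aligned without gaps to a reference segment of the same length, the function names and toy sequences are hypothetical, and the strand-of-transcription refinement from Lin/Piskol et al. is not modeled. A real analysis would work from alignment files rather than string pairs.

```python
from typing import Iterable, List, Tuple


def mismatch_positions(read: str, reference: str) -> List[int]:
    """Return 0-based positions where an ungapped read differs from its reference segment."""
    return [i for i, (r, g) in enumerate(zip(read, reference)) if r != g]


def fraction_near_ends(pairs: Iterable[Tuple[str, str]], window: int = 5) -> float:
    """Fraction of all mismatches that fall within `window` bases of either read end."""
    near = total = 0
    for read, reference in pairs:
        length = len(read)
        for pos in mismatch_positions(read, reference):
            total += 1
            if pos < window or pos >= length - window:
                near += 1
    return near / total if total else 0.0


# Hypothetical toy data: one 20-base reference segment and four "reads".
ref = "ACGT" * 5
pairs = [
    ("T" + ref[1:], ref),                # mismatch at the first base
    (ref[:-1] + "A", ref),               # mismatch at the last base
    (ref[:10] + "C" + ref[11:], ref),    # mismatch in the middle
    (ref, ref),                          # perfect match
]

print(f"{fraction_near_ends(pairs, window=3):.2f} of mismatches lie within 3 bases of a read end")
```

    If mismatches reflected real differences between RNA and DNA, they should be spread fairly evenly along reads; a fraction near the ends far above what the window size alone would predict points toward a technical artifact.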

  • Gorilla genomics and hearing evolution

    Thu, 2012-03-08 00:37 -- John Hawks

    The Nature News story on the gorilla genome includes this section relevant to the evolution of hearing in gorillas and humans:

    Some of these rapid changes are puzzling: the gene LOXHD1 is involved in hearing in humans and was therefore thought to be involved in speech, but the gene shows just as much accelerated evolution in the gorilla. “But we know gorillas don’t talk to each other — if they do they’re managing to keep it secret,” says Scally.

    This weakens the connection between the gene and language, says [Wolfgang] Enard. “If you find this in the gorilla, this option is out of the window.”

    This is one of the genes that I have been working on with reference to its acceleration on the human lineage. It is a mistake to view the evolution of hearing as directed specifically toward language; instead, the human and gorilla lineages have both been adapting to an aural environment different from that of ancestral hominoids. In both lineages, there was an increase in body size and a reduction in the mean frequency of vocalizations, enough to prompt adaptive changes. Humans have additionally acquired language as a communication system, which has its own auditory requirements. The connection with language is only indirect, in that human-specific changes to this and other genes provide evidence of adaptive change in the auditory system.

  • Finding the scary genes

    Wed, 2012-03-07 21:39 -- John Hawks

    John Lauerman reports in BusinessWeek on his experience participating in the Personal Genome Project:

    “This is probably the most serious variant that we’ve actually seen to date in the study,” Thakuria said. About two out of 1,000 people have the JAK2 variant, which encourages blood cells to grow and divide. The variant is used to diagnose three rare blood disorders, including primary myelofibrosis, which is potentially lethal. “I don’t want you to fret about this,” Thakuria said, before giving me fresh cause for worry: a study, published in 2010, in which 10,507 people in Copenhagen gave blood samples and were followed for as long as 18 years. The Copenhagen researchers went back and analyzed the blood samples: 18 had the JAK2 variant; 14 of those 18 with the variant developed cancer in their lifetimes, and all 18 died within the study period. How, exactly, was this helping?

    Finding that you carry a harmful genetic variant, and that there's nothing you can do about it, is probably the most frightening outcome when obtaining your personal genetic information. Some say they would rather not know about such genes.

    Several others have commented on Lauerman's piece, including Matthew Herper at Forbes, and the 23andMe blog. Naturally, they have different takes.

  • “He had a sufficiently high opinion of himself”

    Tue, 2012-01-03 23:20 -- John Hawks

    Gina Kolata profiles Eric Lander, director of Harvard and MIT's Broad Institute and advisor to President Obama, in the New York Times. It's a good read for those interested in the recent history of genetics, and where it may be going from the perspective of one of the largest sequencing centers.

    I also learned a lot from the descriptions of Lander in Jamie Shreeve's recent book, The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World. I really enjoyed the book, and if I have time I'll do a full review.

