HapMap

This week's Nature has over twenty pages of HapMap coverage. Here's the abstract of the main paper:

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

I'm scanning the coverage for interesting quotes now. Here's one, on the concern about false-positive associations between SNPs and phenotypes:

Given the potential for confusion if associations of uncertain validity are widely reported (and a persistent tendency towards genetic determinism in public discourse), we urge conservatism and restraint in the public dissemination and interpretation of such studies, especially if non-medical phenotypes are explored. It is time to create mechanisms by which all results of association studies, positive and negative, are reported and discussed without bias.

Following its own advice, the study discusses evidence for local and global selection in remarkably sterile language. The HapMap is not ideal for finding evidence of selection -- the focus on only common variants tends to exclude several of the usual tests of selection, as well as low-frequency selected variants. But some interesting details are in the data, like this:

First we consider population differentiation, generally accepted as a clue to past selection in one of the populations. The HapMap data reveal 926 SNPs with allele frequencies that differ across the analysis panels in a manner as extreme as the well-accepted example of selection at the Duffy (FY) locus (Supplementary Fig. 8c). Of these 926 SNPs, 32 are non-synonymous coding SNPs and many others occur in transcribed regions, making them strong candidates for functional polymorphisms that have experienced geographically restricted selection pressures.
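To make "allele frequencies that differ across the analysis panels" concrete: the excerpt doesn't say which statistic the consortium used, but one standard measure of differentiation is Wright's F_ST, which compares the heterozygosity expected in the pooled sample with the average heterozygosity within populations. Here's a minimal sketch for a single biallelic SNP, assuming equal weighting of the panels. The function name and the frequencies in the usage lines are made up for illustration and are not taken from the HapMap data.

```python
def fst(freqs):
    """Wright's F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population,
           with every population weighted equally (an assumption).
    """
    if not freqs:
        raise ValueError("need at least one population frequency")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")

    n = len(freqs)
    p_bar = sum(freqs) / n                         # pooled allele frequency
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                 # monomorphic everywhere: no differentiation
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / n  # mean within-population value
    return (h_total - h_within) / h_total


# Hypothetical frequencies for three analysis panels.
print(fst([0.5, 0.5, 0.5]))   # same frequency everywhere
print(fst([0.9, 0.2, 0.1]))   # strongly differentiated
print(fst([1.0, 0.0, 0.0]))   # fixed in one panel, absent in the others
```

An allele that is fixed in one panel and absent from the others sits at the top of the scale, which is roughly the Duffy pattern: the FY null allele is near fixation in sub-Saharan Africa and rare elsewhere. Ranking every SNP by a statistic like this and asking how many are as extreme as a known selected locus is the kind of comparison the quote describes.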

I've decided I admire statisticians who make it their work to figure out how to wring multiple-comparisons tests out of 3 billion base pairs of sequence.
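The arithmetic behind that admiration is easy to sketch. The simplest correction for multiple comparisons is Bonferroni: divide the family-wise error rate by the number of tests. The snippet below is only an illustration of that textbook rule, not the method used in the paper, and the function names are my own. The one-million figure echoes the SNP count in the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every null hypothesis is true
    and each test is run at an uncorrected level alpha."""
    return alpha * n_tests


n_snps = 1_000_000  # roughly the number of SNPs in the HapMap abstract
print(bonferroni_threshold(0.05, n_snps))      # corrected per-SNP cutoff
print(expected_false_positives(0.05, n_snps))  # what you get with no correction
```

Testing a million SNPs at an uncorrected 0.05 would be expected to throw up tens of thousands of spurious hits, which is why the per-SNP cutoff has to be so small. Bonferroni also treats the tests as independent, and the correlations between neighbouring SNPs that the HapMap documents make it conservative, so real analyses need something cleverer.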

The HapMap is an incredible step forward in characterizing human genetic variation. It's a challenging dataset to work with, though. It's like an old map showing continent margins and little else -- we can see many of the common SNPs, but for most of them we have no idea whether they are functional or what they might do.

But much more interesting stuff is coming before too long.

References:

The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437:1299-1320.