Denisovans did not have red hair

At least, the Denisova sequence does not have any of the variants in humans that are associated with red hair. Nor does it share the unique Neandertal variant argued to affect hair color in that group.

It's hard to make very confident predictions about pigmentation phenotypes from our current knowledge of gene associations. But it's fair to say that there's no evidence of anything other than dark hair for this individual. What may be equally interesting is that at least one Neandertal individual (Vi33.26) also appears to lack the unique variant found in other Neandertals -- meaning that this group was probably polymorphic in hair pigmentation.

The unique Neandertal mutation observed by Lalueza-Fox and coworkers Lalueza-Fox:2007 is an A to G substitution at position 919 relative to the beginning of the coding sequence -- this mutation changes position 307 of the amino acid sequence from arginine to glycine (abbreviated Arg307Gly). This mutation was not otherwise observed in living people, but Lalueza-Fox and colleagues suggested on the basis of computational modeling that the change would reduce Mc1r activity, having a similar effect to known mutations that correlate with red hair. I wrote extensively about the study at the time ("The flame-haired Neandertals"). Lalueza-Fox and colleagues could not confirm that the sampled individuals (El Sidrón 1252 and the Monti Lessini specimen) were homozygotes for this mutation, but their multiple independent confirmations showed that the mutation must have been present. Hence they included the concept of "varying pigmentation" in the title of their paper.
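The arithmetic linking nucleotide 919 to amino acid 307 is easy to check. The sketch below is only an illustration: the function names are mine, and the codon tables come from the standard genetic code, not from the actual MC1R sequence. It maps a 1-based coding-sequence position to its codon, then lists which arginine codons an A to G change at that codon position would turn into glycine.

```python
# Illustrative helpers (hypothetical names); codon sets are from the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}


def codon_position(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, position within codon), both 1-based."""
    codon_number = (cds_pos - 1) // 3 + 1
    within_codon = (cds_pos - 1) % 3 + 1
    return codon_number, within_codon


def arg_to_gly_codons(within_codon, ref_base, alt_base):
    """List (Arg codon, Gly codon) pairs produced by changing ref_base to alt_base at within_codon."""
    pairs = []
    for codon in sorted(ARG_CODONS):
        if codon[within_codon - 1] != ref_base:
            continue
        mutated = codon[:within_codon - 1] + alt_base + codon[within_codon:]
        if mutated in GLY_CODONS:
            pairs.append((codon, mutated))
    return pairs


print(codon_position(919))             # (307, 1)
print(arg_to_gly_codons(1, "A", "G"))  # [('AGA', 'GGA'), ('AGG', 'GGG')]
```

So position 919 is the first base of codon 307, and an A to G change there is consistent with Arg307Gly under the standard code. The sketch does not tell us which arginine codon MC1R actually uses at that position.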

The MC1R sequence has very limited coverage in the Neandertal draft genome data. Only one read from one individual (Vi33.26) covers this position of the genome; this read has the normal human allele (A) at this site. The Denisova sequence has reasonably good coverage across this site, with four reads covering it and one ending on it. All of these have the normal (A) allele. So Lalueza-Fox and colleagues were likely right -- this is a polymorphism in Neandertals. And it wasn't shared with Denisovans.

The coverage caveat

The exercise raises a problem, which really has no good solution: How many reads must we see to be confident that an ancient genome has an allele? All the available ancient genomes are very low-coverage, and have a high fraction of sequence errors. The Denisova sequence reads are vastly better than the reads from the Neandertals -- maybe even as good as the sequence reads from the human data provided alongside them. When we look at the living human genomes acquired with the same technology, we find reads riddled with errors. Beyond that, alignment of these short reads with the human reference genome is itself a statistical test, one that the computers sometimes fail. When a single read of 30 nucleotides is different from the human reference genome in three or four places, we can probably disregard it, even if the reported sequencing quality is high. When the first or last nucleotide is different, we can probably disregard that too, at least unless it is replicated in other reads. But when all we have is a single read, and when it differs from the human reference in a reasonable location -- or if it shares an allele with some known humans or chimpanzees -- what are we to make of it? Restricting ourselves to known polymorphisms -- either within humans or between humans and other species -- helps us to ignore the majority of spurious differences in these ancient sequences, but it does not eliminate the occasional error, and it may miss many interesting sites.

I tend to ignore sites where only a single read shows a difference between the ancient and consensus sequence. Where two reads overlap (as long as they don't look like multiple clones of the same read), I note the differences, and say that they might be interesting, but we really need more coverage. Where sequence differences occur in multiple reads, we can have a bit more confidence. If a site is a known polymorphism in humans or primates, I'm willing to believe the allelic state in a single read, but I feel better when there are several reads.
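Those rules of thumb can be written down as a small decision function. This is a rough sketch of my own informal practice, not a validated variant caller: the class, the field names, the labels, and the cutoff of three reads for "multiple" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Hypothetical summary of the reads at one site (all field names are illustrative)."""
    variant_reads: int          # reads showing a difference from the consensus
    look_like_clones: bool      # reads appear to be copies of the same molecule
    known_polymorphism: bool    # site is known to vary in humans or other primates


def classify(site):
    """Apply the informal rules above; the labels and the >= 3 cutoff are assumptions."""
    # Suspected clones of one read count as a single observation.
    n = min(site.variant_reads, 1) if site.look_like_clones else site.variant_reads
    if n <= 0:
        return "no evidence"
    if site.known_polymorphism:
        return "believe, but prefer more reads" if n == 1 else "believe"
    if n == 1:
        return "ignore"
    if n == 2:
        return "note, needs more coverage"
    return "some confidence"


print(classify(SiteEvidence(1, False, False)))  # ignore
print(classify(SiteEvidence(2, False, False)))  # note, needs more coverage
print(classify(SiteEvidence(1, False, True)))   # believe, but prefer more reads
```

The point of writing it this way is only to make the ordering of the rules explicit: prior knowledge that a site is polymorphic is checked before the raw read count.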

We won't be able to do anything with the genotypes of these ancient genomes until we have substantially higher coverage, and that makes interpreting the data very difficult. Remember, we're diploids, and we'll need even more reads to start considering genotypes and heterozygosity within these ancient genomes. At present, the statistical variance of this strange hybrid consensus genome is not what we expect from an actual single copy of the genome. That is our burden for relying on high-throughput sequencing these days; we just have to deal with it.
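To see why diploidy demands more reads, consider how often a heterozygote would look like a homozygote by chance. The sketch below makes strong simplifying assumptions that real ancient DNA violates: each read samples one of the two chromosomes independently and with equal probability, and there is no sequencing error, damage, or alignment bias toward the reference allele. The function name is mine.

```python
def prob_heterozygote_shows_only_one_allele(n_reads):
    """Chance that every one of n_reads from an A/G heterozygote carries the A allele.

    Assumes independent reads, equal sampling of both chromosomes, and no errors.
    """
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return 0.5 ** n_reads


for n in (1, 4, 5):
    print(n, prob_heterozygote_shows_only_one_allele(n))
# 1 0.5
# 4 0.0625
# 5 0.03125
```

Under these assumptions, a single read carrying A would be seen half the time even if the individual were heterozygous, while four or five reads all carrying A would be seen only about 3 to 6 percent of the time. Real error rates and reference bias would shift these numbers, so they are a rough guide at best.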

Known human polymorphisms

Now, if we really want to know about the function of Mc1r in the Denisova individual, we need to consider all the known human polymorphisms and their effects on the phenotype. Our knowledge of pigmentation variation attributable to MC1R in humans is not complete. Some combinations of alleles are shared by only very few people, and promoter polymorphisms which might affect Mc1r expression are tightly linked to coding polymorphisms, making it hard to assess their effects Makova:Norton:2005. But we can certainly run down the known coding polymorphisms and see which alleles are in the Denisova and Neandertal sequence reads.

Harding and colleagues Harding:2000 provided an early assessment of sequence variation in the MC1R coding region among living humans. Chimpanzees differ from all sampled humans at 15 nucleotides. The Denisova individual is represented by at least one sequence read for every one of these substitutions, matching the human consensus sequence for all of them. The Neandertal sequence data do not have nearly as good coverage over this interval; they also match the human sequence where they are represented.

Where human SNPs are concerned, Harding and colleagues placed the root of the human genealogy between two haplotypes that differ by a mutation at site rs2228478, near the end of MC1R coding sequence. This is a synonymous mutation with no effect on the amino acid sequence, and it is a common variant in most human populations. The Denisova and Vindija 33.26 sequences both share the ancestral G allele for this SNP, meaning that they do not share the oldest derived variant present in most populations.

The ancestral G is more common (around 50%) in Africa than in Eurasia (10-25%). The present geographic distribution of this variant doesn't tell us much about its early evolution, in part because the variant today is linked to nonsynonymous substitutions that may have been selected in Eurasian populations.

Neither Denisova nor any of the Vindija sequences possess any other derived SNPs found in human populations. That includes the variants known to be associated with pigmentation. Moreover Denisova does not present any sequence differences from the hg18 reference sequence that are represented by more than a single read. The sequence has reasonable coverage (3-4x) across much of this interval, so the lack of differences is somewhat informative. The Neandertal coverage is very low but also has no differences from hg18 that are represented by more than one read.

So, no novel polymorphisms in these individuals that we can confirm, and no derived SNP variants shared with any other humans. For most of the human SNPs, that's no surprise -- most of them occurred on chromosomes that carried the derived variant at rs2228478, while the Denisova and Neandertal sequences have the ancestral variant. The three derived SNP variants linked to the ancestral variant at rs2228478 give some resolution inside this branch of the MC1R genealogy, but Harding and colleagues found these derived alleles to be at very low frequency within the samples they studied. The comparison therefore isn't surprising, but it is illuminating.

I started scanning the noncoding region upstream of MC1R, which was sequenced in a sample of humans by Makova and colleagues Makova:2001, but I didn't get too far into it. It's sort of rough comparing older sequence data to a genome assembly, because people often numbered across gaps in their sequences without noting them. At that time, the exact sizes of gaps were often unknown, particularly if they included length polymorphisms. So, using old data means realigning, which isn't what I'm up to right now.

I'll get back to this region, though, because it has rather an old coalescent with a fairly deep root outside Africa.