Nuclear insertions of mitochondrial DNA from Denisovans

A paper published last week by Robert Bücking and coworkers trawled through the recently sequenced Indonesian Genome Diversity Project dataset looking for snippets of mitochondrial DNA (mtDNA) that have been inserted into the nuclear genome. These snippets, called “NUMTs”, arise every so often as a result of DNA transfer from the mitochondrion into the chromosomes.

No, I don’t think you pronounce this “numpty”. These insertions are a cool indication of ancient population diversity, because they sometimes preserve ancient mtDNA variation that has become extinct.

The paper, “Archaic mitochondrial DNA inserts in modern day nuclear genomes”, is in BMC Genomics.

NUMTs occur in many kinds of organisms. They are not typically functional, making them one of the many components of junk DNA. In the nuclear genome, they evolve very much like other noncoding sequences, which means they are often subject to mutations including insertions and deletions. But once a sequence sits in the nuclear genome, it accumulates mutations quite a bit more slowly than it would in the mitochondrial genome, which means the NUMTs can act almost like a “fossil record” of ancient mitochondrial variation.

From the background section of the paper:

In the human reference genome, a total of 755 NUMTs have been identified [7]. In addition to these NUMTs, many more polymorphic NUMTs have been detected in various human populations around the world [8] and the analysis of additional populations is expected to reveal many more polymorphic NUMTs.

Most of the 755 in the draft reference genome are fixed in human populations, but a small fraction are polymorphic. This polymorphic/fixed ratio is a reflection of the high rate of genetic drift throughout most of human evolution, up until the expansion of modern human populations. The citation [8] above is to a 2014 paper by Gargi Dayama and coworkers, “The genomic landscape of polymorphic human nuclear mitochondrial insertions”. That study surveyed the 1000 Genomes Project samples for polymorphic NUMTs and found an additional 141, a result suggesting that more would eventually be found by sampling more populations.

Nuclear insertions of mitochondrial DNA are tricky to find. They can represent any part of the roughly 16,000 base pairs of the mtDNA, and many of them are less than 300 base pairs. Short reads that come from NUMTs tend to align to the mtDNA, so it takes some close study of the flanking sequences to confirm that these are present in the nuclear genome. NUMTs that are longer than the short reads of the sequencing platform could not be fully examined in this paper; instead the authors considered only around 1000 base pairs from each end.
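To make the flanking-sequence idea concrete, here is a minimal sketch of one common way to look for candidate insertions in paired-end data: keep read pairs where one mate aligns to the mtDNA and the other to a nuclear chromosome, then cluster the nuclear mates by position. This is not the pipeline from the paper, whose details are not given here. The `ReadPair` record, the `chrM` contig name, the 500 bp window and the minimum support of two pairs are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

MT_NAME = "chrM"  # assumed name of the mitochondrial contig in the alignment


class ReadPair(NamedTuple):
    """Hypothetical, simplified record of where each mate of a pair aligned."""
    name: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def nuclear_anchor(pair: ReadPair) -> Optional[tuple]:
    """Return (chrom, pos) of the nuclear mate if exactly one mate hit mtDNA."""
    mate1_mt = pair.chrom1 == MT_NAME
    mate2_mt = pair.chrom2 == MT_NAME
    if mate1_mt == mate2_mt:
        # Both mates nuclear, or both mitochondrial: not informative here.
        return None
    if mate1_mt:
        return (pair.chrom2, pair.pos2)
    return (pair.chrom1, pair.pos1)


def candidate_sites(pairs, window=500, min_support=2):
    """Cluster nuclear anchors that lie within `window` bp of each other.

    Returns a list of (chrom, first_pos, last_pos, n_supporting_pairs).
    The window and support thresholds are illustrative assumptions.
    """
    anchors = defaultdict(list)
    for pair in pairs:
        anchor = nuclear_anchor(pair)
        if anchor is not None:
            anchors[anchor[0]].append(anchor[1])

    sites = []
    for chrom in sorted(anchors):
        positions = sorted(anchors[chrom])
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                cluster.append(pos)
            else:
                if len(cluster) >= min_support:
                    sites.append((chrom, cluster[0], cluster[-1], len(cluster)))
                cluster = [pos]
        if len(cluster) >= min_support:
            sites.append((chrom, cluster[0], cluster[-1], len(cluster)))
    return sites


# Toy input, invented for illustration only.
toy_pairs = [
    ReadPair("r1", "chr3", 1000, "chrM", 200),
    ReadPair("r2", "chrM", 250, "chr3", 1200),
    ReadPair("r3", "chr3", 5000, "chr3", 5300),   # ordinary nuclear pair
    ReadPair("r4", "chrM", 100, "chrM", 400),     # ordinary mtDNA pair
    ReadPair("r5", "chr7", 90000, "chrM", 300),   # lone pair, below min_support
]
print(candidate_sites(toy_pairs))
```

A real analysis would also need to filter out the NUMTs already present in the reference genome and check the mapping quality of both mates; those steps are left out of this sketch.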

Here are the conclusions of the new paper:

We modified an existing method to detect NUMTs in next-generation sequence data, and applied the method to whole genome sequences from Indonesians and Papuans, in order to detect NUMTs arising from archaic human mtDNA. In high coverage genomes, an average of 16 NUMTs per individual is detectable. Most of these NUMTs seem to be population specific, indicating their insertion in recent human history. This finding further supports previous findings of an ongoing transfer of mtDNA to the nucleus in humans and suggests that the analysis of additional populations would lead to the discovery of many more NUMTs. A Denisovan NUMT could be identified in 16 samples from Indonesia and Oceania. Analyses of the flanking region of this NUMT reveals that it is part of a Denisovan haplotype. This suggests that the insertion of the NUMT most likely happened in a Denisovan individual and then introgressed into modern humans within nuclear DNA. Our pipeline can be applied to newly sequenced genomes in the future, which could reveal additional archaic NUMT insertions and new insights into the nature of interbreeding events.

The paper caught my attention because of the discovery of a Denisovan-origin NUMT. The analysis suggests that the NUMT was originally part of the mtDNA of a Denisovan individual and that it was incorporated into the nuclear genome in an ancient Denisovan sometime before their mixture with modern humans. This insertion is designated as NUMT 3_1384 in the paper.

NUMT 3_1384 is present in 15 samples from eastern Indonesia and New Guinea (Additional file 1: Table S1). A sequence of 251 bp was generated, which is identical to two Denisovan mtDNAs. It forms a clade with Denisovans and Sima de los Huesos, distinct from all other humans (Fig. 3e) and falls outside of all modern human and Neanderthal variation (Fig. 4c). The alignment contains 13 variable positions within hominins (Additional file 3). For five of these positions, Denisovans and the NUMT share an allele which differs from all modern humans. This suggests that it originated from Denisovan mtDNA rather than from mtDNA of a modern human or an ancestor of Denisovans and modern humans (Additional file 1: Figure S3).

It’s not very long at 251 bp. Across that sequence, the NUMT is identical to two ancient Denisovan mtDNA sequences and one nucleotide different from the other two. The closest Neandertal differs by four nucleotides, the closest modern human mtDNA by five. It’s interesting that the mtDNA that now exists as a NUMT in Indonesian individuals is so close to the Siberian ancient genomes (in other words, that it does not seem to reflect much clade diversity within the Denisovan population), since other evidence from across the nuclear genome suggests this population was very diverse. But that similarity is not too meaningful over such a short part of the mitochondrial genome.
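The comparison behind those numbers is a count of mismatched positions between aligned sequences. Here is a minimal sketch of that count, assuming the sequences are already aligned to the same length. The function names and the short toy sequences are invented for illustration; they are not the 251 bp alignment from the paper.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned sequences.

    Positions where either sequence has a gap ("-") or an ambiguous
    base ("N") are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def closest(query: str, references: dict) -> tuple:
    """Return (name, n_differences) for the reference nearest to the query."""
    return min(
        ((name, count_differences(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )


# Toy sequences, invented for illustration only.
toy_numt = "ACGTACGTAC"
toy_refs = {
    "toy_denisovan": "ACGTACGTAC",
    "toy_neandertal": "ACGTTCGTAA",
    "toy_modern": "ACCTTCGTAA",
}
for name, seq in toy_refs.items():
    print(name, count_differences(toy_numt, seq))
print(closest(toy_numt, toy_refs))
```

With only a few hundred aligned positions, a difference of one or two nucleotides carries little weight on its own, which is why short NUMTs limit what can be said about diversity within the source population.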

I’m interested in the broader picture of NUMT variation. Here, one aspect is that Denisovan-origin NUMTs are not the only components of archaic variation. More ancient parts of the modern human mtDNA tree and deeper ancestral populations are also represented among these NUMTs. The paper identified three polymorphic NUMTs that appear to be outgroups to the present-day variation of human mtDNA but closer to modern than Neandertal or Denisovan mtDNA sequences. These insertions into the human nuclear genome are fossils of ancient African mtDNA variation. They may represent the diversity of ancestral African groups that contributed to the modern human gene pool but did not survive within our bottlenecked mtDNA variation. Or they may represent archaic populations of Africa that, like Denisovans, contributed only a small fraction of the genetic variation found today. Unfortunately, these NUMTs are short and don’t give enough information to estimate when they entered the nuclear genome.

There is a lot of promise for this approach to highlight additional mtDNA variation from past populations. This paper did not look at NUMT sequences that originated from within the known modern mtDNA tree, but those may have a lot of information about the connections between the mtDNA tree and nuclear genomes in past populations. After all, any mismatch between a NUMT found in a population and its present mtDNA variation suggests ancient population contacts and partial replacement of maternal genealogical lines.