I've gotten the same question a few times, and have seen it elsewhere, so I thought it would be worth a short post to answer it. And for those readers who've also been asked this question, I thought that being able to provide a simple explanation might be a great help.
How can we say that today's non-Africans derive 1-4% of their genomes from Neandertals, when we are 99.86% genetically similar to Neandertals? Or 98% similar to chimpanzees? I mean, how do we have 4% to work with to make this estimate at all?
Let me explain with another example.
You are approximately 99.9% similar to any other random human today. You're just a bit closer to your relatives, because you got some of your DNA directly from them, or you both share DNA from an immediate ancestor.
Your great-grandmother on average gave you 1/8 of her genes, making up on average 12.5% of your genome.
You are more than 99.9% similar to your great-grandmother, on average, yet she contributed only 12.5% of your genome.
In other words, these percentages measure different things -- the fraction of your total ancestry you can trace to her, versus the fraction of your base pairs that are identical to hers. Your genome is far more similar to hers than can be explained solely by your descent from her -- this is because you share other ancestors in common with her, and because mutations don't happen very often.
Or, think about it the opposite way. Suppose that the 12.5 percent of your genome you inherited from your great-grandmother meant that you were only 12.5 percent genetically similar to her. Where did you get the rest of your genes from? A turnip? No, you got them from other people, all of whom are roughly 99.9 percent like your great-grandmother. The 12.5 percent you inherited from her is the only part of your genome where you're more like your great-grandmother than you are like randomly chosen people in the population.
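If it helps to see the arithmetic, here is a minimal Python sketch of the two percentages. It is a toy model, not a population-genetic calculation: it assumes the roughly 0.1 percent baseline difference between random people mentioned above, treats the DNA you inherited from an ancestor as identical to hers (ignoring new mutations), and treats the rest of your genome as if it came from random people. The function and constant names are my own, chosen for illustration.

```python
# Assumption: two random people differ at roughly 0.1% of base pairs.
BASELINE_DIFFERENCE = 0.001


def ancestry_fraction(generations_back: int) -> float:
    """Expected fraction of your genome traced to one ancestor n generations back."""
    return 0.5 ** generations_back


def expected_similarity(generations_back: int) -> float:
    """Toy-model sequence similarity between you and that ancestor."""
    inherited = ancestry_fraction(generations_back)
    # Inherited DNA is assumed identical; the rest differs at the baseline rate.
    return 1.0 - BASELINE_DIFFERENCE * (1.0 - inherited)


for name, generations in [("parent", 1), ("grandparent", 2), ("great-grandparent", 3)]:
    print(
        f"{name}: ancestry {ancestry_fraction(generations):.1%}, "
        f"similarity {expected_similarity(generations):.4%}"
    )
```

Under these assumptions a great-grandparent accounts for 12.5 percent of your ancestry, but the similarity comes out a little above 99.9 percent. The two numbers answer different questions.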
For the Neandertals, we have to separate these two kinds of similarity, sorting out the genes that we must have inherited from them, from the ones that we share because we share a more distant ancestry.
Now, suppose we don't know that this woman was your great-grandmother, that it's only a hypothesis. It's the kind of thing a forensic anthropologist might want to figure out, if your great-grandmother was Anastasia. We can answer the question in this way: Test the hypothesis that she's unrelated to you, by examining whether you are as genetically similar to her as you are to the average, randomly chosen individual from your population.
This is a statistical test. In fact some of the people in the population share more genetic similarity with you than others, and our statistic has to account for that variation. We can put whatever level of statistical confidence on it we like. If your putative great-grandmother shares substantially more with you than all but some very small fraction of people, we may conclude that she is your relative.
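One simple way to picture such a test is an empirical comparison: measure how similar you are to many randomly chosen people, then ask what fraction of them are at least as similar to you as the candidate is. The Python sketch below does that with made-up numbers. The similarity values, the normal spread, and the function name `empirical_p_value` are all assumptions for illustration, and a real analysis would need a much more careful model of the variation.

```python
import random


def empirical_p_value(candidate_similarity, population_similarities):
    """Fraction of the reference sample at least as similar to you as the candidate."""
    at_least = sum(1 for s in population_similarities if s >= candidate_similarity)
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least + 1) / (len(population_similarities) + 1)


rng = random.Random(0)  # fixed seed so the made-up sample is reproducible

# Made-up data: your similarity to 1,000 randomly chosen people.
population = [rng.gauss(0.9990, 0.00002) for _ in range(1000)]

# Made-up data: your similarity to the putative great-grandmother.
candidate = 0.999125

p = empirical_p_value(candidate, population)
print(f"empirical p-value: {p:.4f}")
```

A small value means very few random people are as similar to you as she is, which is the situation in which we would reject the hypothesis that she is unrelated. How small is small enough is the confidence level we choose.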
We might do substantially better -- if the variation in the population doesn't work against us, we might even conclude that she is in fact a third-degree relative who contributed between 10 and 14 percent of your genome. Or even better.
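As a rough illustration of how an estimate like that could come out of the data, the Python sketch below turns excess similarity into an inherited fraction. It rests on the same toy assumptions as the rest of this example: inherited DNA is identical to hers, everything else differs at the population's baseline rate, and the input numbers are made up. The function name is hypothetical, and the sketch gives only a point estimate; putting a range like 10 to 14 percent around it would require modeling the variation in the population, which is not shown here.

```python
def estimated_inherited_fraction(similarity_to_candidate, baseline_similarity):
    """Toy estimate of the genome fraction inherited from the candidate ancestor."""
    diff_candidate = 1.0 - similarity_to_candidate
    diff_baseline = 1.0 - baseline_similarity
    # If a fraction f is inherited (identical), the remaining differences are
    # diff_baseline * (1 - f), so f = 1 - diff_candidate / diff_baseline.
    return 1.0 - diff_candidate / diff_baseline


# Made-up inputs: 99.9125% similar to her, 99.9% similar to a random person.
estimate = estimated_inherited_fraction(0.999125, 0.999)
print(f"estimated inherited fraction: {estimate:.1%}")
```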
Our conclusion has to depend on the structure of the population. If randomly chosen people tend to look like you, for some reason of population structure, we'll have to model that population structure directly. This is, of course, what was done in the case of the Neandertal genome -- a specific population model was significantly favored by the data, and alternatives that did not include population mixture were demonstrated to be so unlikely as to be essentially impossible.
And as I pointed out the other day, if Neandertals had not donated any genes to later populations, then the most recent common ancestors of human and Neandertal genes would all be earlier than the divergence of those populations, more than 250,000 years ago. It is the observation of chromosomal segments that are identical or very nearly identical to some living human chromosomes that shows that, for some genes in some living people, the Neandertals are not different enough. We have to have some of their genes.