The Mayflower criminal registry

Of some interest with respect to DNA databases and privacy concerns: "DNA links 1991 killing to Colonial-era family".

The DNA sample was taken in the death of 16-year-old Sarah Yarborough, who was killed on her high school campus in Federal Way, Washington, in December 1991. The King County Sheriff's Office has circulated two composite sketches of a possible suspect -- a man in his 20s at the time with shoulder-length blonde or light brown hair -- but had been unable to put a name to the sketch.
In December, though, the department sent the DNA profile to California-based forensic consultant Colleen Fitzpatrick. Fitzpatrick compared the profile to others in genealogy databases and found the closest match was to the family of Robert Fuller, who settled in Salem, Massachusetts, in 1630 and had relatives who came over before him on the Mayflower.

This is a Y-chromosome match based on the genealogical research of people who may be complete strangers to the "suspect". Fitzpatrick suggests that men with matching Y chromosomes can be expected to share a surname, which would be probative in a forensic investigation. Obviously there are many possible scenarios in which such information will not lead to a suspect: the chance of an unacknowledged paternity event somewhere across 200 years is very high. I don't view the result as strongly actionable, but I do think it raises important questions about the future of genealogy databases.
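To see how such breaks accumulate, here is a minimal sketch that treats each father-to-son transmission as carrying an independent, constant chance of a break (unacknowledged paternity, adoption, or surname change). The per-generation rates, the 30-year generation length, and the function name are assumptions for illustration, not figures from the article. A surname link requires both paternal lines, from the shared ancestor down to the suspect and down to the matched family, to be unbroken, so the sketch counts transmissions on both lines.

```python
# Hypothetical sketch: probability that at least one break (unacknowledged
# paternity, adoption, surname change) separates the Y chromosome from the
# surname on either of two paternal lines descending from a shared ancestor.
# ASSUMPTION: breaks are independent and occur at a constant rate per generation.

def prob_surname_link_broken(generations_per_line: int, break_rate: float) -> float:
    """Chance that the surname no longer tracks the Y chromosome on either line."""
    if not 0.0 <= break_rate <= 1.0:
        raise ValueError("break_rate must be between 0 and 1")
    transmissions = 2 * generations_per_line  # both lines back to the common ancestor
    return 1.0 - (1.0 - break_rate) ** transmissions


if __name__ == "__main__":
    years = 200                # span mentioned in the text
    years_per_generation = 30  # assumed average paternal generation length
    generations = round(years / years_per_generation)
    for rate in (0.01, 0.02, 0.05):  # illustrative rates, not measured values
        p = prob_surname_link_broken(generations, rate)
        print(f"{rate:.0%} per generation over {generations} generations per line: "
              f"P(link broken) = {p:.2f}")
```

The output is only as good as the assumed rate; the point of the sketch is that the risk compounds with every generation on both lines.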

We are approaching the time when whole-genome sequencing will make this kind of identification much more likely, because unique genetic matches to third- and fourth-degree relatives will become feasible. Finding a handful of rare mutations shared between a crime-scene sample and an individual in a whole-genome database would be strong evidence of a relationship. It's possible that whole-genome databases will grow faster than technology allows reliable whole-genome sequencing of crime-scene samples. In that case, the issues surrounding database use may become the primary concern.
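A back-of-the-envelope version of the rare-variant argument: if a variant is rare enough that two people essentially share it only by descent, then a degree-d relative carries a given heterozygous variant with probability of about (1/2)^d, which is twice the kinship coefficient. The sketch below multiplies that by an illustrative count of rare variants; the count of 10,000 and the function name are assumptions, not estimates from any particular dataset.

```python
# Hypothetical sketch: expected number of a sample's very rare variants that a
# relative of a given degree also carries.
# ASSUMPTIONS: variants are rare enough that sharing happens only by descent;
# each variant is heterozygous in the sample; a degree-d relative carries a given
# allele by descent with probability (1/2)**d; the variant count is illustrative.

def expected_shared_rare_variants(n_rare_variants: int, degree: int) -> float:
    """Expected count of the sample's rare variants also carried by the relative."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return n_rare_variants * 0.5 ** degree


if __name__ == "__main__":
    n = 10_000  # illustrative count of very rare heterozygous variants
    for d in (1, 2, 3, 4, 6, 8):
        print(f"degree {d}: ~{expected_shared_rare_variants(n, d):.0f} shared")
```

Under these assumptions a third-degree relative would be expected to share about 1,250 of the 10,000 rare variants and a fourth-degree relative about 625, while an unrelated person would share almost none. Real counts scatter widely around these expectations, because variants are inherited in linked chromosome segments rather than independently.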

It would be an interesting exercise to estimate the fraction of unknown crime-scene Y-chromosome and mtDNA samples that could be matched to a 10th-degree relative in the Genographic (or any other large) dataset.
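One way to frame that exercise, as a minimal sketch: suppose an unknown sample has R living relatives through the relevant line (paternal for the Y chromosome, maternal for mtDNA) who would register as a match, and each is independently in the database with probability f equal to the database's share of the population. Then the chance of at least one hit is 1 - (1 - f)^R. The relative counts and the database and population sizes below are hypothetical; estimating R for 10th-degree relatives from family-size data would be the hard part of the real exercise.

```python
# Hypothetical framing of the matching estimate.
# ASSUMPTIONS: each of the sample's n_relatives (through the relevant uniparental
# line) is in the database independently, with probability equal to the
# database's share of the population; all sizes used below are made up.

def prob_relative_in_database(n_relatives: int, database_size: int,
                              population_size: int) -> float:
    """Probability that at least one such relative appears in the database."""
    if population_size <= 0 or not 0 <= database_size <= population_size:
        raise ValueError("need population_size > 0 and 0 <= database_size <= population_size")
    coverage = database_size / population_size
    return 1.0 - (1.0 - coverage) ** n_relatives


if __name__ == "__main__":
    database_size = 300_000        # hypothetical database size
    population_size = 300_000_000  # hypothetical source population
    for n_relatives in (10, 100, 1_000, 10_000):  # hypothetical relative counts
        p = prob_relative_in_database(n_relatives, database_size, population_size)
        print(f"{n_relatives:>6} relatives: P(at least one in database) = {p:.3f}")
```

The independence assumption is generous: relatives cluster in families and regions, and database participation is far from random, so a real estimate would need to account for both.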