Popular Science has a short article covering recent research into European population structure:

To the delight of genealogy buffs like me, scientists recently announced in the journal Nature that they can trace European ancestry to within 192 miles by analyzing tiny inherited markers in DNA. That means someday we'll need look no further than our own genes to locate our motherlands.

The study, and another much like it in a recent issue of Current Biology, harnessed the stream of human genetic data now being gathered by pharmaceutical companies. Using modern “gene chip” technology, researchers can screen 500,000 units of DNA at once. The companies use the data to investigate the genetic basis for adverse drug responses. But population geneticists are taking advantage of the high-resolution databases too, scouring them for trends in human evolution that are otherwise hard to find.

That's me, I guess. They're calling it a "DNA GPS", which is pretty clever.

If you're interested in athletic performance and genetics, read Daniel MacArthur on ACTN3, sprinting, and Jamaica:

At this point I probably should confess to having a more than casual interest in this story: I was one of the authors on the first study showing an association between this gene and elite athlete status back in 2003, and this gene has been the central focus of my research for a good part of the last six years.

...

It is almost certainly true that Usain Bolt carries at least one of the "sprint" variants of the ACTN3 gene, but then so do I (along with around five billion other humans worldwide). Indeed, I'm fortunate enough to be lugging around two "sprint" copies - but that doesn't mean you'll see me in the 100 metre final in London in 2012.

Daniel MacArthur, of Genetic Future, reviews how much storage a human genome actually requires. You'd probably guess around 12 billion bits: a diploid genome has roughly 6 billion base pairs, at 2 bits per base pair. But sequencing technologies, and the availability of other people's genomes as references, make things a little more complicated.
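The 12-billion-bit figure is simple arithmetic, and the 2-bit packing behind it is easy to sketch. Here's a minimal Python illustration, assuming a plain A/C/G/T alphabet (no N or other ambiguity codes) and a round figure of 6 billion base pairs for a diploid genome. The function names are hypothetical, not taken from any sequencing pipeline:

```python
# Each base gets a 2-bit code. Assumes only A, C, G, T; an N would raise KeyError.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_TO_BITS[base] << (2 * j)  # slot j holds bits 2j..2j+1
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


# Back-of-the-envelope storage for a diploid genome (a round, approximate number).
DIPLOID_BP = 6_000_000_000
bits = DIPLOID_BP * 2
print(f"{bits:,} bits = {bits / 8 / 1e9:.1f} GB")  # → 12,000,000,000 bits = 1.5 GB
```

Packing four bases per byte is the whole trick. Real storage formats have to do more, for example handling the ambiguous bases this sketch simply rejects.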

This passage about the raw image files generated by the Illumina platform illustrates some of the complications:

Almost as soon as these images are generated they are fed into an algorithm that processes them, creating a set of text files containing the sequence of each of the fragments. The image files are then almost always discarded. Why are they discarded? Because, as you will see in a minute, storing the raw image data from each run in even a moderate-scale sequencing facility quickly becomes prohibitively expensive - in fact, several people have suggested to me that it would be cheaper to just repeat the sequencing than to store these data long-term.

An accurate genome sequence requires reading each position many times over, and all that redundant coverage adds up to lots and lots of data storage. Winnow it down to a single "best" sequence and you're back to roughly 12 billion bits (1.5 gigabytes). Much of that sequence is repetitive and can be compressed substantially. And if you compare it with a reference sequence, only a small amount of information is needed to record how your genome differs from the reference. All of this is explained at the link.
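The reference-comparison idea can be sketched in a few lines. This toy Python version assumes the two sequences are already aligned to the same length and differ only by single-base substitutions; real variant formats also have to record insertions and deletions. The names are illustrative only:

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where the sample differs from the reference.

    Assumes both sequences are pre-aligned and equal in length, so only
    single-base substitutions are captured (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def apply_diff(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the sample sequence from the reference plus its differences."""
    seq = list(reference)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)


reference = "ACGTACGTACGTACGTACGT"
sample = "ACGTACCTACGTACGTACGA"
diffs = diff_against_reference(reference, sample)
print(diffs)  # → [(6, 'C'), (19, 'A')]
```

Only the two differing positions need to be stored; the full sample sequence can be rebuilt from the reference plus that short list.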

The road to prophylaxis

Jane Brody writes about hereditary cancers and genetic testing, in a piece aimed at helping readers educate themselves. Her theme is the extreme case: radical surgeries that can nearly eliminate the chance of cancers that would otherwise be near-certain:

Dr. Coit described a family in which the father and his father both developed thyroid cancer linked to the RET mutation. The younger man's 6-year-old son was tested and found to carry the same damaged gene. Because the boy was certain to develop thyroid cancer, most likely at a young age, his thyroid was removed. Although the boy will need to take thyroid hormone for the rest of his life, the surgery reduced to zero his chance of developing this often fatal cancer.

The last part of the article is the warning section. Most interesting: your relatives might sue you if you fail to tell them about your positive genetic test result.

There's also a warning about bogus genetic tests that claim to tell you "what kinds of food to eat" and the like. I'll have more to say about that later; there are a lot of charlatans out there making hay out of the current rise in genetic testing.

I should point out that describing these extreme cases does little to educate people about the much more common situation, in which a "risk variant" confers only a slightly increased risk of a condition.

"Blood Matters" review

The NY Times is running a review by Jennifer Senior of the new book, Blood Matters, by Masha Gessen. The book details Gessen's journey through modern-day genetics and medicine. She learns that she carries a mutation in the BRCA1 gene that confers a very high breast cancer risk, and is then faced with figuring out what to do about it.

Gessen suddenly became part of "a cancer caste" (her term), a "previvor" (the community's term) with terrible choices to make posthaste. Specifically, she had to decide whether to keep her breasts and ovaries or have them prophylactically removed....So Gessen ditches the counselors and the doctors and instead tries to collect information from a wider, more eccentric variety of sources. A sympathetic nurse scientist at the Dana-Farber Cancer Institute concedes that close surveillance might be a reasonable alternative to surgery for some women. Nancy Etcoff, a psychologist at Harvard Medical School and the author of "Survival of the Prettiest," a study of the evolutionary importance of beauty, points out that while breasts are central to female attractiveness, attractiveness and happiness are barely correlated. Most memorably, an instructor in a psychology and economics class at Harvard attempts to "express life in numbers" for Gessen on an Excel spreadsheet, assigning values to living with cancer, living without cancer and living with the stench of a cancer threat. Though she rejects some of his findings -- I will not say what Gessen ultimately chooses to do -- she leaves his office feeling light, unburdened: "I jumped on my bicycle and sped home, making currents in the puddles, getting soaked, feeling strong and a little silly and generally like my life had a utility of 100 a year, possibly even more, now that I also felt that much more competent for being able to put a number on the value of riding in the rain."

I'm looking forward to reading this book.

"I'd rather spend my money on my genome than a Bentley"

Amy Harmon profiles Dan Stoicescu, a millionaire living in Switzerland who has become the first paying customer of the genome-sequencing company Knome.

Mr. Stoicescu said he worried about being seen as self-indulgent (though he donates much more each year to philanthropic causes), egotistical (for obvious reasons) or stupid (the cost of the technology, he knows, is dropping so fast that he would have certainly paid much less by waiting a few months).
But he agreed to be identified to help persuade others to participate. With only four complete human genome sequences announced by scientists around the world -- along with the Human Genome Project, which finished assembling a genome drawn from several individuals at a cost of about $300 million in 2003 -- each new one stands to add considerably to the collective knowledge.
"I view it as a kind of sponsorship," he said. "In a way you can also be part of this adventure, which I believe is going to change a lot of things."

"Sponsorship" seems like a good way to look at it, as long as they don't start including companies' names in the sequence, like "Pepsi" on a high school scoreboard!

DNA testing and health insurance

Amy Harmon brings several patients' stories to this article, "Fear of insurance trouble leads many to shun or hide DNA tests."

In some cases, doctors say, patients who could make more informed health care decisions if they learned whether they had inherited an elevated risk of diseases like breast and colon cancer refuse to do so because of the potentially dire economic consequences.
Others enter a kind of genetic underground, spending hundreds or thousands of dollars of their own money for DNA tests that an insurer would otherwise cover, so as to avoid scrutiny. Those who do find out they are likely or certain to develop a particular genetic condition often beg doctors not to mention it in their records.
Some, like Ms. Grove, try to manage their own care without confiding in medical professionals. And even doctors who recommend DNA testing to their patients warn them that they could face genetic discrimination from employers or insurers.

According to the article, many people are choosing to pay for genetic tests out of pocket, to avoid involving their insurers or leaving a trace in their medical records. If that practice becomes more common (people paying for single-disorder tests themselves), then companies that offer genome-wide SNP typing may find it easy to grow their market.

This one I hadn't heard about:

When the Equal Employment Opportunities Commission sued the Burlington Northern Santa Fe Railway for secretly testing the blood of employees who had filed compensation claims for carpal-tunnel syndrome in an effort to discover a genetic cause for the symptoms, the case was settled out of court in 2002.

That is creepy.

It seems likely that the insurance risk fear will be addressed soon by legislation:

The Genetic Information Nondiscrimination Act, which passed the House of Representatives by a wide margin last year, would prohibit insurers from using genetic information to deny benefits or raise premiums for both group and individual policies. (It is already illegal to exclude individuals from a group plan because of their genetic profile.) The bill would also bar employers from collecting genetic information or using it to make decisions about hiring, firing or compensation. But it has yet to reach the Senate floor.

The article deals with both kinds of fear: fear of the insurance consequences, and fear of the test itself. It ends with a woman who was so afraid of being tested for the BRCA1 mutation that she chose surgery to remove her ovaries. Before going on to a double mastectomy, she had the test after all, and learned that she did not carry the risk allele.

UPDATE (2008-02-24): Hsien-Hsien Lei also picks up the story, and adds a perspective from Britain:

Two years ago, Cancerbackup found in a survey of regional genetics centers that waiting time for appointments to receive a BRCA genetic test can be as long as nine months with a further wait of 1 to 2 years for results. In some ways, this could be construed as discrimination in that other forms of testing are probably taken more seriously and performed more speedily.

She also provides a raft of links to other blogs that have posted on the Harmon story.

Hunting for your child's DNA doppelganger

Maybe you believe you have an identical twin somewhere, or if not a twin, at least a doppelganger: someone who looks a lot like you, and maybe also someone whose life has curiously paralleled your own.

There are parents who would like nothing more than such a doppelganger for their children. Amy Harmon reported last month on some of them: parents of children with rare new mutations.

Every person is born with a handful of new deleterious mutations, and chances are you will never notice yours. Their effects may be small, and many are recessive, needing two copies to show their bad effects; since they're new, you have only one copy. If you have children, each child will inherit, on average, half of your new mutations, and will carry a handful of new ones of its own.
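The "half your new mutations" figure is an average. A minimal sketch of the distribution, assuming each new mutation passes to a given child independently with probability 1/2 (which ignores linkage between nearby mutations), and using a hypothetical count of four new mutations purely for illustration:

```python
from math import comb


def transmission_probs(n_new: int) -> list[float]:
    """P(a child inherits exactly k of a parent's n_new new mutations), k = 0..n_new.

    Assumes independent transmission with probability 1/2 per mutation,
    i.e. a Binomial(n_new, 1/2) distribution (linkage is ignored).
    """
    return [comb(n_new, k) * 0.5 ** n_new for k in range(n_new + 1)]


# Hypothetical "handful": 4 new mutations, an illustrative number only.
probs = transmission_probs(4)
expected = sum(k * p for k, p in enumerate(probs))
print(probs)     # → [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(expected)  # → 2.0
```

Any particular child might inherit none or all of them, but two of the four is the expectation.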

Such new mutations do not cause a large proportion of the recognized Mendelian genetic disorders. The most common disorders are caused by mutations that have been passed down in families over many generations. Some, like the X-linked hemophilia inherited by the descendants of Queen Victoria, have been recognized from pedigrees of relatives. A few are much older. For example, nearly all cases of variegate porphyria in South Africa result from an allele that arrived in Cape Town in 1688. Three hundred years of Afrikaner porphyria have been tracked through pedigrees to one woman, Ariaantje Adriaanse, who carried the mutation.

If your child has a well-known Mendelian disorder, chances are you can find a support group, possibly locally and certainly on the internet. These support groups have become very important resources, offering everything from dietary advice and experience with different combinations of treatments to forums for commiserating about other people's reactions to their kids' unique needs. Some of the recognized Mendelian disorders are often caused by new mutations: for example, half of all cases of neurofibromatosis (roughly 1 in 10,000 people) result from new mutations.

But there is a large set of new mutations, each of which occurs in only a tiny fraction of births. Many have been discovered only recently, as genome-wide screens for variation have become possible. Harmon's report focuses on a few of these kids.

A decade ago, these kinds of mutations could not be recognized, and kids were lumped into broad categories like "developmentally disabled." The most obvious such category is the "autism spectrum," which, as the word "spectrum" implies, includes a variety of developmental challenges and a range of outcomes. As Harmon's article points out, geneticists are slowly unraveling the different causes of autism. A small fraction of cases result from rare new mutations, such as deletions and duplications at chromosome region 16p11.2.

These genetic results have started to allow families to find other children who are genetically and phenotypically similar to their own. Even if no treatments or interventions are available, this can give parents some idea of the course of their children's future development. That's a step forward for families who can't easily find their footing in the large pool of children on the autism spectrum. Interventions that help children on one developmental pathway may have no effect at all on others.

For three families, the impulse to find others in the same situation was immediate.
A few months before the Lanes crossed the state to meet Taygen's chromosomal cousin, Jennie Dopp, a mother in Utah, was scouring the Internet for families with "7q11.23," the diagnosis that explained her son's odd behavior and halting speech.
"I want someone to say 'I know what you mean,'" Ms. Dopp told her husband, "and really mean it."
Noa Ospenson's parents flew from Boston to South Carolina for a meeting of 100 families with children who, like Noa, are also "22q13." Hoping for more information about their daughter's diagnosis, they emerged as lifetime members of what they call "Noa's tribe."
For each of them, a genetic mutation became the foundation for a new form of kinship.

This form of kinship is more real than many that anthropologists study, because here there is a real connection between gene sequences, although not a "genetic" one in the traditional sense of shared origins. But human societies have many forms of analogical kinship. Perhaps it is natural that we are adopting new gene-centered forms of kinship, from adoptees using genetics to search for their biological relatives to genealogy buffs sending off their DNA to find distant kin among other genealogy buffs.

For the families described in this story, it is really a case of trying to find a lost part of their own child's biological story. The very things that make their children different from their kin may make them similar to someone else's children. In many cases, finding such connections can be an enormous relief.

The results are not always happy, though:

And then they went to the biennial meeting of 22q13 families in July 2006. But that first day, in Greenville, S.C., they wondered if they had made a mistake.
Few of the children, even the handful of teenagers, were toilet trained. Some had never gained the use of their hands, which had stiffened into a claw-like shape. Many were chewing on rubber tubes or "chew rags," to keep them from shredding their clothes.
Ms. Perlson, a communications consultant, and Mr. Ospenson, a computer analyst, attended sessions on one of the genes that Noa is missing, which codes for a protein crucial to neurological development. They learned about the health problems, like seizures and kidney failure, that Noa might face in her 20s. The window onto her future was hard to digest.

It's a good article, following a number of these stories. I think it's an important view of today's human genetics: not only are we learning more about the origins of common mutations, we are also discovering more and more rare ones. Ten years ago, few imagined that structural variants could be an important element of human variation. Now it appears that insertions and deletions of genes may account for an important fraction of phenotypic variation in humans.

Human genetic diversity named top breakthrough

This week's Science includes an article by Elizabeth Pennisi naming "Human Genetic Variation" as the science breakthrough of the year.

Less than a year ago, the big news was triangulating variation between us and our primate cousins to get a better handle on genetic changes along the evolutionary tree that led to humans. Now, we have moved from asking what in our DNA makes us human to striving to know what in my DNA makes me me.
Techniques that scan for hundreds of thousands of genetic differences at once are linking particular variations to particular traits and diseases in ways not possible before. Efforts to catalog and assess the effects of insertions and deletions in our DNA are showing that these changes are more common than expected and play important roles in how our genomes work--or don't work. By looking at variations in genes for hair and skin color and in the "speech" gene, we have also gained a better sense of how we are similar to and different from Neandertals.

This is a very broad story, encompassing distinct studies that really have little to do with each other. For example, the genetic association study of restless legs syndrome doesn't connect in any simple way to the study of selection on amylase copy number variants, though the article mentions both.

But what has actually changed in the last two to three years is the availability of large-scale genotyping with marker arrays. These arrays underlie the HapMap and have made genome-wide association studies possible. The article groups these methods with large-scale sequencing projects like ENCODE and the Personal Genome Project. These sequencing efforts haven't yet produced a clear picture of diversity, mainly because they haven't been around as long.

Probably the most important factor accelerating research is the public accessibility of these datasets. Once many people are working with the same kinds of data, tremendous new synergies become possible. Of course, that also raises a frightening specter for many people: if anyone can use the data generated by these projects, then people with all sorts of objectives will use them.

Hence, discovering and characterizing human genetic diversity is a double-edged sword for many people. Demonstrating significant health impacts of human diversity has the potential to bring truly individualized treatment of disease and reduction of risk. But recognizing that people are really different demands that we develop a more sophisticated approach to teaching human genetics. The "99 percent chimpanzee" and other factoids about human similarity are no longer sufficient in an age when anybody can scan freely accessible genomes.

In a related article, Jocelyn Kaiser covers the prospects for personal genomics:

A glimpse of one's genome is already within the reach of ordinary people, thanks to several companies. They include 23andMe, which has financing from Google and may let users link to others with shared traits; Navigenics, which will screen for about 20 medical conditions; and deCODE Genetics in Iceland, a pioneer in disease gene hunting. For $1000 to $2500, these companies will have consumers send in a saliva sample or cheek swab, then use "SNP chips" to scan their DNA for as many as 1 million markers. The companies will then match the results with the latest publications on traits, common diseases, and ancestry.

A million SNPs is enough to do a lot. Possibly not everything you would want to do, but almost certainly enough to cover every condition that the pharmaceutical industry could economically develop treatments or other approaches to address.

The articles don't make much of the evolution of all this human genetic variation, but as the field develops, the heavy hand of recent selection will show up in the results. Long, common associations are mostly there because of selection. These personal genomic approaches assume a lot about how human genetic diversity arose: the more that recent variants have been selected, the more useful "personal genomics" is likely to be.

Will the Watson "gotcha" moment bring down public genomics?

Another thing I didn't expect to see today: deCODE Genetics went looking through James Watson's genome sequence for evidence he is secretly black:

A new analysis of Dr. Watson's genome shows that he has 16 times the number of genes considered to be of African origin than the average white European does -- about the same amount of African DNA that would show up if one great-grandparent were African, said Kari Stefansson, the chief executive of deCODE Genetics of Iceland, which did the analysis.
...
Dr. Stefansson's company is one of several marketing genome scans that promise to reveal anyone's genetic propensities for disease, origins and more, for a price. Dr. Watson had already placed his own genome information online, as has another genetics pioneer, J. Craig Venter. Dr. Stefansson said he simply ran the data through his company's analytical system.
Dr. Stefansson said that because his company had not produced the original data, "I am reluctant, personally, to make much of the analysis." He added, however, that "on my face, it would elicit smiles."

I find this incredibly strange. Not that Watson may have a mixed ancestry -- ultimately, everyone's ancestry is mixed.

No, I find it strange that the leader of one of the major genetics firms in the world is cheerily showing one of the worst possible abuses of personal genomics, in the most high-profile way possible! I find it just flabbergasting.

Sure, you can argue that Watson deserves the abuse he's gotten, and that his genetic ancestry is legitimately related to the story about his comments on race. I don't agree, but there's a sense in which all this couldn't happen to a better person.

But the entire reason why many people think public genomics is a bad idea revolves around privacy and informed consent. People want to believe that their genes won't be used against them -- that information about risk alleles won't be used to deny employment or insurance, for example. Information about one's ancestry clearly falls in that category: most people want to keep such information private.

Informed consent is a problem in public genomics because your genes are not only yours; they are also the genes of your parents, children, and other relatives. When you make your gene sequence public, you also make public information about your kin, who may not want it out there. At present, they have no way of stopping you; they have to live with your decisions. This has already created ticklish situations: a number of anonymous sperm donors have been tracked down by their children, using DNA profiles of the donors' relatives available from genealogy testing services.

If you want to advance the field, then you want to find ways to build confidence that genomic data won't lead to these "gotcha" moments.

What, really, is the purpose of spreading a news story that Watson may have the equivalent of one African great-grandparent, without adding context about how common this degree of genetic mixture has been in American history in particular, and between populations generally? Why would a geneticist who works with humans not recognize the ethical problem? The story has exactly the same salacious quality as a story about a political candidate's ancestry; remember the story about former senator George Allen's Jewish mother? I can't believe that a credible researcher would want to bring this to genomics.

Maybe this is a play to discredit public genomics and advance the idea of some kind of data security system. Judging from Stefansson's quotes, it seems possible he is trying to make his company look good and alternatives, like George Church's sequencing project, look bad.

But somehow I doubt it was that closely thought out. Probably their zeal to "get" Watson carried them away, to the detriment of the field.

Wherein the New York Times says Hawks was right

Nearly two years ago now I wrote a column for Slate arguing that DNA genealogy tests were misleading people. Here's what I wrote:

From a practical point of view, that is the biggest problem with today's genetic genealogy tests. In many cases, they can't tell you what you don't already know. And unlike DNA fingerprinting tests with error rates of one in a billion or less, the chance of misidentifying ancestral groups in these genealogy tests may be 5 percent or higher. With this chance of error, the test won't be wrong about a full Native-American grandparent, but it might be wrong about a great-great grandparent. In addition, SNPs that separate central Africans from northern Europeans aren't nearly as good at separating Ethiopians from Arabs. So, in the test results of some African-Americans, European means Europe, while in others, it may mean East African, or Arab, or Indian. Depending on where his African ancestors came from, Gates' apparently European origins might lie somewhere else entirely.
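The contrast between a grandparent and a great-great-grandparent comes down to how quickly one ancestor's expected share of your genome shrinks: it halves every generation. This sketch treats the 5 percent misidentification figure as a rough noise floor, a simplification for illustration, and uses expected shares only; actual shares vary around these values because of recombination:

```python
# The column's "5 percent or higher" misidentification figure, treated here
# as a rough noise floor (an illustrative simplification).
ERROR_RATE = 0.05

ANCESTORS = {
    1: "parent",
    2: "grandparent",
    3: "great-grandparent",
    4: "great-great-grandparent",
}


def expected_contribution(generations_back: int) -> float:
    """Expected share of the genome from one ancestor: it halves each generation."""
    return 0.5 ** generations_back


for g, label in ANCESTORS.items():
    share = expected_contribution(g)
    ratio = share / ERROR_RATE
    print(f"{label:>24}: expected share {share:.4f}, {ratio:.2f}x the error rate")
```

A grandparent's expected share is five times that error rate; a great-great-grandparent's is barely above it.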

Now, here's what I find today in a Times article by Ron Nixon:

Mr. Gates says his concerns [about genealogical testing] date back to 2000, when a company told him his maternal ancestry could most likely be traced back to Egypt, probably to the Nubian ethnic group. Five years later, however, a test by a second company startled him. It concluded that his maternal ancestors were not Nubian or even African, but most likely European.
Why the completely different results? Mr. Gates said the first company never told him he had multiple genetic matches, most of them in Europe. "They told me what they thought I wanted to hear," Mr. Gates said.

This was entirely predictable from the samples and methods these companies are using. It's also a case where there are strong vested interests in giving people the results they want to hear. Nixon quotes Troy Duster:

"My concern is that the marketing is coming before the science," said Troy Duster, a professor of sociology at New York University who was an adviser on the Human Genome Project and an author of the Science editorial.
"People are making life-changing decisions based on these tests and may not be aware of the limitations," he added. "While I don't think any of the companies are deliberately misleading customers, they may have a financial incentive to tell people what they want to hear."

Don't make life-changing decisions based on these tests! They can't tell you what tribe your ancestors came from. Period. Mitochondrial lineages have widespread distributions across Africa, and are not -- in most cases -- limited to any small region. That's the science.

In the meantime, I have been hearing from a number of readers who paid to take part in the Genographic Project and are dissatisfied. I'm collecting their stories, as well as stories from people who are happy with their results. Feel free to send me an e-mail if you're in either group.

DNA tests split immigrant families

I missed this story about immigration and DNA testing when it was printed earlier this year. It follows immigrants who took DNA tests to confirm their relationships with family members outside the U.S., as part of their efforts to bring those relatives into the country. I found the article, linked at Eye on DNA, really heartbreaking:

For Isaac Owusu, a widower, the revelation has forced him to rethink nearly everything he had taken for granted about his life and his family.
It has left him struggling to accept what was once unthinkable: that his deceased wife had long been unfaithful; that the children he loves are not his own; and that his long efforts to reunite his family in this country may have been in vain.
The State Department let his oldest son, now 23, come to the United States last fall, but said the others -- a 19-year-old and 17-year-old twins -- could not come because they are not biologically related to him.

The article claims that such non-matches among immigrants who undergo testing are very common:

But Mary K. Mount, a DNA testing expert for the A.A.B.B. -- formerly known as the American Association of Blood Banks -- estimates that about 75,000 of the 390,000 DNA cases that involved families in 2004 were immigration cases. Of those, she estimates, 15 percent to 20 percent do not produce a match.
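Those estimates translate into rough absolute numbers. A back-of-the-envelope calculation, using only the figures in the quote above:

```python
immigration_cases = 75_000   # estimated immigration cases in 2004
all_family_cases = 390_000   # estimated DNA cases involving families in 2004
no_match_low, no_match_high = 0.15, 0.20  # estimated share producing no match

share_immigration = immigration_cases / all_family_cases
low = immigration_cases * no_match_low
high = immigration_cases * no_match_high

print(f"immigration share of family cases: {share_immigration:.1%}")  # → 19.2%
print(f"immigration cases with no match: {low:,.0f} to {high:,.0f}")  # → 11,250 to 15,000
```

So if the estimates hold, something like 11,000 to 15,000 immigration cases in 2004 produced no match.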

Some of those non-matches are explained by women who were raped as refugees; others are the usual story: men who were always sure they were the father, except they weren't.

Immigrants are not required to take these DNA tests, and negative results do not preclude family members from entering the country; adoption is one solution. But the stories are poignant, as people discover they are not always who they thought they were.

Full frontal genomes

Erika Check's Nature article on celebrity genomes includes a passage in which Francis Collins points out a problem with making private genomes publicly accessible:

But it's not clear that all of the genome pioneers are acting altruistically. Watson said at the Cold Spring Harbor meeting on 10 May that he has not asked either of his grown sons for permission to publish his genome sequence, which 454 has said will be publicly posted in some form. That has raised questions about the responsibility of sequenced individuals to family members who share their DNA.
"This will be a challenging question, because if you're planning to put this information in a truly open database, there are issues of risk not just to you, but to your relatives," Collins says. "Jim clearly felt those risks were not such as to cause him to take action on them."

Putting your genome information online is not only about you: it also reveals half the genome of each of your children, half of each of your parents' genomes, a quarter of the genomes of your grandchildren, nieces, and nephews, and so on.
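Those fractions follow from the standard path-counting rule for the coefficient of relationship (named here for illustration; the post doesn't invoke it explicitly): each meiosis separating two relatives halves the expected shared fraction, and the contributions add up across every independent path linking them. A minimal Python sketch, assuming no inbreeding; the relationship table is illustrative:

```python
from fractions import Fraction

# Meiosis counts along each independent path of descent linking you to a
# relative (assumes no inbreeding, so each common ancestor counts once).
RELATIONSHIP_PATHS = {
    "parent": [1],
    "child": [1],
    "full sibling": [2, 2],        # via your mother and via your father
    "grandchild": [2],
    "niece or nephew": [3, 3],     # via each of your parents
    "great-grandparent": [3],
    "first cousin": [4, 4],        # via each shared grandparent
}


def coefficient_of_relationship(path_lengths: list[int]) -> Fraction:
    """Sum (1/2)**n over every path linking two relatives."""
    return sum((Fraction(1, 2) ** n for n in path_lengths), Fraction(0))


for relative, paths in RELATIONSHIP_PATHS.items():
    print(f"{relative}: {coefficient_of_relationship(paths)}")
```

Nieces and nephews come out at 1/4, the same as grandchildren, because they connect to you through both of your parents.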

I wrote about this problem two years ago, linking to a New Scientist article that described how a young man had tracked down his biological father -- using DNA samples put online by the man's relatives.

The boy paid FamilyTreeDNA.com $289 for the service. His genetic father had never supplied his DNA to the site, but all that was needed was for someone in the same paternal line to be on file. After nine months of waiting and having agreed to have his contact details available to other clients, the boy was contacted by two men with Y chromosomes closely matching his own. The two did not know each other, but the similarity between their Y chromosomes suggested there was a 50 per cent chance that all three had the same father, grandfather or great-grandfather.

OK, so this particular situation must be pretty rare. But it is a good example of a case where a parent and child may have divergent interests with respect to genetic information. On the obvious level, the son wants to discover his father's identity while the father may want to conceal it. On a less obvious level, a grandfather may want to find children his son may have fathered, regardless of the son's wishes. The father might even be dead and might have specified in his will that all his sperm donations remain private, yet a grandfather can easily circumvent those wishes simply by publicizing his own DNA profile.

Families with inherited genetic conditions are already dealing with these privacy issues, such as mothers who don't want a Huntington's test and daughters who get it anyway, revealing the mother's status (see my post earlier this year on Amy Harmon's NY Times article). Whole-genome scans for most people will not reveal the same tragic level of risk, but will generate hundreds of smaller questions -- like a load of tiny skeletons in the closet.

This week in Science, Collins and coauthor William Lowrance expand on the problem. Their "Policy Forum" article notes existing U.S. federal law and regulations concerning personal data and the problems that genomic information is likely to generate in the current legal context.

Until recently, most genomic research used data and biospecimens obtained fairly directly, from the data subjects themselves or clinical repositories or specialized research collections. This will continue, as it has many advantages. But now, in efforts to increase the range and quantity of data, large-scale research platforms are being built that assemble, organize, and store data, and sometimes biospecimens, and then distribute these to researchers (see figure). The advantages of such platforms, in addition to scale, are that they can be a robust staging-point for screening data quality, fostering uniformity of data format, and facilitating analysis. Some platforms accumulate data directly (as the Framingham Heart Study does); others assemble them from a variety of sources (as The Cancer Genome Atlas, the Genetic Association Information Network, and the Wellcome Trust Case Control Consortium do and U.K. Biobank will) (7). Among the design and governance issues are whether and how to de-identify the data and at what stages to conduct scientific and ethics review.
These new data flows, genomewide analyses, and novel arrangements such as the Informed Cohort scheme recently proposed by Kohane et al. (8) are relatively uncharted territory with respect to human subjects and privacy considerations. Precedent doesn't provide sufficient guidance. For example, the Human Genome and HapMap Projects have genotyped DNA from only a few hundred carefully selected people who prospectively consented to the analysis and to open publication after thorough explanation, discussion, and community consultation. The projects have been scrutinized closely all along. But when the data relate to more people (by orders of magnitude) or to retrospective analysis of biospecimens, then for pragmatic reasons such painstaking selection, consent negotiation, and scrutiny can't generally be achieved (Lowrance and Collins 2007:600).

The article does not really arrive at any conclusions about what should be done -- Lowrance and Collins limit themselves to a fairly dry listing of potential problems and conditions leading to them. Throughout, they emphasize the reliance of the current regulations on "de-identification" -- that is, the removal of most identifying information from sequences or samples. Under today's U.S. guidelines, data that have had identifying information removed may be used quite broadly without further consideration of human subjects protections:

Construal of genomic "human subject." If data have been de-identified but include large amounts of genetic information, are the individuals still considered "human subjects"? The answer has important implications for consent, ethics review, and safeguards. McGuire and Gibbs have urged that "genomic sequencing studies should be recognized as human-subjects research and brought unambiguously under the protection of existing federal legislation" (22), but this could be unnecessarily extreme. In the United States, the Office of Human Research Protections considers that data or biospecimens collected for one purpose but then key-coded and used secondarily for research are not "individually identifiable," and therefore the research is not human-subjects research (7). This is a strong incentive to support de-identification and to de-identify data (Lowrance and Collins 2007:602).

Lowrance and Collins mention that "de-identification" is by no means simple when applied to substantial parts of genomes, particularly when the genomic data are accompanied by phenotypic data such as redacted medical histories. Routine data-mining techniques would be sufficient to identify individuals within medical research studies; an individual genome profile may be matched to a name without any need for a "key" if the information is unique enough.
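How unique is "unique enough"? A minimal sketch, assuming independent biallelic SNPs in Hardy-Weinberg proportions, each with allele frequency 0.5, computes the chance that an unrelated stranger matches a genotype profile purely by coincidence. Real markers are neither fully independent nor all this common, and relatives match far more often than strangers, so treat this as an illustration of scale only. The world-population figure is a rough round number for 2007, not a value from the text.

```python
def genotype_match_prob(p):
    """Probability that two unrelated people share a genotype at one
    biallelic SNP with allele frequency p (Hardy-Weinberg assumed)."""
    q = 1.0 - p
    freqs = (p * p, 2 * p * q, q * q)  # genotypes AA, Aa, aa
    return sum(f * f for f in freqs)

def profile_match_prob(p, n_snps):
    """Chance that a random unrelated person matches a profile at all
    n_snps loci, assuming the loci are independent."""
    return genotype_match_prob(p) ** n_snps

WORLD_POPULATION = 6.6e9  # rough 2007 figure, used only for scale

for n in (10, 20, 30, 40):
    expected = WORLD_POPULATION * profile_match_prob(0.5, n)
    print(f"{n} SNPs: {expected:.3g} expected chance matches worldwide")
```

Under these assumptions the expected number of coincidental matches in the whole world falls below one somewhere between 20 and 30 SNPs, a tiny fraction of what a modern genotyping chip reads.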

I favor the protection of individual privacy over broader access to research data, particularly since DNA sampling and data retention by government agencies have become increasingly routine. In a post directly before her Personal Genome Project Q&A, Hsien-Hsien Lei wrote "Police want to collect abandoned DNA from everyone," noting that UK police will soon have authority to collect discarded DNA, which will have the same legal standing as trash -- if you throw it away, it's not private. We have to assume that governments will keep multiple databases of DNA barcodes for people, that these will include other personal information, and that they will be insecure. One may argue that most of the privacy threat actually comes from these other databases, and that personal genome information adds relatively little. Nevertheless, it would be better to add nothing at all, or to develop new models that emphasize security.

Since I've been thinking about information theory a lot lately, I can't help but suspect that some kind of cryptographic solution should be applied, so that nobody can read a person's sequence data without her private key. A person might choose to opt in to research studies or other projects that require genotyping data, but the sequence would still be secured by encryption.

The objection to such an approach is that large-scale, long-term studies of health attributes require samples of many thousands -- even tens or hundreds of thousands -- of people. Today, these datasets are routinely deindividualized and dispersed around the world to researchers involved with many different projects. There is little chance of centralized control over this information after it is dispersed, and Lowrance and Collins describe the potential problems with changing the system. With so many participants, the genotype data are a tempting target for black-hats. In any very large-scale study in which hundreds of researchers have access to deindividualized data, there are many chances for unscrupulous researchers to steal information or to leave it exposed to theft by outsiders.

But practices can be implemented to reduce the risk of data loss or theft. For one thing, the main reason those studies need so many participants is that they are waiting for rare adverse health events to occur, and the researchers don't want to wait too long for results. So they really only need genotype data for the small group of people who develop these conditions. If decryption is restricted to such small groups of study participants, the risk of unauthorized data access would be greatly reduced.
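A rough sketch of that arithmetic, using a hypothetical cohort of 100,000 people and a hypothetical 0.5 percent incidence (neither figure comes from the text), shows how many records would ever need to be decrypted under each policy. It ignores the controls that a real case-control analysis would also need, so it overstates the reduction.

```python
def records_exposed(cohort_size, incidence, case_only):
    """Number of genotype records a study must decrypt.

    Full access decrypts every participant; case-only access decrypts
    just the participants who developed the condition under study."""
    if case_only:
        return round(cohort_size * incidence)
    return cohort_size

# Hypothetical cohort and incidence, for illustration only.
COHORT = 100_000
INCIDENCE = 0.005

full = records_exposed(COHORT, INCIDENCE, case_only=False)
cases = records_exposed(COHORT, INCIDENCE, case_only=True)
print(f"full access: {full} records; case-only: {cases} records "
      f"({full // cases}x fewer)")
```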

No system is perfectly safe, but in this case the agglomeration of data from thousands or millions of individuals in single databases leads to risks that scale nonlinearly with database size. So reducing the size of data chunks available to any one person may be a significant protective step.

References:

Lowrance WW, Collins FS. 2007. Identifiability in genomic research. Science 317:600-602. doi:10.1126/science.1147699

Check E. 2007. Celebrity genomes alarm researchers. Nature 447:358-359. doi:10.1038/447358a

It's nada until they have Larry King

Back in May, Nature ran an article (non-free) titled "Celebrity genomes alarm researchers," by Erika Check. The article's premise:

Genome researchers are questioning the plans of some of their number to stage high-profile releases of their very own genome sequences.

The article lumped together at least four distinct sequencing efforts: Venter's sequencing of his own genome, 454 Life Sciences' sequencing of James Watson's genome, the "privately funded" Personal Genome Project, and the Archon Genomics X Prize. The first two are already complete; the others are still ongoing, and there has been relatively little news about their progress.

The "freakshow" aspect of Check's Nature article was supplied by the Archon X Prize. This $10 million award is only a proof-of-technology test: sequence 100 genomes in 10 days, for less than $10,000 per genome, and you win the prize. But that's really too dry to make headlines -- nothing so interesting as the spaceflights that won the Ansari X Prize, so they've instituted a super bonus round:

The prizewinner can claim a $1-million bonus by sequencing a list of 100 individuals, including people nominated by disease advocacy groups, and celebrities such as television journalist Larry King, cosmologist Stephen Hawking, Google co-founder Larry Page, Microsoft co-founder Paul Allen and former junk-bond trader Michael Milken.

Will the winning group go for the bonus? Who knows? If their technology hits the $10,000-per-genome price point, the $1 million bonus just pays for the 100 bonus genomes. So it's not exactly a bonus. Still, a company that saw the headlines 454 got for delivering Watson's DNA on DVD will probably salivate at the chance to do the same for Stephen Hawking.

Michael Milken, not so much.

But it's obvious that when complete genome sequencing first becomes available, rich people will be among the first to get it. And since many rich people are also famous, we'll be hearing about the rich and famous. But we won't be hearing about them too soon, because it will be a while before the technology reaches the X Prize level.

Which leaves us with the more interesting project in the short term -- the Personal Genome Project (PGP). This has its detractors also, because of the decision to sample well-known geneticists as volunteers, instead of anonymous donors or, well, non-geneticists.

Check lumped this criticism together with the celebrity angle -- one of the reasons I didn't link to her article at the time. For instance, the article included a quote from Michael Ashburner that clearly applies to the X Prize:

"I'd hate the availability of single-genome sequencing to be based purely on money and fame," says Michael Ashburner, a geneticist at the University of Cambridge, UK. "Just doing famous or very rich people is bloody tacky, actually."

Meanwhile, a quote from Francis Collins appears to be directed toward the PGP:

"If all the sequences obtained over the next year or two are done on scientists with strong financial positions, that will send a message quite contrary to what the genome project aimed to achieve," says Francis Collins, head of the US National Human Genome Research Institute (NHGRI) in Bethesda, Maryland.

That's confusing. There seems to be a general feeling that it's unseemly not to sample ordinary people, since the hope is that everyone will benefit from genomics; but disdain toward celebrity sequencing only applies to a small part of the overall situation.

Plus, Collins is concerned with a policy question himself, since the NHGRI is going to sample its own set of 100 people:

The NHGRI is now planning to sequence about 100 individual genomes at its three publicly funded sequencing centres over the next couple of years. Collins says the institute will ask for scientific advice on who should be sequenced first. One question is what pool of sequenced individuals will yield the most useful information.

So there are at least two directly competing whole-genome sequencing projects going on right now, with a large prize waiting for the first private company that can claim it by sequencing DNA fast enough and cheaply enough.

The volunteers

So, why did I choose to write about this now? This week, the Personal Genome Project announced its first 10 sequencing volunteers. Nine of them are listed along with their bios on Blaine Bettinger's Genetic Genealogist blog. One volunteer chose not to be listed publicly.

These are not celebrities. It is probable that if you're not a geneticist, you haven't heard of any of them. On the other hand, they are all accomplished people with substantial resumes -- some academic, many in business.

Esther Dyson went public with an op-ed before this week's announcement, listing her reasons for volunteering:

But what about the people who are less fortunate than me? I want to push questions about those less lucky to the fore -- and get us all to think about them. It's not just who gets health care and how it gets paid for, or whether employers can discriminate against people with certain conditions or just a greater-than-average propensity for them. What of someone who has a particular susceptibility to, say, alcohol? Does he pay an extra tax on booze? Or does he get a tax credit for behaving well, while a less susceptible person is denied the opportunity to benefit by behaving "properly"? (Subsidies and penalties cut both ways.) Should people have the right to refuse subsidized medical care and live as they wish? These questions may sound far-fetched, but they won't be once society knows enough information to start asking them.

From her description, it appears that the volunteers are not only donors but also stakeholders in the project -- in terms of directing its handling of results and protocols. Project leader George Church gave an interview last year to MIT Technology Review in which he discussed his ideas at the beginning of the project:

TR: Are you recruiting participants for the pilot project? Who will be the pioneers?
GC: It took a year for us to get permission for the project from our institutional review board. The recruiting process will go in stages. The board asked that I start with myself because I am well-informed and could stop the project if I saw a problem. We will expand to two more people in March; and once we've worked out a mechanism to show that the benefits outweigh risks for the first three people, we can recruit more people. We have 140 people who would like to participate. The total number of participants [at this phase] will be limited by funds and by the review board's assessment of how it went. We are trying to get funds for a large number of people.
The initial participants will probably be tenured human geneticists, because they know the risks and other issues. Eventually, we want a broad, diverse set of people from different social and economic groups, and both healthy and unhealthy people. But they will need to be specifically up to speed on how genetics works. This could be something very big once people tune into it. Not many people know about it so far.

Hsien-Hsien Lei has been following this story, and she has given some reasons why geneticists may be the best subjects for the initial project:

I don’t look upon the PGP-10 as people of privilege who got access to something that everyone wants but few people get like iPhones. They are actually guinea pigs doing something that few of us dare! Those commenting on the PGP-10’s money and fame come off green with jealousy. In their world, whole genome sequencing might be something of great value, but a general population survey will surely find more fear than desire.

It is clear from Church's description that there is really no alternative to volunteers with substantial genetics training, because informed consent on a project of this scale is extravagantly difficult to demonstrate. It is essential to the project that the subjects be public, because otherwise they cannot truly assess the risks of public genome information.

But the skeptic in me has to point out that these volunteers are not only trained in genetics; almost all of them are also poised to profit if personal genomics takes off. Many are investors in or founders of companies in the new field. Those who aren't could become so at any time they choose. And some of them occupy academic positions with substantial power to influence potential critics. So collectively, they have a level of safety that other people typically lack, as well as a strong pecuniary interest in the project's success.

Kind of like that dude from Blade Runner with the Coke-bottle glasses. Which is not the best image for your friendly personal genome project....

I don't think it matters a bit if the first public genomes are all famous people. I mean, we've been looking at Venter's sequence for quite a while now. Heck, if we could get the genomes of all the Hollywood tabloid starlets, we could probably do some good by identifying genes that make them have unusual affinities for teeny-weeny dogs.

But if Paris Hilton and Ivanka Trump went to Las Vegas to help Steve Wynn with a secret project involving hotel design, we would probably figure their interests were not purely altruistic.

So, I actually think it will be a little comforting to see them churning out real celebrity genomes, because it will mean that the project is already successful. I assume that Oprah will be out there first -- I mean, not only was she early on the whole-body scan bandwagon, but she has already had her DNA taken for ancestry testing.

Hey, she can afford it now...maybe it's already on her fall TV schedule!

That would make the whole thing a write-off.


Doggie doo DNA detectives

A few months ago, a particularly egregious neighbor dog left a gift on our lawn -- while my fascinated girls watched out the window. Naturally, I ran outside, shooed off the dog, used a plastic bag to pick up the steaming pile, and knocked on the neighbors' door. I think I interrupted the neighbor kids at their PlayStation or something; in any event, the visits from the big brown dog abated for a while.

Now, Hsien-Hsien Lei tells me that technology may help with future dog-related problems:

Perhaps the animal control officers in Port Phillip, Australia would be able to help me out. They're being provided DNA kits for cases where a dog has attacked a human or a pet. They'll be collecting DNA evidence from fur, saliva, blood, and excrement. In 2004, the first Australian animal mauling case to use DNA evidence resulted in two dogs being destroyed for killing a Pomeranian. Their owner was fined $7,244.

Yes, it sounds more serious when you think of attacks or bites. No, I don't suppose too many people will spring for a DNA test on a fecal sample. But those poochie pyramids make me pretty irate -- and we have a toddler running around the yard who treats the local pine cones and rocks like a tasting bar!

Before we knew where our dog pest lived, we had to do the Nancy Drew thing to find out -- we just followed it one day. But now, a less spry person with a little cash to spare could just stock up on some tissue collection darts and set up a blind.

Oh, so what? That's not how you hunt dogs?


Whose genes are doped for Beijing?

Gretchen Reynolds reports in the NY Times on the gene therapy treatment Repoxygen as a means of athletic enhancement:

It was a single line from a longer e-mail message. But when read into the record by prosecutors at the drug trial last year of the German track coach Thomas Springstein, it caused a sensation. "The new Repoxygen is hard to get," Springstein had written. "Please give me new instructions soon so that I can order the product before Christmas."
Until that day in the courtroom, Repoxygen was an obscure gene-therapy drug developed at a pharmaceutical lab in Oxford, England, to fight anemia. The lab shelved the product when it seemed unlikely to be profitable. Once it was mentioned in court in January 2006, however, Repoxygen vaulted to celebrity-drug status in Europe. Newspapers and Web sites ran dozens of stories about the imminent danger of the therapy. "The moment that e-mail was presented in open court," a columnist wrote in the weekend paper Scotland on Sunday, was when the "era of genetic doping . . . arrived."

I wrote about gene doping late last year, noting that the advent of these methods is essentially inevitable.


$10,000 genomes? Don't get sick.

This is from the Nicholas Wade article on James Watson's genome:

Some scientists believe that it will be medically useful to sequence patients' genomes when the cost of sequencing falls to around $10,000 or less. Dr. Egholm said that with improvements already under way, the 454 sequencing machine would soon be able to sequence a human genome for $100,000. The cost of sequencing has been dropping so fast in the hands of groups like 454 Life Sciences and Solexa Inc., a subsidiary of Illumina Inc., that some technologists predict the $10,000 genome will be attained in a few years.

Doesn't $10,000 seem like an interesting price point? I've written a couple of times about the idea of $1000 whole-genome sequencing. Here's what I wrote last year:

My question is, why are they shooting for $1000? It seems to me that if you can go from $2.2 million to $1000, it won't take very much longer to go to $100, or even less. The materials cost and computational resources certainly won't cost that much in volume.
They are framing the cost in terms of the cost of a personal computer, but it wasn't so long ago that the "accepted" cost of a PC was over $3000, and now most buyers spend a lot less than $1000. So that's arbitrary too.
My guess is that the magic $1000 figure that keeps getting quoted is an attempt to prime insurers to expect that billing amount when the process becomes common. The question is not how much you would pay for a genome, but how much an insurance company would pay on your behalf. A lot of diagnostic procedures approach that billing amount, so it is a convenient pricing hook.
If I'm right, then you can place the $1000 genome in the same category as MRI scans and X-rays, neither of which is priced at what it is worth in materials or energy, but in terms of amortization of equipment and expert interpretation.
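One way to check the claim that $100 would follow quickly after $1000 is a sketch that assumes sequencing costs fall by a constant fraction per unit of time (an exponential decline, which is an assumption and not something stated above). Under that assumption, elapsed time is proportional to the number of cost halvings, and the base-2 logarithm of the cost ratio counts them.

```python
import math

def halvings(start_cost, end_cost):
    """Number of successive halvings needed to fall from start_cost to end_cost."""
    return math.log2(start_cost / end_cost)

so_far = halvings(2_200_000, 1_000)  # the $2.2 million -> $1000 drop in the text
next_step = halvings(1_000, 100)     # going on from $1000 to $100

print(f"$2.2M -> $1000: {so_far:.1f} halvings")
print(f"$1000 -> $100: {next_step:.1f} halvings")
```

About 11 halvings get you from $2.2 million to $1000, while only about 3.3 more get you to $100, so under a steady exponential decline the last step would take less than a third as long as the first.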

But $1000 won't be practical for quite a long while. So what are the implications of the $10,000 genome?

Remember that most young people just don't care very much about their genomes. Here's what I found when I asked my students in 2005:

The results: only two would pay more than the price of a CD, around $16.00. Most didn't want the information at all --- they didn't see what possible use it could have for them.

In contrast to my undergraduates, an insurer would probably find a $1000 genome pretty useful, particularly for current customers. That's too expensive for screening potential clients, but well within the price range of procedures that they normally cover.

$10,000, on the other hand, is not in the usual range of diagnostic procedures. We have to think about what would make a genome worth that much more than a typical diagnostic procedure. It seems pretty obvious that you would only pay that much for a procedure if it had the potential of preventing something much more expensive. But for this much money, the potential can't be merely long-term; it must be immediate.

So we are looking for medical bills far in excess of $10,000 that would be prevented by a genome sequence. Read that carefully: bills that would be prevented, not diseases that could be cured.

It seems like the main application of a $10,000 genome sequence would be to prevent people from having expensive surgeries, particularly transplants.

Suppose you are an insurer that might normally approve a $250,000 transplant surgery, with a 30 percent failure rate. For $10,000, suppose you could gain some better prediction of long-term survival or organ rejection rates. So you implement a required whole-genome screening before approving surgery, and require that the patient's genetic "risk" factors don't exceed some threshold. If you eliminate 10 percent of surgeries, your savings amount to 250 percent of your investment in whole-genome sequencing -- assuming the cost of care without surgery approximates that after failed surgeries.
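The insurer's arithmetic can be written out as a short sketch. The function and parameter names are illustrative; the dollar figures and the 10 percent come from the scenario above, and the sketch builds in the same assumption that each avoided surgery saves its full price because follow-up care costs about the same either way.

```python
def screening_return(n_patients, genome_cost, surgery_cost, fraction_avoided):
    """Ratio of surgery costs avoided to money spent on screening.

    Assumes every candidate is sequenced and that care for a patient
    denied surgery costs about the same as care after a failed surgery,
    so each avoided surgery saves its full price."""
    spent = n_patients * genome_cost
    saved = n_patients * fraction_avoided * surgery_cost
    return saved / spent

ratio = screening_return(n_patients=100, genome_cost=10_000,
                         surgery_cost=250_000, fraction_avoided=0.10)
print(f"savings are {ratio:.0%} of the screening cost "
      f"(net gain {ratio - 1:.0%})")
```

The number of patients cancels out: the ratio is just the avoided fraction times the surgery price divided by the genome price, so the case for screening depends entirely on how many surgeries it actually prevents.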

Now, I have my doubts about whether this scenario will come to pass. For one thing, SNP screening is going to be a lot cheaper than whole-genome sequencing for a long time, and probably will be just as informative. The benefit of the whole-genome sequence -- that it finds the rare variants that no one else has -- also makes it much less medically useful, since nobody knows what your unique rare variants actually do.


Start your WatsonVenter chimera now

Nicholas Wade writes about the sequencing of James Watson's genome:

A copy of his genome, recorded on two DVDs, was presented to Dr. Watson yesterday in a ceremony in Houston by Richard A. Gibbs, director of the Human Genome Sequencing Center at the Baylor College of Medicine, and by Jonathan M. Rothberg, founder of the company 454 Life Sciences.
"I am thrilled to see my genome," Dr. Watson said.

...which is approximately the same size and shape as the boxed set of Ishtar!

This bit is very interesting:

Some 3.5 percent of Dr. Watson's genome could not be matched to the reference genome. One reason may be that the project scientists had to amplify human DNA by growing it in bacteria and may have lost many regions of human DNA that are toxic to bacteria, said Dr. Egholm, 454's vice president for research. The 454 sequencer skips the bacteria stage entirely and is free of this source of bias.

I wonder if it's true. Then there's this:

Dr. Venter said his new genome had been assembled from scratch. There were many more differences than he had expected, including in single units of DNA that were extra or absent. "It's clear we have grossly underestimated the extent of human variation," he said.

Both sequences are diploid, so you can look at variations between the two homologous chromosome sequences across the entire genome. This is actually a pretty good way to sample human variation: a single diploid genome contains a large (and predictable, depending on history) fraction of the total variation in the human species. Of course, lots of sequences together include even more information about variability. It's hard to tell exactly what Venter means here (he is also shopping around an article about his genome to journals, so
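Why one diploid genome captures so much variation can be seen from a small sketch, assuming Hardy-Weinberg proportions: it gives the probability that a single person carries at least one copy of a variant at population frequency p. Common variants are very likely to show up in any one genome, while rare ones usually do not. The frequencies in the loop are arbitrary examples.

```python
def carried_by_one_person(p):
    """Probability that one diploid genome carries at least one copy of
    a variant with population frequency p (Hardy-Weinberg assumed)."""
    return 1.0 - (1.0 - p) ** 2

# Arbitrary example frequencies, from common to rare.
for p in (0.5, 0.1, 0.01, 0.001):
    print(f"allele frequency {p}: carried with probability "
          f"{carried_by_one_person(p):.4f}")
```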

There's some of the usual hand-wringing about how "someone" might misuse the now-public information:

Amy L. McGuire, a medical ethicist at the Baylor College of Medicine who was involved in the Watson sequencing project, said Dr. Watson and Dr. Venter were following the medical tradition of making oneself the first subject of a new experiment and would incur unknown risks.
"I think that both have been motivated by their commitment to the science and genomic medicine and advancing the field," Dr. McGuire said.

There's not much chance that anyone will do anything nefarious directly to Watson or Venter. What are they going to do? Send Jason Bourne out to Venter's boat to administer a Venter-specific toxin?

But then, there are the non-directly-nefarious-but-still-kinda-shady things.

Like, what if their genomes became a template for genetic alterations of other individuals or species? That is, suppose you wanted to add a garden-variety human gene to something -- like the insulin gene into a breed of corn. Now, that gene sequence has to come from somebody -- and you're most likely to pick it straight out of GenBank, so it's somebody anonymous. And the gene is probably functionally just like everyone else's, so the only things that are at all "unique" or "distinguishing" about it are the silent nucleotides and introns, which may or may not be just like anyone else's.

Now, suppose you want to add a set of human genes -- say, six or seven. And you think the genes might interact with each other in some way. To avoid unwanted interactions, you might pick all the genes from one individual, where you know that they didn't interact badly. Sure, you will test them in various models to make sure. But it's easy to pick all the genes from the same person. Make it Watson.

Heck, maybe you're cloning your child but want to correct a few genetic issues you don't care for. Or maybe you noticed what Venter said about variation, and want to plug in a few copy number variants that your genome is missing. Fill in the gaps with a few Watson genes. He can be the 0.1 percent daddy.

Oh, and how about those people trying to grow meat in a lab? Mmmmm... Venterlicious!


Filling in the blanks

AP reporter Matt Crenson has a story on the "twisted path" of one man's DNA-aided search for his biological father.

Nobody ever told [Martin] Marshall how to approach people to ask for their DNA. Nobody ever explained how to tell a complete stranger that maybe, just possibly, the man who raised him — the man who played catch with him in the yard, who taught him to drive, who sent him off to war and welcomed him home — may have cheated on his mother.
"What are the procedures," Marshall asks. "Where's the handbook for how you go about doing this kind of research?"

The story goes through Marshall's search for his father from the very beginning, leading down several dead-end trails and failed attempts at DNA matches. The last one is the most poignant -- both because he tries the hardest to make it happen, and because of the result.

I linked to a similar story earlier this month. A few people e-mailed me, letting me know that reporters had been frequenting genealogy mailing lists looking for stories -- which aren't, of course, characteristic of most people's experiences.

What I like about the current story is that it illustrates both positives and negatives. Marshall received helpful advice and cooperation from genealogy groups and many possible relatives, but at the same time became so attached to his quest that one possible relative felt he was being "stalked". The story gives enough detail to understand both Marshall's point of view and the opposite perspective.


Modern vampires of genealogy

This is a great story by Amy Harmon in the NY Times:

Stalking Strangers' DNA to Fill in the Family Tree
They swab the cheeks of strangers and pluck hairs from corpses. They travel hundreds of miles to entice their suspects with an old photograph, or sometimes a free drink. Cooperation is preferred, but not necessarily required to achieve their ends.
If the amateur genealogists of the DNA era bear a certain resemblance to members of a "CSI" team, they make no apologies. Prompted by the advent of inexpensive genetic testing, they are tracing their family trees with a vengeance heretofore unknown.

You have to read these stories to believe them -- people stalking possible distant relatives to collect their DNA from discarded cups, scheduling DNA collection visits to older relatives with dementia, and plucking hairs from corpses at funerals.

My mom's here visiting, and she does a lot of amateur genealogy. She says, "There are a lot of nuts out there!"

(via Dienekes)
