Legacy of a candidate gene and replication in genomics

During the 1990s and early 2000s, many human geneticists and other scientists (especially psychologists) tried to study the genetics of human traits by following a candidate gene approach. In this approach, researchers studying a phenotype identified a genetic polymorphism and tested it within a sample of individuals to see whether it correlated with their phenotype.

The criteria for identifying a polymorphism as a “candidate gene” varied from study to study. The single most widespread criterion was that the polymorphism had to be easy to genotype using early 1990s-era genetic approaches. Length polymorphisms and microsatellite (STR) loci were especially common as polymorphic markers. Later, when SNP microarrays became cheaper and more reliable, many researchers continued to rely on the length polymorphisms and STR markers so that their results could be compared with the older literature. To do this with array data, researchers worked to find which SNP haplotypes were linked to each length polymorphism, which enabled them to impute length-polymorphism alleles from SNP genotypes.
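The imputation step can be pictured as a lookup table built from a reference panel in which both the SNPs and the length polymorphism were typed directly. The sketch below is a deliberately simplified illustration, not any particular group's method: it assumes phased haplotypes, three hypothetical flanking SNPs, and an invented panel of short (S) and long (L) alleles, and it simply reports the length allele most often seen on a given SNP haplotype.

```python
from collections import Counter, defaultdict

# Hypothetical reference panel: each entry is one phased haplotype, given as
# (alleles at three flanking SNPs, length-polymorphism allele typed directly).
# These values are invented for illustration; they are not real 5-HTTLPR data.
REFERENCE_PANEL = [
    (("A", "G", "T"), "L"),
    (("A", "G", "T"), "L"),
    (("A", "G", "T"), "S"),
    (("C", "G", "T"), "S"),
    (("C", "G", "T"), "S"),
    (("C", "A", "T"), "S"),
    (("A", "A", "C"), "L"),
]


def build_lookup(panel):
    """Count the length alleles observed on each SNP haplotype in the panel."""
    counts = defaultdict(Counter)
    for snp_haplotype, length_allele in panel:
        counts[snp_haplotype][length_allele] += 1
    return counts


def impute(snp_haplotype, lookup):
    """Return (most frequent length allele, its frequency on this haplotype),
    or (None, 0.0) if the haplotype never appears in the reference panel."""
    observed = lookup.get(snp_haplotype)  # .get does not create a new entry
    if not observed:
        return None, 0.0
    allele, n = observed.most_common(1)[0]
    return allele, n / sum(observed.values())


lookup = build_lookup(REFERENCE_PANEL)
print(impute(("C", "G", "T"), lookup))  # ('S', 1.0)
print(impute(("A", "G", "T"), lookup))  # 'L', with frequency 2/3
```

The second call shows the weak point of this approach: when the same SNP haplotype carries both length alleles in the reference panel, the imputed call is only probabilistic.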

Ideally, researchers hoped to identify genes with protein products that had a plausible biochemical connection to a trait. The idea was that biochemistry and cell biology could identify networks of genes that had a structural or regulatory role in generating a phenotype, and that systematic investigation of the variation in those specific genes would enable researchers to discover the genetic causes of variation in the phenotype.

One of the most famous of the “candidate gene” length polymorphisms for psychological and behavioral phenotypes is 5-HTTLPR, a length polymorphism that lies in or near the 5’ promoter region of the gene SLC6A4. Serotonin, known in the biochemical literature as 5-HT (5-hydroxytryptamine), is transported into neurons by the protein that SLC6A4 encodes, so the serotonin transporter protein became known as the 5-HT transporter, or 5-HTT.

In the mid-1990s, Armin Heils and coworkers discovered a tandem-repeat polymorphism in the promoter region of 5-HTT and showed that its alleles were associated with differences in gene expression, in a paper titled “Allelic Variation of Human Serotonin Transporter Gene Expression”. They named this polymorphism 5-HTTLPR and hypothesized that it might be related to variation in behavior:

We have recently characterized the human and murine 5‐HTT genes and performed functional analyses of their 5′‐flanking regulatory regions. A tandemly repeated sequence associated with the transcriptional apparatus of the human 5‐HTT gene displays a complex secondary structure, represses promoter activity in nonserotonergic neuronal cells, and contains positive regulatory components. We now report a novel polymorphism of this repetitive element and provide evidence for allele‐dependent differential 5‐HTT promoter activity. Allelic variation in 5‐HTT‐related functions may play a role in the expression and modulation of complex traits and behavior.

This kind of assertion was very exciting to psychologists who were looking for ways that genetics might influence behavior. The 5-HTTLPR polymorphism was easy to genotype with mid-1990s-era methods. It wasn’t cheap, but it was cutting-edge science. A possible large-effect variant on behavior fit well with behavioral psychology studies, which often relied on samples of only dozens of individuals. One lab after another began to test whether 5-HTTLPR was related to behavior.
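Why sample size matters here can be shown with a standard power approximation. The sketch below uses the Fisher z-transformation to estimate how many subjects a two-sided test needs to detect a genotype-phenotype correlation r. The conventional settings (alpha = 0.05, 80% power) and the effect sizes tried are illustrative assumptions, not estimates for 5-HTTLPR.

```python
import math
from statistics import NormalDist


def required_sample_size(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation r with a two-sided test,
    using the Fisher z-transformation: n = ((z_a + z_b) / atanh(r))**2 + 3."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)  # quantile matching the desired power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


for r in (0.5, 0.3, 0.1, 0.05):
    print(f"r = {r:<4}  n ≈ {required_sample_size(r)}")
```

Under these assumptions, a large correlation such as r = 0.5 needs only about 30 subjects, while r = 0.05 needs more than 3,000. A study of a few dozen people can only reliably detect an effect of the first kind.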

Later, a similar polymorphism was discovered in the 5-HTT gene of rhesus macaques. The monkey polymorphism enabled experimenters to test how the gene might influence responses to maternal deprivation, alcohol exposure, and many other conditions.

All of this has been a long-winded way of introducing a recent blog post by Scott Alexander, who examined the legacy of research linking the 5-HTTLPR polymorphism to psychological and behavioral phenotypes: “5-HTTLPR: A pointed review”.

To make a long story short, twenty years of research into this candidate gene appear to have been largely a waste of time and effort. Today, well-powered studies involving thousands of research subjects have shown that variation in SLC6A4, including 5-HTTLPR, makes little to no difference in clinical conditions or normal behavior.

Alexander reviews this result and emphasizes the depth of the problem in a way that I’ve seen few others state so clearly:

First, what bothers me isn’t just that people said 5-HTTLPR mattered and it didn’t. It’s that we built whole imaginary edifices, whole castles in the air on top of this idea of 5-HTTLPR mattering. We “figured out” how 5-HTTLPR exerted its effects, what parts of the brain it was active in, what sorts of things it interacted with, how its effects were enhanced or suppressed by the effects of other imaginary depression genes. This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.
This is why I start worrying when people talk about how maybe the replication crisis is overblown because sometimes experiments will go differently in different contexts. The problem isn’t just that sometimes an effect exists in a cold room but not in a hot room. The problem is more like “you can get an entire field with hundreds of studies analyzing the behavior of something that doesn’t exist”. There is no amount of context-sensitivity that can help this.

Today, some geneticists criticize the genome-wide association study (GWAS) approach. GWAS identifies statistical associations between SNP alleles and phenotypes within a sample, but most of these associations have not led to better knowledge of the biochemical or developmental pathways by which genes affect phenotypes.

Indeed, the genetic variants that actually cause phenotypic variation are often invisible to GWAS. The method relies on linkage disequilibrium between common SNPs and causal variants, so findings from one population may apply poorly to other populations whose different histories have produced different patterns of linkage.

Still, many geneticists who began their careers within the last fifteen years may not know the history of human genetics in the 1990s and early 2000s. At that time, some geneticists strongly pushed the idea that gene discovery must be supported by clear biochemical evidence demonstrating the mechanism by which gene variants affect phenotypes.

In those days, I had many conversations with human geneticists who were endlessly frustrated that they couldn’t get their work published because it was based upon genome-wide analyses. Some reviewers insisted on biochemical work to support statistical evidence of gene-phenotype associations.

5-HTTLPR was strongly pushed by those who wanted this kind of biochemically informed approach to genomics. The variant was almost the perfect candidate gene.