# Genetic variation and the Hardy-Weinberg formula

The fundamental genetic information about any individual is that individual's genotype. The genotype for a single genetic locus is simply a list of the two alleles the individual carries, whether they are the same or different. Within a population, we can also count alleles in other ways, which gives us ways to measure the genetic variation among many individuals.

A population consists of individuals, so a geneticist may count the number of individuals with every possible genotype. Comparing these numbers with the total number of individuals, the geneticist may calculate the genotype frequencies, the proportion of individuals who have each possible genotype.

For example, cystic fibrosis (CF) is a very rare disorder. Among Americans of European ancestry, only around 1 in 2500 people will develop CF during their lifetimes. The disorder is even rarer among people of non-European origin. Geneticists have surveyed many people to discover how many of them carry the allele that causes the disorder, and today a number of states screen newborns for cystic fibrosis as a way of directing affected children to early medical treatment (AAP Newborn Screening Task Force 2000). From this information, geneticists have determined that while only around 0.04 percent of the population is affected by cystic fibrosis, with a genotype of ff, approximately 4 percent of people are carriers of the allele, with a genotype of Ff. This leaves nearly 96 percent of the population with the genotype FF. These proportions are the frequencies of the three possible genotypes for this gene.

*The frequency of an allele is the proportion of the total number of gene copies in the population that are that allele.*

When geneticists know the frequencies of genotypes in a population, they can determine how many copies of each allele are in the population as a whole.

For example, one recent study used genetic techniques to assess the genotypes of a group of colorectal cancer patients for the two possible alleles (A and B) of the *p73* gene on chromosome 1 (Pfeifer et al. 2004). In this sample, 113 people (63 percent) were AA, 54 people (30 percent) were AB, and 12 people (7 percent) were BB.
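As a quick check on these percentages, the following minimal Python sketch recomputes the genotype frequencies from the counts reported above. The variable names are illustrative only and are not taken from the study.

```python
# Genotype counts reported in the text for the p73 sample.
genotype_counts = {"AA": 113, "AB": 54, "BB": 12}

# Total number of individuals genotyped.
total_individuals = sum(genotype_counts.values())

# Genotype frequency = individuals with that genotype / all individuals.
genotype_freqs = {
    genotype: count / total_individuals
    for genotype, count in genotype_counts.items()
}

for genotype, freq in genotype_freqs.items():
    print(f"{genotype}: {freq:.1%}")
# AA: 63.1%
# AB: 30.2%
# BB: 6.7%
```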

Each AA person has two copies of the A allele, each BB person has two copies of the B allele, and each AB person has one copy of each. The total sample therefore includes 2 × 113 + 54 = 280 copies of the A allele and 2 × 12 + 54 = 78 copies of the B allele, out of 358 gene copies in all. The allele frequency of the A allele is then 280/358, or 78 percent. The frequency of the B allele in this sample is 78/358, or 22 percent.
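The same counting can be written as a short Python sketch. The function name `allele_frequencies` and its parameters are hypothetical, chosen here for illustration; the counts are those given in the text.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_A, freq_B) from counts of AA, AB, and BB individuals."""
    # Every individual carries two gene copies at the locus.
    total_copies = 2 * (n_aa + n_ab + n_bb)
    # Homozygotes contribute two copies of their allele, heterozygotes one of each.
    copies_a = 2 * n_aa + n_ab
    copies_b = 2 * n_bb + n_ab
    return copies_a / total_copies, copies_b / total_copies


freq_a, freq_b = allele_frequencies(113, 54, 12)
print(f"A: {freq_a:.3f}, B: {freq_b:.3f}")
# A: 0.782, B: 0.218
```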

While geneticists can determine the frequency of an allele from the proportions of genotypes in a population, they may also do the same calculation in reverse — figuring out the expected proportions of genotypes from the frequencies of alleles.

For example, if the cystic fibrosis f allele is found at a frequency of 5 percent in a population, then the chance that an individual will have two copies of this allele is simply the 5 percent chance for *each* of the two alleles, multiplied by each other. Five percent times five percent is 0.0025, or twenty-five chances out of 10,000.

*The Hardy-Weinberg genotype proportions are p^{2} + 2pq + q^{2} for a two-allele gene.*

The proportion of both homozygotes in the population may be determined in the same way. The probability that an individual will be a homozygote for either allele equals the frequency of the allele times itself, or squared. Thus, in the example above, the chance of f homozygotes is equal to the frequency of f squared. The proportion of heterozygotes is equal to the chance that one allele is F times the chance that the other allele is f, plus the chance that the first allele is f times the chance that the other allele is F.

Because these two probabilities are the same, the total chance of an individual being a heterozygote is 2 times the frequency of one allele times the frequency of the other allele. Writing *p* for the frequency of one allele and *q* for the frequency of the other, so that *p* + *q* = 1, the proportions of genotypes are *p*^{2} and *q*^{2} homozygotes, and 2*pq* heterozygotes.
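A minimal Python sketch of this calculation follows. The function name `hardy_weinberg` is hypothetical. The first call uses the 5 percent frequency from the example above; the second assumes, purely for illustration, that the 1 in 2500 incidence of cystic fibrosis quoted earlier reflects Hardy-Weinberg proportions, so that the frequency of f is the square root of the incidence.

```python
import math


def hardy_weinberg(p):
    """Return (p^2, 2pq, q^2) for a two-allele gene with allele frequency p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


# Example from the text: f at 5 percent, so F at 95 percent.
hom_big, het, hom_small = hardy_weinberg(0.95)
print(f"FF: {hom_big:.4f}, Ff: {het:.4f}, ff: {hom_small:.4f}")
# FF: 0.9025, Ff: 0.0950, ff: 0.0025

# Assumption: the 1-in-2500 incidence equals q^2 (Hardy-Weinberg holds).
q_cf = math.sqrt(1 / 2500)           # frequency of f, about 0.02
_, carriers, _ = hardy_weinberg(1.0 - q_cf)
print(f"expected carrier proportion: {carriers:.4f}")
# expected carrier proportion: 0.0392
```

Under that assumption the expected carrier proportion is close to the roughly 4 percent of carriers reported for this population.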

These proportions are called the Hardy-Weinberg proportions, after the British mathematician G. H. Hardy and German physician Wilhelm Weinberg, who independently formulated the relation in 1908. The proportions come from simple probability of sampling copies from a population with given allele frequencies. These proportions are expected to form an equilibrium — that is, they should stay the same over time — as long as the allele frequencies stay the same. When individuals mate without regard to the alleles they carry, every generation of a population should have genotype frequencies in approximately the Hardy-Weinberg proportions.
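The equilibrium property can be illustrated with a small deterministic sketch. It assumes random mating, no selection, and a population large enough to ignore chance fluctuations; the function name `next_generation` and the starting frequencies are hypothetical.

```python
def next_generation(f_aa, f_ab, f_bb):
    """Genotype frequencies after one round of random mating."""
    # Allele frequencies in the parents: homozygotes pass on only their
    # allele, heterozygotes pass on each allele half of the time.
    p = f_aa + f_ab / 2
    q = f_bb + f_ab / 2
    # Offspring genotypes are formed by drawing two alleles independently.
    return p * p, 2 * p * q, q * q


# Hypothetical starting population with no heterozygotes at all.
start = (0.5, 0.0, 0.5)
gen1 = next_generation(*start)
gen2 = next_generation(*gen1)
print(gen1)  # (0.25, 0.5, 0.25)
print(gen2)  # (0.25, 0.5, 0.25)
```

In this sketch the population reaches the Hardy-Weinberg proportions after a single generation and stays there, because the allele frequencies themselves do not change.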

The Hardy-Weinberg proportions are important for two reasons. First, the proportions of genotypes in a population may diverge from the expected ones for many reasons, including natural selection, division of populations into different subgroups, or mating that is not entirely random. Comparing the expected and observed proportions of genotypes allows biologists to determine whether these evolutionary forces may be acting on a population.
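One common way to make this comparison is a chi-square goodness-of-fit test, which is a technique brought in here for illustration and is not described in the text. The sketch below applies it to the *p73* genotype counts given earlier. The function name `hwe_chi_square` is hypothetical, and the threshold of 3.84 is the standard 5 percent critical value for one degree of freedom.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (one degree of freedom for two alleles)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    # Expected counts under Hardy-Weinberg proportions.
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_5_PERCENT = 3.84  # chi-square critical value, 1 df, alpha = 0.05

chi2 = hwe_chi_square(113, 54, 12)
print(f"chi-square = {chi2:.2f}")
if chi2 > CRITICAL_5_PERCENT:
    print("observed counts depart from Hardy-Weinberg proportions")
else:
    print("observed counts are consistent with Hardy-Weinberg proportions")
```

For these counts the statistic comes out a little under 2.4, below the threshold. The sketch only illustrates the arithmetic; it assumes both alleles are present in the sample, and a sample of patients is not necessarily representative of a whole population.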

*The heterozygosity of a population is the expected proportion of heterozygotes, given the allele frequencies in the population.*

Second, the proportions lead to a natural definition of genetic variation in a population: the heterozygosity. A population’s heterozygosity is the expected proportion of heterozygotes from the Hardy-Weinberg formula, 2*pq*.

Two populations may be compared by their heterozygosities: the one with the higher heterozygosity has a higher chance that any single individual will have two different alleles, which means the population is genetically more variable. Variation is a consequence of evolutionary history, including the patterns of selection and genetic drift, and the amount that individuals have moved from one population to another in the past. Thus, the Hardy-Weinberg proportions give an important way to study the evolution of populations over time.
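As a brief illustration of such a comparison, the Python sketch below computes the heterozygosity 2*pq* for two populations. The function name `heterozygosity` and both allele frequencies are hypothetical values chosen for the example, not data from any study.

```python
def heterozygosity(p):
    """Expected proportion of heterozygotes, 2pq, for a two-allele gene."""
    q = 1.0 - p
    return 2 * p * q


# Hypothetical allele frequencies for the same gene in two populations.
h_first = heterozygosity(0.5)   # both alleles equally common
h_second = heterozygosity(0.9)  # one allele much more common

print(f"first population:  {h_first:.2f}")
print(f"second population: {h_second:.2f}")
# first population:  0.50
# second population: 0.18
```

With two alleles, heterozygosity is greatest when both alleles are equally common and falls toward zero as one allele approaches a frequency of 100 percent.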