The basic measure of genetic difference between populations is the statistic F_{ST}. In genetics, F generally stands for ``inbreeding'', which tends to reduce genetic variation within a population. Genetic variation can be measured by heterozygosity, so F generally expresses a reduction in heterozygosity. F_{ST} is the proportional reduction in heterozygosity within subpopulations relative to the total population of which they are part.
To estimate F_{ST}, take the following steps:
- Find the allele frequencies for each subpopulation.
- Find the average allele frequencies for the total population.
- Calculate the expected heterozygosity (2pq, where p and q are the frequencies of the two alleles) for each subpopulation.
- Calculate the average of these subpopulation heterozygosities. This is H_{S}.
- Calculate the heterozygosity based on the total population allele frequencies. This is H_{T}.
- Finally, calculate F_{ST}=(H_{T}-H_{S})/H_{T}.
Don't forget that the H_{S} term is the average across all subpopulations.
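The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a standard estimator: it assumes a locus with exactly two alleles, gives every subpopulation equal weight, and ignores sample-size corrections. The function names `heterozygosity` and `fst` are made up for this sketch.

```python
def heterozygosity(p):
    """Expected heterozygosity 2pq at a two-allele locus, with q = 1 - p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """Estimate F_ST from the frequency of one allele in each subpopulation.

    Assumption: every subpopulation is weighted equally.
    """
    if not subpop_freqs:
        raise ValueError("need at least one subpopulation")
    n = len(subpop_freqs)
    # Average allele frequency for the total population.
    p_total = sum(subpop_freqs) / n
    # H_S: the average of the subpopulation heterozygosities.
    h_s = sum(heterozygosity(p) for p in subpop_freqs) / n
    # H_T: heterozygosity based on the total-population allele frequency.
    h_t = heterozygosity(p_total)
    if h_t == 0.0:
        raise ValueError("F_ST is undefined when the total population is fixed")
    return (h_t - h_s) / h_t


# Hypothetical frequencies, chosen only to illustrate the call.
print(round(fst([0.2, 0.8]), 2))  # 0.36
```

Note that H_{S} is computed as an average of heterozygosities, not as the heterozygosity of the average frequency. The latter is H_{T}, and confusing the two would always give F_{ST} = 0.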
Example: The gene SLC24A5 is a key part of the melanin expression pathway, which contributes to skin and hair pigmentation. The SNP rs1426654 in this gene is strongly associated with lighter skin pigmentation in Europe. The SNP has two alleles, A and G; the A allele is associated with light skin and has a frequency of 100% in Utah European-Americans. The SNP varies in frequency in populations in the Americas with mixed African and American Indian ancestry. A sample in Mexico had 38% A and 62% G; a sample in Puerto Rico had 59% A and 41% G; and a sample of African-Americans from Charleston had 19% A and 81% G. What is the F_{ST} in this example?
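One way to work the example is shown below. It assumes that the three admixed samples (Mexico, Puerto Rico, Charleston) are the subpopulations, that each gets equal weight, and that the Utah sample is not included. Here p is the frequency of A and q the frequency of G; the bars mark total-population averages and are notation introduced only for this calculation.

```latex
\begin{align*}
\bar{p} &= (0.38 + 0.59 + 0.19)/3 \approx 0.387, \qquad \bar{q} = 1 - \bar{p} \approx 0.613 \\
H_{T} &= 2\bar{p}\bar{q} \approx 0.474 \\
H_{S} &= \tfrac{1}{3}\bigl(2(0.38)(0.62) + 2(0.59)(0.41) + 2(0.19)(0.81)\bigr) \\
      &= \tfrac{1}{3}(0.4712 + 0.4838 + 0.3078) \approx 0.421 \\
F_{ST} &= (H_{T} - H_{S})/H_{T} \approx (0.474 - 0.421)/0.474 \approx 0.11
\end{align*}
```

Under these assumptions, about 11% of the heterozygosity expected in the pooled population is lost to differences in allele frequency among the three samples. A different choice of subpopulations or weights would give a different value.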