statistics

Mailbag: Statistics and future evolution

I was trying to find out more
about recent research predicting a relative convergence of racial features in
future generations (but I don't know anything about "rapid evolution by drift"
or things like that). I'm aware of debunked claims (inc. your debunking) from
media reports, but I'm not aware of research that actually contains enough
scientific merit to make a valid prediction. I decided to write to you after reading
your review of a lecture by UCL geneticist Steve Jones.

If there is any reference you can give to someone like me who has very little genetic
training (past Mendel, anyway) I would greatly appreciate it.

I'll be glad to help if I can. Population genetics shouldn't be too much of a challenge for you; it's basically statistics (e.g., evolution by genetic drift is modeled by repeated binomial sampling).
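To make "repeated binomial sampling" concrete, here is a minimal sketch of the standard Wright-Fisher model of drift. The population sizes, starting frequency, generation count and function names are illustrative assumptions, not values from any study.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng):
    """Follow one allele's frequency under pure drift (no selection, no gene flow)."""
    copies = 2 * n_individuals  # diploid: two gene copies per individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling: every gene copy in the next generation is an
        # independent draw from the current pool, with probability p.
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
        trajectory.append(p)
    return trajectory

def mean_abs_change(n_individuals, generations=20, replicates=40, seed=1):
    """Average distance the frequency has wandered from 0.5 after some generations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        final = wright_fisher(0.5, n_individuals, generations, rng)[-1]
        total += abs(final - 0.5)
    return total / replicates

# Hypothetical sizes: a band of 20 people versus a population of 1000.
print("N = 20:  ", round(mean_abs_change(20), 3))
print("N = 1000:", round(mean_abs_change(1000), 3))
```

In runs like this the small population wanders much further from its starting frequency than the large one, which is the sense in which drift is a weak force in big populations.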

We have a very high rate of gene flow between "racial" or geographic groups today compared to the past, and so we can predict that gene frequencies should converge in the future. But there are two issues -- first, the rate of change by chance in very large populations is very slow; and second, some genes may be (or recently have been) subject to selection processes that maintain diversity. The second is a complicated problem because selection pressures may be different for every gene.
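The convergence prediction can be sketched with the simplest possible model of gene flow: two populations that swap a fraction m of their gene pools each generation, with no drift and no selection. The starting frequencies, the migration rates and the function names below are hypothetical and chosen only for illustration.

```python
def exchange_migrants(p1, p2, m):
    """One generation of symmetric gene flow between two populations.

    m is the fraction of each population's gene pool replaced by migrants
    from the other population.
    """
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1

def generations_to_halve_gap(m, p1=0.9, p2=0.1):
    """Count generations until the frequency difference falls to half its start."""
    start_gap = abs(p1 - p2)
    t = 0
    while abs(p1 - p2) > start_gap / 2:
        p1, p2 = exchange_migrants(p1, p2, m)
        t += 1
    return t

# Hypothetical migration rates: 1 percent versus 10 percent per generation.
for m in (0.01, 0.1):
    print("m =", m, "-> generations to halve the gap:", generations_to_halve_gap(m))
```

Under this model the gap between the two frequencies shrinks by a factor of (1 - 2m) every generation, so the pace of convergence is set entirely by the migration rate. Selection that maintains diversity at a particular gene is not modeled here.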

People often complain that R. A. Fisher wrote in a hard-to-read style, unnecessarily verbose and indirect. Either I don't tend to mind, or I find that the style makes me read with greater care. In either case, there are select passages from his writings that stand out as very clear to me. His description of epistasis and dominance as deviations from additivity, in his famous 1918 paper (p. 404), is one of them:

The steps from recessive to heterozygote and from heterozygote to dominant are genetically identical, and may change from one to the other in passing from father to son. Somatically the steps are of different importance, and the soma to some extent disguises the true genetic nature. There is in dominance a certain latency. We may say that the somatic effects of identical genetic changes are not additive, and for this reason the genetic similarity of relations is partly obscured in the statistical aggregate. A similar deviation from the addition of superimposed effects may occur between different Mendelian factors. We may use the term Epistacy to describe such deviation, which although potentially more complicated, has similar statistical effects to dominance. If the two sexes are considered as Mendelian alternatives, the fact that other Mendelian factors affect them to different extents may be regarded as an example of epistacy.

The terms we use today are familiar by use. A biologist doesn't necessarily consider how idiosyncratic the genetic use of the term "additive" is. When I read a passage like this, it brings to mind a long-ago time when the select group of people using a term had all read the same papers. I wonder how many geneticists still read Fisher during their training. I can tell you this: the bound volume of the Transactions of the Royal Society of Edinburgh in our library didn't look like it had been picked up for 30 years. I mean, serious dust on the cover.

I wrote last month about how Fisher invented "variance", and noted the very useful property that the variance is a sum of contributions from different causes. It seems remarkable that Fisher could arrive at a statistical framework for identifying the interactions of multiple genes on a trait, at a time when only a relative handful of "Mendelian factors" had yet been found.
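The "sum of contributions" property can be checked numerically for a single gene. The sketch below assumes the common textbook parameterization (genotypic values -a, d, +a for the three genotypes) and Hardy-Weinberg genotype frequencies; the function name and the numbers are illustrative, not taken from Fisher's paper. The additive variance is the part explained by a straight-line fit on allele count, and the dominance variance is whatever deviation from additivity is left over.

```python
def variance_components(p, a, d):
    """Split genotypic variance at one two-allele locus into additive and dominance parts.

    Assumes genotypic values A2A2 = -a, A1A2 = d, A1A1 = +a, Hardy-Weinberg
    frequencies, and an allele frequency p strictly between 0 and 1.
    """
    q = 1 - p
    freqs = [q * q, 2 * p * q, p * p]  # A2A2, A1A2, A1A1
    counts = [0, 1, 2]                 # copies of allele A1
    values = [-a, d, a]

    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_g = sum(f * g for f, g in zip(freqs, values))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    cov = sum(f * (x - mean_x) * (g - mean_g)
              for f, x, g in zip(freqs, counts, values))
    total = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, values))

    slope = cov / var_x            # average effect of swapping one allele
    additive = slope ** 2 * var_x  # variance explained by the linear fit
    dominance = total - additive   # remainder: deviation from additivity
    return additive, dominance, total

additive, dominance, total = variance_components(p=0.3, a=1.0, d=0.5)
print("additive:", additive, "dominance:", dominance, "total:", total)
```

With d set to zero the heterozygote sits exactly midway between the homozygotes and the dominance term vanishes, which is the purely additive case.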

Now that we are able to find Mendelian factors in whole-genome association studies, it's remarkable that Fisher's framework is so often forgotten!

References:

Fisher RA. 1918. The correlation between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinburgh 52:399-433.

A reader sent along this NY Times article about the town in Brazil with an unusual concentration of twins. Naturally, it's a Boys from Brazil type of scenario:

Some researchers have suggested the darker possibility that Josef Mengele, the Nazi physician known as the Angel of Death, was involved. Mengele, residents say, roamed this region of southern Brazil, posing as a veterinarian, in the 1960s, about the time the twins explosion began. In a book published last year, an Argentine journalist, Jorge Camarasa, suggested that Mengele conducted experiments with women here that resulted in the higher rate of twins, many of them with blond hair and light-colored eyes. The experiments, locals said, may have involved new types of drugs and preparations, or even the artificial insemination Mengele claimed to know about, regarding cows and humans.

But neither Mr. Camarasa nor any other adherent of the Mengele theory has been able to prove the escaped Nazi conducted any experiments here. Mengele, who died in Brazil in 1979, was notorious for his often deadly experiments on twins at Auschwitz, ostensibly in an effort to produce a master Aryan race for Hitler.

Because everyone knows that's where twins come from. Nazi experiments.

The most interesting observation is that the unusual number of twins (10 percent of births from 1990-1994) is accompanied by an unusual fraction of identical twins. However, I'd like to see a simple plot showing all similar-sized towns in Brazil. 10 percent of births across a limited time span is not very exciting if we have thousands of towns and pick out the most extreme value.
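The "pick out the most extreme value" problem is easy to simulate. This sketch gives every town the same underlying twin rate and then reports the highest rate observed in any of them. The number of towns, the births per town and the base rate are made-up values for illustration, not Brazilian data.

```python
import random

def max_twin_rate(n_towns, births_per_town, twin_rate, seed=0):
    """Highest observed twin rate among towns that all share one true rate."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_towns):
        # Each birth is independently a twin birth with probability twin_rate.
        twins = sum(rng.random() < twin_rate for _ in range(births_per_town))
        best = max(best, twins / births_per_town)
    return best

# Hypothetical inputs: 5000 small towns, 80 births each, 1.2 percent true rate.
base_rate = 0.012
print("true rate in every town:", base_rate)
print("highest observed rate:  ", max_twin_rate(5000, 80, base_rate))
```

Even with nothing unusual going on anywhere, the most extreme town shows a rate several times the true one, and the smaller the towns, the more extreme the winner looks. That is why a plot of all comparable towns matters before treating one of them as special.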

Statistics, people. Oh yeah, I suppose the Nazis invented that, too!

If you do much statistics and haven't worked with R, you should try it out. The NY Times profiled the software yesterday:

R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

...

What makes R so useful — and helps explain its quick acceptance — is that statisticians, engineers and scientists can improve the software’s code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

The graphs are pretty, and it's free software. The article describes it as a "lingua franca" for grad students. Maybe not, but I wouldn't invest my time learning anything less powerful.

An interesting post from Justin Wolfers about statistical outliers and sprinters, referencing a New York Times story about Usain Bolt, along with a key graphic showing Bolt's and Michael Johnson's records versus the 249 other fastest 200 meter sprint times in history. Wolfers:

Not only does this not look like a normal distribution, it doesn’t even look like the tail of any standard distribution I’ve ever seen.

It should be clear from this chart why few thought that the previous world record would be broken anytime soon.
