Some genetic drift graphs with Mathematica

The first thing to come up in my lectures is genetic drift. Pretty much everyone who lectures about drift needs a figure showing the results of simple Monte Carlo simulations of sampling drift in a finite population. You start a population with two alleles, sample it randomly in each generation until one or the other alleles disappears. I tend to start with a “population size” of 1000 gene copies, and an initial allele frequency of 50%.

We can do this kind of simulation in Mathematica pretty easily. We’ll work with three variables:

  • popSize, which we'll set equal to 1000;
  • a, the number of gene copies with one of the two alleles, initially equal to 500;
  • frequencyList, which will hold the value of a for each generation. Initially, frequencyList holds the first value of a, 500.

Now, we’re ready to code the simulation. We set the three variables, and then set up a While[] loop that samples a random binomial deviate in each generation, based on the previous generation’s allele frequency (a/popSize):

a = N[500]; popSize = 1000; frequencyList = {a}; While[a < popSize && a > 0, a = N[RandomInteger[BinomialDistribution[popSize, a/popSize]]]; AppendTo[frequencyList, a]]

The last line appends the value a to the list. Nesting BinomialDistribution[] inside RandomInteger[] gives us a binomial deviate, based on popSize trials and frequency a/popSize. The loop executes until a reaches either zero or 1000, at which point the simulation stops.

OK, that gives us a list, frequencyList, which contains the number of gene copies with one of the two alleles over time. Now, we want to plot that list. We can try:

ListPlot[ frequencyList, PlotJoined -> True]

…which gives us:

Monte Carlo simulation of genetic drift

That’s servicable, but a bit simple. And it’s confusing where the axis cuts across. It would be better to have the plot bounded at 0 and 1000, the min and max possible. Also, we need a better font than Times.

Let’s try:

ListPlot[ {frequencyList}, PlotJoined -> True, Frame -> True, AxesOrigin -> {0, 0}, PlotStyle -> Thick, FrameLabel -> {"Time (generations)", Frequency}, TextStyle -> {FontFamily -> "Myriad Pro", FontSize -> 12}, Filling -> 500]
Monte Carlo simulation of genetic drift

That’s a bit more like it. The Filling -> 500 gives us shading everywhere between our frequency line and the 500 mark, a nice cue reminding us where the simulation started.

Now, we can add a second run in the same plot:

Monte Carlo simulation of genetic drift, two populations

Oh, well that shifted the x-axis. No harm, though. And we continue…

Monte Carlo simulation of genetic drift, two populations

Not a bad representation of the process, with fixation times ranging from very short to quite long, and one going to fixation at zero. The different color shading tends to confuse the picture a bit, and doing it in a larger size for the projector will take some tweaks, but so far it’s much more attractive than the Excel version.

We could run a few more simulations to substitute, just in case one made the overall picture more clear (for example, by moving the lines apart in the early phase). But here we have the benefit of honesty — these are the first four sims I ran.

UPDATE (2009-09-13): I’m revisiting this post as I work on a Mathematica Demonstration of genetic drift. It’s much simpler to implement the central drift algorithm with a NestList or a NestWhileList. For example:

NestList[RandomInteger[BinomialDistribution[1000, #/1000]] &, 500, 4000]

gives you the trajectory over 4000 generations. This:

NestWhileList[ RandomInteger[ BinomialDistribution[1000, #/1000]] &, 500, # < 1000 && # > 0 &]

replicates the output above.