A new approach to the Prisoner's Dilemma

Daniel Lende has described some of the evolutionary and anthropological import of a recent PNAS paper on game theory in his post “Prisoner’s Dilemma and the Evolution of Inequality: Does Unfairness Triumph After All?”.

The paper, by William Press and Freeman Dyson (2012), proves that a range of strategies exists for the classic “iterated Prisoner’s Dilemma” game that allows one player to dominate the other over the long term. A long history of theory had argued that symmetrical outcomes were stable because one player could always punish another who was trying to impose an unfair outcome. The new result comes from the mathematical recognition that one player can, by her own choice of strategy alone, completely determine the other player's long-term payoff.

What is surprising is not that Y can, with X’s connivance, achieve scores in this range, but that X can force any particular score by a fixed strategy p, independent of Y’s strategy q. In other words, there is no need for X to react to Y, except on a timescale of her own choosing. A consequence is that X can simulate or spoof any desired fitness landscape for Y that she wants, thereby guiding his evolutionary path. For example, X might condition Y’s score on some arbitrary property of his last 1,000 moves, and thus present him with a simulated fitness landscape that rewards that arbitrary property. (We discuss the issue of timescales further, below.)
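To make that claim concrete, here is a minimal numerical sketch in Python. It assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1) and memory-one strategies, where p and q give each player's probability of cooperating after each of the four possible previous outcomes. The closed-form expressions for p2, p3 and the forced score follow from the linear condition in the Press and Dyson paper that X's strategy, after subtracting 1 from p1 and p2, equals a multiple of Y's payoff vector plus a constant. The function names (`equalizer`, `forced_y_score`, `long_run_scores`), the opponent list, and the iteration count are my own illustrative choices, not anything taken from the paper.

```python
# Conventional Prisoner's Dilemma payoffs (assumed here): reward, sucker,
# temptation, punishment.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

# States are ordered (CC, CD, DC, DD) with X's move written first.
X_PAYOFF = (R, S, T, P)
Y_PAYOFF = (R, T, S, P)
Y_VIEW = (0, 2, 1, 3)  # Y sees CD and DC swapped relative to X


def equalizer(p1, p4):
    """Complete (p1, p4) into a strategy for X that fixes Y's long-run score."""
    p2 = (p1 * (T - P) - (1 + p4) * (T - R)) / (R - P)
    p3 = ((1 - p1) * (P - S) + p4 * (R - S)) / (R - P)
    p = (p1, p2, p3, p4)
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("p1, p4 do not give valid probabilities")
    return p


def forced_y_score(p1, p4):
    """Score that the equalizer strategy imposes on Y."""
    return ((1 - p1) * P + p4 * R) / ((1 - p1) + p4)


def long_run_scores(p, q, steps=20000):
    """Long-run average payoffs (X, Y) from the four-state Markov chain."""
    v = [0.25] * 4  # start from a uniform distribution over states
    for _ in range(steps):
        nxt = [0.0] * 4
        for i, mass in enumerate(v):
            px = p[i]          # chance X cooperates after state i
            py = q[Y_VIEW[i]]  # chance Y cooperates after state i
            nxt[0] += mass * px * py
            nxt[1] += mass * px * (1 - py)
            nxt[2] += mass * (1 - px) * py
            nxt[3] += mass * (1 - px) * (1 - py)
        # "Lazy" update: same stationary distribution, avoids oscillation.
        v = [0.5 * a + 0.5 * b for a, b in zip(v, nxt)]
    sx = sum(a * b for a, b in zip(v, X_PAYOFF))
    sy = sum(a * b for a, b in zip(v, Y_PAYOFF))
    return sx, sy


# A few hypothetical opponents, written from Y's own point of view.
OPPONENTS = {
    "always cooperate": (1.0, 1.0, 1.0, 1.0),
    "always defect": (0.0, 0.0, 0.0, 0.0),
    "tit for tat": (1.0, 0.0, 1.0, 0.0),
    "mixed": (0.9, 0.5, 0.2, 0.1),
}

if __name__ == "__main__":
    p = equalizer(0.75, 0.25)
    print("X plays", p, "forcing Y's score to", forced_y_score(0.75, 0.25))
    for name, q in OPPONENTS.items():
        sx, sy = long_run_scores(p, q)
        print(f"{name:17s} X: {sx:.3f}  Y: {sy:.3f}")
```

With p1 = 3/4 and p4 = 1/4, X's strategy works out to (3/4, 1/4, 1/2, 1/4), and every opponent in the list ends up with a long-run score of 2 no matter how he plays, while X's own score still depends on what Y does. That asymmetry is the sense in which X can "force any particular score" on Y.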

The paper deserves a longer commentary, and Lende has provided an interesting one. After considering some ways in which iterated Prisoner’s Dilemma has been applied in evolutionary biology, such as life history theory, he suggests:

In other words, zero-determinant strategies are a way to think about facultative adjustments that organisms can make in reproductive and life history strategies.
As just a thought to throw out there, might zero-determinant approaches shed new light on the epidemiological transition? Has it made sense, where fitness pay-offs are high for offspring through investment and development, to invest more as a parent and thus set the highest set of pay-offs for a child?

Much more at the link, which provocatively connects the short-term versus long-term strategy discussion in the paper to the emergence of wealth inequality in complex societies.

Edge has a question-and-answer post with study author William H. Press: “On ‘Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent’”. The entire interview is very interesting. Here’s an excerpt that highlights the connection between the reward-payoff game of Prisoner’s Dilemma and actual flesh-and-blood evolution:

Yes, Virginia, you can fool evolution. People do it all the time, nowadays, with directed evolution experiments that fool microbes into doing unnatural things. The trick is to keep adjusting the environment so that the more fit organism is the one that bends most to our (unnatural) goal. So, it’s not a surprise that these tricks exist in principle. What is a surprise is that they are so easily exemplified, mathematically, in a game as simple as Iterated Prisoner’s Dilemma and that this was mathematically obscure enough to escape notice. Do these tricks exist in all mathematical games? Do they exist in real-life competitive scenarios? When both players have a theory of mind (that is, are not just evolving to maximize their own score), are all games, in some deep way, actually Ultimatum Games? These now seem to be interesting questions.

Personally, I think the Prisoner’s Dilemma has been overemphasized in the discussion of the evolution of human cooperation, as many kinds of social interactions in ancient hunter-gatherers would not have fit that dynamic. Nevertheless, much of the literature assumes that cooperation emerged according to the Prisoner’s Dilemma dynamic, and we should revisit that literature and revise the assumption. In this regard, the most interesting aspect of Press and Dyson’s work may be the clear demonstration that short-term and long-term strategies bear a different relation to each other than traditionally thought. Cognitive resources for individual discrimination, tracking of reputation, and memory of previous interactions have evolved over millions of years in primates, and their elaboration in humans may have happened in a very different context than anyone imagined before last month.