information theory

Mutual information between strings of loci

Fourth in a series on mutual information and genetic linkage. If you’re happening upon it for the first time, you can find the entire series or the first post, “Information theory: a short introduction”.

After the last post, you might wonder what the big deal is about these information theoretic measures of linkage. After all, we’ve got lots of other measures of linkage to choose in population genetics, with many years of theory behind them. The basic conclusion about genetic drift was that it adds mutual information to samples over short regions, but that recombination over longer areas washes it out. If the net effect is no linkage, why would we bother to come up with some non-standard linkage measure?

One answer: If the existing linkage measures were so great for testing neutrality, then we might expect some of the recent genome-wide selection scans to have used them. But they didn’t – instead we have several partially incompatible methods, all of which eschew the usual measures of linkage.

When genetic drift reduces entropy

This is the third in a series on information theory and tests for recent selection. The first post, “Information theory: a short introduction”, covered some of the basics of entropy. The second post, “Information theory and mutual information between genetic loci”, showed that mutual information between independent sites will be distributed as a χ².

We tend to think of genetic drift as a random process. Random processes operating repeatedly over time are called “stochastic,” and changes in gene frequency under genetic drift are certainly that.

Since entropy is a measure of uncertainty, it might seem natural to think that stochastic changes in gene frequency would increase the entropy in a population. After all, the gene frequency in a population under genetic drift will be more and more uncertain over time. So, considering the frequency of a single allele as the system, genetic drift appears to increase entropy over time.

But even this simple system isn’t quite so simple as it might appear. Sure, if you start out knowing the allele frequency, then genetic drift will increase your uncertainty over time. You will become less and less able to say that it lies in any given interval. But what if you don’t start out knowing? What if all you know is that the locus has been subjected to t generations of genetic drift?

As t increases, the probability of fixation of the locus also increases. The net effect is to reduce the entropy in the system – going from uncertainty about the allele frequency to more and more certainty that it will be either one or zero. The only thing that will stop this process is some other evolutionary force – mutation, migration from other populations, balancing selection. Each of these will have its own distinctive effects on the entropy of the single-locus system.
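The contrast between a known and an unknown starting frequency can be checked numerically. What follows is only a minimal sketch, assuming a Wright-Fisher model of pure drift (binomial resampling each generation, no mutation, migration, or selection). The population size of 20 gene copies, the 200 generations, and all function names are illustrative choices, not anything from the original analysis. The entropy here is the Shannon entropy of the probability distribution over allele counts.

```python
import math

def wright_fisher_step(dist, copies):
    """Advance the distribution over allele counts by one generation of drift."""
    new = [0.0] * (copies + 1)
    for i, p_i in enumerate(dist):
        if p_i == 0.0:
            continue
        freq = i / copies
        for j in range(copies + 1):
            # Binomial sampling of `copies` gene copies at the parental frequency.
            new[j] += p_i * math.comb(copies, j) * freq**j * (1 - freq) ** (copies - j)
    return new

def entropy_bits(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def entropy_trajectory(start, copies, generations):
    """Entropy of the allele-count distribution at generations 0..generations."""
    dist = list(start)
    trajectory = [entropy_bits(dist)]
    for _ in range(generations):
        dist = wright_fisher_step(dist, copies)
        trajectory.append(entropy_bits(dist))
    return trajectory

COPIES = 20        # hypothetical: 10 diploid individuals
GENERATIONS = 200  # hypothetical: long enough for near-certain fixation

# Case 1: the starting allele count is unknown (uniform over all counts).
unknown_start = [1 / (COPIES + 1)] * (COPIES + 1)
# Case 2: the starting allele count is known exactly (frequency 0.5).
known_start = [0.0] * (COPIES + 1)
known_start[COPIES // 2] = 1.0

h_unknown = entropy_trajectory(unknown_start, COPIES, GENERATIONS)
h_known = entropy_trajectory(known_start, COPIES, GENERATIONS)

print(f"unknown start: {h_unknown[0]:.2f} bits -> {h_unknown[-1]:.2f} bits")
print(f"known start:   {h_known[0]:.2f} bits -> peak {max(h_known):.2f} -> {h_known[-1]:.2f} bits")
```

Under these assumptions, the unknown-start entropy falls from log2(21) bits toward about one bit (the remaining uncertainty about which allele fixed), while the known-start entropy first rises from zero and then settles toward the same value.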

I'm reading through the English translation of The Culture Historical Method of Ethnology by Wilhelm Schmidt -- one of the practitioners of the Vienna School in the early 20th century. This is in pursuit of my project describing early 20th-century diffusionism, for which the next logical point is the Kulturkreise, or "culture circle," theory.

While I formulate my thoughts on that topic, I thought it might be worth highlighting this passage, in which Schmidt considers the problem of reconstructing culture relations among prehistoric hunter-gatherers -- the kind with which Paleolithic archaeologists would now be concerned. The passage lies in a broader section about external causes of culture similarities, and the basic point seems to be that the vastness of time allows few distinctive similarities among the very ancient cultures of Paleolithic people:

On the other hand, the comparison of ethnological with the prehistorical periods, as is carried out by O. Menghin in his Weltgeschichte der Steinzeit, shows how long those periods of the food gatherers must have lasted, which without any noticeable "progress" (in external culture) cover thousands (and even tens of thousands) of years. We pass over in silence the astounding speed of the historical events of modern times, for neither could that profusion of events of "early historical" times have happened in the epochs with which we are here concerned, because there existed only very small human groups who lived thinly distributed over the earth, which was still so sparsely peopled. At any rate, the external course of events could by far not have been rich and developed, and the resonance of the individual events could also not by any means describe such broad circles and, if so, then only in a long course of time.

Of course, that does not decide the question of the plenitude and richness of psychic events of the peoples of that time, although also the small number of individuals in the single groups and the great distances separating the single groups restricted these to certain limits. The smallness of a group could especially increase the danger that the group received none of the leading individuals, who are of particular importance for the awakening and preservation of intellectual life (Schmidt 1939:251-252).

I find the last paragraph here to be quite modern in its conception of information transfer and storage within individuals. In the first paragraph, Schmidt concerns himself with the pace of culture change, with culture viewed as a reactive system in the face of external events. The long span of Paleolithic time has few extrinsic changes to which cultures might have obviously adapted, so that contrasts between cultures on an extrinsic basis might be expected to be minor.

Julian Steward and the logic of diffusion

I've had a tremendous response to the last entry in the diffusion series, which discussed the treatment of cultural diffusion by the Boasian school. I really appreciate the pointers, which have taken me in some interesting directions that I might have missed. Meanwhile, I'm continuing on with my review.

In the last post, I pointed out that Boas already had described the main elements necessary for a formal description of diffusion. Indeed, diffusion was of special importance to the Boasian view of culture, because it provided a mechanism by which culture history could come to be partially independent of race history. But despite its importance, neither Boas nor his students brought themselves to a reductive theory of diffusion. That is no surprise, in the context of early American anthropology, which was in many respects unwilling to generalize rules from the particular descriptions generated by ethnographers.

But one of Kroeber and Lowie's students, Julian Steward, did attempt a relatively formalized description of cultural diffusion. In a short 1929 article, "Diffusion and independent invention: a critique of logic," the 27-year-old Steward showed frustration with anthropology's failure to grapple with the diffusion problem in a concrete way.

There exists a large proportion of anthropological data which admits of no clear-cut methodology but is usually handled according to inference and common sense logic. While this method may be soundly rational, the possibility of an enormous subjective element and fallacious logic is ever present and is demonstrated by the existence of the diffusion controversy. This controversy is made possible not only by the personal bias of the investigator but also by confusion of the principles upon which the solution is based.

It is not my purpose to present a rule-of-thumb method for the settlement of the diffusion controversy but to inquire into its logical implications and discover whether these are not capable of formulation. While this will but formulate the principles implicit in most work, it will also reveal the possibility of certain confusions and inconsistencies (Steward 1929:491).

Why was there a "diffusion controversy"? Steward describes the controversy as emerging from two extreme viewpoints about culture history -- "extreme diffusionists" and "evolutionists." In Steward's description, this conflict comes down to a logical error of comparison.

He describes this error with an example: "inverted speech," which is "a custom of clowns and others of saying the reverse of what is meant."

This may be illustrated by inverted speech which occurs in North America in the Plains area, California, and the Southwest, and also occurs in Australia. Shall we account for these four occurrences by diffusion or independent invention. The solution depends upon inference from the assembled facts, but what is the logic of our reasoning? We ask: How probable is communication between these areas? How difficult an achievement is inverted speech? It is tempting immediately to postulate diffusion between the North American occurrences but independent invention for Australia. This would be solely on a basis of distribution and by this we should be prone to judge the uniqueness of the element (Steward 1929:491-492).

Steward uses "uniqueness" as a synonym for improbability: an improbable feature shared by two societies is likely to result from a unique occurrence that had diffused to both. Continuing with inverted speech, Steward notes that the assumption of diffusion among the North American instances leads naturally to the conclusion that inverted speech is improbable -- after all, it only occurred once in the entire continent. But the assumption of independent invention of the trait in Australia

...lead us to regard inverted speech as not so difficult an invention after all, for it clearly has been invented a second time. What logical justification would there be for the assumption that independent invention is inherently less possible for the Plains, California, and the Southwest than for Australia because the first three happen to be geographically more accessible (492)?

Steward claimed that this comparison amounts to question-begging. Sure, if we assume that communication between two societies is likely, we will be predisposed to interpret diffusion. But this does not give us any real information as to whether independent invention happened.

To address this problem, Steward suggests three "principles" to guide the interpretation of shared culture elements. I'm going to list these, which Steward presents as statements about probabilities, and recast them in terms of information theory.

(1) The probability of independent invention is directly proportionate to the difficulty of communication between the localities.

In other words, if messages may pass easily between two locations, then we predict that the information comprising culture elements may pass easily also. Steward further describes two signs that may indicate the difficulty of communication. If the two localities share many culture elements, then we can infer that a regular communication was probably present. And if the localities communicated over a very long time, they may share many more things than if they have communicated only over a short time.

(2) The probability of independent invention is directly proportionate to the uniqueness of the element.

Steward adds:

The uniqueness of a culture element -- that is, the probability of its being invented -- is the most difficult problem to determine. This will be decided by the investigator upon his experience and knowledge of the cultural setting and circumstances under which it may have been invented. But his decision must not depend upon either of the other two principles stated here. To the probability of an element of culture arising in a particular culture, the existence of this element in other localities and the difficulty of communication between the localities are totally irrelevant.

This is probably the most important point in the essay: It would be desirable to maintain a separate test of the diffusion hypothesis with reference to the "uniqueness" of the element in question, so that this test could reinforce the test based on communication. As it was, each of these tests appeared to obviate the other, resulting in a circle. But Steward's description of this point is totally unclear, and benefits from an information theoretic analysis.

If we recast "uniqueness" as "reduction in Shannon entropy" (i.e., "information" in the technical sense), and understand that the entropy is a function of probabilities of the components of a culture element, then we can reformulate this principle: Independent invention is less probable when the shared element has a high information content. Still, this cannot be absolute: high information content is difficult to communicate. Communicating a long message across a noisy channel (like a culture contact) requires some work to maintain the fidelity of the information -- and that work requires an incentive.

In this sense, the "uniqueness" criterion must really break down into at least three separate properties: (1) the information content of the trait, (2) its value, and (3) synergy with the existing culture system. Value may refer to function, such as status marking or foraging utility. A culture element that conflicts with existing knowledge is less likely to be transmitted accurately; whereas one that is synergistic with existing knowledge might be picked up easily from a distant culture. In effect, "synergy" refers to the extent that the information content of a culture element is already familiar within the culture background.
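To make the reformulation of principle (2) concrete, here is a rough sketch that treats a culture element as a set of components, each with some probability of arising in a given society. Everything in it is an assumption for illustration: the component probabilities are invented, the components are treated as independent, and the function names are hypothetical. It covers only the first of the three properties above (information content), not value or synergy.

```python
import math

def information_bits(component_probs):
    """Information content (surprisal) of an element, summed over its components."""
    return sum(-math.log2(p) for p in component_probs)

def prob_independent_invention(component_probs):
    """Chance that a second society hits on the same components by itself,
    assuming the components arise independently of one another."""
    return math.prod(component_probs)

# Hypothetical elements: a simple custom with two common components,
# and an elaborate one with several rarer components.
simple_element = [0.5, 0.5]
elaborate_element = [0.5, 0.25, 0.1, 0.1, 0.05]

for name, probs in [("simple", simple_element), ("elaborate", elaborate_element)]:
    bits = information_bits(probs)
    chance = prob_independent_invention(probs)
    print(f"{name}: {bits:.2f} bits, reinvention probability {chance:.2e}")
```

In this toy setting the reinvention probability is just 2 raised to minus the information content, so each additional bit halves the chance of independent invention. Assigning real probabilities to the components of something like inverted speech is exactly the hard part that the sketch leaves open.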

(3) The probability of independent invention is inversely proportionate to the probability of derivation from a common ancestral culture.

This is a phyletic view of shared culture elements, that they come from original populations that split into daughter cultures. Again, when many elements are shared, this increases the probability of sharing for any single element.

Yet after this discussion of principles, Steward comes to an uncomfortable conclusion. We are still left utterly unable to assess whether inverted speech was invented independently in different North American regions, or whether it diffused from one origin. Certainly it helps to know that we don't know that, but that doesn't give us a method, just a critique.

What the problem really requires is some kind of measurement of the probabilities of invention and transmission. In the case of genetic transmission, we have well-defined probabilities, because actually there is very little information entropy in the system of a single gene with a finite (and small) number of alleles.

But for culture elements, the information content may be much greater. With more information, we can imagine more ways for the information to be apportioned, as well as more different ways that transmission of the information might be disrupted.

As to the "diffusion controversy," that will be discussed further in the next post, where I cover Leslie White's record on diffusion.

References:

Steward JH. 1929. Diffusion and independent invention: a critique of logic. Am Anthropol 31:491-495.

Information theory and mutual information between genetic loci

This is the second in a series on information theory and tests for recent selection. The first entry, "Information theory: a short introduction" reviewed the basic concepts of information measures and their background.

The International HapMap is a massive project to determine the genotypes for up to 3 million single nucleotide polymorphisms (SNPs) in people from 11 population samples around the world. The current data release (Phase 3) includes genotypes for a subset of over 1.5 million SNPs in 1,115 people. The 11 population samples include people of African ancestry from the US Southwest, Utah residents of Northern and Western European ancestry, Han Chinese from Beijing, people of Chinese ancestry from Denver, people in the Houston Gujarati Indian community, Japanese people from Tokyo, Luhya and Maasai people from Kenya, people of Mexican ancestry from Los Angeles, Italians in Tuscany, and Yoruba from Ibadan, Nigeria.

As impressive as this effort is, we may wonder why exactly SNP genotyping of so many people is a valuable enterprise in itself. The project’s homepage includes this short statement:

The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs.

There are theoretical and practical objections to this simple explanation (as I discussed here last month). However, what no one involved with the project seems to have expected is the extent to which the data would demonstrate the importance of recent adaptive evolution in human populations.

Here, I am describing some of the ways that we can test hypotheses about natural selection by using the SNP genotypes from the HapMap. This is a theory-centric description, with some digression into practical aspects of handling the genotype data. First, I consider how we might use information theoretic concepts to test the hypothesis of independence between two genetic loci.
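As a rough illustration of what such a test of independence might look like, the sketch below computes the mutual information between two biallelic loci from a table of haplotype counts and converts it to the likelihood-ratio statistic G = 2N·I (with I in nats), which is approximately χ²-distributed under independence. The counts are invented for illustration, haplotype phase is assumed to be known, and the function names are hypothetical rather than part of any HapMap tool.

```python
import math

def mutual_information_nats(table):
    """Mutual information (in nats) between the row and column variables
    of a contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                # p(x,y) * ln[ p(x,y) / (p(x) p(y)) ], written in terms of counts
                mi += (count / n) * math.log(count * n / (row_totals[i] * col_totals[j]))
    return mi

def g_statistic(table):
    """Likelihood-ratio statistic G = 2 * N * I, approximately chi-square with
    (rows - 1) * (cols - 1) degrees of freedom if the loci are independent."""
    n = sum(sum(row) for row in table)
    return 2.0 * n * mutual_information_nats(table)

# Hypothetical haplotype counts (rows: alleles A/a at locus 1, columns: B/b at locus 2).
linked = [[40, 10],
          [10, 40]]
unlinked = [[25, 25],
            [25, 25]]

CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

for name, table in [("linked", linked), ("unlinked", unlinked)]:
    g = g_statistic(table)
    verdict = "reject independence" if g > CRITICAL_1DF_5PCT else "consistent with independence"
    print(f"{name}: I = {mutual_information_nats(table):.4f} nats, G = {g:.2f} ({verdict})")
```

For two biallelic loci the table is 2×2, so the comparison is against a χ² with one degree of freedom. Real genotype data would also require dealing with unphased genotypes and missing calls, which this sketch ignores.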

Information theory: a short introduction

I lectured this week in my Biology of Mind course about information theory, and in particular the concept of Shannon entropy. I’ve typed up a few notes for my students, and I’m cross-posting them on my own blog because they are relevant to another topic I’ll be writing about: discovery and testing of natural selection in the human genome. You see, the kind of data presently being collected as part of the International HapMap, single nucleotide polymorphisms (SNPs), is naturally treated by information theoretic measures. So first, it may help to define the essential concepts of information theory.

Superstition explained? Signs point to no...

I happened to be lecturing about cause-effect relationships in my Biology of Mind course today, and will continue the subject next time. So I was interested when I saw this story from New Scientist writer Evan Callaway:

The tendency to falsely link cause to effect – a superstition – is occasionally beneficial, says Kevin Foster, an evolutionary biologist at Harvard University.

For instance, a prehistoric human might associate rustling grass with the approach of a predator and hide. Most of the time, the wind will have caused the sound, but "if a group of lions is coming there’s a huge benefit to not being around," Foster says.

OK, so this seems fairly obvious on the surface. Sure, if you could avoid the lions by jumping at every sound in the grass, then this kind of "superstition" might pay off, and would be unlikely to hurt. But then, being afraid of wind in the grass is really not a superstition. There is reliable mutual information connecting grass sounds and lions. Jumping at every grass movement might lead to lots of false alarms, but will really help avoid the lions, too.

That's a different kind of scenario than what we usually mean by "superstition". Like never washing your game socks so that your team won't lose on Saturday. Or paying the witch doctor so that the demons will spare your little girl.

We can still say that these false beliefs have little fitness cost. After all, the difference between a witch doctor and a real doctor may be important in today's cosmopolitan society, but throughout most of human existence, faith healers, witch doctors, shamans and thaumaturges really didn't face any competition.

But still, the simple idea of false association doesn't seem to cover these kinds of superstitions. I mean, many of these beliefs are so complicated, with involved series of rules and taboos that must be observed. Human superstitions may involve false hypotheses of causation, but they are not only that. So it seems like there must be some active principle at work, prompting people to learn arbitrary sequences of behaviors on the basis of fictions.

Moreover, it is not easy to list cases where animals make false inferences of cause and effect in this way. To be sure, we can manipulate animals experimentally, getting them to learn arbitrary correlations (like Pavlov's dogs) and then switch them so that they're false. But that's not the same thing at all! That's like the case where other teams all collude with each other to make sure that your team always wins except when you wash your socks.

The research study is available in Proceedings of the Royal Society B, by Kevin Foster and Hanna Kokko. The paper defines "superstitious behavior" as behaviors that result from "the incorrect establishment of cause and effect". That definition doesn't include most of what people mean when they refer to "superstitions" -- it's really quite strictly limited to formally correct inferences based on spurious associations. The authors then derive conditions under which such errors may actually increase fitness.

Foster and Kokko assume a model in which the fitness of an individual is determined by some events in a series that includes many other events that do not affect fitness. If some of those benign events are correlated with fitness-determining events (that is, if there is mutual information linking them), then an individual might increase its fitness by exploiting this mutual information.

Of course, there is a problem of detecting the correlations. Suppose we observe for three consecutive months that it rains after the full moon. We might be tempted to think there is some causal connection here---that the full moon causes rain in some way. Maybe the moon goddess has an affinity for water. In any event, our observation may induce us to believe that other full moons will also be followed by rain. That's inductive reasoning.
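A little arithmetic shows how easily such a run can arise by chance. The sketch below assumes that rain after a full moon is an independent event with the same probability each month, and that an observer is watching some number of unrelated candidate cues. All of the probabilities and the count of 100 cues are invented for illustration.

```python
def chance_of_run(p_event, run_length):
    """Probability that an unrelated event follows the cue on every one of
    `run_length` consecutive occasions, assuming independence."""
    return p_event ** run_length

def expected_spurious_cues(p_event, run_length, n_cues):
    """Expected number of unrelated cues that show an unbroken run by chance."""
    return n_cues * chance_of_run(p_event, run_length)

MONTHS = 3    # three consecutive full moons, as in the example above
N_CUES = 100  # hypothetical number of unrelated cues an observer might track

for p_rain in (0.2, 0.5, 0.8):  # hypothetical chances of rain after any full moon
    run = chance_of_run(p_rain, MONTHS)
    spurious = expected_spurious_cues(p_rain, MONTHS, N_CUES)
    print(f"p(rain) = {p_rain}: run probability {run:.3f}, "
          f"expected spurious cues {spurious:.1f} of {N_CUES}")
```

With an even chance of rain, one cue in eight will show the pattern with no causal connection at all, so an observer tracking many cues should expect several spurious "rules" from short runs like this one.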

To that end, Foster and Kokko's model might appear to provide an evolutionary account of the problem of induction. Their model entails a trade-off: A greater degree of precision in making correct inferences will have benefits, but the necessary effort will have costs. "Superstition" reigns below a certain threshold -- the point at which the costs of more accurate inferences are balanced by the benefits of rapid inference-making.

That's a simple model, and the trade-off is a mathematical necessity, given the assumptions.

Is that all "superstition" is?

It seems to me that "false inference" is not a sufficient definition for "superstition." For one thing, superstitions persist even in the face of considerable evidence against them. The player who loses a game may still keep wearing those dirty socks. The family whose first child dies may still employ the witch doctor when the second child falls ill.

Another thing is that superstitions do not really consist of false inferences. They consist of highly distinctive signs. Those dirty socks, or a rabbit's foot, or oracle bones. Walking under a ladder, broken mirrors, which direction the horseshoe is hanging. "Gesundheit" after a sneeze. You don't learn about them by inference from natural observations; you learn about them from other people who explain the "boundaries" of the signs -- what counts and doesn't count as part of the belief.

We might compose some sort of analogous argument to that presented by Foster and Kokko. Maybe learning about superstitious beliefs from other people is adaptive because we also learn about valid, true inferences from them, and it's too costly for us to try to tell the difference. There may be some truth to that idea. We accept superstitious beliefs because everyone around us accepts them, and how could so many people be wrong?

But there are other aspects we should consider. We can't freely take or leave the beliefs that are common in the society around us. Some of them are enforced at the point of a gun. Others are instilled by ritual repetition from early childhood onward. We can call it "social pressure" or "tradition," or simply "culture," but whatever we call it, we have to recognize that we are compelled to accept some beliefs based on what other people around us believe.

If we call a belief a "superstition," chances are it's already out of style. Belief in ghosts may have been the norm a hundred years ago; today it's a smaller niche. Or a "superstition" may be simply idiosyncratic in some way -- one person's eccentric belief. There are many, many equally silly beliefs that we call much more respectful things, like "laws" or "traditions" --- the difference being that large communities of people share those rituals.

Any account of superstition has to take into consideration those elements of human learning and sociality. It is far from obvious that any non-human animals have "superstition," even in the sense of false inferences. If we look at a truly human use of the term -- arbitrary sign sequences believed to alter natural phenomena and maintained by social learning -- then it is doubtful that any non-humans have such beliefs. Still, phenomena like the chimpanzee "rain dance" might fit the bill, if we could suitably define the concept of "belief" in a non-linguistic context.

At any rate, the kind of logic about costs and benefits of inference-making doesn't seem to describe the kind of phenomenon we really mean when we talk about "superstition." And that's disappointing. If we take the course of defining the term in an overly simplistic way, ignoring its evident social and semiotic components, then we rob ourselves of the depth of human cognitive creativity.

References:

Foster KR, Kokko H. 2008. The evolution of superstitious and superstition-like behaviour. Proc Roy Soc Lond B (online) doi:10.1098/rspb.2008.0981

Randomness

From a passage on the statistical behavior of aggregates and probability theory, p. 64-65 in Entropy for Biologists by Harold J. Morowitz, Academic Press, New York, 1970:

The notion of randomness is a very important one in physics, yet difficult to describe. (Randomness has become so significant that one of the outstanding scientific publications of recent years was a book of one million random digits.) Often a process is so complicated or we are so ignorant of the boundary conditions, or of the laws governing the process, that we are unable to predict the result of the process in any but a statistical fashion. For instance, suppose we have a collection of radioactive phosphorus atoms, P32, and take an individual atom and question how long it will take to emit an electron. Here we do not know the boundary conditions, i.e., the detailed state of the nucleus, nor do we know the exact laws governing radioactive decay. The time can take on any value. We may obtain an aggregate of such values as is done in experiments on radioactive half-lives and deduce certain features of the collection, but we may only make probability statements about the individual atom. Randomness is in a certain sense a consequence of the ignorance of the observer, yet randomness itself displays certain properties which have been turned into powerful tools in the study of the behavior of systems of atoms.

Information measures

Pp. 66-67 in Entropy for Biologists by Harold J. Morowitz, Academic Press, New York, 1970 (emphasis added):

The logic of our approach may be difficult to follow since information is not a physical quantity in the sense that mass, charge, or pressure are physical quantities. Information deals with the usefulness of a set of symbols to an observer. Since information does not measure anything physical, we are free to choose any information measure we please. The definition is therefore at first arbitrary and the choice is based on a common sense estimate of the usefulness of a set of symbols. The original definition arising from the needs of the communications industry was, to use P. W. Bridgman's words, "of such unblushing economic tinge." What in the end turns out to be surprising is that the definition which was introduced is found to relate to the entropy concept in interesting and very fundamental ways.