john hawks weblog

paleoanthropology, genetics and evolution

information

  • Mailbag: Cultural evolution

    Tue, 2011-09-20 16:44 -- John Hawks

    I just finished listening to your lectures of rise of humans and it was thoroughly a very nice and complete coverage of recent understandings of this matter. THANK YOU, But there is a burning question and issue in my mind that I like to share and ask you.
    The genetic evolution has been clearly the engine of evolution before human kind but arguably the recent cultural evolution-I call it intellectual evolution- by far is the engine of changes in the history of our species. As you mentioned in the last lecture of that series.
    But intellectual selection that instead of gathering genes-packs of information- on DNA, has gathered information first on Nervous system- from primitive reflexes all the way to complex memory systems in human brain- and later the information packages in language, writing,computer net works, the selection method that its rate of change is the determining factor of our present and future events, has not found its importance and detailed definition and applications in the mind of students and even scholars yet?? What is missing in this picture?

    Thank you so much for the kind words. I agree, cultural evolution has been very powerful but we as yet have no clear way of describing or predicting its progress. Partly it comes down to the model. With genetics, we know certain regular aspects of inheritance that allow us to make strong predictions about how evolution will occur. With culture, it is difficult to define the basic aspects of information that are transmitted, or to describe their dynamics. Humans change information as they transmit it, in ways that are not analogous to genetic changes. So, the topic is very complicated but naturally very interesting.

  • Boas goes low

    Sat, 2011-02-05 12:43 -- John Hawks

    While researching another question, I have been reviewing some Franz Boas. In 1936, American Anthropologist ran a piece by Alfred Kroeber which reviewed some of Boas's ideas and work. Boas was not thrilled by Kroeber's description and wrote a reply with what we would today describe as a rather pissy tone. I suppose he earned it, considering he had trained Kroeber himself.

    In the piece, there is a short little discussion of "common" versions of myths and stories compared to more ideal versions. Boas had spent a great deal of effort cataloging myths and stories from various groups, and trained several of his students (including Kroeber) to do likewise.

    May I remind Dr Kroeber of one little incident that illustrates my interest in the sociological or psychological interpretation of cultures, an aspect that is now-a-days called by the new term functionalism. I had asked him to collect Arapaho traditions without regard to the “true” forms of ancient tales and customs, the discovery of which dominated, at that time, the ideas of many ethnologists. The result was a collection of stories some of which were extremely gross. This excited the wrath of Alice C. Fletcher who wanted to know only the ideal Indian, and hated what she called the “stable boy” manners of an inferior social group. Since she tried to discredit Dr Kroeber’s work on this basis I wrote a little article on “The Ethnological Significance of Esoteric Doctrines” in which I tried to show the “functional” interrelation between exoteric and esoteric knowledge, and emphasized the necessity of knowing the habits of thought of the common people as expressed in story telling. Similar considerations regarding the inner structural relations between various cultural phenomena are contained in a contribution on the secret societies of the Kwakiutl in the Anniversary Volume for Adolf Bastian (1896) and from another angle in a discussion of the same subject in the reports on the Fourteenth Congress of Americanists, 1904 (published 1906) ; the latter more from the angle of the establishment of a pattern of cultural behavior. These I should call contributions to cultural history dealing with the ways in which the whole of an indigenous culture in its setting among neighboring cultures builds up its own fabric.

    Of course, Boas wrote that in the "Don't say I never did you any favors" vein, but the bold-faced line struck a chord. Stories are built of language in iterated social exchanges very much like stone tools are built of flaking decisions. Hardly an original thought, I know, but pertinent to the transmission of early-stage reduction versus formal end-products.

    A version of a story that everybody sort-of knows certainly follows a different social learning dynamic than the canonical version of a story, as told by some famous storyteller -- the "Homeric apogee" of a story, we might say. The canonical version may well be more conservative, depending on tradition and technology. Shakespeare's Hamlet is kept whole by tradition and technology (writing and printing), because we consider the form of the parts essential to the whole. In that sense, any quotidian rendition of Hamlet is going to include many of the specific elements ("To be or not to be") which percolate out of the widely-distributed canonical version. We're never more than three or four interlocutors from the text.

    That is broadly true even in non-literate traditions, as elite storytellers maintain canonical versions of some stories with great fidelity using meter, rhyme and both internal and external references. "Low" culture is more than a game of "Telephone" removed from canonical stories; it promotes its own sensibility that resonates with the broader cultural setting. By considering the evolving dynamics of everyday parlance, "low" culture, we may find windows into semantic guides for learning.

    Boas later accuses Kroeber of "Epicureanism", for wanting elegant stories about historical relations of cultures without insisting on solid evidence. But building a "systematic" understanding is not easy; it's not even obvious what the endpoint of such an effort should look like.

    I have for some time had Brian Boyd's book, On the Origin of Stories: Evolution, Cognition, and Fiction, but haven't had time to really delve into it. It deals with similar issues; in particular, Boyd proposes that fiction as a form of art is a side-effect of various human cognitive adaptations. The missing element, I think, is the developmental aspect: How do children learn to create and engage with narratives around them? The shared environment of social learning creates a foundation for more extensive stories of all kinds -- from fiction to science.

  • Jebel Faya and early-stage reduction

    Sat, 2011-01-29 21:58 -- John Hawks

    Simon Armitage and colleagues [1] describe archaeological remains from Jebel Faya, in the United Arab Emirates. The assemblages come from a rock shelter in the mountain, which is around 100 km south of the Straits of Hormuz, entry to the Persian Gulf. Below the Bronze Age and later remains are three Paleolithic units. The oldest (assemblage C in the paper) is dated by OSL to the last interglacial, around 125,000 years ago. My comments here are more note-like than usual; this topic opens a window into some work we've been recently doing.

    The authors' main conclusion is that the oldest assemblage displays technical similarities to East African archaeological assemblages, which are not present in the archaeology of the Levant either before or after this time. We have to dig into the supplementary material to the paper to get a good account of the technical similarities:

    Technologically, this assemblage has general links to East Africa (S3 S4) while showing none of the technological traits characteristic in the contemporaneous Levantine Mousterian (S5). As in the early Middle Stone Age (MSA) of East Africa, Assemblage C exhibits three profoundly different reduction strategies: bifacial, volumetric blade, and radial Levallois. This combination is unknown in the Levant after about 200 ka, where there is no bifacial reduction and the Levallois method is largely limited to unidirectional converging. The latter produced large numbers of Levallois points, which are absent from Assemblage C.

    For a layman's description of the result from coauthor Anthony Marks, I can recommend Katherine Harmon's account at Scientific American's website.

    I like the observation, but I think we should be cautious about it. The basic idea is that African assemblages display three different strategies early in the reduction sequence, none of which are evident in Levantine assemblages of equivalent age.

    Reduction sequences and conservatism

    Yesterday I talked over this concept with my graduate student Marc Kissel. I find it very interesting that the authors focused on initial reduction stages as elements of technical similarity. They thereby assume much about the cultural transmission of the reduction sequence.

    It seems reasonable that the initial steps of a reduction sequence -- from quarrying through early core shaping -- should be conservative. Early stages necessarily constrain the later steps toward finished tool production, so that a skilled toolmaker who wants to carry out the later stages of a reduction sequence has first to get the early steps right.

    Paradoxically, a naive learner may be ill-equipped to attend to the importance of these first steps, compared to later steps where the preform is more readily identifiable by its physical configuration. Within a social group, the early steps of reduction may well be carried out by other people, including less-skilled artificers. The best toolmaker may go to the quarry himself, but often he may call on someone less skilled to carry out the initial reduction, or may be forced to work with partially exhausted cores from earlier attempts.

    I'm willing to hazard a guess that the social learning that enables tool manufacture would exert a bias toward low error rates early in the reduction sequence. We can consider a biological analogy -- early embryonic development is more strongly conserved across taxa (and phyla) than later development. Changing something early in a developmental sequence may make later events impossible. If I'm right, the argument by Armitage and colleagues should have some force -- finding that the early stages of the reduction sequence are shared among sites should be a better indicator of relationship than most archaeological indicators.

    But Armitage and colleagues' conclusion has force just to the extent that we accept two proposals: (1) that we understand the technical variation in the Levant, and (2) that independent development of the early-stage reduction strategies in the Jebel Faya assemblage is unlikely.

    These proposals hang together. The Levant is richly documented across the period before and after the last interglacial, more so after OIS 6 (around 130,000 years ago) than before. These assemblages were directed toward convergent removal of Levallois points. I'm not immediately in a position to discuss the variation within these assemblages, but the question strikes me as crucial. Although the archaeological record from this area is relatively dense, like all places it samples only a small fraction of the actual groups that must have existed at the time -- to use a genetic comparison, the record has high coverage over a very small fraction of the regional behaviorome.

    Was independent invention of these early-stage reduction strategies likely? The answer depends on whether a particular early-stage reduction strategy is merely rare in the large Levantine sample, or entirely absent. If such a strategy (in this case, foliate reduction) occurs at all, we can infer that its invention was possible, if not likely. With assemblage C at Jebel Faya, we are considering the cultural tradition represented by 500 artifacts. If we treated these as a random sample of the Levantine record, they are exceedingly unusual, no doubt. But random sampling across an entire record isn't the correct comparison; we want some equivalent sampling of the cultural information in terms of time and space.

    The paper's conclusion that Jebel Faya represents an incursion of African-derived technical traditions into the Arabian peninsula depends on these assertions. I don't have strong feelings about them, but I think we should work to get a better statistical understanding about the issue. I am singularly unimpressed when archaeologists assert that one assemblage "resembles" another on purely typological grounds. Typological similarities may result from many constraints other than cultural information, and rare appearances actually carry a lot of information about them.
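    The rare-versus-absent question can be framed with a toy calculation. What follows is only a minimal sketch, not anything from the paper: it assumes each sampled assemblage independently has the same chance of containing a given reduction strategy, which is exactly the kind of random-sampling assumption cautioned against above. The function name `prob_absent` and the frequencies are hypothetical, chosen for illustration.

```python
def prob_absent(freq: float, n_assemblages: int) -> float:
    """Chance that a strategy present in a fraction `freq` of assemblages
    appears in none of `n_assemblages` independently sampled assemblages."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_assemblages < 0:
        raise ValueError("n_assemblages must be non-negative")
    return (1.0 - freq) ** n_assemblages


if __name__ == "__main__":
    # Hypothetical frequencies and sample sizes, for illustration only.
    for freq in (0.01, 0.05, 0.10):
        for n in (10, 50, 200):
            p = prob_absent(freq, n)
            print(f"freq={freq:.2f}  n={n:3d}  P(never observed)={p:.4f}")
```

    Under these assumptions, a strategy practiced in 5 percent of assemblages would go unobserved in a sample of 50 only about 8 percent of the time. So complete absence from a large sample says something quite different from rarity, but only to the extent that the sample is comparable in time and space.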

    Out of Africa early

    Now, what about this "southern route" business? I say it's a year behind the times. The entire reason for the "southern route" hypothesis was to explain how Africans could have left Africa 70,000 years ago without being stopped by Neandertals in the Levant. Sail them around the southern coast of Asia, and you can get them early into SE Asia and Australia without mixing with those darned Neandertals.

    We obviously don't need to rule out Neandertal interbreeding anymore. We know it happened, most likely in West Asia. Putting Africans into the Levant during the last interglacial isn't a bug, it's a feature. We need contact between moderns and Neandertals in this area to explain the genetic data.

    The dates may seem like more of a stumbling block. If we accept that a major out-of-Africa movement was underway by 70,000 years ago, we are going to have a hard time explaining why the Levant seems to have been entirely uninfluenced by it.

    But a 70,000-year-long chronology, based on estimates of mtDNA haplogroup divergences, is already out of kilter with the majority of evidence. Nuclear DNA suggests a substantially longer timescale, which would derive non-African and sub-Saharan populations from common ancestors before 140,000 years ago. Depending on the amount of mixture among these populations and the mutation rate we adopt, these populations may have begun to differentiate very early in the Middle Stone Age.

    It's hard to account for the diversity of people outside of Africa with a short migration timescale. People outside Africa are around 20 percent more inbred than sub-Saharan Africans, but they don't look like they underwent any sudden severe bottleneck. Even accounting for the mixture with archaic people like Neandertals and Denisovans, much of the variation of Middle Pleistocene humans (still present in Africa) just didn't get into non-Africans.

    I would propose a movement of MSA Africans into West Asia before the last interglacial as a model that provides a good fit to these data. An early movement followed by long interactions in this limited area would explain why so much of the population structure and morphological variation of MSA Africans wasn't represented in the people who peopled Eurasia. A substantial delay between the initial entrance into West Asia and the dispersal to Europe and the rest of Asia would explain why the later archaeological transitions in those regions have no sign of immediate technical or cultural links to the MSA. It would also explain why the initial "modern" humans outside Africa share few if any derived morphological features with Africans after 100,000 years ago.

    The anatomy of the Skhul and Qafzeh samples suggests that an African incursion into the Near East did occur before 100,000 years ago. Many paleoanthropologists have supposed that this early incursion did not persist, even locally. The later Levantine sample includes individuals with more Neandertal resemblances, chiefly Amud and Kebara. But each of the later specimens shares several traits with early modern humans from Skhul or Qafzeh. Indeed there is no clear constellation of derived traits that sorts the Skhul-Qafzeh sample cleanly from Tabun 1 and the later Levantine specimens. I just don't think this skeletal record poses any problem for the idea of a long interaction of populations in this area -- especially if we extend the focus from the Levant into the Arabian peninsula and Persian Gulf region.

    The strongest reason to suppose that an African incursion was extinguished is not the skeletal record but instead the mtDNA timescale. I can refer readers to the paper by Endicott and colleagues [2], which discusses a range of mutation rate estimates and their effects on the origin of macrohaplogroups M and N, the key ancestral non-African lineages. Current estimates unanimously suggest that these clades originated within the last 75,000 years. By itself, this would suggest that the mtDNA common ancestors of non-Africans and sub-Saharan African populations diverged shortly before that time.

    I keep coming back to this, because the mtDNA just seems so out of line with the autosomal and X-chromosome picture. I regard this as a serious sticking point and hesitate to just wave it away. As I suggested to Charles Choi, the resolution may involve a time of isolation outside Africa during which the ancestors of non-Africans lost heterozygosity (and became enriched for the later mtDNA clades M and N). Or maybe we just have the mtDNA clock wrong -- the large revisions of the Neandertal-human mtDNA divergence in the light of developing evidence don't inspire confidence about the timing of internal nodes to the human mtDNA tree.

    The early archaeological assemblage from Jebel Faya strikes me as consistent with a model of early dispersal from Africa, but not especially good evidence for it. The outstanding question is whether the early reduction strategy is a behavioral trait that provides good evidence about biological relationships. I see the logic but think that it is tenuous.

    The model obviously is relevant to the question of an early presence of African-derived modern humans in India. If we combine the presence of an African-derived population in eastern Arabia with the large exposed Persian Gulf region during the last interglacial, this begins to look like a large habitable region with easy land connections to the Indus River valley. But the Indian subcontinent would potentially have been home to a very large population of ancient humans. I doubt that an occupation across the large area of West Asia plus the Indian subcontinent would have enabled the substantial reduction of heterozygosity that we see in present-day non-Africans.


    References

    Synopsis: 
    A 125,000-year-old site on the Arabian peninsula presents similarities with African MSA sites.

  • Bitwise consciousness

    Mon, 2010-09-20 19:51 -- John Hawks

    Carl Zimmer writes about theories of consciousness in today's NY Times Science section, and describes the work of my Wisconsin colleague, Giulio Tononi.

    But Dr. Tononi’s theory is, potentially, very different. He and his colleagues are translating the poetry of our conscious experiences into the precise language of mathematics. To do so, they are adapting information theory, a branch of science originally applied to computers and telecommunications. If Dr. Tononi is right, he and his colleagues may be able to build a “consciousness meter” that doctors can use to measure consciousness as easily as they measure blood pressure and body temperature. Perhaps then his anesthesiologist will become interested.

    That's fortuitous because I'm lecturing about information theory tomorrow in my "Biology of Mind" course. The article goes on about how to measure consciousness using information theory terms. I'm not sure it's a practical theory of conscious experience, yet, but I think the information theory concepts are fundamentally important to understanding the adaptive evolution of brains on a more basic level.

    I'm always impressed reading back through Darwin, who, a hundred years before information theory, began to consider what we might describe as transmission properties of animal communication.

    As for Tononi's ideas -- there is a logic here that is very appealing. Information is about encoding and transmission. Cryptography, for example, requires that we study the transmission properties of a channel to try to understand the encoding. That is, in a sense, what Tononi is proposing. Where most people have considered only the encoding properties, he proposes understanding the transmission properties.

  • Book notes: Free, by Chris Anderson

    Tue, 2009-09-22 11:22 -- John Hawks

    I read Chris Anderson's book because it was, well, "Free". The book's thesis is simple: Sometimes people profit by giving things away.

    I have been, for several years now, making scientific knowledge available for no cost to any readers who care to come by my site. In academic circles, this practice is ordinarily considered to be insane. Therefore, whenever I come across anything explaining why blogging isn't such a stupid idea, I put it right into my files. That's for Luddites on future promotion committees.

    How do I review the book without making points like a Slashdot comment thread?

    Somewhere in the book, Anderson wrote his plan for making money from Free: Get businesses to pay for the Chris Anderson "Free" seminar. The short business profiles and catchy anecdotes in the book were pretty well crafted as advertisements for the seminar. But beneath the chrome, there are some interesting -- sometimes wacky -- ideas about the nature of human economic interactions.

    Anderson describes the $10,000 wager between economist Julian Simon and Paul Ehrlich. Simon believed that commodity prices would not rise over the long term, betting that the "substitution effect" would spur people to find new technological solutions to replace expensive raw materials, thus lowering the commodity prices. Ehrlich believed that resource shortages were inevitable as population and economic pressures grew. In 1990, Simon collected on the bet, as the five commodity metals chosen by the two had all fallen, many substantially.

    Others have cast this story as a parable about bad predictions, or the inherent fallacy of future pessimism. Anderson gives the story a sociobiological spin:

    Humans are wired to understand scarcity better than abundance. Just as we've evolved to overreact to threats and danger, one of our survival tactics is to focus on the risk that supplies are going to run out. Abundance, from an evolutionary perspective, resolves itself, while scarcity needs to be fought over. The result is that despite Simon's victory, the world seemed to assume that Ehrlich, on some level, was still right.

    As [Wired's Ed] Regis noted, "Simon complained that, for some reason he could never comprehend, people were inclined to believe the very worst about anything and everything; they were immune to contrary evidence just as if they'd been medically vaccinated against the force of fact." Ehrlich's gloomy predictions continued (and continue) to have influence. Meanwhile Simon's own observations seem to be of interest only to commodities traders (49-50).

    If we really want to explain the phenomenon of "Free", we need to turn to psychology and sociology. At several points in the book, Anderson does connect to these fields -- mentioning the "Dunbar number" in the context of MySpace "friends", Lewis Hyde's work The Gift in the context of non-monetary economies, and Abraham Maslow's "Theory of human motivations" in the context of why bloggers write for free. But Anderson's goal is not to explain, but to popularize. So his use of academic sources is, well, eclectic. The "Dunbar number" is mostly an anthropological urban myth. There's a very deep literature on the gift in ethnology. Sure, there's no market for Mauss seminars on Anderson's lecture circuit, but there are some entertaining classic stories about confusion, gifts, and cross-cultural contacts.

    Anderson's theme simplifies this complexity of social interactions into a binary:

    There is a reason why economics is defined as the science of "choice under scarcity": In abundance you don't have to make choices, which means that you don't have to think about it at all (50).

    There's an anthropological claim -- that humans are "wired" as a "survival tactic" to perceive scarcity. It makes a good story. But is it true?

    Human lives are long. Sure, there are some essential resources so abundant that we don't need to think about them -- air, for example. But most resources vary over time or space. People have always needed to consider whether to stay or move, whether to hunt today or wait until tomorrow, to gather more firewood or risk the cold. It's the ant and the grasshopper.

    We might imagine a version of the Ehrlich-Simon bet during the Pleistocene. Imagine humans occupying a few abundant habitats with plenty of food. As the population grows, they put pressure on these habitats. What happens? On the Ehrlich side, resource scarcity might trigger a demographic crisis, with hunger, warfare, and a population crash. On the Simon side, people might expand their behavioral niche, moving into less favorable habitat with more complex cultural adaptations.

    Humans are tricky creatures, and our potential for increasing complexity depends on the level of complexity we've already reached. Only some parts of a complex system may be amenable to measurement. Anderson points out this problem from the standpoint of business information technology:

    When your phone company tells you that your voice mail box is full, that's artificial scarcity -- it costs less than a nickel to store one hundred voice messages, and the average iPod could store thirty thousand of them (voice messages are recorded at lower quality than music, so they take less space). By forcing subscribers to take the time to delete voice mails, the phone companies are saving a little money in storage costs by spending a lot of consumer time. They managed the scarcity they could measure (storage) but neglected to manage the much larger scarcity of their customers' goodwill. No wonder phone companies are second only to cable TV companies in the "most hated" rankings (191).

    Reading this made me think of behavioral science in the role of the stupid phone company. Natural selection optimizes fitness, that much is algorithmic. But how does this optimization process affect any given behavioral trait? That depends on how the trait is connected to fitness, how heritable it is, and whether anything else correlated with the trait exerts its own independent negative effect on fitness. It's a mess, and we generally can't figure it out. So, we measure what we can. The fallacy is that what we can measure may have little connection to the important output -- for the phone company, profit per customer; for the biologist, fitness.

    Today with an embarrassment of abundance of food and goods, people still agonize over choices. From one point of view, this is just a waste of time -- why argue over arbitrary markers of status, when the essential resources are super-abundant?

    But from a social point of view, seeking out limited information may be in our nature. This leads me to question whether Anderson is right about the value of information itself. Does it really trend toward free?

    Humans evolved to be users (and broadcasters) of social information, but in the past our communication was limited by many of the same constraints that other animals face. Animal communication is not free -- it comes with direct and indirect costs. The direct costs are energetic and developmental -- animals have to build and maintain the organs of communication, and supply the power to run them. The indirect costs are the perils of advertising: an animal that reveals itself runs a greater risk of predation. Worse, the peril of honest advertising is that potential mates may see what a loser you really are.

    The Internet might have enabled more complex systems of information presentation, but for the most part, people use it for old-fashioned reading. At its best, it has enabled a social transformation, empowering millions of people to use very simple means of information transfer -- from short-form blogs to messaging in World of Warcraft.

    That's "messaging", not "massaging".

    If the cost of information appears to be trending downward, that may be because the production of information is increasing with a much steeper slope than the production of money that might purchase it. Consider genomics: What would you be willing to pay for your genome today? Whatever your answer, you can expect that you would be willing to pay even less 10 years from now, unless the health value of that information radically increases.

    [T]he more products are made of ideas, rather than stuff, the faster they can get cheap. This is the root of the abundance that leads to Free in the digital world, which we today shorthand as Moore's Law.

    However, this is not limited to digital products. Any industry where information becomes the main ingredient will tend to follow this compound learning curve and accelerate in performance while it drops in price. Take medicine, which is shifting from "we don't know why it works, it just does" (there's a reason it's called drug "discovery") to a process that starts with the first principles of molecular biology ("now we know why it works"). The underlying science is information, while observed efficacy is just anecdote. Once you understand the basics, you can create an abundance of better drugs, faster.

    DNA sequencing is falling in price by 50 percent every 1.9 years, and soon our individual genetic makeup will be another information industry. More and more medical and diagnostic services will be provided by software (which get cheaper, to the point of being free) as opposed to doctors (who get more expensive) (84).
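    To set the quoted halving rate beside the 10-year question raised above, here is a minimal sketch. It assumes the price keeps falling by 50 percent every 1.9 years, the figure in the quotation, as a smooth exponential. The function name `relative_price` is hypothetical, and nothing here forecasts actual sequencing costs.

```python
def relative_price(years: float, halving_time: float = 1.9) -> float:
    """Price after `years`, as a fraction of today's price, assuming the
    price halves every `halving_time` years (1.9 is the quoted figure)."""
    if halving_time <= 0:
        raise ValueError("halving_time must be positive")
    return 0.5 ** (years / halving_time)


if __name__ == "__main__":
    for years in (1.9, 5, 10):
        frac = relative_price(years)
        print(f"after {years:>4} years: {frac:.1%} of today's price")
```

    On that assumption, ten years is a little more than five halvings, which leaves the price somewhere between 2 and 3 percent of where it started.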

    Once upon a time, the only diagnostic service was a doctor. Doctors offloaded some of their diagnostic effort as lab tests became more and more important. Nowadays, one of the major reasons for the increase in health care costs is the routine ordering of expensive tests. Doctors order these tests because they reduce risk -- risk of bad outcomes, and risk of malpractice suits. Risk is money.

    The least satisfactory chapter for me was about science fiction and abundance. Anderson argues throughout the book that humans are "wired" for scarcity; that we just don't understand abundance. In chapter 15, he turns to fictional worlds -- from E. M. Forster to Cory Doctorow -- in which some machine (or other invention) had created endless abundance. Invariably, these works describe how society degenerates when freed from scarcity -- freed from "striving", people are robbed of purpose.

    Anderson misses a darker connection. The people who worried about the degeneration of human moral purpose in the face of abundance also worried about our genetic degeneration. The eugenics movement was born in the same post-industrial society as science fiction, and for the same reason. Later in the book Anderson references the similarity:

    [R]eplace "free" with "steam" and you can imagine the Victorian concern about flabby muscles and minds (229).

    I wonder whether there is something inherently dystopian about a society where genetic information is too cheap to meter. In a world where risk is money, and very small risks are increasingly quantifiable, it is not hard to imagine an inexorable process toward removing freedom and imposing control. Certainly that has been the theme of many science fiction works.

    But rather than end on that depressing note, I'll point instead to a happy consequence of free information exchange, the creativity expressed in online communities:

    RuneScape, yet another Web-based world of orcs and elves, counts more than 1 million subscribers (out of more than 6 million users) paying $5 a month, creating a $60 million annual business. As a point of reference, that's about the same size as the subscriber user base and annual revenues of the Wall Street Journal's subscription-based Web site, which is the biggest paid site of all the world's newspapers. It's also larger than the New York Times's paid online subscriber base was before the paper dropped the model in favor of Free in 2008. It appears that people would rather pay to cast pretend spells than to read Pulitzer Prize-winning news. (I'll leave whether that's a good thing or a bad thing to others.) (150).
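    The arithmetic in that passage is easy to check. This sketch takes the quoted figures as round lower bounds ("more than 1 million" subscribers, "more than 6 million" users) and assumes every subscriber pays the full $5 in each of 12 months. The variable names are illustrative only.

```python
# Figures as quoted from the book, treated as round numbers.
subscribers = 1_000_000   # "more than 1 million subscribers"
total_users = 6_000_000   # "more than 6 million users"
monthly_fee_usd = 5       # "$5 a month"

# Assumes each subscriber pays for all 12 months of the year.
annual_revenue_usd = subscribers * monthly_fee_usd * 12
paying_share = subscribers / total_users

print(f"annual revenue: ${annual_revenue_usd:,}")
print(f"share of users who pay: {paying_share:.1%}")
```

    The product matches the $60 million in the quotation, and it implies that roughly one user in six pays, with the rest playing for free.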

    People are using their power to make new things -- sometimes frivolous, fictitious things, but things that make them happy. It's possible that genetic information can serve this purpose, too -- a point I'll return to on a different topic tomorrow.

  • Data warehousing in genomics interview

    Tue, 2009-07-14 14:39 -- John Hawks

    Software publisher O'Reilly is running an interview with David Dooling, data chief of the Genome Sequencing Center at Washington University: "Sequencing a genome a week". If you want a little background on the current challenges in genomics, the history of genome sequencing technologies, and the infrastructure that allows modern bioinformatics, it's a really nice interview. Dooling is a speaker at the upcoming Open Source Convention (OSCON).

    A sample:

    James Turner: It sounds like there are a lot of informatics challenges with genomic data. There's the computational challenge of doing the sequence, which you mentioned. There's a challenge of managing the resulting data and finding meaning in it. And then there's the challenge of applying that understanding to a larger population. First of all, did I miss any of the challenges? And second of all, what are the unique problems in each set of those?

    David Dooling: Well, let me talk a little bit about each of the ones you did mention, and maybe that'll bring up some that you didn't. So as far as just analyzing the data and generating the data, that is computationally intensive because essentially what you're getting off of these new instruments is pictures, images. And you need to apply algorithms to detect features in those images and then translate those features accounting for different vagaries of chemistry, and then resulting in a sequence, a series of basic Gs, As, Cs and Ts, the building blocks of DNA. Once you have that information, there's a whole host of secondary analysis, or analysis of biological relevance if you want to think of it that way, that need to happen, and those are project-specific. So for some sorts of projects, for example a cancer project, you would want to find all of the ways that the DNA that you sequenced differs from the reference and then take -- for the tumor, let's say. And then for the normal, do the same thing. And then for all of those variants, find out which ones are unique to the tumor genome as compared to the normal genome.

    It's not an interview about biology; it's about technology and how people are working to enable us to test more and more detailed biological questions.
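    The tumor-versus-normal comparison Dooling describes comes down, at its simplest, to a set difference. Here is a minimal sketch of that one step, assuming the reads have already been turned into aligned sequences of equal length. The sequences and the helper `call_variants` are made up for illustration; real pipelines also handle alignment, insertions and deletions, and sequencing error, none of which appears here.

```python
def call_variants(sample, reference):
    """Return the set of (position, reference_base, sample_base) mismatches.

    Assumes both sequences are already aligned and the same length.
    """
    return {
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    }


# Hypothetical toy sequences, not real data.
reference = "ACGTACGTAC"
normal = "ACGTACCTAC"  # differs from the reference at position 6
tumor = "ACTTACCTAC"   # same difference, plus a new one at position 2

normal_variants = call_variants(normal, reference)
tumor_variants = call_variants(tumor, reference)

# Variants seen in the tumor but not in the matched normal sample.
tumor_only = tumor_variants - normal_variants
print(sorted(tumor_only))  # [(2, 'G', 'T')]
```

    The shared difference at position 6 drops out, leaving only the change that is unique to the tumor.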

  • Learning, population size, and "modern human behavior"

    Fri, 2009-06-12 15:04 -- John Hawks

    I'm a big booster of the idea that human demographic expansion helped drive our recent evolution. So you might expect me to like the new paper by Adam Powell, Stephen Shennan and Mark Thomas, titled, "Late Pleistocene demography and the appearance of modern human behavior." Yet, I see a lot of weaknesses in the paper. I think the paper tries to sidestep several issues about "modern human behavior" that ought to be tackled head-on. In the end, the model in the paper can't describe the data the authors want to consider. Maybe they should have adopted a different model; maybe different data.

    I've taken a lot of notes about this -- too many for me to share, but I wanted to review the basic exposition of the paper, including why the authors think demography may determine technological change during the Late Pleistocene. I might post other notes later on the issue of genetic modeling of demography and its relevance for archaeology.

    The authors describe a model in which the density of a metapopulation determines the rate of increase (or decline) of its cultural evolution, using simulations to extend analytical results from Henrich (2004). Follow their assumptions and you arrive at the conclusion that population density can, under certain conditions, constrain the trajectory of cultural change.

    The question is whether the model's assumptions can apply to the real world. Here's the abstract of the paper:

    The origins of modern human behavior are marked by increased symbolic and technological complexity in the archaeological record. In western Eurasia this transition, the Upper Paleolithic, occurred about 45,000 years ago, but many of its features appear transiently in southern Africa about 45,000 years earlier. We show that demography is a major determinant in the maintenance of cultural complexity and that variation in regional subpopulation density and/or migratory activity results in spatial structuring of cultural skill accumulation. Genetic estimates of regional population size over time show that densities in early Upper Paleolithic Europe were similar to those in sub-Saharan Africa when modern behavior first appeared. Demographic factors can thus explain geographic variation in the timing of the first appearance of modern behavior without invoking increased cognitive capacity.

    You can always tell what's supposed to be bad: it's the thing that you're not supposed to "invoke". You know, like witches and vampires.

    The model

    In fact, "cognitive capacity", as a continuous, one-dimensional variable, underlies the model. In a nutshell, the model assumes that people learn behaviors by instantaneously absorbing the "skill" (which I'll call "mojo") from the best (highest "mojo") individual in their population. But they don't learn perfectly; their mojo ends up varying.

    Nevertheless the whole population is choosing one individual to copy, so what happens over time is that the population changes in one direction or the other. If the distribution among individuals includes a few with higher mojo, then the average amount of mojo should increase over time. Imagine if the whole population copied the running style of the best 100 m runner. The world record time might fall; then people copy the new world record holder, and the average speeds up again, ad infinitum. There is stochastic variation from one step to the next -- sometimes it will increase more, sometimes less, and sometimes it may shrink a little. But the model is deterministic: depending on the distribution of mojo, it will either trend upward or downward.

    I picked the analogy because it points out a weakness of the model. There's no possibility of reaching an optimum, or a stasis. In fact, the survival value of "mojo" simply isn't part of the model, nor is the cost of developing mojo.

    OK, it's a simple model -- too simple to capture most aspects of reality. What value can it possibly have?

    The assumption is that some behaviors take more mojo than others. Some behaviors then will lie near a threshold where the population is just at the border between gaining or losing mojo over time. The fastest runner in the population might still be slower than last year's champion. If the population models the new winner, they might lose mojo on average.

    So the change in mojo doesn't depend on the current average; it depends on the distribution of the highest-mojo individual. That's an extreme value, and extreme values depend on the total number of individuals. There's some chance that the Jamaican national champion will be the Olympic gold medalist -- like last year. But on average the world champion is faster than the champion of any single country; the champion of a country is faster than the champion of any average local track club, and so on. Numbers make a difference. Add more individuals, and you have a better chance of a high extreme value -- a better chance in the model that mojo will increase.

    Again, the analogy shows the model's deficiencies. Local track clubs don't vary randomly. There are some local track clubs where the average 100 m time is pretty close to the Olympic champion's. In part this is because information isn't shared instantly and universally. There are both explicit dynamics and path-dependence: Jamaica's running team has been so successful in part because of recent investments in infrastructure, in part because of leadership from a few gifted coaches. And in large part it's because talent matters. Some people just have more running mojo.

    But the model does show that for a limited range of behaviors, population size (in Powell and colleagues' simulations, local population density) can exert a deterministic effect on the behavior of the population. Outside that range, the behavior will be dominated by non-demographic factors, such as intrinsic qualities of the learners.
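    The dependence on numbers can be made concrete with a toy simulation. This is a minimal sketch, not the authors' code: it assumes each learner copies the single best individual and ends up with a value drawn from a Gumbel distribution whose mode sits `alpha` below the best (copying is lossy on average) with spread `beta` (occasionally a learner outdoes the model). The names `alpha`, `beta`, `simulate` and `expected_change`, and the parameter values, are illustrative assumptions; they are not taken from the paper's simulations.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # mean of a standard Gumbel distribution


def gumbel(rng, mode, beta):
    """Draw one value from a Gumbel distribution with the given mode and spread."""
    u = rng.random()
    while u <= 0.0:  # avoid log(0) in the extremely rare case u == 0
        u = rng.random()
    return mode - beta * math.log(-math.log(u))


def simulate(n, generations, alpha=4.0, beta=1.0, seed=0):
    """Return the mean mojo of a population of size n at each generation."""
    rng = random.Random(seed)
    mojo = [0.0] * n
    means = [0.0]
    for _ in range(generations):
        best = max(mojo)  # everyone copies the highest-mojo individual
        mojo = [gumbel(rng, best - alpha, beta) for _ in range(n)]
        means.append(sum(mojo) / n)
    return means


def expected_change(n, alpha=4.0, beta=1.0):
    """Approximate expected change in mean mojo per generation under these assumptions."""
    return -alpha + beta * (EULER_GAMMA + math.log(n))


for n in (5, 500):
    final = simulate(n, generations=50)[-1]
    print(n, round(expected_change(n), 2), round(final, 1))
```

    Under these assumptions the expected change per generation is -alpha + beta * (0.5772 + ln N), so with alpha = 4 and beta = 1 the break-even point is around 30 individuals: a group of 5 loses mojo steadily, a group of 500 gains it. The sketch leaves out the same things the model does: there is no optimum, no cost of acquiring mojo, and no difference among learners.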

    Deterministic versus stochastic models

    The question is whether the limited range of behaviors that might respond to demography are actually relevant to the archaeology. Unfortunately, there's no way to predict which behaviors ought to respond to demography in this way. You might find a really clever way to test the hypothesis, even without knowing -- that was one of the features of Henrich's (2004) paper that first presented the model. I think in the current case, we can start here: If the authors' model were true, then demography would exert a deterministic effect on technology. A larger population would have a higher average "skill" level, which (by the authors' model) would allow the development of more complex culture.

    When it comes to individual artifacts, demography's effect is stochastic. The development of technology has been path-dependent, with different populations following different paths. Sometimes those paths have included similar features, sometimes not. The same idea that spreads in some populations may fail to spread in others, despite the same demographic conditions.

    For example, the Aurignacian split-based bone point is an intrinsically unlikely artifact. Most people in the world did not produce them, even though bone points were fairly common, especially in groups who used small projectiles. Carved ivory figurines, on the other hand, are not nearly so unlikely; many peoples in the world have produced them. But some populations did so at very low population sizes and densities, while others have made carved ivory figurines only after reaching very large population sizes with highly specialized division of labor. Large populations make it more likely that we'll see carved ivory figurines, among other things, but they do not determine that such figurines will be present. In other words, population size is one factor affecting the stochastic appearance of these artifacts.

    OK, but what if we try to generalize beyond individual artifacts or traditions and consider "modern human behavior" as a whole? Isn't there some general and abstract factor that might change deterministically with demography? To test that hypothesis, we need to (a) develop some accurate measure of the abstract factor, and (b) observe it to be deterministically influenced by demography.

    Here's an example: For our work on the acceleration of recent adaptive evolution, our hypothesis was that a deterministic model based on recent demographic expansion could describe the number of new selected mutations in human populations. We tested the hypothesis by developing a measure for selection, and by showing that the numbers of variants matched the predictions of the deterministic model. This global conclusion about the number of variants holds despite the fact that any particular case of selection on a gene depends on many stochastic factors, including the occurrence of a favorable mutation, its escape from genetic drift when rare, and the function of the gene relative to recent human ecological changes. In the limit of large numbers, these random processes do not obscure the deterministic effect of population size.

    Now, for archaeological observations, we could in principle follow the same procedure. If there is an abstract factor of "modern behavior", we might develop an accurate measure of it by understanding the relationship of the abstract factor and particular artifact types. That's the reason why archaeologists have devoted such extensive effort to defining "modern human behavior." The entire goal of defining "modern human behavior" is to make archaeology an instrument for measuring the cognitive advancement of prehistoric groups.

    Yes, there's some irony here. Many archaeologists don't want to "invoke" cognitive capacity, even as they define "modern behavior" as a proxy for it. Artifacts certainly change stochastically. If we wanted to test a stochastic model of change, we might as well use artifacts directly. But that might not allow us to test whether the demographic factor was more important than other factors, such as developmental or ecological ones. Can we expect some combination of artifacts to behave deterministically?

    The current paper chooses a simple threshold definition for the abstract factor: the Blombos incised ochre artifacts and pierced shells define the same level of "modern behavior" as the early Aurignacian of Europe. Why those two populations? Why those two behaviors? Why ignore much earlier engraved lines from other places, or pierced artifacts made by Neandertals? The paper doesn't make any serious effort to defend this measure of an abstract factor underlying "modern behavior".

    I think at a minimum, the authors need to show that their measure of "modern behavior" is replicable and predictive outside the context of these two populations. If engraved lines can be a threshold measure of "skill", then they should reliably appear in some contexts and not others. If pierced shells can stand in for other elements of behavior, like small game exploitation or projectile use, then show the strength of the correlation. If they can't stand in reliably for their abstract factor, then they need to find some combination of observations that can. If there is no combination of observations that proves reliable, then their model cannot validly apply.

    The second necessary element for testing the deterministic model is to show whether the measure is deterministically affected by demography. On this score, the paper is much more convincing: Their demographic model cannot explain the distribution of their measure of "modernity".

    Oh, I know, the conclusion of the paper says the opposite. But look at the data: The model predicts that southern Asia should have Upper Paleolithic-like industries beginning long before they appeared in Europe, and that southern Africa should have retained Upper Paleolithic-like behaviors throughout the last 90,000 years or more. Neither of those predictions holds up. The authors don't consider the mtDNA evidence for population growth in the New World (where art and ornamentation are rare among Paleoindians) or Australia (which underwent substantial complexification during the Holocene). The comparison of Europe and South Africa is an assumption of their measure, not a prediction or conclusion.

    The model really only gets one prediction correct: The West Asian record undergoes an Upper Paleolithic transition at around the same time as Europe. And even on that score, one may quibble: was the Levantine initial Upper Paleolithic earlier than Europe or later? Does the European mtDNA expansion, which mainly consists of mtDNA lineages derived from West Asia, record European demography or West Asian demography?

    They're left making a variety of ad hoc arguments to explain why the model doesn't fit the demography: maybe the mtDNA samples don't represent Late Pleistocene populations exactly; maybe the population really shrank in post-Howieson's Poort South Africa even though the mtDNA (and a lot of archaeology) say it didn't; maybe there were recurrent bottlenecks and expansions not covered by the mtDNA demographic models. When ad hoc hypotheses add up so quickly, there's often a much more parsimonious option: maybe the model is wrong.

  • Will Wolfram make bioinformatics obsolete?

    Tue, 2009-03-17 12:33 -- John Hawks

    I was talking with a scientist last week who is in charge of a massive dataset. He told me he had heard complaints from many of his biologist friends that today's students are trained to be computer scientists, not biologists. Why, he asked, would we want to do that when the amount of data we handle is so trivial?

    Now, you have to understand, to this person a dataset of 1000 whole genomes is trivial. He said, don't these students understand that in a few years all the software they wrote to handle these data will be obsolete? They certainly aren't solving interesting problems in computer science, and in a short time, they won't be able to solve interesting problems in biology.

    I said, well, yeah. I've been through this once already -- fifteen years ago, the hot thing was setting up a wet lab for sequencing -- or worse, RFLP. That sure looked like a lot of data at the time, and a lot of students spent a lot of time figuring out how to do it. Some of them successfully started careers, got grants, and moved on with the times. Others fell by the wayside. Meanwhile, clusters of people at the DOE, Whitehead Institute, Wellcome Trust and several private companies were spending their time figuring out faster and faster ways of automating sequencing. Now one machine can do the work of ten thousand 1990's graduate students.

    Anyway, I've been thinking about that conversation. And then I ran across an article by Nova Spivack, describing the new Wolfram Alpha.

    Stephen Wolfram is building something new -- and it is really impressive and significant. In fact it may be as important for the Web (and the world) as Google, but for a different purpose. It's not a "Google killer" -- it does something different. It's an "answer engine" rather than a search engine.

    ...

    Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge.

    For example, it contains formal models of much of what we know about science -- massive amounts of data about various physical laws and properties, as well as data about the physical world.

    Based on this you can ask it scientific questions and it can compute the answers for you. Even if it has not been programmed explicity to answer each question you might ask it.

    This sounds very pie-in-the-sky. And indeed, commenters on the article (as well as this article by Cycorp head Doug Lenat) come up with lots of questions that would be impossible for such a system to answer.

    But I'm not really interested in the things that will stump the system. Compared to restaurant reviews and kinship systems, bioinformatics is pretty simple. Right now, there are two things that make it a multi-year effort to learn: mutually incompatible databases, and the various kludges necessary to model ascertainment bias.

    I'm a Mathematica user, and am familiar with its theorem-proving capabilities. Mathematica already has genome lookup utilities, which I use quite often -- it's just easier to do a lookup on my own system than to plow through two or three webpages to get to the query. It really wouldn't take that much to bring intelligent and interactive genome analysis into the system.

    Alpha could turn into an online robot armed with basic genetics knowledge. And if not Alpha -- genetics is a logical priority for Wolfram, but it may not be the first or primary one -- certainly some other system using similar technology will emerge. Put it to work on public databases of genetic information, and you have a system that can resolve the incompatibilities by adding semantic knowledge. A bit of effort on existing databases would allow the resolution of discrepancies in ascertainment. Or, more likely, another couple of years of whole-genome sequencing will solve most ascertainment biases by drowning them in new data.

    So it's not a stretch for me to imagine a year from now entering this search query:

    "List all human genes with significant evidence of positive selection since the human-chimpanzee common ancestor, where either the GO category or OMIM entry includes 'muscle'"

    It seems to me that bioinformatics is what generates the output to that query. What you do with the output of that query is evolutionary biology.

    So that raises the obvious question. Tomorrow's high-throughput plain-English bioinformatics tool will do the work of ten thousand 2009 graduate students. If a freely-available (or heck, even a paid) service can do the bioinformatics, what should today's graduate students be learning?

    UPDATE (2009-03-19):

    Some folks have interesting reactions to this post, including Thomas Mailund and Dan MacArthur. They make good points.

    I will add that I'm not arguing against modeling or simulation in biology. There are lots of interesting things in evolutionary biology you can do -- must do, in all practical terms -- with computers. But I don't like the five-year degree program in genetics where only one semester is given to population genetics, and most of the student's time is spent learning scripting, doing data entry, and figuring out ten or twelve database formats.

    I come back to my first example -- fifteen years ago, people were telling you how essential and wonderful sequencing would always be. If you're pursuing a five-year degree program and two or three years of postdoc, I hope you're thinking about what skills you'll need fifteen years from now.

  • A debate: information overload?

    Fri, 2009-01-09 11:32 -- John Hawks

    If you're looking for a way to waste your time today, you might check out The Economist's online debate, which focuses on the question of whether the world is getting more or less cultured. Or as they put it, "smarting up or dumbing down":

    Intelligent Life, The Economist's quarterly sister magazine, has been looking into what is happening to culture in Britain. The editor, Tim de Lisle, presents a mass of evidence that makes a seemingly irrefutable case: all over the country, more people are going to museums, visiting literary festivals and listening to classical music than ever before. If that isn't wising up, it is hard to know what is.

    Susan Jacoby, a scholar whose career began as a reporter on the Washington Post and whose writing now focuses on American intellectual history, sees no reason for Westerners to pat themselves on the back. The education bar, in the Anglo-American world, at least, she says, is being set lower and lower. Fewer and fewer people read books; instead they just hoover up information on the internet. After she wrote an article for her former paper on the decline of reading, she received a deluge of emails from people who said they were proud that they never read books at all. They couldn't see the point.

    The recurring issue in the debate seems to be whether people are using information in a deeper or more superficial way. Since both these terms are laden with moral value (always better to be deep than superficial, right?), one may wonder whether the real question isn't whether we feel better about ourselves or not.

    Indeed, the two participants devolve immediately into schoolmarmy arguments about whether "high culture" is thriving or not. So we have "increasing attendance at museums" on one side of the balance and "decreasing market for hardbound fiction" on the other. Blah.

    It would be more interesting to consider the biocultural question: If our culture presents us with more information, do we actually get better at using it over time? There's no mention of the Flynn Effect in the debate, but it seems very relevant -- especially considering the worry that the Brits are "dumbing down".

  • Dating of Howieson's Poort and Still Bay industries

    Mon, 2008-11-03 00:21 -- John Hawks

    Zenobia Jacobs and colleagues have a paper in this week's Science that provides age estimates for two of the MSA industries of Southern Africa: the Howieson's Poort and Still Bay industries. Here's the abstract:

    The expansion of modern human populations in Africa 80,000 to 60,000 years ago and their initial exodus out of Africa have been tentatively linked to two phases of technological and behavioral innovation within the Middle Stone Age of southern Africa—the Still Bay and Howieson's Poort industries—that are associated with early evidence for symbols and personal ornaments. Establishing the correct sequence of events, however, has been hampered by inadequate chronologies. We report ages for nine sites from varied climatic and ecological zones across southern Africa that show that both industries were short-lived (5000 years or less), separated by about 7000 years, and coeval with genetic estimates of population expansion and exit times. Comparison with climatic records shows that these bursts of innovative behavior cannot be explained by environmental factors alone.

    It's a dating paper, and I like the dating parts. The review of why these two MSA industries are important, I think, overstates the issues to a considerable extent. Yes, there are some interesting elements of the two industries, but these are paralleled in some other MSA industries, both earlier and later, in East and North Africa -- not to mention the Neandertal-associated Middle Paleolithic industries of the Near East and Europe. There is no reason at all to suppose that Howieson's Poort (or the earlier Still Bay) was made by people who embarked from southern Africa on an "out of Africa exodus." The southern African sites are important enough for what they tell us about cultural variability; I don't see the need to exaggerate their significance to the global story.

    In many ways, the paper relies on methods similar to those in the 2007 paper by Michael Waters and Thomas Stafford, "Redefining the age of Clovis." In that paper, the authors applied a statistical model to new and existing radiocarbon dates, which allowed them to conclude that the age interval represented by Clovis sites is relatively narrow -- probably as little as 200 years.

    That conclusion has not gone unchallenged (e.g., Haynes et al. 2007), in particular on the basis of some earlier dates which might indicate that an initially rare Clovis lasted for some time before a brief florescence. Anytime we have to deal with dates from different methods or different laboratories, there is the potential that some will be systematically different. Should we dismiss outliers? Or are they essential evidence of a more extensive time range, during which an industry was relatively rare? Hamilton and Buchanan (2007) found a spatial gradient in Clovis radiocarbon dates, suggesting that they represented a wave of advance from north to south. That observation doesn't refute the short chronology; it refines our notion of how long an industry should persist, and shows that it need not represent a spatially uniform population.

    In the current paper on Howieson's Poort and Still Bay dating, Jacobs and colleagues took the approach of systematically providing new OSL dates for nine sites. That deals with ambiguity about earlier dates and different methods quite simply: The authors did not rely on dates from other labs and sources. They do present a figure that puts other labs' dates in the context of their own results (they are consistent with the paper's conclusions), but these do not form the main interpretive context.

    The essential picture from the paper is figure 4:

    Howieson's Poort chronology

    This shows the cluster of dates that fit into Howieson's Poort phase, all consistent with a range from around 60,000 to 65,000 years ago, a cluster for the initial post-Howieson's Poort deposits, most consistent with a date around 57,000 years ago, and a smaller cluster of earlier, Still Bay levels. Considering the problems that have plagued OSL dating up to now, this is an impressive level of consistency. Comparing many dates from different sites gives a solid impression of a short time span for the technology.

    Unlike the case of Clovis, Jacobs and colleagues found no spatial pattern in the dates, even though they did look. The figure also shows paleoclimate evidence from ice cores; the Howieson's Poort appears to correspond to a long warming period, but it spans the range of climate from cold to warm. That's what the abstract means when it says that environmental factors do not suffice to explain the industry.

    I think the dates are important because of what they can tell us about cultural and biological variability within the MSA. From genetics, we know that the MSA African population was apparently structured, with a clear possibility that the genetic differentiation was once higher than today. If so, we might expect long-lasting cultural differences between African regions. We will need better dates across Africa -- not just southern Africa -- to really compare regions with each other. Howieson's Poort and Still Bay cultures are a start in this process.

    The short duration of the two industries is a very important fact. It was already suspected that the two existed for only a short time -- they are not found in every well-stratified site, and their recognition depends on a few relatively rare artifacts. A rare, high-information artifact is useful as a type fossil, but it is not likely to have persisted for very long in the cultural history of an ancient people.

    The data seem to indicate that Howieson's Poort lasted around 5000 years, and spanned an area of between 1.5 and 2 million square kilometers. That falls well within the ranges of time span and area for the industries of the European Upper Paleolithic, and for that matter the later Middle Paleolithic of Europe. The Still Bay, even shorter and smaller, is also within this range. It will be important to assess whether other MSA variants and earlier Neandertal-associated industries of Europe and West Asia also fall within a cohesive distribution of time and space.

    My inclination is to interpret these cultural distributions in terms of information exchanges. In that regard, it is essential to consider smaller units of information transfer. An entire culture is inherited by no one. A stone tool manufacturing technique, on the other hand, may be manifested in multiple artifacts and may have been learned by many individuals over thousands of years. I would be very interested in the temporal patterning within the Howieson's Poort, a question that the dates may now allow archaeologists to answer.

    References:

    Jacobs Z, Roberts RG, Galbraith RF, Deacon HJ, Grün R, Mackay A, Mitchell P, Vogelsang R, Wadley L. 2008. Ages for the Middle Stone Age of Southern Africa: Implications for human behavior and dispersal. Science 322:733-735. doi:10.1126/science.1162219

    Hamilton MJ, Buchanan B. 2007. Spatial gradients in Clovis-age radiocarbon dates across North America suggest rapid colonization from the north. Proc Nat Acad Sci USA 104:15625-15630. doi:10.1073/pnas.0704215104

    Haynes G and 14 others. 2007. Comment on "Redefining the age of Clovis: Implications for the peopling of the Americas." Science 317:320. doi:10.1126/science.1141960

    Waters MR, Stafford TW, Jr. 2007. Redefining the age of Clovis: Implications for the peopling of the Americas. Science 315:1122-1126. doi:10.1126/science.1137166

    Synopsis: 
    A paper by Zenobia Jacobs et al. puts these industries into fairly tight time intervals.
