Gould's "Unconscious Manipulation of Data"

OK, so I can't say it's not "brain science" because measuring skulls is as close to brain science as anthropology ever gets. But it just shouldn't be that hard to measure volume. It's a simple physical fact.

Sure, there are complexities in measuring the volume of an object with a complicated shape and holes, like a human skull. But this is not one of the world's great mysteries. Seal the holes, fill the skull with beads or shot or something, and pour it into a graduated cylinder. Junior high stuff.

Samuel Morton became famous in the mid-19th century as an empirical scientist for measuring the skulls of people from different parts of the world. Stephen Jay Gould claimed, first in Science in 1978 (Gould:Morton:1978) and later in his book The Mismeasure of Man (Gould:1981a), that Morton fudged his data on skull volumes. In Gould's telling, Morton began with a strong bias toward finding that Caucasians were the superior race and made several choices in measurement and in reporting statistics that tended to confirm this bias. Gould's weightiest claims were fairly obscure statistical points about the tabulation of averages and the treatment of subpopulations as compared to major race groups. One claim, however, was more memorable than all the rest -- the notion that Morton used seed to measure the skulls and packed it in harder with his thumb to increase the measured volume of "White" skulls.

Most people probably suppose Gould to have been an expert on the Morton collection, but in fact he never examined or measured the crania himself. A new paper in PLoS Biology by Jason Lewis and colleagues (Lewis:Morton:2011) accomplishes what no one else did in the succeeding 30 years (despite one earlier attempt): they checked Gould's facts. They find that again and again, Gould misstated the evidence or simply made stuff up.

This is an important paper. The authors write in an even tone and lay out the facts in a very straightforward way. As a reader, I can't see how they managed to keep their cool. Some of Gould's mistakes are outrageous; with others, it is hard for me to believe that the misstatements were not deliberate misrepresentations.

For example, let's take the story about pushing seed into the skulls. Here is a paragraph from Lewis and colleagues, with direct quotes from Gould:

Gould famously suggested that Morton's measurements may have been subject to bias: "Plausible scenarios are easy to construct. Morton, measuring by seed, picks up a threateningly large black skull, fills it lightly and gives it a few desultory shakes. Next, he takes a distressingly small Caucasian skull, shakes hard, and pushes mightily at the foramen magnum with his thumb. It is easily done, without conscious motivation; expectation is a powerful guide to action" [5]. While Gould offers this as only a "plausible scenario," and did not remeasure any crania, subsequent authors have generally (and incorrectly) cited Gould as demonstrating that Morton physically mismeasured crania (e.g., [15]).

In other words, Gould made up the whole thing. It was an utter fabulation. It is disgraceful that later authors have cited this idea as fact.

When Lewis and colleagues examined Morton's numbers, they found that there had been no bias in the direction Gould claimed. Measurements from seed had greater error than those from lead shot, in part (as Morton himself had written) because he employed an assistant for seed measurements early on, but later did the measurements personally with shot.

Moreover, Lewis and colleagues systematically remeasured the volumes of a sample comprising half of the skulls Morton measured, and found no systematic bias, with the few deviations in Morton's data actually in the direction opposite his supposed bias.

With numbers like these, it is natural to wonder exactly where Gould came up with his idea that Morton's numbers were fudged. Here's how: Gould fudged his own numbers! I'm quoting here a long passage from the paper, because it is essential to understand Gould's full perfidy.

Gould also performed his own analysis of Morton's cranial capacity data and came to the conclusion that "there are no differences to speak of among Morton's races" ([1], italics in original). For Morton's 1839 seed-based measurements, Gould claims that Morton's Native American average capacity is artificially depressed by his inappropriate use of a straight mean (taking the average of each individual specimen in the entire sample) rather than a grouped mean (first taking the average of each Native American population subsample, then calculating the mean of those means), since the former is sensitive to differences in sample sizes between large headed populations and small headed populations. In fact, the grouped mean for Morton's Native American dataset is 79.9 in³, almost identical to the straight mean of 80.2 in³ (Dataset S3). So Morton's use of a straight mean actually slightly increased his Native American average. Gould's calculation of a higher Native American average (83.8 in³) is entirely a function of Gould omitting 34 crania (of 144) as coming from populations with samples of n<4 and, even by that criterion, erroneously excluding 6 crania, all with small cranial capacities (Dataset S3).
Gould's reanalysis of Morton's 1849 shot-based data resulted in a Native American mean capacity of 86 in³ rather than Morton's original 79 in³ [1]. Gould obtained his new average by again taking the group mean of Native American populations with four or more crania. But Gould also applied an additional restriction: he only included Native American crania that Morton had also previously measured with seed. This restriction is entirely arbitrary on Gould's part, as Morton's publications and analyses for his seed- and shot-based measurements are completely separate (1839 versus 1849), and Gould did not apply this restriction to the other groups he reanalyzed in Morton's shot-based data. If this restriction is lifted, Gould's Native American average would be reduced to about 83 in³, considerably below his reported 86 in³ (Dataset S3).
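
The difference between a straight mean and a grouped mean is easy to lose in prose, so here is a minimal sketch in Python using made-up numbers. To be clear, these are not Morton's measurements: the group labels, the values, and the helper names straight_mean and grouped_mean are all invented for illustration. The sketch computes a straight mean, a grouped mean, and a grouped mean after dropping any group with fewer than four specimens, which is the kind of sample-size filter the passage above describes.

```python
from statistics import mean

# HYPOTHETICAL cranial capacities in cubic inches, keyed by invented
# population labels. These are NOT Morton's data.
samples = {
    "group_a": [84, 86, 88, 90],                  # mean 87
    "group_b": [78, 80, 82, 76, 79, 81, 77, 83],  # mean 79.5
    "group_c": [74, 76],                          # mean 75, only n = 2
}


def straight_mean(groups):
    """Average every individual specimen, ignoring group membership."""
    return mean(v for values in groups.values() for v in values)


def grouped_mean(groups, min_n=1):
    """Average the per-group means, keeping groups with at least min_n specimens."""
    kept = [mean(values) for values in groups.values() if len(values) >= min_n]
    return mean(kept)


print(straight_mean(samples))          # 81
print(grouped_mean(samples))           # 80.5
print(grouped_mean(samples, min_n=4))  # 83.25
```

With these toy numbers, the straight mean and the grouped mean differ by only half a cubic inch, while dropping the one small-sample group (which also happens to be the smallest-skulled) raises the average by more than two. That is the general shape of the effect Lewis and colleagues describe: the exclusion rule, not the choice of mean, does most of the work.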

Here is the most sympathetic reading I can give to these facts. Gould systematically selected data from Morton's tables that tended to inflate the measured volumes of Native American crania. He did so by averaging some group means instead of overall means (although Lewis and colleagues show that Morton himself had used group means for many comparisons, contrary to Gould's claims), by excluding some small-skulled groups entirely (claiming sample size as a criterion), and by omitting crania that had not been measured in the earlier, seed-based analysis. There is no logical reason for these choices other than selection bias -- Gould began with a conclusion about Morton's unconscious motivations, and worked to confirm that conclusion by selecting some data and omitting contrary data.

Anyway, you can see why I find this outrageous. Gould used the well-documented work of a long-dead man to make an argument that unconscious bias is widespread in science. He posed as a concerned critic, but thereby cast doubt on the validity of the scientific enterprise. He picked volume measurement and tabulation of averages as his target, making it seem as if the simplest and most objective observations -- the Junior High-level science methods -- were themselves subject to all-encompassing cultural biases. His paper and book are very widely read and cited by people who will never examine the primary evidence. Gould owed us a responsible reading and trustworthy reporting on that evidence. In its place, he made up fictional stories, never directly examined the evidence himself, and misreported Morton's numbers.

This stuff really ticks me off. I don't think that Gould's errors can be written off as "unconscious bias". Reading back over his 1978 article, I cannot believe that Science published it.

The new paper is open access ("The Mismeasure of Science: Stephen Jay Gould versus Samuel George Morton on Skulls and Bias"), and I think that everyone should read it. The text is easy to follow, and the authors include clear answers to common questions about Morton's work and beliefs. It is a very suitable article for assignment in classes. They note that the basic issue here (endocranial volume of different groups) is largely explained by ecogeography -- the authors mention climate explicitly, but I would add body size and life history as parameters that covary with climate. Measurement of endocranial volume was cutting-edge science in 1840, but I repeat, this is simple stuff.