George Cowgill is an archaeologist with a long interest in promoting the unfortunately rare good use of statistics by archaeologists. He has a paper in the current Annual Review of Anthropology, titled “Some Things I Hope You Will Find Useful Even if Statistics Isn’t Your Thing”, which is full not only of wisdom but also of highly quotable paragraphs.
Some are even useful for biologists, such as:
If you are worried about data quality, reducing data to “present” and “absent” just makes the problem worse, unless you are sure that absence in the sample unambiguously implies absence in the relevant population. But a category that is scarce but present in the population will be totally absent in many samples from that population. The chance that it is absent in any one sample strongly depends on the size of that sample. This makes “presence versus absence” a very unstable statistic. If you want to be intentionally vague and conservative, it would be much better to use terms such as “way below average,” “about average,” and “way above average.”
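Cowgill’s point about instability is easy to quantify. If a trait occurs at frequency p in the population and individuals are sampled independently, the chance that a sample of size n contains no individual with the trait is (1 − p)^n. A minimal sketch (the 5% trait frequency is my illustrative choice, not a figure from Cowgill):

```python
def prob_absent(p: float, n: int) -> float:
    """Probability that a trait at population frequency p is entirely
    absent from an independent random sample of n individuals."""
    return (1 - p) ** n

# A trait carried by 5% of the population is scored "absent"
# surprisingly often in small samples:
for n in (1, 5, 10, 30, 100):
    print(f"n={n:3d}  P(scored absent) = {prob_absent(0.05, n):.3f}")
```

With p = 0.05, a sample of 10 misses the trait about 60% of the time, and even a sample of 30 misses it about 21% of the time — so whether a scarce trait is scored “present” or “absent” is largely a function of sample size, just as Cowgill warns.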
Paleoanthropologists use an awful lot of “present” and “absent” when assessing the relationships of fossil samples, often based on a single individual. Of course, I wrote something about that myself (Hawks 2004), showing that small fossil samples scored as “present” or “absent” for traits often lead to incorrect phylogenetic conclusions.
Back to Cowgill, there is also this wonderful paragraph:
In fact, archaeologists have been astonishingly ready to assume that their findings are highly reliable (repeated studies would get almost the same results) and very valid (their findings are a nearly unbiased sample of the population of interest). Leaving these assumptions unchallenged is nothing short of a conspiracy of silence, and it is not going too far to call it a dirty little secret. Over the years I have compiled a bibliography of more than 4,000 publications on a wide variety of archaeological topics. Among other things, I have been on the lookout for publications reporting studies of reliability and/or validity, wherein sites were recollected or resurveyed or different observers independently classified the objects in a collection. I have located just 20, of which the most recent and most telling is that by Heilen & Altschul (2013). The bad news is that the level of reliability and/or validity is often shockingly low. If you believe your results can be trusted without any checking, you are fooling yourself and doing the profession a disservice.
He emphasizes the ethical imperative to make and curate collections, which are essential to allowing repeated observations and checking:
To be sure, not all sites can be redug or resurveyed. Some sites can be, and even when that is not possible, curated collections can be restudied; the frequent need for restudying them is a major reason why it is essential to make collections and is irresponsible to imagine that recording observations without making collections is adequate.
In archaeology, it is usually observers and not machines that produce the data. While you can reverse-engineer the way a machine scores observations, it is much harder to do so for a human observer, and impossible without ready access to the collections that observer has studied.
Cowgill GL. 2015. Some Things I Hope You Will Find Useful Even if Statistics Isn’t Your Thing. Annual Review of Anthropology 44:1–14. doi:10.1146/annurev-anthro-102214-013814
Hawks J. 2004. How much can cladistics tell us about hominid relationships? American Journal of Physical Anthropology 125:207–219. doi:10.1002/ajpa.10280