Essays and Comments by Year:

Comment on G. Shafer. Testing by betting. (Royal Statistical Society A)

Harry Crane (2020).

Abstract

Betting is central to the history of probability and the way probability is intuitively understood, making it both natural to link betting to statistical analysis and curious that this connection is absent from conventional statistical thinking. Betting appears in Bayesian foundations, but as a philosophical relic rather than a substantive component of the framework. When the devout Bayesian talks about betting, he only cares that the bettor’s probabilities make sense (are coherent), not whether they make money. Shafer’s view is more palatable for science, where personal opinions should take a back seat to objective reality—though sadly this isn’t always the case [3].
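To make the coherence requirement concrete, here is a standard Dutch book illustration with made-up prices (my sketch, not Shafer's): incoherent prices guarantee a loss no matter the outcome, and that is all Bayesian coherence rules out; the amounts at stake never enter the picture.

    # Dutch book sketch (illustrative numbers): a bettor states
    # P(A) = 0.6 and P(not A) = 0.6, incoherent since the prices
    # sum to 1.2 > 1. Buying a $1 ticket on each event at those
    # prices loses money in every outcome.
    p_A, p_not_A = 0.6, 0.6              # stated (incoherent) prices
    cost = p_A + p_not_A                 # paid for both $1 tickets
    for outcome in ("A occurs", "A does not occur"):
        payout = 1.0                     # exactly one ticket pays $1
        print(f"{outcome}: net = {payout - cost:+.2f}")  # -0.20 both times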

Shafer’s proposal may improve how statistical work is communicated, but it stops short of what’s needed to resolve systemic statistical abuse. In advocating his theory, Shafer writes, “I need not risk a lot of money. [...] I am betting merely to make a point”. But what’s the point of a fictitious bet?

In gambling parlance, a freeroll is a bet that can be won but not lost. In scientific work, the Freeroll Effect occurs when scientists incur minimal personal risk in exchange for broad societal impact. Scientists are rewarded for publishing their research in high-impact journals while society bears the risk of inaccurately reported findings and their potentially dire consequences. The replication crisis, misinformed Covid-19 response, and muddled climate policies are all consequences of the Freeroll Effect [2].

So, yes, the amount risked does matter. As any gambler knows, there’s a big difference between betting a penny and a thousand dollars. It isn’t “irrational”, as a Bayesian might claim. It’s common sense. We should hope that scientists exercise this same sense before publishing research that burdens society with substantial risk.

Fortunately, mathematical probability has a built-in property to achieve this objective, called the Fundamental Principle of Probability (FPP) [1]. Under the FPP, statistical claims are meaningless, and should be disregarded, unless the statistician faces real-world consequences for being wrong. Some critics of the FPP offer the jejune moral objection that gambling is lowbrow, having no place in science. Aside from their deep misunderstanding of risk and its central role in probability, such critics exhibit disregard for the serious practical problems that can be resolved by appealing to risk in statistical practice. So I advocate taking Shafer’s proposal even more seriously than he suggests, by restoring fundamental principles of probability and risk to statistical work, not simply using a different language.

References:

[1] H. Crane. (2018). The Fundamental Principle of Probability. Researchers.One, https://www.researchers.one/article/2018-08-16.
[2] H. Crane. (2020). Naive Probabilism. Researchers.One, https://www.researchers.one/article/2018-03-9.
[3] H. Crane, J. Guinness and R. Martin. (2020). Comment on the Proposal to Rename the R.A. Fisher Lecture. Researchers.One, https://www.researchers.one/article/2020-06-11.

A fiasco in the making: More data is not the answer to the coronavirus pandemic

Harry Crane (2020).

Abstract

A response to John Ioannidis's article "A fiasco in the making? As the coronavirus pandemic takes hold, we are making decisions without reliable data" (March 17, 2020), in which he downplays the severe risks posed by the coronavirus pandemic.

Fundamental Principle of Probability

Harry Crane (2019). Researchers.One.

Abstract

I make the distinction between academic probabilities, which are not rooted in reality and thus have no tangible real-world meaning, and real probabilities, which attain a real-world meaning as the odds that the subject asserting the probabilities is forced to accept for a bet against the stated outcome. With this distinction, I discuss how the replication crisis can be resolved easily by requiring that probabilities published in the scientific literature be real, rather than academic. At present, all probabilities that appear in published work, along with their derivatives such as P-values, Bayes factors, confidence intervals, etc., are academic, and thus not useful for making meaningful assertions about the real world.
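A minimal sketch of what it means for a probability to be real in this sense, with hypothetical numbers and an illustrative function name (settle) of my own: asserting P(E) = p amounts to standing ready to buy or sell a $1 ticket on E at price p, so a challenger may take whichever side they believe is mispriced.

    # Illustrative sketch of the FPP: asserting P(E) = p means standing
    # ready to buy or sell a $1 ticket on E at price p.
    def settle(p, side, event_occurred, stake=1.0):
        """Challenger's payoff from betting `stake` tickets against an
        asserter who posted P(E) = p. side='buy': the challenger buys
        tickets on E at price p; side='sell': the challenger sells them."""
        ticket_value = stake if event_occurred else 0.0
        if side == "buy":
            return ticket_value - stake * p   # paid p per ticket
        return stake * p - ticket_value       # collected p per ticket

    # A forecaster asserts P(E) = 0.9; a skeptic sells 100 tickets.
    print(settle(0.9, "sell", event_occurred=False, stake=100))  # 90.0
    print(settle(0.9, "sell", event_occurred=True, stake=100))   # -10.0

Either way the asserter now has skin in the game: a stated 0.9 that is habitually wrong bleeds money, which is the real-world meaning the distinction demands.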

Why "Redefining Statistical Significance" Will Not Improve Reproducibility and Might Make the Replication Crisis Worse

Harry Crane (2018).

Abstract

A recent proposal to "redefine statistical significance" (Benjamin et al., Nature Human Behaviour, 2017) claims that false positive rates "would immediately improve" by factors greater than two and that replication rates would double simply by changing the conventional cutoff for 'statistical significance' from P<0.05 to P<0.005. I analyze the veracity of these claims, focusing especially on how Benjamin et al. neglect the effects of P-hacking in assessing the impact of their proposal. My analysis shows that once P-hacking is accounted for, the perceived benefits of the lower threshold all but disappear, prompting two main conclusions: (i) The claimed improvements to false positive rate and replication rate in Benjamin et al. (2017) are exaggerated and misleading. (ii) There are plausible scenarios under which the lower cutoff will make the replication crisis worse.
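To illustrate the kind of arithmetic at issue, here is a toy two-group calculation with illustrative parameters of my own choosing; it is not the model from the paper or from Benjamin et al. The quantity computed is the fraction of 'significant' findings that are false, and P-hacking is modeled crudely as an inflation of the effective cutoff.

    # Toy calculation (illustrative parameters, not the paper's model):
    # fraction of significant findings that are false positives, given
    # cutoff alpha, power against real effects, and proportion pi0 of
    # true null hypotheses.
    def false_positive_fraction(alpha, power=0.8, pi0=0.9):
        false_pos = pi0 * alpha            # true nulls crossing the cutoff
        true_pos = (1 - pi0) * power       # real effects crossing it
        return false_pos / (false_pos + true_pos)

    print(false_positive_fraction(0.05))        # ~0.36 at the usual cutoff
    print(false_positive_fraction(0.005))       # ~0.05: the claimed gain
    # Crude stand-in for P-hacking: analysts search specifications until
    # the effective cutoff is ~10x the nominal one, erasing the gain.
    print(false_positive_fraction(0.005 * 10))  # ~0.36 again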

Commentary by Andrew Gelman.

Response by one of the authors (E.J. Wagenmakers) on the Bayesian Spectacles blog.

More discussion by Tim van der Zee.

Comment on F. Caron and E.B. Fox. Sparse graphs using exchangeable random measures. (Royal Statistical Society B)

Harry Crane. (2017). Journal of the Royal Statistical Society, Series B, 79, Part 5.

Caron and Fox tout their proposal as

"the first fully generative and projective approach to sparse graph modelling [...] with a notion of exchangeability that is essential for devising our scalable statistical estimation procedure." (p. 12, emphasis added).

In calling theirs the first such approach, the authors brush aside prior work of Barabási and Albert (1999), whose model is also generative and projective, and likewise produces sparse graphs. The Barabási–Albert model is not exchangeable, but neither is the authors’. And while the Barabási–Albert model is inadequate for most statistical purposes, the proposed model is not obviously superior, especially with respect to the highlighted criteria above.

Generative. Though the model is amenable to simulation, the obscure connection between Kallenberg’s theory of exchangeable CRMs and the manner in which real networks form makes it hard to glean much practical insight from it. At least Barabási and Albert’s preferential attachment mechanism offers a clear explanation for how sparsity and power laws might arise in nature. I elicit no such clarity from the Caron–Fox model.
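For contrast, a minimal sketch of the preferential attachment mechanism in its simplest one-edge-per-arrival form (the textbook construction, not code from either paper): each newcomer attaches to an existing vertex with probability proportional to its current degree, which is the transparent source of both sparsity and the power law.

    import random

    # Minimal Barabási–Albert preferential attachment (one edge per new
    # vertex). Listing each vertex once per unit of degree makes a
    # uniform draw from `targets` degree-proportional.
    def barabasi_albert(n, seed=0):
        random.seed(seed)
        edges = [(0, 1)]                 # start from a single edge
        targets = [0, 1]                 # vertex repeated once per degree
        for v in range(2, n):
            u = random.choice(targets)   # degree-proportional choice
            edges.append((u, v))
            targets += [u, v]            # update the degree multiset
        return edges

    edges = barabasi_albert(10_000)
    print(len(edges))                    # n - 1 edges: sparse by construction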

Projective. Projectivity is important for relating an observed network to an unobserved population, and is therefore crucial in applications for which inferences extend beyond the sample. Without a credible sampling interpretation, however, the statistical salience of projectivity is moot. Here projectivity involves restricting point processes in the two-dimensional real plane to bounded rectangles, whose best-known interpretation via p-sampling (Veitch and Roy, 2016) seems unnatural for most conceivable network applications, including those in Section 8.

Exchangeability and Sparsity. A certain nonchalance about whether and how this model actually models real networks betrays an attitude that sees sparsity as an end in itself and exchangeability as a means to that end. Even the authors acknowledge that "exchangeability of the point process [...] does not imply exchangeability of the associated adjacency matrix" (p. 3). So why all the fuss about exchangeability if its primary role here is purely utilitarian? To me, the pervasiveness of "exchangeability" throughout the article is but a head fake for unsuspecting statisticians who, unlike many modern machine learners, understand that exchangeability is far more than a computational contrivance.

Final Comment. The Society’s Research Section is all too familiar with the Crane–Dempsey edge exchangeable framework, which meets the above four criteria while staying true to its intended application of interaction networks. For lack of space, I refer the reader to Crane and Dempsey (2015, 2016) for further discussion.

Comment on A. Gelman and C. Hennig. Beyond subjective and objective in statistics. (Royal Statistical Society A)

Harry Crane. (2017). Journal of the Royal Statistical Society, Series A, 180, Part 4.

I applaud the authors’ advocacy for subjectivity in statistical practice and appreciate the overall attitude of their proposal. But I worry that the proposed virtues will ultimately serve as a shield to deflect criticism, much like objectivity and subjectivity often do now. In other words, won’t acceptance of 'virtue' as a research standard in short order be supplanted by the "pursuit to merely appear" virtuous?

I believe Gelman and Hennig when they assert, "[W]e repeatedly encounter publications in top scientific journals that fall foul of these virtues" (p. 27). I’m less convinced, however, that this "indicates [...] that the underlying principles are subtle". This conclusion seems to conflate doing science and publishing science. In fact I suspect that most scientists are more or less aware of these virtues, and many would agree that these virtues are indeed virtuous for doing science. But I’d expect those same scientists to acknowledge that some of these virtues may be regarded as vices in the publishing game. Just think about the lengths to which journals go to maintain the appearance of objectivity. They achieve this primarily through peer review, which promises transparency, consensus, and impartiality, three of Gelman and Hennig's 'virtues', but rarely delivers any of them. It should be no surprise that a system so obsessed with appearances also tends to reward research that 'looks the part'. As "communication is central to science" (p. 6) and publication is the primary means of scientific communication, is it any wonder that perverse editorial behaviors heavily influence which virtues are practiced and which are merely preached?

Finally, I ask: just as statistical practice is plagued by the "pursuit to merely appear objective", is science not also plagued by the pursuit to 'appear statistical'? Judging from well publicized issues, such as p-hacking (Gelman and Loken, 2014; Nuzzo, 2014; Wasserstein and Lazar, 2016), and my own conversations with scientists, I’d say so. To borrow from Feyerabend (2010, p. 7), "The only principle that does not inhibit progress is: anything goes". So why not simply encourage scientists to make convincing, cogent arguments for their hypotheses however they see fit, without having to check off a list of 'virtues' or run a battery of statistical tests?

Wasserman (2012) invites us to imagine "a world without referees". Instead, I’m envisioning a world without editors, journals, or statistics lording over science and society. Without 'objectivity' obscuring the objective, and without 'virtues' standing in the way of ideals. That world looks pretty good to me.

The modern-day snake oil salesman

Harry Crane. December 2, 2016.

Description:

Commentary on how probabilistic predictions can be both correct and meaningless at the same time, with a focus on the 2016 presidential election.

Rejoinder: The ubiquitous Ewens sampling formula

Harry Crane. (2016). Statistical Science, 31(1):37-39.

Description:

Some concluding remarks regarding my 2016 article "The ubiquitous Ewens sampling formula", which was discussed by Arratia, Barbour & Tavaré, Favaro & James, Feng, McCullagh, and Teh.
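For the reader's convenience, the formula under discussion in its standard form: for a sample of n genes with mutation parameter θ, the probability that the allelic partition has a_j alleles represented j times is (in LaTeX notation)

    P(a_1, \dots, a_n)
      = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}
        \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!},
    \qquad \text{where } \sum_{j=1}^{n} j a_j = n.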