Betting is central to the history of probability and the way probability is intuitively understood, making it both natural to link betting to statistical analysis and curious that this connection is absent from conventional statistical thinking. Betting appears in Bayesian foundations, but as a philosophical relic rather than a substantive component of the framework. When the devout Bayesian talks about betting, he only cares that the bettor’s probabilities make sense (are coherent), not whether they make money. Shafer’s view is more palatable for science, where personal opinions should take a back seat to objective reality, though sadly this isn’t always the case [3].
Shafer’s proposal may improve how statistical work is communicated, but it stops short of what’s needed to resolve systemic statistical abuse. In advocating his theory, Shafer writes, “I need not risk a lot of money. [...] I am betting merely to make a point”. But what’s the point of a fictitious bet?
In gambling parlance, a freeroll is a bet that can be won but not lost. In scientific work, the Freeroll Effect occurs when scientists incur minimal personal risk in exchange for broad societal impact. Scientists are rewarded for publishing their research in high-impact journals while society bears the risk of inaccurately reported findings and their potentially dire consequences. The replication crisis, misinformed Covid-19 response, and muddled climate policies are all consequences of the Freeroll Effect [2].
So, yes, the amount risked does matter. As any gambler knows, there’s a big difference between betting a penny and a thousand dollars. It isn’t “irrational”, as a Bayesian might claim. It’s common sense. We should hope that scientists exercise this same sense before publishing research that burdens society with substantial risk.
Fortunately, mathematical probability has a built-in property to achieve this objective, called the Fundamental Principle of Probability (FPP) [1]. Under the FPP, statistical claims are meaningless, and should be disregarded, unless the statistician faces real-world consequences for being wrong. Some critics of the FPP offer the jejune moral objection that gambling is lowbrow, having no place in science. Aside from their deep misunderstanding of risk and its central role in probability, such critics exhibit disregard for the serious practical problems that can be resolved by appealing to risk in statistical practice.
So I advocate taking Shafer’s proposal even more seriously than he suggests, by restoring fundamental principles of probability and risk to statistical work, not simply using a different language.
References:
[1] H. Crane. (2018). The Fundamental Principle of Probability. Researchers.One, https://www.researchers.one/article/2018-08-16.
[2] H. Crane. (2020). Naive Probabilism. Researchers.One, https://www.researchers.one/article/2018-03-9.
[3] H. Crane, J. Guinness and R. Martin. (2020). Comment on the Proposal to Rename the R.A. Fisher Lecture. Researchers.One, https://www.researchers.one/article/2020-06-11.
A response to John Ioannidis's article "A fiasco in the making? As the coronavirus pandemic takes hold, we are making decisions without reliable data" (March 17, 2020), in which he downplays the severe risks posed by the coronavirus pandemic.
I make the distinction between academic probabilities, which are not rooted in reality and thus have no tangible real-world meaning, and real probabilities, which attain a real-world meaning as the odds that the subject asserting the probabilities is forced to accept for a bet against the stated outcome. With this I discuss how the replication crisis can be resolved easily by requiring that probabilities published in the scientific literature are real, instead of academic. At present, all probabilities and derivatives that appear in published work, such as P-values, Bayes factors, confidence intervals, etc., are the result of academic probabilities, which are not useful for making meaningful assertions about the real world.
A recent proposal to "redefine statistical significance" (Benjamin et al., Nature Human Behaviour, 2017) claims that false positive rates "would immediately improve" by factors greater than two and replication rates would double simply by changing the conventional cutoff for 'statistical significance' from P<0.05 to P<0.005. I analyze the veracity of these claims, focusing especially on how Benjamin et al. neglect the effects of P-hacking in assessing the impact of their proposal. My analysis shows that once P-hacking is accounted for, the perceived benefits of the lower threshold all but disappear, prompting two main conclusions: (i) The claimed improvements to false positive rate and replication rate in Benjamin et al. (2017) are exaggerated and misleading. (ii) There are plausible scenarios under which the lower cutoff will make the replication crisis worse.
Commentary by Andrew Gelman.
Response by one of the authors (E.J. Wagenmakers) on the Bayesian Spectacles blog.
More discussion by Tim van der Zee.
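To make the false positive rate question in the "redefine statistical significance" summary above concrete, here is a minimal Python sketch. It is not the analysis from the article. It assumes the standard two-hypothesis calculation, in which the false positive rate among significant findings depends on the threshold, the power, and the prior share of true effects, and it models P-hacking crudely as reporting the best of several independent tests of a true null. The function names, the 1:10 prior odds, the power of 0.8, and the choice of 20 attempts are all hypothetical, and power is held fixed across thresholds and attempts purely for simplicity.

```python
def false_positive_rate(alpha, power, prior_true):
    """Share of 'significant' findings that are false positives.

    alpha      -- chance a true null yields a significant result
    power      -- chance a real effect yields a significant result
    prior_true -- prior share of tested hypotheses that are real effects
    """
    false_pos = alpha * (1.0 - prior_true)
    true_pos = power * prior_true
    return false_pos / (false_pos + true_pos)


def hacked_alpha(alpha, attempts):
    """Chance that at least one of `attempts` independent tests of a
    true null falls below `alpha` (a crude stand-in for P-hacking)."""
    return 1.0 - (1.0 - alpha) ** attempts


if __name__ == "__main__":
    prior_true = 1 / 11  # hypothetical: 1:10 prior odds of a real effect
    power = 0.8          # hypothetical, held fixed for simplicity
    for alpha in (0.05, 0.005):
        for attempts in (1, 20):
            effective = hacked_alpha(alpha, attempts)
            fpr = false_positive_rate(effective, power, prior_true)
            print(f"cutoff={alpha}, attempts={attempts}: "
                  f"effective alpha={effective:.4f}, FPR={fpr:.3f}")
```

Under these assumptions, the effective error rate at the 0.005 cutoff with 20 attempts exceeds the nominal 0.05 of a single honest test, which is one simple way the benefit of a lower cutoff can shrink once P-hacking is allowed. How much it shrinks in practice depends on how researchers actually respond to the new threshold, which this sketch does not model.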
Caron and Fox tout their proposal as
"the first fully generative and projective approach to sparse graph modelling [...] with a notion of exchangeability that is essential for devising our scalable statistical estimation procedure." (p. 12, emphasis added).
In calling theirs the first such approach, the authors brush aside prior work of Barabasi and Albert (1999), whose model is also generative, projective, and produces sparse graphs. The Barabasi–Albert model is not exchangeable, but neither is the authors’. And while the Barabasi–Albert model is inadequate for most statistical purposes, the proposed model is not obviously superior, especially with respect to the highlighted criteria above.
I applaud the authors’ advocacy for subjectivity in statistical practice and appreciate the overall attitude of their proposal.
But I worry that the proposed virtues will ultimately serve as a shield to deflect criticism, much like objectivity and subjectivity
often do now. In other words, won’t acceptance of 'virtue' as a research standard in short order be supplanted by the "pursuit to merely appear" virtuous?
I believe Gelman and Hennig when they assert, "[W]e repeatedly encounter publications in top scientific journals that fall foul of these virtues" (p. 27).
I’m less convinced, however, that this "indicates [...] that the underlying principles are subtle".
This conclusion seems to conflate doing science and publishing science.
In fact I suspect that most scientists are more or less aware of these virtues, and many would agree that these virtues are indeed virtuous for
doing science. But I’d expect those same scientists to acknowledge that some of these virtues may be regarded as vices in the publishing game.
Just think about the lengths to which journals go to maintain the appearance of objectivity.
They achieve this primarily through peer review, which promises transparency, consensus, and impartiality, three of Gelman and Hennig's 'virtues',
but rarely delivers any of them. It should be no surprise that a system so obsessed with appearances also tends to reward research that 'looks the part'. As "communication is central to science" (p. 6) and publication
is the primary means of scientific communication, is it any wonder that perverse editorial behaviors heavily influence which virtues are practiced and which are merely preached?
Finally, I ask: just as statistical practice is plagued by the "pursuit to merely appear objective",
is science not also plagued by the pursuit to 'appear statistical'? Judging from well publicized issues, such as p-hacking
(Gelman and Loken, 2014; Nuzzo, 2014; Wasserstein and Lazar, 2016), and my own conversations with scientists, I’d say so.
To borrow from Feyerabend (2010, p. 7), "The only principle that does not inhibit progress is: anything goes".
So why not simply encourage scientists to make convincing, cogent arguments for their hypotheses however they see fit,
without having to check off a list of 'virtues' or run a battery of statistical tests?
Wasserman (2012) invites us to imagine "a world without referees". Instead, I’m envisioning a world without editors,
journals, or statistics lording over science and society. Without 'objectivity' obscuring the objective, and without 'virtues'
standing in the way of ideals. That world looks pretty good to me.
Commentary on how probabilistic predictions can be both correct and meaningless at the same time, with a focus on the 2016 presidential election.
Some concluding remarks regarding my 2016 article on The ubiquitous Ewens sampling formula, which was discussed by Arratia, Barbour & Tavare, Favaro & James, Feng, McCullagh, and Teh.