Talks and Videos

Cognitum Episode 5 // Replication + Probability

December 28, 2018 on Cognitum with Iosif M Gershteyn


Cognitum's Iosif M Gershteyn discusses the scientific replication crisis with Harry Crane, Professor at Rutgers University and Co-founder of Researchers.One. Crane provides a systems-level explanation of statistical errors and disincentives in the conventional paradigm, and the solution offered by the new peer review initiative at Researchers.One.

Discussion of Researchers.One

November 5, 2018, Department 12 IO Psych Podcast


Harry Crane and Ryan Martin discuss Researchers.One on the Department 12 IO Psychology Podcast with Ben Butina.

Replication Crisis, Prediction Markets, and the Fundamental Principle of Probability

October 22, 2018 at Rutgers, Philosophy Department, Foundations of Probability Seminar


I will discuss how ideas from the foundations of probability can be applied to resolve the current scientific replication crisis. I will focus on two specific approaches:

1. Prediction Markets to incentivize more accurate assessments of the scientific community’s confidence that a result will replicate.
2. The Fundamental Principle of Probability to incentivize more accurate assessments of the author’s confidence that their own results will replicate.

I compare and contrast the merits and drawbacks of these two approaches.

The article associated with this talk can be found at

Lecture Handout


The Shape of Probability

May 14, 2018 on The Jolly Swagmen podcast


In this episode, Joe catches up with Harry in New Jersey to discuss the history of probability, how experts routinely misuse statistics, “The First Rule of Probability”, the replication crisis in science, and how Harry thinks about probability in shapes rather than numbers.

Link to Episode

Why "redefining statistical significance" will make the reproducibility crisis worse

November 27, 2017 at Rutgers, Philosophy Department, Foundations of Probability Seminar


A recent proposal to "redefine statistical significance" (Benjamin et al., Nature Human Behaviour, 2017) claims that false positive rates "would immediately improve" by factors greater than two and fall as low as 5%, and that replication rates would double, simply by changing the conventional cutoff for 'statistical significance' from P<0.05 to P<0.005. I will survey several criticisms of this proposal and also analyze the veracity of these major claims, focusing especially on how Benjamin et al. neglect the effects of P-hacking in assessing the impact of their proposal. My analysis shows that once P-hacking is accounted for, the perceived benefits of the lower threshold all but disappear, prompting two main conclusions:

(i) The claimed improvements to false positive rate and replication rate in Benjamin et al. (2017) are exaggerated and misleading.

(ii) There are plausible scenarios under which the lower cutoff will make the replication crisis worse.
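For context, headline figures like "fall as low as 5%" come from a standard two-hypothesis calculation of the rate of false positives among nominally significant results. The sketch below is my own illustration of that textbook calculation, not code or numbers from the talk; the prior probability of a true effect (10%) and the power (80%) are assumed values chosen for illustration.

```python
def false_positive_rate(alpha, power, prior_true):
    """P(null is true | result is 'significant') under a simple
    two-hypothesis model: a fraction prior_true of tested hypotheses
    are real effects detected with probability `power`; the rest are
    nulls that reach significance with probability `alpha`."""
    true_pos = power * prior_true
    false_pos = alpha * (1 - prior_true)
    return false_pos / (false_pos + true_pos)

# Illustrative (assumed) values: 10% of tested hypotheses are true
# effects, and studies run at 80% power.
for alpha in (0.05, 0.005):
    print(alpha, round(false_positive_rate(alpha, power=0.8, prior_true=0.1), 3))
```

Under these assumed inputs, lowering the cutoff from 0.05 to 0.005 drops the nominal false positive rate from roughly 36% to roughly 5%, which is the kind of improvement the proposal advertises. The talk's point is that this calculation takes the reported P-values at face value; it does not survive once P-hacking is accounted for.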

My full analysis can be downloaded here.

Coherence in Statistical Modeling of Networks

October 26, 2017 at Johns Hopkins, Applied Mathematics and Statistics Department


George Box famously said, "All models are wrong, but some are useful." Classic texts define a statistical model as "a set of distributions on the sample space" (Cox and Hinkley, 1976; Lehmann, 1983; Barndorff-Nielsen and Cox, 1994; Bernardo and Smith, 1994). Motivated by some longstanding questions in the analysis of network data, I will examine both of these statements, first from a general point of view, and then in the context of some recent developments in network analysis. The confusion caused by these statements is clarified by the realization that the definition of statistical model must be refined --- it must be more than just a set. With this, the ambiguity in Box's statement --- e.g., what determines whether a model is 'wrong' or 'useful'? --- can be clarified by a logical property that I call 'coherence'. After clarification, a model is deemed useful as long as it is coherent, i.e., inferences from it 'make sense'. I will then discuss some implications for the statistical modeling of network data.

Probabilities as Shapes

April 10, 2017 at Rutgers, Philosophy Department, Foundations of Probability Seminar


In mathematics, statistics, and perhaps even in our intuition, it is conventional to regard probabilities as numbers, but I prefer instead to think of them as shapes. I'll explain how and why I prefer to think of probabilities as shapes instead of numbers, and will discuss how these probability shapes can be formalized in terms of infinity groupoids (or homotopy types) from homotopy type theory (HoTT).

Markov process models for time-varying networks

December 13, 2016 at Isaac Newton Institute, Cambridge University


Many models for dynamic networks, such as the preferential attachment model, describe evolution by sequential addition of vertices and/or edges. Such models are not suited to networks whose connectivity varies over time, as in social relationships and other kinds of temporally varying interactions. For modeling in this latter setting, I develop the general theory of exchangeable Markov processes for time-varying networks and discuss relevant consequences.

Tamara Broderick (MIT) Research Misconduct - Edge Exchangeability

November 20, 2016


A NIPS 2016 paper by Tamara Broderick, Diana Cai, and Trevor Campbell claims credit for the idea of edge exchangeability, with misleading and improper attribution of my prior work with Walter Dempsey. Edge exchangeability was introduced more than a year earlier in a paper by Harry Crane and Walter Dempsey, and much of the material in the Broderick-Cai-Campbell paper resembles that of Crane-Dempsey without acknowledgement.

Edge exchangeability: a new foundation for modeling network data

July 25, 2016 at Isaac Newton Institute, Cambridge University


Exchangeable models for countable vertex-labeled graphs cannot replicate the large sample behaviors of sparsity and power law degree distribution observed in many network datasets. Out of this mathematical impossibility emerges the question of how network data can be modeled in a way that reflects known empirical behaviors and respects basic statistical principles. We address this question by observing that edges, not vertices, act as the statistical units in networks constructed from interaction data, making a theory of edge-labeled networks more natural for many applications. In this context we introduce the concept of edge exchangeability, which unlike its vertex exchangeable counterpart admits models for networks with sparse and/or power law structure. Our characterization of edge exchangeable networks gives rise to a class of nonparametric models, akin to graphon models in the vertex exchangeable setting. Within this class, we identify a tractable family of distributions with a clear interpretation and suitable theoretical properties, whose significance in estimation, prediction, and testing we demonstrate.

Pattern avoidance for random permutations

February 18, 2016 at Rutgers University, Experimental Mathematics Seminar


A classic question of enumerative combinatorics is: How many permutations of {1,...,n} avoid a given pattern? I recast this question in probabilistic terms: What is the probability that a randomly generated permutation of {1,...,n} avoids a given pattern?
I consider this question for the Mallows distribution on permutations, of which the uniform distribution is a special case. I discuss how the probabilistic technique of Poisson approximation can be applied to bound the total variation distance between the Poisson distribution and the distribution of the number of occurrences of a fixed pattern in a random permutation. In the special case of the uniform distribution, we obtain bounds on the number of pattern avoiding permutations of all finite sizes.
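In the uniform special case, the avoidance probability can be checked directly by enumeration. The sketch below is my own illustration, not code from the talk: it computes the probability that a uniform random permutation of {1,...,n} avoids a fixed pattern, and verifies the classical fact that permutations avoiding any single pattern of length 3 are counted by the Catalan numbers.

```python
from itertools import combinations, permutations
from math import comb, factorial

def contains_pattern(perm, pattern):
    """True if perm has a subsequence whose entries appear in the same
    relative order as pattern (all entries assumed distinct)."""
    k = len(pattern)
    order = sorted(range(k), key=lambda i: pattern[i])
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if sorted(range(k), key=lambda i: sub[i]) == order:
            return True
    return False

def avoidance_probability(n, pattern):
    """P(uniform random permutation of {1,...,n} avoids pattern),
    computed by brute-force enumeration of all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    avoiders = sum(not contains_pattern(p, pattern) for p in perms)
    return avoiders / len(perms)

# For any fixed pattern of length 3, the number of avoiding
# permutations of {1,...,n} is the Catalan number C_n.
for n in range(1, 7):
    catalan = comb(2 * n, n) // (n + 1)
    assert avoidance_probability(n, (1, 2, 3)) == catalan / factorial(n)
```

Brute force is feasible only for small n, which is exactly where the Poisson approximation bounds described in the talk take over; the non-uniform Mallows case would replace the uniform enumeration with a weighted count.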

Random partitions and permutations

November 20, 2014 at Rutgers University, Experimental Mathematics Seminar


Historically, enumerative combinatorics and discrete probability theory are closely related through uniform probability distributions on finite sets. I will first explain why the uniform distribution is unnatural in some modern applications and then survey several aspects of non-uniform random partitions and permutations. The discussion touches on ideas from enumerative combinatorics, algebra, probability, and statistics. I assume no prior knowledge.

Random partitions and permutations (Part 1) from Experimental Mathematics on Vimeo.

Random partitions and permutations (Part 2) from Experimental Mathematics on Vimeo.