Featured Articles

Edge exchangeable models for network data

Harry Crane and Walter Dempsey (2016).

Abstract

Exchangeable models for countable vertex-labeled graphs can- not replicate the large sample behaviors of sparsity and power law degree distribution observed in many network datasets. Out of this mathematical impossibility emerges the question of how network data can be modeled in a way that reflects known empirical behaviors and respects basic statistical principles. We address this question by observing that edges, not vertices, act as the statistical units in networks constructed from interaction data, making a theory of edge-labeled networks more natural for many applications. In this context we introduce the concept of edge exchangeability, which unlike its vertex exchangeable counterpart admits models for net- works with sparse and/or power law structure. Our characterization of edge exchangeable networks gives rise to a class of nonparametric models, akin to graphon models in the vertex exchangeable setting. Within this class, we identify a tractable family of distributions with a clear interpretation and suitable theoretical properties, whose significance in estimation, prediction, and testing we demonstrate.

A framework for statistical network modeling

Harry Crane and Walter Dempsey (2016).

Abstract

Basic principles of statistical inference are commonly violated in network data analysis. Under the current approach, it is often impossible to identify a model that accommodates known empirical behaviors, possesses crucial inferential properties, and accurately models the data generating process. In the absence of one or more of these properties, sensible inference from network data cannot be assured. Our proposed framework decomposes every network model into a (relatively) exchangeable data generating process and a sampling mechanism that relates observed data to the population network. This framework, which encompasses all models in current use as well as many new models, such as edge exchangeable models, that lie outside the existing paradigm, offers a sound context within which to develop theory and methods for network analysis.

Exchangeable graph-valued Feller processes

Harry Crane. (2017). Probability Theory and Related Fields, in press.

Abstract

The transition law of every exchangeable Feller process on the space of countable graphs is determined by a sigma-finite measure on the space of {0,1}x{0,1}-valued arrays. In discrete-time, this characterization amounts to a construction from an independent, identically distributed sequence of exchangeable random functions. In continuous-time, the behavior is enriched by a Levy--Ito-type decomposition of the jump measure into mutually singular components that govern global, vertex-level, and edge-level dynamics. Furthermore, every such process almost surely projects to a Feller process in the space of graph limits.

The ubiquitous Ewens sampling formula (with discussion and a rejoinder by the author)

Harry Crane. (2016). Statistical Science, 31(1):1-39.

Abstract

Ewens's sampling formula exemplifies the harmony of mathematical theory, statistical application, and scientific discovery. The formula not only contributes to the foundations of evolutionary molecular genetics, the neutral theory of biodiversity, Bayesian nonparametrics, combinatorial stochastic processes, and inductive inference but also emerges from fundamental concepts in probability theory, algebra, and number theory. With an emphasis on its far-reaching influence throughout statistics and probability, we highlight these and many other consequences of Ewens's seminal discovery.

Dynamic random networks and their graph limits

Harry Crane. (2016). Annals of Applied Probability 26(2):691-721.

Abstract

We study a broad class of stochastic process models for dynamic networks that satisfy the minimal regularity conditions of (i) exchangeability and (ii) cadlag sample paths. Our main theorems characterize these processes through their induced behavior in the space of graph limits. Under the assumption of time-homogeneous Markovian dependence, we classify the discontinuities of these processes into three types, prove bounded variation of the sample paths in graph limit space and express the process as a mixture of time-inhomogeneous, exchangeable Markov processes with cadlag sample paths.