Over-squashing is a common plight of Graph Neural Networks occurring when message passing fails to propagate information efficiently on the graph. In this post, we discuss how this phenomenon can be understood and remedied through the concept of Ricci curvature, borrowed from the field of differential geometry.
This post was co-authored with Francesco Di Giovanni and Jake Topping and is based on the paper J. Topping, F. Di Giovanni et al., Understanding over-squashing and bottlenecks on graphs via curvature (2021) arXiv:2111.14522, a collaboration between Twitter Cortex and the University of Oxford. This is the first installment of our discussion on Graph Ricci curvature and part of a series on Graph Neural Networks through the lens of Differential Geometry and Algebraic Topology.
The majority of Graph Neural Network (GNN) architectures are based on the message-passing paradigm, in which information is propagated between the nodes of the graph along its edges. Traditionally, the input graph is used both as part of the data (along with the node features) as well as the computational structure for information propagation. Recent works showed however that certain graphs in certain situations tend to be “unfriendly” for message passing , challenging this paradigm.
More specifically, Message Passing Neural Networks (MPNNs)  tend to perform poorly when the learned task requires long-range dependencies (i.e., the output of an MPNN depends on representations of distant nodes interacting with each other) and at the same time, the structure of the graph results in exponentially many long-range neighboring nodes (i.e., the “receptive field” of a node grows exponentially with the radius of the neighbourhood). In such situations, messages coming from non-adjacent nodes need to be propagated and compressed into fixed-size vectors, causing a phenomenon of information over-squashing . The structural characteristics of the graph responsible for over-squashing are referred to as “bottlenecks”. It is fair to say that while extensively observed empirically, the theoretical understanding of these phenomena is rather limited.
Ina recent paper , we introduce tools to study over-squashing and bottlenecks by a characterisation of the local geometric properties of the graph. Suppose we have a multi-layer MPNN; we can calculate the sensitivity of the hidden features computed by the MPNN to the input features at a remote node. Given a node p and another node s that is r-distant from it , the Jacobian ∂hₚ⁽ʳ⁺¹⁾/∂xₛ tells us how a change in the input feature xₛ will be propagated by MPNN and affect the (r+1)st layer output hₚ⁽ʳ⁺¹⁾. Small absolute values of this Jacobian are indicative of poor information propagation, or over-squashing .
The Jacobian can be bounded by a term expressed through the powers of the normalised adjacency matrix of the graph . However, besides indicating that somehow the graph structure (“bottleneck”) is to blame for the over-squashing, such an expression is not very illuminating. In order to better understand the nature of bottlenecks, we need to take a closer look at the fine-grain geometric properties of the graph.