# Brief Introduction to Social Network Modeling

Here is a brief, and not comprehensive, introduction to social network modeling drawing from academic literature in mathematics, statistics, computer science, sociology and physics. This will simply introduce some basic models. I won’t get into stochastic (random) processes on networks (such as epidemics or “cascades”), dynamics of networks, nor algorithms for approximating metrics on large-scale networks.

This is to supplement the perspective that John Kelly gave in class on Wednesday.

Background

One way to start is to think about a network itself as a random object, much like a random number or random variable. The network itself can be conceived of as the result of a random process or as coming from an underlying probability distribution. You can in fact imagine a sample of networks.

The discipline of (social) network analysis, or graph theory, comes out of many academic departments: math, statistics, computer science, physics and sociology, with far-ranging applications in even more fields, including fMRI research, epidemiology, and studies of online social networks such as Facebook or Google+. It's a challenge to untangle the development of the field because it has evolved in different departments, in some cases without communication between them, with publications in academic journals that largely ignore each other. As a result, in some cases there are similar ideas with different vocabulary, and in other cases, clashing philosophies or points of view.

Why did I put "social" in parentheses? Before Facebook (and earlier online social networks), the term "social network" did not immediately bring such sites to mind; it referred more broadly to a network of individuals, including offline networks. Further, there is no need to talk specifically about "social" networks because, from a mathematical perspective, the nodes/vertices could be individual humans, but they could also be genes, for example, or voxels in the brain.

Independence

One reason that networks are of theoretical interest and a rich source of research problems is that the assumption of independence often made about individuals in a sample does not hold. Much statistical literature assumes independence of observations. If one thinks of the networks themselves as observations, rather than the individuals, then this isn’t an issue. But oftentimes, the connection between individuals is at the heart of the problem of interest.

Definitions

We define a network to be a structure of nodes that are connected by edges. We can think of the nodes as representing people and edges as representing relationships. The nodes can be assigned attributes. These attributes can be vector-valued. In the case of a social network, the attributes could be demographic information about the individual.

If we assume that the edges are symmetric (if node A “knows” node B then B “knows” A), this is an undirected network. If the edges have direction, i.e. an arrow points from A to B, this is a directed network. Of course in a directed network, it’s possible that A knows B and B knows A.

The degree of any particular node A is the number of other nodes that node A is connected to. The network can be represented by an $N \times N$ matrix of 1's and 0's, where the ($i$,$j$)th element is a 1 if nodes $i$ and $j$ are connected. This matrix is known as the adjacency matrix. Alternatively, the network can be represented by listing, for each node $i$, its neighbors; this is known as an adjacency list, and for sparse networks it saves storage space. Further, the edges can themselves have values, which could capture information about the nature of the relationship between the nodes they connect. These values can be stored in the $N \times N$ matrix in place of the 1's and 0's, which only record the presence or absence of a relationship.
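To make the two representations concrete, here is a minimal Python sketch (the node labels and edges are invented purely for illustration):

```python
# Sketch: two equivalent representations of the same 4-node undirected network.
N = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: N x N, symmetric for an undirected network.
adj_matrix = [[0] * N for _ in range(N)]
for i, j in edges:
    adj_matrix[i][j] = 1
    adj_matrix[j][i] = 1

# Adjacency list: for each node, the list of its neighbors.
adj_list = {i: [] for i in range(N)}
for i, j in edges:
    adj_list[i].append(j)
    adj_list[j].append(i)

# Degree of a node = row sum of the matrix = length of its neighbor list.
degree_2 = sum(adj_matrix[2])
```

For a sparse network, the adjacency list stores only the existing edges, while the matrix always uses $N^2$ entries.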

Vocabulary

The vocabulary used depends on the academic discipline.
Nodes = Vertices = Actors
Edges = Links = Relationships = Ties
Network = Graph

Erdős–Rényi (1959) or Bernoulli

We can imagine that an observed (social) network is a single realization of an underlying stochastic process. For a fixed set of $N$ nodes, there are $D = {N\choose2}$ pairs of nodes, or dyads, each of which can either be connected by an edge or not. Under an assumption of symmetry, there are $2^D$ possible observed networks. They are not all equally likely. For example, the simplest underlying distribution is the Erdős–Rényi model, which assumes that for every pair of nodes $(i,j)$, an edge exists between the two nodes with probability $p$, independently of all other pairs. Thus under this model, observing a network with all nodes attached to all other nodes has probability $p^{D}$, while observing a network with all nodes disconnected has probability $(1-p)^{D}$. And of course there are many other possible networks between these two extremes. The Erdős–Rényi model is also known as a Bernoulli network. In the mathematics literature, the Erdős–Rényi graph is treated as a mathematical object with interesting properties that allow for theorems to be proved. [Paul Erdős was an important 20th-century mathematician known for having many collaborators and for posing challenging problems and puzzles.]
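A one-draw sampler for this model can be sketched in a few lines of Python (the function name and seeding convention are my own choices):

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """One draw from the Erdos-Renyi (Bernoulli) model: each of the
    C(n, 2) dyads gets an edge independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]
```

With $p = 1$ every dyad is an edge, so the draw is the complete graph; with $p = 0$ it is the empty graph, matching the two extremes above.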

Real World Properties of Networks

Social networks that can be observed in the real world, for example, friendship networks or academic collaboration networks, tend not to resemble Bernoulli Networks because Bernoulli Networks do not capture common characteristics of social networks such as transitivity, clustering, reciprocity or mutuality and betweenness. These characteristics of real-world networks have graph-statistical counterparts. For example, transitivity can be captured by the number of triangles in a network.
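As a sketch, the triangle count mentioned above can be computed directly from the adjacency matrix by checking every triple of nodes (a brute-force $O(N^3)$ approach, fine for small networks):

```python
from itertools import combinations

def count_triangles(adj):
    """Count triangles (triples of mutually connected nodes) in an
    undirected network given as an adjacency matrix of 0s and 1s."""
    n = len(adj)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[j][k] and adj[i][k]
    )
```

A network with many triangles relative to an Erdős–Rényi baseline is one informal signal of transitivity.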

Exponential Random Graph Models

It is desirable to have models that capture these real-world properties of networks. Within sociology, a class of models that has become widely used is exponential random graph models (ERGMs), also known as p* models (see, for example, Robins et al. and Wasserman and Pattison). These models are distributions over the space of all possible networks. Here $Y$ is the random variable, an $N \times N$ matrix, which represents a network. The general form of these models is

$P_{\theta}\left\{Y=y\right\} = \text{exp}\left(\theta_{1}z_{1}\left( y \right) + \theta_{2}z_{2}\left( y \right) + ... + \theta_{k}z_{k}\left( y \right)-\psi\left(\theta \right)\right)$,

where the parameter is
$\theta=\left(\theta_{1},\theta_{2},\ldots,\theta_{k}\right)$, and the sufficient statistics are

$\left(z_{1}\left( y \right),z_{2}\left( y \right),\ldots,z_{k}\left( y \right)\right)$, and $\psi(\theta)$ is a normalizing constant that ensures the probabilities sum to one over all possible networks. For example,

$P_{\theta}\left\{Y=y\right\} = \text{exp}\left(\theta_{1}z_{1}\left( y \right) + \theta_{2}z_{2}\left( y \right) + \theta_{3}z_{3}\left( y \right)-\psi\left(\theta \right)\right)$,
is an ERGM, where

$z_{1}, z_{2}$ and $z_{3}$ could be the graph statistics: number of triangles, number of edges and number of “2-stars”. Additional possible graph statistics that have been introduced include: k-stars, degree, alternating k-stars, alternating triangles. A positive value for $\theta_1$ would indicate a tendency towards a larger number of triangles.

The reason for introducing these graph statistics, aside from making ERGMs as general as possible, is that, as mentioned, they correspond to aspects of real-world networks such as reciprocity, clustering, mutuality and transitivity (if $i$ knows $j$ and $k$, then $j$ and $k$ are more likely to know each other). A Bernoulli network is a special case of an ERGM: let $z(y)$ be the number of edges; then $P_{\theta}\left\{Y=y\right\} = \text{exp}\left(\theta z\left( y \right)-\psi\left(\theta\right)\right)$ with $\theta = \log\frac{p}{1-p}$.
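To see the Bernoulli special case concretely, the dyad-independent likelihood can be rewritten in exponential-family form, which also identifies the normalizing term $\psi(\theta)$ (a short derivation using the $D$ dyads and edge count $z(y)$ defined above):

```latex
P_{\theta}\{Y=y\}
  = p^{z(y)}\,(1-p)^{D - z(y)}
  = \exp\!\Big( z(y)\,\log\tfrac{p}{1-p} + D\log(1-p) \Big)
  = \exp\!\big( \theta\, z(y) - \psi(\theta) \big),
\qquad \theta = \log\tfrac{p}{1-p},
\qquad \psi(\theta) = D\,\log\!\big(1+e^{\theta}\big).
```

The last step uses $1 + e^{\theta} = \tfrac{1}{1-p}$, so $D\log(1-p) = -D\log(1+e^{\theta})$.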

Inference for ERGMs

Ideally, though in some cases unrealistic in practice, one could observe a sample of several networks, $Y_1,\ldots,Y_n$, all $N \times N$ matrices, modeled as independent and identically distributed (iid) observations from the same probability model, and make inferences about the parameters of that model. In the case of a Bernoulli network, the likelihood would be $L=\prod_{i=1}^n {p^{d_i}(1-p)^{D-d_i}},$ where $d_i$ is the number of observed edges in the $i$th network and $D = {N\choose2}$ is the total number of dyads in each network. Then an estimator for $p$ is $\hat{p}=\frac{\sum_{i=1}^{n}{d_i}}{nD}$.
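The estimator $\hat{p}$ can be sketched in Python, representing each sampled network by its edge list (a simplifying representation of my own):

```python
def estimate_p(networks, n_nodes):
    """Estimate the Bernoulli-network parameter p from an iid sample of
    networks, each given as an edge list on the same n_nodes:
    p-hat = (total observed edges) / (sample size * D), with D = C(n_nodes, 2)."""
    D = n_nodes * (n_nodes - 1) // 2
    return sum(len(net) for net in networks) / (len(networks) * D)
```

For example, two 3-node networks with one and two edges respectively give $\hat{p} = 3 / (2 \cdot 3) = 0.5$.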

In practice in the ERGM literature, only one network is observed, which is a sample size of one. From this, parameter estimates for the probability model that generated this network are made. For a Bernoulli network, from even just one network, we could estimate $p$ as the proportion of edges out of the total number of dyads, which seems a reasonable estimate.

For ERGMs, estimating the parameters from one observation of the network, as is done in practice, tends to be done using either a pseudo-likelihood estimation procedure or MCMC methods (see, for example, Strauss and Ikeda and Snijders).

The pseudo-likelihood procedure has been discounted because it can produce infinite estimates even in situations where the pseudo-likelihood converges. MCMC methods have also been criticized or called into question because of inferential degeneracy, where the algorithms converge to degenerate graphs (graphs that are complete or empty) or do not converge consistently (see Handcock et al.). It is possible that the degeneracy stems from there not being any corresponding generative model for ERGMs, where a generative model can be described heuristically as a story for how the network was generated or created. Further, we point out that inferences from a sample size of one may be problematic.

Latent Space Models

Motivated by problems of model degeneracy and instability in exponential random graph models, and describing these problems as "defects in the models themselves" which cannot be addressed by alternative estimation procedures, Hoff, Raftery and Handcock introduced latent space models. The observed data are an $n \times n$ social network $Y$, with entries $y_{ij}$ denoting the relationship between nodes $i$ and $j$, and corresponding covariate information $X$. Unobserved are unknown latent positions $z_i$ in social space, for all nodes $i=1,\ldots,n$, such that conditional on these positions, the edges in the network are independent. The $z_i$'s and $\theta$ are treated as parameters to be estimated in the model $P(Y \mid Z, X, \theta) = \prod_{i \neq j} {P(y_{ij} \mid z_i, z_j, x_{ij}, \theta)},$ where the terms in the product can be modeled using logistic regression. Maximum likelihood estimates (MLEs) for $Z$ and $\theta$ are found using MCMC methods.
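A minimal sketch of the conditional edge probability, assuming the distance-style parameterization $\text{logit}\,P(y_{ij}=1) = \alpha - \lVert z_i - z_j \rVert$ (one of the parameterizations used in the latent space literature, here without covariates):

```python
import math

def edge_prob(z_i, z_j, alpha):
    """Distance-style latent space model: conditional on latent positions,
    edges are independent, and nodes that are close in the latent social
    space are more likely to be connected.
    logit P(y_ij = 1) = alpha - d(z_i, z_j)."""
    d = math.dist(z_i, z_j)  # Euclidean distance between latent positions
    return 1.0 / (1.0 + math.exp(-(alpha - d)))
```

Coincident positions with $\alpha = 0$ give probability 0.5, and the probability decreases monotonically as the latent distance grows.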

Watts-Strogatz Small-World Networks

Duncan Watts (previously a sociology professor at Columbia, and now at Microsoft Research) and Strogatz proposed "small-world networks", which lie on a spectrum between completely random (Erdős–Rényi) graphs and completely regular graphs, and which capture the real-world phenomenon of six degrees of separation. The algorithm used to construct a small-world network is as follows: start with a ring lattice with $n$ vertices and $k$ edges per vertex; then rewire each edge with probability $p$, where $p = 0$ represents the regular end of the spectrum and $p=1$ represents extreme disorder.
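The construction can be sketched as follows (the duplicate-edge handling is my own simplification, and the rewiring loop assumes $n \gg k$):

```python
import random

def small_world(n, k, p, seed=None):
    """Sketch of the Watts-Strogatz construction: start from a ring lattice
    where each of n nodes connects to its k nearest neighbours (k even),
    then rewire each edge with probability p to a new random endpoint.
    p = 0 leaves the regular lattice intact; p = 1 approaches a random graph."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            edges.add((i, (i + offset) % n))
    for (i, j) in list(edges):
        if rng.random() < p:
            new_j = rng.randrange(n)
            # avoid self-loops and duplicate edges (assumes n >> k)
            while new_j == i or (i, new_j) in edges or (new_j, i) in edges:
                new_j = rng.randrange(n)
            edges.remove((i, j))
            edges.add((i, new_j))
    return edges
```

Rewiring moves edges around but preserves their total number, $nk/2$, so the comparison across values of $p$ holds the edge count fixed.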

Sensitivity analysis on the parameters $n$ and $k$ reveals that $n \gg k \gg \ln(n) \gg 1$ produces a sufficiently connected graph, sufficient in that it resembles observable real-world graphs. Further, the graph statistics $L(p)$ and $C(p)$, the characteristic path length and clustering coefficient respectively, measure the typical separation between two nodes and the cliquishness of a typical neighborhood. Small-world networks are those for which $L(p)$ is close to $L(1)$, the value for a completely random graph, while $C(p)$ is much greater than $C(1)$: short paths coexist with high clustering. A criticism of this model is that it produces networks that are homogeneous in degree, while observable real-world networks tend to be scale-free and inhomogeneous in degree.
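The clustering coefficient $C$ can be sketched as an average of local neighborhood densities (one common convention, which skips nodes with fewer than two neighbors; assumes the network is given as an adjacency list):

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Clustering coefficient C in the Watts-Strogatz sense: for each node,
    the fraction of pairs of its neighbours that are themselves connected,
    averaged over nodes with at least two neighbours.
    adj maps each node to a list of its neighbours."""
    per_node = []
    for i, nbrs in adj.items():
        if len(nbrs) < 2:
            continue
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        per_node.append(linked / pairs)
    return sum(per_node) / len(per_node) if per_node else 0.0
```

A triangle has $C = 1$ (every node's neighbors are connected), while a simple path has $C = 0$.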

Beyond Statistics and Math

For completeness, in addition to the models described above, other classes of models include Markov random fields, stochastic block models, mixed membership models and mixed membership stochastic block models, each of which models relational data in various ways and seeks to include properties that other models do not.

From the physics world, Mark Newman and his collaborators at the University of Michigan have made several important contributions. See, for example, The structure and function of complex networks.

From Cornell's Computer Science Department, Jon Kleinberg's lab has been instrumental in developing algorithms for computations on large-scale networks. Kleinberg's work often translates well to practical applications. Jure Leskovec was a post-doc in that lab before moving to Stanford, where he has also made a number of important contributions.

Also out of Cornell's Sociology Department is Michael Macy, one of the "creators" of Computational Social Science, along with many others including NYU's Sinan Aral, who has worked on problems including separating social influence from homophily. (Did A cause B to do something, or are both just more likely to do it because they're connected?)

From Columbia, Tian Zheng, Andrew Gelman, Matt Salganik (now at Princeton) and Tyler McCormick (now at the University of Washington) have done work on estimating personal network size. See, for example, the award-winning paper How Many People Do You Know in Prison? Their work addresses real-world situations where it is not possible to observe the complete network, which is dramatically different from online social networks, where it is possible to observe everything. (Although one can still think of the offline relationships as unobserved.)

Classes of Problems

A lot of the interesting problems lie not in modeling the networks themselves, but rather in finding algorithms or models that explore the following classes of problems:

• Finding “influential” nodes
• Epidemics or cascades on networks, diffusion of information
• Community Detection
• Network dynamics over time
• Sampling methods on networks, including Respondent Driven Sampling
• Experiments and causal modeling on networks
• Finding clusters
• Dealing with very large scale networks, algorithms and approximations
• Making inferences about unobserved attributes or relationships
• Bipartite networks, which we’ve discussed for recommendation systems