Structural properties — Rohdy의 study

KAIST AI 대학원 신기정 교수님의 수업인 데이터마이닝 및 소셜 네트워크 분석 수업 필기입니다.

Three parts/goals

Statistical properties of large networks
Models that help understand these properties
Predict behavior of networked systems based on measured srtructural properties and local rules governing individual nodes

Random Graph: Null Model

What is the simplest way to generate a graph?
Random graph model \(G(n,p)\) (= Erdos-Renyi model, Poisson random graph model)
- Given \(n\) vertices, connect each pair i.i.d. with probability \(p\)

Paths / Distance in Graph

Path: Sequence of nodes between the two nodes
Can intersect itself and pass through the same edge
Distance: the number of edges along a shortest path
Not connected, the distance is usually defined as infinite

Network Diameter

Diameter: The maximum distance between any pair of nodes in a graph -> sensitive to outlier
Effective diameter: distance at which 90% of all connected pairs of nodes can be reached
Average path length for a connected graph or component

Small-world Effect

약 6번의 path를 건너면 거의 모든 사람들과 연결되어 있다는 이론(Diameter in social network)
EX) Boston에 편지보내기, MSN

MSN network & Milgram's small world experiment

MSN network & Milgram's small world experiment

Degree Distribution

Degree Distribution \(p_k\): Fraction of node with degree \(k\)
$$ p_k = N_k / N $$

Degrees in real graph: Heavily skewed to the right
Long tail of values far above the mean
many real world networks contain hubs
A power-law distribution: \(p_k ~ k^{-\alpha}\)

Degrees in real graph

Degree distribution in random graph: binomial
$$ P(k)={n-1\choose k}p^k(1-p)^{n-1-k} $$
approximately nomarl distribution

Degree distribution in random graph

Clustering Coefficient

Local Clustering Coefficient of node \(i\) with degree \(k_i\) -> 인접노드들끼리 얼마나 연결되어 있나
$$ C_i := {e_i\over k_i(k_i - 1)/2} $$
Average Clustering Coefficient
$$ C := {1 \over N} \sum C_i $$
In real-world graphs
Expected value CC: \(C >> {2m \over N(N-1)}\)
Why?
Homophily: nodes connect to nodes similar to themselves(같은 관심사가 있으면 친구 됨)
Transitivity: nodes with common friends become friends(공통의 친구가 있으면 친구 됨)

Giant Connected Component(GCC)

Largest set where any two vertices can be joined by a path
In real world graphs, huge GCC contains most nodes

MSN network

MSN network

In Random Graph \(G(n,p)\)

Difference between real-world graph and random graph

Degree distribution: Heavily skewed to the right vs Gaussian Distribution
Avg path length: 비슷
Avg clustering coef: real-world graph가 더 높음
Largest Connected Componet: 일정수 이상의 node가 존재하면 비슷

Network Resilience

How the connectivity of the network changes as the nodes get removed?

Real graphs are resilient to random attacks
Real graphs are vulnerable to targeted attacks(GCC가 attack 받으면 취약)
Randm network는 real network에 비해 target attack에 robost

동그라미가 target attack, 네모가 random attack

How Graphs Evolve over Time

Conventional wisdom/intuition

Constant average degree: the number of edges grows linearly with the number of nodes
$$ E(t) \propto N(t) $$
\(E(t)\): number of edges at time
\(N(t)\): number of nodes at time
Slowly growing diameter: as the network grows the distances between nodes grow
$$ O(N^{1/3}) \; or \; O(logN) $$

Real Network

Networks are denser over time(denser = \({E(t) \over N(t)}\))

the number of edges grow faster than that of nodes
average degree is increasing
$$ E(t) \propto N(t)^a $$

with \(1 \le a \le 2\)

Diameter shrinks over time(= people get closer over time)

'GraphMining' 카테고리의 다른 글

Kronecker Graph Model (0)	2022.09.26
Community Guided Attachment & Forest Fire Model (0)	2022.09.26
Basic Models (0)	2022.09.14
Random Networks (0)	2022.09.14
Intro & Basic Concepts (0)	2022.09.06

티스토리툴바