[GNN] Graph Convolutional Networks (GCNs) - Part 1 : Spectral Convolution on Graph


2023, Jul 10    


0. Spectral Convolution Operation on Graphs


  • GCN (Graph Convolutional Networks) :

    • Aims to generalize convolutional neural networks (CNNs) to graphs.

    • (figure: illustration of a GCN)

  • There are broadly two types of convolution methods used in GCNs: the spectral method and the spatial method.

  • For this post, I will focus on the spectral convolution method.

  • The spectral method performs convolution on graphs by transforming the spatial node representation into the spectral domain using the graph Fourier transform.


1. Fourier Transform on Graph


  • Fourier transform of a graph signal :

    • The classical Fourier transform expresses a signal as a linear combination of orthonormal basis functions (here, each basis function is a complex exponential with a different frequency).

      $\large \hat{f}(\xi) \,=\, \langle f,\, e^{2\pi i \xi t} \rangle \,=\, \int_{\mathbb{R}} f(t)\, e^{-2\pi i \xi t}\, dt$  (Fourier transform)

      $\large f(t) \,=\, \int_{\mathbb{R}} \hat{f}(\xi)\, e^{2\pi i \xi t}\, d\xi$  (inverse Fourier transform)

      • FT interpretation : inner product between a signal and orthonormal basis functions.

    • An orthonormal basis associated with a real symmetric matrix (which has real eigenvalues) can be found by eigendecomposition.

      • Eigenvectors of a real symmetric matrix can be chosen to be orthonormal to each other.

      • proof HERE

    • If one can find a real symmetric matrix associated with the graph signal, then the Fourier transform of the graph signal can be obtained from the eigendecomposition of that matrix, as the sketch below illustrates.
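Here is a minimal NumPy sketch of this property (the matrix below is a random toy example, not tied to any particular graph): `np.linalg.eigh` returns an orthonormal set of eigenvectors for a real symmetric matrix.

```python
# Minimal sketch: eigenvectors of a real symmetric matrix form an orthonormal basis.
# The matrix is a random toy example (an assumption for illustration only).
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
S = (M + M.T) / 2                                   # make the matrix real symmetric

eigvals, U = np.linalg.eigh(S)                      # eigh handles symmetric/Hermitian matrices
print(np.allclose(U.T @ U, np.eye(5)))              # True -> columns of U are orthonormal
print(np.allclose(U @ np.diag(eigvals) @ U.T, S))   # True -> S = U diag(eigvals) U^T
```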


2. Laplacian (Laplace Operator) and Laplacian Matrix


2.1. Laplacian (Laplace Operator)


  • The divergence of the gradient of a scalar function.

    $\large \Delta f \,=\, \nabla \cdot \nabla f \,=\, \sum_{i} \frac{\partial^{2} f}{\partial x_{i}^{2}}$

  • Measures the spatial variation (curvature) of a scalar function : how rapidly the gradient changes near a point.


2.2. Laplacian Matrix (Graph Laplacian)


  • The discrete counterpart of the Laplace operator on a graph is the Laplacian matrix (graph Laplacian).

  • $\large L = D - A$

    • $\large D(i, i) = \sum_{j} A(i, j)$

    • $\large A$ : Weighted adjacency matrix whose element (i, j) is $\large w_{ij}$ if there is a weighted edge between nodes i and j, and 0 otherwise.


  • As a difference operator, for any signal $\large f \in \mathbb{R}^{N}$ on a weighted graph, it satisfies

    $\large (Lf)(i) \,=\, \sum_{j \in \mathcal{N}_{i}} w_{ij}\, \big( f(i) - f(j) \big)$

  • As the graph Laplacian is a real symmetric, positive semidefinite matrix (its eigenvalues are non-negative), it has a complete set of orthogonal eigenvectors (see the sketch after this list).

    • proof : the Laplacian quadratic form $\large x^{T} L x$ is non-negative (Section 3.1)

    • Eigenvectors of the graph Laplacian : $\large u_{l}$ where $l = 0, 1, 2, \dots, N-1$.

      • Corresponding eigenvalues are $\large \lambda_{l}$ where $l = 0, 1, 2, \dots, N-1$, satisfying $\large L u_{l} = \lambda_{l} u_{l}$

      • ordered as $\large 0 = \lambda_{0} \le \lambda_{1} \le \lambda_{2} \le \cdots \le \lambda_{N-1}$
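Here is a minimal NumPy sketch of these properties on a made-up 4-node weighted graph (the edge weights are arbitrary example values):

```python
# Minimal sketch: build L = D - A for a toy 4-node weighted graph and inspect its spectrum.
import numpy as np

A = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [0.5, 0.0, 0.0, 1.0],
              [0.0, 2.0, 1.0, 0.0]])   # symmetric weighted adjacency matrix
D = np.diag(A.sum(axis=1))             # degree matrix: D(i, i) = sum_j A(i, j)
L = D - A                              # graph Laplacian

eigvals, U = np.linalg.eigh(L)         # eigenvalues returned in ascending order
print(eigvals)                         # all >= 0 (up to floating-point error);
                                       # the smallest is 0 for a connected graph
print(np.allclose(U.T @ U, np.eye(4))) # True -> eigenvectors are orthonormal
```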


3. Graph Fourier Transform


  • The classical Fourier transform of a signal $\large f$ is the expansion of $\large f$ in terms of the orthonormal complex exponential functions, which are the eigenfunctions of the one-dimensional Laplacian operator (their conjugates appear in the analysis formula).

    $\large -\Delta \big( e^{2 \pi i \xi t} \big) \,=\, -\frac{\partial^{2}}{\partial t^{2}} e^{2 \pi i \xi t} \,=\, (2 \pi \xi)^{2}\, e^{2 \pi i \xi t}$

    $\large \hat{f}(\xi) \,=\, \langle f,\, e^{2 \pi i \xi t} \rangle \,=\, \int_{\mathbb{R}} f(t)\, e^{-2 \pi i \xi t}\, dt$

  • Similarly, one can define the graph Fourier transform as the expansion of $\large f$ in terms of the eigenvectors ($\large u_{l}$) of the graph Laplacian (a NumPy sketch of the GFT and its inverse appears at the end of this section).

    $\large \hat{f}(\lambda_{l}) \,=\, \langle f,\, u_{l} \rangle \,=\, \sum_{i=1}^{N} f(i)\, u_{l}^{*}(i)$

    • Inverse graph Fourier transform

      $\large f(i) \,=\, \sum_{l=0}^{N-1} \hat{f}(\lambda_{l})\, u_{l}(i)$

  • In the classical Fourier transform, the eigenvalues $\large (2 \pi \xi)^{2}$ carry the notion of frequency.

    • Complex exponentials associated with large $\xi$ oscillate rapidly, whereas those with small $\xi$ oscillate smoothly.
  • A similar interpretation of frequency also holds for the graph Fourier transform.

    • Laplacian eigenvectors associated with small eigenvalues (low frequencies) vary smoothly across the graph, whereas eigenvectors associated with large eigenvalues are likely to oscillate rapidly.
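To make the GFT concrete, here is a minimal NumPy sketch (the graph and the signal values below are toy examples chosen only for illustration):

```python
# Minimal sketch: graph Fourier transform as projection onto the Laplacian
# eigenvectors, and its inverse.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)            # lam: graph frequencies, U: graph Fourier basis

f = np.array([0.2, -1.0, 0.7, 3.0])   # one signal value per node

f_hat = U.T @ f                       # GFT:  f_hat(lambda_l) = <f, u_l>
f_rec = U @ f_hat                     # inverse GFT reconstructs the signal
print(np.allclose(f_rec, f))          # True
```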


3.1. Laplacian Quadratic Form : Measure of Smoothness of a Graph Signal


  • For a notion of global smoothness of a graph signal, one can define the discrete p-Dirichlet form of $\large f$ as follows

    $\large S_{p}(f) \,=\, \frac{1}{p} \sum_{i \in V} \Big[ \sum_{j \in \mathcal{N}_{i}} w_{ij}\, \big( f(j) - f(i) \big)^{2} \Big]^{p/2}$

  • When $p = 2$, it is the Laplacian quadratic form.

    • $\large S_{2}(f)$ is small when the signal $\large f$ is nearly constant across all vertices of the graph, and it increases quadratically with the dissimilarity of $\large f$ across connected vertices.

    • Hence, it can serve as a measure of the smoothness, or the frequency, of a signal $\large f$.

    • (This quantity is a central object in spectral graph theory.)

  • This Laplacian quadratic form can be re-expressed as $\large f^{T} L f$, where $L$ is the Laplacian matrix.

    $\large S_{2}(f) \,=\, \sum_{(i,j) \in E} w_{ij}\, \big( f(j) - f(i) \big)^{2} \,=\, f^{T} L f$

    • The derivation follows from the difference-operator form of $\large L$ shown above.

    • Then, $\large f^{T} L f$ represents the frequency (non-smoothness) of the graph signal, as the sketch below checks numerically.
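A minimal NumPy check of the identity above, on a made-up 3-node weighted graph (the edge weights and signal values are arbitrary examples):

```python
# Minimal sketch: the edge-wise smoothness sum equals the quadratic form f^T L f.
import numpy as np

A = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])        # toy symmetric weighted adjacency matrix
L = np.diag(A.sum(axis=1)) - A
f = np.array([1.0, -0.5, 2.0])         # toy signal

quad = f @ L @ f                       # f^T L f
# sum over unordered edges (i < j) of w_ij * (f(i) - f(j))^2
edge_sum = sum(A[i, j] * (f[i] - f[j]) ** 2
               for i in range(3) for j in range(i + 1, 3))
print(np.isclose(quad, edge_sum))      # True
```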

  • Now let’s solve the optimization problem for the Laplacian quadratic form (the worked steps are spelled out after this list).

    • (1) : Define the optimization problem with an equality constraint $\large ||f||_{2}\,=\,1$

    • (2) : Method of Lagrange multipliers

    • (3) : As $\large f^{T} L f$ is convex, the point where the first derivative equals 0 is the minimum of the objective function with respect to $\large f$.

    • (4) : The result is the eigenvalue equation of the Laplacian matrix.

    • Solution : the stationary points $\large f$ of the Laplacian quadratic form are the eigenvectors of $\large L$; under the norm constraint, the minimum is attained at the eigenvector with the smallest eigenvalue.
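Here is a worked sketch of steps (1)-(4), keeping the notation of the post (the Lagrangian $\large \mathcal{L}$ and the multiplier $\large \lambda$ are introduced here just for the derivation):

(1) $\large \min_{f} \; f^{T} L f \quad \text{s.t.} \;\; f^{T} f = 1$

(2) $\large \mathcal{L}(f, \lambda) \,=\, f^{T} L f \,-\, \lambda\,(f^{T} f - 1)$

(3) $\large \frac{\partial \mathcal{L}}{\partial f} \,=\, 2 L f \,-\, 2 \lambda f \,=\, 0$

(4) $\large L f \,=\, \lambda f \;\;\Rightarrow\;\; f^{T} L f \,=\, \lambda$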

  • Then, the magnitude of the Laplacian quadratic form, i.e. the frequency of the graph signal, is the corresponding eigenvalue.

    • Starting from $\large L f_{l} = \lambda_{l} f_{l}$, multiply both sides on the left by $\large f_{l}^{T}$.

    • $\large f_{l}^{T}\,L\,f_{l} \,=\, f_{l}^{T}\,\lambda_{l}\,f_{l} \,=\, \lambda_{l}$ (since $\large ||f_{l}||_{2} = 1$)

    • Eigenvectors associated with larger eigenvalues have greater dissimilarity across the graph, and vice versa.

  • Embedding the graph signal with eigenvectors of the Laplacian matrix that have different eigenvalues :

    (figures: the graph signal embedded with Laplacian eigenvectors of increasing eigenvalue)

    • The variation of the signal across the nodes of the graph increases when it is embedded with eigenvectors of higher eigenvalues.


  • To summarize, the eigenvectors of the Laplacian matrix form the graph Fourier basis, where each eigenvector corresponds to a frequency on the graph determined by its associated eigenvalue.


3.2. Spectral Filtering with Graph Fourier Transform (GFT)


  • The graph Fourier transform decomposes graph signals in the spatial domain into a linear combination of different frequency components, the eigenvectors of the graph Laplacian, in the spectral domain.

    • Eigendecomposition of laplacian matrix

      • $\large L = U \Lambda U^{T}$ where the columns of $\large U$ are the eigenvectors and $\large \Lambda$ is the diagonal matrix of corresponding eigenvalues.
    • GFT of signal $\large x$

      • $\large \mathcal{F}(x)\, =\, U^{T}x$
    • Inverse GFT

      • $\large \mathcal{F}^{-1}(\hat{x})\, =\, U\hat{x}$

    (figure: the same graph signal shown in the spatial domain and in the graph spectral domain)

    • left : spatial representation of the graph signal.

    • right : the same signal in the graph spectral domain.

  • By transforming the spatial graph signal into the spectral domain with the GFT, one can filter the signal for desired frequencies, as in the sketch below.
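A minimal NumPy sketch of spectral filtering (the graph, the signal, and the cutoff frequency are all toy choices made for illustration):

```python
# Minimal sketch: keep only the low-frequency components of a graph signal
# via GFT -> spectral mask -> inverse GFT.
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

f = np.array([1.0, 0.8, 1.1, -2.0, 3.0])

f_hat = U.T @ f                       # GFT
mask = (lam < 1.0).astype(float)      # ideal low-pass filter: keep lambda < 1.0 (assumed cutoff)
f_filtered = U @ (mask * f_hat)       # filter in the spectral domain, then IGFT
print(f_filtered)                     # a smoothed version of f
```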


4. Spectral Graph Convolution


4.1. Learnable Graph Filters



  • Operations in spectral filtering

    1. GFT : Spectral decomposition of the spatial signal $\large f$

      • Output : $\large U^{T}\,f$
    2. Filtering : Use spectral filters $\large \hat{g}(\Lambda)$ to filter the decomposed frequencies of the signal

      • Output : $\large \hat{g}(\Lambda)\,U^{T}\,f$
    3. IGFT : Reconstruct the decomposed signal into spatial domain

      • Output : $\large U\,\hat{g}(\Lambda)\,U^{T}\,f$

      • equal to $\large \hat{g}(L)\,f$, since $\large L = U \Lambda U^{T}$ implies $\large U\,\hat{g}(\Lambda)\,U^{T} = \hat{g}(L)$

  • Final Output of spectral filtering : $\large \hat{g}(L)\,f$

  • Note that the entire filtering pipeline collapses to $\large \hat{g}(L)f$, which means there is no actual need to perform an explicit eigendecomposition; the sketch below verifies this equivalence numerically.
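A minimal NumPy check of the equivalence (the polynomial coefficients and the 4-node graph are arbitrary example values):

```python
# Minimal sketch: applying a polynomial filter in the spectral domain equals
# applying the same polynomial of L directly to the signal.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
f = np.array([1.0, -1.0, 2.0, 0.5])

theta = [0.5, -0.2, 0.1]                              # g_hat(x) = 0.5 - 0.2 x + 0.1 x^2

g_Lambda = sum(t * np.diag(lam ** k) for k, t in enumerate(theta))
spectral = U @ g_Lambda @ U.T @ f                     # U g_hat(Lambda) U^T f

g_L = sum(t * np.linalg.matrix_power(L, k) for k, t in enumerate(theta))
direct = g_L @ f                                      # g_hat(L) f
print(np.allclose(spectral, direct))                  # True
```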


4.1.1. Traditional Filtering Process


  • Traditionally, this filtering procedure is done with non-parametric spectral filters that are pre-designed to pass desired frequencies (low-pass or high-pass filters).

    (figure: pre-designed non-parametric spectral filters)

  • Hence, these non-parametric filters are not learnable.


4.1.2. Learnable Filters with Spectral GCN


  • Instead, spectral GCN enables parametric learning of the filters by re-defining them as parameterized polynomial functions.

    $\large \hat{g}_{\theta}(\Lambda) \,=\, \sum_{k=0}^{K-1} \theta_{k}\, \Lambda^{k}$

  • Each eigenvalue is expanded into a polynomial series of order $\large K-1$, weighted by learnable parameters.

    • $\large \hat{g}_{nm}(\Lambda)$ (N x N) :

        $\large \sum_{k=0}^{K-1} \, \theta_{k}^{(nm)} \,\Lambda^{k}$

  • How to deal with multi-channel signals ? With $\large N_{in}$ input channels and $\large N_{out}$ output channels, each output channel aggregates filtered versions of all input channels (see the sketch below):

    $\large y_{m} \,=\, U \sum_{n=1}^{N_{in}} \hat{g}_{nm}(\Lambda)\, U^{T}\, x_{n}, \quad m = 1, \dots, N_{out}$
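A minimal NumPy sketch of this multi-channel spectral layer (the shapes, the helper name `spectral_conv`, and all parameter values are assumptions for illustration; the nonlinearity and bias are omitted):

```python
# Minimal sketch: learnable polynomial spectral filter with N_in input channels
# and N_out output channels; theta has shape (N_out, N_in, K).
import numpy as np

def spectral_conv(X, U, lam, theta):
    """X: (N, N_in) multi-channel graph signal, U/lam: Laplacian eigenpairs,
    theta: (N_out, N_in, K) polynomial coefficients. Returns (N, N_out)."""
    N, N_in = X.shape
    N_out, _, K = theta.shape
    X_hat = U.T @ X                              # GFT of every input channel
    Y = np.zeros((N, N_out))
    for m in range(N_out):
        for n in range(N_in):
            # g_hat_{nm}(Lambda) = sum_k theta_k^{(nm)} Lambda^k (acts elementwise on lam)
            g = sum(theta[m, n, k] * lam ** k for k in range(K))
            Y[:, m] += U @ (g * X_hat[:, n])     # filter channel n, then IGFT
    return Y

# toy usage: 4-node graph, 2 input channels, 3 output channels, K = 3
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
theta = rng.normal(size=(3, 2, 3))
print(spectral_conv(X, U, lam, theta).shape)     # (4, 3)
```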


4.2. Convolution Theorem


  • Besides filtering desired frequencies of the graph signal, another advantage of taking the GFT to move from the spatial domain to the spectral domain is that the convolution step becomes much simpler in the spectral domain, due to the convolution theorem.

    $\large \mathcal{F}\{f * g\} \,=\, \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}$

    $\large f * g \,=\, \mathcal{F}^{-1}\big\{ \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \big\}$

  • Proof

  • Convolution in time domain

    $\large (f * g)(t) \,=\, \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$

    • In an LTI (Linear Time-Invariant) system, a signal can be expressed as a sum of shifted impulses.

    • The integral is computed over the infinite $\tau$ domain.

  • Convolution in frequency domain (spectral domain)

    • Take the GFT of each signal, multiply them elementwise in the spectral domain, and take the IGFT to reconstruct the result in the original spatial domain (see the sketch below).
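A minimal NumPy sketch of graph convolution via the convolution theorem (the graph and both signals are toy examples):

```python
# Minimal sketch: graph convolution defined through the convolution theorem,
# i.e. an elementwise product in the spectral domain.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

f = np.array([1.0, 2.0, -1.0, 0.5])
g = np.array([0.3, 0.0, 0.7, 0.0])

# f *_G g = IGFT( GFT(f) * GFT(g) )
conv = U @ ((U.T @ f) * (U.T @ g))
print(conv)
```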




  • So far, I’ve reviewed how spectral convolution, one of the convolution methods on graphs, is performed on graph data.

  • This post may contain some incorrect information. If you notice any errors, please let me know so that I can correct them.