[GNN] Hyperbolic Neural Networks - Part 1 : Hyperbolic Geometry and Generalization of Euclidean Operations in Hyperbolic Space

2023, Oct 13    


1. Limitations of Euclidean Space for Representing Hierarchical Data


  • The geometric framework underlying most current neural networks is Euclidean space, which aligns with our spatial intuition rooted in Euclidean geometry.

  • However, recent work has shown that complex, hierarchical networks cannot be faithfully represented in Euclidean space, because their complexity grows exponentially with depth.

  • Such networks can be decomposed into tree-like structures in which the number of nodes grows exponentially with depth.


  • Euclidean space, however, cannot capture this exponentially expanding behavior of tree-like data, as it expands at most polynomially, which distorts the data representation.

  • To address this challenge, hyperbolic geometry is a compelling alternative for modeling hierarchical data, as it has an intrinsic tendency to grow exponentially.


2. Hyperbolic Geometry for Hierarchical Representation


  • Hyperbolic space is a non-Euclidean space with constant negative curvature ($\large K \, = \, - \zeta^{2}$, where $\large \zeta > 0$).

    • Elliptic vs. Euclidean vs. Hyperbolic Geometries



2.1. Exponential Growth


  • As mentioned before, the key property that makes hyperbolic spaces suitable for modeling hierarchical structure is that they expand exponentially, while Euclidean spaces expand only polynomially.

  • For example, the Euclidean distance $\large r_{e}$ and the hyperbolic distance $\large r_{h}$ from the origin of the Poincaré disk are related as $\large r_{e} = \tanh(\frac{r_{h}}{2})$.

    Illustration of 2D Poincaré Model

    • The left panel shows straight lines (geodesics) in the Poincaré disk.

    • Notice that, for a constant hyperbolic distance (0.7) between consecutive points, the corresponding Euclidean distances shrink rapidly from 0.36 to 0.1, as the sketch below illustrates.
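A quick numerical check of the relation $\large r_{e} = \tanh(\frac{r_{h}}{2})$ (a minimal Python sketch; the step size 0.7 is taken from the figure):

```python
import numpy as np

# Euclidean radius of a point at hyperbolic distance r_h from the
# origin of the Poincaré disk: r_e = tanh(r_h / 2).
def euclidean_radius(r_h):
    return np.tanh(r_h / 2.0)

# Walk outward in equal hyperbolic steps of 0.7 and print the Euclidean
# gap between consecutive points: it shrinks toward the boundary.
r_h = np.arange(1, 6) * 0.7
r_e = euclidean_radius(r_h)
gaps = np.diff(np.concatenate([[0.0], r_e]))
for step, gap in zip(r_h, gaps):
    print(f"r_h = {step:.1f}  ->  Euclidean gap = {gap:.3f}")
```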

  • Also, the circumference $\large L(r)$ and the area $\large A(r)$ of a disk of radius $\large r$ (with $\large r$ defined in polar coordinates, identically for both the Euclidean and hyperbolic cases) are

    • Euclidean : $\large L(r) \, = \,2 \pi r$, $\large A(r) \, = \, \pi r^{2}$

    • Hyperbolic : $\large L(r) \, = \,2 \pi \sinh(\zeta r)$, $\large A(r) \, = \, 2 \pi (\cosh(\zeta r) - 1)$

      • Since $\large \sinh(\zeta r)$ and $\large \cosh(\zeta r)$ both grow exponentially in $\large \zeta r$ ($\large \approx e^{\zeta r}$), hyperbolic spaces grow exponentially with increasing distance from the center of the disk.

      • In the Euclidean case, on the other hand, the circumference grows linearly and the area quadratically in $\large r$, i.e., only polynomially.

  • This exponential growth of circumference and disk area is analogous to the exponential growth of the number of nodes in a tree with respect to its depth.

    • In a tree with branching factor $\large k$, the number of nodes exactly $\large r$ hops away from the root is $\large k^{r}$, and the number of nodes within $\large r$ hops is $\large (k^{r+1} - k)/(k-1)$.

    • These two quantities are analogous to the circumference and the area of a disk of radius $\large r$, and both grow as $\large k^{r}$ with $\large r$.

    • If $\large \zeta \,=\, \ln k$, both grow at the same rate $\large e^{\zeta r} = k^{r}$, which means k-ary trees and hyperbolic spaces are intrinsically equivalent from the metric perspective (see the numerical check after this list).

    • Hence, trees can be naturally embedded into hyperbolic spaces with very low geometric distortion.
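As a quick sanity check of this equivalence, the following sketch (an illustrative assumption: $k = 3$, so $\zeta = \ln 3$) compares the node count of a k-ary tree within $r$ hops against the hyperbolic disk area $A(r)$; their ratio stabilizes quickly, confirming the matching growth rates:

```python
import numpy as np

k = 3
zeta = np.log(k)  # curvature parameter matched to the branching factor

for r in range(1, 9):
    nodes = (k ** (r + 1) - k) // (k - 1)          # nodes within r hops
    area = 2 * np.pi * (np.cosh(zeta * r) - 1.0)   # hyperbolic disk area A(r)
    print(f"r = {r}: nodes = {nodes:>6}, A(r) = {area:>12.1f}, "
          f"ratio = {nodes / area:.3f}")
```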

  • Comparison of embedding representations of a hierarchical network in Euclidean space and in hyperbolic space (Poincaré model).

    • Euclidean Embedding

    • Hyperbolic Embedding (Poincaré Disk)

    • Note that hierarchical distances between nodes are preserved better in the hyperbolic embedding than in the Euclidean one.


2.2. Hyperbolic Space as Smooth Riemannian Manifold


  • When choosing a geometric framework for neural networks, another important consideration is whether the space is smooth, because the optimization process in neural networks depends on differentiability.

  • For a manifold to be a (smooth) Riemannian manifold, several conditions must be satisfied, such as

    • Transition maps between charts are infinitely differentiable.

    • A positive-definite bilinear map (the metric) is defined smoothly across the entire manifold.

  • Hyperbolic space is Riemannian, and thus various deep learning approaches can be generalized to and utilized in it.


2.2.1 Preliminaries on Riemannian Manifold


  • To express various deep learning operations on a non-Euclidean manifold, there are a few preliminaries required regarding the Riemannian manifold $\large (M, g)$.
  1. Metric

    • A tensor that takes two tangent vectors at a point of the manifold and returns their inner product: $\large g : T_{x}M \, \times \, T_{x}M \rightarrow \mathbb{R}$.

    • Enables quantitative analysis of the manifold.

    • Defines local notions of angle, length, area, volume, and distance, and is only valid on the associated manifold.

  2. Tangent Space ($\large T_{x}M$)

      

    • The tangent space $\large T_{x}M$ is a Euclidean approximation (an n-dimensional vector space) locally defined at each point $\large x$ of the manifold.

    • Using mapping functions (the exponential and logarithmic maps), one can transition between the non-Euclidean manifold and Euclidean space ($\large \mathbb{R}^{n}$).

  3. Conformal Mapping

    • A relation between metrics that preserves angles between vectors.

    • Two metrics $\large g$ and $\large \hat{g}$ are conformal if there exists a smooth function $\large \lambda \,: M \, \rightarrow (0, \infty)$ such that $\large \hat{g}_{x} \, = \, \lambda_{x}^{2} g_{x}$.

  4. Geodesic

    • The analogue of a straight line (a curve whose second derivative along itself is zero) in $\large \mathbb{R}^{n}$.

    • Shortest path (curve) between two points on a manifold.

  5. Exponential / Logarithmic map

      

    • Exponential map : maps from $\large T_{x}M$ to $\large M$.

    • In more detail, $\large exp_{x}(v) \, = \, q$, where $\large q$ is the point reached by following, from the point $\large x$, the geodesic with initial velocity $\large v$ (a tangent vector) for unit time.

    • The logarithmic map is the inverse of the exponential map: $\large log_{x}(q) \, = \, v$.

  6. Parallel Transport

      

    • Moving a vector from one point on a manifold to another along the geodesic connecting the two points.

    • Preserves the metric (inner products between transported vectors) during transport.


3. Isometric Models for Hyperbolic Geometry


  • Hyperbolic space is often modeled inside 2- or 3-dimensional Euclidean space to make it visually and mathematically accessible.

  • There are several models that are isometric to hyperbolic geometry, which means each model admits at least one bijective function that preserves geometric properties of the hyperbolic manifold, such as angles, lengths, and distances.

  • Here, I will review three different models: the Lorentz model, the Poincaré model, and the Klein model.


    • $\large \mathbb{L}$ : Lorentz Model, $\large \mathbb{B}$ : Poincaré Model, $\large \mathbb{K}$ : Klein Model


3.1. Lorentz Model


  • The Lorentz model is a hyperbolic manifold embedded in Minkowski space, the space that models Einstein’s theory of special relativity.

  • It is defined as the upper sheet of a two-sheeted n-dimensional hyperboloid: $\large \mathbb{L}^{n} \, = \, \{x \, \in \, \mathbb{R}^{n+1} \,:\, <x,x>_{\mathbb{L}} \, = \, -1, \; x_{0} > 0\}$

    • Here, the metric $\large <x,y>_{\mathbb{L}}$ is defined as $\large <x,y>_{\mathbb{L}} \, = \, x^{\top} g_{\mathbb{L}} y \, = \, -x_{0}y_{0} \, + \, \sum_{i=1}^{n} x_{i}y_{i}$

    • $\large g_{\mathbb{L}} \, = \, \mathrm{diag}(-1, 1, \ldots, 1)$ is a diagonal matrix of 1s, except the first element, which is -1.

  • The distance is defined as $\large d_{\mathbb{L}}(x, y) \, = \, \operatorname{arcosh}(- <x, y>_{\mathbb{L}})$

  • As its metric and distance functions are quite simple, the Lorentz model is numerically stable and efficient compared to the Poincaré model, which comes next.
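A minimal sketch of these formulas in Python (the `lift` helper, which places Euclidean points onto the hyperboloid via $\large x_{0} = \sqrt{1 + ||x||^{2}}$, is an illustrative assumption, not part of the text above):

```python
import numpy as np

# Minkowski inner product <x, y>_L = -x_0*y_0 + sum_i x_i*y_i
def minkowski_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

# Lorentz distance d(x, y) = arcosh(-<x, y>_L); the clip guards against
# values slightly below 1 caused by floating-point error.
def lorentz_distance(x, y):
    return np.arccosh(np.clip(-minkowski_inner(x, y), 1.0, None))

# Lift a Euclidean vector onto the hyperboloid: x_0 = sqrt(1 + ||v||^2).
def lift(v):
    return np.concatenate([[np.sqrt(1.0 + np.dot(v, v))], v])

x, y = lift(np.array([0.3, 0.1])), lift(np.array([-0.5, 0.8]))
print(lorentz_distance(x, y))
```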


3.2. Poincaré Disk Model


  • The Poincaré disk is formed by projecting each point of $\large \mathbb{L}^{n}$ onto the hyperplane $\large x_{0} \, = \, 0$ (through the point $\large (-1, 0, \ldots, 0)$), yielding the open unit ball:

    $\large \mathbb{B}^{n} \, = \, \{x \, \in \, \mathbb{R}^{n} \,: \, ||x|| < 1\}$

    Distance : $\large d_{\mathbb{B}}(x, y) \, = \, \operatorname{arcosh}\left(1 \, + \, 2\,\frac{||x - y||^{2}}{(1 - ||x||^{2})(1 - ||y||^{2})}\right)$

  • The geodesics (the model’s “straight lines”) appear as circular arcs orthogonal to the boundary of the disk, plus the diameters.


  • The figure above shows that hyperbolic distances grow dramatically toward the boundary, while the corresponding Euclidean distances stay constant.

  • This means that the model is suitable for representing the exponentially growing space of the hyperboloid with respect to the distance from the origin.

  • Hence, the Poincaré disk is well suited to modeling hierarchical trees, where the number of nodes grows exponentially with depth.

  • Another advantage is that the metric of the Poincaré manifold, $\large g^{\mathbb{B}}$, is conformal to that of Euclidean space.

    • With the conformal factor $\large \lambda_{x} \, = \, \frac{2}{1-||x||^{2}}$, the metric is $\large g^{\mathbb{B}}_{x} \, = \, \lambda_{x}^{2} g^{\mathbb{E}}$.

    • Note that the conformal factor also indicates that distances in the Poincaré model blow up as $\large ||x||$ approaches the boundary.

  • This conformality between the Poincaré manifold and Euclidean space allows for easy comparison between Euclidean and hyperbolic geometry in terms of angles.

  • However, using the Poincaré model as a framework for hyperbolic neural networks is numerically unstable and inefficient due to its more complex distance function.


3.3. Klein Model (Beltrami-Klein Model)


  • The Klein model is also given by a projection of the Lorentz hyperboloid, but onto the hyperplane $\large x_{0} \, = \, 1$ rather than $\large x_{0} \, = \, 0$:

    $\large \mathbb{K}^{n} \, = \, \{x \, \in \, \mathbb{R}^{n} \,: \, ||x|| < 1\}$

    Distance : $\large d_{\mathbb{K}}(x, y) \, = \, \operatorname{arcosh}\left(\frac{1 \, - \, <x, y>}{\sqrt{(1 - ||x||^{2})(1 - ||y||^{2})}}\right)$


3.4. Isometric Equivalence between Models


  • Even though the three models provide different representations of hyperbolic geometry, they are all isometrically equivalent: between any two models there is a mapping that preserves geometric properties.

  • Lorentz $\large \rightarrow$ Poincaré : $\large p_{\mathbb{L} \rightarrow \mathbb{B}}(x_{0}, x_{1}, \ldots, x_{n}) \, = \, \frac{(x_{1}, \ldots, x_{n})}{x_{0} + 1}$

  • Lorentz $\large \rightarrow$ Klein : $\large p_{\mathbb{L} \rightarrow \mathbb{K}}(x_{0}, x_{1}, \ldots, x_{n}) \, = \, \frac{(x_{1}, \ldots, x_{n})}{x_{0}}$
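A small sketch of these projections, with a numerical check that the Lorentz and Poincaré distances of the mapped points agree (the `lift` helper is again an illustrative assumption):

```python
import numpy as np

def lift(v):                       # Euclidean vector -> hyperboloid point
    return np.concatenate([[np.sqrt(1.0 + v @ v)], v])

def to_poincare(x):                # Lorentz -> Poincaré: divide by (x_0 + 1)
    return x[1:] / (x[0] + 1.0)

def to_klein(x):                   # Lorentz -> Klein: divide by x_0
    return x[1:] / x[0]

def d_lorentz(x, y):               # arcosh(-<x, y>_L)
    return np.arccosh(x[0] * y[0] - x[1:] @ y[1:])

def d_poincare(u, v):
    return np.arccosh(1 + 2 * ((u - v) @ (u - v))
                      / ((1 - u @ u) * (1 - v @ v)))

x, y = lift(np.array([0.2, -0.4])), lift(np.array([0.7, 0.1]))
print(d_lorentz(x, y))                              # ~0.687
print(d_poincare(to_poincare(x), to_poincare(y)))   # same value
```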


4. Generalization of Euclidean Neural Networks to Hyperbolic Spaces


  • As current neural networks are built upon Euclidean geometry, the mathematical operations they require are likewise valid only in Euclidean space.

  • Therefore, in order to use hyperbolic space as an alternative framework for neural networks, all of these operations need to be re-defined in a way that is consistent with the given hyperbolic geometry.

   Euclidean Convolutional Neural Networks

  

   Hyperbolic Graph Convolutional Neural Networks

  

  • There are, broadly, two strategies for implementing hyperbolic neural networks: transitioning between the hyperbolic manifold and its tangent space via the exp/log maps, and re-defining the arithmetic operations directly in a gyrovector space using the Möbius transformations.


4.1. Mapping between Hyperbolic Manifold and Tangent Spaces


  • The most straightforward approach is simply to transfer the data naturally embedded in hyperbolic space into Euclidean space using $\large Log_{p}(x)$, perform all operations there, and then map the output back to hyperbolic space using $\large Exp_{p}(x)$.

  • However, the tangent approximation via exp/log maps introduces an approximation error.

  • Applying Euclidean operations to approximated data that is not intrinsically Euclidean can negatively affect the learning process, e.g., through information loss or distortion.

  • Hence, generalizing the operations to fit hyperbolic geometry natively is crucial for implementing deep learning approaches on hyperbolic data. For concreteness, a sketch of the tangent-space pattern follows.
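Here is a minimal sketch of the log-operate-exp pattern on the unit Poincaré ball with the base point fixed at the origin, using the standard closed forms $\large Exp_{0}(v) = \tanh(||v||)\frac{v}{||v||}$ and $\large Log_{0}(x) = \operatorname{artanh}(||x||)\frac{x}{||x||}$ (the linear layer and its parameters are illustrative assumptions):

```python
import numpy as np

def exp0(v):  # tangent space at the origin -> Poincaré ball
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def log0(x):  # Poincaré ball -> tangent space at the origin
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

# Tangent-space pattern: pull the hyperbolic point to the tangent space,
# apply an ordinary Euclidean linear layer, and push the result back.
def tangent_linear(W, b, x):
    return exp0(W @ log0(x) + b)

rng = np.random.default_rng(0)
x = exp0(rng.normal(size=2))                 # a point inside the unit ball
W, b = rng.normal(size=(2, 2)), rng.normal(size=2)
print(tangent_linear(W, b, x))               # output stays inside the ball
```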


4.2. Möbius Transformation for Gyrovector Space


  • Before delving into the Möbius transformation, we first need to define the gyrovector space model where all the mathematical operations are performed.


4.2.1 Gyrovector Space Model


  • In Euclidean neural networks, data is mostly given as sets of vectors (matrices), and all operations are likewise defined on this vector space.

  • Analogous to this, we need to establish a gyrovector space, which is simply a generalization of the vector space.

  • Just as vectors are defined by their origin, direction, and length, gyrovectors can be defined by the same properties.

  • What remains, then, is to define the basic operations ⊕ and ⊗, the generalizations of addition (+) and multiplication (×); this is done using the Möbius transformations.


4.2.2 Möbius Addition and Scalar Multiplication for Gyrovector Space


  • In order to generalize mathematical approaches to gyrovector space, we need to define the basic arithmetic operations, such as addition and scalar multiplication, so that they are valid in gyrovector space.

  • For the Poincaré disk, addition and scalar multiplication are defined as

    • Möbius Addition : $\large x \oplus y \, = \, \frac{(1 \, + \, 2<x,y> \, + \, ||y||^{2})\,x \, + \, (1 \, - \, ||x||^{2})\,y}{1 \, + \, 2<x,y> \, + \, ||x||^{2}||y||^{2}}$

    • Möbius Scalar (and Matrix) Multiplication : $\large r \otimes x \, = \, \tanh\left(r \operatorname{artanh}(||x||)\right)\frac{x}{||x||}$

      • Each operation can be decomposed as

         $\large r \otimes x \, = \, Exp_{0}(r\,Log_{0}(x))$

        1. projecting the gyrovector $\large x$ onto the tangent space (at 0) via $\large Log_{0}(x)$,

        2. multiplying the projected $\large x$ by the scalar value $\large r$,

        3. mapping the result back to the manifold (the Poincaré manifold).

      • The same construction extends to matrix-vector multiplication of the input $\large x$ by a matrix $\large M$: $\large M \otimes x \, = \, Exp_{0}(M\,Log_{0}(x))$.

  • Based on the mappings between the manifold and its tangent space, $\large Exp_{0}(x)$ and $\large Log_{0}(x)$ for the Poincaré model can be derived in closed form, as used in the sketch below.
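A minimal sketch of these gyrovector operations on the unit Poincaré ball (curvature $-1$); the final two lines check that $\large r \otimes x$ indeed equals $\large Exp_{0}(r\,Log_{0}(x))$:

```python
import numpy as np

def mobius_add(x, y):
    # Möbius addition on the unit Poincaré ball.
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)

def mobius_scalar_mul(r, x):
    # r (x) x = tanh(r * artanh(||x||)) * x / ||x||
    n = np.linalg.norm(x)
    return np.tanh(r * np.arctanh(n)) * x / n if n > 0 else x

def exp0(v):
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def log0(x):
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

x = np.array([0.3, -0.2])
print(mobius_scalar_mul(2.0, x))       # direct formula
print(exp0(2.0 * log0(x)))             # decomposition: identical result
```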


4.2.3 Mean in Hyperbolic Space


  • Due to the differing geometric properties, simply averaging as in Euclidean space may not fit the hyperbolic manifold well.

  • There are two main options: computing the Euclidean mean in the tangent space, or using the Einstein midpoint.


1. Weighted average on tangent space


  • $\large \mu_{i} \, = \, Exp_{x_{i}}(\sum_{j \in N(i)} \, w_{ij}\,Log_{x_{i}}(x_{j}))$

  • $\large N(i)$ refers to the neighborhood of node $\large i$, and $\large w_{ij}$ represents the attention weight between nodes $\large i$ and $\large j$ (see the sketch below).
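A minimal sketch of this tangent-space aggregation, simplified to use the tangent space at the origin as the base point instead of $\large x_{i}$ (an illustrative simplification; the example points and weights are assumptions):

```python
import numpy as np

def exp0(v):  # origin exponential map on the unit Poincaré ball
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def log0(x):  # origin logarithmic map
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

# Weighted mean of neighbors: average in the tangent space, map back.
def tangent_mean(neighbors, weights):
    v = sum(w * log0(x) for w, x in zip(weights, neighbors))
    return exp0(v)

neighbors = [np.array([0.1, 0.4]), np.array([-0.3, 0.2])]
print(tangent_mean(neighbors, [0.5, 0.5]))
```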


2. Einstein Midpoint


  • This is an extension of the mean to hyperbolic space, defined for both the Klein and Poincaré models (projecting into other hyperbolic models is also possible, as long as they are isometric).

  • Klein Model : $\large m \, = \, \frac{\sum_{i} \alpha_{i} \gamma_{i} x_{i}}{\sum_{i} \alpha_{i} \gamma_{i}}$

    • $\large \gamma_{i} \, = \, \frac{1}{\sqrt{1 - ||x_{i}||^{2}}}$ : the Lorentz factor (it grows as $\large x_{i}$ moves toward the boundary)


  • Poincaré Model

    • Poincaré points are first mapped to the Klein model via $\large x_{\mathbb{K}} \, = \, \frac{2\,x_{\mathbb{B}}}{1 + ||x_{\mathbb{B}}||^{2}}$, averaged using the formula above, and then mapped back via $\large x_{\mathbb{B}} \, = \, \frac{x_{\mathbb{K}}}{1 + \sqrt{1 - ||x_{\mathbb{K}}||^{2}}}$.

    • $\large \alpha_{i}$ : the weight for each point (see the sketch below).
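A minimal sketch of the weighted Einstein midpoint in Klein coordinates (the example points and weights are illustrative assumptions):

```python
import numpy as np

def einstein_midpoint(xs, alphas):
    # Weighted Einstein midpoint in the Klein model:
    # m = sum_i(a_i * g_i * x_i) / sum_i(a_i * g_i),
    # with Lorentz factor g_i = 1 / sqrt(1 - ||x_i||^2).
    gammas = 1.0 / np.sqrt(1.0 - np.sum(xs ** 2, axis=1))
    w = alphas * gammas
    return (w[:, None] * xs).sum(axis=0) / w.sum()

xs = np.array([[0.1, 0.5], [-0.4, 0.2], [0.7, -0.1]])  # Klein points
alphas = np.array([0.2, 0.3, 0.5])                     # weights
print(einstein_midpoint(xs, alphas))
```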




  • Using these basic mathematical operations generalized to gyrovector space, Part 2 will focus on implementing graph convolutional networks (GCNs) in hyperbolic space.