Mediocre; not recommended.

Representing network structure

The most basic graph representation is the adjacency matrix, a mathematical data structure whose entry (i, j) records whether an edge connects node i to node j.
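
For instance, a directed graph on three nodes with edges 1→2 and 2→3 has the adjacency matrix

$$ A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} $$

where the entry in row i, column j is 1 exactly when there is an edge from node i to node j.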

(:Person {name: "Thomas"})
  -[:FRIEND {since: 2016}]
  ->(:Person {name: "Elaine"})
(:User)-[:PUBLISH]->(:Tweet)<-[:RETWEETS]-(:User)

One reason for modeling it this way is that you avoid super nodes, i.e., nodes connected to large parts of the graph.

Your first steps with the Cypher query language

WITH 'Tomaz' AS first_name
WITH 'Bratanic' AS last_name, first_name
RETURN first_name + ' ' + last_name AS result

WITH 'Elon' AS first_name, 'Musk' AS last_name
WHERE first_name = 'Elon'
RETURN *
CREATE (elaine:Person {name: 'Elaine'}), (michael:Person {name: 'Michael'})
CREATE (elaine)-[f:FRIEND]->(michael)
RETURN *

MATCH (p:Person {name: 'Satish'})
RETURN p

MATCH (p)
WHERE p:Person AND p.name = 'Satish'
RETURN p

MATCH (from:Person), (to:Person)
WHERE from.name = 'Satish' AND to.name = 'Elaine'
CREATE (from)-[:FRIEND]->(to)
RETURN *
MATCH (t:Person)
WHERE t.name = 'Satish'
SET t.interest = 'Gardening',
    t.hungry = true

MATCH (e:Person)
WHERE e.name = 'Elaine'
SET e += {hungry: false, pet: 'dog'}

MATCH (n:Person)
WHERE n.name = 'Elaine'
DETACH DELETE n
CREATE CONSTRAINT IF NOT EXISTS ON (u:User) ASSERT u.id IS UNIQUE;

LOAD CSV WITH HEADERS FROM
"https://raw.githubusercontent.com/a/b/main/users.csv" AS row
WITH row
LIMIT 5
RETURN row

LOAD CSV WITH HEADERS FROM
"https://raw.githubusercontent.com/a/b/main/users.csv" AS row
MERGE (u:User {id: row.id})
ON CREATE SET u.name = row.name,
              u.username = row.username,
              u.registeredAt = datetime(row.createdAt)

CALL db.schema.visualization()

Cypher aggregations and social network analysis

MATCH (n:User)
WHERE (n)<-[:MENTIONS]-()
AND NOT EXISTS {
  MATCH (original)<-[:PUBLISH]-(n)<-[:MENTIONS]-()-[:RETWEETS]->(original)
}
RETURN count(*) AS countOfUsers

Node degree distribution

MATCH (u:User)
WITH u, size((u)-[:FOLLOWS]->()) as outDegree
RETURN apoc.agg.statistics(outDegree)

Graph Catalog and Native projection

CALL gds.graph.create(
  graphName,
  nodeProjection,
  relationshipProjection,
  optional configuration
)
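
As a concrete example, the follower network used by the algorithm calls below can be projected from User nodes and FOLLOWS relationships (a minimal sketch; the graph name 'follower-network' matches the later calls):

CALL gds.graph.create(
  'follower-network',
  'User',
  'FOLLOWS'
)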

Weakly Connected Components algorithm

CALL gds.wcc.write('follower-network', {writeProperty: 'followerWcc'})
YIELD componentCount, componentDistribution

Strongly Connected Components algorithm
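
A write-mode call mirrors the WCC example above (a sketch assuming the alpha-tier procedure name from GDS 1.x; the property name is illustrative):

CALL gds.alpha.scc.write('follower-network', {writeProperty: 'followerScc'})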

Local clustering coefficient
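
The local clustering coefficient algorithm expects an undirected projection, so a sketch first projects FOLLOWS with UNDIRECTED orientation (the graph and property names are illustrative):

CALL gds.graph.create('mutual-network', 'User',
  {FOLLOWS: {orientation: 'UNDIRECTED'}});

CALL gds.localClusteringCoefficient.write('mutual-network',
  {writeProperty: 'clusteringCoefficient'})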

Projecting monopartite networks with Cypher Projection

Because it is slower than native projection, Cypher projection is not recommended for larger graphs or for production use.
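
With a Cypher projection, the node and relationship sets are described by two Cypher statements instead of labels and types (a sketch; the graph name is illustrative):

CALL gds.graph.create.cypher(
  'cypher-network',
  'MATCH (u:User) RETURN id(u) AS id',
  'MATCH (u1:User)-[:FOLLOWS]->(u2:User)
   RETURN id(u1) AS source, id(u2) AS target'
)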

Inferring co-occurrence networks based on bipartite networks

The Jaccard similarity coefficient ranges from 0 to 1. When two sets have no members in common, the coefficient equals 0. In a graph context, a typical input to the Jaccard similarity algorithm is a bipartite network consisting of two types or sets of nodes. The idea behind using the Jaccard similarity algorithm is to project a monopartite graph based on the bipartite input graph.
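
For two sets A and B, the coefficient is

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

GDS implements this projection as the Node Similarity algorithm, whose default metric is Jaccard; a sketch (the projected graph name, cutoff, and output names are illustrative):

CALL gds.nodeSimilarity.write('bipartite-network', {
  similarityCutoff: 0.5,
  writeRelationshipType: 'CO_OCCUR',
  writeProperty: 'score'
})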

Constructing a nearest neighbor similarity network
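
A sketch of the kNN call, which links each node to its most similar neighbors by comparing a node property (the projection name and the 'embedding' property are illustrative; recent GDS versions expose the procedure as gds.knn, earlier ones as gds.beta.knn):

CALL gds.knn.write('user-network', {
  nodeProperties: ['embedding'],
  topK: 5,
  writeRelationshipType: 'SIMILAR',
  writeProperty: 'score'
})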

Node embeddings and classification

Node embedding models fall under the dimensionality reduction category.


One of the simplest and most widely used node embedding models is node2vec.

Node2vec algorithm
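
A sketch of running node2vec over the follower network and writing the embeddings back as a node property (parameter values are illustrative; newer GDS versions drop the beta prefix):

CALL gds.beta.node2vec.write('follower-network', {
  embeddingDimension: 64,
  walkLength: 80,
  writeProperty: 'node2vec'
})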



Remember, when a feature contains the same or comparable
information as the output variable but is unavailable
when making predictions, you have introduced
data leakage into the workflow.
The most obvious example is the network distance
between nodes in the graph. If the network features
and training examples are calculated on the same set
of relationships, then the model would simply learn that
relationships exist between nodes that are only one hop away.
However, none of the pairs of nodes without a link in the
network would be classified as likely to form a connection,
as none are one hop away. Even node embeddings based on the
homophily principle could be problematic if you don't
perform a proper dataset split.

Because negative examples (unconnected pairs of nodes) vastly outnumber positive ones, it is common to subsample the negative examples and use about the same number of positive and negative samples in most link prediction workflows.

Network feature engineering

Remember, all the graph-based features for the train and test sets will be calculated strictly on the feature set of relationships to prevent any data leakage.

  1. The network distance is calculated by finding the shortest path between the pair of nodes and then counting the number of relationships in the shortest path.
  2. Another popular metric used in link prediction is the so-called preferential attachment.
    • Preferential attachment is an underlying organizing principle occurring in real-world networks where nodes with a higher number of relationships are more likely to make new relationships.
    • In the social network example, people with more friends are more likely to make new connections.
  3. The next metric you will calculate as a link prediction feature is the common neighbors metric.
    • The intuition behind the common neighbor metric is simple.
    • The more common neighbors two nodes have, the higher the chance of a link forming in the future.
  4. The idea behind the Adamic-Adar index is that the smaller the degree of the common neighbors of a pair of nodes, the more likely the pair is to form a connection in the future.
  5. The last link prediction feature you will calculate is the clustering coefficient of common neighbors (a Cypher sketch for several of these features follows this list).
    • A clustering coefficient measures the connectedness of the neighbors of a particular node.
    • The value ranges from zero to one.
    • A value of zero indicates that the neighboring nodes have no connections with each other.
    • On the other hand, the value of 1 indicates that the network of neighbors forms a complete graph where all the neighbors are connected.
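
Several of these features can be computed directly in Cypher for a candidate pair; a sketch using the alpha-tier link prediction functions from GDS 1.x (the user ids are illustrative):

MATCH (a:User {id: '1'}), (b:User {id: '2'})
OPTIONAL MATCH p = shortestPath((a)-[:FOLLOWS*]-(b))
RETURN length(p) AS networkDistance,
       gds.alpha.linkprediction.preferentialAttachment(a, b) AS preferentialAttachment,
       gds.alpha.linkprediction.commonNeighbors(a, b) AS commonNeighbors,
       gds.alpha.linkprediction.adamicAdar(a, b) AS adamicAdar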

Knowledge graph completion

Constructing a graph using NLP techniques

Large Language Models, such as OpenAI's GPT-4 (2023),
have been a game-changing introduction to the world of
natural language processing. Their ability to understand
human-like text presents remarkable opportunities in
the field of information extraction. Trained on diverse
internet text, they can sift through massive amounts
of data, understand the context, and pull out relevant details,
making information extraction pipelines more accessible.
Their ability to generalize allows them to be applied to a
variety of domains, from academic research to business analytics,
extracting valuable insights from unstructured data.

Graph embeddings

The core idea of DeepWalk is an analogy between random walks on a graph and natural language: each node corresponds to a word, and each sampled walk on the graph is treated as a sentence. The authors of DeepWalk observed that the distribution of node frequencies in short random walks closely resembles the distribution of word frequencies in natural language. Based on this observation, DeepWalk borrows word embedding techniques from natural language processing to learn node embeddings. Specifically, DeepWalk adopts the SkipGram word embedding model.
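
For reference, given a sampled walk $(v_1, \dots, v_L)$, a window size $w$, and an embedding map $\Phi$, SkipGram maximizes the log-probability of each node's context within the walk:

$$ \max_{\Phi} \sum_{i=1}^{L} \sum_{\substack{-w \le j \le w \\ j \ne 0}} \log \Pr\left( v_{i+j} \mid \Phi(v_i) \right) $$
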
Because AROPE only needs to compute a single eigendecomposition of the original sparse adjacency matrix A, its time complexity is linear in the size of the graph, so it scales to large graphs. Moreover, when preserving proximities of different orders, AROPE only needs to compute the corresponding transformation, without recomputing the eigendecomposition or singular value decomposition, so it can switch between proximity orders quickly and preserve different-order proximities efficiently and effectively.

Random-walk-based graph embedding methods are therefore equivalent to constructing a special similarity matrix and factorizing it. On the one hand, because the random-walk process neither explicitly constructs the similarity matrix nor computes its factorization, random-walk methods are usually computationally efficient; on the other hand, because the number of random walks is finite in practice, random-walk methods effectively introduce an approximation into the optimization, whereas direct matrix factorization can optimize the objective function more exactly.

Graph neural networks

In short, the convolution theorem states that the Fourier transform of a convolution of two functions is the product of their Fourier transforms. Using spectral graph theory, the theorem can be extended to graph data: the convolution of two graph signals is the product of their graph Fourier transforms. A graph convolution is therefore equivalent to the following process: first use the graph Fourier transform to move the node features (i.e., a set of graph signals) from the spatial domain to the spectral domain, multiply them there with a learnable filter, and then transform the result back to the spatial domain with the inverse graph Fourier transform, which yields the processed graph signal. The learnable parameters of a spectral GNN are tied to the spectral filter. As with convolutional neural networks on images, every step of this process is differentiable, so spectral GNNs support end-to-end learning on graph data.

Spectral graph convolution is defined in three steps:
(1) transform the two sets of graph signals from the spatial domain to the spectral domain via the graph Fourier transform;
(2) compute their element-wise product in the spectral domain;
(3) transform the processed graph signal back to the spatial domain via the inverse graph Fourier transform.
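
In symbols, with $U$ the eigenvector matrix of the graph Laplacian $L = U \Lambda U^{\top}$, the convolution of two graph signals $x$ and $y$ is

$$ x *_{G} y = U \left( (U^{\top} x) \odot (U^{\top} y) \right), $$

where $U^{\top} x$ is the graph Fourier transform, $\odot$ is the element-wise product, and multiplication by $U$ is the inverse transform.
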
Like the Chebyshev graph convolutional network, the graph convolutional network (GCN) does not need to explicitly compute the eigendecomposition of the Laplacian matrix, and it also avoids the recursion that the Chebyshev network requires, so GCN is computationally efficient. From its formula, the computational complexity of GCN is linear in the number of edges. Moreover, because GCN can be written in a simple matrix form, it is also very convenient to implement.
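
That matrix form of a single GCN layer, with $\tilde{A} = A + I$, $\tilde{D}$ the degree matrix of $\tilde{A}$, $H^{(l)}$ the node representations, and $W^{(l)}$ the learnable weights, is

$$ H^{(l+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right) $$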

In every graph convolution layer, each node's representation depends only on its neighbors, so GCN has strong spatial locality. In fact, GCN can also be understood from the spatial perspective, which effectively connects spectral GNNs and spatial GNNs. Because all operations of a message-passing GNN are continuously differentiable, message-passing GNNs can be trained end to end. Like the spatial GNNs introduced earlier, message-passing GNNs update node representations by defining functions over the graph's neighborhood structure and aggregating neighbor information.
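
One message-passing layer can be summarized as a message step followed by an update step, with $N(v)$ the neighbors of node $v$ and $e_{uv}$ the edge feature:

$$ m_{v}^{(l+1)} = \sum_{u \in N(v)} M_{l}\left( h_{v}^{(l)}, h_{u}^{(l)}, e_{uv} \right), \qquad h_{v}^{(l+1)} = U_{l}\left( h_{v}^{(l)}, m_{v}^{(l+1)} \right) $$
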
The core idea of hierarchical graph pooling models is to progressively "coarsen" the graph, i.e., to reduce its size in an orderly way while preserving its hierarchical structural information, until a graph representation of the desired granularity is learned.

Theoretical analysis of graph representation learning

Low-frequency and high-frequency components in the spectral domain may also reflect, respectively, the local and global structural information of the graph in the spatial domain. For this reason, GNNs that only perform low-pass filtering may be limited in many graph tasks, which has prompted many researchers to explore, from the graph signal processing perspective, how to design GNNs that go beyond low-pass filters.

When every node can be uniquely identified, the message-passing mechanism of GNNs can propagate information more effectively. For real-world graphs, however, unique node identification often cannot be guaranteed; in the WL test of graph isomorphism, for example, it is usually assumed that nodes carry no features at all. In that case, the universal approximation property of GNNs no longer holds. Moreover, manually assigning unique identifiers such as node IDs as features is usually infeasible, because it breaks the permutation equivariance of graphs.

The difference between DDGMs and hierarchical VAEs lies in the definition of the variational posterior and the dimensionality of the latent variables, but their overall construction is essentially the same. We can already guess the learning objective: yes, it is the ELBO!

Specifically, a VAE introduces a prior distribution over the latent variable space to ensure that the model can generate diverse samples. This prior is usually a Gaussian or a Gaussian mixture. During training, the VAE tries to maximize the log-likelihood of the reconstructed data while minimizing the difference between the learned latent distribution and the prior. That difference is measured with the KL divergence, a measure of the discrepancy between two distributions.
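
Written out, the training objective (the ELBO) combines exactly these two terms:

$$ \mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_{\phi}(z \mid x)}\left[ \log p_{\theta}(x \mid z) \right] - D_{\mathrm{KL}}\left( q_{\phi}(z \mid x) \,\|\, p(z) \right) $$
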
The Transformer model (2017): it uses the self-attention mechanism to capture dependencies between different positions in the input sequence, which lets it handle long text sequences better. The Transformer achieved excellent results in tasks such as machine translation and text generation, and it is an important milestone for pretraining techniques.

Large-scale pretrained models (2018 to the present):
for example, BERT and GPT. These models are trained on larger datasets
and use more complex network architectures and training strategies
to improve performance and generalization. They have achieved
remarkable results in natural language processing and have become
an important direction of current NLP research.