« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

ABSTRACT

Weakness of GCN

capability of the state-of-the-art GCNs in fusing node features and topological structures is distant from optimal or even satisfactory

⇒ GCN은 topological structures과 node features 사이의 상관 정보를 adaptively 학습한다

Solution

GCN 성능은 유지하면서 node features와 topological structures을 같이 고려하는 AM-GCN

Adaptive Multi-channel Graph Convolutional Networks for semi-supervised classification

The cental idea is that we extract the specific and common embeddings from node features, topological structures, and their combinations simultaneously, and use the attention mechanism to learn adaptive importance weights of the embeddings.

중심 아이디어는 동시에 node feature와 topological structures 그리고 그들의 combinations로부터 명확하고 공통된 Embedding 값을 뽑아낸다는 것이다. 그리고 Attention mechanism을 사용해서 Embedding들의 Adaptive Importance Wieghts를 학습한다.

⇒ AM-GCN은 node features와 topological structures로부터 가장 correlated information을 뽑아내고 정확도를 향상시키게 된다.

1. INTRODUCTION

Typical GCN

usually follow a message-passing manner

A key step : feature aggregation - a node aggregates feature information from its topological neighbors in each convolutional layer

Q) In this way, feature information propagates over network topology to node embedding, and then node embedding learned as such is used in classification tasks.

Q) The whole process is supervised partially by the node labels.

우선 GCN에서도 fusion strategy를 사용한다. 그리고 이 fusion process는 end-to-end learning framework에 의해 supervised된다.

그러나,

최근 GCN 연구들에서 node feature와 topological structure을 fusing하는 것의 한계를 볼 수 있다.

1) Li et al.[15] shows that GCNs actually perform the Laplacian smoothing on node features, and make the node embedding in the whole network gradually converge.(모이다)

2) Nt and Maehara[20] and Wu et al.[30] prove that topological structure play the role of low-pass filtering on node features when the feature information propagates over network topological structure.

3) Gao et al. [8] design a Conditional Random Field (CRF) layer in GCN to explicitly preserve connectivity between nodes.

Hidden Markov Model (HMM)
Markov Model은 어떠한 날씨, 주식 가격 등과 같은 어떠한 현상의 변화를 확률 모델로 표현한 것이다. Hidden Markov Model은 이러한 Markov model에 은닉된 state와 직접적으로 확인 가능한 observation을 추가하여 확장한 것이다.
HMM은 observation을 이용하여 간접적으로 은닉된 state를 추론하기 위한 문제를 풀기 위해 사용된다.
위 그림은 은닉된 state와 그에 따른 observation의 개념을 나타낸다.
HMM을 이용해 우리가 풀고자 하는 문제는 관측 가능한 것은 오직 $y_t$ 뿐이며,
$y_t$ 는 $q_t$ 에 종속적으로 발생한다고 할 때, $y_t$ 의 sequence를 통해 $q_t$ 의 sequence를 추
론하는 것이다.
→ 현재는 필기체 인식, 동작 인식, 품사 태깅 등과 같이 이전 시간의 데이터에 영향을
받는 순차 데이터에서 패턴을 인식하기 위해 이용되고 있다
1. HMM의 구조
HMM은 $<Q,Y,\pi,T,E>$ 라는 tuple로 정의. 각각의 요소가 의미하는 내용은 다음과 같다.
- $Q=\{q_1,q_2,...,q_N\}$ : 은닉된 state들의 집합
- $Y=\{y_1,y_2,...,y_M\}$ : 은닉된 state에서 발생할 수 있는 observation들의 집합
- $\pi$ : $\mathbb{R}^N$ - 초기 state가 $q_i$ 일 확률을 나타내는 initial probability $p(q_i)$ 의 집합
- $A$ : $\mathbb{R}^{N\times N}$ - $q_i$ 에서 $q_j$ 로 이동할 확률을 나타내는 transition probability
  $p(q_j|q_i)$ 의 집합
- $B$ : $\mathbb{R}^{N\times M}$ - $q_i$ 에서 $y_j$ 가 발생할 확률을 나타내는 emission probability
  $p(y_j|q_i)$ 의 집합
그래프에서 $s_i$ 는 은닉된 state를 나타내며, $Q$ 에 있는 원소 중 한
가지 값을 갖는다. $o_j$ 는 $s_i$ 에서 발생한 observation을 의미하며,
Y의 원소 중 한 가지 값을 갖는다.

예시)

지출내역을 통해 과거의 행동을 유추하는 문제를 HMM으로 정
의할 수 있다. 오늘 하려는 행동은 무엇을 했는지에 영향을 받으
며, 과거에 했던 행동은 과거로 돌아가지 않는 한 다시는 관측할
수 없기 때문에 과거의 행동은 은닉된 state에 해당한다.
이 예제에서 $Q=\{study,friend,game\}$
지출 내역은 카드명세서를 통해 과거의 내역이더라도 직접적으
로 관찰할 수 있기 때문에 observation에 해당됨. 이 예제에서
$Y=\{1,2,3,4\}$ 이며, 값이 클 수록 돈을 많이 사용했다는 것을
의미.
→ HMM은 이러한 확률적 사실들을 바탕으로 observation을 통
해 은닉된 state를 추론.
2. HMM의 동작

Conditional Random Field (CRF)
하나의 sequence를 보고 판단하는 것이 아니라 이전 혹은 이후를 참고하여 지금 상태를 결정하는 것 ⇒ Conditional Random Field (CRF)

What information do GCNs really learn and fuse from topological structures and node features?

⇒ fundamental question인데 그 이유는 GCN이 end-to-end learning framework이기 때문이다.

GCN cannot adequately fuse node features and topological structures to extract the most correlated information.

*** 이 논문의 목적은 GCN의 성능은 그래도 유지하면서 동시에 topological structures와 node features를 fusing하는 능력을 향상 시키는 것이다 !!

Central Idea

we learn the node embedding based on node features, topological structures, and their combinations simultaneously !!

Technically,

in order to fully exploit the information in feature space, we derive k-nearest neighbor graph generated from node features as the feature structural graph.

1) With the feature graph and the topology graph, we propagate node features over both topology space and feature space, so as to extract two specific embeddings in these two spaces with two specific convolution modules.

2) Considering the common characteristics between two spaces, we disign a common convolution module with a parameter sharing strategy to extract the common embedding shared by them.

3) utilize the attention mechanism to automatically learn the importance weights for different embeddings, so as to adaptively fuse them.

4) design the consistency and disparity constraints to ensure the consistency and disparity of the learned embeddings.

2. FUSION CAPABILITY OF GCNS : AN EXPERIMENTAL INVESTIGATION

GCN이 topological structure와 node feature로부터 정보를 adaptively extract하지 못한다는 것을

보여주는 실험

The main idea is that we will clearly establish the high correlation between node label with network topology and node features, respectively.

⇒ GCN의 좋은 fusion 능력이란 correlated information을 adaptively extract하는 것이다.

2.1 Case 1: Random Topology and Correlated Node Features

For each class randomly select 20 nodes for training and another 200 nodes for test

Train : GCN and MLP (baseline)

The node features are highly correlated with the node labels, MLP shows excellent performance

GCN extracts information from both node features and topological structures but cannot adaptively fuse them to avoid the interference from topological structures.
→ Cannot match the high performace of MLP

2.2 Case 2: Correlated Topology and Random Node Features

Generate another network with 900 nodes

The node features, each of 50 dimensions, are randomly generated

For the topological structure, employ the Stochastic Blockmodel(SBM) to split nodes into 3 communities

Within each community, the probability of building an edge is set to 0.03, and the probability of building an edge between nodes in different communities is set to 0.0015.

In this data set, the node labels are highly correlated with the topological structures, but not the node features

Stochastic Blockmodel (SBM)
A random graph model with planted clusters.
🤖Deep Learning 기초

For each class randomly select 20 nodes for training and another 200 nodes for test

Train : GCN and DeepWalk

DeepWalk performs well because it models network topological structures throughly

GCN extracts information from both the node features and the topological structures, but cannot adaptively fuse them to avoid the inference from node features.
→ Cannot match the high performance of DeepWalk

Summary

The current fusion mechanism of GCN is distant from optimal

Even the correlation between node label with network topology or node features is very high, the current GCN cannot make full use of the supervision by node label to adaptively extract the most correlated information

Rethink the current mechanism of GCN because it is hard to know whether the topology or the node features are more correlated with the final task

3. AM-GCN : THE PROPOSED MODEL

Problem Setting

semi-supervised node classification

in an attributed graph $G=(A,X)$ ,

where $A \in\mathbb{R}^{n\times n}$ is the symmetric adjacency matrix with n nodes,

$X\in \mathbb{R}^{n \times d}$ is the node feature matrix,

$d$ is the dimension of node features.

Suppose each node belongs to one out of $C\$ classes

(위의 그림은 overall framework of AM-GCN)

Key Idea

Node features to propagate not only in topology space and in feature space

The most correlated information with node label should be extracted from both of these two spaces

Construct a feature graph based on node features $X$

$X$ propagate over both of feature graph and topology graph to learn two specific embeddings $Z_F$ and $Z_T$

Design common convolution module with parameter sharing strategy to learn the common embedding $Z_{CF}$ and $Z_{CT}$

Consistency constraint $\mathcal{L}_c$ is employed to enhance the "common" property of $Z_{CF}$ and $Z_{CT}$

Disparity constraint $\mathcal{L}_d$ is to ensure the independence between $Z_F$ and $Z_{CF},\$ $Z_T$ and $Z_{CT}$

Utilizes an Attention Mechanism to adaptively fuse these embeddings and extract the most correlated information $Z$ for the final classification task

3.1 Specific Convolution Module

In order to capture the underlying structure of nodes

in Feature Space

construct a $k$ -nearest neighbor( $kNN$ ) graph $G_f=(A_f, X)$ based on node feature matrix $X$

$A_f$ : The adjacency matrix of $kNN$ graph

$n$ 개의 node들 사이에서 similarity matrix $S\in \mathbb{R}^{n\times n}$ 를 계산

➕ $S$ 를 연산하는데 많은 방법들이 있지만, 가장 많이 쓰이는 2가지

나열. ( $x_i$ and $x_j$ are feature vectors of nodes $i$ and $j$ )

1) Cosine Similarity (논문에서는 이거 사용함)

It uses the cosine value of the angle between two vectors to measure the similarity

$S_{ij}=\frac{x_i\cdot x_j}{|x_i||x_j|}$

2) Heat Kernel

$t$ is the time parameter in heat conduction equation and we set $t=2$

$S_{ij}=e^{-\frac{|||x_i-x_j||^2}{t}}$

결론

Choose the Consine Similarity to obtain the similarity matrix $S$

Choose top $k$ similar node pairs for each node to set edges and
finally get the adjacency matrix $A_f$

With the input graph ( $A_f,X$ ) in feature graph, the $l$ -th layer
output $Z_f^{(l)}$ can be represented as :
$Z_f^{(l)}=ReLU(\tilde{D}^{-\frac{1}{2}}\tilde{A_f}\tilde{D}^{-\frac{1}{2}}Z_f^{(l-1)}W_f^{(l)})$
$W_f^{(l)}$ : weight matrix of the $l$ -th layer in $GCN$ ,
$ReLU$ : Relu activation function,
The initial $Z_f^{(0)}=X$ ,
$\tilde{A}_f=A_f+I_f$ ,
$\tilde{D}_f$ : the diagonal degree matrix of $\tilde{A}_f$ ,
$Z_F$ : the last layer output embedding

이 과정을 통해서, we can learn the node embedding which captures the specific information $Z_F$  in feature space

As for the topology space,

we have the original input graph $G_t=(A_t,X_t)$

where $A_t=A$ and $X_t=X$

→ Then, the learned output embedding $Z_T$  based on topology graph can be calculated in the same way as in feature space.

3.2 Common Convolution Module

Not only extract the node specific embedding in these two spaces, but also to extract the common information shared by the two spaces.

⇒ common한 것을 찾는 이유가 정보의 어느 부분이 가장 correlated한지 결정하는 task를 하는데

유용하다. 그리고 common information을 찾기 위해서 Common-GCN을 디자인함.

First,

1) Common-GCN to extract the node embedding $Z_{ct}^{(l)}$ from topology graph $(A_t,X)$ as follows

$Z_{ct}^{(l)}=ReLU(\tilde{D_t}^{-\frac{1}{2}}\tilde{A_t}\tilde{D_t}^{-\frac{1}{2}}Z_{ct}^{(l-1)}W_c^{(l)})$ (4)

where $W_c^{(l)}$ is the $l$ -th layer weight matrix of $Common-GCN$ and $Z_{ct}^{(l-1)}$ is the node embedding in the $(l-1)$ th layer and $Z_{ct}^{(0)}=X$ .

2) When utilizing $Common-GCN$ to learn the node embedding from feature graph $(A_f,X)$ ,

in order to extract the shared information, we share the same weight matrix $W_c^{(l)}$ for every layer of $Common-GCN$ as follows:

$Z_{cf}^{(l)}=ReLU(\tilde{D}_f^{-\frac{1}{2}}\tilde{A}_f\tilde{D}_f^{-\frac{1}{2}}Z_{cf}^{(l-1)}W_c^{(l)})$

where $Z_{cf}^{(l)}$ is the $l$ -layer output embedding and $Z_{cf}^{(0)}=X$ .

The shared weight matrix can filter out the shared characteristics from two spaces.

⇒ 이 부분 이해안됨 !!

According to different input graphs,

we can get two output embedding $Z_{CT}$ and $Z_{CF}$ and the common embedding $Z_C$ of the two spaces is :

$Z_C=(Z_{CT}+Z_{CF})/2$

3.3 Attention Mechanism

Now we have two specific embeddings $Z_T, Z_F$ and one common embedding $Z_C$

Considering the node label can be correlated with one of them or even their combinations,

we use the attention mechanism $att(Z_T,Z_C,Z_F)$ to learn their corresponding importance

$(\alpha_t,\alpha_c,\alpha_f)\in \mathbb{R}^{n\times 1}$ indicate the attention values of $n$ nodes with embeddings $Z_T,Z_C,Z_F$

Here we focus on node $i,$ where its embedding in $Z_T$ is $z_T^i\in \mathbb{R}^{1\times h}$ (i.e., the $i$ -th row of $Z_T$ )

Firstly,

transform the embedding through a nonlinear transformation, and then use one shared attention vector $q\in \mathbb{R}^{h' \times 1}$ to get attention value $\omega_T^i$ as follows:

$\omega_T^i=q^T\cdot tanh(W\cdot(z_T^i)^T+b)$

$W\in \mathbb{R}^{h'\times h}$ is the weight matrix and $b\in \mathbb{R}^{h' \times 1}$ is the bias vector.

We can get the attention values $\omega_C^i$ and $\omega_F^i$ for node $i$ in embedding matrices $Z_C$ and $Z_F$ , respectively.

Normalization

normalize the attention values $\omega_T^i,\omega_C^i,\omega_F^i$ with softmax function to get the final weight:

$\alpha_T^i=softmax(\omega_T^i)=\frac{exp(\omega_T^i)}{exp(\omega_T^i)+exp(\omega_C^i)+exp(\omega_F^i)}$

⇒ Larger $\alpha_T^i$ implies the corresponding embedding is more important !!

Similarly, $a_C^i=softmax(\omega_C^i)$ and $\alpha_F^i=softmax(\omega_F^i)$

For all the $n$ nodes, we have the learned weights $\alpha_t=[\alpha_T^i],\alpha_c=[\alpha_C^i], \alpha_f=[\alpha_F^i] \in \mathbb{R}^{n\times 1}$

$\alpha_T=diag(\alpha_t), \alpha_C=diag(\alpha_c),\alpha_F=diag(\alpha_f)$

⇒ Combine these three embeddings to obtain the final embedding $Z$ :

$Z=\alpha_T\cdot Z_T+ \alpha_C\cdot Z_C+\alpha_F\cdot Z_F$

3.4 Objective Function

3.4.1 Consistency Constraint

For the two output embeddings $Z_{CT}$ and $Z_{CF}$ of Common-GCN,

(despite the Common-GCN has the shared weight matrix)

we design a consistency constraint to further enhance their commonality.

Firstly,

we use L2-normalization to normalize the embedding matrix as $Z_{CTnor},Z_{CFnor}$

Then, the two normalized matrix can be used to capture the similarity of $n$ nodes as $S_T$ and $S_F$

$S_T=Z_{CTnor}\cdot Z_{CTnor}^T$

$S_F=Z_{CFnor}\cdot Z_{CFnor}^T$

The consistency implies that the two similarity matrices should be similar, which gives rise to following constraint :

$\mathcal{L}_c=||S_T-S_F||^2_F$

3.4.2 Disparity Constraint

Because embedding $Z_T$ and $Z_{CT}$ are learned from the same graph $G_t=(A_t,X_t)$ , to ensure they can capture different information, We employ the Hilbert-Schmidt Independence Criterion (HSIC),

a simple but effective measure of independence, to enhance the disparity of these two embeddings.

Uploaded by Notion2Tistory v1.1.0

꾸준히 써보는 공부 기록

AM-GCN 본문

AM-GCN

ABSTRACT

Weakness of GCN

Solution

Adaptive Multi-channel Graph Convolutional Networks for semi-supervised classification

1. INTRODUCTION

Typical GCN

1. HMM의 구조

2. HMM의 동작

Central Idea

Technically,

2. FUSION CAPABILITY OF GCNS : AN EXPERIMENTAL INVESTIGATION

The main idea is that we will clearly establish the high correlation between node label with network topology and node features, respectively.

2.1 Case 1: Random Topology and Correlated Node Features

2.2 Case 2: Correlated Topology and Random Node Features

Summary

3. AM-GCN : THE PROPOSED MODEL

Problem Setting

Key Idea

3.1 Specific Convolution Module

In order to capture the underlying structure of nodes

in Feature Space

1) Cosine Similarity (논문에서는 이거 사용함)

2) Heat Kernel

결론

As for the topology space,

3.2 Common Convolution Module

First,

According to different input graphs,

3.3 Attention Mechanism

Firstly,

Normalization

3.4 Objective Function

3.4.1 Consistency Constraint

Firstly,

3.4.2 Disparity Constraint

'Paper Review' 카테고리의 다른 글

티스토리툴바