ABSTRACT
Weakness of GCN
The capability of the state-of-the-art GCNs in fusing node features and topological structures is distant from optimal, or even satisfactory.
⇒ GCNs cannot adaptively learn the correlation information between topological structures and node features.
Solution
AM-GCN, which considers node features and topological structures together while keeping the performance of GCN
Adaptive Multi-channel Graph Convolutional Networks for semi-supervised classification
The central idea is that we extract the specific and common embeddings from node features, topological structures, and their combinations simultaneously, and use the attention mechanism to learn adaptive importance weights of the embeddings.
⇒ AM-GCN extracts the most correlated information from node features and topological structures, and thereby improves classification accuracy.
1. INTRODUCTION
Typical GCN
usually follow a message-passing manner
A key step : feature aggregation - a node aggregates feature information from its topological neighbors in each convolutional layer
Q) In this way, feature information propagates over network topology to node embedding, and then node embedding learned as such is used in classification tasks.
Q) The whole process is supervised partially by the node labels.
GCN itself already uses a fusion strategy, and this fusion process is supervised by the end-to-end learning framework.
However,
recent studies on GCNs reveal limitations in how node features and topological structures are fused.
1) Li et al. [15] show that GCNs actually perform Laplacian smoothing on node features, making the node embeddings across the whole network gradually converge.
2) Nt and Maehara [20] and Wu et al. [30] prove that the topological structure plays the role of a low-pass filter on node features when the feature information propagates over the network topological structure.
3) Gao et al. [8] design a Conditional Random Field (CRF) layer in GCN to explicitly preserve connectivity between nodes.
Hidden Markov Model (HMM)
A Markov model represents the changes of some phenomenon, such as weather or stock prices, as a probabilistic model. A Hidden Markov Model extends the Markov model by adding hidden states and directly observable observations.
An HMM is used to solve the problem of indirectly inferring the hidden states from the observations.
The figure above illustrates the concept of hidden states and the observations emitted from them.
The problem we want to solve with an HMM: only the observations y_t can be observed, and each y_t is generated depending on the hidden state s_t; from the sequence of y_t we infer the sequence of s_t.
→ Today, HMMs are used to recognize patterns in sequential data that depends on earlier time steps, such as handwriting recognition, gesture recognition, and part-of-speech tagging.
1. Structure of an HMM
An HMM is defined as a tuple whose elements mean the following:
- S: the set of hidden states
- Y: the set of observations that can be emitted from a hidden state
- π: the set of initial probabilities, where π_i is the probability that the initial state is s_i
- A: the set of transition probabilities, where a_ij is the probability of moving from state s_i to state s_j
In the graph, s_t denotes the hidden state at time t and takes one value from S; y_t denotes the observation emitted at s_t and takes one value from Y.
Example)
The problem of inferring past behavior from spending records can be defined as an HMM. What you do today is influenced by what you did before, and past behavior can never be observed again unless you go back in time, so past behavior corresponds to the hidden states.
In this example, the spending records correspond to the observations, because even past spending can be observed directly through the card statement; a larger observation value means that more money was spent.
→ Based on these probabilistic facts, the HMM infers the hidden states from the observations.
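To make the inference idea concrete, below is a minimal sketch of recovering the most likely hidden-state sequence with the Viterbi algorithm. It is not from this note or the paper: the states, observation levels, and all probabilities (including the emission matrix B, which the list above does not mention) are made-up illustrative values.

```python
import numpy as np

states = ["frugal", "splurge"]          # hypothetical hidden states S
observations = [0, 2, 2, 1]             # observed spending levels (indices into Y)

pi = np.array([0.6, 0.4])               # initial probabilities (illustrative)
A = np.array([[0.7, 0.3],               # transition probabilities A[i, j] = P(s_j | s_i)
              [0.4, 0.6]])
B = np.array([[0.6, 0.3, 0.1],          # emission probabilities B[i, k] = P(y_k | s_i), an assumed extra component
              [0.1, 0.3, 0.6]])

def viterbi(obs, pi, A, B):
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))           # best path probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers to the best previous state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores.max() * B[j, obs[t]]
    # backtrack the most likely hidden-state sequence
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(psi[t, path[0]]))
    return path

print([states[i] for i in viterbi(observations, pi, A, B)])
```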
2. How an HMM works
What information do GCNs really learn and fuse from topological structures and node features?
⇒ This is a fundamental question because GCN is used as an end-to-end learning framework.
GCN cannot adequately fuse node features and topological structures to extract the most correlated information.
*** The goal of this paper is to improve the capability of fusing topological structures and node features while keeping the performance of GCN !!
Central Idea
we learn the node embedding based on node features, topological structures, and their combinations simultaneously !!
Technically,
in order to fully exploit the information in feature space, we derive k-nearest neighbor graph generated from node features as the feature structural graph.
1) With the feature graph and the topology graph, we propagate node features over both topology space and feature space, so as to extract two specific embeddings in these two spaces with two specific convolution modules.
2) Considering the common characteristics between the two spaces, we design a common convolution module with a parameter-sharing strategy to extract the common embedding shared by them.
3) utilize the attention mechanism to automatically learn the importance weights for different embeddings, so as to adaptively fuse them.
4) design the consistency and disparity constraints to ensure the consistency and disparity of the learned embeddings.
2. FUSION CAPABILITY OF GCNS: AN EXPERIMENTAL INVESTIGATION
An experiment showing that GCN cannot adaptively extract information from topological structures and node features.
The main idea is to construct cases where the node labels are clearly highly correlated with the network topology and with the node features, respectively.
⇒ A good fusion capability of GCN means adaptively extracting the correlated information.
2.1 Case 1: Random Topology and Correlated Node Features
- For each class randomly select 20 nodes for training and another 200 nodes for test
- Train : GCN and MLP (baseline)
- Since the node features are highly correlated with the node labels, MLP shows excellent performance
- GCN extracts information from both node features and topological structures but cannot adaptively fuse them to avoid the interference from topological structures.
→ Cannot match the high performance of MLP
2.2 Case 2: Correlated Topology and Random Node Features
- Generate another network with 900 nodes
- The node features, each of 50 dimensions, are randomly generated
- For the topological structure, employ the Stochastic Blockmodel (SBM) to split nodes into 3 communities (see the data-generation sketch after this list)
- Within each community, the probability of building an edge is set to 0.03, and the probability of building an edge between nodes in different communities is set to 0.0015.
- In this data set, the node labels are highly correlated with the topological structures, but not the node features
- For each class randomly select 20 nodes for training and another 200 nodes for test
- Train : GCN and DeepWalk
- DeepWalk performs well because it models network topological structures thoroughly
- GCN extracts information from both the node features and the topological structures, but cannot adaptively fuse them to avoid the interference from node features.
→ Cannot match the high performance of DeepWalk
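Below is a minimal sketch of how the Case-2 data described above could be generated with networkx and numpy; the random seed and the Gaussian noise used for the features are assumptions for illustration, not taken from the paper.

```python
import numpy as np
import networkx as nx

# Case 2: labels follow the communities of a Stochastic Blockmodel,
# while node features are pure noise.
sizes = [300, 300, 300]                       # 3 communities, 900 nodes
p_in, p_out = 0.03, 0.0015                    # intra-/inter-community edge probabilities
probs = [[p_in if i == j else p_out for j in range(3)] for i in range(3)]

g = nx.stochastic_block_model(sizes, probs, seed=0)
A = nx.to_numpy_array(g)                      # adjacency matrix, highly correlated with labels
labels = np.repeat([0, 1, 2], 300)            # node label = community id
X = np.random.randn(900, 50)                  # 50-dim random features, uncorrelated with labels
```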
Summary
- The current fusion mechanism of GCN is distant from optimal
- Even when the correlation between the node labels and the network topology or the node features is very high, the current GCN cannot make full use of the supervision from node labels to adaptively extract the most correlated information
- We need to rethink the current mechanism of GCN, because in practice it is hard to know in advance whether the topology or the node features are more correlated with the final task
3. AM-GCN : THE PROPOSED MODEL
Problem Setting
semi-supervised node classification
in an attributed graph G = (A, X),
where A ∈ ℝ^{n×n} is the symmetric adjacency matrix with n nodes,
X ∈ ℝ^{n×d} is the node feature matrix,
and d is the dimension of node features.
Suppose each node belongs to one out of C classes.
(The figure above shows the overall framework of AM-GCN.)
Key Idea
- Node features propagate not only in the topology space but also in the feature space
- The most correlated information with the node labels should be extracted from both of these two spaces
- Construct a feature graph based on node features
- Propagate over both the feature graph and the topology graph to learn two specific embeddings Z_F and Z_T
- Design a common convolution module (Common-GCN) with a parameter-sharing strategy to learn the common embeddings Z_CF and Z_CT
- A consistency constraint is employed to enhance the "common" property of Z_CT and Z_CF
- A disparity constraint is used to ensure the independence between Z_T and Z_CT, as well as between Z_F and Z_CF
- AM-GCN utilizes an attention mechanism to adaptively fuse these embeddings and extract the most correlated information for the final classification task
3.1 Specific Convolution Module
In order to capture the underlying structure of nodes
in feature space,
- construct a k-nearest neighbor (kNN) graph G_f = (A_f, X) based on the node feature matrix X
- A_f: the adjacency matrix of the kNN graph
- First, calculate the similarity matrix S ∈ ℝ^{n×n} among the n nodes
➕ There are many ways to compute S; the two most common ones are listed below. (x_i and x_j are the feature vectors of nodes i and j.)
1) Cosine Similarity (used in this paper)
It uses the cosine value of the angle between two vectors to measure the similarity:
$S_{ij} = \frac{x_i \cdot x_j}{|x_i| |x_j|}$
2) Heat Kernel
$S_{ij} = e^{-\frac{\|x_i - x_j\|^2}{t}}$, where t is the time parameter in the heat conduction equation, set to t = 2.
Conclusion
- Choose the Cosine Similarity to obtain the similarity matrix S
- Choose the top k similar node pairs for each node to set edges, and
finally get the adjacency matrix A_f
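A minimal sketch of this kNN feature-graph construction (the value of k and the symmetrization choice are assumptions, not the authors' code):

```python
import numpy as np

def knn_graph(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Build the kNN feature-graph adjacency A_f from node features X (n x d)."""
    # cosine similarity matrix S (n x n)
    X_nor = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = X_nor @ X_nor.T
    np.fill_diagonal(S, -np.inf)              # exclude self-similarity when picking neighbors
    A_f = np.zeros_like(S)
    topk = np.argsort(-S, axis=1)[:, :k]      # indices of the k most similar nodes per node
    rows = np.repeat(np.arange(X.shape[0]), k)
    A_f[rows, topk.ravel()] = 1.0
    return np.maximum(A_f, A_f.T)             # symmetrize into an undirected adjacency
```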
- With the input graph (A_f, X) in feature space, the l-th layer output Z_f^(l) can be represented as:
$Z_f^{(l)} = \mathrm{ReLU}\left(\tilde{D}_f^{-1/2} \tilde{A}_f \tilde{D}_f^{-1/2} Z_f^{(l-1)} W_f^{(l)}\right)$
W_f^(l): weight matrix of the l-th layer in the GCN,
ReLU: the ReLU activation function,
The initial Z_f^(0) = X,
Ã_f = A_f + I_f,
D̃_f: the diagonal degree matrix of Ã_f,
Z_F: the last layer output embedding
Through this process, we can learn the node embedding Z_F, which captures the specific information in feature space.
As for the topology space,
we have the original input graph G_t = (A_t, X_t),
where A_t = A and X_t = X.
→ Then, the learned output embedding Z_T based on the topology graph can be calculated in the same way as in feature space.
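The propagation rule above is a standard GCN layer. Below is a minimal PyTorch sketch of such a specific convolution module; the two-layer depth, layer sizes, and class names are illustrative assumptions, and the same module can be fed either the normalized topology graph or the normalized feature graph.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(A: torch.Tensor) -> torch.Tensor:
    """Compute D̃^{-1/2} (A + I) D̃^{-1/2} for a dense adjacency matrix A."""
    A_tilde = A + torch.eye(A.size(0), device=A.device)
    d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_tilde * d_inv_sqrt.unsqueeze(0)

class SpecificGCN(nn.Module):
    """Two-layer GCN used as a specific convolution module (topology or feature space)."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.W1 = nn.Linear(in_dim, hid_dim, bias=False)   # W^(1)
        self.W2 = nn.Linear(hid_dim, out_dim, bias=False)  # W^(2)

    def forward(self, X: torch.Tensor, A_hat: torch.Tensor) -> torch.Tensor:
        Z = F.relu(A_hat @ self.W1(X))     # Z^(1) = ReLU(Â X W^(1))
        return F.relu(A_hat @ self.W2(Z))  # Z^(2) = ReLU(Â Z^(1) W^(2)), e.g. Z_T or Z_F
```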
3.2 Common Convolution Module
We not only extract the node-specific embeddings in these two spaces, but also extract the common information shared by the two spaces.
⇒ The reason for extracting the common information is that it helps the task of determining which part of the information is most correlated. A Common-GCN is designed to extract this common information.
First,
1) Common-GCN extracts the node embedding Z_ct^(l) from the topology graph (A_t, X) as follows:
$Z_{ct}^{(l)} = \mathrm{ReLU}\left(\tilde{D}_t^{-1/2} \tilde{A}_t \tilde{D}_t^{-1/2} Z_{ct}^{(l-1)} W_c^{(l)}\right)$ (4)
where W_c^(l) is the l-th layer weight matrix of Common-GCN, Z_ct^(l-1) is the node embedding in the (l-1)-th layer, and Z_ct^(0) = X.
2) When utilizing Common-GCN to learn the node embedding from the feature graph (A_f, X),
in order to extract the shared information, we share the same weight matrix W_c^(l) for every layer of Common-GCN as follows:
$Z_{cf}^{(l)} = \mathrm{ReLU}\left(\tilde{D}_f^{-1/2} \tilde{A}_f \tilde{D}_f^{-1/2} Z_{cf}^{(l-1)} W_c^{(l)}\right)$
where Z_cf^(l) is the l-th layer output embedding and Z_cf^(0) = X.
The shared weight matrix can filter out the shared characteristics from the two spaces.
⇒ I don't fully understand this part !! (One way to think about it: since the same weight matrix W_c^(l) is applied to both graphs, training can only keep transformations that are useful under both structures, i.e., the characteristics shared by the two spaces.)
According to the two different input graphs,
we can get two output embeddings Z_CT and Z_CF, and the common embedding Z_C of the two spaces is:
$Z_C = \frac{1}{2}\left(Z_{CT} + Z_{CF}\right)$
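A minimal sketch of the parameter-sharing idea (class name and layer sizes are assumptions): the same weights W_c are applied to both the normalized topology graph and the normalized feature graph, and the two outputs are averaged into Z_C. The inputs are assumed to be normalized as in the earlier `normalize_adj` sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonGCN(nn.Module):
    """Common convolution module: one set of weights W_c shared by both graphs."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.Wc1 = nn.Linear(in_dim, hid_dim, bias=False)   # W_c^(1), shared
        self.Wc2 = nn.Linear(hid_dim, out_dim, bias=False)  # W_c^(2), shared

    def propagate(self, X: torch.Tensor, A_hat: torch.Tensor) -> torch.Tensor:
        Z = F.relu(A_hat @ self.Wc1(X))
        return F.relu(A_hat @ self.Wc2(Z))

    def forward(self, X, A_hat_t, A_hat_f):
        Z_CT = self.propagate(X, A_hat_t)   # common embedding from the topology graph
        Z_CF = self.propagate(X, A_hat_f)   # common embedding from the feature graph
        Z_C = (Z_CT + Z_CF) / 2             # Z_C = (Z_CT + Z_CF) / 2
        return Z_CT, Z_CF, Z_C
```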
3.3 Attention Mechanism
Now we have two specific embeddings Z_T and Z_F, and one common embedding Z_C.
Considering that the node label can be correlated with one of them or even their combinations,
we use the attention mechanism att(Z_T, Z_C, Z_F) to learn their corresponding importance:
$(\alpha_t, \alpha_c, \alpha_f) = att(Z_T, Z_C, Z_F)$
where α_t, α_c, α_f ∈ ℝ^{n×1} indicate the attention values of the n nodes with embeddings Z_T, Z_C, Z_F.
Here we focus on node i, whose embedding in Z_T is z_T^i (i.e., the i-th row of Z_T).
Firstly,
we transform the embedding through a nonlinear transformation, and then use one shared attention vector q to get the attention value ω_T^i as follows:
$\omega_T^i = q^{T} \cdot \tanh\left(W \cdot (z_T^i)^{T} + b\right)$
where W ∈ ℝ^{h'×h} is the weight matrix and b ∈ ℝ^{h'×1} is the bias vector.
We can get the attention values ω_C^i and ω_F^i for node i in embedding matrices Z_C and Z_F, respectively.
Normalization
normalize the attention values with the softmax function to get the final weight:
$\alpha_T^i = \mathrm{softmax}(\omega_T^i) = \frac{\exp(\omega_T^i)}{\exp(\omega_T^i) + \exp(\omega_C^i) + \exp(\omega_F^i)}$
⇒ A larger α_T^i implies that the corresponding embedding is more important !!
Similarly, α_C^i = softmax(ω_C^i) and α_F^i = softmax(ω_F^i).
For all n nodes, we have the learned weights α_t = [α_T^i], α_c = [α_C^i], α_f = [α_F^i] ∈ ℝ^{n×1}, and denote α_T = diag(α_t), α_C = diag(α_c), α_F = diag(α_f).
⇒ Combine these three embeddings to obtain the final embedding Z:
$Z = \alpha_T \cdot Z_T + \alpha_C \cdot Z_C + \alpha_F \cdot Z_F$
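A minimal sketch of this attention-based fusion (the hidden size h' and parameter initialization are assumptions): each embedding goes through a shared transform W, b and a shared attention vector q, the three per-node scores are softmax-normalized, and the embeddings are combined with those weights.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Learn per-node weights (alpha_T, alpha_C, alpha_F) and fuse Z_T, Z_C, Z_F."""
    def __init__(self, h: int, h_prime: int = 64):
        super().__init__()
        self.W = nn.Linear(h, h_prime)                   # shared weight matrix W and bias b
        self.q = nn.Parameter(torch.randn(h_prime, 1))   # shared attention vector q

    def score(self, Z: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.W(Z)) @ self.q            # omega: (n, 1) attention values

    def forward(self, Z_T, Z_C, Z_F):
        omega = torch.cat([self.score(Z_T), self.score(Z_C), self.score(Z_F)], dim=1)
        alpha = torch.softmax(omega, dim=1)              # per-node weights, rows sum to 1
        Z = alpha[:, 0:1] * Z_T + alpha[:, 1:2] * Z_C + alpha[:, 2:3] * Z_F
        return Z, alpha
```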
3.4 Objective Function
3.4.1 Consistency Constraint
For the two output embeddings Z_CT and Z_CF of Common-GCN
(even though Common-GCN already uses the shared weight matrix),
we design a consistency constraint to further enhance their commonality.
Firstly,
we use L2-normalization to normalize the embedding matrices as Z_CTnor and Z_CFnor.
Then, the two normalized matrices can be used to capture the similarity of the n nodes as S_T and S_F:
$S_T = Z_{CTnor} \cdot Z_{CTnor}^{T}, \quad S_F = Z_{CFnor} \cdot Z_{CFnor}^{T}$
The consistency implies that the two similarity matrices should be similar, which gives rise to the following constraint:
$\mathcal{L}_c = \left\| S_T - S_F \right\|_F^2$
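A minimal sketch of this consistency loss: L2-normalize each row of the two common embeddings, form the two node-similarity matrices, and penalize their squared Frobenius distance.

```python
import torch
import torch.nn.functional as F

def consistency_loss(Z_CT: torch.Tensor, Z_CF: torch.Tensor) -> torch.Tensor:
    """L_c = || S_T - S_F ||_F^2 with S = Z_nor @ Z_nor^T."""
    Z_CT_nor = F.normalize(Z_CT, p=2, dim=1)   # L2-normalize each node embedding
    Z_CF_nor = F.normalize(Z_CF, p=2, dim=1)
    S_T = Z_CT_nor @ Z_CT_nor.t()              # node similarity from the topology branch
    S_F = Z_CF_nor @ Z_CF_nor.t()              # node similarity from the feature branch
    return ((S_T - S_F) ** 2).sum()            # squared Frobenius norm of the difference
```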
3.4.2 Disparity Constraint
Because the embeddings Z_T and Z_CT are learned from the same graph G_t = (A_t, X_t), to ensure that they capture different information, we employ the Hilbert-Schmidt Independence Criterion (HSIC),
a simple but effective measure of independence, to enhance the disparity of these two embeddings.
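A minimal sketch of an HSIC-style disparity term, assuming a linear kernel and the usual empirical estimate; the exact kernel choice and scaling in the paper's implementation may differ.

```python
import torch

def hsic(Z1: torch.Tensor, Z2: torch.Tensor) -> torch.Tensor:
    """Empirical HSIC between two embedding matrices (linear kernel assumed)."""
    n = Z1.size(0)
    K1 = Z1 @ Z1.t()                          # Gram matrix of the first embedding
    K2 = Z2 @ Z2.t()                          # Gram matrix of the second embedding
    R = torch.eye(n, device=Z1.device) - torch.ones(n, n, device=Z1.device) / n  # centering matrix
    return torch.trace(R @ K1 @ R @ K2) / (n - 1) ** 2

# Disparity term: penalize dependence between the specific and common embeddings
# learned from the same graph, e.g. hsic(Z_T, Z_CT) for the topology space and
# hsic(Z_F, Z_CF) for the feature space.
```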