Numerous real-world and scientific complex structures are represented in the form of networks, such
as social networks [1], citation networks [2], and biological networks [3, 4]. As a network analysis
tool, network representation learning has attracted increasing attention in recent years. Network
representation learning aims to represent the nodes of a network as low-dimensional dense vectors,
so that the obtained vectors support representation and reasoning in vector space and can be applied
to many downstream applications, such as node classification [5, 6], link prediction [7], and node
visualization [8].
Recently, a large number of network representation learning methods have been proposed, which
can be roughly divided into two categories: those that preserve network structure properties, and
those that maintain attribute properties. The first type of method aims to keep the embeddings of
structurally similar nodes as similar as possible; examples include DeepWalk [9], Node2Vec [10],
LINE [11], NetMF [12], GraRep [13], HOPE [14], and FSADA [15]. The second type of method
aims to keep the embeddings of nodes with similar attributes as similar as possible; it mainly
includes CAN [16], STNE [17], AANE [18], TADW [19], ANRL [20], and PGE [21].
However, most existing representation learning methods focus on embedding nodes that already
exist in the network. More specifically, they are essentially transductive and cannot be directly applied to
unseen nodes. In contrast, inductive methods can quickly generate embeddings for newly-added
nodes, which is urgently needed in real-world applications, e.g., category prediction for newly-added
papers in a citation network, or social recommendation for new users. Unfortunately, most network
embedding methods are impracticable for unseen nodes, which makes inductive network embedding
much more challenging than the transductive problem.
To address this challenge, a few inductive representation methods have been proposed. Some of
them generate representations for unseen nodes by fusing node attribute information, such as Planetoid
[22] and EP-B [23]. Others infer the embeddings of unseen nodes by capturing the proximity relationships
between nodes with various machine learning methods, mainly including GraphSAGE [24], GAT [25],
SPINE [26], SANNE [27], and Caps2NE [28]. The above approaches capture the structure
and attribute information well, but ignore the hierarchical information that is ubiquitous in networks.
For example, in the citation network shown in Fig. 1, we can roughly divide papers into four
categories: MTrans, InfoE, MPP, and RP. Among them, MTrans can be classified as an NLP article,
and further as an AI article; each of these steps places the article into a coarser class. Other articles
can be divided in the same way, so the citation network can be treated as a typical hierarchical
network from fine to coarse.

Figure 1: An example of a hierarchical network, where (a) is a small citation network in which
articles of the same type are marked with the same color; (b) is a hierarchical citation network
abstracted from (a) according to the ACM Computing Classification System; and (c) shows the
category to which each article belongs according to the ACM Computing Classification System.
Real-world networks such as social networks and biological networks also have similar hierarchical
characteristics.
In fact, hierarchical information is not only ubiquitous but also plays a major role in various network
applications. For example, Hierarchical [29] uses hierarchical information to predict missing links.
Some representation learning algorithms, such as HARP [30], HSRL [31], and MILE [32], use
hierarchical information to obtain higher-quality node representations.
In this paper, we present HINRL, a method of Hierarchical Inductive Network Representation
Learning, which can inductively generate embeddings for newly-added nodes while retaining both the
hierarchical information and the local information of the network. Specifically, the method first builds
a hierarchical network, a hierarchy of successively smaller networks from fine to coarse, via a coarsening
strategy that fuses topological structure and node attributes. We then inherit the embeddings from the
coarser layer to preserve the hierarchical information and use an unsupervised graph convolutional
neural network to maintain the local information of the network, which yields a well-trained refinement
model. After new nodes are placed in the hierarchy using their neighbors' hierarchy information, the
well-trained refinement model is used to generate their embeddings. We summarize the contributions of
this paper as follows:
1) We propose HINRL, an innovative Hierarchical Inductive Network Representation Learning
framework, which can directly generate embeddings for unseen nodes without retraining the model.
2) We build a hierarchy of successively smaller networks from fine to coarse via a coarsening
strategy that fuses topological structure and node attributes, preserving the hierarchical information
of the network.
3) We design a refinement model that fuses hierarchical and local information of the network to
obtain a better network representation.
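As a rough illustration of the coarsen-then-refine pipeline described above, the following is a minimal sketch, not the paper's actual algorithm: it stands in a hypothetical rule (greedily merging adjacent nodes whose attribute cosine similarity exceeds a threshold) for the coarsening strategy that fuses structure and attributes, and a single mean-aggregation step for the GCN-based refinement model.

```python
import numpy as np

def coarsen(adj, attrs, sim_threshold=0.5):
    """One coarsening pass: greedily merge adjacent node pairs whose
    attribute cosine similarity exceeds sim_threshold (a hypothetical
    merging rule fusing structure -- adjacency -- and attributes)."""
    merged, next_id = {}, 0          # fine node -> coarse super-node id
    for u in sorted(adj):
        if u in merged:
            continue
        best, best_sim = None, sim_threshold
        for v in adj[u]:             # only adjacent nodes may merge
            if v in merged:
                continue
            a, b = attrs[u], attrs[v]
            sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            if sim > best_sim:
                best, best_sim = v, sim
        merged[u] = next_id
        if best is not None:
            merged[best] = next_id
        next_id += 1
    # coarse adjacency: super-nodes are linked if any members were linked
    coarse_adj = {c: set() for c in range(next_id)}
    for u, nbrs in adj.items():
        for v in nbrs:
            if merged[u] != merged[v]:
                coarse_adj[merged[u]].add(merged[v])
    # coarse attributes: average of the members' attribute vectors
    coarse_attrs = {
        c: np.mean([attrs[u] for u, m in merged.items() if m == c], axis=0)
        for c in range(next_id)
    }
    return coarse_adj, coarse_attrs, merged

def refine(adj, coarse_emb, merged):
    """Inherit each fine node's embedding from its coarse super-node,
    then smooth with one mean-aggregation (GCN-style) step."""
    emb = {u: coarse_emb[merged[u]].copy() for u in adj}
    return {u: np.mean([emb[v] for v in nbrs] + [emb[u]], axis=0)
            for u, nbrs in adj.items()}
```

Applying `coarsen` repeatedly produces the fine-to-coarse hierarchy, and `refine` is applied level by level back down to the finest network; in HINRL proper, the refinement is a trained unsupervised GCN rather than a fixed averaging step.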
Node classification experiments on three well-known benchmark datasets show that HINRL
outperforms state-of-the-art inductive representation learning methods.