benedekrozemberczki/SimGNN
A PyTorch implementation of “SimGNN: A Neural Network Approach to Fast Graph Similarity Computation” (WSDM 2019).
repo name | benedekrozemberczki/SimGNN |
repo link | https://github.com/benedekrozemberczki/SimGNN |
language | Python |
size (curr.) | 2563 kB |
stars (curr.) | 233 |
created | 2019-01-31 |
license | GNU General Public License v3.0 |
SimGNN
Abstract
This repository provides a PyTorch implementation of SimGNN as described in the paper:
SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, Wei Wang. WSDM, 2019. [Paper]
A reference TensorFlow implementation is available [here], and another implementation is available [here].
Requirements
The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.
networkx 2.4
tqdm 4.28.1
numpy 1.15.4
pandas 0.23.4
texttable 1.5.0
scipy 1.1.0
argparse 1.1.0
torch 1.1.0
torch-scatter 1.4.0
torch-sparse 0.4.3
torch-cluster 1.4.5
torch-geometric 1.3.2
torchvision 0.3.0
scikit-learn 0.20.0
Datasets
Every JSON file has the following key-value structure:
{"graph_1": [[0, 1], [1, 2], [2, 3], [3, 4]],
"graph_2": [[0, 1], [1, 2], [1, 3], [3, 4], [2, 4]],
"labels_1": [2, 2, 2, 2],
"labels_2": [2, 3, 2, 3],
"ged": 1}
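The structure above can be checked before training with a few lines of Python. The following is a minimal sketch, not code from the repository: the helper name `parse_graph_pair` and the `REQUIRED_KEYS` tuple are hypothetical, and the checks assume that `graph_1` and `graph_2` are edge lists of `[source, target]` pairs, as the example suggests.

```python
import json

# Keys taken from the example above; the tuple name is hypothetical.
REQUIRED_KEYS = ("graph_1", "graph_2", "labels_1", "labels_2", "ged")


def parse_graph_pair(text):
    """Parse one JSON document describing a graph pair and check its keys."""
    pair = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in pair]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    for key in ("graph_1", "graph_2"):
        # Assumption: every edge is a two-element list of node identifiers.
        if any(len(edge) != 2 for edge in pair[key]):
            raise ValueError(key + " must be a list of [source, target] pairs")
    return pair


example = """{"graph_1": [[0, 1], [1, 2], [2, 3], [3, 4]],
"graph_2": [[0, 1], [1, 2], [1, 3], [3, 4], [2, 4]],
"labels_1": [2, 2, 2, 2],
"labels_2": [2, 3, 2, 3],
"ged": 1}"""

pair = parse_graph_pair(example)
print(len(pair["graph_1"]), len(pair["graph_2"]), pair["ged"])
```

To check a file on disk, read its contents and pass the string to the same helper, for instance `parse_graph_pair(open(path).read())`.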
Options
Input and output options
--training-graphs STR Training graphs folder. Default is `dataset/train/`.
--testing-graphs STR Testing graphs folder. Default is `dataset/test/`.
Model options
--filters-1 INT Number of filters in 1st GCN layer. Default is 128.
--filters-2 INT Number of filters in 2nd GCN layer. Default is 64.
--filters-3 INT Number of filters in 3rd GCN layer. Default is 32.
--tensor-neurons INT Neurons in tensor network layer. Default is 16.
--bottle-neck-neurons INT Bottleneck layer neurons. Default is 16.
--bins INT Number of histogram bins. Default is 16.
--batch-size INT Number of pairs processed per batch. Default is 128.
--epochs INT Number of SimGNN training epochs. Default is 5.
--dropout FLOAT Dropout rate. Default is 0.5.
--learning-rate FLOAT Learning rate. Default is 0.001.
--weight-decay FLOAT Weight decay. Default is 10^-5.
--histogram BOOL Include histogram features. Default is False.
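For readers who want to see how such flags map to Python values, the options above could be declared with `argparse` roughly as follows. This is a sketch built only from the flag names and defaults listed here, not the repository's own parser: the function name `build_parser` is hypothetical, and treating `--histogram` as a switch that takes no value is an assumption based on how the flag is used in the examples.

```python
import argparse


def build_parser():
    """Declare the documented options (illustrative, not the repository's parser)."""
    parser = argparse.ArgumentParser(description="Run SimGNN.")
    # Input and output options.
    parser.add_argument("--training-graphs", default="dataset/train/")
    parser.add_argument("--testing-graphs", default="dataset/test/")
    # Model options.
    parser.add_argument("--filters-1", type=int, default=128)
    parser.add_argument("--filters-2", type=int, default=64)
    parser.add_argument("--filters-3", type=int, default=32)
    parser.add_argument("--tensor-neurons", type=int, default=16)
    parser.add_argument("--bottle-neck-neurons", type=int, default=16)
    parser.add_argument("--bins", type=int, default=16)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--dropout", type=float, default=0.5)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--weight-decay", type=float, default=1e-5)
    # Assumption: a switch that is False unless the flag is present.
    parser.add_argument("--histogram", action="store_true")
    return parser


# argparse turns dashes into underscores, so --batch-size becomes args.batch_size.
args = build_parser().parse_args(["--epochs", "100", "--batch-size", "512"])
print(args.epochs, args.batch_size, args.histogram)
```

Any option left off the command line keeps the default listed above.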
Examples
Training a SimGNN model for 100 epochs with a batch size of 512.
python src/main.py --epochs 100 --batch-size 512
Training a SimGNN model with histogram features.
python src/main.py --histogram
Training a SimGNN model with histogram features and a larger number of bins.
python src/main.py --histogram --bins 32
Increasing the learning rate and the dropout.
python src/main.py --learning-rate 0.01 --dropout 0.9