December 3, 2019

386 words 2 mins read

benedekrozemberczki/SimGNN

benedekrozemberczki/SimGNN

A PyTorch implementation of “SimGNN: A Neural Network Approach to Fast Graph Similarity Computation” (WSDM 2019).

repo name benedekrozemberczki/SimGNN
repo link https://github.com/benedekrozemberczki/SimGNN
homepage
language Python
size (curr.) 2563 kB
stars (curr.) 233
created 2019-01-31
license GNU General Public License v3.0

SimGNN PWC GitHub stars GitHub forks License

A PyTorch implementation of “SimGNN: A Neural Network Approach to Fast Graph Similarity Computation” (WSDM 2019).

Abstract

This repository provides a PyTorch implementation of SimGNN as described in the paper:

SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, Wei Wang. WSDM, 2019. [Paper]

A reference Tensorflow implementation is accessible [here] and another implementation is [here].

Requirements

The codebase is implemented in Python 3.5.2. package versions used for development are just below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
torch-scatter     1.4.0
torch-sparse      0.4.3
torch-cluster     1.4.5
torch-geometric   1.3.2
torchvision       0.3.0
scikit-learn      0.20.0

Datasets

Every JSON file has the following key-value structure:

{"graph_1": [[0, 1],[1, 2],[2, 3],[3, 4]],
 "graph_2": [[0, 1], [1, 2], [1, 3], [3, 4], [2, 4]],
 "labels_1": [2, 2, 2, 2],
 "labels_2": [2, 3, 2, 3],
 "ged": 1}

Options

Input and output options

  --training-graphs   STR    Training graphs folder.      Default is `dataset/train/`.
  --testing-graphs    STR    Testing graphs folder.       Default is `dataset/test/`.

Model options

  --filters-1             INT         Number of filter in 1st GCN layer.       Default is 128.
  --filters-2             INT         Number of filter in 2nd GCN layer.       Default is 64. 
  --filters-3             INT         Number of filter in 3rd GCN layer.       Default is 32.
  --tensor-neurons        INT         Neurons in tensor network layer.         Default is 16.
  --bottle-neck-neurons   INT         Bottle neck layer neurons.               Default is 16.
  --bins                  INT         Number of histogram bins.                Default is 16.
  --batch-size            INT         Number of pairs processed per batch.     Default is 128. 
  --epochs                INT         Number of SimGNN training epochs.        Default is 5.
  --dropout               FLOAT       Dropout rate.                            Default is 0.5.
  --learning-rate         FLOAT       Learning rate.                           Default is 0.001.
  --weight-decay          FLOAT       Weight decay.                            Default is 10^-5.
  --histogram             BOOL        Include histogram features.              Default is False.

Examples

Training a SimGNN model for a 100 epochs with a batch size of 512.

python src/main.py --epochs 100 --batch-size 512

Training a SimGNN with histogram features.

python src/main.py --histogram

Training a SimGNN with histogram features and a large bin number.

python src/main.py --histogram --bins 32

Increasing the learning rate and the dropout.

python src/main.py --learning-rate 0.01 --dropout 0.9
comments powered by Disqus