Datasets
Datasets collected for network science and machine learning research.
Contents
- GitHub StarGazer Graphs
- Twitch Ego Nets
- Reddit Thread Graphs
- Deezer Ego Nets
- GitHub Social Network
- Deezer Social Networks
- Facebook Page-Page Networks
- Wikipedia Article Networks
- Twitch Social Networks
- Facebook Large Page-Page Network
GitHub StarGazer Graphs
Description
Link
Properties
- Number of graphs: 12,725
- Directed: No.
- Node features: No.
- Edge features: No.
- Graph labels: Yes. Binary-labeled.
- Temporal: No.
| | Min | Max |
|---|---|---|
| Nodes | 10 | 957 |
| Density | 0.003 | 0.561 |
| Diameter | 2 | 18 |
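The Min/Max rows above summarize per-graph statistics over the whole collection. The following is a minimal sketch of how such a summary could be computed, not the procedure used to produce the table. It assumes each graph is available as a list of undirected `(u, v)` edge pairs and that every graph is connected, so that the diameter is defined. The function names and the toy graphs are illustrative only, since the on-disk format of the dataset is not described here.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected, diameter is undefined")
    return max(dist.values())


def diameter(adj):
    """Largest eccentricity over all nodes."""
    return max(eccentricity(adj, node) for node in adj)


def summarize(graphs):
    """Return {statistic: (min, max)} over a collection of edge lists."""
    rows = {"Nodes": [], "Density": [], "Diameter": []}
    for edges in graphs:
        adj = build_adjacency(edges)
        rows["Nodes"].append(len(adj))
        rows["Density"].append(density(adj))
        rows["Diameter"].append(diameter(adj))
    return {name: (min(vals), max(vals)) for name, vals in rows.items()}


# Toy stand-ins for real graphs (hypothetical data, not taken from the dataset).
toy_graphs = [
    [(0, 1), (1, 2), (2, 3)],  # path on 4 nodes
    [(0, 1), (0, 2), (1, 2)],  # triangle
]
summary = summarize(toy_graphs)
print(summary)
```

Running one breadth-first search per node is quadratic in the graph size, which is acceptable for graphs with at most a few hundred nodes like the ones in this collection.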
Possible Tasks
Citing
>@misc{karateclub2020,
title={An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs},
author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
year={2020},
eprint={2003.04819},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Twitch Ego Nets
Description
Link
Properties
- Number of graphs: 127,094
- Directed: No.
- Node features: No.
- Edge features: No.
- Graph labels: Yes. Binary-labeled.
- Temporal: No.
| | Min | Max |
|---|---|---|
| Nodes | 14 | 52 |
| Density | 0.038 | 0.967 |
| Diameter | 1 | 2 |
Possible Tasks
Citing
>@misc{karateclub2020,
title={An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs},
author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
year={2020},
eprint={2003.04819},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Reddit Thread Graphs
Description
Link
Properties
- Number of graphs: 203,088
- Directed: No.
- Node features: No.
- Edge features: No.
- Graph labels: Yes. Binary-labeled.
- Temporal: No.
| | Min | Max |
|---|---|---|
| Nodes | 11 | 97 |
| Density | 0.021 | 0.382 |
| Diameter | 2 | 27 |
Possible Tasks
Citing
>@misc{karateclub2020,
title={An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs},
author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
year={2020},
eprint={2003.04819},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Deezer Ego Nets
Description
Link
Properties
- Number of graphs: 9,629
- Directed: No.
- Node features: No.
- Edge features: No.
- Graph labels: Yes. Binary-labeled.
- Temporal: No.
| | Min | Max |
|---|---|---|
| Nodes | 11 | 363 |
| Density | 0.015 | 0.909 |
| Diameter | 2 | 2 |
Possible Tasks
Citing
>@misc{karateclub2020,
title={An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs},
author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
year={2020},
eprint={2003.04819},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
GitHub Social Network
Description
Link
Properties
- Directed: No.
- Node features: Yes.
- Edge features: No.
- Node labels: Yes. Binary-labeled.
- Temporal: No.
| | GitHub |
|---|---|
| Nodes | 37,700 |
| Edges | 289,003 |
| Density | 0.001 |
| Transitivity | 0.013 |
Possible Tasks
- Binary node classification
- Link prediction
- Community detection
- Network visualization
Citing
>@misc{rozemberczki2019multiscale,
title = {Multi-scale Attributed Node Embedding},
author = {Benedek Rozemberczki and Carl Allen and Rik Sarkar},
year = {2019},
eprint = {1909.13021},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
Deezer Social Networks
Description
Links
Properties
- Directed: No.
- Node features: No.
- Edge features: No.
- Node labels: Yes. Multi-labeled.
- Temporal: No.
| | RO | HR | HU |
|---|---|---|---|
| Nodes | 41,773 | 54,573 | 47,538 |
| Edges | 125,826 | 498,202 | 222,887 |
| Density | 0.0001 | 0.0004 | 0.0002 |
| Transitivity | 0.0752 | 0.1146 | 0.0929 |
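Transitivity is commonly defined as the fraction of connected triples (paths of length two) whose endpoints are also linked, which equals three times the number of triangles divided by the number of connected triples. The sketch below implements that standard definition in plain Python. Whether the values in the table were computed in exactly this way is an assumption, and the function name and toy edge list are illustrative only.

```python
from itertools import combinations


def transitivity(edges):
    """Global clustering coefficient of an undirected simple graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not form triples
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = 0   # triples whose two endpoints are linked (3 per triangle)
    triples = 0  # all connected triples, counted by their centre node
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
toy_transitivity = transitivity(toy_edges)
print(toy_transitivity)
```

Enumerating neighbour pairs costs time proportional to the sum of squared degrees, so for the larger networks a dedicated graph library is likely a better choice.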
Possible Tasks
- Node classification
- Link prediction
- Community detection
- Network visualization
Citing
If you find these datasets useful in your research, please cite the following paper:
>@inproceedings{rozemberczki2019gemsec,
title={GEMSEC: Graph Embedding with Self Clustering},
author={Rozemberczki, Benedek and Davies, Ryan and Sarkar, Rik and Sutton, Charles},
booktitle={Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2019},
pages={65-72},
year={2019},
organization={ACM}
}
Facebook Page-Page Networks
Description
Links
Properties
- Directed: No.
- Node features: No.
- Edge features: No.
- Node labels: No.
- Temporal: No.
| | Nodes | Edges | Density | Transitivity |
|---|---|---|---|---|
| Politicians | 5,908 | 41,729 | 0.0024 | 0.3011 |
| Companies | 14,113 | 52,310 | 0.0005 | 0.1532 |
| Athletes | 13,866 | 86,858 | 0.0009 | 0.1292 |
| News Sites | 27,917 | 206,259 | 0.0005 | 0.1140 |
| Public Figures | 11,565 | 67,114 | 0.0010 | 0.1666 |
| Artists | 50,515 | 819,306 | 0.0006 | 0.1140 |
| Government | 7,057 | 89,455 | 0.0036 | 0.2238 |
| TV Shows | 3,892 | 17,262 | 0.0023 | 0.5906 |
Possible Tasks
- Link prediction
- Community detection
- Network visualization
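For link prediction, one common setup is to hide a fraction of the edges as positive test examples and to sample an equal number of non-adjacent node pairs as negatives. The sketch below shows that setup under stated assumptions: the graph is given as a list of undirected `(u, v)` pairs with comparable node ids, and `split_edges`, `test_fraction`, and `seed` are hypothetical names chosen for illustration. It is not an evaluation protocol prescribed by this repository.

```python
import random


def split_edges(edges, test_fraction=0.2, seed=0):
    """Split undirected edges into train/test positives and sample test negatives."""
    rng = random.Random(seed)
    # Canonicalise so (u, v) and (v, u) count as the same undirected edge.
    unique = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(unique)

    n_test = int(len(unique) * test_fraction)
    test_pos, train = unique[:n_test], unique[n_test:]

    nodes = sorted({node for edge in unique for node in edge})
    existing = set(unique)
    max_pairs = len(nodes) * (len(nodes) - 1) // 2
    if n_test > max_pairs - len(existing):
        raise ValueError("not enough non-edges to sample negatives from")

    # Rejection-sample node pairs that are not edges of the full graph.
    negatives = set()
    while len(negatives) < n_test:
        u, v = rng.sample(nodes, 2)
        pair = (min(u, v), max(u, v))
        if pair not in existing:
            negatives.add(pair)
    return train, test_pos, sorted(negatives)


# Hypothetical toy graph: a ring on 10 nodes.
ring = [(i, (i + 1) % 10) for i in range(10)]
train, test_pos, test_neg = split_edges(ring, test_fraction=0.2, seed=0)
print(len(train), len(test_pos), len(test_neg))
```

Rejection sampling is efficient here because the networks are sparse, so a randomly drawn pair is almost always a non-edge. Note that removing test edges can disconnect the training graph, which some embedding methods handle poorly.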
Citing
If you find these datasets useful in your research, please cite the following paper:
>@inproceedings{rozemberczki2019gemsec,
title={GEMSEC: Graph Embedding with Self Clustering},
author={Rozemberczki, Benedek and Davies, Ryan and Sarkar, Rik and Sutton, Charles},
booktitle={Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2019},
pages={65-72},
year={2019},
organization={ACM}
}
Wikipedia Article Networks
Description
Links
Properties
- Directed: No.
- Node features: Yes.
- Edge features: No.
- Node labels: Yes. Continuous target.
- Temporal: No.
| | Chameleon | Crocodile | Squirrel |
|---|---|---|---|
| Nodes | 2,277 | 11,631 | 5,201 |
| Edges | 31,421 | 170,918 | 198,493 |
| Density | 0.012 | 0.003 | 0.015 |
| Transitivity | 0.314 | 0.026 | 0.348 |
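The density of an undirected simple graph with N nodes and E edges is 2E / (N(N - 1)). As a quick sanity check, the sketch below applies that formula to the node and edge counts listed above. It assumes the counts describe simple undirected graphs without self-loops, and the function name is illustrative only.

```python
def undirected_density(num_nodes, num_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# (nodes, edges) copied from the table above.
counts = {
    "Chameleon": (2277, 31421),
    "Crocodile": (11631, 170918),
    "Squirrel": (5201, 198493),
}
densities = {
    name: round(undirected_density(n, m), 3) for name, (n, m) in counts.items()
}
print(densities)
# {'Chameleon': 0.012, 'Crocodile': 0.003, 'Squirrel': 0.015}
```

Rounded to three decimals, the results agree with the Density row of the table.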
Possible Tasks
- Regression
- Link prediction
- Community detection
- Network visualization
Citing
If you find these datasets useful in your research, please cite the following paper:
>@misc{rozemberczki2019multiscale,
title = {Multi-scale Attributed Node Embedding},
author = {Benedek Rozemberczki and Carl Allen and Rik Sarkar},
year = {2019},
eprint = {1909.13021},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
Twitch Social Networks
Description
Links
Properties
- Directed: No.
- Node features: Yes.
- Edge features: No.
- Node labels: Yes. Binary-labeled.
- Temporal: No.
| | DE | EN | ES | FR | PT | RU | TW |
|---|---|---|---|---|---|---|---|
| Nodes | 9,498 | 7,126 | 4,648 | 6,549 | 1,912 | 4,385 | 2,772 |
| Edges | 153,138 | 35,324 | 59,382 | 112,666 | 31,299 | 37,304 | 63,462 |
| Density | 0.003 | 0.002 | 0.006 | 0.005 | 0.017 | 0.004 | 0.017 |
| Transitivity | 0.047 | 0.042 | 0.084 | 0.054 | 0.131 | 0.049 | 0.120 |
Possible Tasks
- Binary node classification
- Link prediction
- Community detection
- Network visualization
Citing
>@misc{rozemberczki2019multiscale,
title = {Multi-scale Attributed Node Embedding},
author = {Benedek Rozemberczki and Carl Allen and Rik Sarkar},
year = {2019},
eprint = {1909.13021},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
Facebook Large Page-Page Network
Description
Links
Properties
- Directed: No.
- Node features: Yes.
- Edge features: No.
- Node labels: Yes. Multi-class.
- Temporal: No.
| | Facebook |
|---|---|
| Nodes | 22,470 |
| Edges | 171,002 |
| Density | 0.001 |
| Transitivity | 0.232 |
Possible Tasks
- Multi-class node classification
- Link prediction
- Community detection
- Network visualization
Citing
>@misc{rozemberczki2019multiscale,
title = {Multi-scale Attributed Node Embedding},
author = {Benedek Rozemberczki and Carl Allen and Rik Sarkar},
year = {2019},
eprint = {1909.13021},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}