facebookresearch/ParlAI
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
repo name | facebookresearch/ParlAI |
repo link | https://github.com/facebookresearch/ParlAI |
homepage | https://parl.ai |
language | Python |
size (curr.) | 35979 kB |
stars (curr.) | 5333 |
created | 2017-04-24 |
license | MIT License |
ParlAI (pronounced “par-lay”) is a Python framework for sharing, training, and testing dialogue models, from open-domain chitchat to VQA (Visual Question Answering).
Its goal is to provide researchers with:
- 80+ popular datasets available all in one place, with the same API, among them PersonaChat, DailyDialog, Wizard of Wikipedia, Empathetic Dialogues, SQuAD, MS MARCO, QuAC, HotpotQA, QACNN & QADailyMail, CBT, BookTest, bAbI Dialogue tasks, Ubuntu Dialogue, OpenSubtitles, Image Chat, VQA, VisDial and CLEVR. See the complete list here.
- a wide set of reference models – from retrieval baselines to Transformers.
- a large zoo of pretrained models ready to use off-the-shelf
- seamless integration of Amazon Mechanical Turk for data collection and human evaluation
- integration with Facebook Messenger to connect agents with humans in a chat interface
- a large range of helpers to create your own agents and train on several tasks with multitasking
- multimodality: some tasks use both text and images
ParlAI is described in the following paper: “ParlAI: A Dialog Research Software Platform”, arXiv:1705.06476. See also these more up-to-date slides.
See the news page for the latest additions & updates, and the website http://parl.ai for further docs.
Installing ParlAI
ParlAI currently requires Python 3 and PyTorch 1.1 or newer. Dependencies of the core modules are listed in requirements.txt. Some of the included models (in parlai/agents) have additional requirements.
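As a quick sanity check of the version requirement above, the sketch below compares a version string against the 1.1 minimum. The helper names `parse_version` and `meets_minimum` are made up for this illustration and are not part of ParlAI; the script only inspects PyTorch if it is already installed.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Return the leading numeric (major, minor) pair of a version string."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_text, minimum=(1, 1)):
    """True if version_text is at least the given (major, minor) minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major >= 3)
    # Only import torch if it can be found, so the check never crashes.
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed")
    else:
        import torch
        print("PyTorch >= 1.1:", meets_minimum(torch.__version__))
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "1.10" as older than "1.9".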
Run the following commands to clone the repository and install ParlAI:
git clone https://github.com/facebookresearch/ParlAI.git ~/ParlAI
cd ~/ParlAI; python setup.py develop
This will link the cloned directory to your site-packages.
This is the recommended installation procedure, as it provides ready access to the examples and allows you to modify anything you might need. This is especially useful if you want to submit another task to the repository.
All needed data will be downloaded to ~/ParlAI/data, and any non-data files, if requested, will be downloaded to ~/ParlAI/downloads. If you need to clear out the space used by these files, you can safely delete these directories; any files needed will be downloaded again.
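If you would like to see how much space those two directories take before deleting them, a small standard-library sketch like the following may help. The function names `directory_size_bytes` and `clear_directory` are hypothetical helpers written for this example, not ParlAI utilities; the paths are the defaults mentioned above.

```python
import shutil
from pathlib import Path


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path (0 if it is missing)."""
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())


def clear_directory(path):
    """Delete path and everything under it; return the bytes freed."""
    root = Path(path).expanduser()
    freed = directory_size_bytes(root)
    if root.is_dir():
        shutil.rmtree(root)
    return freed


if __name__ == "__main__":
    # Report usage only; call clear_directory(...) yourself to reclaim space.
    for name in ("~/ParlAI/data", "~/ParlAI/downloads"):
        print(name, directory_size_bytes(name), "bytes")
```

The script only reports sizes by default, so nothing is removed unless `clear_directory` is called explicitly.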
Documentation
- Quick Start
- Basics: world, agents, teachers, action and observations
- List of available tasks/datasets
- Creating a dataset/task
- List of available agents
- Creating a new agent
- Model zoo (pretrained models)
- Plug into MTurk
- Plug into Facebook Messenger
Examples
A large set of scripts can be found in parlai/scripts. Here are a few of them.
Note: If any of these examples fail, check the requirements section to see if you have missed something.
Display 10 random examples from the SQuAD task:
python -m parlai.scripts.display_data -t squad
Evaluate an IR baseline model on the validation set of the PersonaChat task:
python -m parlai.scripts.eval_model -m ir_baseline -t personachat -dt valid
Train a single-layer transformer on PersonaChat (requires PyTorch and torchtext). Details: embedding size 300, 4 attention heads, 2 epochs with batch size 64. Word vectors are initialized with fastText, and the other elements of the batch are used as negatives during training.
python -m parlai.scripts.train_model -t personachat -m transformer/ranker -mf /tmp/model_tr6 --n-layers 1 --embedding-size 300 --ffn-size 600 --n-heads 4 --num-epochs 2 -veps 0.25 -bs 64 -lr 0.001 --dropout 0.1 --embedding-type fasttext_cc --candidates batch
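Long command lines like the one above are easy to mistype. One option is to keep the options in a dictionary and assemble the command from it, as in the sketch below. `build_train_command` is a hypothetical helper written for this example; the flags and values are copied from the command above, and nothing is executed.

```python
import shlex
import sys


def build_train_command(options, python=sys.executable):
    """Turn a mapping of flag -> value into an argv list for train_model."""
    argv = [python, "-m", "parlai.scripts.train_model"]
    for flag, value in options.items():
        argv.extend([flag, str(value)])
    return argv


# Flags and values taken verbatim from the training example above.
OPTIONS = {
    "-t": "personachat",
    "-m": "transformer/ranker",
    "-mf": "/tmp/model_tr6",
    "--n-layers": 1,
    "--embedding-size": 300,
    "--ffn-size": 600,
    "--n-heads": 4,
    "--num-epochs": 2,
    "-veps": 0.25,
    "-bs": 64,
    "-lr": 0.001,
    "--dropout": 0.1,
    "--embedding-type": "fasttext_cc",
    "--candidates": "batch",
}

if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(build_train_command(OPTIONS, python="python")))
```

The resulting list can be passed to `subprocess.run` once ParlAI is installed, which avoids shell quoting issues when a value contains spaces.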
Code Organization
The code is set up into several main directories:
- core: contains the primary code for the framework
- agents: contains agents which can interact with the different tasks (e.g. machine learning models)
- scripts: contains a number of useful scripts, like training, evaluating, interactive chatting, …
- tasks: contains code for the different tasks available from within ParlAI
- mturk: contains code for setting up Mechanical Turk, as well as sample MTurk tasks
- messenger: contains code for interfacing with Facebook Messenger
- zoo: contains code to directly download and use pretrained models from our model zoo
Support
If you have any questions, bug reports or feature requests, please don’t hesitate to post on our GitHub Issues page.
The Team
ParlAI is currently maintained by Emily Dinan, Dexter Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Stephen Roller, Kurt Shuster, Eric Michael Smith, Jack Urbanek, Jason Weston, and Mary Williamson.
Former major contributors and maintainers include Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Antoine Bordes, Devi Parikh, Dhruv Batra, Filipe de Avila Belbute Peres and Chao Pan.
Citation
Please cite the arXiv paper if you use ParlAI in your work:
@article{miller2017parlai,
title={ParlAI: A Dialog Research Software Platform},
author={{Miller}, A.~H. and {Feng}, W. and {Fisch}, A. and {Lu}, J. and {Batra}, D. and {Bordes}, A. and {Parikh}, D. and {Weston}, J.},
journal={arXiv preprint arXiv:{1705.06476}},
year={2017}
}
License
ParlAI is MIT licensed. See the LICENSE file for details.