uber-research/go-explore
Code for Go-Explore: a New Approach for Hard-Exploration Problems
repo name | uber-research/go-explore |
repo link | https://github.com/uber-research/go-explore |
homepage | https://arxiv.org/abs/1901.10995 |
language | Python |
size (curr.) | 42 kB |
stars (curr.) | 215 |
created | 2019-01-30 |
license | Other |
Go-Explore
Paper located at: arxiv.org/abs/1901.10995
Requirements
Tested with Python 3.6. requirements.txt gives the exact libraries and versions used on a test machine able to run all phases. Unless otherwise specified, libraries can be installed using pip install <library_name>.
Required libraries for Phase 1:
- matplotlib
- loky==2.3.1
- dataclasses
- tqdm
- gym
- opencv-python
These libraries are sufficient to run Go-Explore Phase 1 with custom environments, which you may model after goexplore_py/pitfall_env.py and goexplore_py/montezuma_env.py.
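As a rough illustration of what a custom environment might look like, here is a minimal sketch of a toy environment whose state can be saved and restored, which is what Phase 1 relies on to return to previously visited states. The class name ToyChainEnv and the method names get_restore and restore are assumptions made for this example, not a description of the interface Go-Explore actually expects; consult the two environment files named above for the real one.

```python
class ToyChainEnv:
    """Hypothetical 1-D chain environment with save/restore support.

    The agent starts at position 0 and is rewarded for reaching the
    right end of the chain. All names here are illustrative only.
    """

    def __init__(self, length=10):
        self.length = length
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, any other action moves left; positions are clipped.
        delta = 1 if action == 1 else -1
        self.pos = min(max(self.pos + delta, 0), self.length - 1)
        self.steps += 1
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}

    def get_restore(self):
        # Snapshot everything needed to resume from this exact state.
        return (self.pos, self.steps)

    def restore(self, snapshot):
        # Jump back to a previously saved state and return its observation.
        self.pos, self.steps = snapshot
        return self.pos
```

The key design point is that the snapshot returned by get_restore contains the complete state, so restoring it and replaying the same actions gives the same trajectory.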
The ALE/atari-py is not part of Go-Explore. If you are interested in running Go-Explore on Atari environments (for example, to reproduce our experiments), you may install gym[atari] instead of just gym. Doing so will install atari-py, which is licensed under GPLv2.
Additional libraries for demo generation:
- ffmpeg (non-Python library, install using package manager)
- imageio
- fire
Additionally, to run gen_demo, you will need to clone openai/atari-demo and put a copy of, or a link to, its subfolder atari_demo at gen_demo/atari_demo in this codebase.
E.g. you could run:
git clone https://github.com/openai/atari-demo
cp -r atari-demo/atari_demo gen_demo
Additional libraries for Phase 2:
- openmpi (non-Python library, install from source or using package manager)
- tensorflow-gpu
- pandas
- horovod (install using HOROVOD_WITH_TENSORFLOW=1 pip install horovod)
- baselines (ignore mujoco-related errors)
Additionally, to run Phase 2, you will need to clone uber-research/atari-reset (note: this is an improved fork of the original project, which you can find at openai/atari-reset) and copy it or link to it as atari_reset in the root folder of this project.
E.g. you could run:
git clone https://github.com/uber-research/atari-reset atari_reset
Usage
Phase 1 of Go-Explore can be run using the phase1.sh script. To see the arguments for Phase 1, run:
./phase1.sh --help
The default arguments for Phase 1 will run a domain-knowledge version of Go-Explore Phase 1 on Montezuma’s Revenge. However, the default parameters do not correspond to any experiment actually presented in the paper. To reproduce Phase 1 experiments from the paper, run one of ./phase1_montezuma_domain.sh, ./phase1_montezuma_no_domain.sh or ./phase1_pitfall_domain.sh.
Phase 1 produces a folder called results, with one subfolder per experiment of the form 0000_fb6be589a3dc44c1b561336e04c6b4cb, where the first element is an automatically increasing experiment id and the second element is a random string that helps prevent race-condition issues if two experiments are started at the same time and assigned the same id.
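To make the naming scheme concrete, here is a minimal sketch of how such a folder name could be generated. This is not the code Go-Explore uses: the function name make_experiment_dir is hypothetical, and using uuid4 for the random string is an assumption (it happens to produce a 32-character hex string like the one shown above).

```python
import uuid
from pathlib import Path


def make_experiment_dir(results_root):
    """Create and return a new '<id>_<random hex>' subfolder (illustrative only)."""
    root = Path(results_root)
    root.mkdir(parents=True, exist_ok=True)
    # Next id is one more than the highest numeric prefix already present.
    existing = [
        int(p.name.split("_")[0])
        for p in root.iterdir()
        if p.is_dir() and p.name.split("_")[0].isdigit()
    ]
    next_id = max(existing, default=-1) + 1
    # The random suffix keeps names unique even if two processes pick the same id.
    path = root / f"{next_id:04d}_{uuid.uuid4().hex}"
    path.mkdir()
    return path
```

Two processes that scan the folder at the same moment would compute the same id, but the random suffixes differ, so neither overwrites the other's results.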
To generate demonstrations, call ./gen_demo.sh <phase1_result_folder> <destination> --game <game>, where <game> is one of “montezuma” (default) or “pitfall”. The destination will be a directory containing a .demo file and a .mp4 file corresponding to the video of the demonstration.
To robustify (run Phase 2), put a set of .demo files from different runs of Phase 1 into a folder (we used 10 for Montezuma and 4 for Pitfall; a single demonstration can also work, but is less likely to succeed). Then run ./phase2.sh <game> <demo_folder> <results_folder>, where <game> is one of MontezumaRevenge or Pitfall. This should work with mpirun if you are using distributed training (we used 16 GPUs). Phase 2 has succeeded when one of the max_starting_point values displayed in the log has reached a value near 0 (values less than around 80 are typically good). You may then test the performance of your trained neural network using ./phase2_test.sh <game> <neural_net> <test_results_folder>, where <neural_net> is one of the files produced by Phase 2 (printed in the log as Saving to ...).
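Gathering the .demo files from several gen_demo destinations into one folder can be scripted. The following is a small sketch under stated assumptions: the function name collect_demos is hypothetical, and prefixing each copy with a run index is only a way to avoid name collisions here, not a naming convention Phase 2 is known to require.

```python
import shutil
from pathlib import Path


def collect_demos(demo_dirs, demo_folder):
    """Copy every .demo file found in demo_dirs into demo_folder."""
    target = Path(demo_folder)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for i, demo_dir in enumerate(demo_dirs):
        for demo in sorted(Path(demo_dir).glob("*.demo")):
            # Prefix with the run index so same-named files do not collide.
            dest = target / f"{i:02d}_{demo.name}"
            shutil.copy(demo, dest)
            copied.append(dest)
    return copied
```

The resulting folder can then be passed as <demo_folder> to ./phase2.sh.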
This will produce .json files for each possible number of no-ops (from 0 to 30), containing scores, levels and exact action sequences produced by the test runs.
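For inspecting these files afterwards, a minimal sketch such as the one below loads every .json file in the test results folder. The function name load_test_results is hypothetical, and the sketch deliberately makes no assumption about the keys or layout inside each file; check one file by hand before relying on any particular field.

```python
import json
from pathlib import Path


def load_test_results(test_results_folder):
    """Return a dict mapping each .json file name to its parsed content."""
    results = {}
    for path in sorted(Path(test_results_folder).glob("*.json")):
        with path.open() as f:
            results[path.name] = json.load(f)
    return results
```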