# CPA - Compositional Perturbation Autoencoder

The Compositional Perturbation Autoencoder (CPA) is a deep generative framework for learning the effects of perturbations at the single-cell level. CPA performs out-of-distribution (OOD) predictions of unseen drug combinations, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.
CPA is a collaborative research project from
Facebook AI Research (FAIR) and the computational biology group of Prof. Fabian
Theis (https://github.com/theislab) from Helmholtz Zentrum München.
## What is CPA?
CPA is a deep generative framework for learning the effects of perturbations at the single-cell level. CPA encodes and learns phenotypic drug responses across different cell types, doses, and drug combinations. CPA allows:

- Out-of-distribution predictions of unseen drug combinations at various doses and among different cell types.
- Learning interpretable drug and cell-type latent spaces.
- Estimating the dose-response curve for each perturbation and their combinations.
- Assessing the uncertainty of the model's estimates.
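For intuition, dose-response curves are typically sigmoidal (the training CLI exposes a `sigm` doser type). A toy sketch of such a curve, with made-up parameters, could look like:

```python
# Toy four-parameter log-logistic dose-response curve. Illustrative only:
# CPA learns its dose-response curves from data rather than assuming this
# closed form, and all parameter values below are made up.
def dose_response(dose, bottom=0.0, top=1.0, ec50=1.0, hill=1.0):
    """Response at a given positive dose; ec50 is the half-maximal dose."""
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)

print(round(dose_response(1.0), 3))    # 0.5: half-maximal response at the EC50
print(round(dose_response(100.0), 3))  # close to the top asymptote
```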
The repository is centered around the `compert` module:

- `compert.train` contains scripts to train the model.
- `compert.api` contains user-friendly scripts to interact with the model via scanpy.
- `compert.plotting` contains plotting functions.
- `compert.model` contains the modules of the compert model.
- `compert.data` contains the data loader, which transforms an anndata structure into a class compatible with the compert model.
- `compert.collect_results` contains the script for automatic model selection from sweeps.
Additional files and folders:

- `datasets` contains both versions of the data: raw and pre-processed.
- `preprocessing` contains notebooks to reproduce the dataset pre-processing from raw data.
- `notebooks` contains notebooks to reproduce the plots from the paper and detailed analyses of each dataset.
- `pretrained_models` contains the best models selected after the sweeps. These models were used for the analysis and figures in the paper.
- `scripts` contains bash files for automatically running the model.
As a first step, download the contents of `pretrained_models/` from this tarball.
To learn how to use this repository, check `./notebooks/demo.ipynb` and the following scripts:

- `./scripts/run_one_epoch.sh` runs one epoch for all datasets.
- `./scripts/run_sweeps.sh` runs all sweeps.
- `./scripts/run_collect_results.sh`, given a sweep, runs model selection and prints results.
## Examples and Reproducibility
All the examples and the reproducibility notebooks for the plots in the paper can be found in the `notebooks` folder.
## Training a model
There are two ways to train a compert model:
- Using the command line, e.g.:

```shell
python -m compert.train --dataset_path datasets/GSM_new.h5ad --save_dir /tmp --max_epochs 1 --doser_type sigm
```
- From a Jupyter notebook: see the example in `./notebooks/demo.ipynb`.
Run `./scripts/run_one_epoch.sh` to perform automatic testing for one epoch on all the datasets used in the study.
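The training command can also be composed programmatically, e.g. when looping over several datasets as the sweep scripts do. A minimal sketch (the `train_command` helper below is hypothetical, not part of compert; the flag names are the ones shown in the command-line example above):

```python
import shlex

# Hypothetical helper (not part of compert): build the compert.train
# command line for a given dataset file. Flag names are copied from the
# README's command-line example.
def train_command(dataset_path, save_dir="/tmp", max_epochs=1, doser_type="sigm"):
    return (
        f"python -m compert.train --dataset_path {shlex.quote(dataset_path)}"
        f" --save_dir {shlex.quote(save_dir)}"
        f" --max_epochs {max_epochs}"
        f" --doser_type {doser_type}"
    )

print(train_command("datasets/GSM_new.h5ad"))
```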
Currently you can access the documentation via the built-in `help` function in IPython. For example:
```python
from compert.api import ComPertAPI
from compert.plotting import CompertVisuals
```
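As a self-contained sketch of that pattern (the class below is a hypothetical stub so the snippet runs without the package installed; the real `ComPertAPI` carries its own docstrings):

```python
# Hypothetical stub standing in for compert.api.ComPertAPI, used here only
# to demonstrate how help() renders docstrings.
class ComPertAPI:
    """Stub: high-level interface to a trained CPA model."""

    def predict(self, genes):
        """Stub: predict expression under a given perturbation."""

help(ComPertAPI)  # prints the class and method docstrings
```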
A separate page with the documentation is coming soon.
## Support and contribute
If you have a question or noticed a problem, you can post an issue on GitHub.
Link to the paper
## License

This source code is released under the MIT license, included here.