intel-isl/DPT

Dense Prediction Transformers


repo name	intel-isl/DPT
repo link	https://github.com/intel-isl/DPT
language	Python
size (curr.)	450 kB
stars (curr.)	357
created	2021-03-22
license	MIT License

Vision Transformers for Dense Prediction

This repository contains code and models for our paper:

Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

Changelog

[March 2021] Initial release of inference code and models

Setup

Download the model weights and place them in the `weights` folder:

Monodepth:

Segmentation:

Set up dependencies:
```
conda install pytorch torchvision opencv 
pip install timm
```
The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.
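To see whether a local environment matches these tested versions, a small check like the one below may help. It is a hypothetical helper, not part of the repository. It assumes each package exposes `__version__` (as `torch`, `cv2`, and `timm` do; OpenCV is imported as `cv2`) and treats a local build tag such as `+cu111` as matching the base version.

```python
import importlib
import platform

# Versions the README reports as tested (Python is printed separately).
# Keys are import names: OpenCV is imported as cv2.
TESTED = {"torch": "1.8.0", "cv2": "4.5.1", "timm": "0.4.5"}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_with_tested(tested=TESTED):
    """Map each import name to (installed, tested, matches)."""
    report = {}
    for name, expected in tested.items():
        found = installed_version(name)
        # Drop a local build tag such as "+cu111" before comparing.
        base = found.split("+")[0] if found else None
        report[name] = (found, expected, base == expected)
    return report


if __name__ == "__main__":
    print("python", platform.python_version(), "(tested: 3.7)")
    for name, (found, expected, ok) in compare_with_tested().items():
        print(f"{name}: installed={found}, tested={expected}, "
              f"{'same' if ok else 'different'}")
```

A version that differs is not necessarily a problem; it only means the combination was not among those reported as tested.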

Usage

Place one or more input images in the `input` folder.
Run a monocular depth estimation model:
```
python run_monodepth.py
```
Or run a semantic segmentation model:
```
python run_segmentation.py
```
The results are written to the folders `output_monodepth` and `output_segmentation`, respectively.
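The folder convention can be previewed with a small sketch that pairs each file in `input` with a path in the matching output folder. Everything beyond the folder names is an assumption: the set of accepted image extensions and the output naming (same base name, `.png` extension) are hypothetical and not what the repository's scripts are documented to produce.

```python
import os

# Assumption: which extensions count as images is illustrative only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def plan_outputs(filenames, input_dir="input", output_dir="output_monodepth"):
    """Pair each image file name with a hypothetical output path.

    The output keeps the input's base name with a .png extension; this
    naming is an assumption, not a guarantee of the repository's scripts.
    """
    pairs = []
    for name in sorted(filenames):
        base, ext = os.path.splitext(name)
        if ext.lower() in IMAGE_EXTS:
            pairs.append((os.path.join(input_dir, name),
                          os.path.join(output_dir, base + ".png")))
    return pairs


if __name__ == "__main__" and os.path.isdir("input"):
    # Preview a depth run; pass output_dir="output_segmentation" for the
    # segmentation model instead.
    for src, dst in plan_outputs(os.listdir("input")):
        print(src, "->", dst)
```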

Use the flag `-t` to choose the model. Possible options are `dpt_hybrid` (default) and `dpt_large`.
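For example, `python run_monodepth.py -t dpt_large` selects the large model. The sketch below shows one way such a flag could be parsed with `argparse`. Only the flag `-t` and the two model names come from this README; the `dest="model_type"` name and the use of `choices=` are assumptions, and this is not the repository's own argument parser.

```python
import argparse

# Model names from the README; the parser layout itself is a hypothetical sketch.
MODEL_TYPES = ("dpt_hybrid", "dpt_large")


def parse_args(argv=None):
    """Parse a -t flag that selects one of the two DPT model variants."""
    parser = argparse.ArgumentParser(description="Run a DPT model (sketch).")
    parser.add_argument(
        "-t",
        dest="model_type",
        choices=MODEL_TYPES,       # reject anything other than the two models
        default="dpt_hybrid",      # README: dpt_hybrid is the default
        help="model to run (default: %(default)s)",
    )
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["-t", "dpt_large"]).model_type` is `"dpt_large"`, omitting `-t` yields `"dpt_hybrid"`, and an unknown name makes `argparse` exit with a usage error.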

Citation

Please cite our papers if you use this code or any of the models.

@article{Ranftl2021,
	author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
	title     = {Vision Transformers for Dense Prediction},
	journal   = {ArXiv preprint},
	year      = {2021},
}

@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}