February 19, 2020




Official implementation of CVPR2020 paper “VIBE: Video Inference for Human Body Pose and Shape Estimation”

repo name mkocabas/VIBE
repo link https://github.com/mkocabas/VIBE
homepage https://arxiv.org/abs/1912.05656
language Python
size (curr.) 86 kB
stars (curr.) 504
created 2019-12-12
license Other

VIBE: Video Inference for Human Body Pose and Shape Estimation [CVPR-2020]


Watch this video for more qualitative results.

Sources: left video - https://www.youtube.com/watch?v=qlPRDVqYO74, right video - https://www.youtube.com/watch?v=Opry3F6aB1I

VIBE: Video Inference for Human Body Pose and Shape Estimation,
Muhammed Kocabas, Nikos Athanasiou, Michael J. Black,
IEEE Computer Vision and Pattern Recognition, 2020


Video Inference for Body Pose and Shape Estimation (VIBE) is a video pose and shape estimation method. It predicts the parameters of the SMPL body model for each frame of an input video. Please refer to our arXiv report for further details.
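To make "SMPL parameters for each frame" concrete, here is a minimal sketch of how such per-frame predictions could be held in memory. It assumes the standard SMPL parameterization (24 joints with 3 axis-angle values each, plus 10 shape coefficients). The class name `SmplSequence` and the helper `rest_pose_sequence` are hypothetical and are not part of the VIBE code; the actual output of the demo may be laid out differently and contain additional fields.

```python
from dataclasses import dataclass

import numpy as np

# Assumption: standard SMPL sizes (24 joints x 3 axis-angle values, 10 betas).
NUM_POSE_PARAMS = 72
NUM_SHAPE_PARAMS = 10


@dataclass
class SmplSequence:
    """Hypothetical container for per-frame SMPL parameters of one person."""

    pose: np.ndarray   # shape (num_frames, 72), axis-angle rotations
    betas: np.ndarray  # shape (num_frames, 10), body shape coefficients

    def __post_init__(self):
        # Validate shapes early so that mistakes surface at construction time.
        if self.pose.ndim != 2 or self.pose.shape[1] != NUM_POSE_PARAMS:
            raise ValueError(f"pose must be (T, {NUM_POSE_PARAMS})")
        if self.betas.ndim != 2 or self.betas.shape[1] != NUM_SHAPE_PARAMS:
            raise ValueError(f"betas must be (T, {NUM_SHAPE_PARAMS})")
        if self.pose.shape[0] != self.betas.shape[0]:
            raise ValueError("pose and betas must cover the same frames")

    @property
    def num_frames(self) -> int:
        return self.pose.shape[0]


def rest_pose_sequence(num_frames: int) -> SmplSequence:
    """All-zero parameters: the SMPL template body in its rest pose."""
    return SmplSequence(
        pose=np.zeros((num_frames, NUM_POSE_PARAMS)),
        betas=np.zeros((num_frames, NUM_SHAPE_PARAMS)),
    )
```

For a 90-frame clip, `rest_pose_sequence(90)` would hold one 72-value pose vector and one 10-value shape vector per frame.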

This implementation:

  • is the demo code for VIBE implemented purely in PyTorch,
  • works on arbitrary videos with multiple people,
  • supports both CPU and GPU inference (though GPU is much faster),
  • is fast, up to 30 FPS on an RTX 2080 Ti (see this table),
  • achieves SOTA results on 3DPW and MPI-INF-3DHP datasets,
  • includes a Temporal SMPLify implementation.

Getting Started

VIBE has been implemented and tested on Ubuntu 18.04 with Python >= 3.7. It supports both GPU and CPU inference. If you don’t have a suitable device, try running our Colab demo.

Clone the repo:

git clone https://github.com/mkocabas/VIBE.git

Install the requirements using pip or conda:

# pip
bash install_pip.sh

# conda
bash install_conda.sh

Running the Demo

We have prepared demo code to run VIBE on arbitrary videos. First, you need to download the required data (i.e., our trained model and the SMPL model parameters). To do this, just run:

bash prepare_data.sh

Then, running the demo is as simple as this:

# Run on a local video
python demo.py --vid_file sample_video.mp4 --output_folder output/ --display

# Run on a YouTube video
python demo.py --vid_file https://www.youtube.com/watch?v=wPZP8Bwxplo --output_folder output/ --display

Refer to doc/demo.md for more details about the demo code.
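If you want to launch the demo from a Python script, for example to process several videos in a loop, the command line can be assembled programmatically. The sketch below only uses the flags shown in this README (`--vid_file`, `--output_folder`, `--display`, `--sideview`). The helper `build_demo_command` is a hypothetical convenience function, not part of the repository, and it assumes the script is started from the repository root where `demo.py` lives.

```python
import shlex
import sys


def build_demo_command(vid_file, output_folder="output/",
                       display=False, sideview=False):
    """Build the argument list for demo.py (hypothetical helper)."""
    cmd = [
        sys.executable, "demo.py",
        "--vid_file", vid_file,
        "--output_folder", output_folder,
    ]
    # Optional boolean flags are appended only when requested.
    if display:
        cmd.append("--display")
    if sideview:
        cmd.append("--sideview")
    return cmd


if __name__ == "__main__":
    command = build_demo_command("sample_video.mp4", display=True)
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the requirements and data from the previous steps are in place.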

Sample demo output with the --sideview flag:

Google Colab

If you do not have a suitable environment to run this project, you can give Google Colab a try. It lets you run the project in the cloud, free of charge. You can try our Colab demo using the notebook we have prepared.


Here we compare VIBE with recent state-of-the-art methods on 3D pose estimation datasets. The evaluation metric is Procrustes-Aligned Mean Per Joint Position Error (PA-MPJPE) in mm; lower is better.

Models          3DPW ↓    MPI-INF-3DHP ↓    H36M ↓
SPIN            59.2      67.5              41.1
Temporal HMR    76.7      89.8              56.8
VIBE            56.5      63.4              41.5
VIBE + 3DPW     51.9      64.6              41.4
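For readers unfamiliar with the metric, the following is a minimal NumPy sketch of PA-MPJPE for a single frame. It first aligns the predicted joints to the ground truth with the best similarity transform (rotation, uniform scale, translation) and then averages the per-joint Euclidean distances. This is a generic illustration of the metric, not the evaluation code used to produce the numbers above; the function names are our own.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the optimal similarity transform."""
    mu_pred = pred.mean(axis=0)
    mu_gt = gt.mean(axis=0)
    x = pred - mu_pred  # centered prediction
    y = gt - mu_gt      # centered ground truth

    # The SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, sing, vt = np.linalg.svd(x.T @ y)
    v = vt.T
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(v @ u.T))
    diag = np.array([1.0, 1.0, d])
    rot = v @ np.diag(diag) @ u.T

    # Optimal uniform scale for the rotated, centered prediction.
    scale = (sing * diag).sum() / (x ** 2).sum()
    return scale * x @ rot.T + mu_gt


def pa_mpjpe(pred, gt):
    """Mean per-joint error after Procrustes alignment, in the input units."""
    aligned = procrustes_align(pred, gt)
    return np.linalg.norm(aligned - gt, axis=-1).mean()
```

If the joint coordinates are given in millimetres, the result is directly comparable in unit to the table above. A prediction that differs from the ground truth only by rotation, scale, and translation yields an error of (numerically) zero.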


  title={VIBE: Video Inference for Human Body Pose and Shape Estimation},
  author={Kocabas, Muhammed and Athanasiou, Nikos and Black, Michael J.},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}


This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.


Inside each file, we indicate whether a function or script is borrowed from an external source. Here are some great resources we benefit from:

  • Pretrained HMR and some functions are borrowed from SPIN.
  • The SMPL models and layer are from the SMPL-X model.
  • Some functions are borrowed from Temporal HMR.
  • Some functions are borrowed from HMR-pytorch.
  • Some functions are borrowed from Kornia.
  • The pose tracker is from STAF.