February 13, 2020

google-research/noisystudent

Code for NoisyStudent on SVHN. https://arxiv.org/abs/1911.04252

repo name google-research/noisystudent
repo link https://github.com/google-research/noisystudent
homepage
language Python
size (curr.) 380 kB
stars (curr.) 120
created 2020-02-14
license Apache License 2.0

NoisyStudent

Overview

NoisyStudent is a semi-supervised learning method that achieves 88.4% top-1 accuracy on ImageNet (SOTA) and surprising gains on robustness and adversarial benchmarks. It is based on the self-training framework and is trained in 4 simple steps:

  1. Train a classifier on labeled data (teacher).
  2. Infer labels on a much larger unlabeled dataset.
  3. Train a larger classifier on the combined set, adding noise (noisy student).
  4. Go to step 2, with the student as the teacher.
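
The four steps above can be sketched as a short loop. This is only a toy illustration and not the code in this repository: the nearest-centroid "classifier", the Gaussian input noise, the synthetic 2-D data, and the names `fit_centroids`, `predict`, and `noisy_student` are all assumptions made for the example. The paper uses a student that is equal to or larger than the teacher and noise such as dropout, stochastic depth, and data augmentation; here the student has the same size as the teacher, so only the loop structure carries over.

```python
import numpy as np


def fit_centroids(x, y, num_classes):
    """Toy 'classifier' (hypothetical): one mean vector per class."""
    return np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])


def predict(centroids, x):
    """Assign each row of x to the class of its nearest centroid."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def noisy_student(x_lab, y_lab, x_unlab, num_classes,
                  iterations=3, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: train the teacher on labeled data only.
    teacher = fit_centroids(x_lab, y_lab, num_classes)
    for _ in range(iterations):
        # Step 2: infer pseudo-labels on the unlabeled data.
        pseudo = predict(teacher, x_unlab)
        # Step 3: train the student on labeled + pseudo-labeled data,
        # with noise added to the student's inputs (stand-in for the
        # dropout / augmentation noise used in the paper).
        x_all = np.concatenate([x_lab, x_unlab])
        y_all = np.concatenate([y_lab, pseudo])
        x_noisy = x_all + rng.normal(0.0, noise_std, size=x_all.shape)
        student = fit_centroids(x_noisy, y_all, num_classes)
        # Step 4: the student becomes the teacher for the next round.
        teacher = student
    return teacher


# Synthetic two-class data: 10 labeled points, 390 treated as unlabeled.
data_rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0]])
y_true = np.arange(400) % 2
x = centers[y_true] + data_rng.normal(0.0, 1.0, size=(400, 2))
x_lab, y_lab = x[:10], y_true[:10]
x_unlab, y_unlab = x[10:], y_true[10:]

final = noisy_student(x_lab, y_lab, x_unlab, num_classes=2)
acc = (predict(final, x_unlab) == y_unlab).mean()
print(f"accuracy on the held-back labels: {acc:.3f}")
```

The labeled examples are kept in every round, so each class always has at least a few real labels when the student is fit.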

For ImageNet checkpoints trained by NoisyStudent, please refer to the EfficientNet GitHub repository.

SVHN Experiments

Our ImageNet experiments require JFT-300M, which is not publicly available. We will release the full code for ImageNet, trained with a public dataset as unlabeled data, in a few weeks.

Here we show an implementation of NoisyStudent on SVHN, which boosts the performance of a supervised model from 97.9% accuracy to 98.6% accuracy.

# Download and preprocess SVHN, and download the teacher model trained on labeled data (97.9% accuracy).
bash local_scripts/prepro.sh

# Training & Eval (expected accuracy: 98.6 +- 0.1)
bash local_scripts/run_svhn.sh

You can also use the Colab notebook noisystudent_svhn.ipynb to try the method on free Colab GPUs.

Bibtex

@article{xie2019self,
  title={Self-training with Noisy Student improves ImageNet classification},
  author={Xie, Qizhe and Hovy, Eduard and Luong, Minh-Thang and Le, Quoc V},
  journal={arXiv preprint arXiv:1911.04252},
  year={2019}
}

This is not an officially supported Google product.
