google-research/noisystudent
Code for NoisyStudent on SVHN. https://arxiv.org/abs/1911.04252
repo name | google-research/noisystudent |
repo link | https://github.com/google-research/noisystudent |
homepage | |
language | Python |
size (curr.) | 380 kB |
stars (curr.) | 120 |
created | 2020-02-14 |
license | Apache License 2.0 |
NoisyStudent
Overview
NoisyStudent is a semi-supervised learning method that achieves 88.4% top-1 accuracy on ImageNet (state of the art at the time of publication) and surprising gains on robustness and adversarial benchmarks. It builds on the self-training framework and is trained in four simple steps:
1. Train a classifier on labeled data (teacher).
2. Infer labels on a much larger unlabeled dataset.
3. Train a larger classifier on the combined set, adding noise (noisy student).
4. Go to step 2, with student as teacher.
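The four steps above can be sketched as a short loop. The following is a toy illustration only, not the code in this repository: it uses a nearest-centroid classifier on 2-D points in place of a neural network, Gaussian jitter on the inputs as a stand-in for the noise applied to the student, and it does not model the student being larger than the teacher. All function names (`train_centroids`, `predict`, `add_noise`, `noisy_student`) and parameters are hypothetical and chosen for this example.

```python
import random


def train_centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean (x, y) per class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}


def predict(centroids, point):
    """Return the class whose centroid is closest to the point."""
    x, y = point
    return min(
        centroids,
        key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
    )


def add_noise(points, rng, scale):
    """Jitter every input with Gaussian noise (toy stand-in for student noise)."""
    return [(x + rng.gauss(0, scale), y + rng.gauss(0, scale)) for x, y in points]


def noisy_student(labeled, labels, unlabeled, iterations=3, noise_scale=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: train the teacher on labeled data only.
    teacher = train_centroids(labeled, labels)
    for _ in range(iterations):
        # Step 2: the teacher infers pseudo-labels on the unlabeled data.
        pseudo_labels = [predict(teacher, p) for p in unlabeled]
        # Step 3: train the student on labeled + pseudo-labeled data, with noise.
        noisy_points = add_noise(labeled + unlabeled, rng, noise_scale)
        student = train_centroids(noisy_points, labels + pseudo_labels)
        # Step 4: the student becomes the teacher for the next round.
        teacher = student
    return teacher
```

A usage sketch on two well-separated synthetic clusters (the data below is made up for illustration):

```python
data_rng = random.Random(1)
labeled = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (2.9, 3.1)]
labels = [0, 0, 1, 1]
unlabeled = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(50)]
unlabeled += [(data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)) for _ in range(50)]

model = noisy_student(labeled, labels, unlabeled)
print(predict(model, (0.1, 0.1)), predict(model, (3.0, 2.9)))
```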
For ImageNet checkpoints trained by NoisyStudent, please refer to the EfficientNet GitHub repository.
SVHN Experiments
Our ImageNet experiments require JFT-300M, which is not publicly available. We will release the full ImageNet code, which uses a public dataset as the unlabeled data, in a few weeks.
Here we show an implementation of NoisyStudent on SVHN, which boosts the accuracy of a supervised model from 97.9% to 98.6% (an error rate reduction from 2.1% to 1.4%).
# Download and preprocess SVHN, then download the teacher model trained on labeled data (97.9% accuracy).
bash local_scripts/prepro.sh
# Training and evaluation (expected accuracy: 98.6% ± 0.1)
bash local_scripts/run_svhn.sh
You can also use the Colab notebook noisystudent_svhn.ipynb to try the method on free Colab GPUs.
BibTeX
@article{xie2019self,
title={Self-training with Noisy Student improves ImageNet classification},
author={Xie, Qizhe and Hovy, Eduard and Luong, Minh-Thang and Le, Quoc V},
journal={arXiv preprint arXiv:1911.04252},
year={2019}
}
This is not an officially supported Google product.