zcaceres/spec_augment
A PyTorch implementation of Google Brain’s SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
repo name | zcaceres/spec_augment |
repo link | https://github.com/zcaceres/spec_augment |
homepage | https://arxiv.org/abs/1904.08779 |
language | Jupyter Notebook |
size (curr.) | 28559 kB |
stars (curr.) | 233 |
created | 2019-04-23 |
license | MIT License |
SpecAugment with PyTorch
A PyTorch implementation of Google Brain’s SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
SpecAugment is a state-of-the-art data augmentation approach for speech recognition.
The paper’s authors did not publish code that I could find, and their implementation was in TensorFlow. We implemented all three SpecAugment transforms using PyTorch, torchaudio, and fastai / fastai-audio.
To use:
- Run install.sh (I recommend using a separate conda env for the project). After the install script runs, you should have a torchaudio folder in your project folder.
- Check out SpecAugment.ipynb (a Jupyter notebook) for the functions.
Augmentations
Time Warp
Time Mask
Frequency Mask
Combined:
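As a rough illustration of the two masking transforms listed above, here is a minimal sketch in plain PyTorch. It is not the code from SpecAugment.ipynb: the function names (time_mask, freq_mask), the parameters (max_width, num_masks, fill), and the assumed spectrogram layout of (..., freq, time) are all chosen for this example only.

```python
import torch


def time_mask(spec, max_width=20, num_masks=1, fill=0.0):
    """Overwrite random spans of consecutive time steps (last axis) with `fill`.

    Hypothetical helper; assumes `spec` has shape (..., freq, time).
    """
    out = spec.clone()  # leave the input untouched
    n_steps = out.shape[-1]
    for _ in range(num_masks):
        # Mask width is drawn uniformly from [0, max_width], capped at the axis length.
        width = min(int(torch.randint(0, max_width + 1, (1,))), n_steps)
        start = int(torch.randint(0, n_steps - width + 1, (1,)))
        out[..., start:start + width] = fill
    return out


def freq_mask(spec, max_width=10, num_masks=1, fill=0.0):
    """Overwrite random bands of consecutive frequency bins with `fill`.

    Hypothetical helper; assumes `spec` has shape (..., freq, time).
    """
    out = spec.clone()
    n_bins = out.shape[-2]
    for _ in range(num_masks):
        width = min(int(torch.randint(0, max_width + 1, (1,))), n_bins)
        start = int(torch.randint(0, n_bins - width + 1, (1,)))
        out[..., start:start + width, :] = fill
    return out


# Usage: apply both masks to a dummy spectrogram (1 channel, 80 bins, 100 frames).
spec = torch.randn(1, 80, 100)
combined = freq_mask(time_mask(spec, max_width=20), max_width=10)
```

Filling with zero is one choice; filling with the spectrogram mean is another, which the fill argument allows for in this sketch.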
Note on Time Warp
The Time Warp augmentation relies on TensorFlow-specific functionality not supported in PyTorch. We implemented supporting functions for this augmentation in SparseImageWarp.ipynb. You do not need to read this notebook to use the augmentations, but the Time Warp augmentation depends on the code it exposes.
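For intuition only, the effect of a time warp can be approximated without sparse image warping. The sketch below swaps in a simpler technique: it picks a point on the time axis, moves it left or right, and linearly resamples the two segments on either side so the total length is unchanged. This is not the method used in SparseImageWarp.ipynb; the function name simple_time_warp, the parameter max_shift, and the (freq, time) input layout are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def simple_time_warp(spec, max_shift=5):
    """Piecewise-linear approximation of a time warp.

    Hypothetical helper; assumes `spec` is a float tensor of shape (freq, time).
    """
    n_steps = spec.shape[-1]
    # Too short to warp safely (or warping disabled): return an unchanged copy.
    if max_shift < 1 or n_steps <= 2 * max_shift:
        return spec.clone()

    # Pick a split point away from the edges, then shift it by less than max_shift.
    center = int(torch.randint(max_shift, n_steps - max_shift, (1,)))
    shift = int(torch.randint(-max_shift + 1, max_shift, (1,)))
    warped = center + shift

    # F.interpolate with mode="linear" expects (batch, channels, length);
    # frequency bins are treated as channels here.
    x = spec.unsqueeze(0)
    left = F.interpolate(x[..., :center], size=warped,
                         mode="linear", align_corners=False)
    right = F.interpolate(x[..., center:], size=n_steps - warped,
                          mode="linear", align_corners=False)
    return torch.cat([left, right], dim=-1).squeeze(0)


# Usage: the output keeps the input shape.
warped_spec = simple_time_warp(torch.randn(80, 100), max_shift=5)
```

Because each segment is stretched or squeezed uniformly, this produces a cruder deformation than the smooth warp the SparseImageWarp code provides.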
Let’s be friends!