iceychris/LibreASR
:speech_balloon: An On-Premises, Streaming Speech Recognition System
| | |
|---|---|
| repo name | iceychris/LibreASR |
| repo link | https://github.com/iceychris/LibreASR |
| homepage | https://news.ycombinator.com/item?id=25099847 |
| language | Python |
| size (curr.) | 2806 kB |
| stars (curr.) | 565 |
| created | 2020-11-04 |
| license | MIT License |
Example Apps
Quickstart
```
docker run -it -p 8080:8080 iceychris/libreasr:latest
```
The output looks like this:
```
make sde &
make sen &
make b
make[1]: Entering directory '/workspace'
python3 -u api-server.py de
make[1]: Entering directory '/workspace'
python3 -u api-server.py en
make[1]: Entering directory '/workspace'
python3 -u api-bridge.py
[api-bridge] running on :8080
LM: loaded.
LM: loaded.
Model and Pipeline set up.
[api-server] gRPC server running on [::]:50051 language en
Model and Pipeline set up.
[api-server] gRPC server running on [::]:50052 language de
```
If the output doesn't look like that, this issue might help.
Point your browser to http://localhost:8080/.
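To check from the command line instead, here is a minimal sketch that only assumes the bridge serves HTTP on port 8080, as shown in the startup log above:

```python
# Minimal availability check for the LibreASR web bridge.
# Assumes only that the bridge serves HTTP on localhost:8080,
# as shown in the startup log above.
import urllib.request

def bridge_is_up(url="http://localhost:8080/", timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    print("bridge reachable:", bridge_is_up())
```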
Features
- RNN-T network
- Dynamic Quantization (see the sketch below)
- english, german
- french, spanish, italian, multilingual
- Tuned language model fusion
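The Dynamic Quantization feature typically refers to PyTorch's dynamic quantization API. A minimal sketch of the technique (the toy model and the set of quantized module types are assumptions, not LibreASR's actual network):

```python
# Minimal dynamic-quantization sketch using PyTorch's built-in API.
# The toy model and the module types quantized here are assumptions;
# they are not LibreASR's actual network definition.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(80, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)

quantized = torch.quantization.quantize_dynamic(
    model,                 # the float model
    {nn.Linear, nn.LSTM},  # module types to replace with quantized versions
    dtype=torch.qint8,     # 8-bit integer weights
)

print(quantized)
```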
Performance
Model | Dataset | Network | Params | CER (dev) | WER (dev) |
---|---|---|---|---|---|
english | 1400h | 6-2-1024 | 70M | 18.9 | 23.8 |
german | 800h | 6-2-1024 | 70M | 23.2 | 37.6 |
While these results are clearly not SotA, training the models for longer and on multiple GPUs (instead of a single 2080 Ti) would likely yield better results.
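For context, CER and WER are the character- and word-level edit distances between hypothesis and reference, normalized by reference length. A minimal sketch of the standard computation (not LibreASR's own evaluation code):

```python
# Minimal CER/WER sketch: Levenshtein distance normalized by reference length.
# This is the standard definition, not LibreASR's evaluation code.
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over token sequences.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # delete a reference token
                dp[j - 1] + 1,    # insert a hypothesis token
                prev + (r != h),  # substitute (free if tokens match)
            )
    return dp[-1]

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("the cat sat", "the cat sit"))  # ~0.33
print(cer("the cat sat", "the cat sit"))  # ~0.09
```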
See releases for pretrained models.
Training
RNN-T Model
- get some audio data with transcriptions (e.g. LibriSpeech, Common Voice, …)
- edit create-asr-dataset.py if you use a custom dataset
- process each of your datasets using create-asr-dataset.py, e.g.:

  ```
  python3 create-asr-dataset.py /data/common-voice-english common-voice --lang en --workers 4
  ```

  This results in multiple asr-dataset.csv files, which will be used for training.
- edit the configuration testing.yaml to point to your data, choose transforms and tweak other settings
- adjust and run libreasr.ipynb to start training
- watch the training progress in TensorBoard
- the model with the best validation loss is saved to models/model.pth; the model with the best WER ends up in models/best_wer.pth (see the loading sketch below)
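Loading one of these checkpoints back for inspection is straightforward with PyTorch. A minimal sketch, assuming models/best_wer.pth is an ordinary torch checkpoint (its exact contents depend on your training configuration):

```python
# Minimal checkpoint-loading sketch. Assumes models/best_wer.pth is an
# ordinary PyTorch checkpoint; whether it holds a state dict or a full
# model object depends on your training setup.
import torch

ckpt = torch.load("models/best_wer.pth", map_location="cpu")

# Inspect what was saved before trying to restore it into a model.
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys())[:10])
else:
    print("checkpoint type:", type(ckpt))
```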
Language Model
See this Colab notebook, or use this notebook.
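Language model fusion is commonly implemented as shallow fusion: the RNN-T's log-probabilities are interpolated with the language model's during decoding. A generic sketch of the score combination (a common formulation, not necessarily LibreASR's exact implementation; the weight alpha is illustrative):

```python
# Generic shallow-fusion sketch: combine acoustic (RNN-T) and language-model
# log-probabilities at each decoding step. This is the common textbook
# formulation, not code taken from LibreASR; `alpha` is an illustrative weight.
import torch

def fused_log_probs(asr_log_probs, lm_log_probs, alpha=0.3):
    """Return fused scores over the vocabulary for one decoding step."""
    return asr_log_probs + alpha * lm_log_probs

# Toy example with a 5-token vocabulary.
asr = torch.log_softmax(torch.randn(5), dim=-1)
lm = torch.log_softmax(torch.randn(5), dim=-1)
best_token = torch.argmax(fused_log_probs(asr, lm)).item()
print("chosen token id:", best_token)
```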
Contributing
Feel free to open an issue, create a pull request or join the Discord.
You may also contribute by training a large model for longer.