iceychris/LibreASR
:speech_balloon: An On-Premises, Streaming Speech Recognition System
| | |
|---|---|
| repo name | iceychris/LibreASR |
| repo link | https://github.com/iceychris/LibreASR |
| homepage | https://news.ycombinator.com/item?id=25099847 |
| language | Python |
| size (curr.) | 2806 kB |
| stars (curr.) | 565 |
| created | 2020-11-04 |
| license | MIT License |
Example Apps
Quickstart
```
docker run -it -p 8080:8080 iceychris/libreasr:latest
```
The output looks like this:
```
make sde &
make sen &
make b
make[1]: Entering directory '/workspace'
python3 -u api-server.py de
make[1]: Entering directory '/workspace'
python3 -u api-server.py en
make[1]: Entering directory '/workspace'
python3 -u api-bridge.py
[api-bridge] running on :8080
LM: loaded.
LM: loaded.
Model and Pipeline set up.
[api-server] gRPC server running on [::]:50051 language en
Model and Pipeline set up.
[api-server] gRPC server running on [::]:50052 language de
```
If the output doesn't look like that, this issue might help.
Point your browser to http://localhost:8080/.
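To check from the command line instead, here is a minimal sketch that only assumes the bridge serves HTTP on port 8080, as shown in the startup log above:

```python
# Minimal availability check for the LibreASR web bridge.
# Assumes only that the bridge serves HTTP on localhost:8080,
# as shown in the startup log above.
import urllib.request

def bridge_is_up(url="http://localhost:8080/", timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    print("bridge reachable:", bridge_is_up())
```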
Features
- RNN-T network
- Dynamic Quantization (see the sketch below)
- english, german
- french, spanish, italian, multilingual
- Tuned language model fusion
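The Dynamic Quantization feature typically refers to PyTorch's dynamic quantization API. A minimal sketch of the technique (the toy model and the set of quantized module types are assumptions, not LibreASR's actual network):

```python
# Minimal dynamic-quantization sketch using PyTorch's built-in API.
# The toy model and the module types quantized here are assumptions;
# they are not LibreASR's actual network definition.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(80, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)

quantized = torch.quantization.quantize_dynamic(
    model,                 # the float model
    {nn.Linear, nn.LSTM},  # module types to replace with quantized versions
    dtype=torch.qint8,     # 8-bit integer weights
)

print(quantized)
```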
Performance
Model | Dataset | Network | Params | CER (dev) | WER (dev) |
---|---|---|---|---|---|
english | 1400h | 6-2-1024 | 70M | 18.9 | 23.8 |
german | 800h | 6-2-1024 | 70M | 23.2 | 37.6 |
While these results are clearly not SotA, training the models for longer and on multiple GPUs (instead of a single 2080 Ti) would likely yield better results.
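For context, CER and WER are the character- and word-level edit distances between hypothesis and reference, normalized by reference length. A minimal sketch of the standard computation (not LibreASR's own evaluation code):

```python
# Minimal CER/WER sketch: Levenshtein distance normalized by reference length.
# This is the standard definition, not LibreASR's evaluation code.
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over token sequences.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # delete a reference token
                dp[j - 1] + 1,    # insert a hypothesis token
                prev + (r != h),  # substitute (free if tokens match)
            )
    return dp[-1]

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("the cat sat", "the cat sit"))  # ~0.33
print(cer("the cat sat", "the cat sit"))  # ~0.09
```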
See releases for pretrained models.
Training
RNN-T Model
- get some audio data with transcriptions (e.g. LibriSpeech, Common Voice, …)
- edit create-asr-dataset.py if you use a custom dataset
- process each of your datasets using create-asr-dataset.py, e.g.:

  ```
  python3 create-asr-dataset.py /data/common-voice-english common-voice --lang en --workers 4
  ```

  This results in multiple asr-dataset.csv files, which will be used for training.
- edit the configuration testing.yaml to point to your data, choose transforms and tweak other settings
- adjust and run libreasr.ipynb to start training
- watch the training progress in TensorBoard
- the model with the best validation loss is saved to models/model.pth; the model with the best WER ends up in models/best_wer.pth (see the loading sketch below)
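Loading one of these checkpoints back for inspection is straightforward with PyTorch. A minimal sketch, assuming models/best_wer.pth is an ordinary torch checkpoint (its exact contents depend on your training configuration):

```python
# Minimal checkpoint-loading sketch. Assumes models/best_wer.pth is an
# ordinary PyTorch checkpoint; whether it holds a state dict or a full
# model object depends on your training setup.
import torch

ckpt = torch.load("models/best_wer.pth", map_location="cpu")

# Inspect what was saved before trying to restore it into a model.
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys())[:10])
else:
    print("checkpoint type:", type(ckpt))
```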
Language Model
See this Colab notebook, or use this notebook.
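Language model fusion is commonly implemented as shallow fusion: the RNN-T's log-probabilities are interpolated with the language model's during decoding. A generic sketch of the score combination (a common formulation, not necessarily LibreASR's exact implementation; the weight alpha is illustrative):

```python
# Generic shallow-fusion sketch: combine acoustic (RNN-T) and language-model
# log-probabilities at each decoding step. This is the common textbook
# formulation, not code taken from LibreASR; `alpha` is an illustrative weight.
import torch

def fused_log_probs(asr_log_probs, lm_log_probs, alpha=0.3):
    """Return fused scores over the vocabulary for one decoding step."""
    return asr_log_probs + alpha * lm_log_probs

# Toy example with a 5-token vocabulary.
asr = torch.log_softmax(torch.randn(5), dim=-1)
lm = torch.log_softmax(torch.randn(5), dim=-1)
best_token = torch.argmax(fused_log_probs(asr, lm)).item()
print("chosen token id:", best_token)
```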
Contributing
Feel free to open an issue, create a pull request or join the Discord.
You may also contribute by training a large model for longer.