Linux/Unix
Product Overview
This container lets you quickly deploy a pretrained English-language transcription model.
You can send WAV files at any sample rate; they will be converted to 32 kHz. Metadata is output as JSON.
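Because input is resampled regardless of its original rate, it can be useful to check what a file actually contains before sending it. The following is a minimal sketch using only Python's standard library. The helper names `make_sine_wav` and `describe_wav` are hypothetical and not part of the container. How files are submitted and the layout of the returned JSON are not specified here, so the sketch covers only the local inspection step.

```python
import io
import math
import struct
import wave


def make_sine_wav(sample_rate: int, seconds: float = 0.5, freq: float = 440.0) -> bytes:
    """Build a mono 16-bit PCM WAV in memory (a stand-in for a real recording)."""
    n_frames = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read the WAV header and report the properties of the audio."""
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
        }


if __name__ == "__main__":
    # Two inputs at different rates; both would be accepted and resampled.
    for rate in (8000, 44100):
        print(describe_wav(make_sine_wav(rate)))
```

For a file on disk, the same check works by passing the file's bytes, for example `describe_wav(open("sample.wav", "rb").read())`, where `sample.wav` is a placeholder path.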
This container uses Kaldi, an open-source toolkit for speech recognition and signal processing, written in C++ and freely available under the Apache License v2.0.
For training data, it uses LibriSpeech, a corpus of approximately 1000 hours of read English speech sampled at 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from audiobooks read for the LibriVox project and has been carefully segmented and aligned.