CloudASR is a software platform and a public ASR webservice.
Its three strong features are:
- easy deployment
Platform’s API supports both batch and incremental speech recognition. The batch version is compatible with Google Speech API. New ASR engines, also multilingual, will be added onto the platform and work simultaneously.
Available in 5 languages:
The ASR engine is based on the Kaldi toolkit (https://github.com/kaldi-asr/kaldi)
The acoustic models are trained on data coming from CommonVoice and Euronews transcriptions,
using a standard chain recipe based on lattice-free maximum mutual information (LF-MMI) optimisation criterion.
In order to be more robust against possible variations in the speaking rate of the speakers, the usual data augmentation technique for the models has been expanded, generating additional time-stretched versions of the original signals.
The language models are derived from Internet news, collected from about 2000 to 2020, and from a Wikipedia dump.
We evaluate our web-based speech recognition system on two benchmarks:
- a standard dataset for general purpose speech recognition: Librispeech (test-clean)
- an intenal dataset targeting a particular under-represented domani (dentist) for the SmarTerp project
2 Modalities: real time and batch
Real time recognition via web browser:
Offilne file processing via web browser interface or via command-line:
falavi at fbk dot eu