CloudASR is a software platform and a public ASR webservice.

Its three strong features are:

  • scalability
  • customizability
  • easy deployment

Platform’s API supports both batch and incremental speech recognition. The batch version is compatible with Google Speech API. New ASR engines, also multilingual, will be added onto the platform and work simultaneously.

Available in 5 languages:

  • Italian   
  • English 
  • French   
  • Spanish
  • German

Technical info

The ASR engine is based on the Kaldi toolkit (https://github.com/kaldi-asr/kaldi)

The acoustic models are trained on data coming from CommonVoice and Euronews transcriptions, 

using a standard chain recipe based on lattice-free maximum mutual information (LF-MMI) optimisation criterion.

In order to be more robust against possible variations in the speaking rate of the speakers, the usual data augmentation technique for the models has been expanded, generating additional time-stretched versions of the original signals.

The language models are derived from Internet news, collected from about 2000 to 2020, and from a Wikipedia dump.

 Benchmarks

We evaluate our web-based speech recognition system on two benchmarks:

  • a standard dataset for general purpose speech recognition: Librispeech (test-clean)
  • an intenal dataset targeting a particular under-represented domani (dentist) for the SmarTerp project
WER LibriSpeech SmarTerp
English 9.25% 23.86%
Spanish 22.60%
Italian 15.14%

 

2 Modalities: real time and batch

Real time recognition via web browser:

Offilne file processing via web browser interface or via command-line:

Contacts
falavi at fbk dot eu