Starting from the publicly available VGG-ish feature extractor combined with a single-layer GRU classifier (which consists of 70M parameters), we use the student-teacher approach to compress the network to 20K parameters, reducing the classification accuracy on UrbanSound8k only from 75% to 70%.
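
As an illustration of the student-teacher idea, the sketch below computes a distillation loss for a single example. It assumes the widely used soft-target formulation (temperature-scaled KL divergence plus hard-label cross-entropy); the text above does not state which loss was actually used, and the names `softmax`, `distillation_loss`, `temperature` and `alpha` are hypothetical, chosen only for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target KL divergence and hard-label cross-entropy.

    Assumed formulation for illustration, not necessarily the one used
    to obtain the results quoted above.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy against the ground-truth class at temperature 1
    ce = -math.log(softmax(student_logits)[label])
    # Scaling by T^2 keeps the soft-target term comparable to the hard-label term
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


# Toy 3-class example: a student that agrees with the teacher gets a lower loss.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss([1.0, 2.0, 3.0], teacher, label=2))
print(distillation_loss([3.0, 2.0, 1.0], teacher, label=2))
```

In a real training loop the teacher logits would come from the large frozen network and the student logits from the small network being trained; `alpha` trades off imitation of the teacher against fitting the ground-truth labels.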


CloudASR is a cloud-based speech recognition service available in 5 languages. The tool offers state-of-the-art performance on standard benchmarks. The recognition engine is highly customizable and easy to deploy, offering a very effective solution when speech recognition has to be applied in specific domains.

CAV3D Dataset

The CAV3D (Co-located Audio-Visual streams with 3D tracks) dataset was collected for 3D speaker tracking with a sensing platform consisting of a monocular colour camera co-located with an 8-element circular microphone array.

The related AV3T code is also available on GitHub.


Several datasets were collected during the DIRHA project, whose goal was to develop a speech-enabled automation system for smart home appliances. Some of these datasets have been made publicly available. Details are available below.


Speech recognition technology applied to education to evaluate the reading ability of primary school children.