Starting from the publicly available VGG-ish feature extraction combined with a single-layer GRU classifier (which consists of 70M parameters), and properly using the student-teacher approach, we are able to compress the network to 20K parameters while reducing the classification accuracy only from 75% to 70% on UrbanSound8k.
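The description above does not state which loss was used to train the student. As a rough illustration only, a common formulation of the student-teacher idea mixes a soft term (KL divergence between temperature-softened teacher and student outputs) with the usual cross-entropy on the true label. The function names, the temperature, the mixing weight `alpha` and the toy logits below are all assumptions made for this sketch, not details of the actual system.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical student-teacher loss for one example (assumed formulation).

    alpha weights the soft (teacher-matching) term; 1 - alpha weights the
    hard cross-entropy term on the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    # Hard term: cross-entropy of the unsoftened student output.
    hard = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard


# Toy 3-class example: the student roughly agrees with the teacher.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.2, 0.3]
loss = distillation_loss(student, teacher, true_label=0)
```

In a sketch like this, the small student is trained to minimise the loss averaged over the training set while the large teacher stays fixed; how the temperature and `alpha` are chosen is left open here.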
CloudASR is a cloud-based speech recognition service available in 5 languages. The tool offers state-of-the-art performance on standard benchmarks. The recognition engine is highly customizable and easy to deploy, offering a very effective solution when speech recognition has to be applied in specific domains.
CAV3D Dataset
The CAV3D (Co-located Audio-Visual streams with 3D tracks) dataset was collected for 3D speaker tracking with a sensing platform consisting of a monocular colour camera co-located with an 8-element circular microphone array.
The related AV3T code is also available here: AV3T GitHub
Several datasets were collected during the DIRHA project, whose goal was to develop a speech-enabled automation system for smart home appliances (https://dirha.fbk.eu). Some of these datasets have been made publicly available. Details are available below.