SED@EDGE
Starting from the publicly available VGGish feature extractor combined with a single-layer GRU classifier (70M parameters in total), we use a student-teacher approach to compress the network to 20K parameters, while classification accuracy on UrbanSound8K drops only from 75% to 70%.
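In a student-teacher (knowledge distillation) setup, a small student network is trained to match both the ground-truth labels and the softened output distribution of the large teacher. For scale, going from 70M to 20K parameters is roughly a 3,500-fold reduction. The following is a minimal sketch of a generic Hinton-style distillation loss in NumPy. It is an illustration under stated assumptions, not the project's actual training code: the function names and the `temperature` and `alpha` defaults are hypothetical, and the 10-class shapes in the usage lines simply mirror the 10 classes of UrbanSound8K.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max to avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Generic knowledge-distillation loss (hypothetical sketch, not the project's code).

    student_logits, teacher_logits: arrays of shape (batch, num_classes)
    labels: integer class indices of shape (batch,)
    temperature, alpha: assumed hyperparameters chosen for illustration only
    """
    eps = 1e-12
    # Soft-target term: KL(teacher || student), both softened by the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student_soft + eps)), axis=-1
    )
    # Scale by T^2 so the soft-target term keeps a comparable magnitude as T changes.
    soft_loss = (temperature ** 2) * kl.mean()
    # Hard-target term: ordinary cross-entropy against the true labels at T = 1.
    p_student = softmax(student_logits, 1.0)
    hard_loss = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()
    # alpha weights the hard labels, (1 - alpha) weights the teacher's soft targets.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 10))  # stand-in teacher logits, 10 classes
    student = rng.normal(size=(4, 10))  # stand-in student logits
    labels = np.array([0, 3, 7, 9])
    print(distillation_loss(student, teacher, labels))
```

The `temperature` softens the teacher's distribution so the student also learns how the teacher ranks the wrong classes; `alpha` trades this off against the plain label loss. In practice both would be tuned on a validation split.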
CLOUD ASR
CloudASR is a cloud-based speech recognition service available in five languages. The tool offers state-of-the-art performance on standard benchmarks. The recognition engine is highly customizable and easy to deploy, which makes it well suited to applications where speech recognition must be adapted to a specific domain.
CAV3D DATASET
The CAV3D (Co-located Audio-Visual streams with 3D tracks) dataset was collected for 3D speaker tracking with a sensing platform consisting of a monocular colour camera co-located with an 8-element circular microphone array.
The related AV3T code is also available on GitHub.
DIRHA DATA
Several datasets were collected during the DIRHA project, whose goal was to develop a speech-enabled automation system for smart home appliances (https://dirha.fbk.eu). Some of these datasets have been made publicly available. Details are available below.
READING ASSESSMENT
Speech recognition technology applied to education to evaluate the reading ability of primary school children.