MARVEL delivers a disruptive Edge-to-Fog-to-Cloud ubiquitous computing framework that enables multi-modal perception and intelligence for audio-visual scene recognition and event detection in a smart city environment.

The goal of the project is to develop and validate advanced algorithms, with special attention to Machine Learning based solutions, for the following tasks: voice activity detection, speech enhancement, and speaker diarization. The application case study focuses on call-center communications: processing the available single-channel recording should separate the vocal contributions of the operator and the customer, and correctly detect the temporal boundaries within which each speaker is active. The project is partially funded by Fondazione CariTro and involves as partners PerVoice spa and Università Politecnica delle Marche.

Other Projects

  • AUDIO VISUAL SCENE ANALYSIS – This project is part of a wider cooperation between FBK-ICT and the Centre for Intelligent Sensing of Queen Mary University of London. The research activities focus on advanced solutions for audio-visual processing, using heterogeneous devices in challenging and unconstrained environments. The two institutions have allocated two joint PhD grants to these tasks.
  • CITY SENSING – One of the goals of the Smart Cities and Communities high-impact initiative is to help administrators and citizens understand their city and how it evolves. To this end, the research line is developing pervasive, collaborative, multi-source, multi-level monitoring of the city. In particular, at SpeechTek we are working on neural solutions for the detection and classification of acoustic events in public open spaces.

Previous Projects

  • SMARTERP – The goal of SmarTerp is to reduce inefficiencies in interpreting by developing a set of AI-powered tools embedded in a Remote Simultaneous Interpreting system. The tools automate the human task of extracting information in real time, to prevent the mistakes and loss of quality that result from the adoption of remote technologies.
  • EIT CONVERSATIONAL BANKING – The project aims to develop conversational agents that interact, by voice and text, with users asking for financial information. To this end, SpeechTek will develop ASR systems, in English and Hungarian, capable of dynamically activating the proper language models for human-machine interaction.
  • IPRASE – The goal is to automatically estimate the English and German language proficiency of native Italian-speaking students in Trentino.
  • PERVOICE-SD – This project investigates speaker diarization solutions based on DNN embedded representations of speaker identities. PerVoice has partially funded this project with a post-doc grant.
  • Smart Subtitling and Dubbing System (SSDS) – The goal of the project is to develop solutions for the automatic translation and dubbing of TV products into different languages. In tight collaboration with the HLT-MT research unit, SpeechTek's efforts will focus on extracting audio features that can improve the quality of both the translation and the dubbing. The project, partially funded by Regione Lazio, is led by the Italian companies Translated and Sedif.