Brutti Home Page
On this page you can find more details on my current and former research activities in audio signal processing.
- Localization of acoustic sources in multi-microphone environments:
Compact microphone arrays and distributed microphone networks.
Sound source localization and tracking: acoustic maps such as the Global Coherence Field (GCF), multi-source scenarios and generative Bayesian approaches.
Estimation of source orientation based on the Oriented Global Coherence Field (OGCF).
BSS-based tracking of multiple sources.
Environment-aware processing: position and orientation estimation and characterization of the emission pattern.
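As a rough illustration of the acoustic-map idea listed above, the sketch below builds a GCF-style map: for each candidate position it averages, over all microphone pairs, the GCC-PHAT value at the time delay that position would produce. The function names, the 343 m/s default speed of sound and the nearest-sample rounding of the delays are assumptions made for this example; it is not the implementation used in the work described here.

```python
from itertools import combinations

import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC-PHAT of two equal-length signals for lags -max_lag..max_lag (in samples).

    A positive peak lag means x1 is delayed with respect to x2.
    """
    n = 2 * len(x1)                      # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x1, n) * np.conj(np.fft.rfft(x2, n))
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n)
    # Reorder so that index 0 is lag -max_lag and index max_lag is lag 0.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gcf_map(signals, mics, grid, fs, c=343.0):
    """Average GCC-PHAT coherence at each candidate position in `grid`.

    signals: list of equal-length 1-D arrays, one per microphone
    mics:    (M, 3) microphone coordinates in metres
    grid:    (P, 3) candidate source coordinates in metres
    """
    mics = np.asarray(mics, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # The largest possible delay between two microphones bounds the lag range.
    spacing = np.linalg.norm(mics[:, None] - mics[None], axis=-1).max()
    max_lag = int(np.ceil(spacing / c * fs)) + 1
    pairs = list(combinations(range(len(mics)), 2))
    acc = np.zeros(len(grid))
    for i, j in pairs:
        cc = gcc_phat(signals[i], signals[j], max_lag)
        d_i = np.linalg.norm(grid - mics[i], axis=1)
        d_j = np.linalg.norm(grid - mics[j], axis=1)
        # Expected delay of mic i relative to mic j, rounded to whole samples.
        lags = np.rint((d_i - d_j) / c * fs).astype(int)
        acc += cc[lags + max_lag]
    return acc / len(pairs)
```

The source estimate is then the grid point with the largest map value, for example `grid[np.argmax(gcf_map(signals, mics, grid, fs))]`. Rounding the delays to whole samples keeps the sketch short; interpolating the GCC-PHAT function would be the natural refinement.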
Check out some demos on our YouTube channel:
- Speaker identification and verification:
The speaker recognition problem has been addressed with a particular focus on reverberant distant speech, tackling reverberation through model adaptation and by combining multiple distributed microphones. Currently, I am working on speaker diarization, also considering telephone speech, implementing state-of-the-art approaches based on deep learning (e.g. speaker embeddings, speaker2vec).
- Audio-Video people tracking:
This activity was conducted in cooperation with the TEV research unit. The goal was to track multiple subjects in an environment equipped with multiple distributed microphones and cameras.
Audio and video information is combined at the likelihood level in a generative Bayesian framework to track the position and the head pose of multiple targets. This way we can substantially improve on the robustness of the single modalities.
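To make "combined at the likelihood level" concrete, here is a minimal sketch of one common way to do it: each modality scores every candidate state (for example a particle or a grid cell), and the scores are multiplied, which in the log domain is a sum. The function name and the assumption that the two modalities are conditionally independent given the state are choices made for this example, not a description of the actual tracker.

```python
import numpy as np


def fuse_likelihoods(log_lik_audio, log_lik_video, log_prior=None):
    """Posterior weights over candidate states from per-modality log-likelihoods.

    Assumes audio and video observations are conditionally independent
    given the state, so their likelihoods multiply (log-likelihoods add).
    """
    log_post = np.asarray(log_lik_audio, dtype=float) + np.asarray(log_lik_video, dtype=float)
    if log_prior is not None:
        log_post = log_post + np.asarray(log_prior, dtype=float)
    log_post = log_post - log_post.max()   # stabilise before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()
```

A state that only one modality supports ends up with a low fused weight, while a state that both support is reinforced; this is how the combination can be more robust than either modality alone.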
A couple of video clips are available on our YouTube channel:
Currently, in collaboration with Queen Mary University London, we are investigating similar paradigms to achieve 3D localization of a person using a single camera co-located with a compact microphone array. NEW DATASET WITH CO-LOCATED SENSORS AVAILABLE HERE.
- Audio-Video Person Identification:
Recently, in collaboration with Queen Mary University London, I have been investigating the person recognition problem with multi-modal (audio and video) solutions for person-centered scenarios (e.g. using wearable devices). The main focus was on unsupervised on-line adaptation of the target models.
- Speech and audio digital signal processing:
Covering a large variety of topics, in particular: activity detection, speech enhancement for ASR, and event classification.
- DIRHA: development of the multi-room, multi-microphone front-end. demonstrator
- SCENIC: environment-aware localization of multiple sources and estimation of the source emission pattern
- DICIT: source localization
- Visiting researcher at Queen Mary University London during summer 2015
- PhD committee at Vrije Universiteit Brussel
- PhD committee at Tampere University of Technology
- PhD committee at University of Alcalá