Alessio Brutti

Head of Unit

E-mail: brutti@fbk.eu
Website: https://speechtek.fbk.eu/alessio-brutti-research-interests
Phone: 0461314529

Short bio

Alessio Brutti is a tenured researcher at FBK and currently heads the SpeechTek research unit of the Centre for Augmented Intelligence. He graduated in Telecommunication Engineering from the University of Padova in 2001 and in 2003 joined FBK-irst as a member of the SHINE research group. In 2007 he completed his PhD in Computer Science at the University of Trento. He has recently collaborated with the University of Bolzano on the master's degree in Computational Linguistics, and he currently collaborates with the Faculty of Science of the University of Padova, where he teaches the master's course Speech Technologies.
His research interests focus mainly on machine learning approaches for language technologies and speech processing across a variety of application domains: speech recognition, speaker verification and recognition, speech enhancement, source localization, and sound event detection. In particular, his research addresses the development of efficient, compact, and dynamic neural models and their application in low-resource and multilingual settings. More recently, he has been investigating multimodal signal processing for audio-visual person tracking and biometrics. In 2015 he was a visiting researcher at Queen Mary University of London (Centre for Intelligent Sensing).

Main publications

Papers (a more exhaustive list is available on Google Scholar)

M. Nabih et al., "Fed-SpeechLLM: Federated Learning Speech Language Models for Multilingual ASR", Interspeech 2026

S. Fong et al., "Towards Enabling Multilingual Multitask SpeechLLMs in Data-Scarce Settings", Interspeech 2026

S. Sepanta, "From Game-Based Annotation to Representation Probing: Cross-Validated Prosodic Speech and Privacy Implications", Interspeech 2026

S. Sepanta, A. Brutti, "Sensitive Speaker Attribute Leakage in Speech–LLM Pipelines", Odyssey 2026

M.N. Ali et al., "SpeechLLM Meets Federated Learning for End-to-End ASR: English and Italian Case Studies", FLICS 2026

Concina et al., "Scalable Expansion of Multilingual Speech LLMs for ASR: a Continual Learning Approach", Speakable Workshop at LREC 2026

Gretter et al., "Phonetic-based Ranking for Improved Pseudo-Labeling in Low-Resource ASR", LREC 2026

Abdul Hannan et al., "Distillation based Layer Dropping (DLD): Effective end-to-end framework for dynamic speech networks", ICASSP 2026

Xinyuan Qian et al., "EgoAdapt: Enhancing Robustness in Egocentric Interactive Speaker Detection Under Missing Modalities", ACM Transactions on Multimedia Computing, Communications, and Applications, 2026

M.N. Ali, D. Falavigna, A. Brutti, "Federating dynamic models using early-exit architectures for automatic speech recognition on heterogeneous clients", Progress in Artificial Intelligence, 2025

Sara Papi et al., "FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian", CLiC-it 2025

Marco Gaido et al., "The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence", IWSLT 2025

Abdul Hannan, Alessio Brutti, Daniele Falavigna, "Input Conditioned Layer Dropping in Speech Foundation Models", MLSP 2025

Maxence Lasbordes, Daniele Falavigna, Alessio Brutti, "Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices", EUSIPCO 2025

Umberto Cappellazzo, Minsu Kim, Stavros Petridis, Daniele Falavigna, Alessio Brutti, "Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach", Interspeech 2025

Nithin Rao Koluguri et al., "Granary: Speech Recognition and Translation Dataset in 25 European Languages", Interspeech 2025

Seraphina Fong, Marco Matassoni, "Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages", Interspeech 2025

Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman, "An Effective Training Framework for Light-Weight Automatic Speech Recognition Models", Interspeech 2025

Darline Marx, Marco Matassoni, Alessio Brutti, "Automatic detection of speech sound disorders in German speaking children: augmenting the data with typically developed speech", Interspeech 2025

Umberto Cappellazzo et al., "Large Language Models Are Strong Audio-Visual Speech Recognition Learners", ICASSP 2025

Mohamed Nabih Ali, Daniele Falavigna, Alessio Brutti, "EFL-PEFT: A communication Efficient Federated Learning framework using PEFT sparsification for ASR", ICASSP 2025

Marco Gaido et al., "MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages", EMNLP 2024 (main conference)

Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers", MLSP 2024

Abdul Hannan, Alessio Brutti, Daniele Falavigna, "LDASR: An Experimental Study on Layer Drop using Conformer-based Architecture", EUSIPCO 2024

Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters", Interspeech 2024

Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj, "Continual Contrastive Spoken Language Understanding", Findings of ACL (long paper), 2024

G. Morrone et al., "End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations", Speech Communication, May 2024

M. Matassoni, S. Fong, A. Brutti, "Speaker Anonymization: Disentangling Speaker Features from Pre-Trained Speech Embeddings for Voice Conversion", Applied Sciences, May 2024

George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti, "Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch", IEEE ICASSP 2024 Workshop on Self-supervision in Audio, Speech and Beyond (SASB)

Mohamed Nabih Ali, Daniele Falavigna, Alessio Brutti, "Fed-EE: Federating Heterogeneous ASR Models using Early-Exit Architectures", 3rd NeurIPS Workshop on Efficient Natural Language and Speech Processing, 2024

L. Serafini et al., "An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings", Computer Speech and Language, 2023

M. Nabih Ali, D. Falavigna, A. Brutti, "Direct Enhancement of Pre-trained Speech Embeddings for Speech Processing In Noisy Conditions", Computer Speech and Language, 2023

U. Cappellazzo, D. Falavigna, A. Brutti, "An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding", Interspeech 2023

U. Cappellazzo, M. Yang, D. Falavigna, A. Brutti, "Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding", Interspeech 2023

Xinyuan Qian et al., "Speaker Front-back Disambiguity using Multi-channel Speech Signals", Electronics Letters, 2022

G. Morrone et al., "Low-Latency Speech Separation Guided Diarization for Telephone Conversations", SLT 2022.

M. Costante, M. Matassoni, A. Brutti, "Using seq2seq voice conversion with pre-trained representations for audio anonymization: experimental insights", MAVD Workshop (IEEE Smart City Conference), 2022.

A. Brutti, F. Paissan, A. Ancilotto, E. Farella, "Optimizing PhiNet architectures for the detection of urban sounds on low-end devices", EUSIPCO 2022.

L. Zanella et al., "Responsible AI at the edge: towards privacy-preserving smart cities", Ital-IA 2022, conference of the CINI-AIIS National Laboratory.

I. Martín-Morató et al., "Low-complexity acoustic scene classification in DCASE 2022 Challenge", DCASE 2022.

M. Nabih Ali, D. Falavigna, A. Brutti, "Enhancing Embeddings for Speech Classification in Noisy Conditions", Interspeech 2022.

V. Rajan, A. Brutti, A. Cavallaro, "Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?", ICASSP 2022

F. Paissan, A. Ancilotto, A. Brutti, E. Farella, "Scalable Neural Architectures for End-to-End Environmental Sound Classification", ICASSP 2022

E. T. Mekonnen, A. Brutti, D. Falavigna, "End-to-End Low Resource Keyword Spotting through Character Recognition and Beam-Search Re-Scoring", ICASSP 2022

G. Morrone, S. Cornell, E. Zovato, A. Brutti, S. Squartini, "Conversational Speech Separation: an Evaluation Study for Streaming Applications", 152nd AES Convention, 2022

M. Nabih Ali, D. Falavigna, A. Brutti, "Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models", Sensors, Special Issue "Artificial Intelligence Based Audio Signal Processing", 2022

X. Qian, A. Brutti, O. Lanz, M. Omologo, A. Cavallaro, "Audio-visual tracking of concurrent speakers", IEEE Transactions on Multimedia, 2021

D. Bajovic et al., "MARVEL: Multimodal extreme scale data analytics for smart cities environments", International Balkan Conference on Communications and Networking, 2021

S. Cornell, A. Brutti, M. Matassoni, S. Squartini, "Learning to Rank Microphones for Distant Speech Recognition", Interspeech 2021

Veronica Juliana Schmalz and Alessio Brutti, "Automatic assessment of English CEFR levels using BERT embeddings", CLIC-it 2021

Mohamed Nabih Ali, Alessio Brutti, Veronica Schmalz, Daniele Falavigna, "A Speech Enhancement Front-End for Intent Classification in Noisy Environments", EUSIPCO 2021

V. Rajan, A. Brutti, A. Cavallaro, "Robust Latent Representations via Cross-Modal Translation and Alignment", ICASSP 2021 https://arxiv.org/abs/2011.01631

G. Cerutti, R. Prasad, A. Brutti, E. Farella, "Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms", IEEE Journal of Selected Topics in Signal Processing, 2020

E. Fini and A. Brutti. "Supervised online diarization with sample mean loss for multi-domain data", ICASSP 2020 https://arxiv.org/pdf/1911.01266

V. Rajan, A. Brutti, A. Cavallaro, "ConflictNET: End-to-End Learning for Speech-based Conflict Intensity Estimation", IEEE Signal Processing Letters, 2019.

X. Qian, A. Brutti, O. Lanz, M. Omologo, A. Cavallaro, "Multi-speaker tracking from an audio-visual sensing device", IEEE Transactions on Multimedia, 2019

G. Cerutti, R. Prasad, A. Brutti, E. Farella, "Neural Network Distillation on IoT Platforms for Sound Event Detection", Interspeech 2019

O. Lanz, A. Brutti, A. Xompero, X. Qian, M. Omologo, A. Cavallaro, "Accurate target annotation in 3D from multimodal streams", ICASSP, 2019

A. Cavallaro and A. Brutti, "Chapter 5: Audio-visual learning for body-worn cameras", in the edited book "Multimodal Behaviour Analysis in the Wild", edited by X. Alameda-Pineda, N. Sebe, E. Ricci, Elsevier, 2018.

P. Pertilä, A. Brutti, P. Svaizer, and M. Omologo, "Multichannel Source Activity Detection, Localization, and Tracking", in the edited book "Audio Source Separation and Speech Enhancement", edited by E. Vincent, T. Virtanen, S. Gannot, Wiley, 2018.

X. Qian, A. Xompero, A. Brutti, O. Lanz, M. Omologo, A. Cavallaro, "3D Mouth Tracking from a Compact Microphone Array Co-Located with a Camera", ICASSP, 2018

A. Brutti, A. Cavallaro, "Unsupervised cross-modal deep-model adaptation for audio-visual re-identification with wearable cameras", ICCV Workshop CVAVM, 2017

M. Matassoni, A. Brutti, D. Falavigna, "Optimizing DNN adaptation for recognition of enhanced speech", Interspeech 2017

X. Qian, A. Brutti, M. Omologo, A. Cavallaro, "3D Audio-visual Speaker Tracking with an Adaptive Particle Filter", ICASSP 2017

A. Brutti, A. Cavallaro, "On-line cross-modal adaptation for audio-visual person identification with wearable cameras", IEEE Transactions on Human-Machine Systems, 2016

P. Pertilä, A. Brutti, "Increasing the environment-awareness of rake beamforming for directive acoustic sources", IWAENC 2016

A. Brutti, A. Tsiami, N. Katsamanis, P. Maragos, "A Phase-Based Time-Frequency masking for multi-channel speech enhancement in domestic environments", Interspeech, 2016

A. Brutti, A. Abad, "Multi-channel i-vector combination for robust speaker verification in multi-room domestic environments", Speaker Odyssey, 2016

A. Brutti, M. Matassoni, "On the relationship between Early-to-Late Ratio of Room Impulse Responses and ASR performance in reverberant environments", Speech Communication, September 2015

J. Correia, A. Brutti, A. Abad, "Multi-channel speaker verification based on total variability modelling", Interspeech 2015

P. Giannoulis et al., "Multi-room speech activity detection using a distributed microphone network in domestic environments", EUSIPCO 2015

A. Brutti, M. Matassoni, "On the use of Early-to-Late Reverberation Ratio for ASR in reverberant environments", ICASSP 2014

A. Brutti, M. Ravanelli, P. Svaizer and M. Omologo, "A speech event detection and localization task for multiroom environments", HSCMA 2014

M. Matassoni, A. Brutti, P. Svaizer, "Acoustic modeling based on Early-to-Late Reverberation Ratio for robust ASR", IWAENC 2014

A. Brutti, F. Nesta, "Tracking of multidimensional TDOA for multiple sources with distributed microphone pairs", Computer Speech and Language, Volume 27, Issue 3, May 2013

A. Brutti, P. Svaizer, M. Omologo, "An Environment aware ML estimation of acoustic radiation pattern with distributed microphone pairs", Signal Processing, Volume 93, Issue 4, April 2013

A. Brutti, M. Omologo, "Geometric Contamination for GMM/UBM speaker verification in reverberant environments", Interspeech 2013

P. Svaizer, A. Brutti, M. Omologo, "Environment Estimation of the Orientation of Acoustic Sources using a Line Array", EUSIPCO 2012

A. Brutti, P. Svaizer, M. Omologo, "Maximum A Posteriori Trajectory Estimation for Acoustic Source Tracking", IWAENC 2012

F. Nesta and A. Brutti, "Self-clustering non-Euclidean kernels for improving the estimation of multidimensional TDOA of multiple sources", HSCMA 2011

P. Svaizer, A. Brutti, M. Omologo, "Use of reflected wavefronts for acoustic source localization with a line array", HSCMA 2011

A. Brutti, M. Omologo and P. Svaizer, "Inference of Acoustic Source Directivity Using Environment Awareness", EUSIPCO 2011

A. Brutti and F. Nesta, "Multiple Source Tracking by Sequential Posterior Kernel Density Estimation Through GSCT", EUSIPCO 2011

A. Brutti, M. Omologo and P. Svaizer, "Multiple Source Localization based on Acoustic Map De-Emphasis", EURASIP Journal on Audio, Speech, and Music Processing, 2010

A. Brutti and O. Lanz, "A joint particle filter to track the position and head orientation of people using audio visual cues", EUSIPCO 2010

P. Svaizer, A. Brutti, M. Omologo, "Analysis of reflected wavefronts by means of a line microphone array", IWAENC 2010

A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt and M. Omologo, "WOZ Acoustic Data Collection For Interactive TV", Language Resources and Evaluation Journal, Special Issue LREC2008, Volume 44, Issue 3, September 2010

A. Brutti, M. Omologo, P. Svaizer, "A Sequential Monte Carlo Approach for Tracking of Overlapping Acoustic Sources", EUSIPCO 2009

L. Marquardt, P. Svaizer et al., "A natural acoustic front-end for Interactive TV in the EU-Project DICIT", Pacific Rim Conference 2009

A. Brutti, M. Omologo, P. Svaizer, "Localization of multiple speakers based on a two step acoustic map analysis", IEEE ICASSP 2008, March 30-April 4, Las Vegas, USA.

A. Brutti, M. Omologo, P. Svaizer, "Comparison between different sound source localization techniques based on a real data collection", HSCMA, May 2008, Trento

A. Brutti, L. Cristoforetti et al, "WOZ acoustic Data Collection For Interactive TV", LREC 2008, May, Marrakech, Morocco.

A. Brutti, M. Omologo, P. Svaizer, "Classification of Acoustic Maps to determine speaker position and orientation from a distributed microphone network", ICASSP 2007, April 15-19, Honolulu, Hawaii, USA.

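A. Brutti, M. Omologo, P. Svaizer, "Localizzazione di parlatori con una rete distribuita di microfoni" [Speaker localization with a distributed microphone network], 34th Convegno AIA (Italian Acoustical Association conference), June 13-15, Florence, Italy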

A. Brutti, M. Omologo, P. Svaizer, "A Pattern Classification Approach to Sound Source Localization", Workshop Toni Mian, Padova, October 2007

A. Brutti, M. Omologo, P.G. Svaizer, "Estimation of talker's head orientation based on Oriented Global Coherence Field", 120th Audio Engineering Society Convention, May 20-23, 2006, Paris.

A. Brutti, M. Omologo, P.G. Svaizer, "Speaker Localization based on Oriented Global Coherence Field", Interspeech 2006, September 17-21, 2006, Pittsburgh, Pennsylvania, USA.

A. Brutti, M. Omologo, P.G. Svaizer, "Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays", Eurospeech 2005, Lisbon.

M. Omologo, P.G. Svaizer, A. Brutti, L. Cristoforetti, "Speaker Localization in CHIL lectures: Evaluation Criteria and Results", MLMI/NIST Evaluation 2005, Edinburgh.

Other publications:

PhD Thesis: "Distributed Microphone Networks for Sound Source Localization in Smart Rooms", Trento, March 2007

A. Brutti and O. Lanz, "An Audio-Visual Particle Filter for Monitoring Interactive People Behaviour", PRAI*HBA, December 2009, Reggio Emilia.

A. Brutti, "A Person tracking system for CHIL meetings", CLEAR 2007, Baltimore, USA, April 2007

R. Brunelli, A. Brutti, P. Chippendale, O. Lanz, M. Omologo, P. Svaizer, F. Tobia, "A Generative Approach to Audio-Visual Person Tracking", CLEAR'06 Evaluation Workshop, April 6-7, 2006, Southampton, UK.

A. Brutti, M. Omologo et al., "On The Development of an In-Car interaction system at IRST", SWIM, Maui, Hawaii, January 12-14, 2004.

A. Brutti, M. Omologo et al., "Use of Multiple Speech Recognition Units in an In-car Assistance System", invited contribution, chapter 6, in "DSP for Vehicle and Mobile Systems", Kluwer publishers.
