Open calls

  • DNN adaptation for acoustic modeling in speech recognition

DNNs are now the state-of-the-art approach for acoustic modeling, and even in severe conditions they often give better results than Gaussian Mixture Model (GMM) systems with various compensation strategies. Nevertheless, although DNNs provide data representations that are usually invariant to some perturbations in the input features, there are scenarios where a hybrid DNN-HMM (Hidden Markov Model) system performs significantly worse than its matched counterpart, for example in distant-talking or children's speech recognition. A large number of studies are therefore addressing DNN adaptation for automatic speech recognition (ASR), investigating approaches such as linear transformations, regularization in DNN training, and noise-aware or speaker-aware training and decoding.

The work focuses on methods for effectively adapting the DNN at hand (also considering recent architectures such as CNN, RNN and LSTM) using a small set of adaptation data in a supervised or unsupervised fashion. In particular, the unsupervised modality calls for strategies that can disregard or weight the training samples in order to mitigate the impact of errors in the automatically generated transcriptions. Moreover, the studied adaptation techniques can be applied and evaluated in combination with other compensation methods according to the application scenario under analysis (e.g., signal enhancement in noisy conditions, feature transformation for spontaneous or children's speech).

This research area is challenging. If significant results are achieved during the course of the studentship, there will be the possibility to publish them in top conferences and journals. Furthermore, there will be the opportunity to collaborate with important companies and labs operating in the field of ASR.

Tutors: Marco Matassoni (matasso@fbk.eu) and Daniele Falavigna (falavi@fbk.eu)

  • Deep machine learning for speaker diarization

Identifying who is speaking and when is a relevant task for many applications, ranging from electronic surveillance and telephone tapping to speech recognition in cocktail-party scenarios. The task is extremely challenging in the presence of high environmental noise, reverberation and interfering sources, in particular when only a single channel is available. However, recent progress in the use of deep learning for single-channel speech enhancement and source separation has opened the way to a variety of novel approaches for robust and effective speaker diarization in such critical operating conditions.

The goal of this studentship is to develop state-of-the-art solutions for speaker diarization, possibly using open source toolkits such as Theano or TensorFlow, taking into account possible issues related to both the computational cost and the training set size. The algorithms will be developed and tested both on publicly available datasets (e.g., the Santa Barbara Corpus) and on internal datasets of telephone recordings, and compared with currently available free tools (e.g., ALIZE and LIUM).

This research area is challenging and topical. If significant results are achieved during the course of the studentship, there will be the possibility to publish them in top conferences and journals. Furthermore, since the speaker diarization module is strategic for improving the performance of speech recognition engines, there will be the opportunity to collaborate with important companies and labs operating in the field.

Tutor: Alessio Brutti (brutti@fbk.eu)