Joint FBK-QMUL PhD Studentship in Understanding Audio-visual Interactions through Multi-view Mapping
The Centre for Intelligent Sensing at Queen Mary University of London invites applications for a PhD Studentship to undertake research in deep learning for scene understanding from moving and heterogeneous audio-visual sensors (e.g. body cameras). The PhD project will focus on audio and video re-identification across multiple heterogeneous devices, action recognition, and audio-visual keyword spotting.
All nationalities are eligible to apply for this studentship, which should ideally start in or after June 2018. The studentship is for up to four years and covers student fees as well as a tax-free stipend.
This PhD project is part of an interdisciplinary collaboration between the Centre for Intelligent Sensing (http://cis.eecs.qmul.ac.uk) at Queen Mary University of London (QMUL) and the Centre for Information Technology (http://ict.fbk.eu) at the Fondazione Bruno Kessler (FBK), Trento, Italy. The PhD student will spend approximately half of their time in London and half in Trento, and will have access to state-of-the-art laboratories, including aerial and ground robotic sensors and multi-camera and multi-microphone installations. The PhD student will be based at the Centre for Intelligent Sensing in the School of Electronic Engineering and Computer Science at Queen Mary University of London and will be supervised by Professor Andrea Cavallaro and Dr Alessio Brutti.
Candidates should have a first-class honours degree or equivalent, or a good MSc degree, in Computer Science, Physics, Mathematics or Electronic Engineering. Candidates must be confident in applied mathematics and should have good programming experience, in particular in C/C++, Python and MATLAB. Previous knowledge of Signal Processing or Deep Learning/Machine Learning is required.
For more information and to apply, please visit: http://www.eecs.qmul.ac.uk/phd/apply.php
Informal enquiries can be made by email to Professor Andrea Cavallaro (email@example.com).
The closing date for applications is 31 January 2018.
PAST CALLS (NOW CLOSED)
DNN adaptation for acoustic modeling in speech recognition
DNNs are now the state-of-the-art approach to acoustic modeling. Even in severe conditions they often outperform Gaussian Mixture Model (GMM) systems combined with various compensation strategies. Nevertheless, although DNNs usually provide data representations that are invariant to some perturbations in the input features, there are scenarios where the performance of a hybrid DNN-HMM (hidden Markov model) system is significantly worse than in matched conditions, for example in distant-talking or children's speech recognition. For this reason, a large number of studies address DNN adaptation for ASR, investigating approaches such as linear transformations, regularization in DNN training, and noise-aware or speaker-aware training and decoding.
The work will focus on methods for effectively adapting the DNN at hand (also considering recent architectures such as CNNs, RNNs and LSTMs) using a small set of adaptation data, in either a supervised or an unsupervised fashion. The unsupervised case in particular calls for strategies capable of discarding or weighting training samples in order to mitigate the impact of errors in the automatically generated transcriptions. Moreover, the adaptation techniques studied can be applied and evaluated in combination with other compensation methods, depending on the application scenario under analysis (e.g., signal enhancement in noisy conditions, feature transformation for spontaneous or children's speech).
This research area is challenging. If significant results are achieved during the course of the studentship, there will be the possibility to publish them in top conferences and journals. Furthermore, there will be the opportunity to collaborate with important companies and labs operating in the field of ASR.
Deep machine learning for speaker diarization
The problem of identifying who is speaking and when is relevant to many applications, ranging from electronic surveillance and telephone tapping to speech recognition in cocktail-party scenarios. The task is extremely challenging in the presence of high environmental noise, reverberation and interfering sources, in particular when only a single channel is available. However, recent progress in the use of deep learning for single-channel speech enhancement and source separation has opened the way to novel approaches for robust and effective speaker diarization in such critical operating conditions.
The goal of this studentship is to develop state-of-the-art solutions for speaker diarization, possibly using open-source toolkits such as Theano or TensorFlow, taking into account issues related to both computational cost and training set size. The algorithms will be developed and tested both on publicly available datasets (e.g., the Santa Barbara Corpus) and on internal datasets of telephone recordings, and compared with freely available tools (e.g., ALIZE and LIUM).
This research area is challenging and topical. If significant results are achieved during the course of the studentship, there will be the possibility to publish them in top conferences and journals. Furthermore, since the speaker diarization module is strategic for improving the performance of speech recognition engines, there will be the opportunity to collaborate with important companies and labs operating in the field.
Tutor: Alessio Brutti (firstname.lastname@example.org)