ODESSA Project: Online Diarization Enhanced by recent Speaker identification and Sequential learning Approaches

01/03/2016 to 01/09/2019

Speaker diarization is an unsupervised process that aims to identify each speaker within an audio stream and to determine when each speaker is active. It assumes that the number of speakers, their identities and their speech turns are all unknown. Speaker diarization has become a key technology in many domains such as content-based information retrieval, voice biometrics, forensics and social-behavioural analysis. Current state-of-the-art systems nevertheless suffer from many limitations. In particular, they are extremely domain-dependent and experience drastically degraded performance when tested on a different type of recording. In recent years, state-of-the-art speaker recognition systems have shown good improvement, thanks to the emergence of new recognition paradigms such as i-vectors and deep learning. One goal of the project is therefore to adapt those techniques to speaker diarization. Furthermore, most existing work addresses offline speaker diarization, which is unsuitable for real-time applications. Since our main application is related to security, designing an online speaker diarization system with low latency is necessary. A third goal of the project is to take into account the inherent temporal structure of interactions between speakers by relying on structured prediction techniques. In a context of reproducible research, we will evaluate the proposed algorithms on standard databases (NIST SRE, REPERE, ETAPE, AMI...) and collect a medium-sized database suited to our main application, the fight against cybercrime.

