Open Innovation Campus

Digital Life

Enhancing Voice Assistants with Voice Activity Detection

Available

Resources

Access to datasets: LibriParty and an in-domain dataset

Interested?

If you are a university professor or student and would like to take part in the TUTORÍA program, register your details so that we can start the program.

Student registration
Professor registration

Area of development

Proficiency in the Python programming language and an understanding of machine learning fundamentals are recommended for this project.

Challenge Intro

The popularity of voice-based interfaces has grown significantly because they enable hands-free communication with a wide range of devices. In this context, deep learning has become the standard approach for making device interactions more natural and efficient.
Edge AI plays a crucial role in developing such interfaces, since on-device processing minimizes latency and protects user privacy. There are various ways to improve communication between humans and devices; one of them works at the signal level, specifically on the speech audio. This signal is typically contaminated by background noise. Voice Activity Detection (VAD) aims to improve users' interactions with devices by reliably distinguishing valid user utterances from unwanted noise.
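As a point of reference for the VAD task, the following sketch shows a classic energy-based detector, not the deep learning models this project is meant to evaluate. It assumes 16 kHz mono audio given as a list of floats, 25 ms frames with a 10 ms hop, and a noise floor estimated from the quietest 10% of frames; the function names, the toy signal and the 10 dB margin are illustrative assumptions, not part of any specific tool.

```python
import math
import random
from typing import List, Sequence

def frame_energies_db(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean-square energy of each full frame, in dB (a tiny floor avoids log10(0))."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def energy_vad(samples: Sequence[float], frame_len: int = 400, hop: int = 160,
               margin_db: float = 10.0) -> List[bool]:
    """Mark a frame as speech when its energy exceeds the noise floor by margin_db.

    Illustrative baseline: the noise floor is taken as the 10th percentile of
    frame energies, which assumes at least that fraction is background only.
    """
    energies = frame_energies_db(samples, frame_len, hop)
    if not energies:
        return []
    ranked = sorted(energies)
    noise_floor = ranked[int(0.1 * (len(ranked) - 1))]
    return [e > noise_floor + margin_db for e in energies]

# Toy input at 16 kHz: 1 s of faint noise followed by 1 s of a loud 440 Hz tone.
random.seed(0)
sr = 16000
noise = [random.gauss(0.0, 0.001) for _ in range(sr)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) + random.gauss(0.0, 0.001)
        for n in range(sr)]
decisions = energy_vad(noise + tone)  # 400 / 160 samples = 25 ms frames, 10 ms hop
print(sum(decisions), "of", len(decisions), "frames flagged as speech")
```

A fixed energy margin like this works on stationary background noise but struggles with non-stationary sources such as music or TV audio, which is one motivation for evaluating learned VAD models instead.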

Challenge Description

The objective of this research project is to evaluate existing Voice Activity Detection tools and integrate them into the human-device communication chain. The project can be divided into the following subtasks:
Researching the state of the art in Voice Activity Detection.
Selecting, based on that research, an existing tool that is available for free commercial use.
Exploring and preprocessing an out-of-domain audio dataset: LibriParty.
Using that data to assess out-of-domain model performance.
Studying operating point selection and post-processing for real-time models.
Exploring and preprocessing an in-domain audio dataset: domestic scenario.
Using that data to assess in-domain model performance.
Selecting the in-domain operating point and post-processing strategy.
Integrating the model into the human-device communication chain with virtual assistants.

Who is posing this challenge?

Telefónica's Industrial Tutors accompany you throughout your TFG/TFM (bachelor's or master's final project), bringing a real-world industry perspective. They will share their knowledge and experience and give you feedback so that you can develop a project with innovative impact.

Fernando López Gavilánez

Product Exploration and Prototyping - Digital Home - CDO / Telefónica