Open Innovation Campus

Disruptive Technologies

Advanced Signal Processing for Ultra-Realistic Speech Synthesis

Unavailable

Resources

Reference on arXiv: https://arxiv.org/pdf/2305.07243.pdf

Repository for speech synthesis and voice cloning (Tortoise TTS):

https://github.com/afiaka87/tortoise-tts

Example notebook: https://github.com/afiaka87/tortoise-tts/blob/main/tortoise_tts.ipynb

Are you interested?

If you are a professor or a university student interested in participating in the TUTORING program, register your details so that we can get the program started.

Student registration
Academic registration

Subject Area

The ideal candidate has a strong and lasting desire to learn advanced machine learning techniques, together with a solid academic background in mathematics, engineering, signal processing, or a related field.

Introduction

In recent years, image generation has undergone a significant transformation driven by the adoption of deep learning models such as autoregressive transformers and diffusion models.

These approaches treat image synthesis as a stochastic process and take advantage of extensive computational resources and data to learn the image distribution.
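For diffusion models the stochastic view is concrete: a forward process gradually corrupts a training image with Gaussian noise, and a network learns to reverse it. The NumPy sketch below shows only the closed-form forward (noising) step in the DDPM formulation; the linear schedule, its values, and the function names are illustrative assumptions, not part of this challenge.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (common DDPM-style default values)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I) in one step."""
    alpha_bar = np.cumprod(1.0 - betas)[t]   # abar_t = prod_{s<=t} (1 - beta_s), 0-based t
    noise = rng.standard_normal(x0.shape)    # epsilon ~ N(0, I)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise                        # a denoiser is trained to predict `noise` from x_t

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))         # stand-in for a normalised image or spectrogram
x_early, _ = forward_diffuse(x0, 10, betas, rng)   # still very close to x0
x_late, _ = forward_diffuse(x0, 999, betas, rng)   # almost pure Gaussian noise
```

Generation then runs a learned reverse process from pure noise back to a sample. Nothing in the forward step is specific to images, so the same machinery applies when `x0` is a spectrogram.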

Challenge Description

This challenge aims to build on recent developments in generative image modeling and apply them to speech synthesis.

Some results are already publicly available and show a high level of expressiveness in multi-voice text-to-speech and voice cloning systems, such as Tortoise (https://replicate.com/afiaka87/tortoise-tts?input=python).

In this challenge, we propose to apply this methodology, which has improved performance in image generation, to speech synthesis. The inputs are speech spectrograms computed with the discrete Fourier transform (DFT) over short overlapping frames (the short-time Fourier transform, STFT), keeping both their real and imaginary parts.
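A minimal NumPy sketch of this representation follows. It assumes a 16 kHz signal and hypothetical frame settings (512-sample Hann window, 128-sample hop). The STFT applies the DFT to overlapping windowed frames, and the real and imaginary parts of the resulting complex spectrogram are stacked as a two-channel array that an image model could consume. Because phase is preserved, the waveform can be recovered by weighted overlap-add. The function names are illustrative, not from an existing library.

```python
import numpy as np

N_FFT, HOP = 512, 128  # hypothetical frame size and hop (32 ms / 8 ms at 16 kHz)

def stft(signal, n_fft=N_FFT, hop=HOP):
    """DFT (np.fft.rfft) of overlapping Hann-windowed frames -> complex array (freq, time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse via weighted overlap-add, normalising by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)  # floor avoids dividing by zero at the edges

def to_two_channel(spec):
    """Real and imaginary parts stacked as a 2-channel 'image': (2, freq, time)."""
    return np.stack([spec.real, spec.imag]).astype(np.float32)

def from_two_channel(image):
    """Rebuild the complex spectrogram from the two channels."""
    return image[0].astype(np.float64) + 1j * image[1].astype(np.float64)

sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)  # 1 s test tone + noise

image = to_two_channel(stft(x))         # shape (2, 257, 122)
recon = istft(from_two_channel(image))  # waveform of 16000 samples
```

Away from the first and last frames, where the Hann window tapers to zero, `recon` matches `x` up to float32 rounding. A magnitude-only spectrogram would discard phase and lose this direct round trip.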

Spectrograms can serve as an "image representation" of audio and speech, which makes it relatively easy to transfer many advances from image and video processing to the speech processing domain.
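To treat a spectrogram literally as an image, a common practice is to log-compress the magnitude and quantise it to 8-bit grayscale. The NumPy sketch below assumes a 16 kHz test tone, a 512-point Hann window with a 128-sample hop, and an 80 dB dynamic range. These values and the helper names are illustrative choices. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def magnitude_spectrogram(signal, n_fft=512, hop=128):
    """|STFT| of Hann-windowed frames, shape (n_fft // 2 + 1, n_frames)."""
    frames = sliding_window_view(signal, n_fft)[::hop] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def to_grayscale_image(magnitude, top_db=80.0):
    """Log-compress a magnitude spectrogram and map it to an 8-bit grayscale image."""
    db = 20.0 * np.log10(np.maximum(magnitude, 1e-10))        # amplitude in dB
    db = np.maximum(db, db.max() - top_db)                    # keep only the top `top_db` dB
    scaled = (db - db.min()) / (db.max() - db.min() + 1e-12)  # rescale to [0, 1]
    # Flip so low frequencies sit at the bottom, as in a conventional spectrogram plot.
    return np.flipud(np.round(255 * scaled).astype(np.uint8))

sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)

img = to_grayscale_image(magnitude_spectrogram(x))  # uint8 array, shape (257, 122)
```

The 440 Hz tone appears as a bright horizontal line. Because this form keeps only the magnitude, returning to a waveform requires a separate phase-reconstruction step (for example Griffin-Lim) or a neural vocoder.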

Who will accompany you in this challenge?

Telefónica's Industrial Tutors accompany you throughout the development of your bachelor's or master's thesis (TFG/TFM), offering a first-hand view of industry. They share their knowledge and experience and give you feedback so that you can develop an innovative, high-impact project.
Jordi Luque Serrano

Research - Discovery / TID