Deep learning methods for voice processing: Neural vocoding for voice transformation

information

event: Traitement du signal pour la voix (Action Audio)
type: Seminar / Conference
performance location: Ircam, Salle Igor-Stravinsky (Paris)
duration: 56 min
date: October 20, 2022

For some years, the state of the art in speech synthesis and processing has been dominated by data-driven methods and deep neural networks. Ever larger amounts of data allow the exploitation of ever more parameters, leading to ever better results. Unfortunately, the increasing computational complexity hinders the widespread application of these models.

In the first part of the talk, we will present our research into data- and computation-efficient voice transformation with deep neural networks. We will introduce the Multi-band Excited WaveNet, a deep neural network that integrates a WaveNet into a classical source-filter model. The discussion will motivate the model structure and the training losses. We will describe the deficiencies of the proposed model and briefly reflect on perspectives, given the rapidly evolving state of the art in neural vocoding.

The second part will then demonstrate ongoing research into applications of the neural vocoder, combining it with dedicated models for intensity, pitch, expressivity or identity transformation.

Bio: Axel Roebel is director of research at IRCAM and head of the Analysis/Synthesis team. His research activities center around voice and music synthesis and transformation, with a strong focus on artistic and industrial applications. After many years of research into various signal processing algorithms, he has now shifted his focus towards data-driven methods.

speakers

Axel Roebel

From the same archive

Acoustic-articulatory modeling: from assistive technologies to the study of speech development mechanisms
October 20, 2022, 01:01:59

From source-filter theory to pneumo-phono-resonance interactions: the complexity of the human voice
October 20, 2022, 01:05:09

Presentations by the doctoral students in attendance
October 20, 2022, 00:26:53

Predicting the geometric shape of the vocal tract from the sequence of phonemes to be articulated
October 20, 2022, 01:09:20

IRCAM

1, place Igor-Stravinsky
75004 Paris
+33 1 44 78 48 43

opening times

Monday through Friday 9:30am-7pm
Closed Saturday and Sunday

subway access

Hôtel de Ville, Rambuteau, Châtelet, Les Halles

Institut de Recherche et de Coordination Acoustique/Musique
