340212 VU Speech Technologies (2025S)

6.00 ECTS (3.00 SWS), SPL 34 - Translationswissenschaft

Continuous assessment of course work



Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Registration is open from Mo 10.02.2025 09:00 to Fr 21.02.2025 17:00
Registration is open from Mo 10.03.2025 09:00 to Fr 14.03.2025 17:00
Deregistration possible until Fr 21.03.2025 23:59

Details

max. 40 participants

Language: English

Lecturers

Classes

The lecture starts on 13.3.

Thursday 13.03. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 20.03. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 27.03. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 03.04. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 10.04. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 08.05. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 15.05. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 22.05. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 05.06. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 12.06. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 26.06. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG

Information

Aims, contents and method of the course

Goals:

This course introduces basic techniques and methods of speech technology, with a special focus on speech synthesis and speech recognition. The linguistic basics of speech production and the basics of signal processing are also covered. Current research areas and the state of the art are highlighted and discussed. The course presents both classical methods, which remain relevant in hybrid architectures, and the latest methods based on neural networks.

Content:

13.3.:
Lecture 1
1. Introduction
2. Phonetics

Lecture 2
3. Signal processing and classical vocoders
4. Minimum Edit Distance (MED) and Dynamic Time Warping (DTW)

Lecture 3
5. Hidden Markov models (HMMs)
6. N-gram language models

Exercise 1

Lecture 4
7. Vector semantics and embeddings
8. Feed-forward Neural Networks (NN)

Lecture 5
9. Convolutional NN, RNN and LSTM
10. Transformer

Lecture 6
11. Speech synthesis: DNN-based vocoders
12. Speech synthesis: DNN-based acoustic models

Lecture 7
13. Speech recognition: DNN-based acoustic models
14. Speech recognition: DNN-based language models

Exercise 2

Programming exercise

Methodology:

Theoretical presentation of the fundamentals of speech technology.
Development and implementation of a practical application for a current task in the field of the course.
Independent solving of exercises.

Assessment and permitted materials

Exercise 1: Written test with questions from lectures 1-3 (no aids allowed).

Exercise 2: Written test with questions from lectures 4-7 (no aids allowed).

Programming exercise (handout on TBD, hand-in on TBD): In a group of 3-4 students, develop an accent recognition system that identifies the spoken accent from a speech signal, and present the results.

Minimum requirements and assessment criteria

You have to achieve at least 50% of the total points for a positive grade.

The grade depends on the points for the two exercises (30% each) and for the programming exercise (40%).
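
As an illustration of the weighting above, the following minimal sketch computes the weighted total and checks it against the 50% threshold. It is not an official grading tool. The helper names (`total_percent`, `is_positive`) and the 0-100 percent scale per component are assumptions made for this example. It also assumes that no separate minimum applies to the individual components, and it does not model the attendance requirement.

```python
# Weights (in percent of the total) as stated in the course description.
WEIGHTS = {"exercise_1": 30, "exercise_2": 30, "programming": 40}

# Share of the total points required for a positive grade.
PASS_THRESHOLD = 50.0


def total_percent(scores):
    """Return the weighted total in percent.

    `scores` maps each component name to the share of points achieved
    in that component, on an assumed scale of 0-100 percent.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def is_positive(scores):
    """Check the weighted total against the 50% threshold (attendance not modeled)."""
    return total_percent(scores) >= PASS_THRESHOLD


if __name__ == "__main__":
    example = {"exercise_1": 40, "exercise_2": 50, "programming": 60}
    print(total_percent(example), is_positive(example))
```

With 40% in Exercise 1, 50% in Exercise 2 and 60% in the programming exercise, the weighted total is 0.3 · 40 + 0.3 · 50 + 0.4 · 60 = 51%, which is just above the threshold.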

Attendance is mandatory; at most 2 lecture units may be missed.

Examination topics