340212 VU Speech Technologies (2024S)

6.00 ECTS (3.00 SWS), SPL 34 - Translationswissenschaft

Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Registration is open from Mo 12.02.2024 09:00 to Fr 23.02.2024 17:00
Registration is open from Mo 11.03.2024 09:00 to Fr 15.03.2024 17:00
Deregistration possible until Su 31.03.2024 23:59

Details

max. 40 participants

Language: English

Lecturers

Classes

Thursday 14.03. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 21.03. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 11.04. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 18.04. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 25.04. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 02.05. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 16.05. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 23.05. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 06.06. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 13.06. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG
Thursday 20.06. 16:45 - 19:00 Medienlabor II ZfT Gymnasiumstraße 50 4.OG

Information

Aims, contents and method of the course

Goals:

This course introduces basic techniques and methods of speech technology with a special focus on speech synthesis and speech recognition. Linguistic basics of speech production and basics of signal processing are also presented. Current areas of research as well as the current state of research are highlighted and discussed. Both classical methods, which are still relevant in hybrid architectures, and the latest methods based on neural networks will be presented.

Content:

14.3.:
Lecture 1
1. Introduction
2. Phonetics

11.4.:
Lecture 2
3. Signal processing and classical vocoders
4. Minimum Edit Distance (MED) and Dynamic Time Warping (DTW)

18.4.:
Lecture 3
5. Hidden Markov models (HMM)
6. N-gram language models

25.4.:
Exercise 1

2.5.:
Lecture 4
7. Vector semantics and embeddings
8. Feed-forward Neural Networks (NN)

16.5.:
Lecture 5
9. Convolutional NN, RNN and LSTM
10. Transformer

23.5.:
Lecture 6
11. Speech synthesis: DNN based vocoders
12. Speech synthesis: DNN based acoustic models

6.6.:
Lecture 7
13. Speech recognition: DNN based acoustic models
14. Speech recognition: DNN based language models

13.6.:
Exercise 2

20.6.:
Programming exercise

Methodology:

Theoretical presentation of the basics of the field of language technology.
Development and implementation of a practical application for a current task in the field of the course.
Independent solving of exercises.

Assessment and permitted materials

Exercise 1 (25.4.): Written test with questions from lectures 1-3 (no aids allowed).

Exercise 2 (13.6.): Written test with questions from lectures 4-7 (no aids allowed).

Programming exercise (handout on 25.4., hand-in on 20.6.): In a group of 3-4 students, develop an accent recognition system that identifies the spoken accent from a speech signal, and present the results.

Minimum requirements and assessment criteria

You need at least 50% of the total points for a positive grade.

The grade depends on the points for the two exercises (30% each) and the programming exercise (40%).
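
The weighting can be illustrated with a short calculation. This is only a sketch: the function names are hypothetical, each component is assumed to be expressed as a percentage of its own maximum points, and reaching exactly 50% is assumed to count as positive. How the total maps to the individual grades is not specified here and is not modelled.

```python
def weighted_percent(exercise1, exercise2, programming):
    """Combine the three components into a total percentage.

    Each argument is the percentage (0-100) achieved in that component
    (an assumption about how points are reported).
    Weights follow the course description: 30% + 30% + 40%.
    """
    return (30 * exercise1 + 30 * exercise2 + 40 * programming) / 100


def is_positive(total_percent):
    """Assumed rule: a total of at least 50% yields a positive grade."""
    return total_percent >= 50


# Example: 60% on exercise 1, 40% on exercise 2, 50% on the programming exercise
total = weighted_percent(60, 40, 50)
print(total, is_positive(total))
```

Under these assumptions, a weak result in one written test can be offset by the other components, since only the weighted total is compared against the 50% threshold.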

Attendance is mandatory; you may miss at most 2 lecture units.

Examination topics