052321 VU Recent Developments in Knowledge Discovery in Databases (2025S)

6.00 ECTS (4.00 SWS), SPL 5 - Computer Science and Business Informatics

Continuous assessment of course work

Moodle

Tue 29.04. 15:00-16:30 Seminar Room 8, Währinger Straße 29, 1st floor

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no "first come, first served").

Registration from Mon 10.02.2025 09:00 to Fri 21.02.2025 09:00
Deregistration until Fri 14.03.2025 23:59

Details

max. 25 participants

Language: English

Lecturers

Dates (iCal) - the next date is marked with N

Tuesday 04.03. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 06.03. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 11.03. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 13.03. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 18.03. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 20.03. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 25.03. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 27.03. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 01.04. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 03.04. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 08.04. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 10.04. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
N Tuesday 29.04. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Tuesday 06.05. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 08.05. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 13.05. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 15.05. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 20.05. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 22.05. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 27.05. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Tuesday 03.06. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 05.06. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 10.06. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 12.06. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor
Tuesday 17.06. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Tuesday 24.06. 15:00 - 16:30 Seminar Room 8, Währinger Straße 29, 1st floor
Thursday 26.06. 13:15 - 14:45 Seminar Room 5, Währinger Straße 29, 1st basement floor

Information

Aims, contents and method of the course

The amount of data gathered every year is steadily increasing, and people are eager to discover ever more knowledge in these vast amounts of data. Even though data is a powerful resource, we are not yet using its full potential.

This course teaches recently developed, important state-of-the-art methods for discovering knowledge in databases, from a theoretical point of view as well as in their implementation and application. We learn how to discover, assess, and deeply understand novel methods that are more complex than the fundamental methods taught in other courses. We address different aspects of learning new methods from the field of knowledge discovery in databases: learning lecture-style by listening to talks, creating a small database for benchmarking, discovering a new method by reading a scientific paper and teaching it to others in a talk, discussing it in groups, and preparing a small tutorial that explains it to laypeople.

This semester, we focus on data-driven causality and clustering methods. Both causality and clustering are well represented at top international AI conferences such as the AAAI Conference on Artificial Intelligence, the IEEE International Conference on Data Mining, and ICLR.

Methods/Course:

The course will have two parts: causality and clustering.

In Causality, recent methods for causal discovery in both temporal and non-temporal data will be presented, concretely Granger causal methods and their more recent variations. After this part, bivariate causality in non-temporal data will be presented, including the benchmark data sets and methods. In the second part of the course, the students will be assigned a causal challenge project, in which they create their own small database for benchmarking on real-world causal inference problems. Moreover, the students will solve an exercise sheet and present a paper on causality, which they select at the beginning of the course.
The goal of this course is to understand this exciting field of knowledge discovery, and to be creative in it, through active learning.
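
As a rough illustration of the Granger idea mentioned above (not course material): a series x is said to Granger-cause y if past values of x improve the prediction of y beyond what past values of y already provide. The sketch below compares a restricted and a full linear autoregression with an F statistic. The function names, the lag order p=2, and the simulated data are assumptions made purely for this example.

```python
import numpy as np


def lagged_design(series_list, p):
    """Design matrix with an intercept and lags 1..p of every given series."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]
    for s in series_list:
        for lag in range(1, p + 1):
            # Row t - p holds s[t - lag] for t = p, ..., n - 1.
            cols.append(s[p - lag : n - lag])
    return np.column_stack(cols)


def rss(X, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return float(resid @ resid)


def granger_f_statistic(x, y, p=2):
    """F statistic for H0: lags of x do not help to predict y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    rss_restricted = rss(lagged_design([y], p), target)  # lags of y only
    rss_full = rss(lagged_design([y, x], p), target)     # lags of y and x
    dof = len(target) - (2 * p + 1)
    return ((rss_restricted - rss_full) / p) / (rss_full / dof)


def simulate(n=500, seed=0):
    """Toy data (assumption): y depends on the previous value of x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()
    return x, y


x, y = simulate()
f_xy = granger_f_statistic(x, y)  # x -> y: expected to be large
f_yx = granger_f_statistic(y, x)  # y -> x: expected to be small
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

A large F statistic in one direction only is what the linear Granger test interprets as evidence of directed predictive influence; it does not by itself establish causation in the interventional sense.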

In the second part, we focus on clustering. We build upon existing knowledge from Foundations of Data Analysis (FDA) and Data Mining, and look at recent developments in the field and at approaches to open challenges such as fairness, noisy data sets, or data with uncertainty.
As a project, students can choose between more theoretical or more practical work:
For the theory project, they focus on a recent paper, create a tutorial that makes it easy to understand for non-computer scientists, and present it to the group.
For those who prefer a more practical project, we offer the option to take part in a challenge like the KDD Cup. This year's challenge is going to be published on March 1st; as a reference, you can look at the challenges from last year, e.g. https://www.biendata.xyz/kdd2024/
We end with a small test on the topics from the second half of the semester.

Type of performance assessment and permitted materials

100 points in total.
Causality: a small test at the end of the Causality course; Exercise sheet; Paper presentation; Causal challenge (= creating a small database).

Clustering: either the theory or the practical project (25P); test at the end (25P).

Minimum requirements and assessment criteria

This course is for master's students only.

We recommend having completed the basic bachelor's courses as well as
- Foundations of Data Analysis (required)
- Data Mining

Components:
50% from the Causality part
25% Project for clustering
25% Test about clustering

Grading:
> 87.00 %: 1
between 75.00 % and 86.99 %: 2
between 63.00 % and 74.99 %: 3
between 50.00 % and 62.99 %: 4
< 50.00 %: 5
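
The arithmetic behind the components and the grading key can be sketched as follows. This is only an illustration: the function names are hypothetical, and treating exactly 87.00 % as grade 1 is an assumption, since the key above leaves that boundary value open.

```python
def total_points(causality, clustering_project, clustering_test):
    """Sum the three components (at most 50 + 25 + 25 = 100 points)."""
    parts = (
        ("causality", causality, 50),
        ("clustering_project", clustering_project, 25),
        ("clustering_test", clustering_test, 25),
    )
    for name, value, cap in parts:
        if not 0 <= value <= cap:
            raise ValueError(f"{name} must be between 0 and {cap} points")
    return causality + clustering_project + clustering_test


def grade(percent):
    """Map a percentage to a grade from 1 (best) to 5 (fail)."""
    if percent >= 87.0:  # ASSUMPTION: exactly 87.00 % counts as grade 1
        return 1
    if percent >= 75.0:
        return 2
    if percent >= 63.0:
        return 3
    if percent >= 50.0:
        return 4
    return 5


# Example: 45 points in Causality, 20 in the project, 15 in the test.
points = total_points(45, 20, 15)
print(points, grade(points))
```

Because the course awards 100 points in total, the point sum can be read directly as a percentage.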

Examination topics

Reading list

For the Causal Inference part, this literature provides the background to better understand the taught models and methods:

Sayed, Ali H. Inference and Learning from Data. Vols. 1-3. Cambridge University Press, 2022.

Volume I: Chapters Matrix Theory, Random Variable, Exponential Distributions, pp. 1-195; Random Processes, pp. 240-259.
Volume II: Chapters MSE Inference, pp. 1053-1090; Linear Regression, pp. 1121-1153; Maximum Likelihood, pp. 1211-1273; Inference in Graphs, pp. 1682-1737.
Volume III: Chapters Regularization, pp. 2221-2257; Logistic Regression, pp. 2457-2496.

The book can be accessed via the library of the University of Vienna or via Cambridge University Press.

Association in the course directory

Last modified: Tue 25.03.2025 14:25