
122251 AR Advanced Course in Linguistics (2020S)

Applied Data Science for Linguists

5.00 ECTS (2.00 SWS, weekly contact hours), SPL 12 - Anglistik (English and American Studies)

Course with continuous assessment (prüfungsimmanente Lehrveranstaltung)

Moodle

Registration/Deregistration

Note: The time at which you register within the registration period has no effect on the allocation of places (not "first come, first served").

Registration from Wed 19.02.2020 00:00 to Tue 25.02.2020 23:59
Deregistration until Thu 30.04.2020 23:59

Details

Max. 25 participants

Language: English

Lecturer

Andreas Baumann

Sessions

Thursday 26.03. 18:30 - 20:00, Class Room 3 ZID, UniCampus Hof 7, Entrance 7.1, 2H-O1-25
Saturday 28.03. 09:45 - 14:45, Seminarraum 6, UniCampus Hof 7, Entrance 7.1, OG01, 2H-O1-33
Thursday 23.04. 18:30 - 20:00, Class Room 3 ZID, UniCampus Hof 7, Entrance 7.1, 2H-O1-25
Saturday 25.04. 09:45 - 14:45, EDV-Raum 4 (computer room), 2C502, 5th floor, UZA II
Thursday 07.05. 18:30 - 20:00, Class Room 3 ZID, UniCampus Hof 7, Entrance 7.1, 2H-O1-25
Saturday 09.05. 09:45 - 14:45, Seminarraum 6, UniCampus Hof 7, Entrance 7.1, OG01, 2H-O1-33
Thursday 04.06. 18:30 - 20:00, Class Room 3 ZID, UniCampus Hof 7, Entrance 7.1, 2H-O1-25
Saturday 06.06. 09:45 - 14:45, PC-Seminarraum 1, Oskar-Morgenstern-Platz 1, 1st basement level

Information

Aims, contents and method of the course

This course provides an introduction to the interdisciplinary field of data science, with a particular focus on applications in linguistics. Data science is the study of the methods and technologies required to gain insights from data. This comprises both the analysis of data to detect patterns and relationships among variables of interest (e.g. word frequency, word class and word length) and the training and use of statistical models to make predictions (i.e. machine learning; e.g. automatically predicting the author of a text).

The course is divided into four blocks, one per month. The first block provides an introduction to data types and descriptive statistics. The second block introduces techniques from text mining and natural language processing. In the third block, we will cover basic methods of inferential statistical modeling (univariate and multivariate linear and logistic regression), which are useful for studying relationships among properties of your data (variables/features). In the fourth and final block, we will study techniques from machine learning and predictive modeling (both unsupervised and supervised). We will apply these techniques to hands-on use cases such as (automated) word classification or sentiment analysis of texts.

In this course, we will make use of the scripting language R together with its frontend RStudio. Both are pre-installed on the computers in the lab, but you might want to install them on your own computers as well (e.g. for doing the exercises at home). You will learn how to use R as we go along. Further instructions and literature will be provided on Moodle.

This is an introductory course. As such, no previous knowledge of statistics, statistical software, machine learning or programming is required, but a solid knowledge of school mathematics (at least at lower-secondary level, Unterstufe) will prove useful: linear functions, basic arithmetic operations, fractions, percentages, probability, etc. Since this course is aimed at a linguistically trained audience, I will take knowledge of fundamental linguistic concepts for granted.

UPDATE: The first sessions of the course will be conducted via Moodle. There will be online tasks, discussions and online video-conference sessions in which we discuss exercises in R.

Type of assessment and permitted aids

Pre-course exercise (to be handed in online), four home assignments (to be completed with R and RStudio on your own computer), and participation in class.

Minimum requirements and assessment criteria

The ability to analyze linguistic data as well as the ability to understand and interpret statistical analyses, to fit statistical models and to use these models for making predictions. The ability to use R and RStudio for this purpose.

Assessment:
Pre-course exercise: 5%
Four home assignments: 20% each
Participation in class: 15%
Minimum pass grade: 60% in total
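The weights above add up to 100% (5 + 4 × 20 + 15), so the final result is a weighted sum of the component scores. The following is a minimal Python sketch (the course itself uses R), assuming each component is scored in percent and that a total of exactly 60% counts as a pass; the component names and the student's scores are hypothetical.

```python
# Assessment weights in percent, as listed in the course description.
WEIGHTS = {
    "pre_course_exercise": 5,
    "assignment_1": 20,
    "assignment_2": 20,
    "assignment_3": 20,
    "assignment_4": 20,
    "participation": 15,
}
PASS_THRESHOLD = 60  # minimum total in percent

assert sum(WEIGHTS.values()) == 100  # 5 + 4 * 20 + 15


def total_score(scores: dict[str, int]) -> float:
    """Weighted total in percent; each component score is given in percent (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def passes(scores: dict[str, int]) -> bool:
    # Assumption: reaching exactly the threshold counts as a pass.
    return total_score(scores) >= PASS_THRESHOLD


# Hypothetical student (invented scores, not real data).
example = {
    "pre_course_exercise": 100,
    "assignment_1": 60,
    "assignment_2": 50,
    "assignment_3": 70,
    "assignment_4": 80,
    "participation": 40,
}
print(total_score(example), passes(example))  # 63.0 True
```

With no participation points at all, the same student would reach only 57% and fail, which shows how much the 15% for participation can matter.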

Examination topics

Literature

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Butler, C. (1985). Statistics in linguistics. Oxford: Blackwell.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745-766.
Feinerer, I. (2018). Introduction to the tm package: Text mining in R. http://cran.uib.no/web/packages/tm/vignettes/tm.pdf
Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261-266.

Assignment in the course directory

Degree programmes: UF 344; MA 812 [2]; UF MA 046/507
Code/Module: UF 4.2.3-223-225, MA M04, MA M05, UF MA 4B
Course content: 12-0260

Last modified: Thu 14.11.2024 00:12