136038 UE Doing research with text corpora (2024S)

5.00 ECTS (2.00 SWS, i.e. semester hours per week), SPL 13 - Europäische und Vergleichende Sprach- und Literaturwissenschaft (European and Comparative Language and Literature Studies)

Continuous assessment of course work

Moodle

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (places are not allocated on a first-come, first-served basis).

Registration is open from Mo 05.02.2024 08:00 to Tu 27.02.2024 23:59
Deregistration possible until Su 31.03.2024 23:59

Details

max. 25 participants

Language: English

Lecturers

Klaus Hofmann

Classes

Wednesday 13.03. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde
Wednesday 20.03. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde
Wednesday 17.04. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde
Wednesday 15.05. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde
Wednesday 29.05. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde
Wednesday 12.06. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde
Wednesday 26.06. 09:45 - 13:00 Seminarraum 1 2H316 UZA II Rotunde

Information

Aims, contents and method of the course

The course introduces students to the study of text corpora. A corpus, in its broadest sense, is a structured collection of texts. In modern usage, this usually refers to a digital text collection that is annotated with respect to a pre-defined set of analytically relevant features. Although the systematic study of machine-readable text corpora as an empirically based method has mostly been developed within the field of linguistics, text corpora can be useful for investigating a wide range of research questions within the Digital Humanities.
The course will first introduce students to a number of corpora that are available online, and students will learn to apply various browser-based search and analysis tools. Next, students will learn how to compile and annotate their own corpus using machine-learning-based tools. Students will become acquainted with the various formats in which corpora are encoded, with a particular focus on XML formats. Students will learn how to apply various methods for analysing corpus-derived data, including basic statistical testing, regression modelling, and network analysis. They will also learn how to visualize their results and present their research in the form of a poster or oral presentation.

The software used for all corpus construction and analysis is R (including R wrappers for Python-based tools).

The approach is both theoretical and practical, with hands-on exercises in project planning and prototyping. Students are expected to have some familiarity with digital environments; prior programming experience is desirable but not mandatory. The course will be held mostly in English, but some code-switching between English and German is likely.

Assessment and permitted materials

Attendance and participation in class
Home exercises and assignments
Oral or poster presentation
Written project portfolio

Minimum requirements and assessment criteria

Assessment will be based on:
regular attendance and participation (20%)
home exercises and assignments (20%)
oral or poster presentation (30%)
written project portfolio (30%)

Examination topics

There is no exam for the course.

Reading list

Gries, Stefan T. (2021). Statistics for linguistics with R: A practical introduction (Third edition). De Gruyter.
Levshina, Natalia (2015). How to do linguistics with R: Data exploration and statistical analysis. John Benjamins.
McEnery, Tony & Wilson, Andrew (2022). Corpus linguistics. Edinburgh University Press.
Meyer, Charles F. (2023). English corpus linguistics: An introduction (Second edition). Cambridge University Press.
Winter, Bodo (2019). Statistics for linguists: An introduction using R. Routledge.

Association in the course directory

DH-S II

Last modified: We 03.07.2024 15:05