Universität Wien

400022 SE Text analysis in R (2019S)

Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no "first come, first served").

Details

max. 15 participants
Language: English

Lecturers

Classes (iCal) - next class is marked with N

Lecturer: Wouter van Atteveldt (University of Amsterdam)

  • Monday 08.04. 09:00 - 12:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Monday 08.04. 14:00 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Tuesday 09.04. 09:00 - 11:15 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Tuesday 09.04. 13:30 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Wednesday 10.04. 09:00 - 13:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Thursday 11.04. 09:00 - 12:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Thursday 11.04. 14:00 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Friday 12.04. 09:00 - 12:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Friday 12.04. 14:00 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien

Information

Aims, contents and method of the course

The explosion of digital communication and increasing efforts to digitize existing material have produced a deluge of text such as digitized historical news archives, policy and legal documents, political debates, and millions of social media messages by politicians, journalists, and citizens. This has the potential to put theoretical predictions about the societal roles played by information, and about the development and effects of communication, to rigorous quantitative tests that were impossible before. Besides providing an opportunity, the analysis of such “big data” sources also poses methodological challenges. Traditional manual content analysis does not scale to very large data sets because of its high cost and complexity. For this reason, many researchers turn to automatic text analysis using techniques such as dictionary analysis, automatic clustering and scaling of latent traits, and machine learning.

Course aims and structure: Using such techniques properly, however, requires a very specific skill set. This course aims to give interested PhD (and advanced Master's) students an introduction to text analysis. R will be used as the platform and language of instruction, but the basic principles and methods generalize readily to other languages and tools such as Python. Participants will be given handouts with examples based on pre-existing data to follow along, but are encouraged to apply the techniques offered to their own data and problems.
Course outline per day (a=morning, b=afternoon):
1. Introduction to R
a. R, RStudio, variables, data, functions, packages
b. Inspirational: analysing and visualizing simple data
2. R for data analysis
a. Organizing and cleaning data with tidyverse
b. Aggregating, tabulating, and visualizing data
3. Quantitative text analysis in R
a. Simple quantitative text analysis: Reading, cleaning, and preprocessing text with quanteda and readtext
4. Scraping and cleaning text
a. Dictionary-based text analysis
b. APIs: scraping Twitter, nytimes and friends
5. Advanced text analysis
a. LDA and structural topic models
b. Supervised machine learning and scaling

Assessment and permitted materials

Evaluation will be based on a small in-class written quiz (10%), a practical assignment to be handed in on Wednesday (20%), and a larger project on a topic of your choice to be handed in after the last class (70%).

Minimum requirements and assessment criteria

Examination topics

Reading list

Course Literature:
- Welbers, K., van Atteveldt, W., & Benoit, K. (2017). Text analysis in R. Communication Methods and Measures, 11(4), 245-265. doi: 10.1080/19312458.2017.1387238
- Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O'Reilly Media.

Background literature:
- van Atteveldt, W., & Peng, T.-Q. (2018). When communication meets computation: Opportunities, challenges, and pitfalls in computational communication science. Communication Methods and Measures, 12(2-3), 81-92.
- Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems (pp. 288-296).
- Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168-189.
- Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.
- Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder‐Luis, J., Gadarian, S. K., ... & Rand, D. G. (2014). Structural Topic Models for Open‐Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082.
- Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.

Association in the course directory

Last modified: Mon 07.09.2020 15:47