Universität Wien

400022 SE Text analysis in R (2019S)

Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Details

max. 15 participants
Language: English

Lecturers

Classes

Lecturer: Wouter van Atteveldt (University of Amsterdam)

  • Monday 08.04. 09:00 - 12:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Monday 08.04. 14:00 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Tuesday 09.04. 09:00 - 11:15 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Tuesday 09.04. 13:30 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Wednesday 10.04. 09:00 - 13:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Thursday 11.04. 09:00 - 12:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Thursday 11.04. 14:00 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Friday 12.04. 09:00 - 12:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien
  • Friday 12.04. 14:00 - 16:00 C0628A Besprechung SoWi, NIG Universitätsstraße 7/Stg. III/6. Stock, 1010 Wien

Information

Aims, contents and method of the course

The explosion of digital communication and increasing efforts to digitize existing material have produced a deluge of data, such as digitized historical news archives, policy and legal documents, political debates, and millions of social media messages by politicians, journalists, and citizens. This makes it possible to put theoretical predictions about the societal roles played by information, and about the development and effects of communication, to rigorous quantitative tests that were impossible before. Besides providing an opportunity, the analysis of such “big data” sources also poses methodological challenges. Traditional manual content analysis does not scale to very large data sets due to its high cost and complexity. For this reason, many researchers turn to automatic text analysis using techniques such as dictionary analysis, automatic clustering and scaling of latent traits, and machine learning.

Course aims and structure: Using such techniques properly, however, requires a very specific skill set. This course aims to give interested PhD (and advanced Master) students an introduction to text analysis. R will be used as the platform and language of instruction, but the basic principles and methods generalize easily to other languages and tools such as Python. Participants will be given handouts with examples based on pre-existing data to follow along, but are encouraged to work on their own data and problems using the techniques offered.
Course outline per day (a = morning, b = afternoon):
1. Introduction to R
   a. R, RStudio, variables, data, functions, packages
   b. Inspirational: analysing and visualizing simple data
2. R for data analysis
   a. Organizing and cleaning data with tidyverse
   b. Aggregating, tabulating, and visualizing data
3. Quantitative text analysis in R
   a. Simple quantitative text analysis: reading, cleaning, and preprocessing text with quanteda and readtext
4. Scraping and cleaning text
   a. Dictionary-based text analysis
   b. APIs: scraping Twitter, NYTimes and friends
5. Advanced text analysis
   a. LDA and structural topic models
   b. Supervised machine learning and scaling
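
To give a flavour of the dictionary-based analysis listed for day 4, the sketch below counts how often the words of each dictionary category occur in a document. It is only an illustration and not course material: the course itself works in R with quanteda, whereas this sketch uses plain Python, and the names DICTIONARY, tokenize and dictionary_counts as well as the toy word lists and example sentences are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only;
# real studies rely on validated lexicons.
DICTIONARY = {
    "positive": {"good", "great", "hope"},
    "negative": {"bad", "crisis", "fear"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_counts(text, dictionary=DICTIONARY):
    """Count how many tokens of the text fall into each dictionary category."""
    token_counts = Counter(tokenize(text))
    return {
        category: sum(token_counts[word] for word in words)
        for category, words in dictionary.items()
    }


# Made-up example documents.
docs = [
    "Great news: there is good hope for the economy.",
    "Fear of a new crisis is bad for markets.",
]

for doc in docs:
    print(dictionary_counts(doc))
```

Each document is reduced to one count per category, which can then be aggregated or compared across sources. Cleaning and preprocessing choices (lowercasing, tokenization, stemming) directly affect these counts.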

Assessment and permitted materials

Evaluation will be based on a small in-class written quiz (10%), a practical assignment to be handed in on Wednesday (20%), and a larger project on a topic of your choice to be handed in after the last class (70%).

Minimum requirements and assessment criteria

Examination topics

Reading list

Course Literature:
- Welbers, K., Van Atteveldt, W., & Benoit, K. (2017). Text analysis in R. Communication Methods and Measures, 11(4), 245-265. doi: 10.1080/19312458.2017.1387238
- Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O'Reilly Media.

Background literature:
- Van Atteveldt, W., & Peng, T.-Q. (2018). When communication meets computation: Opportunities, challenges, and pitfalls in computational communication science. Communication Methods and Measures, 12(2-3), 81-92.
- Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems (pp. 288-296).
- Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168-189.
- Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.
- Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder‐Luis, J., Gadarian, S. K., ... & Rand, D. G. (2014). Structural Topic Models for Open‐Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082.
- Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.

Association in the course directory

Last modified: Mo 07.09.2020 15:47