Universität Wien

136010 UE Introduction to DH Tools and Methods (2021S)

Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Details

max. 25 participants
Language: English

Lecturers

Classes

Topics/Dates

1) Overview of the digital ecosystem for DH: OCR & NLP Pipelines, Visualization & Dashboards, Spatial Analysis, Image Analysis, Social Network Analysis, Sentiment Analysis, Time Series Analysis, SQL and NoSQL, Database Management, RDF triplestores, etc.

2) The basics of Programming:
IDEs and Digital Research Frameworks
Programming Languages: why Python?
First steps into programming
Hands on: Installing Anaconda and programming the first “Hello World”

3) The basics of Versioning:
Versioning Code: Git, GitHub, GitLab
Versioning Data: DoltHub
Hands on: Installing GitHub Desktop. Full cycle on versioning files.

4) The basics of Python language I:
Python objects and methods
Python native structures: data containers
Hands on: small scripts with native structures.

5) The basics of Python language II:
Python control flow
Python functions
Python scopes
Hands on: Mining text with simple functions.

6) The basics of Python language III:
Python packages and modules
Python data persistence
Hands on: Exploring common packages, mostly from the standard library: os, time, sys, string, pickle, random, plus the third-party package dill
Suggested readings:
Python generators and comprehensions

7) Advanced topics of the Python language:
Python classes and OOP
Hands on: Creating classes to represent DH entities
Suggested readings:
Python decorators
Code Optimization

8) Topics in Exploratory Data Analysis (EDA) I
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn

9) Topics in Exploratory Data Analysis (EDA) II
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn

10) NLP Intro:
Motivations, Tasks, Goals and Challenges
NLP and the Humanities
Python NLP Packages (NLTK, spaCy, TextBlob, WordNet)
Hands on: Python NLP Pipeline for Corpus Acquisition and Cleaning - Information extraction and Scraping, Regular Expressions, Frequency Analysis, N-grams, POS Tagging, Syntax parsing, NER, Summarization

11) Topic Modeling
Text Classification
Sentiment Analysis
Python NLP Packages (Gensim, pyLDAvis)
Hands on: Python NLP Pipelines for Text Representation (BoW, TF-IDF)

12) Word Embeddings
Dimensionality Reduction
Text Visualization
Hands on: Detection of Biases in Corpora

13) Semantic Web Technologies
Knowledge Organization Systems
Thesauri and Ontologies
Hands on: Python and Text Annotation - TEI/XML (XML, lxml, XPath)

14) Social Network Analysis
Graphs
Knowledge Graphs
Hands on: SNA with Python Graph Structures (NetworkX, Pyvis)

15) Where to go from here:
Time Series Analysis
Image analysis and Computer Vision
Machine Learning and Deep Learning for NLP
Text Classification
Text Clustering
Stylistic Analysis
Information Extraction
Information Retrieval Systems
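
As an illustration of the kind of exercise named in sessions 5 and 10 (mining text with simple functions, frequency analysis, n-grams), the following is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names (tokenize, ngrams, top_counts) and the sample sentence are assumptions made for this example, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Simplification: \\w+ also matches digits and underscores, and
    punctuation is simply dropped.
    """
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) over a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_counts(items, k=3):
    """Return the k most frequent items together with their counts."""
    return Counter(items).most_common(k)


# Hypothetical sample text, chosen only for illustration.
sample = "The cat sat on the mat. The cat slept."

tokens = tokenize(sample)
word_freq = top_counts(tokens)
bigram_freq = top_counts(ngrams(tokens, 2), k=1)

print(word_freq)
print(bigram_freq)
```

For real corpora, the NLP packages listed in session 10 (NLTK, spaCy) provide more robust tokenization than this regular expression.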

  • Friday 05.03. 09:45 - 11:15 Digital
  • Friday 19.03. 09:45 - 11:15 Digital
  • Friday 26.03. 09:45 - 11:15 Digital
  • Friday 16.04. 09:45 - 11:15 Digital
  • Friday 23.04. 09:45 - 11:15 Digital
  • Friday 30.04. 09:45 - 11:15 Digital
  • Friday 07.05. 09:45 - 11:15 Digital
  • Friday 14.05. 09:45 - 11:15 Digital
  • Friday 21.05. 09:45 - 11:15 Digital
  • Friday 28.05. 09:45 - 11:15 Digital
  • Friday 04.06. 09:45 - 11:15 Digital
  • Friday 11.06. 09:45 - 11:15 Digital
  • Friday 18.06. 09:45 - 11:15 Digital
  • Friday 25.06. 09:45 - 11:15 Digital

Information

Aims, contents and method of the course

The course aims to give students the skills needed to understand the potential of digital methods for the humanities, using the Python programming language for a handful of common tasks in the domain. It presents a broad overview of methods and tools, specifically covering the following: OCR & Natural Language Processing (NLP) Pipelines, Visualization & Dashboards, Spatial Analysis, Image Analysis, Social Network Analysis (SNA), Sentiment Analysis, SQL and NoSQL Database Management. The approach is both theoretical and practical, with a heavy load of hands-on exercises. Students are expected to be familiar with digital environments; previous programming practice is desirable but not mandatory.

Assessment and permitted materials

Course evaluation will be a combination of in-class participation (30%), weekly homework assignments (40%), and the final project (30%).

Minimum requirements and assessment criteria

Attendance is required, and regular participation is key to completing the course. All students must provide their own computing environment. Homework assignments must be submitted on time; some can be completed later as part of the final project, but this must be discussed with the instructor whenever the issue arises. The final project must be submitted on time.

Examination topics

There is no examination for the course.

Reading list

Learning Python, 5th Edition by Mark Lutz, O'Reilly Media, 2013. ISBN 978-1-4493-5573-9.

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, O'Reilly Media, 2012. ISBN 978-1-4493-1979-3.

GitHub Repository - https://github.com/rsouza/Python_Course

The Programming Historian → relevant lessons
https://programminghistorian.org/en/lessons/

TED Talk - https://www.ted.com/talks/reshma_saujani_teach_girls_bravery_not_perfection

Association in the course directory

DH-S I

Last modified: Th 04.07.2024 00:13