Universität Wien FIND

136010 UE Introduction to DH Tools and Methods (2021S)

Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Details

max. 25 participants
Language: English

Lecturers

Classes

Topics/Dates

1) Overview of the digital ecosystem for DH: OCR & NLP Pipelines, Visualization & Dashboards, Spatial Analysis, Image Analysis, Social Network Analysis, Sentiment Analysis, Time Series Analysis, SQL and NoSQL, Database Management, RDF triplestores, etc.

2) The basics of Programming:
IDEs and Digital Research Frameworks
Programming Languages: why Python?
First steps into programming
Hands on: Installing Anaconda and programming the first “Hello World”

3) The basics of Versioning:
Versioning Code: Git, GitHub, GitLab
Versioning Data: DoltHub
Hands on: Installing GitHub Desktop. Full cycle on versioning files.

4) The basics of Python language I:
Python objects and methods
Python native structures: data containers
Hands on: small scripts with native structures.
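
As a rough illustration of what a small script with native structures might look like, the sketch below uses a list, a set and a dict on a tiny correspondence dataset. The names and years are hypothetical and invented for the example; they do not come from the course material.

```python
# Hypothetical sample data, invented for illustration only:
# each tuple is (sender, recipient, year).
letters = [
    ("Anna", "Bruno", 1801),
    ("Bruno", "Anna", 1801),
    ("Anna", "Clara", 1803),
]

# list: an ordered, mutable sequence of all senders
senders = [sender for sender, _recipient, _year in letters]

# set: every person appears only once, whatever their role
people = {name for sender, recipient, _year in letters
          for name in (sender, recipient)}

# dict: map each sender to the number of letters sent
sent_count = {}
for sender in senders:
    sent_count[sender] = sent_count.get(sender, 0) + 1

print(sorted(people))
print(sent_count)
```

Tuples suit the individual records here because each record is fixed once created, while the dict gives direct lookup by name.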

5) The basics of Python language II:
Python control flow
Python functions
Python scopes
Hands on: Mining text with simple functions.
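
A minimal sketch of text mining with simple functions could look like the following. It combines a loop, conditionals, two functions and a module-level constant read from inside a function. The stopword list, the function names and the sample sentence are assumptions made for the example.

```python
# Module-level (global) name: readable from inside the functions below.
# The stopword list is a small hypothetical sample.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    tokens = []
    word = ""  # local name: exists only while tokenize() runs
    for char in text.lower():
        if char.isalpha():
            word += char
        elif word:
            tokens.append(word)
            word = ""
    if word:  # flush the last token if the text ends with a letter
        tokens.append(word)
    return tokens


def count_words(text, min_length=1):
    """Count tokens, skipping stopwords and tokens shorter than min_length."""
    counts = {}
    for token in tokenize(text):
        if token in STOPWORDS or len(token) < min_length:
            continue
        counts[token] = counts.get(token, 0) + 1
    return counts


sample = "The history of the book and the history of reading."
print(count_words(sample))
```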

6) The basics of Python language III:
Python packages and modules
Python data persistence
Hands on: Exploring common packages from the standard library (os, time, sys, string, pickle, random) and the third-party package dill
Suggested readings:
Python generators and comprehensions
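
The sketch below illustrates data persistence with pickle, followed by a dict comprehension and a generator. The corpus contents and the file name corpus.pkl are hypothetical; the file is written to a temporary directory so the example leaves the working directory untouched.

```python
import os
import pickle
import tempfile

# Hypothetical two-document corpus, invented for illustration.
corpus = {"doc1": "first sample text", "doc2": "second sample text"}

# Persist the object to disk and load it back.
path = os.path.join(tempfile.mkdtemp(), "corpus.pkl")
with open(path, "wb") as handle:
    pickle.dump(corpus, handle)
with open(path, "rb") as handle:
    restored = pickle.load(handle)

# Dict comprehension: number of words per document.
lengths = {name: len(text.split()) for name, text in restored.items()}


def iter_words(documents):
    """Generator: yield one word at a time instead of building a full list."""
    for text in documents.values():
        for word in text.split():
            yield word


unique_words = sorted(set(iter_words(restored)))
print(lengths)
print(unique_words)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.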

7) Advanced topics of Python language:
Python classes and OOP
Hands on: Creating classes to represent DH entities
Suggested readings:
Python decorators
Code Optimization
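
One possible way to represent DH entities as classes is sketched below. The Person and Letter classes, their attributes and the sample values are all hypothetical and chosen only to illustrate the idea.

```python
class Person:
    """A historical person (hypothetical minimal model)."""

    def __init__(self, name, birth_year=None):
        self.name = name
        self.birth_year = birth_year

    def __repr__(self):
        return f"Person({self.name!r})"


class Letter:
    """A letter exchanged between two Person objects."""

    def __init__(self, sender, recipient, year, text=""):
        self.sender = sender
        self.recipient = recipient
        self.year = year
        self.text = text

    def word_count(self):
        """Number of whitespace-separated words in the letter text."""
        return len(self.text.split())

    def involves(self, person):
        """True if the given person sent or received this letter."""
        return person is self.sender or person is self.recipient


# Invented sample objects.
anna = Person("Anna", birth_year=1770)
bruno = Person("Bruno")
clara = Person("Clara")
letter = Letter(anna, bruno, 1801, text="Dear Bruno, the books have arrived.")

print(letter.word_count(), letter.involves(anna), letter.involves(clara))
```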

8) Topics in Exploratory Data Analysis (EDA) I
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn

9) Topics in Exploratory Data Analysis (EDA) II
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn

10) NLP Intro:
Motivations, Tasks, Goals and Challenges
NLP and the Humanities
Python NLP Packages (NLTK, spaCy, TextBlob, WordNet)
Hands on: Python NLP Pipeline for Corpus Acquisition and Cleaning - Information extraction and Scraping, Regular Expressions, Frequency Analysis, N-grams, POS Tagging, Syntax parsing, NER, Summarization
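
Some of the steps listed above (regular expressions, frequency analysis, n-grams) can be sketched with the standard library alone, as below. This is a simplified illustration, not the pipeline used in class; packages such as NLTK or spaCy provide far more robust tokenizers. The function names and the sample sentence are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase, keep runs of the letters a-z only."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


text = "To be, or not to be, that is the question."
tokens = tokenize(text)

frequencies = Counter(tokens)            # word frequency analysis
bigrams = Counter(ngrams(tokens, 2))     # frequency of adjacent word pairs

print(frequencies["to"], frequencies["question"])
print(bigrams[("to", "be")])
```

Note that the pattern [a-z]+ drops accented and non-Latin characters, so real corpora usually need a Unicode-aware pattern or a dedicated tokenizer.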

11) Topic Modeling
Text Classification
Sentiment Analysis
Python NLP Packages (Gensim, pyLDAvis)
Hands on: Python NLP Pipelines for Text Representation (BoW, TF-IDF)
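
The two text representations named above can be illustrated in plain Python as follows. This sketch uses the textbook definition tf * log(N / df); libraries commonly apply smoothed variants, so their numbers will differ. The three documents are invented for the example.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, invented for illustration.
documents = [
    "the archive holds old letters",
    "the letters mention the archive",
    "maps and letters",
]

tokenized = [doc.split() for doc in documents]
vocabulary = sorted({token for doc in tokenized for token in doc})


def bag_of_words(tokens):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(tokens):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for term in vocabulary:
        tf = counts[term] / len(tokens)
        df = sum(1 for doc in tokenized if term in doc)
        vector.append(tf * math.log(n_docs / df))
    return vector


print(vocabulary)
print(bag_of_words(tokenized[1]))
print(tf_idf(tokenized[2]))
```

A term that occurs in every document, such as "letters" here, receives a weight of zero, which is how TF-IDF discounts words that do not distinguish documents.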

12) Word Embeddings
Dimensionality Reduction
Text Visualization
Hands on: Detection of Biases in Corpora

13) Semantic Web Technologies
Knowledge Organization Systems
Thesauri and Ontologies
Hands on: Python and Text Annotation - TEI/XML (xml, lxml, XPath)
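
A minimal sketch of reading TEI/XML annotations with the standard library's xml.etree.ElementTree is given below; lxml offers a similar interface with fuller XPath support. The TEI fragment is a hypothetical example, deliberately incomplete (no teiHeader), and the names in it are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI fragment for illustration only.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><persName>Anna</persName> wrote to <persName>Bruno</persName> from <placeName>Vienna</placeName>.</p>
    </body>
  </text>
</TEI>"""

# TEI elements live in a namespace, so queries need a prefix mapping.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

root = ET.fromstring(TEI_SAMPLE)

# ElementTree supports a limited subset of XPath in find() and findall().
persons = [el.text for el in root.findall(".//tei:persName", NS)]
places = [el.text for el in root.findall(".//tei:placeName", NS)]

# Recover the running text of the paragraph without the markup.
paragraph = root.find(".//tei:p", NS)
plain_text = "".join(paragraph.itertext())

print(persons, places)
print(plain_text)
```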

14) Social Network Analysis
Graphs
Knowledge Graphs
Hands on: SNA with Python Graph Structures (NetworkX, pyvis)
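
To show the underlying idea without any third-party package, the sketch below stores an undirected graph as a dict of neighbour sets and computes degree centrality as degree / (n - 1). In practice a library such as NetworkX would provide this and much more; the edge list is hypothetical and the function name is an assumption.

```python
from collections import defaultdict

# Hypothetical correspondence network: each pair exchanged letters.
edges = [
    ("Anna", "Bruno"),
    ("Anna", "Clara"),
    ("Bruno", "Clara"),
    ("Clara", "Dora"),
]

# Undirected graph as an adjacency structure: node -> set of neighbours.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in adjacency.items()}


centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
print(centrality)
print(most_central)
```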

15) Where to go from here:
Time Series Analysis
Image analysis and Computer Vision
Machine Learning and Deep Learning for NLP
Text Classification
Text Clustering
Stylistic Analysis
Information Extraction
Information Retrieval Systems

Friday 05.03. 09:45 - 11:15 Digital
Friday 19.03. 09:45 - 11:15 Digital
Friday 26.03. 09:45 - 11:15 Digital
Friday 16.04. 09:45 - 11:15 Digital
Friday 23.04. 09:45 - 11:15 Digital
Friday 30.04. 09:45 - 11:15 Digital
Friday 07.05. 09:45 - 11:15 Digital
Friday 14.05. 09:45 - 11:15 Digital
Friday 21.05. 09:45 - 11:15 Digital
Friday 28.05. 09:45 - 11:15 Digital
Friday 04.06. 09:45 - 11:15 Digital
Friday 11.06. 09:45 - 11:15 Digital
Friday 18.06. 09:45 - 11:15 Digital
Friday 25.06. 09:45 - 11:15 Digital

Information

Aims, contents and method of the course

The course aims to give students the skills needed to understand the potential of digital methods for the humanities, using the Python programming language for a handful of common tasks in the domain. It presents a broad overview of methods and tools, specifically: OCR and Natural Language Processing (NLP) pipelines, visualization and dashboards, spatial analysis, image analysis, Social Network Analysis (SNA), sentiment analysis, and SQL and NoSQL database management. The approach is both theoretical and practical, with a heavy load of hands-on exercises. Students are expected to be familiar with digital environments; previous programming practice is desirable but not mandatory.

Assessment and permitted materials

Course evaluation will be a combination of in-class participation (30%), weekly homework assignments (40%), and the final project (30%).

Minimum requirements and assessment criteria

Attendance is required, and regular participation is the key to completing the course. All students must provide their own computing environment. Homework assignments must be submitted on time; some can be completed later as part of the final project, but this must be discussed with the instructor whenever the issue arises. The final project must be submitted on time.

Examination topics

There is no examination for the course.

Reading list

Learning Python, 5th Edition by Mark Lutz, O'Reilly Media, 2013. ISBN 978-1-4493-5573-9.

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, O'Reilly Media, 2012. ISBN 978-1-4493-1979-3.

GitHub Repository - https://github.com/rsouza/Python_Course

Programming Historian (relevant lessons) - https://programminghistorian.org/en/lessons/

TED Talk - https://www.ted.com/talks/reshma_saujani_teach_girls_bravery_not_perfection

Association in the course directory

DH-S I

Last modified: Fr 02.04.2021 11:28