Universität Wien

136010 UE Introduction to DH Tools and Methods (2021S)

Course with continuous assessment

Registration/Deregistration

Note: The time at which you register within the registration period has no effect on the allocation of places (no "first come, first served").

Details

Max. 25 participants
Language: English

Lecturers

Dates

Topics/Dates

1) Overview of the digital ecosystem for DH: OCR & NLP Pipelines, Visualization & Dashboards, Spatial Analysis, Image Analysis, Social Network Analysis, Sentiment Analysis, Time Series Analysis, SQL and NoSQL, Database Management, RDF triplestores, etc.

2) The basics of Programming:
IDEs and Digital Research Frameworks
Programming Languages: why Python?
First steps into programming
Hands on: Installing Anaconda and programming the first “Hello World”

3) The basics of Versioning:
Versioning Code: Git, GitHub, GitLab
Versioning Data: DoltHub
Hands on: Installing GitHub Desktop and working through a full versioning cycle on files.

4) The basics of the Python language I:
Python objects and methods
Python native structures: data containers
Hands on: small scripts with native structures.

5) The basics of the Python language II:
Python control flow
Python functions
Python scopes
Hands on: Mining text with simple functions.
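As a rough illustration of the kind of exercise this session might involve (not taken from the course materials), the sketch below mines a short text with two plain functions that use a `for` loop, an `if` test and a function-local variable. It relies only on the standard library; the function names `tokenize` and `top_words`, the tiny stopword set and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens made of letters only."""
    # [^\W\d_] matches any Unicode letter, so accented words (e.g. "café") survive
    return re.findall(r"[^\W\d_]+", text.lower())

def top_words(text, n=3, stopwords=frozenset({"the", "a", "of", "and"})):
    """Return the n most frequent tokens that are not stopwords, with their counts."""
    counts = Counter()  # local variable: it exists only inside this function's scope
    for token in tokenize(text):
        if token not in stopwords:
            counts[token] += 1
    return counts.most_common(n)

sample = "The king and the queen of the north; the king rode north."
print(top_words(sample))  # → [('king', 2), ('north', 2), ('queen', 1)]
```

Using an immutable `frozenset` as the default stopword list avoids the usual pitfall of mutable default arguments in Python functions.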

6) The basics of the Python language III:
Python packages and modules
Python data persistence
Hands on: Exploring common packages: os, time, sys, string, pickle and random from the standard library, plus the third-party dill
Suggested readings:
Python generators and comprehensions

7) Advanced topics of the Python language:
Python classes and OOP
Hands on: Creating classes to represent DH entities
Suggested readings:
Python decorators
Code Optimization

8) Topics in Exploratory Data Analysis (EDA) I
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn

9) Topics in Exploratory Data Analysis (EDA) II
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn

10) NLP Intro:
Motivations, Tasks, Goals and Challenges
NLP and the Humanities
Python NLP Packages (NLTK, spaCy, TextBlob, WordNet)
Hands on: Python NLP Pipeline for Corpus Acquisition and Cleaning - Information extraction and Scraping, Regular Expressions, Frequency Analysis, N-grams, POS Tagging, Syntax parsing, NER, Summarization
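A minimal, dependency-free sketch of three of the pipeline steps listed above: cleaning with regular expressions, tokenization, and n-gram frequency analysis. The session lists packages such as NLTK and spaCy for this work; plain Python is used here only to keep the example self-contained. The helper names `clean` and `ngrams`, the rule that strips bracketed editorial notes such as `[sic]` or `[1]`, and the sample line are illustrative assumptions.

```python
import re
from collections import Counter

def clean(text):
    """Remove bracketed editorial notes such as [sic] or [1], then collapse whitespace."""
    text = re.sub(r"\[[^\]]*\]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples (empty if there are fewer than n tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

raw = "To be, or not [sic] to be:\n that is the question [1]."
# Letters-only tokens after cleaning and lowercasing
tokens = re.findall(r"[^\W\d_]+", clean(raw).lower())
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))  # → [(('to', 'be'), 2), (('be', 'or'), 1)]
```

Changing `n` gives unigram, trigram or longer frequency tables from the same token list.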

11) Topic Modeling
Text Classification
Sentiment Analysis
Python NLP Packages (Gensim, pyLDAvis)
Hands on: Python NLP Pipelines for Text Representation (BoW, TF-IDF)

12) Word Embeddings
Dimensionality Reduction
Text Visualization
Hands on: Detection of Biases in Corpora

13) Semantic Web Technologies
Knowledge Organization Systems
Thesauri and Ontologies
Hands on: Python and Text Annotation - TEI/XML (xml, lxml, XPath)

14) Social Network Analysis
Graphs
Knowledge Graphs
Hands on: SNA with Python Graph Structures (NetworkX, Pyvis)

15) Where to go from here:
Time Series Analysis
Image analysis and Computer Vision
Machine Learning and Deep Learning for NLP
Text Classification
Text Clustering
Stylistic Analysis
Information Extraction
Information Retrieval Systems

Friday 05.03. 09:45 - 11:15 (online)
Friday 19.03. 09:45 - 11:15 (online)
Friday 26.03. 09:45 - 11:15 (online)
Friday 16.04. 09:45 - 11:15 (online)
Friday 23.04. 09:45 - 11:15 (online)
Friday 30.04. 09:45 - 11:15 (online)
Friday 07.05. 09:45 - 11:15 (online)
Friday 14.05. 09:45 - 11:15 (online)
Friday 21.05. 09:45 - 11:15 (online)
Friday 28.05. 09:45 - 11:15 (online)
Friday 04.06. 09:45 - 11:15 (online)
Friday 11.06. 09:45 - 11:15 (online)
Friday 18.06. 09:45 - 11:15 (online)
Friday 25.06. 09:45 - 11:15 (online)

Information

Aims, contents and method of the course

The course aims to give students the skills needed to understand the potential of digital methods for the humanities, using the Python programming language for a range of common tasks in the field. It presents a broad overview of methods and tools, covering OCR & Natural Language Processing (NLP) Pipelines, Visualization & Dashboards, Spatial Analysis, Image Analysis, Social Network Analysis (SNA), Sentiment Analysis, and SQL and NoSQL Database Management. The approach is both theoretical and practical, with a heavy load of hands-on exercises. Students are expected to be familiar with digital environments; previous programming experience is desirable but not mandatory.

Assessment and permitted materials

Course evaluation will be a combination of in-class participation (30%), weekly homework assignments (40%), and the final project (30%).

Minimum requirements and assessment criteria

Attendance is required, and regular participation is key to completing the course. All students must provide their own computing environment. Homework assignments must be submitted on time (some can be completed later as part of the final project, but this must be discussed with the instructor whenever the issue arises). The final project must be submitted on time.

Examination topics

There is no examination for the course.

Reading list

Learning Python, 5th Edition by Mark Lutz, O'Reilly Media, 2013. ISBN 978-1-4493-5573-9.

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, O'Reilly Media, 2012. ISBN 978-1-4493-1979-3.

GitHub repository: https://github.com/rsouza/Python_Course

The Programming Historian, relevant lessons:
https://programminghistorian.org/en/lessons/

TED Talk, Reshma Saujani, "Teach girls bravery, not perfection": https://www.ted.com/talks/reshma_saujani_teach_girls_bravery_not_perfection

Assignment in the course directory

DH-S I

Last modified: Fri 12.05.2023 00:16