Universität Wien

270148 PR Data science in metabolomics and proteomics (2020W)

3.00 ECTS (3.00 SWS), SPL 27 - Chemie
Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Details

max. 12 participants
Language: German

Lecturers

Classes

This will be a block course in the week of 16-20 November, 9am-4pm, with students working on problems from 11am onwards. This course will take place online; there will be pre-recorded videos as well as livestreamed lectures (Moodle/BBB).


Information

Aims, contents and method of the course

This course will be taught in English.

Aims: Modern LC-HRMS-based metabolomics or proteomics studies tend to generate large experimental datasets (several GB of raw data). Especially in hypothesis-generating experimental approaches, it is impossible to inspect these data manually or to process them manually in an efficient, controlled, and repeatable, and thus consistent, manner. This requires specialized, tailored software tools. Furthermore, the users of these tools need to understand their inner workings (algorithms) in order to evaluate and judge the outcome of such data processing efficiently and reliably.
In this course, the students will get to know the fundamental approaches of untargeted metabolomics or proteomics experiments based on LC-HRMS measurements. Using a large metabolomics dataset, the students will carry out the initial processing of the raw data (peak picking, grouping, optional retention-time alignment, integration of peak areas) with XCMS in the programming language R. The students will gain an understanding of the parameters and functions of the XCMS software package and be able to visualize and evaluate the detected compounds. Moreover, they will be able to query these compounds in a large database and export them into a comprehensive data matrix. Additionally, a rough overview of basic and currently widely used statistical methods, which can be used to generate new biological hypotheses, will be presented.

Contents:
The course will cover the following topics:
• Introduction to the programming language R
• Import of LC-HRMS datasets into R
• Algorithms and functions of XCMS and CAMERA
• Explanation of parameters of XCMS
• Annotation of detected compounds using compound databases
• Export of the detected compounds into a data matrix
• Brief overview of basic and advanced statistical methods using the evaluated dataset
The course is organized into a lecture part as well as practical work in R, where the students will evaluate the dataset themselves.

Methods: The course content will be conveyed through presentations, practical work done either alone or in small groups, discussions among the students, and student presentations, among others.

The students will use a central R installation. Each student needs to bring a laptop that can access the university’s network.

Prerequisites:
• Confident handling of PCs
• Proficiency in Microsoft Excel, with a special focus on formulas

Assessment and permitted materials

Methods used to evaluate the students’ performance:
• Active participation during the course
• Written final exam on the contents presented
• Presentation of the completed dataset evaluation

Should it be necessary, students may be invited to an interview with the course instructor. This interview will then also count towards the final mark.

Permitted aids: R documentation (offline)

Minimum requirements and assessment criteria

Students must be present during the first lecture of the course. Moreover, they have to participate in at least 75% of the course. Thus, students are not allowed to accumulate more than XX hours of absence during the entire course.

To pass the course, students must reach at least 50% of the maximum possible points. Additionally, each assessment criterion must receive a passing evaluation.

Assessment criteria:
Students can earn a maximum of 100 points during the course. These are divided into:
• Active participation: 30 points
• Final exam: 40 points
• Presentation of the dataset evaluation: 30 points

The final marks are:
• 1 (A): 100 - 89 points
• 2 (B): 88 - 76 points
• 3 (C): 75 - 63 points
• 4 (D): 62 - 50 points
• 5 (F): 49 - 0 points
(Points are rounded in favour of the student.)
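
As an illustration only, the point breakdown and mark boundaries listed above can be written as a small lookup. This is a minimal sketch and not an official grading tool: the function name `final_mark` and the constants are hypothetical, and it assumes that "rounded in favour of the student" means fractional point totals are rounded up. The additional rule that each criterion must be passed individually is not modelled, because no per-criterion threshold is stated here.

```python
import math

# Maximum points per criterion, taken from the breakdown above.
MAX_POINTS = {"participation": 30, "exam": 40, "presentation": 30}

# Mark boundaries from the list above: (lowest point total, mark).
MARK_BANDS = [(89, "1 (A)"), (76, "2 (B)"), (63, "3 (C)"), (50, "4 (D)"), (0, "5 (F)")]


def final_mark(participation, exam, presentation):
    """Map the three point components to a final mark (illustrative only)."""
    earned = {
        "participation": participation,
        "exam": exam,
        "presentation": presentation,
    }
    for name, points in earned.items():
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name} must lie between 0 and {MAX_POINTS[name]}")

    # Assumption: rounding in favour of the student means rounding the total up.
    total = math.ceil(sum(earned.values()))

    # Bands are ordered from best to worst, so the first match is the mark.
    for lowest, mark in MARK_BANDS:
        if total >= lowest:
            return mark


if __name__ == "__main__":
    print(final_mark(25, 35, 28))
    print(final_mark(25, 35.5, 28))
```

Under this assumption, a total of 88.5 points is rounded up to 89 and falls into the top band, whereas a total of exactly 88 points stays in the second band.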

Examination topics

The contents of the lectures and practical work.

Reading list

The programming language R is open source, so a large number of teaching materials are available to students free of charge. These should be read and studied prior to the course. Some of these are:
https://www.statmethods.net/r-tutorial/index.html
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

XCMS and CAMERA are two commonly used software packages. They are also freely available and more information about them can be found at:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-504
https://pubs.acs.org/doi/10.1021/ac202450g
https://www.bioconductor.org/packages/release/bioc/vignettes/CAMERA/inst/doc/CAMERA.pdf

Association in the course directory

AN-2, BC-1, CHE II-1

Last modified: We 03.02.2021 09:31