Machine Learning on Microbiome Data: Theory and Practice

Code and data for the workshop are available HERE

Co-organizers: Tatiana Lenskaia and Sambhawa Priya

The workshop will focus on the theory and application of machine learning to microbiome datasets, including bacterial and viral communities. The following topics will be addressed:

Introduction to supervised machine learning, and how to implement a machine learning workflow using R and Python.
Applying machine learning algorithms (such as Random Forest) to microbiome datasets for disease risk prediction in humans.
Introduction to bacteria-phage interactions, and how they contribute to bacterial pathogenicity.
Applying machine learning methods to predict bacterial pathogenicity induced by phages.

Target audience

This workshop is aimed at graduate students and postdocs (senior undergraduate students are also welcome). Attendees should have some familiarity with programming in R or Python.

Learning objectives

This workshop will provide an opportunity to explore machine learning in theory and practice, in the context of microbiome datasets. The attendees will learn how to:

Apply machine learning methods to genomic data from bacteria and viruses.
Implement a machine learning pipeline using R and Python on microbiome datasets. This will include data wrangling/preprocessing, training and cross-validation, prediction, and visualization of prediction performance.
Interpret predicted outcomes in biological terms.
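
As a rough illustration of the pipeline steps listed above (preprocessing, training with cross-validation, prediction, and performance assessment), here is a minimal Python sketch. It is not the workshop material: the data are synthetic, all function names are hypothetical, and a simple nearest-centroid classifier stands in for Random Forest so that the example runs with the standard library alone. Performance is printed as per-fold accuracy rather than plotted.

```python
import random
from statistics import mean


def to_relative_abundance(counts):
    """Preprocessing: convert raw taxon counts for one sample into proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def make_toy_dataset(n_per_class=30, n_taxa=5, seed=0):
    """Simulate count data (an assumption for illustration only):
    taxon 0 is enriched in 'case' samples (label 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            counts = [rng.randint(50, 150) for _ in range(n_taxa)]
            if label == 1:
                counts[0] += 200
            X.append(to_relative_abundance(counts))
            y.append(label)
    return X, y


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_centroids(X, y):
    """Training: compute the mean abundance profile of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Prediction: pick the class whose centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def cross_validate(X, y, k=5):
    """Hold out each fold once for testing and return per-fold accuracy."""
    accuracies = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [lab for i, lab in enumerate(y) if i not in held_out]
        model = fit_centroids(train_X, train_y)
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


X, y = make_toy_dataset()
fold_accuracies = cross_validate(X, y, k=5)
print("Per-fold accuracy:", [round(a, 2) for a in fold_accuracies])
print("Mean accuracy:", round(mean(fold_accuracies), 2))
```

In practice, the classifier and fold-splitting helpers would typically be replaced by library implementations (for example, a Random Forest with built-in cross-validation utilities); the overall structure of normalizing, splitting, fitting on the training folds, and scoring on the held-out fold stays the same.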

Session plan

We plan to conduct a 4-hour tutorial-style workshop consisting of two sessions. The first session will cover an introduction to supervised machine learning and its application to human microbiome data. The second session will cover the application of machine learning to bacteria-phage interactions. Each session will be structured as a theory segment followed by a hands-on tutorial.

(~ 45 mins, Sambhawa) Introduction to supervised machine learning and its application to biological datasets. We will cover basic concepts, terminology and notation, the workflow of a machine learning pipeline, and model performance assessment. We will end with a brief overview of the microbiome and its implications for human health and disease.
(~ 45 mins, Sambhawa) Hands-on tutorial on applying machine learning (using R) to predict disease status using human microbiome data.
(~ 45 mins, Tatiana) Background on bacteria-virus interactions, and how bacteriophage insertions in bacterial genomes can be used to predict pathogenicity of bacteria.
(~ 45 mins, Tatiana) Hands-on tutorial on applying machine learning (using Python) to predict pathogenicity of bacteria using bacterial and viral genomic datasets.

Contact us

Email: glbio.mlmb@gmail.com

Please contact us directly with questions related to a specific section:

Tatiana Lenskaia (lensk010@umn.edu), Sambhawa Priya (priya030@umn.edu)