Transcriptomics 3 is a course dedicated to advanced methods for analyzing transcriptomic data that will allow us to find meaningful patterns in big datasets, especially the complex patterns that are typical of RNA-seq experiments. The course builds upon what we learned in Transcriptomics 1 and 2. In Transcriptomics 1, we learned how to convert “raw reads” produced by a next-generation sequencer into a table of expression values, and then visualize it to develop a hypothesis.
In Transcriptomics 2, we looked at statistical methods for determining differentially expressed elements in known groups of samples. We explored Student’s t-test, methods such as DESeq and edgeR that build on empirical Bayes estimation, and Factor Regression Analysis to dissect the influence of multiple factors.
In Transcriptomics 3, we will turn to a new problem: complex patterns in big datasets. The complexity of gene expression patterns across a variety of samples makes it more challenging to apply “straightforward” methods of analysis. There are many unknowns, for example how many groups we should expect to find in a given dataset, or which set of genes will consistently identify a specific class of samples. That is why in this course we will first explore methods for identifying groups of samples without prior knowledge (clustering) and then examine methods for developing classifiers from known samples to classify unknown samples (classification). Together, these two approaches are referred to as “data mining” and “machine learning”. These are vague terms that will be clarified further in our series on Machine Learning for Biomedical Data.
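To make the idea of clustering concrete, here is a minimal sketch of k-means grouping samples by their expression profiles without using any labels. It is only an illustration, not the method or code used in the course: the expression values, the number of clusters, and the function names are all made up for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, k, n_iter=50, seed=0):
    """Group samples into k clusters with Lloyd's algorithm (k-means)."""
    rng = random.Random(seed)
    # Start from k randomly chosen samples as initial centroids.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log-expression values: rows are samples, columns are genes.
samples = [
    [8.1, 7.9, 1.0],
    [7.8, 8.2, 1.3],
    [8.0, 8.1, 0.9],
    [1.2, 1.0, 7.7],
    [0.9, 1.4, 8.1],
    [1.1, 0.8, 8.0],
]

labels, centroids = kmeans(samples, k=2)
print(labels)  # the first three samples share one label, the last three the other
```

Note that the number of clusters k has to be chosen up front here, which is exactly the kind of unknown described above.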
In the previous dataset, we selected samples from two types of breast cancer. In this course, we will use another dataset, one that contains multiple cell lines representing different subtypes of cancer. The example is taken from a publication by Daemen et al. and a team from Genentech, “Modeling precision treatment of breast cancer”. We will not repeat all of the analysis presented in the paper; rather, we will re-analyze the dataset the paper presents and will later be able to compare the authors’ results with our own. Let’s review this publication to get familiar with the data we will be analyzing and the questions the paper poses.
- Lectures: 13
- Quizzes: 2
- Duration: 6 hours
- Skill level: All levels
- Language: English
- Students: 3503
- Certificate: Yes
- Assessments: Yes
Data Details and Pre-Processing
Clustering: Unsupervised Analysis
Classification: Supervised Analysis
Introduction to Machine Learning
Transcriptomics 3 covers the concepts of unsupervised and supervised analysis of biological data. The course explains several ML algorithms, such as LDA, swLDA, and SVM. We also learn how to develop classifiers from known samples in order to classify unknown samples. The course includes hands-on practicals on the t-bio-server platform, where we work with real-world biological data for preparatory and additional analysis.
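As a rough illustration of what "developing a classifier from known samples" means, the sketch below fits a two-class linear discriminant (Fisher's LDA, assuming equal class priors) on two genes and then classifies new samples. The expression values, the class names `subtype_A` and `subtype_B`, and the helper functions are hypothetical; this is not the implementation used on the t-bio-server platform.

```python
def mean_vector(rows):
    """Per-gene mean across the samples of one class."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(class_a, class_b):
    """Pooled within-class 2x2 covariance matrix for two genes."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (class_a, class_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[v / dof for v in row] for row in s]


def fit_lda(class_a, class_b):
    """Return the discriminant direction w and the decision threshold."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    (a, b), (c, d) = pooled_covariance(class_a, class_b)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    # w = S^-1 (mean_b - mean_a)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Equal priors: the boundary passes through the midpoint of the class means.
    mid = [(mean_a[i] + mean_b[i]) / 2 for i in range(2)]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def predict(sample, w, threshold):
    """Assign a new sample to one of the two (hypothetical) subtypes."""
    score = w[0] * sample[0] + w[1] * sample[1]
    return "subtype_B" if score > threshold else "subtype_A"


# Hypothetical training data: [gene_1, gene_2] log-expression per known sample.
class_a = [[2.0, 7.5], [2.5, 8.1], [1.8, 7.0], [2.9, 7.8]]
class_b = [[6.9, 2.2], [7.4, 3.0], [6.5, 1.9], [7.8, 2.6]]

w, threshold = fit_lda(class_a, class_b)
print(predict([7.0, 2.5], w, threshold))
print(predict([2.2, 7.7], w, threshold))
```

With real data there are far more genes than samples, so the covariance matrix cannot be inverted directly. That is one reason to select a subset of genes first, or to use a method such as SVM.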
Put on your Machine Learning goggles!
Transcriptomics 3 offers examples to run on the t-bio-server platform along with great conceptual explanations of ML algorithms such as LDA, swLDA, and SVM. The course covers both supervised and unsupervised learning algorithms and uses an actual cancer dataset for exploratory and predictive analysis.
The ML for Biological Data
This course culminates in the actual Machine Learning algorithms which, when applied sequentially, give meaningful insight into the data. The carefully curated quiz acts as a benchmark for the learning process.
This course gave detailed knowledge of clustering and classification methods using a biological example from the publication.
Introduction to ML
Learn to analyze your transcriptomics data using Machine Learning techniques effectively!