I have been a bioinformatics student throughout my academic career: I majored in bioinformatics for both my undergraduate and graduate studies. I then did my Ph.D. in computational chemistry and biology, where I used numerous bioinformatics tools to analyze data, implemented machine learning algorithms to develop predictive applications, and ran many long-range simulations based on structural modelling, which generate large volumes of data, to answer different biological questions. After finishing my Ph.D., I joined a company that provides bioinformatics services to the healthcare industry by collaborating with biotech and pharmaceutical companies working on drug development. You can see my profile here: https://edu.t-bio.info/members/mazumder-mohit/
The Transcriptomics Series – Online Courses on RNA-Seq
Transcriptomics 1 (T1)
Transcriptome is a commonly used word, especially in today’s bioinformatics world, but bioinformatics courses in India include little or no information about transcriptomic analysis. It is still jargon for many students who study bioinformatics, like me. I joined the course with limited knowledge of how next-generation sequencing (NGS), or high-throughput sequencing, data is analyzed. My understanding of the transcriptome came from biology, and I had always thought of such data analysis as comparative, i.e. understanding differences based on comparison with a reference genome.
What did I learn from the first Transcriptomics course?
Large-scale biology projects such as the Human Genome Project made genomic sequencing possible at scale. Today, we can generate gene expression data using a similar technique called RNA-Seq. This type of sequencing, and microarrays before it, are used extensively in a variety of research projects. As a result, there is an abundance of data for biologists to study in order to understand what is happening at the gene expression level.
However, the challenge facing scientists is accessing and analyzing this data to extract useful information about the system of interest. By taking this course and looking at the example data sets, I learned what kind of raw data we get. The course also discusses how to select the best approaches for analysis to get meaningful insights from the data. The introduction to transcriptomics course gives a brief overview of the techniques widely used to obtain raw data and to process it into a comparative table of expression. It also familiarizes the user (me) with the t-bioinfo platform.
In my personal experience, I usually read about tools and techniques on the web via a scientific article, presentation, video or other media. Here, for the first time, I was reading about transcriptomics data, understanding the data, and using it to generate a table of expression, all as hands-on practice. T1 uses a PDX cancer dataset on breast cancer biomarkers as its analysis example, which is relevant to biologists and helps develop an understanding of the analysis in the context of a disease. The practical aspect of the course gives the user the confidence to go out and do independent analysis in a very short time. This is extremely useful not only for students but also for researchers who want to answer a question using a multi-disciplinary approach, which is common practice nowadays. The course, together with the platform, helps users adapt quickly to the data type and the analytics, and builds the confidence and overall understanding needed to address their own biological question.
The tools in the pipeline are a mixture of popular open-access software (TopHat, Bowtie, Cufflinks) and Tauber’s in-house tools, many of which use GPUs for faster processing. The course focuses on employing existing bioinformatics resources and on the analysis of data rather than on developing new tools. Even so, it generates interest in accessing the wealth of available data to answer questions relevant to one’s own research. The course is highly hands-on and uses the t-bioinfo platform, which has a dedicated section for RNA-Seq analysis. It is useful to any student considering graduate school in the biological and healthcare sciences, as well as to postgraduate students, Ph.D. students, and researchers.
Transcriptomics 2 (T2)
The Transcriptomics 1 course prepares a student who has some basic understanding of cell and molecular biology to learn about next-generation sequencing data and RNA-Seq data analysis. After covering the quantitative and qualitative analysis of RNA and the handling of sample data (the breast cancer PDX model), T1 also shows how to visualize the data using statistical methods such as principal component analysis. Transcriptomics 2 (T2) is a continuation of T1, and both use the same PDX breast cancer data set.
Transcriptomics 2 is more technical than Transcriptomics 1 and focuses on the analysis of differential gene expression. One good thing I would like to point out is that the analysis of data (which is a major objective of T2) is covered in a logical order. The tutorial starts with a basic statistical test, Student’s t-test, which is used to find statistically significant differences between two groups. Because it is a basic approach, it cannot directly compare more than two groups. After making the student understand the limitations of one approach, the course moves on to more advanced data analysis methods and different statistical approaches, and in this way justifies the need for machine learning algorithms in data analysis. The tutorials explain the methods in enough detail that anyone who goes through the content can replicate the analysis. Following the t-test, the tutorial shows how to use the simple and user-friendly T-BioInfo platform, where tools such as DESeq2 and edgeR are used to run a differential gene expression pipeline followed by factor regression analysis. For me, the most interesting part of the tutorial comes right at the end, when you can relate your results back to the biology through the significant set of genes identified using the platform.
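To make the t-test step concrete, here is a minimal sketch of the classic two-group Student’s t-statistic (pooled variance) for a single gene, written with the Python standard library only. It is my own illustration, not the code used by the course or the T-BioInfo platform; the function name, gene names and expression values are hypothetical toy data.

```python
import math
import statistics


def student_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Assumes both groups have at least two values, equal variances,
    and that the pooled variance is not zero.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    # Pooled variance: average of the two sample variances,
    # weighted by their degrees of freedom.
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof

    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, dof


# Hypothetical toy expression values: gene -> (group 1 samples, group 2 samples).
toy_expression = {
    "GENE_A": ([10.1, 9.8, 10.4], [14.9, 15.3, 15.1]),
    "GENE_B": ([7.2, 7.9, 7.5], [7.4, 7.6, 7.8]),
}

for gene, (group_1, group_2) in toy_expression.items():
    t_value, dof = student_t_statistic(group_1, group_2)
    print(f"{gene}: t = {t_value:.2f}, df = {dof}")
```

The function takes exactly two groups, which is the limitation described above: a third condition would need either several pairwise tests or a different method. Turning the t-statistic into a p-value needs the t distribution (for example `scipy.stats.ttest_ind`, if SciPy is installed), and testing thousands of genes at once also calls for multiple-testing correction.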
What can be improved?
Finishing T1 & T2 was an exciting learning experience. In my opinion, what was missing is a bit more information on each tool shown in the pipeline, along with a short case study. I think this could be added to the supplementary section. The methods could be explained by comparing several pipelines while addressing a biological question.
Transcriptomics 3 (T3)
Once I had finished the first two transcriptomics courses, Transcriptomics 3 was much more fun, as it introduces methods for data mining and classification. T3 builds on the concepts learned in the previous courses but uses a more detailed and comprehensive dataset. The dataset is taken from a study by Daemen et al., in which the authors tested 90 therapeutic compounds on 70 breast cancer cell lines. The objective of the study was to understand and develop criteria for precision modeling in breast cancer using molecular profiling and drug efficacy data. The authors used datasets containing RNA-Seq, exome-seq, genome-wide methylation, and protein abundance data. It was exciting to see how this course makes use of public-domain datasets. Using the analysis tools on public-domain RNA-Seq research data made me interested in other projects I could work on. Searching on my own, I was able to find many other interesting datasets on NCBI, and I started working on an independent project with a group of students.
What methods did I enjoy learning about?
The course gives a great review of methods like quantile normalization and Principal Component Analysis (PCA), using this new dataset as an example. PCA is a powerful machine learning method useful for analysis and visualization. Following the tutorials provided, I was able to understand PCA in hands-on sessions. The next sections are dedicated to supervised and unsupervised machine learning approaches and show the use of clustering and classification methods in detail.
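For readers who have not met these two methods before, here is a minimal sketch of quantile normalization followed by the first principal component, using only the Python standard library. It is my own illustration under simplifying assumptions, not the platform’s implementation: the function names and the toy matrix are hypothetical, ties are broken by order of appearance, and the principal axis is found by simple power iteration rather than a full eigendecomposition.

```python
import math
import random


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of rows)."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Mean of the k-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / n_samples
                  for k in range(n_genes)]

    # Replace every value by the mean that belongs to its rank in its sample.
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_index in enumerate(order):
            normalized[gene_index][j] = rank_means[rank]
    return normalized


def first_principal_component(samples, iterations=200):
    """Return (axis, scores) of the leading principal component.

    `samples` is a list of observations (rows), each a list of features.
    """
    n, dim = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    centered = [[s[d] - means[d] for d in range(dim)] for s in samples]

    # Sample covariance matrix of the features (dim x dim).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]

    # Power iteration from a fixed-seed random start vector.
    rng = random.Random(0)
    axis = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        product = [sum(cov[a][b] * axis[b] for b in range(dim))
                   for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # no variance left to explain
        axis = [x / norm for x in product]

    # Projection of every centered observation on the principal axis.
    scores = [sum(c[d] * axis[d] for d in range(dim)) for c in centered]
    return axis, scores


# Hypothetical toy matrix: 4 genes (rows) x 3 samples (columns).
toy_counts = [[5, 4, 3], [2, 1, 4], [3, 5, 6], [4, 2, 8]]
toy_normalized = quantile_normalize(toy_counts)

# PCA treats samples as observations, so transpose to samples x genes.
toy_samples = [list(column) for column in zip(*toy_normalized)]
toy_axis, toy_scores = first_principal_component(toy_samples)
print("PC1 score per sample:", [round(score, 3) for score in toy_scores])
```

After quantile normalization every sample has exactly the same set of values, only assigned to different genes, which is what makes samples comparable. Plotting each sample’s score on the first one or two principal components is the usual way to see whether samples group by condition. For real datasets, a library such as NumPy or scikit-learn would be the more practical choice.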