What is RNA-seq?
RNA sequencing, also known as RNA-seq, is a method that uses Next Generation Sequencing (NGS) to measure the RNA present in a sample. RNA is transcribed when genes are expressed, and a common goal of RNA-seq is to determine the factors that cause genes to become more or less expressed at certain timepoints and in different environments. For example, comparing RNA expression levels in an organism in two different environments can reveal how the organism reacts to each environment. Another example is to take snapshots of an organism's gene expression levels at different timepoints.
Extracted RNA is converted to cDNA and loaded onto a sequencer such as an Illumina machine, which reads the sequences and records them as FASTQ files. FASTQ files are ASCII text files that store each read's sequence together with per-base quality information. Some reads contain errors introduced by the sequencing process, so the first step in RNA-seq analysis is error correction: reads are aligned, and the errors are identified and corrected. Afterwards the reads are mapped onto the genome and the boundaries of exons (the expressed segments of genes) are detected. A gene can give rise to several transcript variants called isoforms; these need to be reconstructed by detecting paths across the links provided by the reads.
After the reads are mapped, the expression levels of individual isoforms are measured. In some cases, differential expression analysis is then performed, measuring the change in expression levels over time or between conditions. Finally, using data mining techniques such as principal component analysis (PCA) and clustering, the results can be analyzed and interpreted.
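To make the FASTQ format mentioned above more concrete, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality string into Phred scores. It assumes the Phred+33 encoding used by current Illumina instruments and assumes each read occupies exactly four lines; the names `FastqRecord`, `parse_fastq`, and `mean_quality`, as well as the two example reads, are made up for illustration and are not part of T-BioInfo.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (assumes four lines per read)."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        header = header.rstrip("\r\n")
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes a Phred score as ASCII code - offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)


# Two made-up reads: the second one ends in low-quality bases.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGTN\n+\nII5#!\n"

records = list(parse_fastq(io.StringIO(EXAMPLE)))
for record in records:
    print(record.read_id, record.sequence, round(mean_quality(record), 1))
```

A real pipeline would use these scores to trim or correct unreliable bases before mapping; this sketch only shows where the quality information lives in the file.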
Pine Biotech research partners at the Tauber Bioinformatics Research Center recently published updated results from this project's data, which provide insight into the underlying biology of Grosmannia clavigera and its relationship with Dendroctonus ponderosae (the mountain pine beetle).
Steps in Grosmannia clavigera Analysis
A logical scheme of the RNA-seq data analysis of Grosmannia clavigera using the T-BioInfo platform. Two RNA-seq analysis pipelines were used. In the pipeline depicted by blue arrows, the NGS reads were mapped onto known transcripts, and the expression levels of those transcripts were calculated. In the pipeline depicted by red arrows, the NGS reads were mapped onto G. clavigera contigs, and transcripts were then detected as fragments enriched in mapped reads.
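The idea behind the red-arrow pipeline, detecting transcripts as stretches of a contig that are enriched in mapped reads, can be illustrated with a small Python sketch. This is not the T-BioInfo algorithm: it assumes a simple fixed depth threshold and minimum length, and the function names, thresholds, and read coordinates are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open: (start, end)


def coverage_from_reads(contig_length: int, reads: List[Interval]) -> List[int]:
    """Per-base read depth along a contig, built from mapped read intervals."""
    diff = [0] * (contig_length + 1)
    for start, end in reads:
        if not 0 <= start < end <= contig_length:
            raise ValueError(f"read {(start, end)} lies outside the contig")
        diff[start] += 1  # depth rises where a read begins
        diff[end] -= 1    # and falls just past where it ends
    coverage, depth = [], 0
    for delta in diff[:contig_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def enriched_fragments(
    coverage: List[int], min_depth: int = 2, min_length: int = 3
) -> List[Interval]:
    """Contiguous runs with depth >= min_depth that span at least min_length bases."""
    fragments = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            if pos - start >= min_length:
                fragments.append((start, pos))
            start = None
    # Close a run that extends to the end of the contig.
    if start is not None and len(coverage) - start >= min_length:
        fragments.append((start, len(coverage)))
    return fragments


# Made-up reads mapped to a 20-base contig.
reads = [(2, 8), (4, 10), (5, 9), (15, 18)]
coverage = coverage_from_reads(20, reads)
fragments = enriched_fragments(coverage)
print(coverage)
print(fragments)
```

In this toy example the three overlapping reads produce one enriched fragment, while the single isolated read falls below the depth threshold and is ignored.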
What is Grosmannia clavigera?
Grosmannia clavigera is a fungal pathogen of pine trees that is associated with Dendroctonus ponderosae (the mountain pine beetle). When the mountain pine beetle infests a pine tree, it carries the fungus in beneath the bark. The fungus infects the stem, eventually causing the tree to die. G. clavigera has a symbiotic relationship with the beetle and with the larvae that develop underneath the bark. The fungus maintains this symbiotic relationship by converting monoterpenes, pine defense compounds that are toxic to most organisms, into a carbon source for the larvae, which is essential for their survival.
What were the findings of the original publications?
A paper from the University of British Columbia, "A specialized ABC efflux transporter GcABC-G1 confers monoterpene resistance to Grosmannia clavigera, a bark beetle-associated fungal pathogen of pine trees", showed that a fungal ABC efflux transporter (a membrane protein that expels toxic substances from the cell) is a major mechanism by which Grosmannia clavigera copes with monoterpenes entering its cells. In a follow-up paper, "Gene Discovery for Enzymes Involved in Limonene Modification or Utilization by the Mountain Pine Beetle-Associated Pathogen Grosmannia clavigera", the same authors describe a second mechanism that allows Grosmannia to deal with monoterpenes: modifying or degrading them. They also suggested that the initial step in the degradation of limonene, one of the monoterpenes in pine oleoresin, might be carried out by a cytochrome P450. The authors made their data publicly available in the Sequence Read Archive (SRA) database.
What did we do with T-BioInfo to get results?
All RNA-seq analyses follow the same general steps, but on T-BioInfo a user can combine existing and newly developed algorithms into flexible pipelines, which allows for greater accuracy in error correction, mapping, and genome annotation. Especially important are the machine learning algorithms on the T-BioInfo platform that help to compress the resulting data. Using the data mining procedures of T-BioInfo, we were able to generate a network of associations between genes with strongly intra-linked sub-networks. The gene ontology and pathway annotation showed that the adaptation allowing Grosmannia clavigera to process monoterpenes and thrive in their presence is based on the coordination of four cellular processes: specific stress response, intensive membrane remodeling and lipid biosynthesis, fatty acid catabolism, and active efflux of toxic compounds.
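As a rough illustration of what a gene association network with tightly linked sub-networks can look like, the Python sketch below links genes whose expression profiles are strongly correlated across samples and then groups the linked genes. This is an assumption for illustration only: the actual data mining procedures of T-BioInfo are not described here, Pearson correlation with a fixed cutoff is just one common choice, connected components are a crude stand-in for sub-network detection, and the gene names and expression values are invented.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def association_network(
    profiles: Dict[str, List[float]], min_abs_r: float = 0.9
) -> Dict[str, Set[str]]:
    """Link two genes when |r| between their profiles reaches the cutoff."""
    edges: Dict[str, Set[str]] = {gene: set() for gene in profiles}
    for gene_a, gene_b in combinations(profiles, 2):
        if abs(pearson(profiles[gene_a], profiles[gene_b])) >= min_abs_r:
            edges[gene_a].add(gene_b)
            edges[gene_b].add(gene_a)
    return edges


def connected_components(edges: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group genes that are reachable from one another through links."""
    seen: Set[str] = set()
    components = []
    for gene in edges:
        if gene in seen:
            continue
        component: Set[str] = set()
        stack = [gene]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(component)
    return components


# Invented expression levels for six hypothetical genes across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [5.0, 1.0, 4.0, 2.0],
    "geneE": [1.0, 5.0, 2.0, 4.0],
    "geneF": [3.0, 3.0, 3.0, 3.0],
}

network = association_network(profiles)
modules = sorted(sorted(component) for component in connected_components(network))
print(modules)
```

Using the absolute value of the correlation means that genes moving in opposite directions are also treated as associated; a signed cutoff would instead separate co-activated from anti-correlated genes.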
This work has since been published: "Adaptation of the pine fungal pathogen Grosmannia clavigera to monoterpenes: Biochemical mechanisms revealed by RNA-seq analysis".
Review of publication findings
Tauber Bioinformatics Research Center (TBRC) performed RNA-seq analysis of the raw NGS data from the G. clavigera projects mentioned above. The results are summarized below:
- TBRC's findings support those of the earlier authors.
- New transcriptome regulatory mechanisms and processes were identified that may be important in the adaptation of G. clavigera to terpenes.
- Newly detected putative transcripts in the G. clavigera genome revealed additional candidate regulatory mechanisms that may play an important role in the tolerance of the fungus to terpenes.
If you find this project interesting, we have also included the steps and small datasets so that you can run your own pipelines and complete a data analysis similar to the one performed by the researchers in the publication.