In this blog post, we review data sources and scientific literature associated with -omics data for malaria research. To compile this article, our team reviewed publicly available datasets generated using next-generation sequencing (NGS) and case studies that demonstrate the utility of this sequencing technique. The objective of this review is to provide examples and starting points for students and researchers who are interested in bioinformatics for infectious diseases, especially the genetic diversity, diagnosis and treatment of malaria.
What Causes Malaria and What Interventions are Being Developed?
Protozoan parasites of the genus Plasmodium are the causative agents of malaria, a deadly disease that continues to afflict hundreds of millions of people every year. Infections with malaria parasites can be asymptomatic, cause mild or severe symptoms, or prove fatal, depending on many factors such as parasite virulence and host immune status.
Types of malaria
Four kinds of malaria parasites infect humans: Plasmodium falciparum, P. vivax, P. ovale, and P. malariae. In addition, P. knowlesi, a species that naturally infects macaques in Southeast Asia, also infects humans, causing malaria that is transmitted from animal to human (“zoonotic” malaria). P. falciparum is the species most likely to cause severe infections and, if not promptly treated, may lead to death. Although malaria can be a deadly disease, illness and death from malaria can usually be prevented.
Malaria can be treated with various drugs, with artemisinin-based combination therapies (ACTs) being the first-line choice. Recent advances in genetics and genomics of malaria parasites have contributed greatly to our understanding of parasite population dynamics, transmission, drug responses, and pathogenesis. However, knowledge gaps in parasite biology and host-parasite interactions still remain. Parasites resistant to multiple antimalarial drugs have emerged, while advanced clinical trials have shown partial efficacy for one available vaccine. At the same time, many genes in the Plasmodium falciparum genome have not been fully characterized and require further research. The complex pathology of malaria and the different stages of the disease also limit vaccine and antimalarial drug development efforts.
Recent advances in malaria genomics and epigenomics
Below is a chart of the past two decades of research into malaria genomics.
The landmark completion of the genome sequence of a laboratory strain of Plasmodium falciparum (Pf) was achieved more than two decades ago. Thanks to plummeting costs and advances in next-generation sequencing (NGS) technologies, this has since been followed by the whole-genome sequencing (WGS) of a wide range of species representing all the major clades of the genus, although, at the time of the source review, not every Plasmodium species known to infect humans had been sequenced. The combination of NGS and WGS has nevertheless enabled the development of innovative large-scale genomic studies, for example in genomic epidemiology.
Where can you find Omics Data on Plasmodium falciparum (Pf)?
The big data movement has led to major advances in our ability to assess vast and complex datasets related to the host and parasite during malaria infection. While host and parasite genomics and transcriptomics are often the focus of many computational efforts in malaria research, metabolomics represents another big data type that has great promise for aiding our understanding of complex host-parasite interactions that lead to the transmission of malaria. Recent analyses of the complement of metabolites present in human blood, skin and breath suggest that host metabolites play a critical role in the transmission cycle of malaria. Together, these genomic, transcriptomic, proteomic, metabolomic and metagenomic datasets are referred to as -omics.
1. ENA (European Nucleotide Archive) https://www.ebi.ac.uk/ena/
The European Nucleotide Archive (ENA) provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
By searching for the term “plasmodium”, we can find numerous results of omics data for this parasite: https://www.ebi.ac.uk/ena/browser/text-search?query=plasmodium
This includes over 170 assemblies, over 400 thousand sequences and almost 34,000 experiments with 30,000 sequencing runs. As a reference for genomes, annotation of genes and examples of transcriptomic studies, this is an invaluable resource for anyone interested in malaria research.
2. PlasmoDB (https://plasmodb.org/)
PlasmoDB is the Plasmodium component of the Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB). VEuPathDB is one of two Bioinformatics Resource Centers (BRCs) funded by the US National Institute of Allergy and Infectious Diseases (NIAID), with additional support from the Wellcome Trust (UK). The BRC program was initiated in 2004 to provide public access to computational platforms and analysis tools enabling collection, management, integration and mining of genomic information and other large-scale datasets relevant to infectious disease pathogens, including their interaction with mammalian hosts and invertebrate vectors of disease. Of the two BRCs currently funded, VEuPathDB focuses on eukaryotic pathogens and invertebrate vectors of infectious diseases, encompassing data from prior BRCs devoted to parasitic species (EuPathDB), fungi (FungiDB) and vector species (VectorBase).
VEuPathDB supports informatics efforts focusing on kinetoplastida and fungal organisms, with special emphasis on improving functional annotation for select genome sequences and families of genes. VEuPathDB provides access to diverse genomic and other large-scale datasets related to eukaryotic pathogens and invertebrate vectors of disease. Organisms supported by this resource include (but are not limited to) the NIAID list of emerging and re-emerging infectious diseases. On the website, Genome Info & Stats provides a list of all available organisms, and Data Sets provides a list of all information integrated into VEuPathDB, with relevant references.
All data on these websites are provided freely for public use, through the contributions of many researchers involved in generating genome sequences, functional genomics datasets, and additional information.
4. GeneDB (ftp://ftp.sanger.ac.uk/pub/genedb/)
GeneDB is a genome database containing the latest sequence data and annotation/curation for organisms sequenced by the Pathogen group. GeneDB currently provides access to more than 40 genomes, at various stages of completion, from early access to partial genomes with automatic annotation through to complete genomes with extensive manual curation.
The processed data for the Plasmodium falciparum genome are available online at GeneDB in FASTA and GFF formats.
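As a starting point for working with such downloads, here is a minimal Python sketch that reads a FASTA file and a GFF file and reports, for each sequence, its length, GC fraction and number of annotated genes. The records below are invented toy data, not GeneDB content, and the helper names (`read_fasta`, `gc_fraction`, `count_features`) are our own illustrative choices. The sketch assumes the usual nine tab-separated GFF columns, with the sequence name in the first column and the feature type in the third.

```python
from collections import Counter
from io import StringIO

# Toy stand-ins for downloaded files; these records are invented for illustration.
TOY_FASTA = """>chrA
ATATATGCATAT
ATTTAAGC
>chrB
GGCCATAT
"""

TOY_GFF = """##gff-version 3
chrA\ttoy\tgene\t1\t12\t.\t+\t.\tID=geneA1
chrA\ttoy\tmRNA\t1\t12\t.\t+\t.\tID=geneA1.1;Parent=geneA1
chrA\ttoy\tgene\t14\t20\t.\t-\t.\tID=geneA2
chrB\ttoy\tgene\t2\t8\t.\t+\t.\tID=geneB1
"""


def read_fasta(handle):
    """Return {sequence_id: sequence} from a FASTA-formatted text handle."""
    parts = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            current_id = line[1:].split()[0]
            parts[current_id] = []
        elif current_id is not None:
            parts[current_id].append(line.upper())
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def gc_fraction(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def count_features(handle, feature_type="gene"):
    """Count features of one type per sequence in a GFF-formatted text handle."""
    counts = Counter()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9 and columns[2] == feature_type:
            counts[columns[0]] += 1
    return counts


genome = read_fasta(StringIO(TOY_FASTA))
genes_per_contig = count_features(StringIO(TOY_GFF))
for contig, sequence in genome.items():
    print(contig, len(sequence), round(gc_fraction(sequence), 3),
          genes_per_contig.get(contig, 0))
```

To use it on real files, replace the `StringIO` objects with `open("your_file.fasta")` and `open("your_file.gff")`. The file names here are placeholders.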
5. Malaria Host-Pathogen Interaction Center (MaHPIC) http://www.systemsbiology.emory.edu/
The Malaria Host-Pathogen Interaction Center (MaHPIC) was established in September 2012 by the National Institute of Allergy and Infectious Diseases, part of the US National Institutes of Health, via a five-year contract award (see original press release and MaHPIC news, data releases, and publications).
The MaHPIC team strives to deliver unique large datasets that hold potential for major breakthroughs that ultimately will advance our understanding of malaria and help diagnose, prevent or treat the disease. The MaHPIC has been innovative in its design of longitudinal infection studies and in its use of computational tools to integrate clinical, parasitological, functional genomics, proteomics, lipidomics and metabolomics data. The amount and types of malaria research data generated, integrated, and released for use by the broader scientific community have been unprecedented (see data release updates and publications).
This research program involves clinical studies with human cohorts from many areas of the world, and nonhuman primate infections, all fundamental to developing and evaluating new malaria diagnostic tools, antimalarial drugs and vaccines for different types of malaria. The central unifying hypothesis of this project has been that non-human primate host interactions with Plasmodium pathogens, studied as model systems, provide insight into human malaria.
Please take a look at the publicly available omics datasets from these experiments: http://www.systemsbiology.emory.edu/research/Public%20Data%20Releases/index.html
Multi Omics and Malaria
Since Pf infection develops into disease through a complex cycle of parasite-vector-host interactions, data from multiple stages of the disease, as well as from interactions with host cells and the host microbiome, are needed to understand how the parasite replicates and evades the immune response. Multi-omics data (or systems-biology-level data) typically refers to a collection of several types of omics data generated from the same experiment. Here we review a couple of examples that demonstrate the utility of multi-omics integration for malaria research.
Role of Metabolomics: Human Host Metabolome
Who Makes What? Complex Origins of Host Metabolites During Malaria Infection
Metabolites found in the blood, skin and breath of malaria-infected hosts do not all come from the host. The host, its infecting parasite and even its commensal microbes can all contribute to the metabolome during infection. A key question, therefore, about the metabolites identified in the aforementioned approaches is – are these metabolites coming from the parasite or the host (or the host’s microbiome)? Plasmodium exhibits certain metabolic pathways that are not present in the vertebrate host, and chemicals of such pathways are thus known to be produced by the parasite.
Bioinformatics tools are critically important in the analysis of big data that arises from ‘omic technologies. Metabolomics, particularly untargeted metabolomics, can yield a large amount of data on many unique metabolic features which must be sifted out and sorted through in order to determine meaning. A key question here is – how to find signal within the noise in a metabolomics data set?
A combination of targeted and untargeted approaches has been used to explore the question of host metabolites in the context of malaria. The strength of untargeted analyses is that they enable the analysis of an extremely large set of metabolic features, allowing for unbiased assessments without setting a priori hypotheses about what may be found. Unsupervised statistical analyses, such as principal components analysis (PCA) and hierarchical clustering analysis (HCA), can be applied to these big datasets, and metabolic features that are highly associated with the phenomenon of interest can be identified based on their chemical features (e.g. mass-to-charge ratio). From that point, computational annotation approaches can be applied to give putative identities, and pathway analysis can be performed to test for significantly perturbed metabolic pathways. These approaches can be helpful in the context of exploring new and under-researched questions in the field. Ultimately, however, these approaches should be combined with targeted methods and reference compounds to confirm identities of any putative metabolites.
Reference: Mining the Human Host Metabolome Toward an Improved Understanding of Malaria Transmission.
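To make the unsupervised step more concrete, here is a small sketch of PCA and hierarchical clustering on a sample-by-feature intensity table. It uses only the Python standard library so that it runs anywhere. In practice one would more likely reach for a numerical library. The intensities and sample names are invented, the first principal component is estimated by power iteration on the covariance matrix, and the clustering uses average linkage on Euclidean distances. These are our illustrative choices, not the methods of the cited study.

```python
import math

# Hypothetical intensities: rows are samples, columns are metabolic features.
SAMPLES = ["infected_1", "infected_2", "control_1", "control_2"]
INTENSITIES = [
    [9.0, 1.0, 5.0],
    [8.0, 2.0, 5.5],
    [2.0, 8.0, 5.0],
    [1.0, 9.0, 4.5],
]


def center_columns(matrix):
    """Subtract each feature's mean so that every column has mean zero."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def first_principal_component(centered, iterations=200):
    """Estimate PC1 by power iteration; returns (unit loadings, sample scores)."""
    n_rows, n_cols = len(centered), len(centered[0])
    cov = [[sum(row[i] * row[j] for row in centered) / (n_rows - 1)
            for j in range(n_cols)] for i in range(n_cols)]
    # Deterministic, non-uniform start vector (assumes the data are not constant).
    vector = [1.0 / (i + 1) for i in range(n_cols)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(n_cols))
                   for i in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    scores = [sum(v * x for v, x in zip(vector, row)) for row in centered]
    return vector, scores


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(rows, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(euclidean(rows[i], rows[j])
                            for i in clusters[a] for j in clusters[b])
                mean_distance = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean_distance < best[0]:
                    best = (mean_distance, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


centered = center_columns(INTENSITIES)
loadings, scores = first_principal_component(centered)
clusters = average_linkage_clusters(centered, n_clusters=2)

for name, score in zip(SAMPLES, scores):
    print(f"{name}: PC1 score = {score:.2f}")
print([[SAMPLES[i] for i in cluster] for cluster in clusters])
```

In this toy table the infected and control samples fall on opposite sides of the first component and into separate clusters. Features with large absolute loadings on that component would be the ones to carry forward to annotation.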
Biomarker discovery is another key area in which bioinformatic approaches are applied to metabolomics data. While often used in the context of disease pathogenesis, metabolites could also potentially be identified as biomarkers to indicate malaria transmissibility. Potential biomarkers could include those metabolites which have already been found in multiple studies to be associated with enhanced transmissibility. Reference: Front. Microbiol., 14 February 2020 | https://doi.org/10.3389/fmicb.2020.00164
Role of Genomics
Driven by the availability of vast amounts of genome sequence data from Plasmodium species strains, relevant human populations of different ethnicities, and mosquito vectors, researchers can consider any biological component of the malarial process in isolation or in the interactive setting that is infection. In particular, considerable progress has been made in the area of population genomics, with Plasmodium falciparum serving as a highly relevant model. Such studies have demonstrated that genome evolution under strong selective pressure can be detected. These data, combined with reverse genetics, have enabled the identification of the region of the P. falciparum genome that is under selective pressure and the confirmation of the functionality of the mutations in the kelch13 gene that accompany resistance to the major frontline antimalarial, artemisinin. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-016-0343-7
Detailed comparisons of DNA sequences shared between individuals or groups can allow the inference of relatedness among current members of a species and estimate when they may have been part of the same interbreeding population.
Evolutionary relationships of Plasmodium spp. Colors highlight Plasmodium spp. that infect humans (red), chimpanzees (blue) and gorillas (green). Four groups of Plasmodium spp. are shown, with subgenus designations indicated for primate parasites.
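As a simple illustration of comparing shared DNA sequences, the sketch below computes the pairwise p-distance (the proportion of aligned sites that differ) between a few sequences and reports the most closely related pair. The sequences and isolate names are invented, and the p-distance is only the simplest possible measure. Real relatedness and divergence-time estimates rely on substitution models and population-genetic methods that this sketch does not attempt.

```python
from itertools import combinations

# Invented aligned sequences of equal length; '-' marks an alignment gap.
ALIGNMENT = {
    "isolate_A": "ATGCTTAAGT",
    "isolate_B": "ATGCTTAAGA",
    "isolate_C": "ATGTTTACGA",
    "isolate_D": "AT-TTTACGA",
}


def p_distance(seq1, seq2):
    """Proportion of differing sites among positions where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base1, base2 in zip(seq1, seq2):
        if base1 == "-" or base2 == "-":
            continue  # ignore gapped positions
        compared += 1
        if base1 != base2:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Distance for every unordered pair of isolates.
distances = {
    (name1, name2): p_distance(ALIGNMENT[name1], ALIGNMENT[name2])
    for name1, name2 in combinations(ALIGNMENT, 2)
}
closest_pair = min(distances, key=distances.get)

for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(distance, 3))
print("closest pair:", closest_pair)
```

A matrix of such distances is the usual input to distance-based tree building, which is one way figures like the one above are produced.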
Role of Transcriptomics
Transcriptomic analyses of samples obtained in human challenge studies can also deepen our understanding of the immune responses preceding symptom onset, allowing characterization of innate immunity and early gene signatures, which may influence disease outcome. Gene expression analysis facilitates the identification of host factors contributing to disease susceptibility.
Biological function of expressed genes: Host pathogen interaction
RNA-seq transcriptomics is a popular technique for studying host-pathogen interactions and the gene networks that underlie biological function. RNA-seq analysis of gene expression helps us understand which genes, proteins and pathways are involved in the host response. Using such analysis, we can study how human cells of various tissues respond to parasite infection.
Hierarchical clustering of Plasmodium homocircumflexum (the first cryptic avian malaria parasite) transcripts that are differentially expressed between host bird species. Heatmap of gene expression levels of 38 significant transcripts (rows) between parasites in different hosts (columns). Warmer colours indicate higher expression, and blue colours denote lower transcript expression.
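The core calculation behind such expression comparisons can be sketched in a few lines of Python. The example below normalizes raw read counts to counts per million (CPM), computes a log2 fold change between two groups, and labels each gene with a simple threshold. The counts, gene names, group labels, pseudocount and threshold are all invented for illustration. A real analysis would use a dedicated differential-expression package with proper statistical testing rather than a bare threshold.

```python
import math

# Invented raw read counts; columns are control_1, control_2, infected_1, infected_2.
GROUPS = ["control", "control", "infected", "infected"]
COUNTS = {
    "gene_1": [100, 120, 400, 480],
    "gene_2": [300, 280, 290, 310],
    "gene_3": [80, 100, 20, 25],
}


def counts_per_million(counts):
    """Scale each sample (column) by its library size."""
    library_sizes = [sum(column) for column in zip(*counts.values())]
    return {
        gene: [value / size * 1e6 for value, size in zip(values, library_sizes)]
        for gene, values in counts.items()
    }


def log2_fold_changes(cpm, groups, case="infected", reference="control",
                      pseudocount=1.0):
    """log2 of (mean case CPM / mean reference CPM); the pseudocount avoids division by zero."""
    result = {}
    for gene, values in cpm.items():
        case_values = [v for v, g in zip(values, groups) if g == case]
        ref_values = [v for v, g in zip(values, groups) if g == reference]
        case_mean = sum(case_values) / len(case_values)
        ref_mean = sum(ref_values) / len(ref_values)
        result[gene] = math.log2((case_mean + pseudocount) / (ref_mean + pseudocount))
    return result


def classify(log2fc, threshold=1.0):
    """Label a gene by whether its expression changes at least two-fold."""
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


cpm = counts_per_million(COUNTS)
fold_changes = log2_fold_changes(cpm, GROUPS)
calls = {gene: classify(value) for gene, value in fold_changes.items()}

for gene, value in fold_changes.items():
    print(f"{gene}: log2FC = {value:.2f} ({calls[gene]})")
```

The per-gene values produced this way, after filtering for significance, are what a heatmap with hierarchical clustering typically displays.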
Drug resistance. Population transcriptomics of human malaria parasites reveals the mechanism of artemisinin resistance Science. 2015 Jan 23;347(6220):431-5. doi: 10.1126/science.1260403. Epub 2014 Dec 11.
The study analyzed the in vivo transcriptomes of 1043 P. falciparum isolates from patients with acute malaria and found that artemisinin resistance is associated with increased expression of unfolded protein response (UPR) pathways involving the major PROSC and TRiC chaperone complexes. https://pubmed.ncbi.nlm.nih.gov/25502316/
Role of Epigenomics
The central role of epigenetic regulation of gene expression and antigenic variation and developmental fate in P. falciparum is becoming ever clearer. Epigenetics lies at the very heart of gene expression, regulating access of the transcriptional machinery to chromatin via
- post-translational modifications (PTMs) of histones,
- nucleosome occupancy, and
- global chromatin architecture.
In the past decade, various histone PTMs have been identified throughout the Plasmodium life cycle and the existing catalog of modifications in Pf was recently extended to 232 distinct PTMs, 88 unique to Plasmodium. The majority of detected PTMs show dynamic changes across the intraerythrocytic developmental cycle (IDC), likely mirroring changes within chromatin organization linked to its transcriptional status. Methylation and acetylation of N-terminal histone tails are by far the most studied regulatory PTMs, linked either to a transcriptionally active chromatin structure (that is, euchromatin) or to transcriptionally inert heterochromatin.
Structural Data from Protein Data Bank (PDB)
The Protein Data Bank (PDB) was established as the first open-access digital data resource in all of biology and medicine. It is today a leading global resource for experimental data central to scientific discovery. Through an internet information portal and downloadable data archive, the PDB provides access to 3D structure data for large biological molecules (proteins, DNA, and RNA). These are the molecules of life, found in all organisms on the planet.
Knowing the 3D structure of a biological macromolecule is essential for understanding its role in human and animal health and disease, its function in plants and food and energy production, and its importance to other topics related to global prosperity and sustainability. RCSB PDB (Research Collaboratory for Structural Bioinformatics PDB) operates the US data center for the global PDB archive, and makes PDB data available at no charge to all data consumers without limitations on usage. The vision of the RCSB PDB is to enable open access to the accumulating knowledge of 3D structure, function, and evolution of biological macromolecules, expanding the frontiers of fundamental biology, biomedicine, and biotechnology. Recognized experts in fields including, but not limited to, structural biology, cell and molecular biology, computational biology, information technology, and education serve as advisors to the RCSB PDB.
A Case Study: PfCRT and Drug resistance
Plasmodium falciparum, the deadliest causative agent of malaria, has high prevalence in Nigeria. Drug resistance causing failure of previously effective drugs has compromised anti-malarial treatment. On this basis, there is a need for proactive surveillance for resistance markers to the currently recommended artemisinin-based combination therapy (ACT), for early detection of resistance before it becomes widespread. This study assessed polymorphisms in anti-malarial resistance genes in patients with uncomplicated P. falciparum malaria in Lagos, Nigeria. Sanger and Next Generation Sequencing (NGS) methods were used to screen for mutations in thirty-seven malaria-positive blood samples, targeting the P. falciparum chloroquine-resistance transporter (Pfcrt), P. falciparum multidrug-resistance 1 (Pfmdr1), and P. falciparum kelch 13 (Pfk13) genes, which have been previously associated with anti-malarial resistance.
Omics Data Sets and Bioprojects for Drug Resistance
A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project.
Sample: Strain 3D7 was grown under blasticidin (BSD) pressure which resulted in a mutation at position c814t in the pfcrt coding sequence (PF3D7_0709000), causing the amino acid mutation L272F. The new strain is designated 3D7-L272F. Two additional SNPs were reported in 3D7-L272F: c5549g in PF3D7_1229100 and t1032a in PF3D7_1462400. https://www.ncbi.nlm.nih.gov/sra/SRX1716873[accn]
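The relationship between the nucleotide change c814t and the amino acid change L272F can be checked with a little arithmetic: position 814 of a coding sequence is the first base of codon 272, because (814 - 1) // 3 + 1 = 272. The sketch below performs that mapping and then lists which leucine codons are consistent with a C-to-T change at that position producing phenylalanine. The actual codon in pfcrt is not given in the record above, so the code enumerates the possibilities from the standard genetic code instead of assuming one. The function names are our own.

```python
import re

# Only the codons needed here, taken from the standard genetic code.
CODON_TABLE = {
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L",
}


def parse_substitution(notation):
    """Split notation such as 'c814t' into (reference base, 1-based position, new base)."""
    match = re.fullmatch(r"([acgt])(\d+)([acgt])", notation.lower())
    if match is None:
        raise ValueError(f"unrecognized substitution: {notation!r}")
    ref_base, position, new_base = match.groups()
    return ref_base.upper(), int(position), new_base.upper()


def locate_in_codon(cds_position):
    """Map a 1-based coding-sequence position to (codon number, position within codon), both 1-based."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions start at 1")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def mutate_codon(codon, position_in_codon, new_base):
    """Return the codon with one base replaced."""
    return codon[:position_in_codon - 1] + new_base + codon[position_in_codon:]


ref_base, cds_position, new_base = parse_substitution("c814t")
codon_number, position_in_codon = locate_in_codon(cds_position)

# Leucine codons that carry the reference base at the mutated position
# and encode phenylalanine after the substitution.
consistent_codons = sorted(
    codon for codon, amino_acid in CODON_TABLE.items()
    if amino_acid == "L"
    and codon[position_in_codon - 1] == ref_base
    and CODON_TABLE.get(mutate_codon(codon, position_in_codon, new_base)) == "F"
)

print(f"codon {codon_number}, base {position_in_codon} of the codon")
print("reference codons consistent with L -> F:", consistent_codons)
```

The same position-to-codon mapping applies to any substitution reported in coding-sequence coordinates, which is a useful sanity check when reading variant annotations.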
Plasmodium falciparum resistance to chloroquine, the former gold standard antimalarial drug, is mediated primarily by mutant forms of the ‘Chloroquine Resistance Transporter’ (PfCRT). These mutations impart upon PfCRT the ability to efflux chloroquine from the intracellular digestive vacuole, the site of drug action. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478492/
Structure and drug resistance of the Plasmodium falciparum transporter PfCRT
The emergence and spread of drug-resistant Plasmodium falciparum impedes global efforts to control and eliminate malaria. For decades, treatment of malaria has relied on chloroquine (CQ), a safe and affordable 4-aminoquinoline that was highly effective against intra-erythrocytic asexual blood-stage parasites, until resistance arose in Southeast Asia and South America and spread worldwide. Clinical resistance to the chemically related current first-line combination drug piperaquine (PPQ) has now emerged regionally, reducing its efficacy. Resistance to CQ and PPQ has been associated with distinct sets of point mutations in the P. falciparum CQ-resistance transporter PfCRT, a 49-kDa member of the drug/metabolite transporter superfamily that traverses the membrane of the acidic digestive vacuole of the parasite. The study presented the structure, at 3.2 Å resolution, of the PfCRT isoform of CQ-resistant, PPQ-sensitive South American 7G8 parasites, using single-particle cryo-electron microscopy and antigen-binding fragment technology.
Mutations that contribute to CQ and PPQ resistance localize primarily to moderately conserved sites on distinct helices that line a central negatively charged cavity, indicating that this cavity is the principal site of interaction with the positively charged CQ and PPQ. Binding and transport studies reveal that the 7G8 isoform binds both drugs with comparable affinities, and that these drugs are mutually competitive.
Images were created with Chimera.
The 7G8 isoform transports CQ in a membrane potential- and pH-dependent manner, consistent with an active efflux mechanism that drives CQ resistance, but does not transport PPQ. Functional studies on the newly emerging PfCRT F145I and C350R mutations, associated with decreased PPQ susceptibility in Asia and South America, respectively, reveal their ability to mediate PPQ transport in 7G8 variant proteins and to confer resistance in gene-edited parasites. Structural, functional and in silico analyses suggest that distinct mechanistic features mediate the resistance to CQ and PPQ in PfCRT variants. These data provide atomic-level insights into the molecular mechanism of this key mediator of antimalarial treatment failures.
In this post, we compiled an overview of available omics data for anyone interested in studying malaria and the pathogen causing this disease, which affects millions of people around the world. The complex nature of this infectious parasite and the way it interacts with the host as well as vector organisms have caused major challenges in our ability to develop prevention and intervention strategies. Several countries around the world have focused their attention on addressing these issues in the coming decade. A major step in this direction will be understanding the molecular principles behind the infectious process and the replication and transmission of malaria. We hope that anyone interested in this challenge will take the time to review the datasets shared here and contribute their efforts to the treatment and eradication of malaria.
Disclaimer: This article/blog post has been compiled for students, researchers, faculty and participants interested in infectious disease research. The text and the images in this blog post have been adapted from publicly available online resources, and we encourage our readers to visit the original research and peer-reviewed reviews cited in this blog.
How to Analyze Omics Data for Infectious Disease Research?
Bioinformatics for Infectious Diseases
Omics Logic “Bioinformatics for Infectious Diseases” is an online training program designed for biologists, clinicians and students who are interested in virology and immunology and would like to learn about the use of bioinformatics and big data for infectious disease research, diagnostics, and drug and vaccine development. We will learn about bioinformatics for infectious disease research and present use-cases for bioinformatics analysis methods that are used in viral evolutionary studies, antiviral drug discovery, and studies of the emergence of drug resistance in bacterial pathogens. The role of bioinformatics, in general and in the context of infectious diseases, is to allow researchers to access, process, analyze, and visualize these kinds of data.
If you are interested in learning more about the training or would like to schedule a call, please fill out the form on this page: https://edu.tbioinfo.com/bioinformatics-for-infectious-diseases-2021