Visualization in Bioinformatics

Modern acquisition technologies for biological processes produce vast amounts of data. These data may result from experimental measurements as well as from simulations, ranging from processes of single molecules and the expression of genes to models of complete populations. Analyzing such data is challenging because they are often heterogeneous, high-dimensional, and fraught with uncertainty and errors. Suitable visualization techniques are therefore essential for their analysis.

Visual analysis for gene expression data

Visualization of expression values of genes of the yeast Saccharomyces cerevisiae that are influenced by the cell cycle. Coloring the gene profiles by a statistical parameter makes typical periodic patterns emerge.

DNA, which is built from two linear strands of nucleotides, carries the genetic information of an individual. A gene is a sequence of DNA that contains genetic information and can influence the phenotype of an organism. The genetic information in a genome is held within genes, and the complete set of this information in an organism is called its genotype. Thanks to modern high-throughput methods, the genomes of many organisms have by now been sequenced completely.

Gene expression is the process by which information from a gene is transcribed into mRNA (transcription) and used in the synthesis of functional gene products, such as proteins (translation). Proteins are involved in almost all functions of cells. Gene expression is a highly complex, precisely regulated process that allows the cell to react dynamically to environmental changes as well as to its own changing needs. The mechanism of gene expression therefore operates both as an on/off switch that controls which genes of the cell are expressed and as a volume control that increases or decreases the degree of expression.

In recent years, technologies have been developed that allow gene expression to be measured in parallel, at the mRNA and protein level, on a genome-wide scale. Common technologies are microarrays and modern sequencing techniques, which allow the expression of many genes to be analyzed under different experimental conditions at the same time. The aim of such measurements is often to compare gene expression between different cell types, e.g., to analyze tissue-specific genes, expression in healthy and diseased tissue, the influence of environmental changes on gene expression, or the dependence of gene expression on the stage of the cell cycle. The result is expression data for several thousand genes under numerous conditions, whose manual analysis is not feasible due to the sheer amount of data.
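
One way to obtain a statistical parameter for coloring gene profiles, as in the yeast cell-cycle example, is to score how periodic each profile is. The following Python sketch is an illustration only and not necessarily the method behind the figure: the function names periodicity_score and score_to_rgb, the choice of score (the share of a profile's variance explained by a sinusoid of a given period), and the gray-to-red color ramp are all assumptions. It further assumes equally spaced time points, a number of samples that is an integer multiple of the period, and a period spanning more than two samples.

```python
import math


def periodicity_score(profile, period):
    """Share of the profile's variance explained by a sinusoid of `period` samples.

    Assumes equally spaced samples, len(profile) an integer multiple of
    `period`, and period > 2 (so that sine and cosine are orthogonal).
    Returns a value in [0, 1]; a constant profile scores 0.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [value - mean for value in profile]
    total = sum(c * c for c in centered)
    if total == 0:
        return 0.0
    omega = 2 * math.pi / period
    # Project the centered profile onto cosine and sine of the given period.
    a = sum(c * math.cos(omega * t) for t, c in enumerate(centered))
    b = sum(c * math.sin(omega * t) for t, c in enumerate(centered))
    explained = (2.0 / n) * (a * a + b * b)
    # Clamp to guard against floating-point overshoot.
    return min(1.0, explained / total)


def score_to_rgb(score):
    """Map a score in [0, 1] linearly from gray (128, 128, 128) to red (255, 0, 0)."""
    s = max(0.0, min(1.0, score))
    red = round(128 + s * 127)
    other = round(128 * (1 - s))
    return (red, other, other)


# Hypothetical profiles: one oscillating with period 4, one alternating noise-like.
profiles = {
    "periodic_gene": [0.0, 1.0, 0.0, -1.0] * 4,
    "other_gene": [1.0, -1.0] * 8,
}
colors = {
    name: score_to_rgb(periodicity_score(values, period=4))
    for name, values in profiles.items()
}
```

Drawing each profile in its assigned color lets strongly periodic genes stand out against a gray background of non-periodic ones. In practice the period would have to be known or estimated, for example from the length of the cell cycle in the experiment.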


Visual analysis of genome-wide association studies (GWAS)

Using iHAT to find sequence positions correlated with virulence in 15 sequences of the neuraminidase protein of H5N1 influenza virus samples. The virulence is represented as meta information of the sequences.

In the search for single-nucleotide polymorphisms (SNPs), genome-wide association studies (GWAS) have become an important technique for identifying associations between genotype and phenotype in diverse sets of sequence-based data. GWAS are used to study the variation of genes between individuals (the genotype) and its association with a variety of complex traits (the phenotype), e.g., diabetes, heart disease, or arthritis. They have become an established method to facilitate the identification of genetic risk factors for diseases, as they make use of recent technologies that allow a rapid and cost-effective analysis of genetic differences. The huge amount of data produced by GWAS poses a great challenge for data analysis and visualization. Identifying dependencies and correlations calls for adequate visual representations of the data as well as suitable interactions that allow the user to change the view of the data. The latter comprise focus-and-context techniques as well as techniques to show relevant or hide irrelevant information, e.g., the aggregation of information that can be organized in a meaningful hierarchy.

We developed iHAT as a visual analytics tool for genome-wide association studies. iHAT supports the visualization of multiple sequence alignments, associated metadata, and hierarchical clusterings.
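
To make the idea of finding sequence positions correlated with metadata more concrete, the following Python sketch ranks the columns of a multiple sequence alignment by their mutual information with a categorical label such as virulence. This is a generic illustration and not a description of how iHAT works: the function names, the choice of mutual information as the association measure, and the toy sequences and labels are all assumptions.

```python
import math
from collections import Counter


def column_mutual_information(column, labels):
    """Mutual information (in bits) between the residues of one alignment
    column and the per-sequence labels."""
    n = len(column)
    joint = Counter(zip(column, labels))
    residue_counts = Counter(column)
    label_counts = Counter(labels)
    mi = 0.0
    for (residue, label), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        ratio = count * n / (residue_counts[residue] * label_counts[label])
        mi += (count / n) * math.log2(ratio)
    return mi


def rank_columns(sequences, labels):
    """Return (column indices sorted by descending score, list of scores).

    Assumes all sequences are aligned, i.e. have the same length.
    """
    scores = [column_mutual_information(col, labels) for col in zip(*sequences)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy alignment with one label per sequence.
sequences = ["ACD", "ACE", "GCD", "GCE"]
labels = ["high", "high", "low", "low"]
order, scores = rank_columns(sequences, labels)
```

In this toy input only the first column separates the two label groups, so it receives the highest score, while the conserved second column and the unrelated third column score zero. With as few sequences as in a typical small alignment, such scores are only a hint for visual inspection and not a statistical test.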


Visual analysis of biological networks

Networks play a central role in the investigation of organisms. They are often used to model processes in biological systems and to represent interactions between and dependencies of biological entities, such as genes, transcripts, proteins, or metabolites. One major application domain of network-based analysis and visualization is systems biology, which tries to gain a holistic understanding of transformation and signaling processes in living organisms. With the ever-increasing knowledge in biology, such networks become larger and more complex. To tackle this problem of size and complexity and to facilitate the analysis and interpretation of such interaction networks, the development of suitable visualizations is indispensable.

Standard graph drawing approaches are usually not suitable for the visualization of biochemical reaction networks, as they do not respect the conventions of the bioscience community. For that reason, much attention has been devoted to the development of automatic graph drawing methods for such networks. As a result, various layout algorithms and a graphical notation standard (Systems Biology Graphical Notation) have been developed. Layout algorithms for biological networks need to take different meta-information into account, e.g., the cell compartments in which the reactions take place.

Further challenges for the visualization of such networks are the temporal component as well as the uncertainty of measured or simulated meta-data describing the biochemical reactions. The temporal component may relate either to the changing structure of the graph, e.g., because some reactions occur only at particular time points during the cell cycle, or to properties of reactions and interactions that may change over time, e.g., the expression level of genes. Uncertainties contained in the meta-data describing elements or relations within the graph, such as expression values of genes (nodes) or fluxes between chemical species (edges), should be highlighted visually. We developed the tool iVUN for the visual analysis of uncertain biochemical reaction networks.
