Gene-set enrichment analysis is a powerful bioinformatics tool to identify the functional processes underlying biological systems. This analysis is often used to functionally annotate gene lists derived from a range of workflows including but not limited to differential expression analysis. Most analyses result in hundreds of significantly enriched gene-sets. Biologists are then tasked with sifting through these lists of gene-sets and extracting relevant knowledge pertaining to their experiment. The process of selecting gene-sets of interest from these results is typically biased by the knowledge and expectations of the biologist and may miss results that could lead to novel hypotheses.
To address this issue, I develop vissE, a network-based analysis method that clusters the numerous gene-sets identified by gene-set enrichment analyses into broader biological themes. I then automatically annotate the gene-set clusters using text-mining approaches. Harnessing the benefits of network analyses, vissE maps the results of enrichment analysis, thus allowing biologists to understand the significance of the biological themes identified. Additionally, vissE visualises the genes shared across gene-set clusters, thus providing a combined view of both genes and gene-sets.
I demonstrate the application of vissE on cancer and COVID-19 datasets generated using spatial transcriptomics technologies. In the COVID-19 dataset, I identified a strong DNA-damage phenotype in myocardial tissue following infection. A vissE analysis can assist biologists in identifying biological themes in their experiments to drive novel hypotheses. Visualisations generated using vissE combine gene-level statistics with condensed gene-set enrichment analysis results, thus providing a more holistic view of the biological system being investigated.
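
The clustering and annotation steps described above can be illustrated with a minimal Python sketch. This is not the vissE implementation; it only shows the general idea under stated assumptions. The gene-sets and their members are toy data, the Jaccard index with a 0.25 cut-off is an assumed overlap measure, connected components stand in for whatever graph-clustering method is actually used, and the word-frequency annotation is a deliberately simple substitute for a real text-mining step. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy gene-sets (hypothetical names and members, for illustration only).
GENE_SETS = {
    "DNA_REPAIR_PATHWAY": {"BRCA1", "BRCA2", "RAD51", "ATM"},
    "DNA_DAMAGE_RESPONSE": {"ATM", "BRCA1", "TP53", "CHEK2"},
    "DNA_DOUBLE_STRAND_BREAK_REPAIR": {"RAD51", "BRCA2", "ATM", "XRCC2"},
    "INTERFERON_ALPHA_RESPONSE": {"IFIT1", "ISG15", "MX1", "OAS1"},
    "INTERFERON_GAMMA_RESPONSE": {"STAT1", "ISG15", "IRF1", "MX1"},
}


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    return len(a & b) / len(a | b)


def overlap_network(gene_sets, threshold=0.25):
    """Build an adjacency map linking gene-sets whose overlap meets the threshold."""
    edges = {name: set() for name in gene_sets}
    for u, v in combinations(gene_sets, 2):
        if jaccard(gene_sets[u], gene_sets[v]) >= threshold:
            edges[u].add(v)
            edges[v].add(u)
    return edges


def find_clusters(edges):
    """Group gene-sets into connected components (assumed clustering rule)."""
    seen, clusters = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted(clusters, key=lambda c: (-len(c), c))


def annotate(cluster, top_n=2):
    """Label a cluster with the most frequent words in its gene-set names."""
    words = Counter(word for name in cluster for word in name.split("_"))
    return [word for word, _ in words.most_common(top_n)]


def shared_genes(cluster, gene_sets, min_sets=2):
    """Genes that occur in at least `min_sets` gene-sets of the cluster."""
    counts = Counter(gene for name in cluster for gene in gene_sets[name])
    return sorted(gene for gene, n in counts.items() if n >= min_sets)


if __name__ == "__main__":
    network = overlap_network(GENE_SETS)
    for cluster in find_clusters(network):
        print(annotate(cluster), cluster, shared_genes(cluster, GENE_SETS))
```

On the toy data, the two DNA-related sets that barely overlap each other still land in one cluster because both overlap the third, which is the kind of grouping a network view offers over pairwise comparison. A real analysis would also down-weight uninformative words such as "RESPONSE" before labelling clusters.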