Clear cell renal cell carcinoma (ccRCC) tumours develop and progress via complex remodelling of the kidney epigenome, transcriptome, proteome, and metabolome. Given the resulting intra-tumour and inter-patient heterogeneity, drug-based treatments have had limited success, motivating multi-omics studies that extract regulatory relationships and, ultimately, guide the development of targeted therapies. However, current methods cannot capture the nonlinear multi-omic perturbations that may define patient subpopulations.
Here, we present SiRCle (Signature Regulatory Clustering), a novel method to integrate DNA methylation, RNA-seq and proteomics data. Applying SiRCle to a case study of ccRCC, we disentangle the layer (DNA methylation, transcription and/or translation) at which dysregulation first occurs and identify the primary biological processes altered. Next, we detect regulatory differences between patient subsets by using a variational autoencoder (VAE) to integrate the omics data. The integrated gene representations approximately follow a Gaussian distribution, enabling statistical comparisons that reveal variation between patient cohorts, for example identifying genes with differing multi-omic patterns between early- and late-stage tumours. We corroborate our findings by identifying overexpressed immune cell types in single-cell data and methionine metabolism differences in metabolomic data, based on the gene relationships extracted with SiRCle. While we apply SiRCle to a cohort of ccRCC patients, our method is broadly applicable to other cancers and complex diseases where understanding multi-omic perturbations is integral. Finally, we benchmark VAE-based integration against six other methods across eight cancers, finding that the VAE performs equivalently to or better than existing approaches at extracting biological meaning from transcriptomic data.
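Because the integrated representations are approximately Gaussian, standard parametric tests become applicable. As a minimal sketch, assuming (hypothetically) that each gene has one VAE latent value per patient, a per-gene Welch's t-test can rank genes whose multi-omic pattern differs between early- and late-stage cohorts. The data layout, gene names and function names below are illustrative assumptions, not SiRCle's actual implementation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se = sqrt(va / len(a) + vb / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(latent_early, latent_late):
    """Rank genes by |t| between late- and early-stage cohorts.

    Hypothetical input: dicts mapping gene name -> list of per-patient
    latent values (one VAE latent dimension) for that cohort.
    """
    scores = {g: welch_t(latent_late[g], latent_early[g]) for g in latent_early}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy, made-up latent values for illustration only.
early = {"GENE_A": [0.1, -0.2, 0.0, 0.3], "GENE_B": [1.0, 1.2, 0.9, 1.1]}
late = {"GENE_A": [0.2, -0.1, 0.1, 0.0], "GENE_B": [2.0, 2.3, 1.8, 2.1]}
ranking = rank_genes(early, late)
print(ranking[0][0])  # GENE_B
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same Welch statistic together with a p-value, and p-values across many genes would normally be corrected for multiple testing (for example with the Benjamini-Hochberg procedure).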