Predicting DNA methylation from RNA-sequencing data in renal clear cell carcinoma: A deep variational autoencoder approach
While RNA-sequencing (RNA-seq) is cost-effective and widely available, obtaining comprehensive DNA methylation profiles remains resource-intensive. To bridge this gap, we developed a deep variational autoencoder (VAE) framework for predicting DNA methylation patterns directly from RNA-seq data in renal clear cell carcinoma. We used The Cancer Genome Atlas–Kidney Renal Clear Cell Carcinoma dataset, sourced from cBioPortal and selected for its clinical relevance in renal clear cell carcinoma and its high-quality paired omics data. The pre-processing pipeline comprised quality control, missing data imputation, and RNA feature selection based on variance and predictive power. A patient-wise 80/20 train–test split was employed to prevent data leakage. A deep VAE (NetVAE) with residual encoder blocks was trained to learn a compressed latent representation of the methylome and to reconstruct genome-wide beta values directly from RNA-seq input. On the independent test set (n = 55 patients), NetVAE achieved R² = 0.627, a mean absolute error of 0.135, a Pearson correlation coefficient of r = 0.792 (p < 0.001), and a Spearman correlation of 0.641 (p < 0.001). Scatter and density plots supported strong predictive power across the full range of beta values. This framework offers a practical, cost-effective method for inferring methylation landscapes from RNA-seq data, with significant potential for biomarker discovery and retrospective epigenetic studies in renal clear cell carcinoma, particularly in settings where direct methylation profiling is unavailable.
