DOI: 10.36922/IMO025420055
ORIGINAL RESEARCH ARTICLE

Predicting DNA methylation from RNA-sequencing data in renal clear cell carcinoma: A deep variational autoencoder approach

Muhammad Salman1*
1 Kabir Medical College, Gandhara University, Peshawar, Pakistan
Received: 17 October 2025 | Revised: 30 November 2025 | Accepted: 30 April 2026 | Published online: 12 June 2026
© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/).
Abstract

RNA sequencing (RNA-seq) is cost-effective and widely available, whereas obtaining comprehensive DNA methylation profiles remains resource-intensive. To bridge this gap, we developed a deep variational autoencoder (VAE) framework that predicts DNA methylation patterns directly from RNA-seq data in renal clear cell carcinoma. We used The Cancer Genome Atlas–Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) dataset, sourced from cBioPortal and selected for its clinical relevance and high-quality paired omics data. The pre-processing pipeline comprised quality control, missing data imputation, and RNA feature selection based on variance and predictive power. A patient-wise 80/20 train–test split was used to prevent data leakage. A deep VAE (NetVAE) with residual encoder blocks was trained to learn a compressed latent representation of the methylome and to reconstruct genome-wide beta values from RNA-seq input. On the independent test set (n = 55 patients), NetVAE achieved a coefficient of determination R² = 0.627, a mean absolute error of 0.135, a Pearson correlation coefficient r = 0.792 (p < 0.001), and a Spearman correlation of 0.641 (p < 0.001). Scatter and density plots showed agreement between predicted and measured values across the full range of beta values. This framework offers a practical, cost-effective method for inferring methylation landscapes from RNA-seq data, with potential for biomarker discovery and retrospective epigenetic studies in renal clear cell carcinoma, particularly in settings where direct methylation profiling is unavailable.
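
For readers unfamiliar with the evaluation protocol summarized above, the following is a minimal sketch, in standard-library Python, of a patient-wise split and of the four reported metrics (R², mean absolute error, Pearson r, and Spearman correlation). It is not the author's pipeline: all function names are hypothetical, the inputs are flat lists of measured and predicted beta values, and whether the published metrics were pooled across all CpG sites and patients or averaged per site is an assumption the abstract does not settle.

```python
import math
import random


def patient_wise_split(sample_to_patient, test_fraction=0.2, seed=0):
    """Split sample IDs so that all samples from one patient share a side.

    `sample_to_patient` maps a sample ID to a patient ID (hypothetical input).
    """
    patients = sorted(set(sample_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [s for s, p in sample_to_patient.items() if p not in test_patients]
    test = [s for s, p in sample_to_patient.items() if p in test_patients]
    return train, test


def mean(values):
    return sum(values) / len(values)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Splitting on patient IDs rather than sample IDs is what keeps multiple samples from the same patient from appearing in both the training and test sets, which is the leakage the abstract refers to.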

Keywords
DNA methylation
RNA sequencing
Machine learning
Renal clear cell carcinoma
Variational autoencoder
Epigenetics
Funding
None.
Conflict of interest
The author declares no conflicts of interest.
Innovative Medicines & Omics, Electronic ISSN: 3060-8740 Print ISSN: 3060-8910, Published by AccScience Publishing