ORIGINAL RESEARCH ARTICLE

Highly specific and sensitive gene panels for cancer screening: First application of only-normal and only-tumor genes

Gabriel Gil¹, Claudia Carricarte², Julio C. Drake-Pérez³, Yasser Perera⁴,⁵, Augusto Gonzalez¹*
1 Department of Theoretical Physics, Institute of Cybernetics, Mathematics and Physics, Havana, Cuba
2 Group of Computation, Faculty of Biology, University of Havana, Cuba
3 Department of General Physics, Faculty of Physics, University of Havana, Cuba
4 Biomedical Research Division, Center for Genetic Engineering and Biotechnology, Havana, Cuba
5 China-Cuba Biotechnology Joint Innovation Center, Yongzhou Zhong Gu Biotechnology Co., Ltd, Yongzhou, Hunan, China
Tumor Discovery, 025190035 https://doi.org/10.36922/TD025190035
Received: 7 May 2025 | Revised: 6 June 2025 | Accepted: 20 June 2025 | Published online: 17 July 2025
© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
Abstract

The traditional paradigm of gene expression dysregulation emphasizes log-fold differential expression, with differentially expressed genes presumed to play key roles in relevant biological processes. In cancer, where normal tissue and tumors occupy non-overlapping regions in gene expression space, we propose an alternative and broader framework based on differentially expressed only-tumor genes (T-genes) and non-differentially dysregulated only-normal genes (N-genes). N-genes exhibit expression intervals found exclusively in normal samples, while T-genes display intervals exclusive to tumor samples. These N- and T-genes serve as markers that can be combined into small gene panels capable of perfectly discriminating between normal and tumor tissues. In most cases, these panels highlight biologically significant properties, such as altered glutamine metabolism in tumors. We provide an inventory of perfect gene panels for 12 cancer types, with potential applications in diagnostics and immunotherapy.

Significance: Highly specific and sensitive combinatorial gene panels for the identification of 12 types of solid tumors in humans were derived from RNA sequencing expression profiles reported by The Cancer Genome Atlas network (https://www.cancer.gov/ccg/research/genome-sequencing/tcga). The corresponding software is available at the GitHub repository https://github.com/gabriel-gil/GenePan. This study revisits the concept of cancer-related gene expression dysregulation by introducing N-genes and T-genes as novel dysregulation patterns that can be leveraged in diagnosis, tumor classification, and therapeutic interventions.
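The idea of combining T-genes into a panel can be illustrated with a minimal Python sketch. It is not the GenePan implementation. It assumes, for illustration only, that a gene flags a tumor sample when that sample's expression falls outside the range observed in normal samples, and that a small panel is chosen by a greedy set cover. The function names `exclusive_mask` and `greedy_panel` and the toy expression values are hypothetical.

```python
import numpy as np


def exclusive_mask(own, other):
    """Flag samples in `own` whose expression lies outside the range of `other`.

    Assumption: an "exclusive interval" is approximated by the region
    above the maximum or below the minimum of the other class.
    """
    return (own > other.max()) | (own < other.min())


def greedy_panel(tumor, normal):
    """Greedy set cover over genes (rows) until every tumor sample is flagged.

    tumor, normal: arrays of shape (n_genes, n_samples).
    Returns the chosen gene indices and a mask of tumor samples left unflagged.
    """
    # covers[g, s] is True when gene g flags tumor sample s.
    covers = np.array([exclusive_mask(t, n) for t, n in zip(tumor, normal)])
    uncovered = np.ones(tumor.shape[1], dtype=bool)
    panel = []
    while uncovered.any():
        gains = (covers & uncovered).sum(axis=1)
        best = int(gains.argmax())
        if gains[best] == 0:
            break  # remaining samples cannot be flagged; the panel is not perfect
        panel.append(best)
        uncovered &= ~covers[best]
    return panel, uncovered


# Hypothetical toy data: 3 genes, 3 normal samples, 4 tumor samples.
normal = np.array([[1.0, 2.0, 3.0],
                   [5.0, 6.0, 7.0],
                   [2.0, 2.5, 3.0]])
tumor = np.array([[4.0, 2.5, 5.0, 1.5],
                  [6.5, 9.0, 6.0, 5.5],
                  [2.2, 2.8, 2.4, 0.5]])

panel, uncovered = greedy_panel(tumor, normal)
print(panel, uncovered)
```

Under these assumptions no normal sample is ever flagged, because every normal value lies within the normal range by construction, so a panel that flags every tumor sample separates the two classes on the data it was built from. Swapping the roles of `tumor` and `normal` gives the analogous construction for N-genes.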

Keywords
Cancer
Combinatorial gene panel
Expression dysregulation
Only-normal genes
Only-tumor genes
Funding
The research was supported by the Financial and International Projects Office of the Ministry of Sciences, Cuba (project PN692LH007-095).
Conflict of interest
The authors declare that they have no competing interests.
Tumor Discovery, Electronic ISSN: 2810-9775 Print ISSN: 3060-8597, Published by AccScience Publishing