M2Echem: A multilevel dual encoder-based model for predicting organic chemistry reactions

² Department of Electrical and Computer Engineering, State Key Laboratory of Internet of Things for Smart City, Faculty of Science and Technology, University of Macau, Macau, China

AIH 2026, 3(1), 88–103; https://doi.org/10.36922/AIH025260058

Received: 26 June 2025 | Revised: 21 July 2025 | Accepted: 23 July 2025 | Published online: 5 August 2025

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )

Download PDF

XML

Cite

Abstract

Chemical reaction prediction is a vital application of artificial intelligence. While Transformer models are widely used for this task, they often overlook deeper-level semantic information. In addition, the traditional Transformer model suffers from a decline in prediction performance and shows poor generalization when faced with different representations of the same molecule. To address these challenges, we propose a dual encoder-based reaction prediction method tailored for multilevel organic chemistry. Our approach began with the introduction of synergistic dual-encoder architecture: The atomic encoder focused on inter-atomic attention weights. In contrast, the molecular encoder employed a molecular maximum dimension reduction algorithm to identify key chemical features. We then performed multilevel feature fusion by combining the outputs from both the atomic and molecular encoders. Finally, we applied an optimized contrast loss to enhance the model’s robustness. The results indicated that this method outperformed existing models across all four datasets, significantly improving generalization performance and contributing to advancements in artificial intelligence-driven drug development and research.

Graphical abstract

Keywords

Forward reaction prediction

Multilevel feature fusion

Machine learning

Simplified molecular input line entry system code

Transformer

Funding

This research was funded by the National Natural Science Foundation of China (62406153, 62471259, and 62371261), the General Program of the Natural Science Research of Higher Education of Jiangsu Province (23KJB520031), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (SJCX25_2007).

Conflict of interest

Jiashuang Huang is the Youth Editorial Board Member of this journal, but was not in any way involved in the editorial and peer-review process conducted for this paper, directly or indirectly. Separately, other authors declared that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

References

Corey EJ, Wipke WT. Computer-assisted design of complex organic syntheses. Science. 1969;166(3902):178-192. doi: 10.1126/science.166.3902.178

Satoh H, Funatsu K. Further development of a reaction generator in the SOPHIA system for organic reaction prediction. Knowledge-guided addition of suitable atoms and/or atomic groups to product skeleton. J Chem Inform Comput Sci. 1996;36(2):173-184. doi: 10.1021/ci950058a

Coley CW, Barzilay R, Jaakkola TS, Green WH, Jensen KF. Prediction of organic reaction outcomes using machine learning. ACS Cent Sci. 2017;3(5):434-443. doi: 10.1021/acscentsci.7b00064

Duvenaud DK, Maclaurin D, Iparraguirre J, et al. Convolutional networks on graphs for learning molecular fingerprints. In: Proceedings of the 29th International Conference on Neural Information Processing Systems. Volume 2. California: Curran Associates, Inc.; 2015:2224-2232.

Raccuglia P, Elbert KC, Adler PD, et al. Machine-learning-assisted materials discovery using failed experiments. Nature. 2016;533(7601):73-76. doi: 10.1038/nature17439

Segler MH, Waller MP. Modelling chemical reasoning to predict and invent reactions. Chemistry. 2017;23(25):6118-6128. doi: 10.1002/chem.201604556

Weininger DJ. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Infrom Comput Sci. 1988;28(1):31-36. doi: 10.1021/ci00057a005

Schwaller P, Gaudin T, Lanyi D, Bekas C, Laino TJ. “Found in translation”: Predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem Sci. 2018;9(28):6091-6098. doi: 10.1039/c8sc02339e

Vaswani A, Shazeer N, Parmar N, et al. Attention is All you Need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017:6000-6010.

Schwaller P, Laino T, Gaudin T, et al. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci. 2019;5(9):1572-1583. doi: 10.1021/acscentsci.9b00576

Tang G, Müller M, Rios A, Sennrich R. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2018:4263-4272. doi: 10.18653/v1/d18-1458

Wu F, Fan A, Baevski A, Dauphin YN, Auli M. Pay Less Attention with Lightweight and Dynamic Convolutions. arXiv. Preprint posted online 2019. doi: 10.48550/arXiv.1901.10430

Schwaller P, Probst D, Vaucher AC, et al. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell. 2021;3(2):144-152. doi: 10.1038/s42256-020-00284-w

Mellah Y, Kocaman V, Haq HU, Talby D. Efficient schema-less text-to-SQL conversion using large language models. Artif Intell Health. 2024;1(2):96-106. doi: 10.36922/aih.2661

Mumtaz U, Ahmed A, Mumtaz S. LLMs-Healthcare: Current applications and challenges of large language models in various medical specialties. Artif Intell Health. 2024;1(2):16-28. doi: 10.36922/aih.2558

Bran AM, Schwaller P. Transformers and large language models for chemistry and drug discovery. In: Drug Development Supported by Informatics. Berlin: Springer; 2024. p. 143-163.

Leon M, Perezhohin Y, Peres F, Popovič A, Castelli M. Comparing SMILES and SELFIES tokenization for enhanced chemical language modeling. Sci Rep. 2024;14(1):25016. doi: 10.1038/s41598-024-76440-8

Xiong J, Zhang W, Wang Y, et al. Bridging chemistry and artificial intelligence by a reaction description language. Nat Mach Intell. 2025;7(5):782-793. doi: 10.1038/s42256-025-01032-8

Lo A, Pollice R, Nigam A, White AD, Krenn M, Aspuru- Guzik AJ. Recent advances in the self-referencing embedded strings (SELFIES) library. Dig Discov. 2023;2(4):897-908. doi: 10.1039/D3DD00044C

Ucak UV, Ashyrmamatov I, Lee J. Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization. J Cheminform. 2023;15(1):55. doi: 10.1186/s13321-023-00725-9

Wu Z, Jiang D, Wang J, et al. Knowledge-based BERT: A method to extract molecular features like computational chemists. Brief Bioinform. 2022;23(3):bbac131. doi: 10.1093/bib/bbac131

Chen S, Jung Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au. 2021;1(10):1612-1620. doi: 10.1021/jacsau.1c00246

Liu Z, Zhang W, Xia Y, et al. MolXPT: Wrapping Molecules with Text for Generative Pre-training. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics; 2023:1606-1616. doi: 10.18653/v1/2023.acl-short.138

Lu J, Zhang Y. Unified deep learning model for multitask reaction predictions with explanation. J Chem Inform Model. 2022;62(6):1376-1387. doi: 10.1021/acs.jcim.1c01467

Guo W, Wang J, Wang S. Deep multimodal representation learning: A survey. IEEE Access. 2019;7:63373-63394. doi: 10.1109/ACCESS.2019.2916887

Ma S, Zhang D, Zhou M. A Simple and Effective Unified Encoder for Document-Level Machine Translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2020:3505-3511. doi: 10.18653/v1/2020.acl-main.321

Zhang X, Li P, Li H. AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization. arXiv. Preprint posted online 2020. doi: 10.48550/arXiv.2008.11869

Zhu J, Xia Y, Wu L, et al. Incorporating BERT into Neural Machine Translation. arXiv. Preprint posted online 2020. doi: 10.48550/arXiv.2002.06823

Jin D, Jin Z, Zhou JT, Szolovits P. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Proceedings of the AAAI Conference on Artificial Intelligence. 2020;34(05):8018-8025. doi: 10.1609/aaai.v34i05.6311

Singh S, Shingatgeri V, Srivastava P. Revolutionizing new drug discovery: Harnessing AI and machine learning to overcome traditional challenges and accelerate targeted therapies. Artif Intell Health. 2024;2(2):29-40. doi: 10.36922/aih.4423

Gao T, Yao X, Chen D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. arXiv. Preprint posted online 2021. doi: 10.48550/arXiv.2104.08821

Chen X, Alamro H, Li M, et al. Target-aware Abstractive Related Work Generation with Contrastive Learning. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery; 2022:373-383. doi: 10.1145/3477495.3532065

Miculicich L, Ram D, Pappas N, Henderson J. Document- Level Neural Machine Translation with Hierarchical Attention Networks. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Belgium: Association for Computational Linguistics; 2018:2947–2954.

Mao A, Mohri M, Zhong Y. Cross-Entropy Loss Functions: Theoretical Analysis and Applications. In: Proceedings of the 40th International Conference on Machine Learning. 2023:23803-23828.

Jiang S, Zhang Z, Zhao H, et al. When SMILES smiles, practicality judgment and yield prediction of chemical reaction via deep chemical language processing. IEEE Access. 2021;9:85071-85083. doi: 10.1109/ACCESS.2021.3083838

Lowe D. Chemical reactions from US patents (1976- Sep2016). Published online 2017. doi: 10.6084/M9.FIGSHARE.5104873

Liu B, Ramsundar B, Kawthekar P, et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci. 2017;3(10):1103-1113. doi: 10.1021/acscentsci.7b00303

Jin W, Coley C, Barzilay R, Jaakkola TJ. Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017:2604-2613.

Papineni K, Roukos S, Ward T, Zhu WJ. BLEU. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02. Association for Computational Linguistics; 2001:311-318. doi: 10.3115/1073083.1073135

Wang T. Research on chemical reaction prediction model based on Fairseq. In: 2021 2nd International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE). IEEE; 2021:167-171. doi: 10.1109/icbase53849.2021.00039

Bjerrum EJ. SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules. arXiv. Preprint posted online 2017. doi: 10.48550/arXiv.1703.07076

Tetko IV, Karpov P, Van Deursen R, Godin G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat Commun. 2020;11(1):5575. doi: 10.1038/s41467-020-19266-y

Khalifa AA, Haranczyk M, Holliday J. Comparison of nonbinary similarity coefficients for similarity searching, clustering and compound selection. J Chem Inf Model. 2009;49(5):1193-1201. doi: 10.1021/ci8004644

Previous article in this issue

Next article in this issue

Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing