BSO: Binary Sailfish Optimization for feature selection in sentiment analysis
Sentiment analysis (SA) provides insight into public opinion about brands, products, and events, and companies can use it to improve customer satisfaction by adjusting their products accordingly. This study applies sentiment analysis to user comments from online sales platforms. We evaluate four machine learning (ML) algorithms and one deep learning (DL) model. Our dataset contains noisy data that is unsuitable for classification and degrades performance. To address this, we use feature selection, which helps the algorithms identify meaningful patterns and reduces computation time by keeping only the most informative features. Specifically, we apply the binary variant of the Sailfish Optimization Algorithm (SOA), referred to as the Binary Sailfish Optimizer (BSO), as a feature selection technique for our textual dataset; this is its first application in sentiment analysis. To assess BSO, we compare it against four other optimization algorithms: Harmony Search (HS), Bat Algorithm (BA), Atom Search Optimization (ASO), and Whale Optimization Algorithm (WOA). Our findings indicate that BSO outperforms these four algorithms, achieving an F-score of 0.91 while using nearly half of the available features.
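To illustrate how a binary optimizer can encode feature selection, the following is a minimal sketch and not the implementation used in this study. It assumes an S-shaped (sigmoid) transfer function to turn a continuous position into a 0/1 feature mask, and a weighted fitness that trades classification error against the fraction of features kept. The function names and the weight `alpha = 0.99` are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(x):
    """S-shaped transfer function (assumed here; other transfer functions exist)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Each dimension is switched on with probability sigmoid(value).
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def fitness(mask, error_rate, alpha=0.99):
    """Weighted sum of classification error and feature ratio (lower is better).

    alpha is a hypothetical weight, not a value taken from the study.
    """
    if not any(mask):
        return float("inf")  # an empty feature subset cannot be classified
    return alpha * error_rate + (1.0 - alpha) * sum(mask) / len(mask)


def select_columns(rows, mask):
    """Keep only the columns (features) whose mask entry is 1."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    mask = binarize([2.0, -2.0, 0.5, -0.5], rng)
    reduced = select_columns([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]], mask)
    # error_rate would normally come from a classifier trained on `reduced`.
    print(mask, reduced, fitness(mask, error_rate=0.1))
```

In a wrapper setting, the optimizer would call `fitness` once per candidate mask, with `error_rate` obtained by training and validating a classifier on the reduced feature matrix.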