AccScience Publishing / AJWEP / Volume 21 / Issue 5 / DOI: 10.3233/AJW240056
RESEARCH ARTICLE

Water Quality Assessment using Machine Learning: A Focus on Coliform Prediction in Water

Ishleen Kaur4, Archa Gulati1, Puneet Singh Lamba*, Achin Jain2, Harsh Taneja3, Jessica Singh Syal2
1 Ramjas College, University of Delhi, New Delhi, India
2 Bharati Vidyapeeth’s College of Engineering, New Delhi, India
3 Department of Computer Science & Engineering, Graphic Era (Deemed to be University), Dehradun, India
AJWEP 2024, 21(5), 19–26; https://doi.org/10.3233/AJW240056
Submitted: 27 January 2024 | Revised: 15 May 2024 | Accepted: 15 May 2024 | Published: 7 September 2024
© 2024 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Abstract

Water quality assessment is essential for safeguarding public health and protecting water resources. This study predicts water quality, specifically the presence of total coliforms, using several machine-learning techniques. It uses a publicly available dataset covering India that comprises various physical water quality parameters. After pre-processing, including feature selection and normalisation, several regression techniques were applied to the dataset. Gradient boosting regression outperformed the other methods, with a mean absolute error (MAE) of 0.0349, a mean squared error (MSE) of 0.0038, and a root mean squared error (RMSE) of 0.0620. Feature importance analysis identified conductivity and temperature as the most influential factors in total coliform prediction. These results contribute to the understanding of water quality and support water resource management for public health protection. Accurate prediction of total coliform presence allows timely, proactive measures to mitigate the health risks associated with microbial contamination.
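The workflow summarised in the abstract (fit a gradient boosting regressor, score it with MAE, MSE and RMSE, then rank features by importance) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, boosts depth-1 regression stumps on squared-error residuals, and runs on made-up toy values. The feature names, the toy target, `n_rounds` and `learning_rate` are all assumptions chosen for illustration, and the scores are computed on the training data only.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def fit_stump(X, residuals):
    """Best single-feature threshold split on the residuals.

    Returns (gain, feature, threshold, left_mean, right_mean), where gain is
    the reduction in squared error relative to predicting the residual mean.
    """
    mean_r = sum(residuals) / len(residuals)
    parent_sse = sum((r - mean_r) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            gain = parent_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, thr, lm, rm)
    return best

def fit_boosting(X, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    gains = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        gain, j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        gains[j] += gain  # accumulate split gain per feature
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    total = sum(gains) or 1.0
    importance = [g / total for g in gains]  # normalised to sum to 1
    return (base, learning_rate, stumps), importance

def predict(model, X):
    """Sum the base value and the shrunken stump outputs for each row."""
    base, learning_rate, stumps = model
    return [base + learning_rate * sum(lm if row[j] <= thr else rm
                                       for j, thr, lm, rm in stumps)
            for row in X]

# Toy data (invented for illustration): two features already scaled to [0, 1].
feature_names = ["conductivity", "temperature"]  # hypothetical column order
X = [[i / 9, ((i * 7) % 5) / 4] for i in range(10)]
y = [0.8 * row[0] + 0.2 * row[1] for row in X]   # invented target

model, importance = fit_boosting(X, y)
fitted = predict(model, X)

baseline_mse = mse(y, [sum(y) / len(y)] * len(y))  # predict-the-mean baseline
model_mse = mse(y, fitted)

print("MAE :", mae(y, fitted))
print("MSE :", model_mse)
print("RMSE:", rmse(y, fitted))
print(dict(zip(feature_names, importance)))
```

Because RMSE is the square root of MSE, the two reported figures can be cross-checked: 0.0620 squared is about 0.0038, so the abstract's MSE and RMSE agree to rounding. In practice a library implementation and a held-out test split would replace this toy loop.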

Keywords
Water quality index
conductivity
temperature
machine learning
total coliform
regression

Asian Journal of Water, Environment and Pollution, Electronic ISSN: 1875-8568 Print ISSN: 0972-9860, Published by AccScience Publishing