  • N. S. Prema 1

  • B. M. Shashikala 2

  • M. Veena 3

  • K. G. Chaithra 4

  1. Department of Information Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, India
  2. Department of Master of Computer Application, SJCE, JSS Science and Technology University, Mysuru, India
  3. Department of Computer Science and Engineering, PES College of Engineering, Mandya, India
  4. Department of Information Science and Engineering, Maharaja Institute of Technology, Mysuru, India

Abstract

Water quality is a critical determinant of ecological and public health, so regular assessment is essential for sustainable development. This study estimates the Water Quality Index (WQI) from six water parameters: pH, temperature, dissolved oxygen (DO), conductivity, faecal coliform, and nitrate-nitrite nitrogen. The dataset, sourced from Kaggle, comprises water samples collected across 18 Indian states. Index values are computed with the weighted arithmetic WQI method. Four regression models are then applied to forecast WQI: linear regression, decision tree, random forest, and gradient boosting. Model performance is evaluated using the coefficient of determination (R²). Gradient boosting achieved the highest prediction accuracy, with an R² value of 0.94, outperforming the other three models. The results highlight the effectiveness of machine learning in modelling complex environmental parameters and forecasting water quality, and show that data-driven approaches can support timely decision-making for water resource management and public health interventions.
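As an illustration of the two quantities named in the abstract, the following is a minimal Python sketch of a weighted arithmetic WQI and of R². It assumes the common formulation in which each parameter's unit weight is inversely proportional to its permissible standard value and each quality rating is scaled between an ideal value and that standard. The function names, the parameter names, and any standard or ideal values passed in are hypothetical; the abstract does not state the values used in the study.

```python
from typing import Mapping, Sequence


def quality_rating(observed: float, ideal: float, standard: float) -> float:
    """Sub-index q_i = 100 * |V_i - V_ideal| / |S_i - V_ideal| (assumed form)."""
    return 100.0 * abs(observed - ideal) / abs(standard - ideal)


def unit_weights(standards: Mapping[str, float]) -> dict:
    """Unit weights w_i = K / S_i with K = 1 / sum(1 / S_i), so they sum to 1."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def weighted_arithmetic_wqi(ratings: Mapping[str, float],
                            weights: Mapping[str, float]) -> float:
    """WQI = sum(w_i * q_i) / sum(w_i) over the parameters present in ratings."""
    total_weight = sum(weights[name] for name in ratings)
    weighted_sum = sum(ratings[name] * weights[name] for name in ratings)
    return weighted_sum / total_weight


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In this sketch a model that always predicts the mean of the observed WQI values scores an R² of 0, and a perfect predictor scores 1, which gives a scale for reading the reported value of 0.94.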

Subjects

 Artificial Intelligence
