
A Hybrid Machine Learning Model Based on Deep Learning for Air Quality Prediction | ||
Pollution | ||
Volume 11, Issue 4, Dey 2025, Pages 1199-1215 | ||
Article type: Original Research Paper | ||
DOI: 10.22059/poll.2025.388743.2750 | ||
Authors | ||
Mohammad Reza Mehregan 1; Mohammad Taghi Taghavifard 2; Amir Mohammad Khani 1; Arman Rezasoltani* 1; Mohammad Ali Nikkhah 1 | ||
1 Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran | ||
2 Department of Information Technology and Operations Management, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran | ||
Abstract | ||
Air pollution is a major global challenge that directly affects public health, urban sustainability, and environmental policy, and accurate air quality prediction has become essential for addressing it. This study proposes a hybrid machine learning model that combines deep learning with advanced ensemble techniques to improve air quality prediction. The model combines a Deep Neural Network (DNN) with ensemble learning algorithms such as XGBoost, CatBoost, LightGBM, and Random Forest as a metamodel to aggregate the predictions. It was tested on a dataset that included environmental variables such as PM2.5, PM10, CO, and NO2 as well as socio-economic variables such as proximity to industrial areas and population density. Feature selection and data imbalance were handled using RFECV and SMOTE, respectively. Hyperparameters were tuned using both the Tree-structured Parzen Estimator (TPE) as implemented in Optuna and Bayesian optimization in Keras-Tuner. The model achieves an accuracy of 97.34%, which is superior to conventional approaches. The results make a case for hybrid machine learning techniques in air quality prediction as a basis for intelligent global environmental monitoring that is interpretable, accurate, and scalable. Future work can integrate real-time data from the Internet of Things (IoT) and extend the model to multi-prediction benchmarks for other environmental indices, broadening its applicability to upcoming global environmental challenges. | ||
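The abstract names SMOTE as the method used for class imbalance but gives no implementation details. As an illustration only, the following is a minimal pure-Python sketch of the basic SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbors. The function name `smote`, its parameters, and the feature rows are hypothetical and are not taken from the paper or its dataset.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical, made-up feature rows for a rare class (not from the paper's data).
minority_rows = [[0.80, 0.60], [0.90, 0.70], [0.85, 0.75], [0.95, 0.65]]
new_rows = smote(minority_rows, n_new=6)
```

Because every synthetic row lies on a line segment between two existing minority rows, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic samples.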
Keywords | ||
Air Quality; Deep Learning; Ensemble Learning; Environmental Monitoring; Hybrid Model; Hyperparameter Optimization | ||