ABSTRACT
To address the poor prediction accuracy and performance caused by the influence of the original data sequence on the first-order single-variable gray model (GM(1,1)), this study proposes an exponential smoothing gray model (ESGM(1,1)). Taking the Anliu Station, situated at the border between Henan and Anhui provinces, as an example, ammonia nitrogen and the permanganate index were selected for water quality prediction using the GM(1,1) and ESGM(1,1) models from 2010 to 2021. The fitting accuracy of the models was evaluated by comparing the computed values with the monitored water quality index values. The results reveal that the average relative percentage error of ESGM(1,1) decreased by 3.01% compared with GM(1,1) in the simulation period and by 27.41% in the verification period. The mean square error ratio C of GM(1,1) was 0.79, which failed the fitting accuracy test, whereas the C value of ESGM(1,1) was 0.59, which passed the test. The predicted results were consistent with the monitoring data from 2010 to 2021. It is concluded that ESGM(1,1) shows superior accuracy for short-term water quality prediction. The model mitigates the impact of the initial sequence on prediction accuracy and can be used for local water pollution control and environmental protection.
HIGHLIGHTS
A coupled exponential smoothing gray model is proposed, integrating the theories of exponential smoothing and gray modeling.
The average relative percentage error from the coupled model is reduced by 3.01% compared to GM(1,1) during the simulation period and by 27.41% during the verification period.
The coupled model demonstrates superior accuracy for short-term water quality forecasting.
INTRODUCTION
Water is a vital component of the natural ecological cycle; however, in recent years, the quality of both surface water and groundwater has faced increasingly severe challenges. On the one hand, this can be attributed to the rising pollution load. For instance, microplastics, which are emerging contaminants, persist in aquatic systems due to their resistance to degradation, leading to widespread pollution (Cook et al. 2021; Stride et al. 2023). Additionally, specific microalgal species bloom in aquatic ecosystems and produce harmful substances that can adversely affect water quality (Hii et al. 2024). On the other hand, the security of the water environment is increasingly threatened by the impacts of climate change and anthropogenic activities (Mahdian et al. 2023). Consequently, accurate water quality forecasting has become imperative. Water quality prediction uses a mathematical model to infer the future trend of water quality for a given section of a water body from the available water quality monitoring data and information (Chen et al. 2022). The predicted results can guide water pollution control and water environment protection, which are of great significance to the health of the aquatic ecological environment.
According to the principles of the underlying mathematical models, water quality prediction models can generally be classified into mechanistic and non-mechanistic models. Mechanistic models were developed earlier and primarily simulate the migration and transformation of pollutants. They track the movement of pollutants downstream from the discharge point along the direction of water flow, as governed by control equations, and aim to predict the patterns of water quality change. In 1925, Streeter and Phelps established the S–P model to describe the variation of biochemical oxygen demand (BOD) and dissolved oxygen (DO) in one-dimensional steady rivers, which became the foundation for subsequent improved water quality models. The Water Quality Analysis Simulation Program (WASP), proposed by the U.S. Environmental Protection Agency in 1983, and the Soil and Water Assessment Tool (SWAT), developed by the U.S. Department of Agriculture in 1994, are two examples of mechanistic models. They can thoroughly consider the various factors affecting water quality when predicting the migration and transformation of pollutants. However, these mechanistic models are relatively complicated because they require an understanding of how the water quality index (WQI) changes and the establishment of the corresponding hydrodynamic and water quality control equations. They also demand extensive basic data, which makes individual parameter values challenging to calculate. These limitations restrict their application in water quality prediction (Noori et al. 2020; Dong et al. 2023; Huang et al. 2024).
Non-mechanistic models include regression analysis, time-series analysis, machine learning methods, gray prediction methods (Duan & Song 2024; Khosravi et al. 2024), and so on. Following the principles of mathematical statistics, the regression analysis model establishes a regression equation between the dependent variable and the independent variables through the mathematical processing of a large amount of sample data. The equation is then used to predict the change in the dependent variable. This model has been widely used in the fields of economics, electric power, and water engineering (Herrera et al. 2010; Yildiz et al. 2017; Zeng et al. 2024). However, it requires a large sample size and multiple tests of prediction accuracy. Time-series analysis is a method that predicts future changes by calculating the weighted average of the factors affecting those changes. It has the advantages of simple operation and a small amount of data calculation, making it applicable in the fields of finance and environmental science (Menéndez-García et al. 2024; Toivonen & Räsänen 2024). However, it considers only the influence of time, which lowers the reliability of the prediction results. Machine learning methods are data-driven approaches that extract patterns or relationships from input data and use these learned patterns to make predictions or decisions on new data. In recent years, machine learning techniques have been extensively applied across various industries (Borzooei et al. 2018; Cabaneros et al. 2019; Kwon et al. 2023). Several researchers have integrated deterministic water quality evaluation methods with machine learning approaches to assess the quality of drinking water in mining areas, demonstrating that machine learning methods are highly effective in predicting WQI values (Mohammadpour et al. 2024). However, machine learning methods exhibit certain limitations. For instance, artificial neural networks (ANNs) often require large volumes of sample data and provide limited interpretability of model results.
The aquatic ecological environment is an open ecosystem. Changes in water quality result from the combined influence of various factors, including physical, chemical, and biological processes. The variation in water quality concentration therefore has uncertain characteristics, such as randomness, fuzziness, and grayness, which provides a basis for predicting the evolution trend of water quality using gray theory. The gray model (GM) makes predictions by transforming irregular original sequences into relatively regular ones through cumulative addition or subtraction. Its advantages include a small sample requirement and simple operation, and it is widely applied in engineering and economics (Hu 2020; Liu et al. 2023; Xu et al. 2024). Among the various GMs, GM(1,1) is the most widely used (Wang & Zhang 2023; Li et al. 2024). GM(1,1) is based on a first-order differential equation and is best suited to initial data series with a monotonic long-term trend. If the trend changes abruptly, the error in the GM(1,1) prediction results will be large. Under the influence of human activities, extreme rainfall, and other factors, the concentration of a water quality indicator may change irregularly or rise and fall suddenly. To reflect these characteristics, some studies have improved the calculation accuracy by modifying the residual results of GM(1,1) or by establishing dynamic models (Zhou et al. 2006; Sun et al. 2023). These methods improve accuracy by correcting the calculation results, but they do not solve the problem of fluctuations in the original data series of GM(1,1).
Brown proposed an exponential smoothing method for time-series analysis and prediction. This approach uses a weighted average to assign more weight to the latest data, thereby reducing the influence of historical data on future predictions (Brown 1963). This advantage makes the exponential smoothing method widely used in economics, medicine, and transportation sciences (Guleryuz 2021; Zhao et al. 2022; Todorov & Sánchez Lasheras 2023). Mi et al. (2018) demonstrated the feasibility and rationality of using exponential smoothing for data sequence smoothing by establishing an exponential smoothing GM (ESGM).
From the aforementioned analysis, it can be concluded that the initial sequence significantly influences the prediction accuracy and performance of the GM(1,1) model. In this study, an ESGM(1,1) model, which integrates the principles of exponential smoothing and the advantages of the GM(1,1) model, has been developed. The ESGM(1,1) model employs a weighted moving average approach based on exponential smoothing to mitigate the fluctuations in the original water quality sequence. Additionally, the background values of the GM(1,1) model are enhanced through the establishment of adaptive sequences. The accuracy of the water quality prediction results of the coupled model was verified by comparison with the actual monitoring data. The coupled ESGM(1,1) model was then used to predict water quality changes at the Anliu Station on the Guo River, China. This study demonstrates in detail the feasibility and rationality of using this model for water quality prediction. Additionally, it provides a reference method to enhance the reliability of water quality prediction results and offers a scientific basis for water environment protection and management.
METHODOLOGY
GM(1,1)
The GM(1,1) model is a component of gray system theory, where the notation (1,1) signifies that it is based on a first-order differential equation with a single variable. The model generates a new data sequence by applying a single accumulation to the original series. The least-squares method is used to fit an exponential curve to the accumulated sequence. Based on this fitted curve, the model simulates accumulated values, which are subsequently restored by inverse accumulation to derive the predicted values. The steps involved in this process are as follows:
Based on the actual monitoring data of ammonia nitrogen and the permanganate index at the Anliu Station, the cumulative sequence was first obtained through a one-time accumulation, as described in Equation (1). Subsequently, an adjacent mean sequence for both indicators was generated using Equation (2). The least-squares method was then applied to calculate the parameters a and u for the two models, as outlined in Equations (3) and (4). Following this, the GM(1,1) response equations for both indicators were established based on Equation (5). Finally, the calculated values of the models were derived using Equation (6). The entire computational process was carried out in an Excel spreadsheet.
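For reference, the conventional GM(1,1) formulation that corresponds to these steps can be written as follows. This is the standard textbook form built on the whitening equation $dX^{(1)}/dt + aX^{(1)} = u$, expressed with the symbols used later in this paper ($X^{(0)}$, $X^{(1)}$, $Z$, a, and u). The assignment of the equation numbers (1) to (6) is an assumption inferred from the order of the steps above, and n is introduced here to denote the length of the series.

```latex
\begin{align}
X^{(1)}(t) &= \sum_{i=1}^{t} X^{(0)}(i), \quad t = 1, 2, \dots, n \tag{1}\\
Z(t) &= \tfrac{1}{2}\left[X^{(1)}(t) + X^{(1)}(t-1)\right], \quad t = 2, 3, \dots, n \tag{2}\\
\begin{bmatrix} a \\ u \end{bmatrix} &= \left(B^{\mathsf T} B\right)^{-1} B^{\mathsf T} Y \tag{3}\\
B &= \begin{bmatrix} -Z(2) & 1 \\ \vdots & \vdots \\ -Z(n) & 1 \end{bmatrix}, \quad
Y = \begin{bmatrix} X^{(0)}(2) \\ \vdots \\ X^{(0)}(n) \end{bmatrix} \tag{4}\\
\hat{X}^{(1)}(t+1) &= \left[X^{(0)}(1) - \frac{u}{a}\right] e^{-a t} + \frac{u}{a} \tag{5}\\
\hat{X}^{(0)}(t+1) &= \hat{X}^{(1)}(t+1) - \hat{X}^{(1)}(t) \tag{6}
\end{align}
```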
Exponential smoothing





The initial value of exponential smoothing depends on the number of terms in the data series. When the series has more than 20 terms, the observed value of the first period is generally selected as the initial value; when it has no more than 20 terms, the average of the observed values of the first three periods can be selected instead (Yanwei 2010).
Exponential smoothing assumes that the predicted value is influenced by the given data or information and that recent data usually have a greater impact on the predicted value than historical data. The older an observation is, the smaller its impact on the predicted value, with the weights generally decreasing in a geometric progression (Yates 1968). Depending on the number of smoothing stages, the exponential smoothing method can be categorized into single, double, and triple exponential smoothing.
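The initial-value rule and the geometric weighting described above can be illustrated with a minimal Python sketch of single exponential smoothing. It assumes the common recurrence S(t) = λ·x(t) + (1 − λ)·S(t − 1) with smoothing coefficient λ; the exact form of the smoothing equations used in this study may differ, and the function name `exponential_smoothing` is hypothetical.

```python
def exponential_smoothing(series, lam):
    """Single exponential smoothing, assuming S(t) = lam*x(t) + (1-lam)*S(t-1)."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")

    # Initial value rule: first observation for long series (> 20 terms),
    # otherwise the mean of the first three observations.
    if len(series) > 20:
        previous = series[0]
    else:
        head = series[:3]
        previous = sum(head) / len(head)

    smoothed = []
    for x in series:
        # Recent data receive weight lam; older data decay geometrically.
        previous = lam * x + (1.0 - lam) * previous
        smoothed.append(previous)
    return smoothed


# Small illustrative series (not monitoring data).
demo = exponential_smoothing([1.0, 2.0, 3.0, 4.0], 0.5)
```

A smaller λ gives more weight to the history and therefore a smoother sequence, while λ = 1 returns the original series unchanged.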
ESGM(1,1)

Following the computational procedure of GM(1,1), the ESGM(1,1) model was also implemented in an Excel spreadsheet. First, as described in Equation (7), the actual monitoring data for ammonia nitrogen and the permanganate index at the Anliu Station were smoothed to generate an exponentially smoothed sequence. Subsequently, a cumulative sequence was obtained using Equation (12). Next, a new background value sequence was derived using Equations (13)–(15), and a revised formula for the background value was established based on Equation (16). Following this, the development coefficient a and the gray action quantity u for the two indicators in the ESGM(1,1) model were calculated using Equations (17) and (18). Finally, the model equations were solved using Equations (19) and (20), and the computed values of the model were obtained through inverse accumulation.
Model verification
Fitting accuracy testing
Posterior deviation testing was used to verify the fitting accuracy of GM(1,1) and ESGM(1,1) (Gong & Wang 2019). This statistical test is based on the mean square error ratio C and the small error probability P. The steps are as follows.
Finally, the accuracy and performance of the prediction model were determined according to the calculated mean square error ratio (C) and the small error probability (P), as detailed in Table 1.
Table 1. Accuracy and performance grades of the prediction model
Accuracy level | C | P |
---|---|---|
Excellent | C ≤ 0.35 | P ≥ 0.95 |
Good | 0.35 < C ≤ 0.50 | 0.80 ≤ P < 0.95 |
Qualified | 0.50 < C ≤ 0.65 | 0.70 ≤ P < 0.80 |
Unqualified | C > 0.65 | P < 0.70 |
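The test can be sketched in Python as follows. The sketch assumes the usual definitions from gray system theory: C is the ratio of the standard deviation of the residuals to that of the original sequence, and P is the share of residuals whose deviation from the mean residual is below 0.6745 times the standard deviation of the original sequence. The overall grade is taken as the worse of the grades implied by C and P. The function names are hypothetical, and the example uses the permanganate index monitoring data and GM(1,1) calculated values for 2010 to 2019 reported later in this paper.

```python
import math


def posterior_variance_test(observed, fitted):
    """Return (C, P) for the posterior deviation test (assumed standard form)."""
    n = len(observed)
    residuals = [o - f for o, f in zip(observed, fitted)]
    mean_obs = sum(observed) / n
    mean_res = sum(residuals) / n
    # Population standard deviations of the original sequence and residuals.
    s1 = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    s2 = math.sqrt(sum((e - mean_res) ** 2 for e in residuals) / n)
    c = s2 / s1
    small = sum(1 for e in residuals if abs(e - mean_res) < 0.6745 * s1)
    return c, small / n


def accuracy_grade(c, p):
    """Map (C, P) to the grades of the accuracy table; the worse grade wins."""
    names = ["Excellent", "Good", "Qualified", "Unqualified"]
    c_level = 0 if c <= 0.35 else 1 if c <= 0.50 else 2 if c <= 0.65 else 3
    p_level = 0 if p >= 0.95 else 1 if p >= 0.80 else 2 if p >= 0.70 else 3
    return names[max(c_level, p_level)]


# Permanganate index (mg/L), 2010-2019: monitoring data and GM(1,1) values.
observed = [6.64, 7.57, 5.61, 4.96, 4.60, 5.29, 5.75, 5.53, 5.31, 4.92]
fitted = [6.64, 6.25, 6.06, 5.89, 5.71, 5.54, 5.38, 5.22, 5.07, 4.92]
c, p = posterior_variance_test(observed, fitted)
grade = accuracy_grade(c, p)
```

With these rounded inputs the sketch gives C of about 0.79 and P of 0.70, so the grade is "Unqualified", in line with the result reported for GM(1,1) on the permanganate index.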
Predictive performance testing


The lower the RPE and the ARPE, the better the prediction accuracy of the model.
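As an illustration, the two indicators can be computed with a short Python sketch. It assumes the usual definitions, namely RPE = |monitored − calculated| / monitored × 100% and ARPE as the mean of the RPE values over the period; the function names are hypothetical. The example uses the ammonia nitrogen monitoring data and ESGM(1,1) calculated values for 2020 and 2021 reported later in this paper.

```python
def relative_percentage_error(observed, predicted):
    """RPE in percent for one observation (assumed definition)."""
    if observed == 0:
        raise ValueError("RPE is undefined when the observed value is zero")
    return abs(observed - predicted) / abs(observed) * 100.0


def average_rpe(observed_series, predicted_series):
    """ARPE in percent: mean of the RPE values over a period."""
    errors = [relative_percentage_error(o, p)
              for o, p in zip(observed_series, predicted_series)]
    return sum(errors) / len(errors)


# Ammonia nitrogen (mg/L), validation period 2020-2021.
monitored = [0.41, 0.81]
esgm_values = [0.35, 0.26]
rpe_values = [relative_percentage_error(o, p)
              for o, p in zip(monitored, esgm_values)]
arpe = average_rpe(monitored, esgm_values)
```

The two RPE values come out at roughly 14.63% and 67.90%, which gives an ARPE of about 41.27% for the validation period.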
CASE STUDY
The Guo River, a significant tributary of the Huai River in China, spans 396 km, with a drainage area of 15,890 km². Serving as the primary river in the northern part of the Huaibei Plain, it traverses Kaifeng, Tongxu, Fugou, Taikang, and Luyi in Henan Province and Bozhou, Guoyang, and Mengcheng in Anhui Province before converging with the Huai River near Huaiyuan County. The river basin is economically and industrially developed and serves as an important production base for grain, cotton, and vegetable oils in China. The population density along its banks is relatively high, and there are numerous sewage outlets along the Guo River, particularly upstream, which results in poor water quality. The discharge of pollutants into the river directly affects the quality of the water environment in the middle reaches of the Huai River.
The annual average data for the 16 water quality parameters at the Anliu Station from 2010 to 2021 were obtained from the Water Information System of the Anhui Provincial Government (http://yc.wswj.net/ahsxx/LOL/?refer=upl&to=public_public). This system boasts a comprehensive quality control and quality assurance system that ensures the reliability and consistency of monitoring data.
RESULTS AND DISCUSSION
Water quality evaluation
Table 2. Ratio of water quality concentration to the Class III water standard concentration, 2010–2021
Water quality parameter | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Mean value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Total phosphorus | 2.96 | 1.79 | 1.02 | 1.12 | 0.70 | 0.80 | 0.74 | 1.04 | 0.73 | 0.37 | 0.39 | 0.63 | 1.02 |
Ammonia nitrogen | 4.93 | 5.47 | 3.96 | 2.30 | 0.91 | 1.15 | 1.09 | 0.92 | 0.57 | 0.60 | 0.41 | 0.81 | 1.93 |
Volatile phenol | 0.24 | 0.46 | 0.25 | 0.27 | 0.16 | 0.09 | 0.13 | 0.00 | 0.16 | 0.00 | 0.00 | 0.07 | 0.15 |
Permanganate index | 1.11 | 1.26 | 0.94 | 0.83 | 0.77 | 0.88 | 0.96 | 0.92 | 0.88 | 0.82 | 0.81 | 0.77 | 0.91 |
Five-day BOD | 0.92 | 0.87 | 0.88 | 0.96 | 0.89 | 1.08 | 1.32 | 1.13 | 1.01 | 0.81 | 1.17 | 1.15 | 1.02 |
Dissolved oxygen | 0.23 | 0.54 | 0.51 | 0.60 | 0.57 | 0.59 | 0.47 | 0.79 | 0.13 | 0.00 | 0.01 | 0.18 | 0.39 |
Chromium | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
Copper | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Zinc | 0.00 | 0.01 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 |
Cadmium | 0.00 | 0.02 | 0.10 | 0.10 | 0.10 | 0.09 | 0.08 | 0.03 | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 |
Lead | 0.00 | 0.05 | 0.01 | 0.10 | 0.10 | 0.02 | 0.07 | 0.02 | 0.02 | 0.01 | 0.01 | 0.00 | 0.03 |
Arsenic | 0.21 | 0.11 | 0.11 | 0.09 | 0.08 | 0.09 | 0.08 | 0.05 | 0.07 | 0.10 | 0.05 | 0.08 | 0.09 |
Mercury | 2.68 | 1.58 | 1.08 | 0.67 | 0.65 | 0.56 | 0.50 | 0.21 | 0.36 | 0.23 | 0.04 | 0.13 | 0.72 |
Selenium | 0.16 | 0.02 | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.09 | 0.01 | 0.00 | 0.00 | 0.00 | 0.04 |
Cyanide | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
Fluoride | 0.96 | 1.52 | 1.02 | 0.99 | 1.03 | 1.10 | 1.10 | 1.13 | 0.97 | 0.93 | 0.93 | 0.88 | 1.05 |
Note: Bold font in the original table indicates that the water quality concentration exceeds the surface water Class III water quality standard, that is, a ratio greater than 1.00.
Water quality concentration changes from 2010 to 2021. (a) Total phosphorus, ammonia nitrogen, permanganate index, 5-day BOD, DO, and fluoride. (b) Chromium, zinc, arsenic, and cyanide. (c) Volatile phenol, copper, cadmium, lead, mercury, and selenium.
Setting up ESGM(1,1)
According to the results of the single-factor water quality evaluation, BOD5 often exceeds the Class III water quality standard. BOD5 represents the amount of DO consumed per unit volume of water over 5 days as aerobic microorganisms oxidize and decompose organic matter, and it is significantly affected by human activities and environmental temperature. The ratios for ammonia nitrogen and the permanganate index frequently approached or exceeded 0.7, highlighting a recurring risk of surpassing the Class III water standard. Therefore, ammonia nitrogen and the permanganate index were selected for water quality prediction using GM(1,1) and ESGM(1,1).
The simulation period spanned from 2010 to 2019, while the validation period covered 2020 to 2021. The prediction performance and fitting accuracy of the models were analyzed by comparing the calculated and monitored values of the water quality indicators during the simulation and validation periods. Finally, the model with superior predictive performance and accuracy was applied to predict the concentrations of ammonia nitrogen and the permanganate index for the subsequent 3 years.
Exponential smoothing
Comparison between monitored data and smoothed values ((a) ammonia nitrogen and (b) permanganate index).
Establishing model of ESGM(1,1)
The original data sequence $X^{(0)}(t)$ was constructed from the monitoring data of ammonia nitrogen and the permanganate index from 2010 to 2019. According to Equation (1), $X^{(0)}(t)$ was accumulated once to obtain the cumulative sequence $X^{(1)}(t)$. The background value sequence $Z(t)$ was then obtained by averaging adjacent data in $X^{(1)}(t)$, as outlined in Equation (2). Using Equations (3) and (4), the development coefficient a and the gray action quantity u were calculated for both parameters. For the permanganate index, a was 0.030 and u was 6.47; for ammonia nitrogen, a was 0.37 and u was 8.13. Finally, the GM(1,1) response equations of the two water quality parameters were derived using Equation (6), as shown in Table 3.
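The GM(1,1) calculation for ammonia nitrogen can be retraced with a minimal Python sketch. It assumes the conventional GM(1,1) procedure (single accumulation, adjacent-mean background values, least-squares estimation of a and u, exponential time response, and inverse accumulation) and uses the 2010 to 2019 monitoring data in mg/L reported in Table 5. The function names are hypothetical, and because the inputs are rounded, the results only approximate the spreadsheet values.

```python
import math


def fit_gm11(x0):
    """Estimate the development coefficient a and gray action quantity u."""
    # Single accumulation of the original sequence.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: mean of adjacent accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m = len(z)
    # Least squares for y = -a*z + u (closed form for a straight line).
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    u = (sy + a * sz) / m
    return a, u


def gm11_values(first_value, a, u, steps):
    """Calculated values of the original series for the given number of steps."""
    x1_hat = [(first_value - u / a) * math.exp(-a * k) + u / a
              for k in range(steps)]
    # Inverse accumulation restores the original-scale values.
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, steps)]


# Ammonia nitrogen monitoring data (mg/L), 2010-2019.
ammonia = [4.93, 5.47, 3.96, 2.30, 0.91, 1.15, 1.09, 0.92, 0.57, 0.60]
a, u = fit_gm11(ammonia)
calculated = gm11_values(ammonia[0], a, u, 12)  # 2010-2021
```

Under these assumptions the sketch yields a of about 0.37 and u of about 8.1, close to the values reported above, and the calculated value for 2020 is about 0.19 mg/L.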
Table 3. Response equations of GM(1,1) and ESGM(1,1)
Model | λ | Ammonia nitrogen | Permanganate index |
---|---|---|---|
GM(1,1) | n/a | [equation image not recovered] | [equation image not recovered] |
ESGM(1,1) | 0.3 | [equation image not recovered] | [equation image not recovered] |
ESGM(1,1) | 0.5 | [equation image not recovered] | [equation image not recovered] |
ESGM(1,1) | 0.9 | [equation image not recovered] | [equation image not recovered] |
To further compare and verify the performance of the model, ESGM(1,1) was established according to Equations (12)–(20). The response equations for ammonia nitrogen and the permanganate index were calculated at λ values of 0.3, 0.5, and 0.9, as shown in Table 3. Table 3 shows that, in the ESGM(1,1) model, a lower λ coefficient corresponds to a smaller development coefficient a. The development coefficient a reflects the changing trend of the original data series. A smaller absolute value of a indicates a gentler development trend in the model. This suggests that ESGM(1,1) effectively smooths the volatility of the original data, reducing its interference with the prediction accuracy.
Model test
Fitting accuracy testing
For the results of GM(1,1) and ESGM(1,1), Equations (21)–(23) are used to calculate the residual value, the variance of the residual, and the variance of the original sequence. The mean square error ratio C and small error probability P were calculated according to Equations (24) and (25) to further quantify the fitting accuracy of the model. The calculation results are listed in Table 4.
Table 4. Test results of fitting accuracy
Water quality parameter | Model | λ | Mean square error ratio C | Small error probability P |
---|---|---|---|---|
Ammonia nitrogen | GM(1,1) | n/a | 0.19 | 1.00 |
Ammonia nitrogen | ESGM(1,1) | 0.3 | 0.17 | 1.00 |
Ammonia nitrogen | ESGM(1,1) | 0.5 | 0.19 | 1.00 |
Ammonia nitrogen | ESGM(1,1) | 0.9 | 0.24 | 1.00 |
Permanganate index | GM(1,1) | n/a | 0.79 | 0.70 |
Permanganate index | ESGM(1,1) | 0.3 | 0.41 | 0.90 |
Permanganate index | ESGM(1,1) | 0.5 | 0.59 | 0.70 |
Permanganate index | ESGM(1,1) | 0.9 | 0.77 | 0.70 |
For ammonia nitrogen, GM(1,1) and ESGM(1,1) at all three λ values passed the fitting accuracy test. The mean square error ratio C of GM(1,1) was 0.19, and the C values of ESGM(1,1) at λ = 0.3, 0.5, and 0.9 were 0.17, 0.19, and 0.24, respectively. The small error probability P was 1.00 for all models, indicating an excellent grade. Both models can therefore effectively predict the changing trend of the ammonia nitrogen concentration. For the permanganate index, the P value of GM(1,1) was 0.70, which corresponds to a qualified grade, but its C value of 0.79 exceeded the upper limit of 0.65 for the qualified grade, so the model failed the test. ESGM(1,1) passed the test at λ = 0.3 and λ = 0.5, whereas at λ = 0.9 the C value of 0.77 was still greater than 0.65 and the model failed. Thus, ESGM(1,1) can improve the fitting accuracy through the choice of λ. However, the optimal value of λ depends on the specific characteristics of the data and the prediction demand; therefore, it must be determined further by testing the prediction performance.
Predictive performance testing
Figure 5. Comparison of GM(1,1) and ESGM(1,1) results ((a) ammonia nitrogen and (b) permanganate index).
To quantify the prediction performance, the RPE and ARPE of GM(1,1) and the optimal ESGM(1,1) were calculated using Equations (26) and (27) and are presented in Table 5. For ammonia nitrogen, the ARPEs of ESGM(1,1) and GM(1,1) from 2010 to 2019 were 22.96% and 26.28%, respectively. In 2020, the RPE of ESGM(1,1) was 14.63%, less than one-third of the RPE of GM(1,1), which was 53.59%. In 2021, the RPE of ESGM(1,1) was 67.90%, which exceeded 50% but was still lower than that of GM(1,1), which was 83.77%. The likely reason is that the measured ammonia nitrogen concentration increased from 0.41 mg/L in 2020 to 0.81 mg/L in 2021, contrary to the overall downward trend, resulting in a larger prediction error. For the permanganate index, the ARPEs of ESGM(1,1) and GM(1,1) during the simulation period were 9.63% and 8.99%, respectively, indicating relatively small errors. During the validation period (2020 and 2021), the ARPE of ESGM(1,1) was 0.63%, lower than that of GM(1,1), which was 0.93%.
Table 5. Comparison of results obtained from GM(1,1) and ESGM(1,1)
Period | Year | Ammonia nitrogen: monitoring data (mg/L) | Ammonia nitrogen: GM(1,1) calculated value | Ammonia nitrogen: GM(1,1) residual | Ammonia nitrogen: GM(1,1) RPE (%) | Ammonia nitrogen: ESGM(1,1) calculated value | Ammonia nitrogen: ESGM(1,1) residual | Ammonia nitrogen: ESGM(1,1) RPE (%) | Permanganate index: monitoring data (mg/L) | Permanganate index: GM(1,1) calculated value | Permanganate index: GM(1,1) residual | Permanganate index: GM(1,1) RPE (%) | Permanganate index: ESGM(1,1) calculated value | Permanganate index: ESGM(1,1) residual | Permanganate index: ESGM(1,1) RPE (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Simulation period | 2010 | 4.93 | 4.93 | 0.00 | 0.00 | 4.91 | 0.02 | 0.41 | 6.64 | 6.64 | 0.00 | 0.00 | 6.62 | −0.02 | 0.30 |
Simulation period | 2011 | 5.47 | 5.32 | 0.15 | 2.81 | 4.75 | 0.72 | 13.16 | 7.57 | 6.25 | 1.32 | 17.44 | 6.48 | −1.09 | 14.40 |
Simulation period | 2012 | 3.96 | 3.67 | 0.29 | 7.27 | 3.56 | 0.40 | 10.10 | 5.61 | 6.06 | −0.45 | 8.11 | 6.27 | 0.66 | 11.76 |
Simulation period | 2013 | 2.30 | 2.54 | −0.24 | 10.28 | 2.66 | −0.36 | 15.65 | 4.96 | 5.89 | −0.93 | 18.66 | 6.06 | 1.10 | 22.18 |
Simulation period | 2014 | 0.91 | 1.75 | −0.84 | 92.53 | 1.99 | −1.08 | 118.68 | 4.60 | 5.71 | −1.11 | 24.17 | 5.87 | 1.27 | 27.61 |
Simulation period | 2015 | 1.15 | 1.21 | −0.06 | 5.23 | 1.49 | −0.34 | 29.57 | 5.29 | 5.54 | −0.25 | 4.78 | 5.68 | 0.39 | 7.37 |
Simulation period | 2016 | 1.09 | 0.84 | 0.25 | 23.31 | 1.11 | −0.02 | 1.83 | 5.75 | 5.38 | 0.37 | 6.45 | 5.49 | −0.26 | 4.52 |
Simulation period | 2017 | 0.92 | 0.58 | 0.34 | 37.24 | 0.83 | 0.09 | 9.78 | 5.53 | 5.22 | 0.31 | 5.60 | 5.31 | −0.22 | 3.98 |
Simulation period | 2018 | 0.57 | 0.40 | 0.17 | 30.03 | 0.62 | −0.05 | 8.77 | 5.31 | 5.07 | 0.24 | 4.60 | 5.14 | −0.17 | 3.20 |
Simulation period | 2019 | 0.60 | 0.28 | 0.32 | 54.09 | 0.47 | 0.13 | 21.67 | 4.92 | 4.92 | 0.00 | 0.08 | 4.97 | 0.05 | 1.02 |
Simulation period | ARPE | | | | 26.28 | | | 22.96 | | | | 8.99 | | | 9.63 |
Validation period | 2020 | 0.41 | 0.19 | 0.22 | 53.59 | 0.35 | 0.06 | 14.63 | 4.85 | 4.77 | 0.08 | 1.63 | 4.81 | −0.04 | 0.82 |
Validation period | 2021 | 0.81 | 0.13 | 0.68 | 83.77 | 0.26 | 0.55 | 67.90 | 4.64 | 4.63 | 0.01 | 0.22 | 4.66 | 0.02 | 0.43 |
The application to two indicators at the Anliu Station on the Guo River, ammonia nitrogen and the permanganate index, shows that ESGM(1,1), like GM(1,1), can effectively predict time series with limited data. However, ESGM(1,1) demonstrates broader applicability and higher predictive accuracy than GM(1,1). For the permanganate index, GM(1,1) fails the accuracy test, with a mean square error ratio C of 0.79, indicating that the monitoring data at the Anliu Station are unsuitable for GM(1,1). Although the prediction performance of GM(1,1) appears relatively good, with an ARPE of 8.99% during the simulation period, its failure to pass the accuracy test means that its computed results cannot be relied upon. In contrast, ESGM(1,1) reduces the mean square error ratio C to 0.59 and thereby passes the accuracy test. The ARPE of ESGM(1,1) is 9.63%, differing by less than 1 percentage point from that of GM(1,1), and it exhibits better predictive performance during the validation period. For ammonia nitrogen, ESGM(1,1) outperforms GM(1,1) in prediction accuracy. The ARPE of ESGM(1,1) is 3.32 percentage points lower than that of GM(1,1) during the simulation period and 27.41 percentage points lower during the validation period. As shown in Figure 5 and Table 5, from 2016 onward, the ESGM(1,1) results for ammonia nitrogen are closer to the measured values than those of GM(1,1), with smaller relative percentage errors, and the relative errors during the validation period are substantially smaller than those of GM(1,1). In summary, ESGM(1,1) is superior to GM(1,1) in both fitting accuracy and predictive capability, while also expanding the range of applicability.
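The ARPE differences for ammonia nitrogen quoted above can be retraced from the values in Table 5. The arithmetic below is only a check on the rounded table entries, so the last digit may differ slightly from figures computed from unrounded data; the subscripts sim and val are shorthand introduced here for the simulation and validation periods.

```latex
\begin{align*}
\Delta\mathrm{ARPE}_{\mathrm{sim}} &= 26.28\% - 22.96\% = 3.32\% \\
\mathrm{ARPE}_{\mathrm{val}}^{\mathrm{GM}} &= \tfrac{1}{2}\,(53.59\% + 83.77\%) = 68.68\% \\
\mathrm{ARPE}_{\mathrm{val}}^{\mathrm{ESGM}} &= \tfrac{1}{2}\,(14.63\% + 67.90\%) \approx 41.27\% \\
\Delta\mathrm{ARPE}_{\mathrm{val}} &\approx 68.68\% - 41.27\% \approx 27.4\%
\end{align*}
```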
Water quality prediction
Given its good fitting accuracy and predictive performance, the ESGM(1,1) model was used to predict the ammonia nitrogen and permanganate index concentrations at the Anliu Station for the next 3 years. The prediction results are listed in Table 6. Over the next 3 years, the predicted ammonia nitrogen concentrations are 0.20, 0.15, and 0.11 mg/L, and the predicted permanganate index concentrations are 4.51, 4.36, and 4.22 mg/L. The predicted values align with the downward trends observed in ammonia nitrogen and the permanganate index from 2010 to 2021.
Prediction results of the average concentrations of ammonia nitrogen and the permanganate index
Prediction period | Ammonia nitrogen (mg/L) | Permanganate index (mg/L) |
---|---|---|
The first year | 0.20 | 4.51 |
The second year | 0.15 | 4.36 |
The third year | 0.11 | 4.22 |
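As a quick arithmetic check on the forecast values, the short Python sketch below computes the year-over-year ratio of the predicted concentrations and the decay rate each ratio implies. The implied rate is an illustrative quantity derived here from the rounded table values; it is not a parameter reported in the paper.

```python
import math

# Predicted concentrations (mg/L) for the next three years, from the table above.
forecast = {
    "ammonia_nitrogen": [0.20, 0.15, 0.11],
    "permanganate_index": [4.51, 4.36, 4.22],
}

def decay_summary(values):
    """Return consecutive ratios and the exponential rates they imply."""
    ratios = [later / earlier for earlier, later in zip(values, values[1:])]
    rates = [-math.log(r) for r in ratios]  # positive rate means decline
    return ratios, rates

for name, values in forecast.items():
    ratios, rates = decay_summary(values)
    print(name,
          "ratios:", [round(r, 3) for r in ratios],
          "implied rates:", [round(a, 3) for a in rates])
```

The ratios stay nearly constant from one year to the next for each indicator, which is what an exponential response curve of the gray-model type would produce, and all ratios are below 1, in line with the downward trend described in the text.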
The observed trends at the Anliu Station indicate that water quality is improving. To sustain this trend, it is essential to implement targeted measures, including identifying pollutant sources, enforcing a more stringent water resource management system, and establishing a watershed management organization for trans-regional rivers such as the Guo River to coordinate pollution control efforts between upstream and downstream areas.
CONCLUSIONS
Based on exponential smoothing theory, this study proposes an exponential smoothing gray model (ESGM(1,1)) that integrates exponential smoothing with GM(1,1). The model was validated and applied for prediction using ammonia nitrogen and permanganate index data from the Anliu Station on the Guo River. The results demonstrate that ESGM(1,1) has broader applicability and better predictive performance than GM(1,1), while also reducing the impact of the initial sequence on prediction accuracy. Furthermore, ESGM(1,1) was employed to predict the water quality of the study area for the next 3 years. The results indicate a decreasing trend in both water quality indicators, consistent with the overall trend observed from 2010 to 2021, which suggests that water quality is improving. In practice, many monitoring stations began water quality monitoring relatively late, resulting in short data series that are unsuitable for machine learning-based prediction methods. Additionally, stricter water quality control measures in recent years have led to increased fluctuations in water quality data, making GM(1,1) unsuitable for accurate predictions. In such cases, ESGM(1,1) can be effectively utilized for local water pollution control and environmental protection.
Similar to GM(1,1), ESGM(1,1) assumes that the original data series follows linear growth or exponential decay characteristics, and it fits data changes through response equations. Although ESGM(1,1) can mitigate the impact of data fluctuations on model fitting to some extent, it may still struggle to eliminate disturbances when extreme fluctuations or abrupt changes are frequent. Therefore, ESGM(1,1) is more suitable for short- to medium-term forecasting with relatively short time series. Future research should focus on extending its application to longer time scales. We plan to develop a dynamic prediction model that incorporates data trends to further enhance prediction performance and accuracy.
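To make the role of the response equation concrete, the following Python sketch implements the textbook GM(1,1) baseline only; it does not reproduce the smoothing step that distinguishes ESGM(1,1). The symbols a (development coefficient) and b (gray input) follow the usual GM(1,1) notation and are assumed to match the paper's. The input series is a synthetic exponential decay invented for illustration and is not monitoring data.

```python
import numpy as np

def gm11_fit(x0):
    """Estimate a and b of GM(1,1) by least squares."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                      # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])           # background values
    B = np.column_stack((-z1, np.ones_like(z1)))
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    return float(a), float(b)

def gm11_predict(x0, horizon):
    """Fitted values for the input period plus `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    a, b = gm11_fit(x0)
    k = np.arange(len(x0) + horizon)
    # Time response of the accumulated series.
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # Restore the original series by differencing.
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)
    return x0_hat

# Hypothetical series: 10% decline per step, starting at 5.0.
series = 5.0 * 0.9 ** np.arange(8)
a, b = gm11_fit(series)
pred = gm11_predict(series, horizon=3)
print(f"a = {a:.4f}, b = {b:.4f}")
print(np.round(pred[-3:], 3))
```

On a smooth exponential series such as this one the response equation reproduces the data almost exactly, which illustrates why the approach suits short, trend-dominated series and why frequent abrupt changes, which break the exponential assumption, degrade the fit.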
AUTHOR CONTRIBUTION
M.S. conceptualized the project and developed the methodology. J.H. conducted data curation and prepared the original draft. P.L. performed data curation and supervised. J.G. validated the manuscript, and J.L. wrote, reviewed, and edited. All authors have read and approved the final manuscript.
ACKNOWLEDGEMENTS
This work was supported by the Anhui Provincial Natural Science Foundation (Nos. 2208085US07 and 2408085MD093).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.