Using a grey multivariate model to predict impacts on the water quality of the Zhanghe River in China

To assess the social factors affecting the water quality of the Zhanghe River and to predict the potential impact of growth in primary, secondary and tertiary industries and in population on its water quality over the next few years, a deformation derivative cumulative grey multiple convolution model (DGMC(1,N)) was applied. To improve the accuracy of the model, the accumulation of deformation derivatives is introduced, and the particle swarm optimization algorithm is used to solve for the optimal order. The DGMC(1,N) model was compared with the GM(1,2) and GM(1,1) models. The results show that the DGMC(1,N) model has the highest prediction accuracy. Finally, the DGMC(1,N) model is used to predict the potential impact of growth in primary, secondary and tertiary industries and in population on water quality in the Zhanghe River, using chemical oxygen demand (COD) as the water quality indicator.


INTRODUCTION
Due to the continuous development of the economy, growing human activities increase the potential to pollute waterways and degrade the environment. Water pollution not only affects human health and the health of ecosystems, it also restricts social and economic development. Therefore, it is important to understand the relationship between economic development and the pollution of waterways in order to identify water pollution mitigation strategies. Extensive research has been conducted on the relationship between water quality and its influencing factors. Kyei & Hassan (2019) analyzed the economic and environmental impact of water pollution taxes in the Olifants Basin, South Africa, using an environmentally scalable general equilibrium model. Nguyen et al. (2018) developed a model to evaluate the relationship between economic activity and water pollution in Viet Nam in order to identify water pollution mitigation strategies within the context of economic development. In the case of the Samarinda River in East Kalimantan, Indonesia, Vita et al. (2018) adopted a field observation method based on interviews with government, community, industry and public welfare representatives along the river and with environmental departments, and then used an analytic hierarchy process to establish data and identify countermeasures for controlling the water pollution. Choi et al. (2015) assessed the relationship between economic growth in four major river basins in Korea and two key water quality indicators. Li & Lu (2020) tested the impact of regional integration on cross-border pollution in the Yangtze River Economic Belt by using the difference-in-difference model. The results showed that regional integration could significantly reduce cross-border water pollution. Liu et al. (2020) used qualitative and quantitative analyses to study the relationship between water pollution and economic growth in the Nansihu River basin in China. Cullis et al. (2019) discussed the increasing risks to water quality in the Begg River basin in South Africa as a result of climate change and rapid urban development, as well as the direct and indirect economic impacts this may have on the agricultural sector. Based on the threat to water quality in the American Midwest posed by agricultural runoff, Floress et al. (2017) proposed and tested a structural equation model based on the dual interest theory to test whether, and to what extent, the relationship between awareness and agribusiness attitudes is regulated by management attitudes. Qualitative assessments of the Lake Merrill basin by Pires et al. (2020), which were performed using discriminant analysis methods, concluded that seasonality mainly affected anthropogenic sources such as agricultural activities and household emissions. An assessment of the current state of water quality in Lake Wadi El-Rajan, particularly following the increase in uncontrolled economic activity within its borders, was presented by Goher et al. (2019). Similarly, De Mello et al. (2020) outlined the relationship between land use/land cover and water quality in Brazil. Kuwayama et al. (2020) examined long-term trends in surface water quality, nutrient pollution and its potential economic impact in Texas, USA, while Du Plessis et al. (2015) quantified the complex relationship between land cover and specific water quality parameters and developed a unique model equation to predict water quality in the Grootdraai Dam catchment, given the importance of water quality within the basin to the country's future economic growth.
While researchers have analyzed the impact of the economy on water pollution from different perspectives, few predict the impact of economic development on water resources in the future.
The accurate prediction of levels of water pollution can inform the identification of countermeasures that are needed in response to the direction of future economic development. An extreme learning machine was used by Saberi-Movahed & Mehrpooya (2020) to predict longitudinal dispersion coefficients and evaluate the pollution status of water pipelines. Najafzadeh & Emamgholizadeh (2019) estimated the biochemical oxygen demand, dissolved oxygen and chemical oxygen demand (COD) using gene expression programming, evolutionary polynomial regression and a model tree, while biochemical oxygen demand and chemical oxygen demand have also been estimated using multivariate adaptive regression splines and least squares support vector machines. Mustafa et al. (2021) used support vector machines to build prediction models of water quality in the Kelantan River based on historical data collected from different sites. Xinzi et al. (2020) applied correlation analysis and path analysis to identify the causal relationship between urbanization and water quality indicators, and then input comprehensive water quality indicators and related urbanization parameters into a back-propagation neural network for water quality prediction. Bao et al. (2020) predicted the water quality index for free surface wetlands by using three soft computing techniques, namely adaptive neurofuzzy systems, artificial neural networks and group data processing.
Liu & Wu (2021) used a new adjacent non-homogeneous grey model to predict renewable energy consumption in Europe. Bilgaev et al. (2020) analyzed the environmental and socio-economic development indicators of the Baikal Island region with the method of constructing time series and structural transfer. Zhang et al. (2020) used the grey water footprint to estimate the different water bodies in 31 provinces (autonomous regions) in China. The poor information principle of the grey system theory was used to predict the rural water environment with a network search method to provide support for rural water environmental governance. Shen et al. (2020) compared a residual correction grey model with a grey topology prediction method in order to predict the water quality of the artificial reef area in Haizhou Bay. Yuan et al. (2019) used a fractional grey scale power model to predict water consumption, while Jiang et al. (2019) used grey multivariate forecasting models to predict the long-term electricity consumption of power companies.
Sahin (2019) combined linear and non-linear metabolic models with optimization techniques to accurately predict Turkey's greenhouse gas emissions. Zhong et al. (2017) used a grey model optimized by the particle swarm optimization algorithm to predict short-term photovoltaic power generation, which improved the prediction accuracy compared with the traditional grey model. Utkucan (2021) used genetic algorithms to optimize parameters, and proposed a new nonlinear grey Bernoulli model to study and analyze energy trends. Hu (2020) used a grey multivariate forecasting model to predict bankruptcy, and used a genetic algorithm to avoid the influence of time on the result. Wang & Hao (2016) nonlinearly optimized the background value of the grey convolution model and compared its prediction of industrial energy consumption with that of the traditional model. Although the research is extensive, few researchers have used a grey multivariable model to analyze the impact of future economic development on water quality. A key feature of a grey model is that it can be used when there is little information.
In this study, a grey multivariable model was used to analyze and forecast water pollution in the upper reaches of the Zhanghe River as a function of the added value of the primary, secondary and tertiary industries and of population growth. A new accumulation method was adopted to improve the prediction accuracy of the original grey multivariable model.
In this paper, Section 2 describes the study area and the indicators. Section 3 outlines the forecasting method while Section 4 analyzes the impact of local socioeconomic factors on the water quality of an upstream reach of the Zhanghe River in China.

The study area
The Zhanghe River is a tributary of the Hai River. It rises in Shanxi Province and flows through Shanxi, Hebei and Henan Provinces. The upper reaches of the river mainly comprise two tributaries, namely the Qingzhanghe River and the Zhuozhanghe River. The study area is shown in Figure 1.

Water quality indicator and data
When assessing the level of pollution of a river, chemical oxygen demand (COD) is an important and quickly determined indicator of organic pollution. The higher the COD level, the greater the degradation of water quality. The COD value is reported once a month. This study uses the annual average of the 12 monthly values. The COD levels in the upper reaches of the Zhanghe River from 2013 to 2018 have been reported by the Handan Ecological Environment Bureau.
Primary industry is the foundation of the national economy, while secondary industry is a leading industry of the national economy and tertiary industry is the key to providing employment in China. Population is the main indicator of social factors. The data on socio-economic indicators was obtained from the 'China County Statistical Yearbook' and 'Hebei Economic Yearbook' from 2013 to 2018.
Consequently, the relationship between the socio-economic indicators and COD levels was analyzed.

Deformable grey multivariable convolution (DGMC) model
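Let $X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n)\}$ be an original sequence. Then $X^{(1)}$, with

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \dots, n,$$

is a first order accumulation sequence. Considering the definition of the deformable derivative (Wu & Zhao 2019), the $\alpha$-order ($0 < \alpha \le 1$) accumulation is

$$x^{(\alpha)}(k) = \sum_{i=1}^{k} \frac{x^{(0)}(i)}{i^{1-\alpha}}, \quad k = 1, 2, \dots, n.$$

The DGMC(1,N) modelling process is described below.

(1) A non-negative sequence is

$$X_1^{(0)} = \{x_1^{(0)}(1), x_1^{(0)}(2), \dots, x_1^{(0)}(n)\}.$$

The sequences of the related factors are:

$$X_i^{(0)} = \{x_i^{(0)}(1), x_i^{(0)}(2), \dots, x_i^{(0)}(n)\}, \quad i = 2, 3, \dots, N.$$

(2) The $\alpha$-order of DGMC(1,N) is

$$\frac{dx_1^{(\alpha)}(t)}{dt} + b_1 x_1^{(\alpha)}(t) = \sum_{i=2}^{N} b_i x_i^{(\alpha)}(t) + u,$$

where $b_1, b_2, \dots, b_N$ and $u$ are the parameters to be estimated. The parameters can be obtained by using the least squares method, which minimizes the sum of the squared residuals. The unknown parameters can be solved by the following formulas:

$$[\hat{b}_1, \hat{b}_2, \dots, \hat{b}_N, \hat{u}]^T = (B^T B)^{-1} B^T Y,$$

$$B = \begin{bmatrix} -z_1^{(\alpha)}(2) & z_2^{(\alpha)}(2) & \cdots & z_N^{(\alpha)}(2) & 1 \\ -z_1^{(\alpha)}(3) & z_2^{(\alpha)}(3) & \cdots & z_N^{(\alpha)}(3) & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ -z_1^{(\alpha)}(n) & z_2^{(\alpha)}(n) & \cdots & z_N^{(\alpha)}(n) & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x_1^{(\alpha)}(2) - x_1^{(\alpha)}(1) \\ x_1^{(\alpha)}(3) - x_1^{(\alpha)}(2) \\ \vdots \\ x_1^{(\alpha)}(n) - x_1^{(\alpha)}(n-1) \end{bmatrix},$$

where $z_i^{(\alpha)}(k) = \frac{1}{2}\left[x_i^{(\alpha)}(k) + x_i^{(\alpha)}(k-1)\right]$.

(3) The time response formula obtained from the Gaussian formula is:

$$\hat{x}_1^{(\alpha)}(k) = x_1^{(0)}(1)\, e^{-b_1 (k-1)} + \sum_{\tau=2}^{k} \left\{ e^{-b_1 \left(k - \tau + \frac{1}{2}\right)} \cdot \frac{1}{2}\left[f(\tau) + f(\tau - 1)\right] \right\}, \quad f(\tau) = \sum_{i=2}^{N} b_i x_i^{(\alpha)}(\tau) + u.$$

(4) Therefore, the sequence

$$\hat{X}_1^{(\alpha)} = \{\hat{x}_1^{(\alpha)}(1), \hat{x}_1^{(\alpha)}(2), \dots, \hat{x}_1^{(\alpha)}(n), \dots\}.$$

The $\alpha$-order accumulative reduction is thus

$$\hat{x}_1^{(0)}(k) = k^{1-\alpha}\left[\hat{x}_1^{(\alpha)}(k) - \hat{x}_1^{(\alpha)}(k-1)\right], \quad k = 2, 3, \dots$$

(5) Evaluate the model using the mean absolute percentage error (MAPE), as follows:

$$\text{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \left| \frac{\hat{x}_1^{(0)}(k) - x_1^{(0)}(k)}{x_1^{(0)}(k)} \right| \times 100\%.$$

GM(1,1)
(1) A non-negative original sequence is (Yin et al. 2017)

$$X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n)\},$$

with first order accumulation sequence $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$. Then the differential equation of GM(1,1) is

$$\frac{dx^{(1)}(t)}{dt} + a x^{(1)}(t) = m.$$

(2) Assume $\hat{a}$ is the parameter to be estimated, $\hat{a} = [a, m]^T$. The least squares estimation minimizes the sum of the squared residuals, so the parameters can be obtained by using the least squares method. The unknown parameters can be solved by the following formulas:

$$\hat{a} = (B^T B)^{-1} B^T Y, \quad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix},$$

where $z^{(1)}(k) = \frac{1}{2}\left[x^{(1)}(k) + x^{(1)}(k-1)\right]$.

(3) The time response formula is

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{m}{a}\right) e^{-ak} + \frac{m}{a}, \quad \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k).$$

GM(1,2)
(1) GM(1,2) represents a first-order differential equation with two variables. The differential equation is (Li et al. 2016)

$$\frac{dx_1^{(1)}(t)}{dt} + a x_1^{(1)}(t) = b x_2^{(1)}(t).$$

(2) Assume $\hat{a}$ is the parameter to be estimated, $\hat{a} = [a, b]^T$. The parameters can be obtained by using the least squares method. The unknown parameters can be solved by the following formulas:

$$\hat{a} = (B^T B)^{-1} B^T Y, \quad B = \begin{bmatrix} -z_1^{(1)}(2) & x_2^{(1)}(2) \\ -z_1^{(1)}(3) & x_2^{(1)}(3) \\ \vdots & \vdots \\ -z_1^{(1)}(n) & x_2^{(1)}(n) \end{bmatrix}, \quad Y = \begin{bmatrix} x_1^{(0)}(2) \\ x_1^{(0)}(3) \\ \vdots \\ x_1^{(0)}(n) \end{bmatrix}.$$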
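The DGMC(1,N) steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the α-order accumulation takes the form x^(α)(k) = Σ x^(0)(i) / i^(1−α), assumes a trapezoidal discretization of the convolution integral, and replaces the particle swarm search for the order with a plain grid search over α. All function names and the sample numbers are hypothetical.

```python
import numpy as np

def conformable_accumulate(x, alpha):
    """Assumed alpha-order accumulation: sum_{i<=k} x(i) / i**(1 - alpha)."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, len(x) + 1)
    return np.cumsum(x / i ** (1.0 - alpha))

def conformable_restore(x_alpha, alpha):
    """Inverse of the accumulation: k**(1-alpha) * (x_a(k) - x_a(k-1))."""
    x_alpha = np.asarray(x_alpha, dtype=float)
    k = np.arange(1, len(x_alpha) + 1)
    return k ** (1.0 - alpha) * np.diff(x_alpha, prepend=0.0)

def fit_dgmc(y, drivers, alpha):
    """Least-squares estimate of [b1, b2..bN, u] from the observed window."""
    n = len(y)
    y_a = conformable_accumulate(y, alpha)
    d_a = np.array([conformable_accumulate(d[:n], alpha) for d in drivers])
    z_y = 0.5 * (y_a[1:] + y_a[:-1])          # background values of y
    z_d = 0.5 * (d_a[:, 1:] + d_a[:, :-1])    # background values of drivers
    B = np.column_stack([-z_y, z_d.T, np.ones(n - 1)])
    params, *_ = np.linalg.lstsq(B, np.diff(y_a), rcond=None)
    return params

def predict_dgmc(y1, drivers, alpha, params):
    """Time response; drivers may extend past the fitting window."""
    b1, b_rest, u = params[0], params[1:-1], params[-1]
    d_a = np.array([conformable_accumulate(d, alpha) for d in drivers])
    f = b_rest @ d_a + u                      # f[t-1] holds f(t)
    m = d_a.shape[1]
    y_a_hat = np.empty(m)
    for k in range(1, m + 1):
        conv = sum(np.exp(-b1 * (k - tau + 0.5)) * 0.5 * (f[tau - 1] + f[tau - 2])
                   for tau in range(2, k + 1))
        y_a_hat[k - 1] = y1 * np.exp(-b1 * (k - 1)) + conv
    return conformable_restore(y_a_hat, alpha)

def mape(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(predicted - actual) / np.abs(actual))

def best_order(y, drivers, grid=None):
    """Grid search over alpha (a simple stand-in for particle swarm search)."""
    grid = np.linspace(0.05, 1.0, 20) if grid is None else np.asarray(grid)
    window = np.asarray(drivers, dtype=float)[:, :len(y)]
    scores = [mape(y, predict_dgmc(y[0], window, a, fit_dgmc(y, drivers, a)))
              for a in grid]
    best = int(np.argmin(scores))
    return float(grid[best]), float(scores[best])

if __name__ == "__main__":
    cod = [9.0, 6.2, 6.6, 6.5, 7.1, 8.9]            # hypothetical COD, mg/L
    driver = [[10.0, 10.5, 10.9, 10.4, 9.5, 8.7]]   # hypothetical added value
    alpha, score = best_order(cod, driver)
    print(f"alpha={alpha:.2f}, in-sample MAPE={score:.2f}%")
```

To forecast beyond the observed window, the driver series passed to `predict_dgmc` would be extended with assumed future values, and the trailing entries of the returned array are then the forecasts.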

COMPARATIVE PREDICTION ACCURACY
DGMC(1,2), GM(1,2) and GM(1,1) models were fitted to the annual COD concentrations for the period 2013 to 2018 reported by the Handan Ecological Environment Bureau. The COD concentrations and the model fitting results are shown in Table 1 and Figure 2. The MAPE value for the DGMC(1,2) model is 4.9%, compared with 31.8% for the GM(1,2) model and 5.7% for the GM(1,1) model. Compared with the traditional grey models, the DGMC(1,2) model improves the prediction accuracy. Consequently, the DGMC(1,2) model was used to predict the annual average COD values.
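For readers who wish to reproduce this kind of comparison, the GM(1,1) baseline and the MAPE measure can be sketched as follows. This is a minimal sketch of the textbook GM(1,1) procedure, not the code used in the study. The function names and the input series are illustrative assumptions, so the result will not necessarily match the values reported above.

```python
import numpy as np

def gm11(x0, horizon=0):
    """Fit GM(1,1) to x0 and return fitted values plus `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                         # first order accumulation
    z1 = 0.5 * (x1[1:] + x1[:-1])              # background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    (a, m), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - m / a) * np.exp(-a * k) + m / a   # time response
    return np.diff(x1_hat, prepend=0.0)        # restore to the original scale

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(predicted - actual) / np.abs(actual))

if __name__ == "__main__":
    series = [9.0, 6.2, 6.6, 6.5, 7.1, 8.9]    # illustrative values only
    fitted = gm11(series)
    print(f"GM(1,1) in-sample MAPE: {mape(series, fitted):.2f}%")
```

The same `mape` function can be applied to the fitted values of any of the three models, which is what makes the comparison like for like.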

THE INFLUENCE OF SOCIAL DEVELOPMENT ON WATER QUALITY
The next step was to assess the relationship between COD levels and primary, secondary, and tertiary industries and population, respectively.
In order to forecast results, a value for the growth rate of the added value of primary industry was estimated. In the past seven years, the contribution of China's primary industry to GDP has been 0.3%. Handan City is a fourth-tier city with a large population and relies mainly on agriculture, forestry, animal husbandry and fishery. For the period 2013-2018, the calculated growth rates of the added value of the primary industry were 4.76%, 3.45%, −4.66%, −8.46% and −8.08%, respectively. Consequently, the assumed growth rate of the added value of the primary industry was between 5% and −20%. From the estimated growth rate, the primary industry's added value for the period 2019-2022 was estimated, and the DGMC(1,2) model was used to predict the annual average COD value for 2019-2022.
When the growth rate is 5%, $x_1^{(0)} = \{11.69, 17.81, 28.38, 45.06\}$. When the growth rate is −20%, $x_1^{(0)} = \{9.53, 8.65, 6.18, 1.95\}$. As shown in Figure 3, the predicted values of COD rise when the growth rate of the added value of the primary industry is 5%. Likewise, the predicted values of COD fall, and the water quality improves, when the added value of the primary industry falls. When the growth rate was 1, 3 and 10%, respectively, the COD predicted by the model showed an increasing trend. If the growth rate is 10%, the COD levels would reach 65 mg/L by 2022 (in the absence of any pollution reduction strategies).
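The scenario inputs described above rest on two simple calculations: year-on-year growth rates from an observed series, and a projection of added value under an assumed constant growth rate. A minimal sketch follows. The function names and the base value of 100 are hypothetical, and the projected series would then be supplied to the forecasting model as the driver.

```python
def growth_rates(values):
    """Year-on-year growth rates in percent for consecutive observations."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]

def project(base, rate_pct, years):
    """Compound the last observed value forward at a constant annual rate."""
    projected = []
    value = base
    for _ in range(years):
        value *= 1.0 + rate_pct / 100.0
        projected.append(value)
    return projected

if __name__ == "__main__":
    last_observed = 100.0                      # hypothetical 2018 added value
    for rate in (5.0, -20.0):                  # bounds assumed in the text
        scenario = project(last_observed, rate, years=4)   # 2019-2022
        print(rate, [round(v, 2) for v in scenario])
```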
In order to further analyze this phenomenon, it is necessary to examine the added value of primary industry in each of the six counties across the upper reaches of the Zhanghe River. The added value of the primary industry in the six counties (Shexian, Cixian, Weixian, Daming, Linzhang and Cheng'an) from 2013 to 2018 and the resulting COD levels are shown in Table 3. The COD values for the six counties predicted by the new DGMC(1,2) model are given in Table 4. It can be seen from Table 4 that the MAPE values for the six counties are all less than 10%. Assuming that the growth rate of primary industry in the six counties is the same as the overall growth rate, i.e. between 5% and −20%, the predicted impact on COD in the six counties is shown in Figure 4.
Handan is an underdeveloped city, where the proportion of the primary industry is large. Therefore, the development of primary industries will not be reduced in order to reduce the pollution of the rivers. Consequently, there will need to be a focus on optimizing the agricultural industrial structure, adjusting the agricultural structure along the river, strengthening publicity and supervision, and eliminating water pollution at its source, possibly through the reduction of the area of cultivated land along the river and the use of pesticides and fertilizers.

Predicting COD under secondary industry
The added value of the secondary industry from 2013 to 2018 is shown in Table 5. Following the calculation procedure set out in Section 4.1, the results are shown in Table 6. The MAPE of DGMC(1,2) is 7.35%.
The growth rates of the added value of the secondary industry from 2013 to 2018 were 0.17%, −1.09%, 0.15%, 19.94% and −19.96%. The results predicted for −5.0%, 5.0%, 10.0% and 15.0% growth rates are shown in Figure 5. When the growth rate is 5%, COD shows an overall increasing trend and reaches 10.5 mg/L in 2022. When the growth rate is 10%, the COD value is 13.33 mg/L by 2022, which does not exceed the national standard of 20 mg/L. When the growth rate is 15%, the COD value will be 16.46 mg/L by 2022. However, when the growth rate is −5%, COD shows a downward trend, and will drop to 5.68 mg/L by 2022.
Given this potential impact on COD, it is meaningful to study the structure of secondary industries. Secondary industries include industry and construction, which are heavily polluting industries. The industrial value added and construction value added from 2013 to 2018 (data source: Hebei Economic Yearbook) are given in Table 7. The COD values predicted by the DGMC(1,2) models are shown in Table 8. The MAPE of COD is less than 10% for both industry and construction.
Applying the adopted secondary industry growth rates to industry and construction separately, it can be seen from Figure 6 that when the industrial growth rate is 5%, COD would reach 14.30 mg/L in 2022, while if the construction growth rate is 5% then COD would reach 8.87 mg/L in 2022. Assuming growth rates of 10% for industry and construction, the COD would reach 19.51 mg/L and 9.80 mg/L in 2022, respectively. Assuming that the growth rates of the industrial and construction industries are both 15%, the COD would reach 25.28 mg/L and 10.84 mg/L in 2022, respectively.
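These scenario results can be screened against the 20 mg/L national standard with a short check. The sketch below uses only the 2022 COD values quoted above. The dictionary layout and variable names are illustrative assumptions.

```python
STANDARD_MG_L = 20.0   # national COD standard cited in the text

# Predicted 2022 COD (mg/L) keyed by (sector, assumed growth rate in %)
cod_2022 = {
    ("industry", 5): 14.30,
    ("industry", 10): 19.51,
    ("industry", 15): 25.28,
    ("construction", 5): 8.87,
    ("construction", 10): 9.80,
    ("construction", 15): 10.84,
}

# Scenarios whose predicted COD is above the standard
exceeding = [key for key, cod in cod_2022.items() if cod > STANDARD_MG_L]

# Remaining headroom below the standard (negative means exceedance)
headroom = {key: round(STANDARD_MG_L - cod, 2) for key, cod in cod_2022.items()}

if __name__ == "__main__":
    print("Scenarios above the standard:", exceeding)
    print("Headroom (mg/L):", headroom)
```

Of the six scenarios, only industrial growth at 15% is above the standard, while industrial growth at 10% leaves a margin of 0.49 mg/L.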
Handan City attaches great importance to the adjustment of industrial structure, vigorously implements an innovation-driven development strategy, and is advancing supply-side structural reforms in depth, focusing on the transformation of traditional industries, the cultivation of strategic emerging industries, and the development of modern service industries.
These results indicate that the impact of the construction industry on water quality in the upper reaches of the Zhanghe River is modest and not as great as the impact of industrial growth. From the perspective of water quality, this indicates that it would be beneficial to reduce the proportion of industrial development and that investment in the construction industry could be increased.

Predicted COD under tertiary industry
Table 8 shows the added value of tertiary industry from 2013 to 2018. Following the calculation procedure set out in Section 4.1, the results are shown in Table 9. The MAPE of the DGMC(1,2) model is 6.66%.
The growth rates of the added value of tertiary industry from 2013 to 2018 were 1.41%, 7.72%, −5.54%, 17.92% and −1.03%. The results predicted for −10%, −5%, 5%, 10% and 15% growth rates are shown in Figure 7. Under a growth rate of −5%, the COD would be 6.24 mg/L by 2022. Under a growth rate of −10%, the COD would be 4.23 mg/L by 2022. The trend in COD is consistent with the growth rate. Under a growth rate of 15%, the predicted COD by 2022 would be 20.86 mg/L, which exceeds the national standard of 20 mg/L. This indicates that, from a water quality perspective, tertiary industry growth rates of up to 15% could be sustained up to 2023 but that adverse impacts would arise in subsequent years depending on the growth rate.
COD (mg/L): 9.09, 6, 6.67, 6.54, 7.08, 8.92
Figure 6 | The predicted COD based on secondary industry growth rates.
Table 10 shows the annual average population from 2013 to 2018. It can be seen from the data that the population first increases and then decreases, and then tends to stabilize. Following the calculation procedure set out in Section 4.1, the results are shown in Table 11. The MAPE of the DGMC(1,2) model is 2.24%. The growth rates of the population from 2013 to 2018 were 1.21%, 1.67%, −4.36%, −0.19% and 0.69%. From the data for 2013 to 2018, it can be concluded that the population shows a slow declining trend. Because the population of China is influenced by national policies, growth rates of −5% and 5% were assessed. The results are shown in Figure 8. It can be seen from Figure 8 that when the population growth rate is −5%, the COD declines steadily. When the growth rate is 5%, the COD rises continuously, reaching 53.42 mg/L by 2022. This indicates that if the population growth rate of the six counties of Handan City is 5%, then this population growth would have a significant adverse impact on water quality in the Zhanghe River in the absence of any additional pollution reduction strategies.

CONCLUSIONS
Most research on water quality issues involves multivariate models. Through comparative analysis of the DGMC(1,2), GM(1,2) and GM(1,1) models, it was concluded that the DGMC(1,2) model was able to analyze and predict the water quality of the Zhanghe River from 2013 to 2022, using COD as the indicator of water quality, to a high level of accuracy. The DGMC(1,2) model was used to analyze the relationship between COD in the upper reaches of the Zhanghe River and the added value of the primary, secondary and tertiary industries as well as population. It was found that growth of primary, secondary and tertiary industries, as well as population, would all adversely impact water quality in the absence of any additional pollution reduction strategies.
Under an assumed growth rate of 5% the ranking of the adverse impact on COD in 2022 (highest to lowest) would be population (53.42 mg/L), primary industry (45.06 mg/L), industrial development (secondary) (14.30 mg/L), tertiary industry (11.09 mg/L), and the construction industry (secondary) (8.87 mg/L).
While the model can also be used to inform decision makers in other cities of the primary sources of water quality problems in rivers and to help local governments focus broadly on pollution reduction strategies, the uncertainties of the social economy and the limitations of the model mean that more detailed models calibrated to local conditions should be used to develop pollution reduction strategies which have the greatest potential to deliver environmental benefits while sustaining the economy.