Abstract
The fate of pollutants in rivers is mainly affected by the longitudinal dispersion coefficient (Kx). Thus, improved Kx estimation could greatly enhance the water quality management of rivers. In this regard, evolutionary polynomial regression (EPR) was used to predict Kx in rivers as a function of flow depth, channel width, and average and shear velocities. The Kx predicted by the EPR model was compared with results obtained from more conventional Kx estimation formulas. Initial data analyses using a general linear model analysis of variance revealed that all input variables were statistically significant for Kx estimation. The calibrated EPR model showed good performance, with a coefficient of determination and root mean square error of 0.82 and 79 m2/s, respectively. This is better than other, more conventional estimation methods. Sensitivity analysis of the EPR model indicated that channel width, average velocity, shear velocity, and flow depth, in descending order, were the main variables affecting Kx variability. The introduced EPR estimation model for Kx can be incorporated in one-dimensional water quality models for improved simulation of solute concentration in natural rivers.
INTRODUCTION
The advection-dispersion equation has been widely used to investigate the behaviour of pollutants originating far upstream from non-steady point sources (Seo & Cheong 1998). When applying Equation (1) to simulate solute concentration in rivers, determining Kx is important. The standard approach to determining Kx for rivers is tracer measurement. However, although tracer measurements have been used extensively to determine Kx, their use is constrained by several limitations (Noori et al. 2017). For this reason, empirical methods for Kx estimation have been widely applied by researchers. These methods include both user-friendly regression methods and more sophisticated black-box models based on artificial intelligence techniques (AIT) (Najafzadeh & Sattar 2015; Haghiabi 2016, 2017; Najafzadeh & Tafarojnoruz 2016; Alizadeh 2017; Alizadeh et al. 2017a; Parsaie & Haghiabi 2017a, 2017b; Wang et al. 2017). Because AIT are black-box models, their results are less transferable than those of more general techniques such as user-friendly regression methods for Kx estimation. In other words, other users cannot directly apply a calibrated AIT model to other cases of Kx estimation (i.e., different natural rivers) (Ebtehaj et al. 2015; Noori et al. 2015). Thus, it is difficult to combine AIT directly with a physically based 1D model to enhance the accuracy of the spatiotemporal simulation of solute concentration in water bodies.
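To illustrate why Kx matters in a one-dimensional simulation, the following minimal sketch evaluates the classical analytical solution of the 1D advection-dispersion equation for an instantaneous release in a uniform channel. The function name, the release scenario and all numerical values are illustrative assumptions and are not taken from the study.

```python
import math


def slug_concentration(x, t, mass, area, velocity, kx):
    """Concentration (kg/m3) at distance x (m) and time t (s) after an
    instantaneous release of `mass` (kg) at x = 0 and t = 0.

    Assumes a uniform channel with constant cross-sectional area (m2),
    constant mean velocity (m/s) and constant dispersion coefficient kx (m2/s).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    spread = math.sqrt(4.0 * math.pi * kx * t)
    travel = (x - velocity * t) ** 2 / (4.0 * kx * t)
    return mass / (area * spread) * math.exp(-travel)


# Hypothetical scenario: the cloud centre is at x = velocity * t = 1800 m.
for kx in (20.0, 80.0):
    peak = slug_concentration(x=1800.0, t=3600.0, mass=10.0,
                              area=50.0, velocity=0.5, kx=kx)
    print(f"Kx = {kx:5.1f} m2/s -> peak concentration {peak:.6f} kg/m3")
```

In this idealised case the peak concentration scales with the inverse square root of Kx, so an error in Kx translates directly into an error in the simulated peak.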
In this study, we suggest an alternative method for the prediction of Kx, evolutionary polynomial regression (EPR). The EPR is a hybrid regression method that was first introduced in the field of water resources by Giustolisi & Savic (2006). Abdul-Ghani et al. (2012) used EPR to predict the load of sediment in Malaysian rivers. Reyhani et al. (2013) investigated the EPR performance to compute the normalized flux, relative fouling and turbidity rejection in a wastewater treatment plant. EPR has also been applied in the field of water supply and river discharge prediction (Mounce et al. 2016; Rezaie-Balf & Kisi 2017).
Reported results from EPR applications have revealed its advantages, especially where there is a small subset of noisy or missing input data. In fact, EPR can be adapted to predict any arbitrary function in spite of missing input data. Meanwhile, due to the difficulty in collecting all relevant input data and the limited number of available datasets, Kx prediction does not include necessary input information (such as cross-mixing effect, transient storage, shearing advection and lateral mixing) that influence this parameter in rivers (Lanzoni et al. 2017). Excluding this information often results in major differences in predicted Kx as compared to measurements. Therefore, the EPR may be an appropriate technique for Kx estimation in natural rivers. In view of this, the paper first aims to present a prediction model for Kx based on the EPR method. Thereafter, the introduced EPR model performance is compared to some other more conventional methods for Kx estimation. Note that since the approach is a formula-based user-friendly method, researchers and practitioners can apply the developed method to predict Kx in any river type while applying their own datasets.
MATERIALS AND METHODS
Kx determination methods and data

Empirical equations for estimation of Kx
Some general modelling guidelines help to assure the modeller that the tuned model is comprehensive and can be used for other situations (new datasets). In our case, both calibration and validation patterns were selected so as to include high and low extreme values of Kx. In general, a representative model requires representative data. In a strict sense, this can only be checked by using a variety of databases. Table 2 shows descriptive statistics of the datasets used in calibration and validation. The table shows that the largest Kx (1,486.5 m2/s) is about twice the size of the second largest Kx. A common approach is to exclude such outliers (Tayfur & Singh 2005; Li et al. 2013; Disley et al. 2015). Although removing the largest Kx value may enhance Kx model performance, it limits the applicability of the developed model. Thus, this study aimed to develop a Kx estimation model using the EPR method while keeping outliers.
Table 2. Statistical indices of the parameters used for the EPR model
Parameter | B (m) | H (m) | U (m/s) | U* (m/s) | Kx (m2/s) |
---|---|---|---|---|---|
Training stage | |||||
Maximum | 196.6 | 8.2 | 1.73 | 0.33 | 1,486.5 |
Minimum | 1.4 | 0.14 | 0.029 | 0.0016 | 0.2 |
Average | 44.55 | 1.25 | 0.45 | 0.08 | 81.61 |
Validating stage | |||||
Maximum | 253.6 | 8.07 | 1.29 | 0.55 | 836.13 |
Minimum | 10.97 | 0.32 | 0.08 | 0.0079 | 1.7 |
Average | 60.83 | 1.55 | 0.51 | 0.09 | 87.04 |
Development of EPR model



The inner functions of this model are considered to be linear, although they can be non-linear if the exponents are different from one. The initial validation of models produced by EPR is conducted on the basis of physical knowledge of the phenomenon (Giustolisi & Savic 2006). Further details on EPR are described by Giustolisi & Savic (2006).
In this study, the software package EPR-MOGA-XL, working in the MS-EXCEL environment, was used to apply EPR to the Kx estimation. The EPR model for prediction of Kx based on the inputs B, H, U and U* was used according to Equation (2). The setting parameters used to evaluate Kx estimation are given in Table 3. Figure 1 shows the different steps for developing the Kx estimation model using EPR.
Table 3. Details of setting parameters for development of the EPR model
Description of parameter | Setting of parameter |
---|---|
Function set | Exponential |
Type of model | Statistical |
Type of presentation | ![]() |
Exponents range | [−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2] |
Number of mathematical terms | 4 |
Bias ![]() | 0 |
Figure 1. Steps for developing the Kx estimation model using the EPR method.
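The core numerical step of an EPR-type model can be sketched as follows: for a given matrix of exponents, each term is a product of the inputs raised to those exponents, and the term coefficients are obtained by linear least squares with zero bias. This is a minimal sketch under stated assumptions and not the EPR-MOGA-XL implementation: the function names are hypothetical, the data are synthetic, and a crude random search stands in for the multi-objective genetic algorithm that selects the exponents.

```python
import numpy as np

# Candidate exponents, matching the range listed in the settings table.
EXPONENTS = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2]


def build_terms(X, es):
    """Column j of the result is prod_k X[:, k] ** es[j, k]."""
    X = np.asarray(X, dtype=float)
    es = np.asarray(es, dtype=float)
    return np.prod(X[:, None, :] ** es[None, :, :], axis=2)


def fit_coefficients(X, y, es):
    """Least-squares coefficients of the terms (no bias term)."""
    coef, *_ = np.linalg.lstsq(build_terms(X, es), np.asarray(y, float), rcond=None)
    return coef


def predict(X, es, coef):
    return build_terms(X, es) @ coef


def random_search(X, y, n_terms, n_trials, rng):
    """Crude stand-in for the genetic search over exponent matrices."""
    best = None
    for _ in range(n_trials):
        es = rng.choice(EXPONENTS, size=(n_terms, X.shape[1]))
        coef = fit_coefficients(X, y, es)
        sse = float(np.sum((predict(X, es, coef) - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, es, coef)
    return best


# Synthetic positive inputs standing in for B, H, U and U* (not study data).
rng = np.random.default_rng(42)
X = rng.uniform(0.5, 3.0, size=(60, 4))
true_es = np.array([[1.0, -0.5, 2.0, -1.0],
                    [0.5, 0.0, 0.0, 1.0]])
true_coef = np.array([2.0, 0.5])
y = build_terms(X, true_es) @ true_coef

print("coefficients for the known exponents:", fit_coefficients(X, y, true_es))
sse, es, coef = random_search(X, y, n_terms=2, n_trials=200, rng=rng)
print("best SSE found by random search:", sse)
```

Separating the exponent search from the coefficient fit keeps the inner problem linear, which is what makes the resulting formula explicit and easy to reuse.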
It is worth noting that a sensitivity analysis was performed to determine the importance of each input for Kx estimation. In this regard, one parameter of Equation (2) was eliminated each time to evaluate its effect on the EPR performance.
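The elimination procedure described above can be sketched as a loop that removes one input at a time, refits the model and records the error. In this hedged example a single power-law regression fitted in log space stands in for the EPR model, and the data and exponents are synthetic, so the resulting ranking is illustrative only and says nothing about the study's datasets.

```python
import numpy as np


def fit_power_law(X, y):
    """Fit log(y) = c0 + sum_k c_k * log(X[:, k]) by least squares."""
    A = np.column_stack([np.ones(len(y)), np.log(X)])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return coef


def predict_power_law(X, coef):
    A = np.column_stack([np.ones(X.shape[0]), np.log(X)])
    return np.exp(A @ coef)


def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))


def leave_one_out_sensitivity(X, y, names):
    """RMSE of the refitted model after removing each input in turn."""
    scores = {}
    for k, name in enumerate(names):
        X_reduced = np.delete(X, k, axis=1)
        coef = fit_power_law(X_reduced, y)
        scores[name] = rmse(y, predict_power_law(X_reduced, coef))
    return scores


# Synthetic inputs and an arbitrary power law (assumptions for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 4))
y = 3.0 * X[:, 0] ** 2.0 * X[:, 1] ** 0.2 * X[:, 2] ** 1.0 * X[:, 3] ** -0.5
names = ["B", "H", "U", "U*"]

scores = leave_one_out_sensitivity(X, y, names)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"without {name:>2}: RMSE = {value:.3f}")
```

The input whose removal degrades the fit the most is ranked as the most influential, which is the same reading applied to the R2 and RMSE values of the sensitivity analysis.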
Criteria for evaluation of model performance






For a perfect fit, R2 and RMSE should equal one and zero, respectively, and DR should equal zero. If DR is less (larger) than zero, the model underestimates (overestimates) Kx. The accuracy of a model is defined as the percentage of DR values that fall between −0.3 and 0.3 (Seo & Cheong 1998; Kashefipour & Falconer 2002).
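The evaluation criteria can be sketched in a few lines. The definitions below are assumptions based on common usage: R2 is taken as the squared Pearson correlation between observed and predicted values, and DR is taken as the base-10 logarithm of the ratio of predicted to observed Kx, following the convention of Seo & Cheong (1998). The sample values are hypothetical.

```python
import numpy as np


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)


def rmse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def discrepancy_ratio(obs, pred):
    """DR = log10(predicted / observed), assumed definition."""
    return np.log10(np.asarray(pred, float) / np.asarray(obs, float))


def accuracy(obs, pred, limit=0.3):
    """Percentage of DR values that fall between -limit and +limit."""
    dr = discrepancy_ratio(obs, pred)
    return float(100.0 * np.mean((dr >= -limit) & (dr <= limit)))


# Hypothetical observed and predicted Kx values in m2/s.
observed = [10.0, 50.0, 100.0, 400.0]
predicted = [12.0, 40.0, 250.0, 380.0]

print("R2       =", round(r_squared(observed, predicted), 3))
print("RMSE     =", round(rmse(observed, predicted), 3), "m2/s")
print("DR       =", np.round(discrepancy_ratio(observed, predicted), 3))
print("Accuracy =", accuracy(observed, predicted), "%")
```

Because DR is a logarithmic ratio, it weights relative errors equally across small and large Kx values, whereas RMSE is dominated by the largest values.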
RESULTS AND DISCUSSION
Model analysis
A general linear model analysis of variance (GLM-ANOVA) was applied to the results to statistically determine the significance of input variables. The GLM-ANOVA results revealed that the input variables B, H, U and U* were statistically significant (p-value < 0.05) for the Kx estimation.
EPR results and comparison with other Kx estimation models
Several runs were performed using 103 datasets to calibrate the EPR model for Kx estimation; the remaining 46 datasets were held out to validate model performance.
After training, the EPR model provided several equations, as shown in Table 4. Based on the statistical indicators and a trade-off between accuracy and parsimony, Equation (9) had the best performance among all models. Note that the GA population size, crossover probability rate and mutation probability rate for the best-tuned model, i.e., Equation (9), were 40, 0.4 and 0.1, respectively.
Table 4. List of explicit equations given by the EPR model
Model | Equation | R2 | RMSE (m2/s) |
---|---|---|---|
![]() | (8) | 0.76 | 93 |
![]() | (9) | 0.80 | 90 |
![]() | (10) | 0.79 | 90 |
![]() | (11) | 0.79 | 91 |
In calibration, the statistical indices show that Equation (9) predicted Kx with high accuracy. R2 and RMSE were equal to 0.80 and 90 m2/s, respectively, which shows that calibration performed well.
Similar to calibration, validation showed that EPR predicted Kx with good accuracy (R2 = 0.82, RMSE = 79 m2/s). The EPR results for calibration and validation are summarized in Table 5. In addition, the error plots resulting from calibration and validation of the chosen EPR model (Equation (9)) confirm these results (Figure 2).
Table 5. Results of Kx estimation models
Formula | R2 | RMSE (m2/s) |
---|---|---|
EPR, calibration stage | 0.80 | 89 |
EPR, validation stage | 0.82 | 79 |
Elder (1959) | 0.07 | 186 |
Fischer (1975) | 0.32 | 568 |
Liu (1977) | 0.05 | 420 |
Seo & Cheong (1998) | 0.76 | 92 |
Deng et al. (2001) | 0.76 | 82 |
Kashefipour & Falconer (2002) | 0.61 | 104 |
GA proposed by Sahay & Dutta (2009) | 0.68 | 96 |
MT proposed by Etemad-Shahidi & Taghipour (2012) | 0.52 | 144 |
DE proposed by Li et al. (2013) | 0.74 | 85 |
Zeng & Huai (2014) | 0.73 | 95 |
Wang & Huai (2016) | 0.75 | 97 |
Alizadeh et al. (2017b) | 0.73 | 86 |
Figure 2. Error plot for the best-tuned EPR model during calibration and validation.
Scatter plots of predicted and observed Kx for validation of the EPR model and the models developed based on evolutionary algorithms are shown in Figure 3. These models are differential evolution (DE) model (Li et al. 2013), M5’ tree (MT) model (Etemad-Shahidi & Taghipour 2012), GA model (Sahay & Dutta 2009) and PSO model (Alizadeh et al. 2017b). Figure 3 shows that most predictions agree well with observations and that these fall on an almost straight line. However, a few predicted Kx values were greater than observations. The overall impression is, however, that the model yields satisfactory results.
Figure 3. Scatter plot of predicted and observed Kx for calibration and validation of the EPR model and the models developed based on evolutionary algorithms.
The developed EPR performance was also compared with other Kx estimation models, such as the equations proposed by Elder (1959), Fischer (1975), Liu (1977), Seo & Cheong (1998), Deng et al. (2001), Kashefipour & Falconer (2002), Zeng & Huai (2014) and Wang & Huai (2016). This comparison is shown in Table 5. In addition, scatter plots of the equations for predicted and observed Kx in the validation step are shown in Figure 4. These results clearly show that the Elder (1959), Fischer (1975) and Liu (1977) equations performed least well for Kx estimation with R2 equal to 0.07, 0.32 and 0.05, respectively. Table 5 and Figures 3 and 4 show that Kx estimation by the EPR model has the best R2 and RMSE for validation.
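For orientation, three of the classical formulas in this comparison can be written as short functions. The coefficients below are the forms commonly quoted in the literature for Elder (1959), Fischer (1975) and Seo & Cheong (1998) and are assumed here; the function names are hypothetical. The inputs are the training-stage averages from Table 2, used only as a convenient illustration, since averaged inputs do not correspond to any single river reach.

```python
def elder_1959(h, u_star):
    """Kx = 5.93 * H * U* (commonly quoted form)."""
    return 5.93 * h * u_star


def fischer_1975(b, h, u, u_star):
    """Kx = 0.011 * U^2 * B^2 / (H * U*) (commonly quoted form)."""
    return 0.011 * u ** 2 * b ** 2 / (h * u_star)


def seo_cheong_1998(b, h, u, u_star):
    """Kx = 5.915 * (B/H)^0.620 * (U/U*)^1.428 * H * U* (commonly quoted form)."""
    return 5.915 * (b / h) ** 0.620 * (u / u_star) ** 1.428 * h * u_star


# Training-stage averages of the inputs: B (m), H (m), U (m/s), U* (m/s).
b, h, u, u_star = 44.55, 1.25, 0.45, 0.08

print(f"Elder (1959):        {elder_1959(h, u_star):8.2f} m2/s")
print(f"Fischer (1975):      {fischer_1975(b, h, u, u_star):8.2f} m2/s")
print(f"Seo & Cheong (1998): {seo_cheong_1998(b, h, u, u_star):8.2f} m2/s")
```

The Elder form depends only on H and U*, which is one reason it yields much smaller values than formulas that also include channel width and mean velocity.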
Figure 4. Scatter plot of the EPR model and empirical equations for predicted and observed Kx in the validation step.
The accuracy indicated by DR for the above-mentioned models is presented in Table 6 and Figure 5. According to these results, the accuracy of the EPR model (67.39%) is the best among all considered models. The models of Kashefipour & Falconer (2002) and Alizadeh et al. (2017b), each with an accuracy of 56.51%, are the second best, and the model proposed by Elder (1959) is the least accurate, with an accuracy of 2.17%.
Table 6. DR results for Kx estimation models
Technique | DR < −1 | −1 < DR < −0.3 | −0.3 < DR < 0 | 0 < DR < 0.3 | 0.3 < DR < 1 | 1 < DR | Accuracy (%) |
---|---|---|---|---|---|---|---|
EPR | 0 | 17.39 | 28.26 | 39.13 | 6.52 | 8.7 | 67.39 |
Elder (1959) | 93.47 | 4.34 | 2.17 | 0 | 0 | 0 | 2.17 |
Fischer (1975) | 8.69 | 19.56 | 4.34 | 23.91 | 36.95 | 6.55 | 28.25 |
Liu (1977) | 4.34 | 15.21 | 8.69 | 36.95 | 28.26 | 6.55 | 45.64 |
Seo & Cheong (1998) | 0 | 15.21 | 10.86 | 32.6 | 30.43 | 10.9 | 43.46 |
Deng et al. (2001) | 0 | 19.56 | 13.04 | 26.08 | 32.6 | 8.72 | 39.12 |
Kashefipour & Falconer (2002) | 6.52 | 15.21 | 43.47 | 13.04 | 13.04 | 8.72 | 56.51 |
GA (Sahay & Dutta 2009) | 0 | 19.56 | 10.86 | 26.08 | 39.13 | 4.37 | 36.94 |
MT (Etemad-Shahidi & Taghipour 2012) | 17.39 | 41.31 | 23.92 | 15.21 | 2.17 | 0 | 39.13 |
DE (Li et al. 2013) | 0 | 10.86 | 10.86 | 30.43 | 19.56 | 28.29 | 41.29 |
Zeng & Huai (2014) | 0 | 19.56 | 15.21 | 32.6 | 28.26 | 4.37 | 47.81 |
Wang & Huai (2016) | 0 | 21.73 | 17.39 | 32.6 | 17.39 | 10.89 | 49.99 |
Alizadeh et al. (2017b) | 0 | 19.56 | 21.73 | 34.78 | 13.04 | 10.89 | 56.51 |
Sensitivity analysis
The statistical error parameters from the sensitivity analysis are given in Table 7. These results demonstrate that B (R2 = 0.46, RMSE = 164 m2/s) was the most important variable in the Kx estimation, whereas U* (R2 = 0.78, RMSE = 86 m2/s) had the least influence. Other important variables for Kx estimation are U and H, respectively. Therefore, B, U, U* and H, in descending order, are the most important variables for Kx estimation. This is in line with the results of the MT model reported by Etemad-Shahidi & Taghipour (2012) and Noori et al. (2017).
Table 7. Sensitivity analysis results
Input parameters | R2 | RMSE (m2/s) |
---|---|---|
![]() | 0.74 | 93 |
![]() | 0.69 | 1,001 |
![]() | 0.46 | 164 |
![]() | 0.78 | 86 |
CONCLUSIONS
In this paper, the EPR technique was applied to predict Kx for natural rivers. In this regard, the governing variables for Kx, i.e., H, B, U and U*, were taken into account to develop the EPR model. The DE, MT and GA methods and regression equations proposed by other researchers were used in the comparison. Performance evaluation of the Kx estimation models based on multiple error criteria (R2 and RMSE) showed that the developed EPR model had the highest R2 (0.82) and the lowest RMSE (79 m2/s) for validation and outperformed other suggested models such as DE (R2 = 0.74 and RMSE = 85 m2/s), Alizadeh et al. (2017b) (R2 = 0.73 and RMSE = 86 m2/s), GA (R2 = 0.68 and RMSE = 96 m2/s) and MT (R2 = 0.52 and RMSE = 144 m2/s). Additionally, the results showed that the second-best model was approximately the one proposed by Deng et al. (2001), with R2 and RMSE equal to 0.75 and 82 m2/s, respectively. The models proposed by Elder (1959), Fischer (1975) and Liu (1977) provided the worst performance, with RMSE equal to 186, 568 and 420 m2/s, respectively. Further investigations based on the DR statistic revealed that the Kx estimation models developed by EPR and Elder (1959) had the best and the worst performance, respectively. In addition, based on the sensitivity analysis results, B, U, U* and H were, in descending order, the most important variables affecting Kx estimation for natural rivers.
Note that, because we used data that have previously been used by other authors, our results are directly comparable with previously reported results.