Pakistan, being an agricultural country, highly depends on its natural water resources which originate from the upper regions of Hindu Kush-Karakoram-Himalaya Mountains and nourish one of the world's largest Indus Basin irrigation systems. This paper presents streamflow modelling and forecasting using signal difference average (SDA) based variational mode decomposition (VMD) combined with machine learning (ML) methods at Chitral and Tarbela stations on the Indus River network. For this purpose, VMD based random forest (VMD-RF), gradient boosting machine (VMD-GBM) and Bayesian regularized neural network (VMD-BRNN) have been chosen. Moreover, traditional time series flow models, that is seasonal autoregressive integrated moving average (SARIMA) and classical decomposition approach with particle swarm optimization-based support vector regression (PSO-SVR), are considered as benchmark models for comparison. The results show that overall, VMD-BRNN performed best, followed by VMD-GBM and VMD-RF, whereas SARIMA and PSO-SVR ranked last. Overall, SARIMA and PSO-SVR are failing to capture most of the peaks even during the training period whereas hybridization of VMD and ML methods has shown increased robustness of the models. The results show that the influential role of the high dimensional components and robustness on the river flow may be explored by most optimum SDA based VMD signals hybrid with BRNN method.

  • SDA based VMD combined with machine learning model is developed for streamflow modelling and forecasting at Chitral and Tarbela stations.

  • Modelling and forecasting capability are enhanced via SDA based VMD combined with Machine Learning methods.

  • VMD-BRNN is more capable of capturing and forecasting streamflow pattern, especially peaks as compared to VMD-RF and VMD-GBM methods.

It has been highlighted by many researchers and decision-makers about the existing hydrological resources related threats of frequent occurrence of floods and drought at global and regional level (Ostad-Ali-Askari et al. 2017). The history of floods has been reported, including a recent devastating event in 2010 due to heavy rains, which affected almost all the regions of Pakistan (Houze Jr et al. 2011). Low to moderate level of floods arrive every year, depending on the hydrological cycle, however, after every 7–8 years, severe flood unexpectedly appears due to some external forcing factors such as large-scale circulation and other global variations. They are responsible for the loss of lives, property damage and the destruction of bridges, dams and riverine areas. Snow/glacier melts from Hindu Kush-Karakoram-Himalaya Mountains are an elementary source of fresh water to the Indus River and feed one of the world's largest irrigation systems, whereas monsoon and seasonal rainfall are the major sources of water in the system (Hassan & Ansari 2015). Consequently, in some years, abundant rainfall and unexpected rise in snow/glacier melt due to high temperatures may result in flooding of the river. On the other hand, less or no seasonal precipitation in other years plays an important role in setting severe hydrological drought conditions.

Mathematical models have proven their capabilities in the most important tasks for problem solving in hydrology. An innovation was provided by Box et al. (2015) in the form of stochastic time series models such as the autoregressive moving average (ARMA) and further modifications to those models, for example autoregressive integrated moving average (ARIMA) and their seasonal counterpart (SARIMA) (Wood & O'Connell 1985; Tadesse & Dinka 2017). Furthermore, the modification of stochastic methods and implementation of machine learning (ML) techniques have led to an understanding of adequate modelling tactics according to the nature of the time series. Additionally, data-driven techniques due to their self-regularization characteristic, do not depend on the physical behaviour of the hydrological phenomenon and lead to an adequate accuracy in flow forecasting (Poul et al. 2019). Among available data-driven methods, Support Vector Regression (SVR) combined with optimization techniques such as genetic algorithm (GA), particle swarm optimisation (PSO), firefly algorithm (FA) etc., random forest (RF) and gradient boosting machine (GBM) are extensively applied in various fields, including hydrology and have proven their capabilities in model building and forecasting (Elbisy 2015; Touzani et al. 2018; Liu & Sun 2019; Liu et al. 2020). Moreover, conventional artificial neural network (ANN) may capture nonlinearity of the streamflow adequately well, which results in over fitting of the modelled data and hence may fail in the testing period with unseen data (Wang et al. 2021). Furthermore, a regularization approach in Bayesian regularized neural network (BRNN) using a penalty term may handle the complexity of the final network avoiding over/under estimation (Sinecen 2019), discussed in detail in the methodology section. BRNN has been utilized in model building and forecasting of different time series data with esteemed performance (Sinecen 2019), however, the same has not been studied in combination with signal difference average (SDA) based VMD for streamflow forecast previously.

Data pre-processing techniques have been coupled with different ML methods for accuracy enhancement of the models (Zuo et al. 2020; Wu et al. 2021). Wavelet transforms, empirical and ensemble empirical mode decomposition techniques are commonly used for decomposition of actual time series data into sub-series (Wu et al. 2021). Among recently available data decomposition techniques, variational mode decomposition (VMD) proposed by Dragomiretskiy & Zosso (2013) with solid mathematical foundation has been utilized successfully to enhance the robustness of the models in various fields including hydrology (Isham et al. 2018; Wang et al. 2021). Moreover, an important step in VMD is the selection of level of decomposition or mode number (k) of the intrinsic mode function (IMF) and it may affect further computation if not selected appropriately.

Furthermore, different approaches have been reported for selection of k in literature such as: the traditional empirical mode decomposition (EMD) method (Wu et al. 2021) or by observing the existence of central-frequency distortion of the previous IMF component (Zuo et al. 2020), whereas determination of k, based on the similarity concept between actual and ensemble decomposed IMFs, is applied in this paper known as SDA. The motivation for opting for the SDA method is its efficiency (Isham et al. 2018) of selection of k on different data sets with and without noise. A suitable selection of k can boost the effectiveness of VMD over the traditional EMD approach (Isham et al. 2018).

This paper presents streamflow modelling and forecasting using three approaches: (i) traditional time series method seasonal autoregressive integrated moving average (SARIMA); (ii) classical decomposition with particle swarm optimization-based support vector regression (PSO-SVR); and (iii) VMD based ML methods like VMD-RF, VMD-GBM and VMD-BRNN. One of the main objectives of this paper is to distinguish the effective utilization of SDA based VMD on selected ML methods over conventional approaches, which has never been applied previously.

Study area

The Chitral River, one of the major tributaries contributing indirectly to the Indus River flow, is most of the time nourished by glacial melt from the upper regions of Pakistan. The river flow characteristics over the years are steady with the only exception of becoming is enhanced during summer months temperature-dependent snow/glacier melt (Hassan & Ansari 2015). The Chitral River situated at 35°50 N and 71°48 E (Hassan & Ansari 2015) in the upstream section between Mastuj and Chitral towns (average elevation of 1500 m.a.s.l) and is known as Yarkhum River. Unlike Chitral River, Tarbela station is one of the major reservoirs of the Indus River with an annual inflow of 175.2 × 109 m3/year consisting of about 70% only from melting of snow/glacier (Charles et al. 2018). The reservoir has been functional since 1976, having a storage capacity of about 10.3 billion m3, which is now reduced by 28%. It mainly serves in hydropower generation and irrigation while regulating floods triggered by summer monsoon (Khattak et al. 2015). Monthly observed streamflow data of the Chitral River at Chitral and the Indus River at Tarbela stations from April 1969 to March 2008 (468 data points) and from April 1976 to March 2011 (420 data points), respectively, were acquired from the Water and Power Development Authority (WAPDA), Pakistan. Moreover, the last 36 monthly values from each dataset (April 2005 to March 2008 of Chitral and April 2008 to March 2011 of Tarbela) are exempt from model building and the same are used for model testing.

Streamflow data has a strong seasonal pattern for both the stations (Figure 1). Moreover, the data for Chitral station has smooth recession flow whereas Tarbela station shows a jerky pattern (Figure 1). This shows that the water comes to Tarbela from the huge upper Indus basin comprised of different climatic regions (Charles et al. 2018). Furthermore, data average (Av) and standard deviation (SD) are analysed to understand the stationarity of the data estimated from non-overlapping equal length intervals (Table 1). It is found that the data has variation in Av and SD values (Av = 23.97 and SD = 18.67) and (Av = 121.91 and SD = 173.56) for Chitral and Tarbela stations, respectively. Additionally, Av and SD for the last interval (including the testing period) has deviated values as compared to other intervals at the Chitral station (Table 1). On the contrary, variation in Av is minimum among all the intervals for Tarbela, showing consistency in the data pattern.

Table 1

Basic statistics of streamflow data at Chitral and Tarbela stations

ParametersChitral (m3/s) data
Tarbela (m3/s) data
EntireTrainingTestingEntireTrainingTesting
Minimum 56.38 56.38 67.14 269.01 269.01 348.03 
Maximum 1251 1134.48 1251 9455.05 8706.53 9455.05 
Average (Av) 281.47 277.87 322.87 2355.38 2354.78 2379.88 
Av1 282.61 – – 2323.98 – – 
Av2 271.54 – – 2438.76 – – 
Av3 274.45 – – 2316.85 – – 
Av4 295.51 – – 2341.95 – – 
Standard Deviation (SD) 274.31 270.82 311.01 1830.17 1822.90 1962.11 
SD1 268.79 – – 1716.77 – – 
SD2 267.59 – – 1890.33 – – 
SD3 273.03 – – 1863.55 – – 
SD4 286.26 – – 1842.57 – – 
Kurtosis 0.49 0.38 0.96 1.04 0.79 3.22 
Skewness 1.28 1.26 1.37 1.27 1.23 1.63 
ParametersChitral (m3/s) data
Tarbela (m3/s) data
EntireTrainingTestingEntireTrainingTesting
Minimum 56.38 56.38 67.14 269.01 269.01 348.03 
Maximum 1251 1134.48 1251 9455.05 8706.53 9455.05 
Average (Av) 281.47 277.87 322.87 2355.38 2354.78 2379.88 
Av1 282.61 – – 2323.98 – – 
Av2 271.54 – – 2438.76 – – 
Av3 274.45 – – 2316.85 – – 
Av4 295.51 – – 2341.95 – – 
Standard Deviation (SD) 274.31 270.82 311.01 1830.17 1822.90 1962.11 
SD1 268.79 – – 1716.77 – – 
SD2 267.59 – – 1890.33 – – 
SD3 273.03 – – 1863.55 – – 
SD4 286.26 – – 1842.57 – – 
Kurtosis 0.49 0.38 0.96 1.04 0.79 3.22 
Skewness 1.28 1.26 1.37 1.27 1.23 1.63 

Average (Av) and Standard Deviation (SD) are further computed for four equal and non-overlapping intervals.

Figure 1

Strong seasonal pattern is obvious from the (I) Time series, (II) ACF and (III) PACF plots of (a) Chitral River flow at Chitral stations and (b) Indus River flow at Tarbela station. Moreover, PACF has significant spikes at lag 1 and 2 suggesting the expected order of the AR model.

Figure 1

Strong seasonal pattern is obvious from the (I) Time series, (II) ACF and (III) PACF plots of (a) Chitral River flow at Chitral stations and (b) Indus River flow at Tarbela station. Moreover, PACF has significant spikes at lag 1 and 2 suggesting the expected order of the AR model.

Close modal

Seasonal autoregressive integrated moving average model

Seasonal ARIMA (SARIMA) (p, d, q) × (P, D, Q)s is built to capture the seasonal behaviour of the time series, defined as:
(1)
where Yt is the stochastic time series, p and P represent the order of autoregressive (AR), q and Q represent order of moving average (MA) part, d and D denotes differencing of the time series in order to make the process stationary of the non-seasonal and seasonal components, respectively. Moreover, subscript s is for seasonal lag equal to 12 in our case. Furthermore, and are seasonal and non-seasonal AR, and are seasonal and non-seasonal MA and is the regressive operator, respectively (Box et al. 2015).

The adequate SARIMA models are selected on the basis of minimum Akaike information criterion (AIC) values and are found to be SARIMA (1,0,0) × (1,1,1)12 and SARIMA (0,0,1) × (1,1,2)12 for Chitral and Tarbela, respectively.

Particle swarm optimization-support vector regression

SVR is a set of supervised learning methods based on statistical learning theory used for classification and regression analysis, introduced by Vapnik and Chervonenkis (Chervonenkis 2013). A regression via SVR is obtained for a data set of length N by mapping input vector xi into a feature space using a nonlinear function ϖ(x) and further finding the regression function, f(x) =Ω .ϖ (x) + b. Hence, approximating the more accurate output yi with error bounded by δ, such that weight vector Ω and bias value b are the parameters of the regression function. Moreover, for the general purpose of SVR methodology, a kernel function K(xi, x) is incorporated that satisfies the Mercer's theorem (Vapnik 2013). The Radial Basis Function (RBF) is applied from available kernel functions as it can reduce the complexity for input data by effectively adjusting cost (C) and gamma (γ) values (Elbisy 2015), defined as: , whereas the width of the Gaussian function is controlled by the user-defined value of γ >0. Nevertheless, the performance of SVR is significantly affected by the parameters: δ, C and γ. Therefore, proper tuning of these parameters is essential for adequate training of the SVR. Among several available approaches, PSO is selected for tuning of the parameters.

PSO belongs to the swarm intelligence-based family applied in parameter optimization problem solving, introduced by Kennedy and Eberhart (Chen et al. 2019). Moreover, the PSO algorithm is simple from the structural aspect with a global and accurate examining aptitude, having a speedy convergence ability (Liu et al. 2020). The algorithm first considers all the particles having a potential solution (their position and velocity) with their fitness values decided by the object function. As the algorithm grows, the position and velocity of each particle is updated iteratively based on the fitness value in two aspects; one is the best solution of particle compared with its previous value (Pbest) and the other is the solution of particle in the whole population (Gbest), whereas particle swarm optimizer tracks the best performance of the position of each particle by estimating the fitness value via fitness function. Furthermore, after finishing with the search of Pbest and Gbest, the particle's velocity and position are updated (Chen et al. 2019), defined as:
(2)
(3)
where, Vik is the velocity and Xik is the position of the ith particle in the kth iteration. Moreover, w is the internal weight of the algorithm and random numbers c1, c2, r1 and r2 are the parameters range between 0 and 1.
For parameter optimization of SVR via PSO, a simple approach is chosen, which is to set the upper and lower bounds of parameters C, δ and γ. During the iterations, training data are considered as actual data for obtaining the SVR accuracy for selected values of the parameters (Equations (2) and (3)), and then the fitness value is estimated for each particle via fitness function, i.e. symmetric Mean Absolute Percentage Error (sMAPE) in our case, for identification of suitable parameters using the equation, defined as:
(4)
where Qo is the observed and Qp is the predicted streamflow. The sMAPE is a scale-independent and outlier resistant empirical measure which is appropriate when evaluating the forecast accuracy of time series data of different scales (Chen et al. 2017). Finally, when the iterations are exhausted based on fitness value, these optimal values of the parameters C, δ and γ are obtained and an adequate SVR fitting can be performed.

Variational mode decomposition

VMD decomposes the original signal, f(t) (streamflow in our case), into k IMFs components, defined as: Uk(t) = Bk(t) cos (φϕk(t)), where Bk(t) and φk(t) are the instantaneous amplitude and phase, respectively (Wu et al. 2021). VMD was first introduced by Dragomiretskiy & Zosso (2013) based on the Wiener filtering, Hilbert transform, and frequency shifting theories (Wu et al. 2021) as follows: Firstly, VMD computes the analytical signal of Uk(t) to obtain the one-sided frequency spectrum by employing Hilbert transform. Secondly, the spectrum of each Uk(t) is shifted to the baseband by mixing with an exponential tuned to the respective estimated central frequency. Finally, the bandwidth is analysed through H1 Gaussian smoothness of the demodulated signal. In the above process of optimization, minimization of ensemble bandwidth is the main objective. The overall constrained problem (Zuo et al. 2020) is formulated as:
(5)
where {ωk} = {ω1, ω2, ……, ωk} denotes the respective central frequencies, i is the imaginary unit and δ(t) is the Dirac function. The term (Hilbert transform) transforms the kth IMF into an analytical signal to produce a one-sided frequency and shifts the spectrum of individual mode into the baseband. The constrained problem (Equation (5)) is transformed to an unconstrained one by considering the balancing term (β), Lagrange multiplier (η) and a quadratic penalty term (Isham et al. 2018) in the augmented Lagrange function, defined as:
(6)
To find the saddle point of augmented Lagrange function (Equation (6)), an optimization technique called alternate direction method of multipliers is adopted to update Uk, ωk and η in two directions (Isham et al. 2018) and the iterations take the following form:
(7)
(8)
(9)
where and are Fourier transforms of and , respectively and τ is the time step. The above iterative process is repeated until the convergence criteria is satisfied by the condition , where is a priory set accuracy bound.

Random forest

RF are adaptable ML algorithms based on several independent decision trees first introduced by Breiman in 2001 (see for more details Shabani et al. 2020). The structure of the tree is simply defined as branching to two nodes known as root and leaf, respectively (Shabani et al. 2020), where each tree grows by dividing the data with self-search of random samples (see reference therein Shabani et al. 2020). This randomization mitigates extreme overfitting with robust anti-noise ability and leads to extraordinarily quick calculations (Liu & Sun 2019). Error calculation in RF depends on the placement of the data in their final subspace (leaf). Concisely, similarity between A and B in a subspace Q(A, B) is actually the ratio of the times when they appeared in the same leaf. Moreover, the functioning of the classification tree is independent of the values of predictors therefore, inadequacy of similarity is bound to the type of predictors. In general, for any n number of samples with sample feature dimension K and number of decision trees J in a random forest, j decision trees are constructed from actual samples via bootstrap. After that, k samples are withdrawn having dimension K for training purposes of several decision trees. These decision trees are allowed to grow as much as possible and the results obtained from each tree are then averaged for the final RF model (Liu & Sun 2019).

Gradient boosting machine

Initially, the GBMs were proposed for the purpose of classification in the field of ML, later on, connected to the idea of loss function in the statistical aspect (regression) by Friedman (2001). In GBM, a regression tree serves as the base model, whereas residuals (gradients) from the base model consequently fit new models which later on combines with the base model. Moreover, combining of prediction from the previous trees is controlled by involving a shrinking parameter (learning rate) ranges between 0 and 1 (Touzani et al. 2018). Furthermore, regression trees being the base model bound the number of samples in each node in order to minimize the overall variance of the model.

Bayesian regularized neural network

BRNN is an ANN kind (Althoff et al. 2020), which introduces a term based on penalty called regularization. This penalty term controls the weights by penalizing their large values of length n. The feedforward model (Bui et al. 2012) is defined as:
(10)
where wj = weight link to jth neuron from total u neurons, δn[j] = weight of nth input, bj = bias of nth neuron, xm = input data, ym = output data, Aj = activation function defined as: Aj = with error distribution εm.
In conventional ANN, greater or an insufficient number of neurons could lead to over or under fitting of the model, respectively. However, Bayesian regularization controls the potential of neurons via penalization by driving weights of unnecessarily grown or complexed networks to zero (Bui et al. 2012). Penalization is technically handled by the addition of a penalty term to the objective function (Sinecen 2019), J=θ Eε+ϑ Ew where Eε = sum of least square errors, Ew= sum of squares of weights and biases, θ and ϑ are regularization parameters which are optimized based on Gaussian-Newtonian approximation (Bui et al. 2012). Moreover, optimal values of regularization parameters are acquired (Bui et al. 2012) using the iterative formula defined as:
(11)
where H= Hessian matrix of objective function J, λ = effective parameter with z parameters in total. Furthermore, the above equation describes an iterative process to optimize the weights (Ew) considering initial values of θ, ϑ and weights, whereas a one-step Levenberg–Marquardt algorithm is applied to find the optimal weights (Bui et al. 2012). Then effective and new values of regularization parameters are computed until the convergence is met.

Proposed methodology

Proposed framework for streamflow modelling and forecasting is designed as:

Step 1: Determination of level k: An appropriate k is determined using a similarity concept of IMFs sum and input signals via , where y is the observed streamflow signal and Yk is the sum of decomposed IMFs. Moreover, the SDA values are estimated for each k (2–15) and k is selected to the corresponding minimum SDA value.

Step 2: Dataset acquired in step 1 is further split into training (85%) and testing (15%).

Step 3: Selection of set of predictors and target: Significant lags are selected for each IMF via partial autocorrelation function (PACF) plot, considering jth -1 lags outside the confidence interval and greater than 0.5 are indicated as the input predictor of the respective IMF (Zuo et al. 2020). After combining all the predictor's lag, maximum (jth +1) lag of streamflow data is selected as target.

Step 4: Hybridization of RF, GBM and BRNN with VMD: Each of the methods, RF, GBM and BRNN are applied on VMD based training data. Forecasting is done for the testing period. The final model takes the form:
(12)
where Qt+1 is the target streamflow, k is the mode number, IMFs are obtained by VMD from observed streamflow Qt and j is jth significant lag of respective IMF suggested by PACF in step 3.

Moreover, SARIMA is applied on streamflow data using the forecast() package, RF/GBM and BRNN are applied on VMD based predictors using h2o() and brnn() packages in R, respectively. Furthermore, PSO-SVR is applied on seasonally adjusted data using a classical additive decomposition approach resulting in the components: trend, seasonal and remainder using pso() and SVR() available in the pyswarm and sklearn packages in the python environment, respectively. Additionally, model(s) accuracy is measured via Root Mean Square Error (RMSE), sMAPE, and coefficient of correlation (CC) for training, testing and combined modelled data (Table 2).

Table 2

Accuracy measurements of training, testing and overall data ranges

ModelStationTraining
Testing
Overall
RMSEsMAPE (%)CCRMSEsMAPE (%)CCRMSEsMAPE (%)CC
SARIMA Chitral 60.937 10.052 0.974 86.782 16.335 0.975 63.305 10.536 0.973 
Tarbela 643.035 18.495 0.936 784.759 15.604 0.919 656.478 18.246 0.934 
PSO-SVR Chitral 49.943 7.197 0.983 82.528 14.995 0.977 53.170 7.798 0.981 
Tarbela 615.942 16.896 0.941 735.834 13.283 0.933 627.197 16.584 0.940 
VMD-RF Chitral 19.504 4.925 0.998 116.123 39.751 0.952 37.290 7.610 0.991 
Tarbela 162.051 6.429 0.997 704.180 20.606 0.949 258.461 7.653 0.991 
VMD-GBM Chitral 16.610 5.668 0.998 104.602 29.847 0.952 33.138 7.532 0.993 
Tarbela 134.345 5.704 0.997 561.766 16.540 0.963 209.129 6.640 0.994 
VMD-BRNN Chitral 6.644 3.054 0.9996 50.204 15.204 0.987 15.331 3.990 0.998 
Tarbela 59.972 3.016 0.9995 244.178 10.651 0.993 91.834 3.675 0.999 
ModelStationTraining
Testing
Overall
RMSEsMAPE (%)CCRMSEsMAPE (%)CCRMSEsMAPE (%)CC
SARIMA Chitral 60.937 10.052 0.974 86.782 16.335 0.975 63.305 10.536 0.973 
Tarbela 643.035 18.495 0.936 784.759 15.604 0.919 656.478 18.246 0.934 
PSO-SVR Chitral 49.943 7.197 0.983 82.528 14.995 0.977 53.170 7.798 0.981 
Tarbela 615.942 16.896 0.941 735.834 13.283 0.933 627.197 16.584 0.940 
VMD-RF Chitral 19.504 4.925 0.998 116.123 39.751 0.952 37.290 7.610 0.991 
Tarbela 162.051 6.429 0.997 704.180 20.606 0.949 258.461 7.653 0.991 
VMD-GBM Chitral 16.610 5.668 0.998 104.602 29.847 0.952 33.138 7.532 0.993 
Tarbela 134.345 5.704 0.997 561.766 16.540 0.963 209.129 6.640 0.994 
VMD-BRNN Chitral 6.644 3.054 0.9996 50.204 15.204 0.987 15.331 3.990 0.998 
Tarbela 59.972 3.016 0.9995 244.178 10.651 0.993 91.834 3.675 0.999 

Metrics used for accuracy assessment of the model(s)

Accuracy measurements of model's performance are evaluated based on three criteria including RMSE, sMAPE (Equation (4)) and CC defined as:
where Qo is the observed, Qp is the predicted, is the mean observed and is the mean predicted flow.

The regional scale modelling and forecasting of monthly streamflow via VMD-ML has been carried out in the Swat River basin of Pakistan (Sibtain et al. 2020) with ML methods other than those we are discussing here. In their paper, ML alone, VMD-ML and (two stage decomposed)-ML could attain CC values of about 0.869, 0.957 and 0.980 in forecasting, respectively. Furthermore, SDA based VMD-BRNN has not been applied in the Pakistan region previously for streamflow modelling and forecasting with esteemed statistical performance as obtained in the current paper (Table 2).

Hydroclimatic data possess seasonality, cyclicity or trends of different scales in diverse regions of the world. Therefore, streamflow data may contain inline complexities in different seasons, which may vary regionally. The streamflow data of Chitral and Tarbela stations show a strong summer seasonal pattern in time series and respective ACF and PACF plots (Figure 1). Moreover, the data series of Chitral River is positively skewed and more consistent during the training period as depicted by standard deviation whereas that of Tarbela is consistent throughout (section 2, Table 1). Furthermore, variation in consistency in training and testing data series (for example, at the Chitral River) could be challenging for modelling and forecasting.

Despite having a strong seasonal pattern, modelling of streamflow data series via SARIMA performed below average with maximum RMSE and sMAPE values during the training and combined period (Table 2). Unlike SARIMA, performance of PSO-SVR is enhanced during the training and combined period. Especially, during the testing period PSO-SVR performed comparatively well. Moreover, SDA suggested k as 9 and 12, which are further utilized for decomposition of streamflow via VMD for Chitral and Tarbela stations, respectively (Figure 2). Integrated VMD based models achieved increased accuracy during the training period at both the stations (Table 2).

Figure 2

VMD subseries of streamflow quantified by SDA technique with (a) k = 9 for Chitral and (b) k = 12 for Tarbela, respectively.

Figure 2

VMD subseries of streamflow quantified by SDA technique with (a) k = 9 for Chitral and (b) k = 12 for Tarbela, respectively.

Close modal

Accuracy of VMD-BRNN is prominent with minimum RMSE, sMAPE and maximum CC values at both the stations (Table 2). Unlike the training period, testing period accuracy is significantly decreased in VMD-RF and VMD-GBM by increased values of RMSE and sMAPE for Chitral and increased values of sMAPE; though RMSE remained decreased at Tarbela (Table 2). Such limitations of ML and ANN during the testing period are common in the forecasting of streamflow (Wang et al. 2021), nevertheless, VMD-BRNN remained consistent throughout at both the stations (Table 2). Furthermore, sMAPE is subtle to underestimation of the model as compared to over-estimation (Chen et al. 2017). That is, the jerky pattern of recession flow at the Tarbela station (Figure 1(b)) is distinguishably detected by sMAPE during the testing period (Table 2). Additionally, the combined performance using CC, SD and the centre root mean square difference (RMSD) of the models could be easily visualised via the Taylor diagram (Figure 3). The actual streamflow values are displayed at the center of RMSD on x-axis of Figure 3.

Figure 3

Taylor diagram displaying a statistical comparison of combined (training and testing) estimated results of different models of river flow at (a) Chitral (b) Tarbela stations. Where observed river flow values are co-centred with RMSD on x-axis represented by dash line. Among all the models, VMD-BRNN (represented by ◊) is found to be adequate having minimum RMSD and maximum CC values.

Figure 3

Taylor diagram displaying a statistical comparison of combined (training and testing) estimated results of different models of river flow at (a) Chitral (b) Tarbela stations. Where observed river flow values are co-centred with RMSD on x-axis represented by dash line. Among all the models, VMD-BRNN (represented by ◊) is found to be adequate having minimum RMSD and maximum CC values.

Close modal

VMD-BRNN is found best among other models with all the optimum values of CC (0.998 and 0.999), SD and RMSD at both the stations (Figure 3), whereas overall CC values of PSO-SVR, VMD-RF and VMD-GBM are nearly same for Chitral and that of PSO-SVR for Tarbela slightly differs from VMD-RF and VMD-GBM (Figure 3 and Table 2). Overall, minimum values of CC are found to be 0.973 and 0.934 for SARIMA at Chitral and Tarbela stations, respectively (Figure 3 and Table 2).

The violin plots (Figure 4) are depicted as a distribution of observed and modelled streamflow data (from left to right) and their respective box-plots (with approximately similar median values) are represented in the center. Thickness of the violin plot around the first quartile (left edge of box-plot) shows that all the observed and modelled data sets are positively skewed with slight variation in frequency distribution around the third quartile. Specifically, VMD-based models have captured frequency distribution near the third quartile well enough at both the stations. Moreover, observed values have prominent extreme values (outliers) which are captured by VMD-BRNN alone at both the stations (Figure 4).

Figure 4

Represents violin plots comparing the distribution of modelled with observed river flow at (a) Chitral and (b) Tarbela stations.

Figure 4

Represents violin plots comparing the distribution of modelled with observed river flow at (a) Chitral and (b) Tarbela stations.

Close modal

Time series plots provide a visual understanding of over and under fitting of the modelled and forecasted values. SARIMA overestimated the few streamflow peaks during the training and testing period at Chitral (Figure 5(a)). Moreover, the ratio of overestimation is minimum in PSO-SVR as compared to SARIMA. Furthermore, VMD based RF and GBM adequately captured the flow pattern during the training period, however, both of them missed a prominent peak and recession flow throughout the testing period, whereas VMD-BRNN outperformed not only during training, but was able to successfully capture the prominent peak in the testing period (Figure 5(a)) despite having a difference in SD values of Chitral observed data series for the training and testing period (previous section and Table 1).

Figure 5

Time series plot of observed and combined (training and testing) fitted of different model(s) output river flow at (a) Chitral and (b) Tarbela stations. Observed streamflow is indicated by the black line whereas dot-point is used for predicted flow.

Figure 5

Time series plot of observed and combined (training and testing) fitted of different model(s) output river flow at (a) Chitral and (b) Tarbela stations. Observed streamflow is indicated by the black line whereas dot-point is used for predicted flow.

Close modal

In the case of Tarbela, both SARIMA and PSO-SVR overestimated the flow at most of the lower peaks and missed few prominent peaks (Figure 5(b)), whereas SARIMA overestimated the flow pattern during the training period. Nevertheless, the same pattern has been successfully captured by VMD based models and among them, VMD-GBM exaggerated few peaks during the testing period. On the other hand, VMD-BRNN was able to capture the prominent peak in the testing period with lowest RMSE, sMAPE and highest CC values (Figure 5(b) and Table 2). Moreover, streamflow pattern was consistent with annual peak followed by a recession period (minimum flow) throughout, whereas high peaks may fluctuate after a few years (Figures 1 and 5). Almost all the models could capture the streamflow pattern at recession flow, however, only few could perform well due to fluctuations at the peaks. Furthermore, variation in phenotype of streamflow peaks contribute to the regional cycle in the form of either drought or flood (both are rainfall influenced most of the time). Usually, neural network methods can be trained in capturing the complexity concealed in the phenomenon of streamflow efficiently, resulting in an overfitting of the modelled data. Consequently, their performance during the testing period is affected with unseen datasets. This issue is handled by a regularization approach in BRNN by compromising the goodness of the fit over the complexity of the final network by defining the penalty term as discussed in the previous section. Therefore, fluctuations and inconsistency between training and testing data as well as in the occurrence of high flow peaks of streamflow could be handled by VMD-BRNN adequately well. Overall, the performance of VMD-BRNN is excellent in streamflow modelling and forecasting at both the stations.

This paper presents monthly streamflow forecasting by utilising non-decomposed, classical decomposed and SDA based VMD decomposed data approach by stochastic and machine learning models including: SARIMA, PSO-SVR, VMD-RF, VMD-GBM and VMD-BRNN at Chitral and Tarbela stations. Among all the implemented models, the traditional approach (via SARIMA) and the classical decomposed approach (via PSO-SVR) displayed poor performance, failing to capture most of the peaks even during the training period, whereas hybridization of VMD and machine learning methods has shown increased robustness of the models. Moreover, integration of VMD with SDA for determination of number of IMFs has decreased the computational time, allowing the users to focus on the analysis of results from hybrid models completely. Unlike VMD-RF/GBM (poor performance during testing period), the regularization approach in VMD-BRNN distinguishably handled the over/under fitting in both modelling and forecasting monthly streamflow at both the stations. The results show that the influential role of the high dimensional components and robustness on the river flow may be explored by most optimum SDA based VMD signals hybrid with BRNN method. Furthermore, the incorporation of the climatic variables may enhance the accuracy of the forecast. Based on these results, analyses of the influential role of the temperature, rainfall and other climatic factor variations in the river flow will appear in a future communication by the authors.

The author(s) declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The author(s) acknowledge the WAPDA, Pakistan for cooperating in data collection. The contents of this paper form part of the first author's doctoral thesis that will be submitted at the Department of Mathematics, University of Karachi, Karachi, Pakistan.

Data cannot be made publicly available; readers should contact the corresponding author for details.

Althoff
D.
,
Filgueiras
R.
&
Rodrigues
L. N.
2020
Estimating small reservoir evaporation using machine learning models for the Brazilian savannah
.
Journal of Hydrologic Engineering
25
(
8
),
05020019
.
Box
G. E.
,
Jenkins
G. M.
,
Reinsel
G. C.
&
Ljung
G. M.
2015
Time Series Analysis: Forecasting and Control
.
John Wiley & Sons
,
Hoboken
.
Bui
D. T.
,
Pradhan
B.
,
Lofman
O.
,
Revhaug
I.
&
Dick
O. B.
2012
Landslide susceptibility assessment in the Hoa Binh province of Vietnam: a comparison of the levenberg–marquardt and Bayesian regularized neural networks
.
Geomorphology
171
,
12
29
.
Charles
S. P.
,
Wang
Q. J.
,
Ahmad
M.
,
Hashmi
D.
,
Schepen
A.
,
Podger
G.
&
Robertson
D. E.
2018
Seasonal streamflow forecasting in the Upper Indus Basin of Pakistan: An assessment of methods
.
Hydrology and Earth System Sciences
22
(
6
),
3533
3549
.
Chen
S.
,
Wang
J.-q.
&
Zhang
H.-y.
2019
A hybrid PSO-SVM model based on clustering algorithm for short-term atmospheric pollutant concentration forecasting
.
Technological Forecasting and Social Change
146
,
41
54
.
Chervonenkis
A. Y.
2013
Early history of support vector machines
. In:
Empirical Inference
. (Schölkopf, B., Luo, Z. & Vovk, V., eds).
Springer
, Berlin, pp.
13
20
.
Dragomiretskiy
K.
&
Zosso
D.
2013
Variational mode decomposition
.
IEEE Transactions on Signal Processing
62
(
3
),
531
544
.
Elbisy
M. S.
2015
Sea wave parameters prediction by support vector machine using a genetic algorithm
.
Journal of Coastal Research
31
(
4
),
892
899
.
Friedman
J. H.
2001
Greedy function approximation: a gradient boosting machine
.
Annals of Statistics
29
(
5
),
1189
1232
.
Hassan
S. A.
&
Ansari
M. R. K.
2015
Hydro-climatic aspects of Indus River flow propagation
.
Arabian Journal of Geosciences
8
(
12
),
10977
10982
.
Houze
R.
Jr
,
Rasmussen
K.
,
Medina
S.
,
Brodzik
S.
&
Romatschke
U.
2011
Anomalous atmospheric events leading to the summer 2010 floods in Pakistan
.
Bulletin of the American Meteorological Society
92
(
3
),
291
298
.
Isham
M. F.
,
Leong
M. S.
,
Lim
M. H.
&
Ahmad
Z. A.
2018
Variational mode decomposition: mode determination method for rotating machinery diagnosis
.
Journal of Vibroengineering
20
(
7
),
2604
2621
.
Khattak
M. S.
,
Reman
N. U.
,
Sharif
M.
&
Khan
M. A.
2015
Analysis of streamflow data for trend detection on major rivers of the Indus Basin
.
Journal of Himalayan Earth Sciences
48
(
1
),
87
.
Liu
P.
,
Xie
M.
,
Bian
J.
,
Li
H.
&
Song
L.
2020
A hybrid PSO–SVM model based on safety risk prediction for the design process in metro station construction
.
International Journal of Environmental Research and Public Health
17
(
5
),
1714
.
Ostad-Ali-Askari
K.
,
Shayannejad
M.
&
Eslamian
S.
2017
Deficit Irrigation: Optimization Models. Management of Drought and Water Scarcity
. In:
Handbook of Drought and Water Scarcity
. (Eslamian, S., ed.).
Taylor & Francis Publisher
,
USA
.
Poul
A. K.
,
Shourian
M.
&
Ebrahimi
H.
2019
A comparative study of MLR, KNN, ANN and ANFIS models with wavelet transform in monthly stream flow prediction
.
Water Resources Management
33
(
8
),
2907
2923
.
Shabani
S.
,
Samadianfard
S.
,
Sattari
M. T.
,
Mosavi
A.
,
Shamshirband
S.
,
Kmet
T.
&
Várkonyi-Kóczy
A. R.
2020
Modeling pan evaporation using Gaussian process regression K-nearest neighbors random forest and support vector machines; comparative analysis
.
Atmosphere
11
(
1
),
66
.
Tadesse
K. B.
&
Dinka
M. O.
2017
Application of SARIMA model to forecasting monthly flows in Waterval River
.
South Africa. Journal of Water and Land Development
35
(
1
),
229
236
.
Touzani
S.
,
Granderson
J.
&
Fernandes
S.
2018
Gradient boosting machine for modeling the energy consumption of commercial buildings
.
Energy and Buildings
158
,
1533
1543
.
Vapnik
V.
2013
The Nature of Statistical Learning Theory
.
Springer Science & Business Media
,
Berlin, Heidelberg, Germany
.
Wang
X.
,
Wang
Y.
,
Yuan
P.
,
Wang
L.
,
Cheng
D.
,
Anderson
M. G.
&
Burt
T. P.
, (eds).
2021
An adaptive daily runoff forecast model using VMD-LSTM-PSO hybrid approach
.
Hydrological Sciences Journal
66
(
9
),
1488
1502
.
Wood
E.
&
O'Connell
P.
1985
Real-time forecasting. Chapter 15
. In:
Hydrological Forecasting
.
Wiley
,
Chichester, West Sussex, UK
, pp.
505
558
.
Zuo
G.
,
Luo
J.
,
Wang
N.
,
Lian
Y.
&
He
X.
2020
Two-stage variational mode decomposition and support vector regression for streamflow forecasting
.
Hydrology and Earth System Sciences
24
(
11
),
5491
5518
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).