One of the weaknesses of water resources management is the neglect of the nonstructural aspects that involve the most important relationships between water resources and socioeconomic parameters. Particularly, socioeconomic evaluation for different regions is crucial before implementing water resources management policies. To address this issue, 14 countries in the world that have continuous increasing trends of using renewable water per capita (RWPC) during 1998–2017 were used for the estimation of eight socioeconomic parameters associated with four key indicators (i.e., economy, demographics, technology communication, and health sanitation) by using four different data-driven methods, including artificial neural networks, support vector machines (SVMs), gene expression programming (GEP), and wavelet-gene expression programming (WGEP). The performances of the models were evaluated by using correlation coefficient (R), root-mean-square error (RMSE), and mean absolute error (MAE). It was found that the WGEP model had the best performance in estimating all parameters. The mathematical expressions for these socioeconomic parameters were explored and their potential to be expanded in different spatial and temporal dimensions was assessed. The derived equations provide a quantitative means for the future estimation of the socioeconomic parameters in the studied countries.

  • The relationships between water resources and socioeconomic parameters were evaluated.

  • The mathematical equations of the hydro-socioeconomic parameters were explored.

  • Different data-driven methods were compared in the estimation of hydro-socioeconomic parameters and the best ones were determined.

One of the main goals of all engineering disciplines is to create more prosperity for communities. Therefore, any decision that disregards the needs and interests of society loses its value. All communities depend on water, which is needed for agricultural production, energy generation, health services, and industrial manufacturing. The sustainability of communities depends directly or indirectly on the quantity, quality, reliability, and affordability of water. Water resources and socioeconomic systems are closely interconnected: decisions on water resources can create social challenges, and social behaviors can change the status of water systems.

A wide range of human sciences is needed to solve water issues, including economics (Langer 2020), behavioral and perceptual studies, decision-making, social values, community psychology, and politics. In many places, basic human water needs cannot be met. On the other hand, plenty of water is available in some places for human needs and industrial use. Both cases pose challenges for water resources management. Over many decades, efforts have been made to benefit society through effective water resources management.

In the last decade, data-driven soft computing methods such as artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), gene expression programming (GEP), multivariate adaptive regression splines (MARS), M5 tree model, support vector machines (SVMs), random forest (RF), multi-linear regression (MLR), and hybrid wavelet methods have been successfully employed to address both water quality and quantity issues. Shabani et al. (2016) forecasted water demand of the City of Kelowna (CKD), Canada, using intelligent soft computing models and found that the GEP models were more sensitive to data classification, genetic operators, and optimum lag time than other intelligent soft computing models. Based on a review of 43 papers about the applications of the ANN method, Maier & Dandy (2000) concluded that ANNs have been increasingly used for the prediction of water resources. Mohammadrezapour et al. (2019) estimated monthly potential evapotranspiration in an arid region by using the SVM, ANFIS, and GEP models in Sistan and Baluchestan Province, Iran and indicated that the SVM, GEP, and ANFIS models, respectively, took the first, second, and third places in the estimation of monthly potential evapotranspiration. Roboredo et al. (2016) used an aggregate index of social-environmental sustainability to evaluate the social-environmental quality for a watershed in the southern Amazon. Soil, water, vegetation, socioeconomic, and social organization qualities were considered as indicators in their study. Pande & Sivapalan (2017) examined the human impacts on water resources and found that technology, economy, and trade were closely relevant to water sustainability. Li et al. (2019) demonstrated how socioeconomic development affected water quality in Tai Lake by analyzing population, per capita gross domestic production, and sewage discharge and their relationships with water quality.

Using various data-driven methods, Najafzadeh et al. (2018) estimated scour depth under clear water conditions in rectangular channels. Kisi et al. (2019) modeled the separation (transition) zone using the GEP, MARS, M5T, and DENFIS techniques. Surono et al. (2022) forecasted the air quality by using genetic algorithm-fuzzy k-medoids clustering (GA-FKM) and fuzzy k-medoids clustering particle swarm optimization (FKM-PSO). In addition, ZamanZad-Ghavidel et al. (2021) applied GEP models to 14 countries to determine the appropriate hydro-socioeconomic index (HSEI) for the evaluation of the sustainability of water resource systems. To improve the estimation of socioeconomic parameters for those 14 countries, different data-driven methods are used in this study. The main goal of this study is to determine the best data-driven methods to estimate the socioeconomic parameters for the future since few studies have been conducted to address this issue (Dong et al. 2019; Zhang et al. 2020). Artificial intelligence methods have been widely used in the field of water resources (Bozorg-Haddad et al. 2017) and some efforts have been made to use these methods to address the hydro-socioeconomic issues. Nowadays, it is particularly imperative to understand and determine the complex relationships between water resources and socioeconomic factors/parameters for future water resources management.

Selection of key hydro-socioeconomic indicators and parameters

Interdisciplinary approaches are generally needed for managing water systems. The uncertainties in the status of future water resources and the response of a community to them make management more difficult. Given the interactions between the physical water system and the socioeconomic dynamics, effective water resources management is a complex process. For example, some of the questions that need an interdisciplinary answer are (Sivapalan et al. 2012) as follows:

  • How do social systems relate to water resources systems?

  • How do water resources decisions affect socioeconomic parameters?

Predictions of the socioeconomic parameters can help understand the hydro-socioeconomic phenomena under real conditions (Sivapalan et al. 2012). Figure 1 shows the relationship of hydro-socioeconomic indicators and parameters, including gross domestic product per capita (GDPC), income index (II), exports and imports (EI), human development index (HDI), population density (PD), internet users (IU), mortality rate (MR), and population served with piped water (RPPW). Figure 2 shows the main stages of the current study. The data of this study were extracted from the Knoema database (https://knoema.com). In this study, 14 countries with continuously increasing trends of renewable water per capita (RWPC) during the 20-year period (1998–2017) were selected to provide the input dataset for all data-driven methods. Figure 3 shows the geographic locations of the selected countries and Table 1 lists their basic information, including the average RWPC, GDP, and PD. Ten countries (Albania, Belarus, Bosnia, Bulgaria, Croatia, Estonia, Georgia, Hungary, Latvia, and Lithuania) were used for training, while the remaining four countries (Poland, Romania, Serbia, and Ukraine) were used for testing. The socioeconomic parameters were estimated for each year of the study period as the model outputs (Figure 4).
Table 1

Basic information of the selected countries in 1998–2017

Countries    Average RWPC (cubic meters)    Average GDP (US dollars)    Average PD (people per sq. km)
Albania      10.16                          3,044.45                    108.56
Belarus      6.00                           4,330.60                    47.60
Bosnia       10.14                          3,587.80                    72.29
Bulgaria     2.82                           5,153.35                    69.54
Croatia      24.16                          10,801.65                   78.09
Estonia      9.50                           12,696.00                   31.78
Georgia      15.63                          2,434.45                    71.16
Hungary      10.37                          10,879.15                   111.43
Latvia       16.15                          10,033.15                   34.91
Lithuania    7.69                           10,240.20                   51.12
Poland       1.59                           9,811.60                    124.61
Romania      10.16                          6,387.40                    90.90
Serbia       22.13                          4,359.55                    83.84
Ukraine      3.75                           2,262.50                    80.85
Figure 1

Relationships of hydro-socioeconomic indicators and parameters.

Figure 2

Stages of the current study.

Figure 3

Geographic locations of the selected countries.

Figure 4

Flowchart of the modeling and analyses.


Introduction to the selected socioeconomic parameters

In this study, RWPC is considered as an indicator of water resources status (i.e., hydro), while the socioeconomic parameters include GDPC, II, EI, HDI, PD, IU, MR, and RPPW (Figure 1). The distribution of the population in each country varies according to its natural characteristics, so access to water resources plays an important role in PD. With knowledge of the water resources of each area, the facilities needed for residents can be estimated. The HDI is an indicator for the social evaluation of a society, which consists of life expectancy, education index, and II. This index depends on several main factors, including water resources (Sinha & Sengupta 2019). MR is an index for measuring the number of deaths. One cause of disease is the lack of adequate water resources or their pollution, which has a great impact on the health of the people who live in such areas. For example, Keshavarz et al. (2013) highlighted the great impact of water scarcity on the health of people who lived in two villages of Shiraz Province, Iran. The GDP is the total value of all finished goods and services produced in a country over a specific period, indicating the overall economic condition of the country. Another economic index used in this study is the II, which is based on the gross national income (GNI) divided by the population of the country (i.e., GNI per capita). Both economic indicators depend on the amount of water resources in the area. EI is another economic parameter used in this study, which also depends on water resources. The number of internet users (IU) is important in this regard because, as an educational tool, the internet can have a significant impact on people's awareness of water issues (Aerts et al. 2018). Table 2 lists the abbreviations and units of the selected socioeconomic parameters.

Table 2

Selected socioeconomic parameters and their units

Parameters                            Abbreviations    Units
Renewable water per capita            RWPC             Cubic meters
GDP per capita                        GDPC             US dollars
Income index                          II               Score
Exports and imports                   EI               US dollars
Human development index               HDI              Score
Population density                    PD               People per sq. km
Internet users                        IU               Percent
Mortality rate                        MR               Deaths per 1,000 live births
Population served with piped water    RPPW             Percent

Different data-driven methods

Artificial neural network

Artificial neural networks are computational systems that are inspired by biological neural networks. ANN approaches include three main layers (i.e., input, output, and hidden layers). The Levenberg–Marquardt (LM) algorithm is one of the faster and more reliable back propagation (BP) algorithms. The detailed theory of ANNs can be found in Haykin (1998).

The characteristics of the ANN models can be summarized as follows:

Applied algorithm: The LM algorithm with three layers was applied for training of the ANN estimation models.

Activation functions: The logsig, tansig, and purelin functions were applied to the nodes as needed.

Determination of the neuron number: A trial-and-error method was used to determine the optimal number of neurons in the hidden layer of the ANN models (Barzegar et al. 2016). The ANN program code was written in MATLAB in the current study.

Support vector machines

SVMs are powerful data-driven methods introduced by Vapnik (1995). The major advantages of SVMs over ANNs include their improved generalization ability, unique and globally optimal architectures, and the ability to be rapidly trained.

The SVM can be used for both classification and regression problems. The standard SVM is trained by solving a quadratic programming problem in its dual form, which becomes computationally expensive for large-scale problems. To address this problem, the least squares support vector machine (LS-SVM) method was proposed (Suykens et al. 2002). The choice of the kernel function type for SVM depends on the amount of training data and the dimensions of the feature vector. A kernel function should be chosen so that it is capable of learning from the inputs of the problem. Four types of kernel functions, including linear kernel, polynomial kernel, hyperbolic tangent kernel, and radial basis function (RBF) kernel, can be used for SVM models (Deka 2014). In the current study, the RBF kernel was selected. It can be expressed as follows:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \tag{1}$$
where $K(x_i, x_j)$ is the kernel function; $x_i$ and $x_j$ are the training and testing datasets, respectively; and $\gamma$ is the kernel width parameter. In this study, the SVM code was written by using MATLAB.
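As a minimal illustration of how an RBF kernel scores the similarity of two input vectors, the sketch below evaluates the common form exp(−γ‖x − z‖²). The function name `rbf_kernel`, the parameter `gamma`, and the sample vectors are assumptions made for illustration only; the MATLAB implementation and kernel settings used in the study are not reproduced here.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Return exp(-gamma * ||x - z||^2) for two equal-length vectors.

    `gamma` is a hypothetical width parameter; the value used in the
    study is not reported in this section.
    """
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Identical inputs give the maximum similarity of 1.
k_same = rbf_kernel([0.2, 0.5], [0.2, 0.5])

# Inputs one unit apart with gamma = 2 give exp(-2).
k_far = rbf_kernel([0.0], [1.0], gamma=2.0)
```

The kernel value decays smoothly from 1 toward 0 as the two inputs move apart, which is what allows the SVM to fit nonlinear relationships between normalized RWPC and each socioeconomic parameter.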

Gene expression programming

The GEP model is based on Darwin's theory of natural selection. The fundamental steps of this model include (1) selecting the terminal dataset; (2) selecting the function set; (3) selecting the indicators of model evaluation; (4) determining the control components; and (5) determining the requirements/criteria to stop the program run. The GEP model has many advantages. One of the most important is that it generates an expression tree and an explicit mathematical formula, which can be very useful in engineering fields (Ferreira 2006).

The characteristics of the GEP model developed in this study can be summarized as follows:

The functions set (F): Different mathematical functions are applied to compare and evaluate the estimation models:

The terminal set (T): The terminal set includes RWPC. Other characteristic parameters used in the GEP model include number of chromosomes = 30, head length h = 7, and genes per chromosome = 3 (function set defined in GeneXproTools). Additional values were selected to link the sub-trees. In this study, GeneXproTools 4.0 was utilized to estimate the socioeconomic parameters.

Wavelet analysis

Wavelet analysis uses a set of mathematical functions to analyze a signal and determine its frequency components at different scales, building on the Fourier transform. Wavelets have certain benefits over the Fourier transform in reducing computations when examining specific frequencies. The discrete wavelet transform can be calculated by:
$$W_{m,n} = 2^{-m/2} \sum_{t=0}^{N-1} f\left(2^{-m}t - n\right) x(t) \tag{2}$$
where $a = 2^{m}$ and $b = 2^{m}n$ are the scaling and translation parameters with integer m; t is an integer that refers to a point of the input signal; n is the discrete time index; N is the length of the signal; x(t) is a given signal; and f(t) is the mother wavelet. Selecting the mother wavelet type and a suitable number of decomposition levels based on the nature of the signal are the most important steps in the wavelet analysis. In the present study, a one-dimensional Daubechies wavelet, chosen based on the similarity in shape between the mother wavelet and the data series at a suitable level, is used to decompose the data into subseries.
The number of decomposition levels, L, is given by (Barzegar et al. 2016):
$$L = \mathrm{int}\left[\log_{10}(N)\right] \tag{3}$$
where N is the number of data points. Wavelet decomposition and approximation are performed by using the db4 at level 2. For the three parts (i.e., A2, D2, and D1), A2 represents the low-frequency part of the signal, while D2 and D1 represent the high-frequency parts. Figure 4 shows the flowchart of the modeling and analysis in the current study.
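The level calculation can be checked with a few lines of code. The sketch below assumes the common rule L = int[log10(N)], which is consistent with the study's 280 data points (14 countries × 20 years) giving a level-2 decomposition; the function name `decomposition_level` is hypothetical.

```python
import math


def decomposition_level(n_points):
    """Number of wavelet decomposition levels, L = int[log10(N)] (assumed rule)."""
    if n_points < 1:
        raise ValueError("n_points must be a positive integer")
    return int(math.log10(n_points))


# 14 countries x 20 years = 280 data points in this study.
level = decomposition_level(14 * 20)
```

With N = 280, log10(N) is about 2.45, so the integer part gives two levels, i.e., one approximation series (A2) and two detail series (D2 and D1).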
The data of RWPC and socioeconomic parameters were first normalized for each country by using the following equation, so that the normalized values range from 0 to 1:
$$X_{N} = \frac{X_{i} - X_{\min}}{X_{\max} - X_{\min}} \tag{4}$$
where $X_{N}$ is the normalized value; $X_{i}$ is the real value; $X_{\min}$ is the minimum value; and $X_{\max}$ is the maximum value. Normalizing the training inputs generally improves the quality of the training.
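A minimal sketch of this min-max normalization, applied to one country's series at a time, is given below. The function name `min_max_normalize` and the sample values are illustrative assumptions, not data from the study.

```python
def min_max_normalize(values):
    """Scale a series to the range [0, 1] using its own minimum and maximum."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical raw values for a single country.
example = min_max_normalize([2.0, 4.0, 6.0, 10.0])
```

Because the scaling is done per country, the smallest observed value of each series maps to 0 and the largest to 1, which places countries with very different magnitudes on a common scale.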

The augmented Dickey–Fuller (ADF) test (Dickey & Fuller 1979) is a unit root test for stationarity of a time series. The null hypothesis is the presence of a unit root (i.e., the series is non-stationary). In general, a p-value less than 0.05 implies that the null hypothesis can be rejected and the time series is stationary. In this study, the ADF test was first performed on the time series of the existing parameters using the EViews software. EViews supports various types of information criteria; in this study, the Schwarz information criterion (SIC) was used for the ADF test. Moreover, to account for the 20-year data of each country in the macroeconomic series, a 20-year interval was defined for each country (i.e., 1–20 for the first country, 21–40 for the second country, and so on), so the lag length is 20. For instance, the first year of the second country was defined in the software as the start of a new block of data so that the pattern could be recognized better. Since the main purpose of this study is to predict socioeconomic parameters, the stationarity of the data is very important. Thus, stationary time series were used, as verified by the results of the ADF test. According to the p-values shown in Table 3, all the time series used in this study have a p-value less than 0.05.

Table 3

Results of ADF test for evaluation of hydro-socioeconomic parameters

Parameter    p-value
RWPC         <0.001
GDPC         0.0018
II           0.0215
EI           0.0179
HDI          0.0038
PD           <0.001
IU           0.0228
MR           0.0058
RPPW         0.0186

Assessment of model performance

In this study, the performances of the models are evaluated by the correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE), which are respectively given by:
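$$R = \frac{\sum_{i=1}^{N}\left(SE_{io} - \overline{SE}_{o}\right)\left(SE_{ie} - \overline{SE}_{e}\right)}{\sqrt{\sum_{i=1}^{N}\left(SE_{io} - \overline{SE}_{o}\right)^{2}\,\sum_{i=1}^{N}\left(SE_{ie} - \overline{SE}_{e}\right)^{2}}} \tag{5}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(SE_{io} - SE_{ie}\right)^{2}} \tag{6}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|SE_{io} - SE_{ie}\right| \tag{7}$$
where $\overline{SE}_{o}$ and $\overline{SE}_{e}$ are the average values of the observed and estimated socioeconomic parameter values; $SE_{io}$ and $SE_{ie}$ are the observed and estimated socioeconomic parameter values; and N is the total number of datasets. The correlation coefficient (R) measures the strength and direction of the linear relationship between variables; the RMSE shows the goodness of fit relevant to the high values; and the MAE measures the balanced distribution of goodness of fit at moderate values. In general, the model performances are optimum if R and RMSE are closer to 1 and 0, respectively.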
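The three criteria can be computed directly from paired observed and estimated series. The following is a small sketch using only the Python standard library; the function names and the sample values are assumptions for illustration and are not taken from the study's datasets.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(observed, estimated):
    """Pearson correlation coefficient R between observed and estimated values."""
    mean_o, mean_e = mean(observed), mean(estimated)
    covariance = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_e = sum((e - mean_e) ** 2 for e in estimated)
    return covariance / math.sqrt(spread_o * spread_e)


def rmse(observed, estimated):
    """Root-mean-square error; larger errors are penalized more heavily."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error; all errors are weighted equally."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


# Hypothetical normalized values; every estimate is off by 0.1.
observed = [0.10, 0.40, 0.60, 0.90]
estimated = [0.20, 0.30, 0.70, 0.80]

r_value = correlation(observed, estimated)
rmse_value = rmse(observed, estimated)
mae_value = mae(observed, estimated)
```

In this toy case the RMSE and MAE coincide because all errors have the same magnitude; when a few errors are much larger than the rest, the RMSE exceeds the MAE.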

ANN, SVM, and GEP models

The three data-driven approaches (i.e., ANN, SVM, and GEP) were applied to estimate eight socioeconomic parameters covering economy, demographics, technology communication, and health sanitation. All the selected output parameters showed significant correlations with renewable water per capita (RWPC). PD decreased with increasing RWPC because there was no need to concentrate the population in a particular area to use water resources, and the population was spread across different parts of the country. MR had an inverse relationship with RWPC: access to adequate water resources and a stronger agricultural sector could help reduce the majority of diseases. As a result, increasing the quantity of available water resources could improve people's health and reduce the number of deaths due to water-related diseases. The other selected parameters showed a direct relationship with RWPC. The ANN with the LM algorithm and one hidden layer was applied, and the number of neurons in the hidden layer, ranging from 1 to 10, was determined by the trial-and-error method. The numbers of neurons in the hidden layer of the models were 3, 2, 4, 4, 3, 3, 4, and 2 for GDPC, II, EI, HDI, PD, IU, MR, and RPPW, respectively. The tangent sigmoid activation function was used for the hidden nodes of the ANN models for all parameters. For the output nodes, the tangent sigmoid function was used for HDI and IU, and linear functions were used for GDPC, II, EI, PD, MR, and RPPW.

In the GEP model, the root relative squared error (RRSE) with parsimony pressure was selected as the fitness function. The results of ANN, SVM, and GEP for the test period are shown in Table 4. Among these three methods, the GEP models achieved the best (or equal-best) values of RMSE, R, and MAE for all socioeconomic parameters. SVM models showed good performances for the socioeconomic parameters, while ANN models had the weakest performances overall. Figures 5–7 show the observed and estimated socioeconomic parameters during the testing period for the ANN, SVM, and GEP methods, respectively. As shown in Table 4, the values of R and RMSE for the three methods have the following relationships: R_GEP > R_SVM > R_ANN and RMSE_ANN > RMSE_SVM > RMSE_GEP, indicating that the GEP model is the best of the three for estimating the aforementioned socioeconomic parameters, in addition to generating an expression tree and an explicit formula.
Table 4

Evaluation results of the data-driven models for the testing period (WGEP as the best model)

Aspects                     Parameters   WGEP                   ANN                    SVM                    GEP
                                         R      RMSE   MAE      R      RMSE   MAE      R      RMSE   MAE      R      RMSE   MAE
Economy                     GDPC         0.857  0.188  0.133    0.706  0.259  0.195    0.739  0.249  0.192    0.763  0.24   0.177
                            II           0.872  0.16   0.119    0.785  0.205  0.161    0.793  0.201  0.158    0.803  0.195  0.147
                            EI           0.876  0.167  0.108    0.677  0.259  0.191    0.709  0.243  0.175    0.745  0.227  0.156
Demographics                HDI          0.918  0.135  0.097    0.885  0.16   0.117    0.889  0.157  0.117    0.895  0.153  0.114
                            PD           0.999  0.011  0.008    0.999  0.016  0.012    0.999  0.015  0.011    0.999  0.014  0.011
Technology communication    IU           0.934  0.172  0.134    0.831  0.238  0.187    0.867  0.235  0.187    0.877  0.228  0.174
Health sanitation           MR           0.931  0.175  0.136    0.856  0.218  0.169    0.877  0.215  0.158    0.89   0.212  0.158
                            RPPW         0.936  0.158  0.132    0.888  0.194  0.161    0.893  0.192  0.16     0.901  0.189  0.152

The bold values represent the best values for each criterion among different methods.

Figure 5

Observed and estimated socioeconomic parameters during the testing period using ANN.

Figure 6

Observed and estimated socioeconomic parameters during the testing period using SVM.

Figure 7

Observed and estimated socioeconomic parameters during the testing period using GEP.


Wavelet analysis

The one-dimensional Daubechies-4 (db4) wavelet was used to decompose the data into subseries. The Daubechies-4 wavelet has been applied in many studies (e.g., Barzegar et al. 2016). In the current study, the number of data points is 280 (14 countries × 20 years), so the level of wavelet decomposition is 2. The discrete db4 wavelet decomposed the GDPC, II, EI, HDI, PD, IU, MR, RPPW, and RWPC series at level 2. The ranges of the A2, D2, and D1 subseries are shown in Table 5. For example, the values of the low-frequency part A2 at level 2 for the economy-related parameters GDPC, II, and EI vary from −0.033 to +1.036, from −0.026 to +1.080, and from −0.041 to +1.058, respectively. The values of the high-frequency parts D2 and D1, which contain the signal details, range from −0.206 to +0.152 and from −0.367 to +0.375, respectively, for GDPC.

Table 5

1D Daubechies-4 wavelet analysis results

Parameters GDPC II EI 
 Wavelet analyses A2 D2 D1 A2 D2 D1 A2 D2 D1 
 Min −0.033 −0.206 −0.367 −0.026 −0.168 −0.368 −0.041 −0.346 −0.455 
 Max 1.036 0.152 0.375 1.080 0.175 0.383 1.058 0.343 0.516 
Parameters HDI PD IU 
 Wavelet analyses A2 D2 D1 A2 D2 D1 A2 D2 D1 
 Min −0.012 −0.154 −0.363 −0.110 −0.157 −0.432 −0.059 −0.163 −0.349 
 Max 1.082 0.181 0.383 1.024 0.152 0.366 1.086 0.186 0.353 
Parameters MR RPPW RWPC 
 Wavelet analyses A2 D2 D1 A2 D2 D1 A2 D2 D1 
 Min −0.077 −0.145 −0.368 −0.032 −0.172 −0.341 −0.031 −0.161 −0.367 
 Max 1.071 0.189 0.349 1.119 0.225 0.352 1.112 0.152 0.432 

WGEP models

To improve the efficiency of the GEP model and construct the hybrid wavelet-GEP (WGEP) model, the selected socioeconomic parameters were first decomposed into three subseries (i.e., A2, D2, and D1 series) by using a MATLAB DWT wavelet tool. To build the WGEP model, each decomposed subseries of the selected parameters was estimated separately with the GEP model, and the WGEP estimate was obtained directly by summing the estimated subseries. The results of the WGEP model with the db4 mother wavelet and of the other data-driven models for the testing period are shown in Table 4. The RMSE values of the WGEP model with the db4 mother wavelet are 0.188, 0.160, 0.167, 0.135, 0.011, 0.172, 0.175, and 0.158 for GDPC, II, EI, HDI, PD, IU, MR, and RPPW, respectively. Figure 8 shows the observed and estimated values of the A2 decomposed series obtained by applying the WGEP for the eight socioeconomic parameters in the selected countries. Figure 9 shows the observed and estimated socioeconomic parameters during the testing period from the WGEP method. The R values of the WGEP model are 0.857, 0.872, 0.876, 0.918, 0.999, 0.934, 0.931, and 0.936 for GDPC, II, EI, HDI, PD, IU, MR, and RPPW, respectively (Table 4). Figure 10 shows the comparisons of the R, RMSE, and MAE values of the ANN, SVM, GEP, and WGEP methods, indicating that the WGEP model had the best performance.
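Summing the separately estimated subseries works because a level-2 decomposition is additive: A2 + D2 + D1 reproduces the original signal. The sketch below demonstrates that property with the Haar wavelet, which is swapped in here only because its components reduce to simple block averages; it is not the db4 wavelet or the MATLAB tool used in the study. The function names and the sample signal are hypothetical.

```python
def block_means(series, width):
    """Replace each consecutive block of `width` samples by the block mean."""
    if len(series) % width != 0:
        raise ValueError("series length must be a multiple of the block width")
    smoothed = []
    for start in range(0, len(series), width):
        block = series[start:start + width]
        smoothed.extend([sum(block) / width] * width)
    return smoothed


def haar_level2_components(series):
    """Return the Haar A2, D2, and D1 components, each as long as the input."""
    a1 = block_means(series, 2)                      # level-1 approximation
    a2 = block_means(series, 4)                      # level-2 approximation
    d1 = [x - a for x, a in zip(series, a1)]         # finest details
    d2 = [fine - coarse for fine, coarse in zip(a1, a2)]
    return a2, d2, d1


# Hypothetical normalized series (length must be a multiple of 4).
signal = [0.10, 0.30, 0.20, 0.60, 0.50, 0.90, 0.70, 1.00]
a2, d2, d1 = haar_level2_components(signal)

# Adding the three components recovers the original series.
rebuilt = [a + b + c for a, b, c in zip(a2, d2, d1)]
```

In a WGEP-style workflow, a separate model would be fitted to each of the three components, and the final estimate would be the sum of the three model outputs.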
Figure 8

Observed and estimated values of the A2 decomposed series by applying WGEP for eight socioeconomic parameters.

Figure 9

Observed and estimated socioeconomic parameters during the testing period using WGEP.

Figure 10

Comparisons of R, RMSE, and MAE values of the used data-driven methods.

Figure 11 shows the distribution of the estimated socioeconomic parameters. The box plots help understand the distributions of the estimated data by displaying the minimum, the first quartile Q1 (25%), the median (50%), the third quartile Q3 (75%), and the maximum. As shown in Figure 11, the PD and EI parameters have the lowest medians and the IU parameter has the highest median. The EI parameter has lower values of the first to third quartiles. The estimated HDI has the maximum value of the third quartile (75%). Most MR and GDPC values vary between the first quartile (25%) and the third quartile (75%). Table 6 shows the ranks of the eight parameters in terms of R, RMSE, and MAE for the models used in this study, indicating that PD is the best estimated parameter in all four models.
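The five values summarized by each box plot can be obtained as sketched below with the Python standard library. The sample data are hypothetical, and the "inclusive" quantile method is one of several conventions; the plotting software used for Figure 11 may apply a different one.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a series of estimates."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical normalized estimates for one parameter.
estimates = [0.1, 0.2, 0.3, 0.4, 0.5]
summary = five_number_summary(estimates)
```

Half of the values lie between Q1 and Q3, which is the box drawn in each panel of a box plot.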
Table 6

Ranking of socioeconomic parameters in four models

Ranking    ANN         SVM     GEP     WGEP
1          PD          PD      PD      PD
2          HDI         HDI     HDI     HDI
3          RPPW        RPPW    RPPW    RPPW
4          II          II      II      II
5          MR          MR      MR      EI
6          IU          IU      EI      IU
7          EI, GDPC    EI      IU      MR
8          -           GDPC    GDPC    GDPC
Figure 11

Distributions of the eight socioeconomic parameters estimated by WGEP.


Table 7 lists all the mathematical equations used in the models to estimate the socioeconomic parameters. The performances of the models for all selected socioeconomic parameters for the studied countries follow the order WGEP > GEP > SVM > ANN (Table 4). The WGEP model outperformed its simple form (i.e., GEP) for all eight parameters. In terms of RMSE, the WGEP model improved the performance by approximately 22, 18, 26, 12, 22, 25, 18, and 16% compared with the GEP model for GDPC, II, EI, HDI, PD, IU, MR, and RPPW, respectively.
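The improvement percentages appear to correspond to the relative reduction in RMSE from GEP to WGEP. The sketch below recomputes them from the rounded testing-period RMSE values in Table 4, under that assumption; the helper name `rmse_reduction_percent` is hypothetical. Because the tabulated values are rounded to three decimals, the recomputed figures agree with the reported ones only to within about one percentage point.

```python
# Testing-period RMSE values copied from Table 4.
rmse_gep = {"GDPC": 0.240, "II": 0.195, "EI": 0.227, "HDI": 0.153,
            "PD": 0.014, "IU": 0.228, "MR": 0.212, "RPPW": 0.189}
rmse_wgep = {"GDPC": 0.188, "II": 0.160, "EI": 0.167, "HDI": 0.135,
             "PD": 0.011, "IU": 0.172, "MR": 0.175, "RPPW": 0.158}


def rmse_reduction_percent(baseline, improved):
    """Relative RMSE reduction of the improved model, in percent."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    name: rmse_reduction_percent(rmse_gep[name], rmse_wgep[name])
    for name in rmse_gep
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}%")
```

For GDPC, for example, the reduction is (0.240 − 0.188) / 0.240, or roughly 21.7%.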

Table 7

Mathematical equations used in the models to quantify the socioeconomic parameters

ParametersEquations
GDPC A2  
D2  
D1  
II A2  
D2  
D1  
EI A2  
D2  
D1  
HDI A2  
D2  
D1  
PD A2  
D2  
D1  
IU A2  
D2  
D1  
MR A2  
D2  
D1  
RPPW A2  
D2  
D1  
For all parameters Final equation Equation of A2 + Equation of D2 + Equation of D1 

Hts, GDPC, II, UR, EI, HDI, PD, IU, RPPW, and MR denote renewable water per capita (Hydro), GDP per capita, income index, unemployment rate, exports and imports, human development index, population density, internet users, proportion of rural population served with piped water, and mortality rate (under five years old), respectively.

According to Table 7, various operators have been used to increase the accuracy of the models, and these relations have been applied to quantify the dependence of socioeconomic parameters on water resources. In the GEP models, it is also possible to select simpler mathematical equations with fewer operators, but doing so may reduce the accuracy of the proposed models (Bagatur & Onen 2018).

In addition, the results of this study highlight the importance of examining the relationships between the status of water resources and socioeconomic parameters. This study indicated that water resources parameters had significant impacts on socioeconomic parameters (Sivapalan et al. 2012). WGEP had the best performance among all the data-driven models used for predicting the socioeconomic parameters in this study. The socioeconomic conditions of a country can therefore be a good indicator of the status of its water resources, and the two influence each other, which is very important for decision-making in integrated water resources management.

Water resources are important to a country in terms of production and of social, economic, and environmental values. Socioeconomic considerations are needed to cope with the decline of water resources in many countries in recent decades and the increasing demand for water. The new contributions of this study include the following: (1) To the best of our knowledge, this is the first effort to jointly apply various data-driven methods, including artificial neural networks, SVMs, GEP, and WGEP, to the analysis of linked hydrologic and socioeconomic systems. (2) Different socioeconomic parameters, including GDPC, II, EI, HDI, PD, IU, MR, and the proportion of rural population served with piped water (RPPW), were estimated by using RWPC as a representative parameter of water resources. (3) The potential to expand the mathematical relationships in different spatial and temporal dimensions was assessed.

In this study, the relationship between water resources and socioeconomic parameters was modeled by data-driven methods, and their performances were compared and assessed. The hybrid data-driven models based on wavelet theory improved the performance of the GEP models. The WGEP models had the best performance and the ANN models the poorest. Thus, it is possible to assess the socioeconomic status of a region or country by developing such models before implementing major water projects. The methods developed in this study can improve the related water resources planning and management and also provide useful information for socioeconomic development.

The main limitation of this study is the unavailability of data on all socioeconomic parameters that are likely to be strongly correlated with water resources. In future research, other data mining models can be used to characterize the relationship between water resources and socioeconomic parameters. Other socioeconomic parameters that account for different environmental and/or political dimensions, such as the health index, happiness index, and employment rate, can also be studied.

The authors thank Iran's National Science Foundation (INSF) for its financial support for this research.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Aerts, J. C., Botzen, W. J., Clarke, K. C., Cutter, S. L., Hall, J. W., Merz, B., Michel-Kerjan, E., Mysiak, J., Surminski, S. & Kunreuther, H. 2018 Integrating human behaviour dynamics into flood disaster risk assessment. Nature Climate Change 8 (3), 193–199. https://doi.org/10.1038/s41558-018-0085-1.
Bagatur, T. & Onen, F. 2018 Development of predictive model for flood routing using genetic expression programming. Journal of Flood Risk Management 11, S444–S454.
Barzegar, R., Adamowski, J. & Moghaddam, A. A. 2016 Application of wavelet-artificial intelligence hybrid models for water quality prediction: A case study in Aji-Chay River, Iran. Stochastic Environmental Research and Risk Assessment 30 (7), 1797–1819. https://doi.org/10.1007/s00477-016-1213-y.
Bozorg-Haddad, O., Soleimani, S. & Loáiciga, H. A. 2017 Modeling water-quality parameters using genetic algorithm-least squares support vector regression and genetic programming. Journal of Environmental Engineering 143 (7), 04017021. doi:10.1061/(ASCE)EE.1943-7870.0001217.
Deka, P. C. 2014 Support vector machine applications in the field of hydrology: A review. Applied Soft Computing 19, 372–386. https://doi.org/10.1016/j.asoc.2014.02.002.
Dickey, D. A. & Fuller, W. A. 1979 Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74 (366a), 427–431. https://doi.org/10.1080/01621459.1979.10482531.
Dong, Q., Zhang, X., Chen, Y. & Fang, D. 2019 Dynamic management of a water resources-socioeconomic-environmental system based on feedbacks using system dynamics. Water Resources Management 33 (6), 2093–2108. https://doi.org/10.1007/s11269-019-02233-8.
Ferreira, C. 2006 Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, Vol. 21. Springer. https://doi.org/10.1007/3-540-32849-1.
Haykin, S. 1998 Neural Networks: A Comprehensive Foundation, 2nd edn. Upper Saddle River, Prentice-Hall, pp. 26–32.
Keshavarz, M., Karami, E. & Vanclay, F. 2013 The social experience of drought in rural Iran. Land Use Policy 30 (1), 120–129. https://doi.org/10.1016/j.landusepol.2012.03.003.
Kisi, O., Khosravinia, P., Nikpour, M. R. & Sanikhani, H. 2019 Hydrodynamics of river-channel confluence: Toward modeling separation zone using GEP, MARS, M5 Tree and DENFIS techniques. Stochastic Environmental Research and Risk Assessment 33 (4), 1089–1107. https://doi.org/10.1007/s00477-019-01684-0.
Langer, P. 2020 Groundwater mining in contemporary urban development for European spa towns. Journal of Human, Earth, and Future 1 (1), 1–9. http://dx.doi.org/10.28991/HEF-2020-01-01-01.
Li, C., Feng, W., Song, F., He, Z., Wu, F., Zhu, Y., Giesy, J. P. & Bai, Y. 2019 Three decades of changes in water environment of a large freshwater lake and its relationship with socio-economic indicators. Journal of Environmental Sciences 77, 156–166. https://doi.org/10.1016/j.jes.2018.07.001.
Maier, H. R. & Dandy, G. C. 2000 Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications. Environmental Modelling & Software 15 (1), 101–124. https://doi.org/10.1016/S1364-8152(99)00007-9.
Najafzadeh, M., Shiri, J. & Rezaie-Balf, M. 2018 New expression-based models to estimate scour depth at clear water conditions in rectangular channels. Marine Georesources & Geotechnology 36 (2), 227–235. https://doi.org/10.1080/1064119X.2017.1303009.
Pande, S. & Sivapalan, M. 2017 Progress in socio-hydrology: a meta-analysis of challenges and opportunities. Wiley Interdisciplinary Reviews: Water 4 (4), e1193.
Roboredo, D., Bergamasco, S. M. P. P. & Bleich, M. E. 2016 Aggregate index of social-environmental sustainability to evaluate the social-environmental quality in a watershed in the Southern Amazon. Ecological Indicators 63, 337–345. https://doi.org/10.1016/j.ecolind.2015.11.042.
Shabani, S., Yousefi, P., Adamowski, J., Naser, G. & Rahman, I. 2016 Intelligent soft computing models in water demand forecasting. Water Stress in Plants, pp. 99–117. http://dx.doi.org/10.5772/63675.
Sinha, A. & Sengupta, T. 2019 Impact of natural resource rents on human development: what is the role of globalization in Asia Pacific countries? Resources Policy 63, 101413. https://doi.org/10.1016/j.resourpol.2019.101413.
Sivapalan, M., Savenije, H. H. & Blöschl, G. 2012 Socio-hydrology: A new science of people and water. Hydrological Processes 26 (8), 1270–1276. https://doi.org/10.1002/hyp.8426.
Surono, S., Goh, K. W., Onn, C. W., Nurraihan, A., Siregar, N. S., Saeid, A. B. & Wijaya, T. T. 2022 Optimization of Markov weighted fuzzy time series forecasting using genetic algorithm (GA) and particle swarm optimization (PSO). Emerging Science Journal 6 (6), 1375–1393. https://doi.org/10.28991/ESJ-2022-06-06-010.
Suykens, J. A., De Brabanter, J., Lukas, L. & Vandewalle, J. 2002 Weighted least squares support vector machines: Robustness and sparse approximation. NeuroComputing 48 (1–4), 85–105. https://doi.org/10.1016/S0925-2312(01)00644-0.
Vapnik, V. 1995 The Nature of Statistical Learning Theory. Data Mining and Knowledge Discovery. Springer, Berlin.
ZamanZad-Ghavidel, S., Bozorg-Haddad, O. & Goharian, E. 2021 Sustainability assessment of water resource systems using a novel hydro-socio-economic index (HSEI). Environment, Development and Sustainability 23 (2), 1869–1916. https://doi.org/10.1007/s10668-020-00655-8.
Zhang, P., Zou, Z., Liu, G., Feng, C., Liang, S. & Xu, M. 2020 Socioeconomic drivers of water use in China during 2002–2017. Resources, Conservation and Recycling 154, 104636. https://doi.org/10.1016/j.resconrec.2019.104636.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).