## Abstract

The estimation of small reservoir capacity is of great significance for water resources management. However, many widely distributed small reservoirs lack capacity information because field measurements are costly. This study proposes a novel approach to estimate the capacity of small reservoirs in hilly areas using remote sensing and a Digital Elevation Model (DEM). The basic idea is to explore the relationship between influential factors (i.e., topographic and geomorphic parameters) and measured reservoir capacities, and to use this relationship to build a machine learning model based on particle swarm optimization–extreme learning machine (PSO–ELM) that estimates capacity. The Mihe River basin in northern China is selected as a case study, and 101 measured reservoirs and six candidate influential factors are used to develop and test the model. The results show that five influential factors (i.e., the area of the sub-catchment, the water surface area, the longest flow path of the sub-catchment, the average slope of the sub-catchment, and the average slope of the buffer area) form the optimal combination, yielding the smallest difference between measured and estimated reservoir capacities. The results demonstrate that the proposed approach is a robust tool for estimating the capacity of small reservoirs in hilly areas.

## HIGHLIGHTS

A total of 123 small reservoirs are identified in the Mihe River basin above the Tanjiafang station by remote sensing and DEM.

Five influential factors are selected from the spheres of topography and geomorphology to estimate the reservoir capacity.

The estimation model based on particle swarm optimization–extreme learning machine (PSO–ELM) is a robust tool for estimating the small reservoir capacity in the hilly area.


## INTRODUCTION

Reservoirs, regardless of their size, are of great significance for the comprehensive management of water resources in river basins (Votruba & Broza 1989; Leemhuis *et al.* 2009). Large- and medium-sized reservoirs play a critical role in flood control and disaster reduction, water supply and irrigation, hydropower generation, and other uses. Small reservoirs, owing to their large numbers, are also indispensable for ensuring drinking water and food security (Eilander *et al.* 2014).

Over the past decades, more than 847,000 reservoirs have been built worldwide, of which about 95% are small reservoirs with a height of less than 15 m from foundation to crest (Rosenberg *et al.* 2000; World Commission on Dams 2000; Chen *et al.* 2018). In China, as of 2016, there were 98,460 reservoirs, including 93,850 small reservoirs, which account for about 95% of the total (Ministry of Water Resources 2016). In China, a small reservoir is defined as one with a capacity between 0.1 and 10 million m³.

Unlike large reservoirs, small reservoirs lack storage capacity monitoring information and often even lack records of their geographic location (Liebe *et al.* 2009; Ogilvie *et al.* 2018). However, capacity is one of the most important factors determining the performance of small reservoirs. The lack of capacity information not only restricts decision-making on flood control and drought relief but also hinders the sustainable utilization of limited water resources (Meigh 1995). Although several initiatives have compiled reservoir capacity information, for example, the International Commission on Large Dams (ICOLD), the Global Lakes and Wetlands Database (GLWD), and the Global Reservoir and Dam Database (GRanD), most of them target large reservoirs, and the locations of some reservoirs are missing (Gao 2015; Chen *et al.* 2016; Langhorst *et al.* 2019).

There are currently three methods to estimate reservoir capacity. The first method estimates reservoir storage by constructing a water surface area–capacity curve, in which the water surface area is extracted from remote sensing data (Guan *et al.* 2021) and the water depth is measured in the field (Meigh 1995; Liebe *et al.* 2005; Sawunyama *et al.* 2006; Rodrigues *et al.* 2012). The second method is bathymetric surveying: *in situ* or low-altitude surveys based on sonar and LiDAR sensors or depth meters can measure reservoir bathymetry (Avisse *et al.* 2017). The third method reconstructs the underwater topography and builds a storage estimation model by extrapolating and interpolating it, either from DEM data acquired during low-water periods (Zhang *et al.* 2016) or from remote sensing images, based on the similarity between the underwater topography and the surrounding topography (Tseng *et al.* 2016; Getirana *et al.* 2018; Liu *et al.* 2020). The first two methods are time-consuming and limited to shallow waters with favorable visibility, which makes them impractical for broad-scale applications. The third method is suitable for wide, shallow reservoirs with a large water surface area, whose underwater topography is traceable and easy to generalize, but its applicability to elevated natural reservoirs remains to be verified (Liu *et al.* 2020).

However, small reservoirs in hilly areas, which are generally built to fit the natural terrain, have complex underwater topography. Moreover, these reservoirs have low water levels during most of the year, so it is difficult to obtain a continuous series of remote sensing images that pair water surface areas with elevation data, which is the basic dataset of the third method. Therefore, a fast estimation method that does not rely on topography reconstruction would help managers obtain capacity information for the many small reservoirs in a basin. In addition, given the operating rules of small reservoirs, further information such as reservoir storage can be derived, providing a data basis for the comprehensive utilization of water resources in the basin.

Apart from the water surface area, the surrounding topography is also a key factor that determines reservoir capacity (Mehran *et al.* 2019; Fassoni-Andrade *et al.* 2020), as the third method also presumes. The capacity of a small reservoir is closely correlated with the topography of the basin in which it is located (Yang *et al.* 2021). This suggests a new way to estimate the capacity of small reservoirs: exploring the relationship between reservoir capacity and the multi-dimensional topographic factors of the area surrounding each reservoir.

With their strength in modeling nonlinear relationships, machine learning algorithms have been widely used in many fields, including hydrology. Typical machine learning algorithms include artificial neural networks (ANN) (Tanty & Desmukh 2015; Filipova *et al.* 2022), extreme learning machine (ELM) (Atiquzzaman & Kandasamy 2016, 2018; Mouatadid & Adamowski 2017), particle swarm optimization–ELM (PSO–ELM) (Anupam & Pani 2020; Li *et al.* 2020; Zhu *et al.* 2020; Pham *et al.* 2021), and support vector machines (SVM) (Asefa *et al.* 2006; Deka 2014), among others.

Both ANN and SVM can approximate complex nonlinear relationships, and their principles are described in detail by Zurada *et al.* (1997) and Zhang *et al.* (2009); ANN is better suited to multi-dimensional data than SVM. However, ANN has shortcomings, such as long training times and a tendency to become trapped in local minima, which can lead to unsatisfactory simulation results. ELM is an improved type of feedforward neural network. Compared with the traditional single-hidden-layer feedforward neural network, ELM learns faster and generalizes well (Huang *et al.* 2006, 2015).

However, ELM has its own shortcomings: its input weights and hidden-layer biases are assigned randomly and remain unchanged during training. As a result, ELM requires more hidden-layer neurons to ensure simulation accuracy, which weakens its generalization ability. PSO is a commonly used optimization algorithm that can optimize the input weights and hidden-layer biases of ELM, so PSO–ELM can achieve better generalization ability and simulation accuracy than ELM. However, few studies have applied PSO–ELM to estimate the capacity of small reservoirs, especially in northern China, where such reservoirs are widely distributed and most were constructed long ago with limited topographic data. To fill this gap, PSO–ELM is applied here to estimate the storage capacities of small reservoirs, and estimation models based on ANN and ELM are also established for comparison. Without reconstructing the underwater topography, these machine learning methods can capture the nonlinear relationship between topography and capacity from the characteristic parameters of the reservoirs and of the basins in which they are located. Because their inputs can be derived from remote sensing and DEM data, these methods have low input data requirements, which makes them applicable to estimating the capacity of small reservoirs in similar hilly watersheds.

The specific steps are as follows: (1) the locations of small reservoirs were extracted from remote sensing images and DEM data by setting an appropriate threshold; (2) the influential factors of reservoir capacity were extracted and screened; and (3) a machine learning model was established to explore the relationship between the influential factors and reservoir capacity, and the model was then trained and calibrated to estimate the capacity of small reservoirs without measured data.

## STUDY AREA AND DATA

### Study area

The Mihe River basin is located in the middle of Shandong Province in eastern China, south of Taiyi mountain, the main mountain in Shandong Province. The Tanjiafang hydrological station is the main control station of the Mihe River basin. Above the station, the river is 90 km long and drains a catchment area of 2,153 km², which is dominated by hilly terrain.

The catchment controlled by the Tanjiafang station is selected as the study area because it is a typical hilly area with many reservoirs and is important for flood control in the Mihe River basin. It contains one large reservoir, three medium reservoirs, and a large number of small reservoirs, most of which lack location and capacity information. In addition, to ensure a sufficient number of training samples, an additional reference basin is selected. The reference basin is also located in the Taiyi mountains and has hydrogeological characteristics similar to those of the study area. The study area and reference area are shown in Figure 1.

### Data sources and processing

Three datasets are collected for this study: the DEM dataset, the Landsat dataset, and a reservoir information dataset. The ALOS DEM and Landsat 7 ETM SLC images, obtained from open-access databases, are the basic sources of the elevation and land-use data. The measured reservoir records, including reservoir locations and designed capacities, are collected from the Hydrology Centre of Shandong Province. In total, records of 12 measured reservoirs in the study area and 89 measured reservoirs in the reference area are collected (Supplementary Tables S1 and S2).

The DEM data are pre-processed (sink filling and DEM reconditioning) with ArcHydro Tools, and the Landsat data are pre-processed (radiometric calibration and atmospheric correction) with ENVI 5.3. The pre-processed remote sensing data are then used in a supervised classification to obtain the land-use types of the study area and reference area, namely water body, construction land, green space, cultivated land, and greenhouse land. Google Earth Engine (GEE) is then used to identify and remove mountain shadow patches and to verify the position and shape of the water bodies. The land-use types of the study area and the reference area are shown in Figure 2.

Based on the verified water body patches, a water surface area threshold is set to screen small reservoirs in the study basin. With a threshold of 0.002 km², 123 small reservoirs are identified in the Mihe River basin, which is broadly consistent with the number of small reservoirs in the study area reported elsewhere (Shouguang Water Resources Bureau 2021). Of these, 12 small reservoirs have known capacities, and the remaining 111 have unknown capacities that need to be estimated.
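The area-threshold screening can be sketched in Python as follows. This is a minimal illustration, not the authors' workflow (which used ENVI and GEE): it assumes the verified water bodies are available as a binary raster, treats 8-connected water pixels as one patch, and assumes a 30 m pixel size; the function name `screen_reservoir_patches` and the toy mask are hypothetical.

```python
import numpy as np
from scipy import ndimage


def screen_reservoir_patches(water_mask, pixel_size_m=30.0, min_area_km2=0.002):
    """Label connected water patches and keep those at or above an area threshold.

    water_mask   : 2-D boolean array, True where a pixel is classified as water.
    pixel_size_m : side length of a square pixel in metres (assumed 30 m here).
    min_area_km2 : area threshold; 0.002 km^2 is the value used in the text.
    Returns a dict mapping patch label -> patch area in km^2.
    """
    # 8-connectivity: diagonal neighbours belong to the same patch
    structure = np.ones((3, 3), dtype=bool)
    labels, n_patches = ndimage.label(water_mask, structure=structure)
    # Pixel count of each patch (label 0 is background, so it is dropped)
    counts = np.bincount(labels.ravel(), minlength=n_patches + 1)[1:]
    pixel_area_km2 = pixel_size_m ** 2 / 1e6
    areas = counts * pixel_area_km2
    return {label: area for label, area in zip(range(1, n_patches + 1), areas)
            if area >= min_area_km2}


# Toy example: one 2-pixel patch (0.0018 km^2) and one 4-pixel patch (0.0036 km^2)
mask = np.zeros((6, 6), dtype=bool)
mask[0, 0:2] = True
mask[3:5, 3:5] = True
kept = screen_reservoir_patches(mask)
print(len(kept))
```

With 30 m pixels, the 0.002 km² threshold corresponds to at least three pixels, so the 2-pixel patch is discarded and only the 4-pixel patch is retained.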

The spatial distribution of the 123 identified reservoirs and their sub-catchments in the study area is shown in Figure 3.

## METHODS

### Basic idea

This paper aims to develop an approach for estimating the capacity of small reservoirs in hilly areas. The basic idea is to explore the relationship between reservoir capacity and the surrounding topography and geomorphology, and to select appropriate influential factors to build a machine learning model that estimates reservoir capacity. This idea is implemented with ANN, ELM, and PSO–ELM, following the flowchart in Figure 4.

Six factors representing the surrounding topography and geomorphology are selected as the influential factors: the area of the sub-catchment controlled by the reservoir (*A*_{sc}), the water surface area of the reservoir (*A*_{ws}), the longest flow path of the sub-catchment (*L*_{sc}), the average slope of the sub-catchment (*S*_{sc}), the average slope of the buffer area (*S*_{b}), and the degree of relief of the sub-catchment (*R*_{sc}).

*A*_{sc} reflects the rain-harvesting area of the reservoir: the larger *A*_{sc}, the larger the storage capacity of the reservoir.

*L*_{sc} affects the time it takes a flood to travel to the reservoir and thus influences the designed capacity of the reservoir. Generally, the shorter the longest flow path in the sub-catchment, the faster the flood converges to the small reservoir.

*S*_{sc} reflects the overall topography of the reservoir's catchment area.

*S*_{b} describes the micro-terrain around the reservoir, that is, the terrain above the water surface and below the top of the dam.

*A*_{ws} is the extracted water body area. In general, the larger the water surface area, the greater the reservoir storage and the closer the reservoir is to its full capacity.

*R*_{sc} is the difference in altitude between the highest and the lowest points of the sub-catchment, a macroscopic index reflecting the topographic characteristics of the catchment.
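Several of these factors can be derived directly from a gridded DEM. The sketch below is an illustration only, not the ArcHydro procedure used in the study: it assumes a sink-filled DEM on a square grid and a boolean sub-catchment mask (both hypothetical inputs), and computes *A*_{sc}, *S*_{sc}, and *R*_{sc} with finite-difference slopes. *S*_{b} follows from the same slope grid with a buffer mask instead, while *L*_{sc} requires flow-direction routing and is omitted.

```python
import numpy as np


def terrain_factors(dem, catchment_mask, cell_size_m):
    """Return (A_sc in km^2, S_sc in degrees, R_sc in m) for one sub-catchment.

    dem            : 2-D array of elevations in metres (assumed already sink-filled).
    catchment_mask : 2-D boolean array, True inside the sub-catchment.
    cell_size_m    : DEM cell size in metres (square cells assumed).
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x)
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size_m)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    inside = catchment_mask.astype(bool)
    area_km2 = inside.sum() * cell_size_m ** 2 / 1e6        # A_sc
    mean_slope_deg = slope_deg[inside].mean()                # S_sc
    relief_m = dem[inside].max() - dem[inside].min()         # R_sc
    return area_km2, mean_slope_deg, relief_m


# Synthetic tilted plane: elevation rises 10 m per 10 m cell, i.e. a 45-degree slope
dem = np.tile(10.0 * np.arange(5), (5, 1))
mask = np.ones_like(dem, dtype=bool)
area, slope, relief = terrain_factors(dem, mask, cell_size_m=10.0)
print(area, slope, relief)
```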

### Artificial neural network

The output of neuron *i* in the hidden layer is expressed as follows:

$$h_i = f\left(\sum_{j=1}^{N} w_{ij}\,x_j + b_i\right) \tag{1}$$

where *f*(·) is the activation (or transfer) function, commonly a sigmoid-type function such as the logistic function or the tanh function; *N* is the number of input neurons; *w*_{ij} are the weights, which are calculated iteratively by the gradient descent method; *x*_{j} are the inputs to the input neurons; and *b*_{i} are the threshold terms of the hidden neurons. Because the weights of a traditional ANN must be obtained iteratively by gradient descent, training is slow and prone to local optima and overfitting.
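Equation (1) and the iterative gradient-descent weight update can be illustrated with a small NumPy network. This is a hypothetical sketch assuming a logistic activation, a single linear output neuron, and full-batch gradient descent on the mean squared error; the learning rate, layer size, function names, and synthetic data are illustrative choices, not the configuration used in the study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ann(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer network trained by full-batch gradient descent on MSE.

    X: (n_samples, N) inputs; y: (n_samples,) targets. Hidden neuron i outputs
    f(sum_j w_ij x_j + b_i) with f = logistic sigmoid, as in Equation (1);
    the single output neuron is linear.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    W = rng.normal(scale=0.5, size=(n_inputs, n_hidden))   # weights w_ij
    b = np.zeros(n_hidden)                                  # thresholds b_i
    beta = rng.normal(scale=0.5, size=n_hidden)             # output weights
    beta0 = 0.0                                             # output bias
    history = []
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                 # hidden outputs h_i, Eq. (1)
        err = H @ beta + beta0 - y
        history.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / n_samples          # dMSE / d(prediction)
        # Back-propagate through the sigmoid, whose derivative is H * (1 - H)
        g_hidden = np.outer(g_out, beta) * H * (1.0 - H)
        # Gradient-descent updates of all weights and thresholds
        W -= lr * (X.T @ g_hidden)
        b -= lr * g_hidden.sum(axis=0)
        beta -= lr * (H.T @ g_out)
        beta0 -= lr * g_out.sum()
    return (W, b, beta, beta0), history


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(50, 2))          # synthetic inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
params, history = train_ann(X, y)
print(history[0], history[-1])
```

Every weight is adjusted over thousands of passes through the data, which is the cost that ELM avoids.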

### Extreme learning machine

… *et al.* 2006). The input weights and hidden-layer biases of ELM can be chosen randomly if the activation functions in the hidden layer are infinitely differentiable. Like general ANNs, an ELM consists of an input layer, a hidden layer, and an output layer, and the only parameters that users need to set are the activation function and the number of hidden-layer nodes. Consider training data (*X*_{i}, *V*_{i}), where *X*_{i} = [*x*_{i1}, *x*_{i2}, …, *x*_{in}]^{T} ∈ *R*^{n} and *V*_{i} = [*v*_{i1}, *v*_{i2}, …, *v*_{im}]^{T} ∈ *R*^{m}. If *k* is the number of hidden-layer nodes and *g*(*x*) is the activation function, the standard single-hidden-layer feedforward neural network is described for each training sample *i* by

$$\sum_{j=1}^{k} \beta_j\, g\left(w_j \cdot X_i + b_j\right) = V_i \tag{2}$$

where *w*_{j} is the input weight vector connecting the input layer with the *j*th neuron of the hidden layer; β_{j} is the output weight vector connecting the *j*th neuron of the hidden layer with the output layer; and *b*_{j} is the threshold of the *j*th hidden neuron. Unlike in a traditional feedforward neural network, the input weights and hidden-layer biases of ELM are assigned randomly, and the output weight matrix is calculated with the Moore–Penrose (MP) generalized inverse (Ahila *et al.* 2015). ELM is not only thousands of times faster than traditional learning algorithms but also avoids problems of gradient-based learning methods such as local minima, stopping criteria, and learning-rate selection (Zhu *et al.* 2005; Cao *et al.* 2010).
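Stacking Equation (2) over all training samples gives the linear system **H**β = **V**, where **H** is the hidden-layer output matrix; ELM solves it in one step with the Moore–Penrose pseudo-inverse, β = **H**^{†}**V**. The sketch below assumes a logistic activation and uniform random weights and biases in [−1, 1]; the function names and the synthetic data are illustrative only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, V, n_hidden=20, seed=0):
    """Train an ELM: random input weights/biases, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights w_j
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases b_j
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ V                             # beta = pinv(H) V
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return sigmoid(X @ W + b) @ beta


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 3))          # synthetic inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] - 0.3 * X[:, 2]
model = elm_fit(X, y)
preds = elm_predict(model, X)
rmse = np.sqrt(np.mean((preds - y) ** 2))
print(rmse)
```

No iteration is involved: W and b are never updated, which is exactly the limitation that PSO–ELM addresses.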

### PSO–ELM for reservoir capacity estimation

#### Particle swarm optimization

PSO is an iterative optimization algorithm (Xu & Shu 2006). Its basic idea is to search for the optimal solution through cooperation and information sharing among individuals in a swarm. Suppose a swarm contains *n* particles, denoted as *Y* = (*Y*_{1}, *Y*_{2}, …, *Y*_{n}). The *i*th particle is expressed as a *D*-dimensional vector *Y*_{i} = [*y*_{i1}, *y*_{i2}, …, *y*_{iD}]^{T}, which stores the position of the particle in the *D*-dimensional search space; each particle also carries a fitness value and a velocity (Mategaonkar *et al.* 2018; Swathi & Elwha 2018).

The velocity of the *i*th particle is *V*_{i} = [*v*_{i1}, *v*_{i2}, …, *v*_{iD}]^{T}. The fitness value corresponding to the particle position *Y*_{i} is calculated from the objective function to judge how good the current position is. In each iteration, every particle updates its position by tracking two 'extreme values': the best solution found so far by the particle itself, called the individual extremum *pbest*_{i}, and the best solution found so far by the entire swarm, called the global extremum *gbest*. The velocity and position of each particle are updated as follows:

$$V_i^{k+1} = \omega V_i^{k} + c_1 r_1 \left(pbest_i - Y_i^{k}\right) + c_2 r_2 \left(gbest - Y_i^{k}\right) \tag{3}$$

$$Y_i^{k+1} = Y_i^{k} + V_i^{k+1} \tag{4}$$

where *V*_{i} is the velocity of the *i*th particle; *k* is the current iteration number; ω is the inertia coefficient; *c*_{1} and *c*_{2} are acceleration factors; and *r*_{1} and *r*_{2} are random numbers in the interval (0, 1). The position and velocity are usually limited to [−*y*_{max}, *y*_{max}] and [−*v*_{max}, *v*_{max}], respectively, to prevent particles from searching blindly.

#### Particle swarm optimization–extreme learning machine

Because ELM generalizes insufficiently (Mahmood *et al.* 2017), PSO is used to optimize the input weights and the hidden-layer biases of ELM (Xu & Shu 2006). The input weights and hidden-layer biases are encoded as the positions of the PSO particles. The specific steps are as follows (see Figure 5):

*Step 1*: Data sorting and preprocessing. The six influential factors representing the surrounding topography and geomorphology, *X*_{i}, and the reservoir capacity *V*_{i} are extracted.

*Step 2*: Establishing the ELM model. The ELM model is established using the datasets (*X*_{i}, *V*_{i}).

*Step 3*: PSO is used to optimize two sets of ELM parameters: the input-layer weights and the hidden-layer biases. The initial population is generated, and the number of particles, the acceleration factors, and the maximum number of iterations are determined.

*Step 4*: For each particle in the population, ELM is used to calculate the output weights and the fitness value, *pbest* and *gbest* are updated, and the stopping condition (the maximum number of iterations) is checked. Formulae (3)–(4) are used to update the velocities and positions of all particles until the condition is met.
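Steps 1–4 can be combined into a compact sketch. It is a hypothetical illustration, not the authors' code: each particle position is the flattened vector of ELM input weights and hidden biases, the fitness is assumed to be the RMSE on a held-out validation subset (the text does not specify the objective function), the activation is assumed to be logistic, and all hyperparameter values and the synthetic data are placeholders.

```python
import numpy as np


def hidden_output(X, W, b):
    # Logistic activation of the hidden layer (assumed activation function)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def decode(particle, n_inputs, n_hidden):
    """Split a particle position into ELM input weights W and hidden biases b."""
    W = particle[: n_inputs * n_hidden].reshape(n_inputs, n_hidden)
    b = particle[n_inputs * n_hidden:]
    return W, b


def pso_elm(X_tr, y_tr, X_val, y_val, n_hidden=10, n_particles=20, n_iter=50,
            w=0.7, c1=1.5, c2=1.5, y_max=1.0, v_max=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_inputs = X_tr.shape[1]
    dim = n_inputs * n_hidden + n_hidden   # all input weights plus all hidden biases

    def fitness(p):
        # Step 4: ELM solves the output weights for this particle's (W, b) ...
        W, b = decode(p, n_inputs, n_hidden)
        beta = np.linalg.pinv(hidden_output(X_tr, W, b)) @ y_tr
        # ... and the particle is scored by validation RMSE (assumed fitness)
        pred = hidden_output(X_val, W, b) @ beta
        return np.sqrt(np.mean((pred - y_val) ** 2)), beta

    # Step 3: random initial swarm of candidate (W, b) vectors
    Y = rng.uniform(-y_max, y_max, size=(n_particles, dim))
    V = np.zeros_like(Y)
    scored = [fitness(p) for p in Y]
    pbest = Y.copy()
    pbest_fit = np.array([s[0] for s in scored])
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit, gbest_beta = pbest[g].copy(), pbest_fit[g], scored[g][1]
    history = [gbest_fit]
    for _ in range(n_iter):                 # stop after a fixed number of iterations
        r1, r2 = rng.random(Y.shape), rng.random(Y.shape)
        V = w * V + c1 * r1 * (pbest - Y) + c2 * r2 * (gbest - Y)   # Eq. (3)
        V = np.clip(V, -v_max, v_max)
        Y = np.clip(Y + V, -y_max, y_max)                           # Eq. (4)
        for i, p in enumerate(Y):
            fit, beta = fitness(p)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = p, fit
                if fit < gbest_fit:
                    gbest, gbest_fit, gbest_beta = p.copy(), fit, beta
        history.append(gbest_fit)
    W, b = decode(gbest, n_inputs, n_hidden)
    return (W, b, gbest_beta), history


def predict(model, X):
    W, b, beta = model
    return hidden_output(X, W, b) @ beta


# Usage on synthetic data standing in for five scaled influential factors
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(101, 5))
y = X[:, 0] + 2.0 * X[:, 2] + 0.3 * X[:, 3] * X[:, 4]
model, history = pso_elm(X[:81], y[:81], X[81:], y[81:])
preds = predict(model, X[81:])
print(history[0], history[-1])
```

Because *gbest* only changes when a strictly better particle is found, the recorded fitness history never increases.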

## RESULTS AND DISCUSSION

### Extraction and analysis of the influential factors

Six influential factors, namely *A*_{sc}, *A*_{ws}, *S*_{sc}, *L*_{sc}, *S*_{b}, and *R*_{sc}, are extracted and analyzed statistically, as shown in Figure 6. Figure 6 shows that most of the influential factors follow a gamma distribution, with the exponential distribution being the next most common.

To assess whether the six selected influential factors are reasonable, the correlations among the reservoir capacity (*V*_{rc}) and the six factors over the 101 sets of (*x*_{ij}, *y*_{j}) were tested using the Kendall, Spearman, and Pearson correlation methods. The results are shown in Table 1.

| Method | Factor | *V*_{rc} | *A*_{sc} | *A*_{ws} | *L*_{sc} | *S*_{sc} | *S*_{b} | *R*_{sc} |
|---|---|---|---|---|---|---|---|---|
| Kendall | *V*_{rc} | 1.000 | 0.491 | 0.561 | 0.464 | 0.079 | −0.016 | 0.226 |
| | *A*_{sc} | 0.491 | 1.000 | 0.466 | 0.819 | 0.234 | 0.081 | 0.418 |
| | *A*_{ws} | 0.561 | 0.466 | 1.000 | 0.452 | −0.071 | −0.198 | 0.139 |
| | *L*_{sc} | 0.464 | 0.819 | 0.452 | 1.000 | 0.211 | 0.053 | 0.409 |
| | *S*_{sc} | 0.079 | 0.234 | −0.071 | 0.211 | 1.000 | 0.498 | 0.644 |
| | *S*_{b} | −0.016 | 0.081 | −0.198 | 0.053 | 0.498 | 1.000 | 0.272 |
| | *R*_{sc} | 0.226 | 0.418 | 0.139 | 0.409 | 0.644 | 0.272 | 1.000 |
| Spearman | *V*_{rc} | 1.000 | 0.669 | 0.751 | 0.635 | 0.123 | −0.039 | 0.339 |
| | *A*_{sc} | 0.669 | 1.000 | 0.627 | 0.950 | 0.381 | 0.111 | 0.601 |
| | *A*_{ws} | 0.751 | 0.627 | 1.000 | 0.613 | −0.098 | −0.294 | 0.215 |
| | *L*_{sc} | 0.635 | 0.950 | 0.613 | 1.000 | 0.351 | 0.076 | 0.590 |
| | *S*_{sc} | 0.123 | 0.381 | −0.098 | 0.351 | 1.000 | 0.684 | 0.839 |
| | *S*_{b} | −0.039 | 0.111 | −0.294 | 0.076 | 0.684 | 1.000 | 0.394 |
| | *R*_{sc} | 0.339 | 0.601 | 0.215 | 0.590 | 0.839 | 0.394 | 1.000 |
| Pearson | *V*_{rc} | 1.000 | 0.697 | 0.838 | 0.656 | 0.105 | −0.050 | 0.360 |
| | *A*_{sc} | 0.697 | 1.000 | 0.671 | 0.931 | 0.345 | 0.137 | 0.494 |
| | *A*_{ws} | 0.838 | 0.671 | 1.000 | 0.668 | −0.002 | −0.153 | 0.308 |
| | *L*_{sc} | 0.656 | 0.931 | 0.668 | 1.000 | 0.335 | 0.113 | 0.525 |
| | *S*_{sc} | 0.105 | 0.345 | −0.002 | 0.335 | 1.000 | 0.662 | 0.811 |
| | *S*_{b} | −0.050 | 0.137 | −0.153 | 0.113 | 0.662 | 1.000 | 0.314 |
| | *R*_{sc} | 0.360 | 0.494 | 0.308 | 0.525 | 0.811 | 0.314 | 1.000 |


*Note*: Bold values indicate a significant correlation at the 0.01 significance level.
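The three coefficients in Table 1 can be computed with SciPy as sketched here, with each p-value compared against the 0.01 significance level. The *R*^{2} values reported for Figure 7 appear close to the squared Pearson coefficients in Table 1 (for example, 0.697² ≈ 0.486 for *A*_{sc}), so the sketch reports *R*^{2} = *r*^{2}; that interpretation, the function name, and the synthetic data are assumptions. A monotone but nonlinear relation gives Kendall and Spearman coefficients of 1 but a Pearson coefficient below 1, which is why rank-based and linear tests can disagree.

```python
import numpy as np
from scipy import stats


def correlation_with_capacity(factors, capacity):
    """Kendall, Spearman and Pearson coefficients (with p-values) of each factor vs capacity."""
    results = {}
    for name, values in factors.items():
        tau, p_tau = stats.kendalltau(values, capacity)
        rho, p_rho = stats.spearmanr(values, capacity)
        r, p_r = stats.pearsonr(values, capacity)
        results[name] = {
            "kendall": (tau, p_tau),
            "spearman": (rho, p_rho),
            "pearson": (r, p_r),
            "R2": r ** 2,   # squared Pearson coefficient (assumed meaning of R^2)
        }
    return results


rng = np.random.default_rng(0)
area = rng.uniform(0.1, 5.0, size=30)       # synthetic factor values
capacity = 0.2 * area ** 2                  # monotone but nonlinear relation
res = correlation_with_capacity({"A_sc": area}, capacity)
print(res["A_sc"]["kendall"][0], res["A_sc"]["pearson"][0])
```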

To show the relationship between each influential factor and reservoir capacity, scatter plots and the coefficient of determination *R*^{2} are presented in Figure 7. Figure 7 shows that *A*_{sc}, *A*_{ws}, and *L*_{sc} are the three factors most strongly correlated with reservoir capacity, with *R*^{2} values of 0.486, 0.705, and 0.43, respectively, and all three are positively correlated with capacity. In addition, nonlinear correlations between the reservoir capacity and *S*_{sc}, *S*_{b}, and *R*_{sc} are also tested.

### Accuracy and sensitivity analysis of the estimation model

To verify the accuracy of the models and the sensitivity of the influential factors, experiments are conducted with three machine learning models (ANN, ELM, and PSO–ELM) and eight combinations of influential factors. Accuracy refers to the differences between the reference and estimated reservoir capacities across models under the same combination of influential factors. Sensitivity refers to the differences between the reference and estimated reservoir capacities for the same model under different combinations of influential factors. The differences are evaluated with the mean absolute percentage error (MAPE) (Khair *et al.* 2017) and the coefficient of determination *R*^{2}.
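Both evaluation metrics are simple to compute. The sketch below uses the standard MAPE definition, MAPE = (100%/*n*) Σ |(*V̂*_{j} − *V*_{j}) / *V*_{j}|, and, as an assumption, takes *R*^{2} as the squared Pearson correlation between measured and estimated capacities; the text does not state whether the alternative 1 − SS_{res}/SS_{tot} definition was used, and the two can differ. Function names and values are illustrative.

```python
import numpy as np


def mape(measured, estimated):
    """Mean absolute percentage error in percent; measured values must be non-zero."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return 100.0 * np.mean(np.abs((estimated - measured) / measured))


def r_squared(measured, estimated):
    """Squared Pearson correlation between measured and estimated values."""
    return np.corrcoef(measured, estimated)[0, 1] ** 2


measured = [1.0, 2.0, 4.0]       # toy capacities, e.g. in million m^3
estimated = [1.1, 1.8, 4.4]      # each estimate is off by 10%
print(round(mape(measured, estimated), 6))   # 10.0
```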

*A*_{sc}, *L*_{sc}, and *A*_{ws} are regarded as fixed factors because they are significantly correlated with the reservoir capacity. *S*_{sc}, *S*_{b}, and *R*_{sc} are chosen as optional factors because their correlations with the reservoir capacity are not significant. Different simulation scenarios are obtained by combining the fixed factors with the optional factors. The eight combination scenarios are shown in Table 2.

| Scenario | Fixed factors | Optional factors |
|---|---|---|
| 1 | *A*_{sc}, *L*_{sc}, *A*_{ws} | None |
| 2 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *S*_{sc} |
| 3 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *S*_{b} |
| 4 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *R*_{sc} |
| 5 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *S*_{sc}, *S*_{b} |
| 6 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *S*_{sc}, *R*_{sc} |
| 7 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *S*_{b}, *R*_{sc} |
| 8 | *A*_{sc}, *L*_{sc}, *A*_{ws} | *S*_{sc}, *S*_{b}, *R*_{sc} |
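The eight scenarios are the fixed factors plus every subset of the three optional factors (2³ = 8). A short Python sketch, with factor names written as plain identifiers for illustration, enumerates them in the same order as Table 2:

```python
from itertools import combinations

FIXED = ["A_sc", "L_sc", "A_ws"]       # always included
OPTIONAL = ["S_sc", "S_b", "R_sc"]     # added in every possible subset

# r = 0 gives scenario 1, r = 1 gives scenarios 2-4, r = 2 gives 5-7, r = 3 gives 8
scenarios = [FIXED + list(extra)
             for r in range(len(OPTIONAL) + 1)
             for extra in combinations(OPTIONAL, r)]

for number, factors in enumerate(scenarios, start=1):
    print(number, ", ".join(factors))
```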


ELM, ANN, and PSO–ELM models were established for each scenario. The 101 small reservoirs with known capacities were used to train and calibrate these models: 80% served as training samples and 20% as calibration samples. The calibration results of the three models under the different scenarios are shown in Table 3.

| Scenario | ANN MAPE | ANN outliers | ANN *R*^{2} | ELM MAPE | ELM outliers | ELM *R*^{2} | PSO–ELM MAPE | PSO–ELM outliers | PSO–ELM *R*^{2} |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 80.07% | 1 | 0.1853 | 29.12% | 0 | 0.7357 | 24.46% | 0 | 0.9082 |
| 2 | 39.56% | 0 | 0.5814 | 25.87% | 0 | 0.4473 | 13.22% | 0 | 0.9612 |
| 3 | 32.10% | 1 | 0.8305 | 32.76% | 1 | 0.7706 | 16.26% | 0 | 0.9506 |
| 4 | 48.53% | 1 | 0.5117 | 27.77% | 0 | 0.3621 | 16.02% | 0 | 0.9545 |
| 5 | 35.75% | 0 | 0.8051 | 19.27% | 1 | 0.7959 | 9.82% | 0 | 0.9685 |
| 6 | 38.74% | 0 | 0.6805 | 20.20% | 0 | 0.8521 | 10.69% | 0 | 0.9778 |
| 7 | 36.91% | 1 | 0.5470 | 31.39% | 1 | 0.2828 | 15.73% | 0 | 0.9659 |
| 8 | 32.06% | 1 | 0.6904 | 24.33% | 0 | 0.7917 | 10.59% | 0 | 0.9699 |


According to Table 3 and Figure 7, the MAPE of PSO–ELM is smaller than that of the other two models (ANN and ELM) in all eight scenarios, and the *R*^{2} of PSO–ELM is higher than that of the other two models in all scenarios. Therefore, compared with ANN and ELM, the PSO–ELM-based estimation model is better suited to estimating small reservoir capacity in hilly areas.

Furthermore, the sensitivity of the PSO–ELM model to the influential factors is tested across the scenarios. Scenario 1 has the largest MAPE (24.46%) and the worst estimation performance, which shows that the fixed factors alone cannot accurately estimate reservoir capacity. Although *S*_{sc}, *S*_{b}, and *R*_{sc} have no significant linear relationship with reservoir capacity, they are important for estimating small reservoir capacity. Adding one optional factor (scenarios 2, 3, and 4) improves both the MAPE and *R*^{2} relative to scenario 1. Adding two optional factors (scenarios 5, 6, and 7) further improves *R*^{2} relative to scenarios 2, 3, and 4 and generally reduces the MAPE, although the MAPE of scenario 7 (15.73%) is higher than that of scenario 2 (13.22%). When all three optional factors are added (scenario 8), the MAPE (10.59%) is lower than those of scenarios 6 and 7 but higher than that of scenario 5, and *R*^{2} (0.9699) is higher than those of scenarios 5 and 7 but lower than that of scenario 6. This shows that simply adding more influential factors does not keep improving model accuracy.

In fact, scenarios 5, 6, and 8 all provide good factor combinations. However, the MAPE of scenario 6 is slightly higher, and scenario 8 requires an additional factor, so these two can serve as backup schemes. Scenario 5 achieves the lowest MAPE of all scenarios while using fewer factors than scenario 8. Therefore, the five factors of scenario 5, *A*_{sc}, *L*_{sc}, *A*_{ws}, *S*_{sc}, and *S*_{b}, are selected as the influential factors for the PSO–ELM model that estimates small reservoir capacity in hilly areas.
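The trade-off behind this choice can be checked directly from the PSO–ELM columns of Table 3. The short script below is a convenience sketch with values transcribed from Table 3 and factor counts from Table 2; it ranks the scenarios by MAPE and by *R*^{2}. Scenarios 5, 6, and 8 occupy the top three places under both criteria, with scenario 5 first by MAPE and scenario 6 first by *R*^{2}.

```python
# PSO-ELM results transcribed from Table 3: scenario -> (MAPE in %, R^2, number of factors)
pso_elm_results = {
    1: (24.46, 0.9082, 3), 2: (13.22, 0.9612, 4), 3: (16.26, 0.9506, 4),
    4: (16.02, 0.9545, 4), 5: (9.82, 0.9685, 5), 6: (10.69, 0.9778, 5),
    7: (15.73, 0.9659, 5), 8: (10.59, 0.9699, 6),
}

by_mape = sorted(pso_elm_results, key=lambda s: pso_elm_results[s][0])
by_r2 = sorted(pso_elm_results, key=lambda s: pso_elm_results[s][1], reverse=True)
print("lowest MAPE first:", by_mape[:3])   # [5, 8, 6]
print("highest R2 first:", by_r2[:3])      # [6, 8, 5]
```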

To illustrate the improvements visually, the differences between the estimated and actual capacities of the three models under the eight combination scenarios are plotted as scatter graphs in Figure 8.

### Validation of estimation results

To further validate the robustness of the PSO–ELM-based reservoir capacity estimation model, the data of the 12 known small reservoirs in the Mihe River basin were removed from the training samples. Only the data of the 89 small reservoirs in the reference basin are used for training and calibration, and the 12 known reservoirs in the Mihe River basin are used for testing. The results are shown in Figure 9. Even though the training samples contain no information about small reservoirs in the target basin (the Mihe River basin), the MAPE increases to 15.6% and the *R*^{2} is 0.6863. Both values remain acceptable for the estimation model. This indicates that the estimation model built on the five influential factors and PSO–ELM is robust. Hence, the capacity of small reservoirs in data-scarce areas can be estimated by transferring an estimation model from a basin with similar topography, which extends the range of application of the proposed model.
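The transfer test is a leave-basin-out split: every reservoir from the target basin is excluded from training and used only for testing. The minimal sketch below assumes that the factors are stored in an array `X` with one row per reservoir, alongside a parallel array of basin labels. The labels and the function name are hypothetical.

```python
import numpy as np

def leave_basin_out(X, y, basin, target):
    """Split samples so that no reservoir from `target` is used for training.

    X: (n, k) factor matrix, y: (n,) measured capacities, basin: (n,) basin labels.
    Returns X_train, y_train, X_test, y_test.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    in_target = np.asarray(basin) == target   # boolean mask of target-basin reservoirs
    return X[~in_target], y[~in_target], X[in_target], y[in_target]
```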

### Estimation results of the capacity of small reservoirs

A total of 123 small reservoirs were identified in the study area, as shown in Figure 3. The capacity of 12 of them is known, and the capacity of the remaining 111 small reservoirs needs to be estimated. The five influential factors *x*_{ij} of these 111 small reservoirs were substituted into the trained estimation model to calculate the reservoir capacity *y*_{j}, as shown in Figure 10. The estimated capacities of the 111 small reservoirs range from 0.54 to 3.80 million m³. This lies within the small-reservoir capacity range defined in this paper (0.1–10 million m³), so the estimates are plausible.
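The plausibility check above amounts to a range test on the estimates. The sketch below assumes inclusive bounds of 0.1 and 10 million m³, as defined in the paper. The function name is hypothetical.

```python
import numpy as np

# Small-reservoir capacity range used in the paper, in million m³ (bounds assumed inclusive).
SMALL_MIN, SMALL_MAX = 0.1, 10.0

def out_of_range(estimates, lo=SMALL_MIN, hi=SMALL_MAX):
    """Return indices of estimated capacities that fall outside [lo, hi] million m³."""
    est = np.asarray(estimates, dtype=float)
    return np.flatnonzero((est < lo) | (est > hi))
```

An empty result shows only that the estimates are consistent with the small-reservoir definition. It does not establish the accuracy of any individual estimate.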

## CONCLUSION

The acquisition of hydrological data in areas without sufficient monitoring has long been a prominent and difficult topic in hydrological research. Based on remote sensing images and DEM data, this paper estimated the capacity of small reservoirs in data-scarce hilly areas.

Five topographic and geomorphic factors are selected as influential factors to establish the reservoir capacity estimation model for hilly areas: the area of the sub-catchment controlled by the reservoir (*A*_{sc}), the water surface area of the reservoir (*A*_{ws}), the longest flow path of the sub-catchment (*L*_{sc}), the average slope of the sub-catchment (*S*_{sc}), and the average slope of the buffer area (*S*_{b}). Among the factors that can currently be extracted from the available data, these five give the best estimation results.

Compared with ANN and ELM, PSO–ELM gives clearly better estimates of reservoir capacity in hilly areas. Because the PSO–ELM-based estimation model is robust, a model established in a reference basin with monitoring data can be transferred to a target basin that has similar topography but lacks monitoring data.

In addition, the proposed method is applicable to estimating the capacity of small reservoirs in hilly areas with a water surface area greater than 0.002 km². Its performance for other types of reservoirs has not yet been studied. As a next step, field surveys will be carried out on small reservoirs without sufficient data, and the estimated capacities will be compared with the measured ones.

## ACKNOWLEDGEMENTS

Financial support for this work is provided by the National Natural Science Foundation of Shandong Province (No. ZR2021QE009) and the science and technology projects of the Hydrology Center of Shandong Province: Impact of Rainstorm on Water Resources Management (No. SDYD2020-425).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper and the Supplementary Material.

## CONFLICT OF INTEREST

The authors declare there is no conflict.