In this article we present a novel nested dynamic programming (nDP) algorithm for multipurpose reservoir optimization with additional decision variables related to different water users. The nDP algorithm is built from two algorithms: (1) dynamic programming (DP) and (2) nested optimization algorithm implemented with Simplex and quadratic Knapsack methods. The novel idea is to include a nested optimization algorithm into the DP transition that reduces the initial problem dimension and alleviates the DP's curse of dimensionality. The nDP can solve multi-objective optimization problems, without significantly increasing the algorithm complexity and the computational expenses. Computationally, the nDP can handle dense and irregular variable discretization; it is coded in Java as a prototype application and has been successfully tested with eight objectives at the Knezevo reservoir in the Republic of Macedonia. The article presents a discussion on comparison of nDP with other DP methods and highlights the advantages of nDP.

## INTRODUCTION

Historically, the two most widely used methods for optimal reservoir operation have been dynamic programming (DP) and stochastic dynamic programming (SDP). These two methods suffer from the so-called ‘dual curse’ which prevents them to be used in reasonably complex water systems. The first one is the ‘curse of dimensionality’ that denotes an exponential growth of the computational complexity with the state–decision space dimension (Bellman 1957). The second one is the ‘curse of modelling’ that requires an explicit model of each component of the water system (Bertsekas & Tsitsiklis 1995) to anticipate the effect of each system's transition. The literature offers various strategies to overcome the curse of dimensionality such as successive approximations, incremental DP, and differential DP (Jacobson & Mayne 1970; Bellman 1986; Anvari et al. 2013; Li et al. 2013). The application of various DP and SDP methods in optimal reservoir operation are reviewed in Yeh (1985) and for multi-reservoir systems in Labadie (2004). One of the additional issues arises is when assessment of the constraints and objective functions requires running complex models (for example, river, a model to calculate water levels or flows downstream). One of the possible solutions is the use of fast approximate surrogate models (e.g., artificial neural networks replicating the behaviour of a hydrodynamic river model (Solomatine & Torres 1996)) – this reduces the total running time but still for real-life problems the use of DP can be prohibitively long.

This paper addresses the problem of optimal reservoir operation concerning multiple objectives that are related to: (1) reservoir releases to satisfy several downstream users competing for water with dynamically varying demands; (2) deviations from the target minimum and maximum reservoir water levels; and (3) hydropower production that is a combination of the reservoir water level and the reservoir releases.

Addressing such a problem with classical DP requires a reasonably high level of discretization of the reservoir storage volume, which in combination with the required releases discretization for meeting the demands of downstream users leads to computationally expensive formulations and causes the curse of dimensionality.

We present an alternative approach, in which at each transition of the classical DP an additional optimization algorithm is run to identify the optimal releases (allocations) to individual users. Because this second optimization algorithm is ‘nested’ inside the classical DP algorithm we name this method ‘nested dynamic programming (nDP)’. Depending on the way we formulate the objective function related to deficit in the allocation problem in the nested optimization, two methods are implemented: (1) Simplex for linear allocation problems and (2) quadratic Knapsack method in the case of non-linear problems.

The presented nDP algorithm was coded as a prototype application in Java using Eclipse. The DP part of the nDP was coded by using the example from the book by Loucks & Van Beek (2005). The nested optimization algorithms, Simplex and quadratic Knapsack methods are developed as separate modules. The nDP algorithm was tested at the multipurpose reservoir Knezevo of the Zletovica hydro-system located in the Republic of Macedonia, with the purpose of urban water supply, agriculture, ensuring ecological flow and generation of hydropower.

The article is organized in six sections. After the Introduction, the second section presents the problem, formulation of variables, objective functions and the nDP algorithm. The third section introduces the case study, which is followed by a section presenting the implementation of the nDP for this case. Results and discussion follow, then a section including the comparisons of the nDP with others DP methods, which is followed by conclusions.

## THE PROBLEM AND THE NDP ALGORITHM

### nDP framework

Typically in a single-reservoir optimization problem there is only one decision variable to be identified at each time step – the reservoir release. A specific characteristic of the problem considered in this paper is that this release is to be divided between n competing users, and this multiplies the total number of decision variables. This problem, if posed in the DP set-up using the Bellman equation (Bellman 1957) (for stages t = T−1, T−2, …1)

where st is the state vector representing discrete reservoir storage volume at the beginning of the period t; T is the number of stages in the sequential decision process; is the state value function; is the actions or decision variables vector during the period t; is the reward from period t when the current state is st, the action at is executed and the resulting state is st+1.

The reservoir model is based on the mass balance equation where qt is the reservoir inflow, et are evaporation losses and rt is the total reservoir release.

The nDP contains a nested optimization algorithm inside the DP algorithm that optimally allocates the total reservoir release rt to different users corresponding to their demands dit, as shown in Figure 1.

Assuming a backward-moving DP, at each nDP transition, the reservoir beginning and end states are known. From the mass balance Equation (2), the release rt can be calculated and then nested optimization algorithm is run to identify allocation of rt between n users. The inputs of the nested optimal allocation algorithm are the reservoir release, the users' demands and their relative importance, while the decision variables are the water volumes given to users (called subsequently the ‘users' releases') for satisfying their demands. The overall objective function may include terms related to users' releases, deviations from target reservoir levels, or both releases and levels (e.g., hydropower). At the end of the transition, all decision variables are estimated and the reward is calculated.

The nDP pseudo-code is presented below:

1. Discretize storage st and st+1 in m intervals, i.e., si,t (i = 1, 2, …, m), sj,t+1(j = 1, 2, …, m).

2. Set time at t = T − 1.

3. Set reservoir level i = 1(for time step t).

4. Set reservoir level j = 1(for time step t + 1).

5. Calculate the first group of the objective functions (related to deviations from target reservoir levels).

6. Calculate the total release rt using Equation (2).

7. If total release rt is bigger or equal to total demand then return 0, and go to step 9.

8. Execute the nested optimization algorithm to allocate the total release to all users {r1t, r2t, … rnt} in order to meet their individual demands.

9. Calculate the second group of the objective functions (related to users' releases).

10. Using the reservoir levels and the user releases, calculate the third group of the objective functions (related to hydropower production).

11. Combine the objective functions from steps 5, 8 and 9 into the aggregate objective function V(st).

12. j = j + 1.

13. If jm, go to step 5.

14. Select the optimal actions (decision variables) {a1t, a2t, … ant}opt, which consist of the optimal transition {st+1}opt and the users' releases {r1t, r2trnt}opt that give minimal value of V(st).

15. i = i + 1.

16. If im, go to step 4.

17. If t = 0, stop.

18. t = t − 1.

19. Go to step 3.

The optimal allocation algorithm is incorporated (nested) in the DP method (step 7) and directly updates the state value function V(st) at each time step consequently changing the optimal reservoir policy and solving the multi-objective optimization problem.

The action vector {a1t, a2t, … ant} consists of the transition state st+1, and the users' releases {r1t, r2t, … rnt}. The corresponding total release rt, at each transition is calculated from Equation (3). The et evaporation losses can be calculated at each transition using the reservoir area that is a function of the reservoir storage volume st, and other factors.

### Approaches to nested optimization of step-wise resource allocation

Depending on the problem formulation various methods can be used to optimally allocate the total reservoir release rt between n water users. Optimality could be understood in different ways and this is reflected in the objective function. We have chosen to use two methods: Simplex method in the case of linear problem and quadratic Knapsack algorithm for quadratic problem. Each water user is described with its demand dit and corresponding weight Wit at time step t. For the nested optimal allocation the following variables are relevant: d1t, d2t, … dnt are users' demands; W1t, W2t, … Wnt are the corresponding demands' weights; rt is the reservoir release; r1t, r2t, … rnt are the users' releases, v is the release discretization value.

Note that in the beginning of carrying out the nested optimization step, the nDP algorithm checks if the release rt can satisfy the aggregated demand of all users in Equation (3), step 7 in the nDP pseudo-code. If the release rt can satisfy the aggregated demand of all users, then the optimal allocation is not performed since all the releases can be set to their demands.

#### Simplex method

The quadratic Knapsack method is used when the objective function is non-linear, this is the case when the squared weighted deficit of the demand objectives is to be minimized. The reservoir release rt is assumed to be discretized in v levels; this value is set at the beginning and stays the same over nDP execution. The quadratic Knapsack objective function is to minimize with the same constraints previously described in Equations (5)–(7).

## CASE STUDY

The hydro system Zletovica is located in the eastern part of the Republic of Macedonia and it uses the water resources of the river Zletovica and its tributaries. It is composed of the reservoir Knezevo, several water distribution canals used for delivering water to downstream users and associated infrastructure structures, as shown in the schematic representation in Figure 2. The reservoir Knezevo is a multipurpose reservoir, and its main objective is to provide drinking water to several towns and populated areas in the region, as well as to provide environmental flows in Zletovica, water for agriculture and hydropower (in this order of decreasing priority).

There are five towns in this region (Kratovo, Probishtip, Zletovo, Shtip and Sveti Nikole) and two large agricultural irrigation regions named upper and lower zone. The town Kratovo is not modelled because the intake for this town is upstream from the Knezevo reservoir and only a small water quantity is required. The region is characterized with mountainous topography and the system is designed to include several small hydropower plants, most of which are located downstream of the reservoir Knezevo as derivational (run of river) power plants that utilize the natural head differences created by the topography. The hydropower system is still in development and according to the feasibility study report (GIM 2010) the plan is to build eight hydropower plants.

The GIM (2008) report contains a detailed hydrological model of the Zletovo river basin with four river flow measurement points on the river Zletovica. The monthly time series data of the flow measurement points from the year 1951–1990 are used in this research. There is a significant tributary inflow between the Knezevo reservoir and the last branching point. First, the tributary inflow is used to satisfy the water demand objectives, and if additional water quantities are needed then they are released from the reservoir.

## APPLICATION OF NDP IN THE CASE STUDY

### Formulation of the optimization problem

The Knezevo reservoir optimization problem has eight objectives Oi and six decision variables. Each objective is described by its target value and its corresponding weight at each time step. In this study, we aggregate all objectives into one objective function being the weighted sum of squared deviations over the entire time horizon; referring to the Bellman Equation (1) the reward function has the following form: where st is the reservoir storage volume at time step t, wit is the objective weight for a given objective i and time step t, and Dit is the difference between the target value and decision variable for a given objective i and time step t, ci is the balance coefficient explained below in this paper.
The first two objectives relate to deviations from the two target water levels: the allowable minimum water level and the maximum water level (exceeding which is dangerous since this may lead to an overspill via the reservoir spill outlets and possible flooding downstream). These objectives are formulated as follows:  where min lt and max lt are the recreation-related (minimum) and flood-related (maximum) water level targets.
Based on the hydro system configuration, our formulation has five users with water demand-related objectives. These are the following water users: (1) the towns of Zletovo and Probishtip (one intake); (2) the upper agricultural zone; (3) the towns of Shtip and Sv Nikole (one intake); (4) the lower agricultural zone; and (5) the minimum environmental flow, with their respective demands d3t, d4t, d5t, d6t, d7t. The water deficits are calculated using Equation (12) where rit is the release (decision variable) for a given objective (i) and time step (t).
The last objective is related to hydropower. Its corresponding formulation uses w8t as the hydropower energy production weight and D8t is calculated from where d8t is the hydropower target demand and pt is the hydropower production.

The action vector at consist of six actions or decision variables: st+1, r3t, r4t, r5t, r6t, r7t which are the next optimal reservoir state and water user releases at each time step. Using these decision variables, it is possible to calculate all other variables and objective functions.

The five hydroelectric power plants (HEC), named HEC0–HEC4 are dependent on the reservoir operation and they are the only ones considered in the optimization. (There are three additional hydropower plants located on the tributaries of Zletovica that are not considered.) HEC0 is positioned at the Knezevo reservoir and the entire reservoir release rt goes through the turbines of HEC0. The reservoir release rt is compared to the generator maximum water capacity HEC0max, as in Equation (14).
The r0t is the reservoir release quantity that goes over the turbines of HEC0. The energy generated by HEC0 (MWh) is The gen0 is the coefficient that includes total efficiency and it is taken from the GIM (2008) report; lt and lt+1are reservoir water levels at time t and t + 1. The others HEC1–HEC3 and HEC4 are calculated similarly, with one difference – for these derivational HECs the generated energy does not depend on the reservoir water levels directly, but only on the releases from the reservoir, which (together with corresponding tributary flows) constitute the total flow for each HEC. The actual head for these HECs is fixed (by topographical differences in elevation). The calculation of the hydropower production of HEC1–3 and HEC4 is somewhat more complex, so these details are not presented here as they are beyond the focus of this article. All hydropower plants together produce the total energy pt.

Since in Equation (9) various objective function types are aggregated into one, their values need to be adjusted so they are comparable in order of magnitude. The approach used was to assign the appropriate coefficients (ci in Equation (9)). A similar approach to balancing was taken in other previous research studies, e.g., Pianosi & Soncini-Sessa (2009) and Rieker (2010).

The nested algorithm is selected in the beginning of optimization. The nDP is constructed in a way that first the tributary flow (downstream of the Knezevo reservoir) is optimally divided between users' demands using the previously selected nested optimization algorithm. Afterwards, the remaining, still unsatisfied user demands are met from the reservoir releases by applying the optimization as described above.

### Optimization algorithm settings

The nDP algorithm was tested on the Zletovica hydro system using 40-year monthly data from 1951 to 1990, which resulted in 480 time steps. The reservoir operation volume is 23 × 106 m3 which was discretized in 115 equal levels (200 × 103 m3 each). The target for the recreation reservoir level was set at 1,035 [amsl], for the flood control reservoir level at 1,058 [amsl] and for the hydropower production at 1.5 GW-h per month. The targets for the consumptive users were set to their predetermined monthly demands.

To test and evaluate the proposed nDP algorithm and to compare results using the two nested optimization methods, Simplex and quadratic Knapsack, four different scenarios were considered, as shown in Table 1. They are based on two sets of weights for objective function and two formulations for the deficits in the nested optimization problem: (a) linear (hence the Simplex method, Equation (4)) named L1 and L2; and (b) quadratic (the quadratic Knapsack method, Equation (8)) named Q1 and Q2. In the quadratic Knapsack the discretization of allocation was set to 50. The higher weights are set for urban water supply objectives (w3 and w5) followed by ecological flow, irrigation, reservoir water level objectives and hydropower. The second set of weights gives higher priority to one of the urban water supply objectives (O5) at the expense of the two irrigation objectives (O4 and O6).

Table 1

Objective functions weights for the four considered scenarios L1, Q1, L2, Q2

Scenarios w1 w2 w3 w4 w5 w6 w7 w8
L1 and Q1 0.05 0.05 0.25 0.1 0.25 0.1 0.16 0.04
L2 and Q2 0.05 0.05 0.25 0.075 0.3 0.075 0.16 0.04
Scenarios w1 w2 w3 w4 w5 w6 w7 w8
L1 and Q1 0.05 0.05 0.25 0.1 0.25 0.1 0.16 0.04
L2 and Q2 0.05 0.05 0.25 0.075 0.3 0.075 0.16 0.04

## RESULTS AND DISCUSSION

The scenario optimization results present the comparison of the sums of (1) recreation and flood objectives' deviations, Equations (10) and (11), (2) water users' deficits, Equation (12), and (3) hydropower deficit, Equation (13), over the entire time horizon, shown in Figures 3,45, respectively. For all deviations/deficits, the sum over the entire horizon is calculated using Equation (16)
Figure 3

Comparison of the sums of recreation level (D1) and flood level (D2) deviations from the target over the entire time horizon for the four scenarios.

Figure 3

Comparison of the sums of recreation level (D1) and flood level (D2) deviations from the target over the entire time horizon for the four scenarios.

Depending on the formulation that is used in the nested optimization, the obtained differences in results can be observed in Figures 3,45. The quadratic formulation in Q1 and Q2 scenarios influence the D1 and D2 deviations, producing better results compared with the L1 and L2 scenarios, respectively, as can be seen in Figure 3. The D3 and D5 deficits in the L1 and L2 scenarios are at the expense of D4 and D6 (agriculture) that have higher deficits, as shown in Figure 4. When comparing L1 and L2 with Q1 and Q2 scenarios, it can be seen that D3 and D5 have less deficits when a linear formulation is used. The quadratic formulation (quadratic Knapsack) makes a more balanced distribution between the water demand users, which is due to its objective function, shown in Equation (8). The quadratic formulation objective function alleviates the D6 peak deficits that occur in L1 and L2 scenarios, at the expense of D3, D4 and D5. The quadratic formulation has better results concerning D8 hydropower deficits, as shown in Figure 5.

These differences in results when using linear or quadratic formulations in the nested algorithm come from the properties of the objective functions as described in Equations (4) and (8). Linear objective functions have the known characteristic that optimal solutions are always corner-point feasible solutions, which means that when the objective function is formulated as a weighted sum of separate linear objectives, the optimal solution will favour objectives (‘users' in our case) with higher weights. Therefore, D5 and D3 (which are with the highest weights) show smaller deficits for L1 and L2 than with Q1 and Q2. The quadratic formulation allows for a more balanced solution overall (with better trade-offs) and in fact alleviates huge deficits obtained with linear formulation for users with lower weights. However, linear formulation performs faster, and depending on the problem, targets and the ‘desired’ optimal reservoir operation, the modeller can choose between the linear and quadratic formulation. It is also important to note that the proposed solution is general in the sense that it allows for inclusion of other nested algorithms, and it is not restricted to the two proposed and tested here.

The w5 (corresponding to the urban water supply) is the highest weight in the L2 and the Q2 scenarios shown in Table 1, and consequently, it is the most satisfied water demand user, as shown in Figure 4. The D3 deficit in the L1 scenario is zero, but it has a significant increase due to the w5 increase in the L2 scenario, although w3 weight is unchanged, as shown in Figure 4. The D4 and D6 deficits are slightly increased in L2, compared to the L1 scenario. The D3 and D5 deficits are lower in Q2 compared to the Q1 scenario. These deficit reductions are achieved at the expense of D4 and D6 deficits. When comparing L1, L2, Q1 and Q2 scenarios considering D1, D2, D3, D7 and D8 deficits and their weights, it is obvious that the results are different even without changes of the corresponding weights, as shown in Figures 4,56. This happens because the change in one or more weights, in this case w4, w5 and w6, modifies the main objective function and, consequently, produces different results. The presented scenarios demonstrate that by changing the weights in accordance with the user preferences it is possible to create different optimal reservoir operations.

As it is difficult to present the 40 years nDP optimization results' graphs, a 3-year (1985–1987) sample was selected from the Q2 scenario to show some characteristic results. Figure 6 shows the variations of reservoir inflow, tributaries inflow, reservoir volume and reservoir outflow. Figure 7 shows the recreation and flood target’ deviations. Figure 8 shows water users' demands. Figure 9 shows water users' deficits and Figure 10 shows hydropower deficits. This 3-year period is a combination of two dry and one wet year in between them.

This 3-year period is characterized by the significant reservoir and tributary inflows in the spring months (from February until May) due to high precipitation and snowmelt, which is relatively small in the other periods, as shown in Figure 6. The reservoir releases are rising between June and October because of the increased demand of the agricultural users, as shown in Figures 6 and 8. The town users (d3t and d5t) and the ecology (d7t) demands are relatively constant, while the agriculture users' (d4t and d6t) demands are variable reaching a maximum in the summer months, as shown in Figure 8. The reservoir inflow and the tributary inflow in 1986 are larger and more widely distributed between January and June compared to 1985 and 1987 when there are high reservoir inflows in April and May, as shown in Figure 6. The difference between 1986 and the other 2 years, 1985 and 1987, is shown in the reservoir release shown in Figure 6, the water users' deficits in Figure 9 and hydropower deficits shown in Figure 10. The majority of users' deficits occur from April until October in 1985 and 1987, as shown in Figure 9. The deficits D4 and D6 (agriculture) are larger than D3 and D5 (towns) deficits in 1985 and 1987, because of the smaller weights w4 and w6 and their larger demand, as shown in Figures 8 and 9. If the linear formulation results were presented using the L1 or the L2 scenarios, the difference between the agriculture and the town's deficits would be even larger.

The reservoir satisfies (almost) all objectives in 1986, as shown in Figures 9 and 10. There are reservoir flood deviations (D2) in May 1985 and 1987 because of peak reservoir inflows, as shown in Figures 6 and 7. On the other side, there are recreational level deviations (D1) in October and December 1985. This coincides with the reservoir volume shown in Figure 6, which is lowest in periods between November and January. The last months of 1987 suggest that probably the next year is very dry. The reservoir stores additional volume in winter months (October–December 1987) at the expense of water demand users and the hydropower, as shown in Figures 6, 9 and 10.

The nDP algorithm was tested with a variable reservoir volume discretization using the Q2 scenario settings. The reservoir storage volume was discretized: (1) from 0 to 5 × 106 m3 and from 22 × 106 to 23 × 106 m3 in intervals of 200 × 103 m3; and (2) the rest was discretized in intervals of 150 × 103 m3 (resulting in the 144 discretization levels). The higher discretization has directly increased the computational resources and the running time, but it has improved the results (1–2%). This test shows that the nDP algorithm allows for higher levels of reservoir level discretization without prohibitive computational time, and for each problem, a balance between the desired level of accuracy and the computational time and resources can be determined. In addition, the nDP algorithm was tested with variable objective weights at different time steps.

In designing the optimal reservoir operation, one needs to be aware of the objectives' interdependence, in our case, users' releases, reservoir level and hydropower. The hydropower production is a function of the total users' releases (reservoir release) and the reservoir level. The hydropower weight influences both the total users' releases and the reservoir level. An additional scenario was executed based on the Q2 scenario settings, where the highest priority (weight) was set to the hydropower. The scenario result showed that the reservoir was filling to the highest possible level and afterwards made significant releases, increasing the hydropower production. The increased hydropower production directly influenced the users' releases and the reservoir levels. This shows that although the main objective function is composed of a sum of weighted objectives, there is additional complexity because of the objectives' interdependencies, which means that changing the objective weight influences other dependent objectives and the overall results. The desired optimal reservoir operation can be designed by tuning the weights.

The nDP makes an exhaustive search over all possible states and actions, executing the nested optimization at each transition, and making a significant number of calculations. For the presented set-up, the scenario execution time was under 5 minutes on a standard desktop computer. As expected, the execution time of the Q1 and the Q2 scenario was longer because of the more complex calculation in the quadratic Knapsack than in the Simplex optimization algorithm. The minor obstacle for the nDP implementation was the memory requirements that in our case were solved by increasing the maximum memory allocation pool for the Java Virtual Machine.

## DISCUSSION ON COMPARISON OF NDP WITH OTHER DP ALGORITHMS

### nDP compared with a classical DP algorithm

The presented case study has five water demand objectives. The classical DP approach would model these objectives as five different releases. Since the DP algorithm is an exhaustive search, all possible states and actions need to be evaluated. Let us assume that the discretization is the same and that the five releases are discretized on 10 × 103 m3.

The DP algorithm makes a transition between all states. Let us consider only one transition from full to empty reservoir. In this simple calculation, the possible combinations of five releases are below 2,3005. There are 480 time steps and over 3,000 transition at each time step. A simple assessment shows that the total number of DP transitions is around 1023, which makes this problem computationally unsolvable. For the applications considered, the usage of classical DP becomes impossible when there are several decision variables.

### nDP compared with an aggregated water demand DP algorithm

To decrease the optimization problem dimension and to allow for DP implementation a simplified sub-optimal approach can be taken: the water demand objectives can be combined into a single aggregated objective so that at the first stage the standard DP optimization can be employed. The resulting releases are quasi-optimal since only the aggregated demand is optimized rather than the individual ones. At the second stage, these quasi-optimal reservoir releases are distributed between the different water demand users for each time step according to their importance. Such an approach can be called the aggregated water demand (AWD) DP algorithm. It would be good to compare AWD DP with nDP but using the case considered in this paper it appeared to be difficult.

Let us consider a case similar to the Knezevo reservoir case with eight objectives: recreation level target, flood level target, five water demand users (two towns, two agriculture and ecology) and hydropower. The AWD approach will combine the water demand objectives into a single objective, and the rest of the problem will remain the same (Equations (17) and (18)):

The new optimization problem now has four objectives, (1) recreation level target, (2) flood level target, (3) (aggregated) water demand user and (4) hydropower, and it can be solved by AWD DP. In the first stage, the DP algorithm is executed with the four objectives and the total optimal reservoir releases identified. In the second stage, these releases are be distributed at each time step between the five users, forming allocations , , , and .

Let us assume that hydropower is a priority, and only the agriculture-related allocations and are used for producing hydropower. In this case, the AWD approach cannot be applied because it is difficult to create an objective function in DP that will represent the hydropower weight when all water demand users are aggregated (since hydropower is generated only by two out of four allocations). The solution for this problem would be to divide the water demand users and their allocations into two groups, (a) , and (b) , , , and in this way to represent the hydropower in the objective function. This will introduce the second decision variable in DP that considerably increases the dimensionality of the problem. If we make this case more complicated and assume that the water demand user is very important compared to the other users, and needs to be represented as a separate decision variable in DP, then the application of the classical AWD would become even more computationally demanding.

An alternative is to use the nested approach, as introduced in this paper, which makes it easy to introduce a large number of separate objective functions (e.g., hydropower or d3t prioritization as discussed above).

This consideration allows the conceptual difficulty in the accurate comparison of nDP and AWD in the case study of Zletovica to be seen; the main problem is that the objective functions' formulations for these two algorithms are different. Indeed, the AWD DP combines all water demands into a single objective in Equations (17) and (18) while the other objectives are the same, so the water demand objectives (and consequently the hydropower) are represented differently. The first stage of AWD DP uses the following objective function: The objective function used by nDP is shown in Equation (9). The difference is obvious in the user demand objectives, and where Even if we consider the results of the AWD DP second stage and combine them into one objective function with eight sub-objectives, it will not be the same as used in nDP. Another difference between the nDP and the AWD DP is that the second stage of the AWD DP does not change the reservoir releases, because they are already calculated in the first stage, while in the nDP at each transition the releases are calculated taking into consideration all objectives. This interplay between individual objective functions brings additional complexity into the problem of comparing these algorithms and it is planned to develop an example for demonstrating this in further research.

## CONCLUSIONS

This article presented a novel nested approach to solving a DP problem that allows for several objective functions at each time step. The use of the nDP algorithm was implemented in the Zletovica river basin case study with eight objectives and six decision variables. The nDP algorithm has the following advantages:

• It effectively alleviates the curse of dimensionality of a standard DP algorithm.

• It allows for considering all objective functions at each time step and hence allows for deeper optimization if compared to the sub-optimal approach where all water demands are aggregated (AWD DP).

• Computationally, nDP runs fast on standard personal computers. In the presented case study, optimization for one scenario required no more than 5 minutes on a standard desktop PC.

• The algorithm allows for employing dense and variable discretization of the reservoir volume and release.

• It supports using a variable (time-dependent) weight at each time step for every objective function.

• It provides a framework for including many objectives in the nested optimization algorithm without a significant change in the source code or an increase in the computational expenses.

• Different optimization algorithms can be used in the nesting for water allocation; however, since nested optimization has to be repeated multiple times (for each transition of DP) the algorithm used for this purpose needs to be fast. Thus, in the presented nDP framework we used Simplex and quadratic Knapsack algorithms which showed excellent performance.

• The nDP framework can support complex non-linear objective functions.

• The presented nesting approach can be applied not only in the deterministic DP context, but also in SDP, reinforcement learning and other optimization algorithms.

One of the limitations of the presented study is the fully deterministic formulation, without explicit consideration of uncertainties, so characteristic of the reservoir operation. Therefore, further research will be focussed on implementing the nDP approach in a SDP setting. In addition, it is planned to test the nested optimization approach in the problems solved by reinforcement learning.

## REFERENCES

REFERENCES
Anvari
S.
Mousavi
S. J.
Morid
S.
2013
.
J. Hydroinform.
16
,
907
921
.
Bellman
R.
1957
Dynamic Programming
.
Princeton University Press
,
Princeton, NJ
,
USA
.
Bellman
R.
1986
Dynamic programming and Lagrange multipliers
. In:
The Bellman Continuum: A Collection of the Works of Richard E. Bellman
(
Roth
R. S.
, ed).
World Scientific Publishing
,
Singapore
,
49
.
Bertsekas
D. P.
Tsitsiklis
J. N.
1995
Neuro-Dynamic Programming
.
Athena Scientific
,
Belmont, MA
,
USA
.
GIM
2008
Project for Irrigation System of Multipurpose Hydro System Zletovica
.
Civil Engineering Institute Macedonia
,
Skopje
,
Republic of Macedonia
.
GIM
2010
Fesibility Project for 8 Hydropower Centrals of Hydro System Zletovica.
Civil Engineering Institute Macedonia
,
Skopje
,
Republic of Macedonia
.
Jacobson
D.
Mayne
D.
1970
Differential Dynamic Programming
.
Elsevier
,
New York
,
USA
.
J. W.
2004
.
J. Water Resour. Plann. Manage.
130
,
93
111
.
Li
J.-Q.
Zhang
Y.-S.
Ji
C.-M.
Wang
A.-J.
Lund
J. R.
2013
.
Water Sci. Technol.
68
(
11
),
2458
2467
.
Loucks
D. P.
Van Beek
E.
2005
Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications
.
UNESCO Publishing
,
Paris
,
France
.
Pianosi
F.
Soncini-Sessa
R.
2009
.
Water Resour. Res.
45
,
W10430
.
Rieker
J. D.
2010
A Reinforcement Learning Strategy for Reservoir Operational Improvement of Riverine Water Quality
.
Department of Civil Engineering, Colorado State University
,
Fort Collins, CO
,
USA
.
Solomatine
D.
Torres
A.
1996
Neural network approximation of a hydrodynamic model in optimizing reservoir operation
. In:
Proceedings of the Second International Conference on Hydroinformatics
,
9–13 September
,
Zurich
,
Switzerland
.
Yeh
W.
1985
.
Water Resour. Res.
21
,
1797
1818
.