A Fokker–Planck–Kolmogorov equation-based inverse modelling approach for hydrological systems applied to extreme value analysis

This paper takes a stochastic approach to identify uncertainties in hydrological systems that can be applied to the study of hydrological extremes. The system to be identified is supposed to be governed by a stochastic differential equation of the Langevin type, whose parameters are found through the inverse solution of the equivalent Fokker–Planck–Kolmogorov equation. The study presents the algorithmic and numerical implementation for the inverse modelling process, along with the implementation of this approach in three study areas. Results showed a flexible method that made it possible to consider hydrological variability and seasonality during system identification. The identified system parameters rely on the internal–external driving factors of the analysed river basin and provide indications about the behaviour of extreme events in possible future climate scenarios or situations where internal system parameters are altered. While the study cases presented refer to non-stationary Markov processes that allow for one-dimensional systems identification only, the proposed methodological approach is a step in the right direction when it comes to identifying n-dimensional Markov processes/systems. doi: 10.2166/hydro.2017.079 om http://iwaponline.com/jh/article-pdf/20/6/1296/505765/jh0201296.pdf er 2021 Thomas Rosmann (corresponding author) Pontificia Universidad Javeriana, Maestria en Hidrosistemas, Carrera 7 No. 42-00, Bogota, Colombia E-mail: trosmann@javeriana.edu.co Efraín Domínguez Departamento de Ecologia y Territorio, Pontificia Universidad Javeriana, Bogota, Colombia


INTRODUCTION
. However, trend studies for discharge series have to take into account human activity as a major factor. Furthermore, in most of the above-mentioned studies, seasonal trends were found.
With the results of trend studies in mind, the assumption is that some kind of change is happening on the planet, which gives reason to further study the topic in depth.
One of the crucial questions that needs to be asked is as follows: Is the nature of extreme events changing and how can humans adapt to that change? To answer this question, it is not enough to state that extreme events are changing.
Rather, explanations are needed to also show how their underlying processes are working and which influences are causing the change (Rosmann ). A more profound understanding of the statistical characteristics of the time series under investigation can be achieved by taking a stochastic approach to studying the random processes that define them (Maldonado ; Kovalenko ). All of these previous studies show the ability of stochastic models to relate the probabilistic characteristics of hydrological processes with the system input signal and physio-geographic and other characteristics of the watersheds, in which they are located.
With the results of previous investigations in mind, the goal of this study was to develop an inverse modelling approach using the FPK equation to identify uncertainties in hydrological systems and the driving factors that influence their extreme values. A model based on the Langevin equation was developed by solving its equivalent FPK equation and therefore finding the drift and diffusion terms that describe the structure of the random process. It also tries to establish a relationship between the calibrated model parameters and the geomorphometrical basin parameters, as well as rainfall input to the system. This way, a relationship between the abstract model parameters with the physical and hydrological parameters of the river basin can be developed allowing a better understanding of the possible mechanisms that govern the dynamics of the underlying process. This, again, could indicate new practices in watershed management, prevention and mitigation of extreme events.

Stochastic model
To extract information from a hydrologic random variable, Yevjevich () proposed the use of four of its properties: (1) estimation of parameters and other characteristic values, (2) estimation of probability distribution functions, (3) modelling of processes in time and (4) the determination of properties of random variables.
The main parameter in the proposed model is discharge at a given point and is represented by a random process Q(t).
The process consists of 12 independent random variables q, indexed from 1 to 12, which represent the monthly mean, maximal or minimal discharges for each month of the year. Q(t) takes the form of The monthly mean, maximal or minimal discharges were calculated from the daily values of the obtained discharge time series.
Several authors reported that the rainfall-runoff process, at certain time scales can be considered as a Markov process (Naidenov & (2)).
Annual and monthly extreme runoff series' autocorrelation functions have correlation radii of lag one, which is the required autocorrelation radius for a process to be considered  The one-dimensional FPK equation derived from the Langevin equation (Equation (3)) takes the following form (Equation (4)) and was used to solve the inverse problem described in the following section: with k 1 , k 2 , k 3 , g 1 , g 2 , and g 3 being the initially unknown parameters of the system linked to drift (k i ) and diffusion (g i ).
In order to establish a relationship between the parameters of the Langevin and FPK equation, Sveshnikov () proposed the following: Substituting Equation (6) into Equation (8) gives Equation (9), as well as substituting Equations (5) and (9) into Equation (7) gives Equation (10): and Equation (3) becomes Since all of the above-presented parameters k 1, k 2, k 3 and g 1, g 2, g 3 are initially unknown, one of the main tasks of this paper was to establish a connection between these and the known and measurable parameters of the basin. These could be external parameters like precipitation or temperature and internal parameters such as the morphometry or the land cover of the basin.

Implementation of the inverse modelling FPK approach
To identify the stochastic mathematical operator that rules the hydrological response of a basin, an inverse modelling approach based on the FPK equation was implemented. This approach includes five steps as shown in Figure 1.
In the first step, a basin of interest was selected and related hydrometeorological and geomorphometrical data were retrieved. Precipitation and temperature stations well correlated with the discharges at the basin outlet were selected. As system output, mean and extreme discharges were calculated from the observed daily discharge data as the mean, maxima and minima of daily values. These mean and extreme values were applied as the basin's responses to the monthly mean and extreme event reference precipitation. The reference precipitation was calculated as the mean, maxima and minima value from the number of days previous to the extreme events that had the highest correlation with the discharge values. A more detailed description of the data selection criteria can be found in the data description section.
In the second step, the stochastic process was charac- Subsequently, the best fitted probability distribution was used as the initial condition for the FPK equation optimization problem for monthly mean, maximum and minimum discharge, respectively. The best fit distribution for each month was used, instead of inferring the best fit over all of the 12 months of the same random process and applying it to each month. The distribution functions were adjusted to The standard error function is given as follows: where r is the autocorrelation coefficient for lagged values, n is the total number of observations in each random variable and t is the critical value of the Student's t distribution with significance level α, which was chosen to be 5%.
To understand the process, it is very important to determine the form of the Langevin equation that describes the process of the observed river basin. This means that it is necessary to find the drift governing expression ψ(Q(t), t) and also the noise modulating function ζ(Q(t), t) of Equation  (5) and (6)), whose parameters were initially unknown. The base for the inverse problem was set, since these were the parameters whose values should be found in the optimization process in order to obtain the best simulation of the PDF translation from one month to the next.
In the fourth step, a numerical scheme was implemented to be used for the optimization of the unknown model par- and a time layer weight (λ) that controls whether the equation is solved explicitly (when equal to 0) or implicitly (when equal to 1).
In Equation (13), i denotes the interval descriptor for the time step and j for discharge. Due to the input data used, the increments of discharge j remain constant at 1 m 3 /s, and therefore, to guarantee the stability of the system, it is necessary for the increment of the time step to fulfil a Courant-Friedrichs-Lewy stability condition (Dominguez & Rivera ).
An absorbing-type boundary was used, which assumes that any value outside the upper and lower boundary of the PDF has a probability of 0. Therefore, it was necessary to scale the boundary values α and β sufficiently in order not to lose probability density in the tails (Gardiner ): All of the parameters were known except the drift and diffusion vectors, which were determined using an iterative optimization approach for Equation (13). First, initial guesses for the six parameters k 1 , k 2 , k 3 , g 1 , g 2 and g 3 were provided to the model to calculate the translation of probability density from one time step to the next. The mean average error between the observed and the modelled data of the translation's final month was calculated and the initial guesses were adapted by iteratively optimizing them and thereby minimizing the error. For this task, the curvefit optimization method in SciPy's optimize package was chosen, which uses the Levenberg-Marquardt algorithm (Moré ), applying the least squares method and returns a set of parameters corresponding to the minimized error.
As an additional control, a mean average error of less than 10% had to be achieved to accept the resulting set of parameters.
In the fifth and final step, the obtained optimal values of k 1 , k 2 , k 3 , g 1 , g 2 and g 3 were validated by establishing a relationship between them and the physical (morphometric and climatological) parameters of the basin, whose values could be deduced from observed data, such as precipitation and temperature time series. For this purpose, multiple regression analysis was applied, which was conducted between the optimal parameters and precipitation and temperature data to test the external influences on the basin on one side. On the other side, this was more difficult to do for internal basin parameters because no data were available about them. Some previous studies have related the coefficients of variation and skewness to internal parameters.

Data used
The aforementioned inverse modelling approach was implemented in four study areas, which should be located in different climatological regimes and at the same time count on high-quality data. For each of the selected basins, the lowermost discharge station available before its outlet was used for discharge data. All precipitation stations within the basin, which counted on a correlation of over 60% with the discharge data, as well as the closest temperature station to the discharge station, were also used for the analysis. The precipitation value was estimated as the mean of all the selected precipitation stations. The temporal range of data was chosen according to the availability of data in each river basin, but was at least 40 years up to the year 2012. The location of the four basins is shown in For each basin, the necessary discharge, precipitation and temperature data were collected and prepared as described above and presented in Figure 4. The initial conditions shown in this figure were the input data for the model optimization runs.

RESULTS
As mentioned above, the FPK equation model was used to simulate the transformations of probability density from one month to the next, which means that the probability density curve of one month was used to model that of the next. For this reason, the optimal model parameters were determined in order to provide the best possible simulation.
After the relationship to the physical parameters was established, these model parameters could be modified to simulate changed conditions and the system's response to them. As described in Equations (5), (6) and (11), these were the parameters k 1 , k 2 , k 3 , g 1 , g 2 and g 3 .
In the first set of model runs, unsupervised optimization was conducted, which did not take any assumptions and tried to optimize all six parameters. The results of this first run showed that it was impossible to find a global minimum of optimization and, therefore, a perfectly optimized solution.
It further led to the conclusion that a very large number of local minima could be obtained and therefore the procedure was very much dependent on the initial guesses that were used for the parameters. Changing these initial guesses in many cases led to completely different results. On the other hand, it was shown that an extraordinarily good fit could be achieved with a large number of different parameter sets, taking into account that it was quite possible to find a fit with a mean average error below 1% for all transformations from one month to the next for all four basins.
The ordinary least squares technique (Feldman & Valdez-Flores ) was used to establish a linear regression between the model and watershed parameters described above, but it was not possible to find any results that relate satisfactorily to the physical parameters of the basin.
For this reason, it could be concluded that an unsupervised optimization was not the most adequate approach to this problem, and that some kind of supervision had to be implemented. This supervised modelling approach meant reducing the number of equation parameters to be optimized and deducting some of their values from observed data. Without knowing the detailed characteristics of the river basin, this was more easily done for external basin parameters than for internal ones.
To decide how model parameters could be provided by available data, the following reasoning was applied. The analytical forms of the drift and diffusion coefficients in Equations (16) and (17) show that the drift term describes the instantaneous rate of change of the mean and the diffusion term the instantaneous rate of change of the squared fluctuations of the process at a given Q (Dominguez & Rivera ). This leads to the suggestion that a variable which is correlated to Q might influence the drift and diffusion coefficients.
Further data analysis showed that there was a correlation between the runoff and precipitation data, usually ranging from 0.6 to 0.8, as well as a slightly lower one between temperature and runoff minima. Therefore, monthly precipitation was chosen as the variable to provide the data used as an external basin parameter. The assumption was that the independent terms of the drift and diffusion equations, k 3 and g 3 , should be most closely related to the external parameters of the basin. The remaining four parameters, k 1 , k 2 , g 1 and g 2 , were optimized and their correlation to the discharge series' coefficients of variation and skewness was tested after an optimal result was obtained.
Leading from Equations (16) and (17), the following equations were proposed to calculate these fixed independent terms, k 3 and g 3 : where X represents monthly precipitation for mean data and reference precipitation for extreme value data of the initial and final month of the transformation. Regression analysis was applied to the optimal parameters of the remaining three basins, which indicated a relationship of the parameters k 1 , g 1 and g 2 with the coefficients of variation and skewness, therefore these three parameters were related to the internal parameters of the basin. The coefficient k 2 does not have a significant correlation with any coefficient, which makes sense, since the parameters k 2 and g 2 are associated with the squared nature of the process (see Equation (17)). Yet the expected value, to which the parameter k 2 is linked, does not produce polynomials of an order larger than 1. For extreme values, the changes in the coefficients could only be related to the parameter k 1 , which suggests that they are less influenced by the fluctuations and more by direct changes to the kernel of the system.

Model application
The calibrated model was applied to the observed data for     ism, although it was found that the model was extremely sensitive to initial guesses. In a possible next step, a preoptimization procedure could be applied to find more suitable initial guesses that provide better results.
The results of optimization and application of data to the model indicated very well that it can be used for river basins with a natural rainfall-runoff cycle. For all of the basins that rely on this characteristic, it was possible to calibrate the model and simulate future scenarios. The only exception was the Brisbane basin, where not even a calibration was possible. This implied that the model works well for natural basins, but cannot be applied to regulated ones, since the discharge data do not respond to the natural variations of the system anymore.
The proposed methodology of inverse modelling proved to be a powerful tool to effectively implement the studied problem. This way, the task of modelling a complicated physical process could be achieved without initially knowing its detailed structure. One conclusion is that this technique can most likely be applied to numerous other hydrological modelling problems, for example, in the assessment of hydrological risks, the study of hydrometeorological variability or further topics related to global climate change. Finally, the results of simulation of future scenarios showed that it is possible to simulate alterations for changes in the basin's external and internal parameters and produced one key indication. It became evident that the changes to internal basin parameters have a bigger impact on the behaviour of the runoff process than those to external parameters, especially for extreme events.
These results and the fact that a large number of changes to internal parameters are caused by human activity reiterates the fact that it is up to us to control future extreme events by better understanding the internal mechanisms of river basins and taking the right steps to control our impact on them. It became apparent that some impact is caused by changes in external parameters, for example, precipitation changes which might be attributed to the influence of global climate change, but they are not as strong.
To completely understand all of the mechanisms driving the alterations in the behaviour of extreme events, further investigations are necessary that focus more in-depth on the topic. Especially studies on the relationship between the drift and diffusion parameters and the basin's land use parameters as well as the geomorphometric characteristics should be encouraged, since no clear allocation of the model parameters to those internal parameters of the basins could be made in this study.