Probabilistic analysis of the controls on groundwater depth using Copula Functions

Groundwater is an essential water resource in the Yarkant River Basin Irrigation District, which is the largest oasis in Xinjiang, China. This study used a novel approach to analyze the relationship between groundwater depth and three driving factors by developing Copula Functions from monthly time series data collected from 16 monitoring wells. More speci ﬁ cally, marginal distribution and joint distribution functions were established, and the conditional probabilities for three data ranges were calculated using both two- and three-dimensional Copula Functions. The developed statistical models showed that groundwater exploitation, runoff, and surface water irrigation signi ﬁ cantly affected groundwater depth. The most signi ﬁ cant effect on water table declines was associated with groundwater exploitation and lagged 1-month behind the groundwater withdrawals. The amount of runoff and irrigation were both inversely related to water table depth due to groundwater recharge. Frank Copula Functions were found to produce the best ﬁ t to the joint distribution of the variables and were used herein, allowing for the establishment of a detailed probability distribution of the change in groundwater depth under the combined effect of multiple controlling factors. The results provided key insights into how irrigation and groundwater withdrawals can be adjusted to effectively manage groundwater resources.


INTRODUCTION
The Yarkant River is the longest tributary to the Xinjiang Tarim River, which is the largest inland river in China (Chen et al. ). The basin is located in central Asia and possesses a temperate continental arid climate (Sun et al. ; Zhang et al. ) that is located between the Taklimakan Desert and the Bulgari and Togak Deserts. Although the Yarkant River Basin is subject to low and irregular rainfall, high temperatures and evaporation, as well as notable periods of drought (Duethmann et al. ), it possesses a desert oasis consisting of fragile ecosystems. It also possesses the Yarkant River Basin Irrigation District (YRBID), which is the largest agricultural irrigation district in Xinjian.
The irrigation district is the most important area in China for the production of high-quality grain, cotton, and fruits (Chen et al. ), and agricultural water consumption is significant (Deng et al. ). The basin is a multi-ethnic area dominated by the Uighurs, and the minority population accounts for 89.85% of the total population of the basin.
The agricultural population represents an important proportion of the population.
The population and cultivated land area within the Yarkant River Basin have increased significantly in recent years and have consequently increased stress on its water resources and environment. In fact, shortages in water resources have severely restricted urban and agriculture development in the area (Zhang et al. ). To help alleviate these shortages, groundwater has become an important source of water (Zhang et al. ; Xiao et al. ).
Previous research has shown that groundwater resources can help maintain the social and economic development of the oasis, by offsetting the water requirements for agricultural irrigation and domestic use (Zhou & Li ).
However, the increased exploitation of groundwater in the area has accelerated the rate at which the depth to groundwater has increased (Wang et al. ). Moreover, research has shown that fluctuations in groundwater levels may affect hydrological processes in the entire basin, causing changes in surface water and groundwater interactions (e.g., groundwater recharge and discharge), as well as ecosystem dynamics (Mansuer et al. ). Thus, groundwater is an essential domestic, agricultural, and ecosystem resource that must be effectively managed (Du et al. ; Feng ).
During the latter part of the 20th century, the protection and sustainable utilization of groundwater resources in arid areas has gradually become a major concern (Mogaji et al.

). A few previous groundwater resource analyses in the
Yarkant River Basin attempted to document the spatial distribution and causes of groundwater declines. These analyses were based on (1) the use of Linear Regression Methods applied to measured time series data of the changes in groundwater (water table) depths (Chen et al. ), (2) the use of the Mann-Kendall method to analyze the temporal and spatial evolution of regional groundwater systems (Shataer ), (3) the use of the SWAT model to predict changes in the depth to the groundwater table and to analyze the dynamic response of regional groundwater under climate change (Liang ), (4) the use of Monte Carlo type algorithms to predict changes in groundwater dynamics (Gulupiya b), (5) the use of Wavelet analyses to determine changes and trends in water table depths (Liu et al. ), and (6) the calculation and analysis of the groundwater levels required to maintain sound ecological conditions in the area (Gulupiya a). All of these studies mainly focused on annual and/or interannual variations in the depth to groundwater and helped investigators define and predict fluctuations in water table levels through time. However, most of these investigations result in a point prediction in groundwater levels, rather than providing the probability of the range in changes in water table depths using specific statistical distributions. From the point of view of rational groundwater resources management, it is important to clarify the probability relationships between groundwater depth and the factors that determine the probability distribution of groundwater depth.
The Copula Function is an expression that links the joint cumulative distribution function of variables with the marginal cumulative distribution function of variables to explore the correlation between variables (Genest & Favre ). Univariate hydrological analyses cannot fully characterize the multiple controls on hydrological events, including changes in groundwater depth. Thus, the study of the frequency at which hydrological events occur by analyzing the joint distribution of multiple controlling variables has become an important focus of hydrological research (Schumann ).
Since the first application of the Copula Functions to the field of hydrology in 2003 (Michele ), they have been broadly used in hydrology to gain insights into, as well as model the relationship between, two or more parameters.
Their application includes the analysis of precipitation Theoretically, groundwater depth and its controlling factors can also be analyzed using Copulas Functions, thereby providing a probability perspective of the correlative structure between the depicted variables. However, due to the unique nature of the groundwater Herein, this paper expands on this earlier work by developing an explicit probabilistic Copula model of the structure of groundwater depth in an arid area, while considering more than two driving factors. More specifically, the objective of this study was to characterize the variations in  boundary of Jiashi County. The Kunlun Mountains are located to the south and north of the district. Within the study area, the average annual precipitation measures about 53.14 mm, whereas the annual potential evaporation is much larger, measuring about 2,196 mm. The annual average temperature is about 12 C, which is typical of an inland arid climate. Bachu County is located to the north of the Yarkant River irrigation district, whose change of groundwater depth is largest in the Yarkant River irrigation area.
Agriculture, industry, and household water are most dependent on groundwater resources in Bachu County.
Agriculture is the main economic activity in the region, and agricultural production depends on irrigation. Water for irrigation is derived from a combination of wells (groundwater) and canals. The irrigation area seldom produces runoff. Rather, surface water resources mainly come from snowmelt in the surrounding mountains. The uncertainty in surface runoff has led to overexploitation of groundwater resources. In addition, data show that cultivated areas within the irrigation district increased by 7.015 × 10 4 ha, while population increased by 7.784 million in the past decade. The ratio of groundwater use to total water use increased from 7.3% to 18.4% from 2006 to 2017. The ratio between surface water irrigation (SI) use and groundwater exploitation (GE) amount in the study area is shown in Figure 2(e). Groundwater exploitation, surface water irrigation, and runoff referred to GE, SI, and R, respectively.

Data
The measurements of groundwater depth used in the paper Monthly data series were used in this study; the monthly averaged data of groundwater depth, groundwater exploitation, surface water irrigation, and runoff were collected by and obtained from the Kashgar Hydrology Bureau and the Yarkant River Basin Authority. Figure 2 shows the temporal variations in these four variables over the monitoring period. Since the study area is in an arid region where the annual precipitation (∼50 mm) is much less than the evaporation (∼2,300 mm/year), groundwater recharge by precipitation is negligible. Therefore, only groundwater exploitation, surface water irrigation, and runoff (primarily from snowmelt) were considered as controlling factors for the groundwater depth analysis.

Groundwater depth and its controlling factors have complex
interrelationships, especially for short time intervals. In order to establish a proper Copula Functions between them, data preprocessing is needed to meet the required random and steady state. The procedure used to develop two-and three-dimensional (2D and 3D) Copula Functions is displayed in Figure 3 and described in detail below.

The Kendall rank correlation coefficient method
The Kendall rank correlation coefficient, commonly referred to as the Tau coefficient, is a non-parametric test used to measure the relationship between variables. It is used in this paper to test for hysteresis relationships between the groundwater depth and the potential driving factors.
Specifically, for the time series of groundwater depth, where X 0 ¼ (x 0 (t)), t ¼ 1, 2, . . . , n, different time lags were assigned to the time series of the controlling factors X i , such that ε ¼ 0, 1, 2, . . . , k. The time series of the controlling factors with time lags can be obtained, where The Kendall rank correlation coefficient method with time lags is expressed as follows: where Τ ranges between À1 and 1. When Τ is 1, the groundwater depth is completely consistent/varies with the driving factor, which is considered as a random variable. When Τ is À1, the groundwater depth varies indirectly with the driving factor. When Τ is 0, the coefficient indicates that groundwater depth and the driving factor are independent of each other.

The ARIMA model
The steady state of time series data is the premise of establishing Copula Function. The ARIMA model can transform a non-stationary time series into a stationary The equation used in the ARIMA model of order In Equation (2), φ(B) and θ(B) are polynomials of degree p and q, respectively: Through parameter evaluation, the optimal results of parameter d, p, and q used in the ARIMA model were 2, 0, and 14, respectively. In this study, the parameters of the ARIMA model were determined based on the information criterion function method (AIC and BIC) to obtain the optimal values of p and q (Chen ).

2D Copula Function
Copula Functions were firstly used by Sklar in the 1950s (Sklar ). The Copula Function models the nonlinearity, symmetry, or asymmetry of the dependence structure of the variables (Songbai 2016). It can be used to construct a multidimensional joint distribution function by using a marginal distribution and correlation structure (Salvadori & De Michele ). Copula methods represent one approach from a cohort of multivariate methods that are widely used to model the dependence structure of two (or more) random variables. Sklar's theorem states that when x 1 , x 2 , x 3 , . . . , x n are random variables, and function of them, F is the joint probability distribution function of the n-dimension. Therefore, there must exist a Copula Function C, which makes the following formula true: where C represents the Copula Function. The Copula C captures the dependence among the variables, irrespective of their marginal distributions. If the marginal distribution  Table 1.

3D Copula Function
Vine Copula (with Archimedean pairs) was selected to con- The Vine Copula structure is established by combining groundwater depth with any two driving factors. The joint distribution function of the 3D Copula Function under the C Vine Copula decomposition structure is given as follows: In the above expression, each Vine Copula Function can be decomposed into the product of a 2D Copula Function and an edge distribution function. The unknown parameters in the structure are generally estimated by the maximum likelihood method (Brechmann & Schepsmeier ).

Establishing a marginal distribution function
Before constructing a Copula Function, the marginal distribution of multivariate variables must be calculated. The AIC was calculated as follows: where n is the number of samples, RSS is the sum of residual squares, and m is the number of parameters.

Conditional probability distribution
For 2D variables, the marginal distribution functions of X and Y are expressed as follows (You et al. ): where X is the times series of controlling factors, Y is the times series of groundwater depth, and F is a marginal distribution of the time series. The conditional probability of event occurrence is given in the following equation: The 3D conditional probability is calculated as follows:  where X and Y are the times series of controlling factors, and Z is the times series of groundwater depth.

RESULTS AND DISCUSSION
Correlation between groundwater depth and the driving factors  (Table 3). When the groundwater depth was set to a lag of 1 month, there was a significant correlation with groundwater exploitation (p < 0.05). When the lag time was 0, the statistical significance level of the correlation coefficient between runoff, irrigation and groundwater depth ranged between 0.05 and 0.1. This lack of significant correlation suggests that there is no obvious lag effect between them in the area. Since the groundwater monitoring wells were not used for groundwater exploitation, it is reasonable to assume that there was a 1-month lag between groundwater exploitation and a change in groundwater depth at the monitoring wells. Therefore, this study used GE þ1Mo (indicates that groundwater exploitation occurs 1 month (Mo) ahead (þ) of the changes in groundwater depth), SI, and R time series data as driving factors to analyze the probability distribution of groundwater depth.

Stability test of driving factors and groundwater depth
As noted above, Copula models require that the time series

Establishing marginal distribution functions
Selecting an appropriate marginal distribution function for each variable is required to effectively construct the 2D Copula joint distribution function. Marginal distributions were preliminarily constructed by 20 empirical distribution functions (Table 2). Subsequently, Chi-square tests were applied to estimate parameters of the marginal distribution function and assess the goodness-of-fit. Twenty probability distribution functions were selected as candidates to develop the marginal functions for groundwater depth (GD) and the driving factors. The optimal distribution types of marginal functions for GD, GE þ1Mo , SI, and R Note: GE is groundwater exploitation, SI is surface water irrigation, R is runoff, GD is groundwater depth, lag0 À lag6 is driving factor lagging 0 À 6 months, respectively. **p < 0.05, *p < 0.1.
were the normal distribution, logistic distribution, t locationscale distribution, respectively. The Chi-square test showed that the marginal distribution functions of GD, GE þ1Mo , SI, and R all passed the 0.05 significant level (Figure 7).

2D Joint distribution functions
After determining the appropriate marginal distribution function for groundwater depth and its driving factors, 2D  Copula Functions were constructed between them. Groundwater depth (X ) and three driving factors (Y ) were arranged in ascending order, and then, data pairs were selected from  (Table 4) and is suitable to construct the joint distributions between GE þ1Mo -GD, SI-GD, and R-GD pairs.
The joint probability distributions between groundwater depth and three driving factors were examined using QQ plots ( Figure 8). The correlation coefficient (R 2 ) of joint dis-

2D Conditional probabilities
Conditional probabilities were developed by initially dividing the variable values for depth of groundwater below the surface, groundwater exploitation, surface water irrigation, and runoff into three ranges. Subdividing the data limited the effect of range length on the constructed probabilities.
As previously noted, there is a 1-month lag between the timing of groundwater exploitation and groundwater depth.
Thus, GE þ1Mo was used to analyze the probability distribution of the variations in groundwater depth. The probability distribution of groundwater depth under the influence of groundwater exploitation is shown in Table 5.
Regardless of the impacts of the other factors, when the volume of groundwater exploitation was in the maximum range (3,355.41 × 10 4 -5,033.12 × 10 4 m 3 ), the probability that groundwater depth was in the maximum range (4.78-5.49 m) reached a maximum value (0.53) after a 1-month lag time. Moreover, the larger the groundwater exploitation, the higher the probability that groundwater depth would be in the maximum depth range. Similarly, the smaller the groundwater exploitation, the smaller the probability that groundwater depth was in the maximum range of depth  Note: GEþ1Mo is groundwater exploitation (10 4 m 3 ), GD is groundwater depth (m), P is probability. Note: SI is surface water irrigation (10 4 m 3 ), GD is groundwater depth (m), P is probability. values (0.13). This analysis illustrates the strong positive dependency of groundwater depth on groundwater exploitation. Table 6 shows the probability distribution of groundwater depth under a change in surface water irrigation.
When the volume of surface water irrigation was within the maximum range ( The probabilities that the groundwater depth was within a specific range given a defined range of groundwater exploitation and runoff are shown in Table 8. Given the mutual effect of runoff and groundwater exploitation on groundwater depth, the first line of Table 8 indicates that when runoff is in the range of 0-6.45 m 3 , and the groundwater exploitation is in the range of 0-1,677.71 × 10 4 m 3 during the previous month, the probability that groundwater  9 show the probability that the groundwater depth is within different ranges under the mutual effects of surface water irrigation and runoff. The correlations between runoff, surface water irrigation, and groundwater depth are negative. When both runoff and irrigation increase, groundwater depth is reduced. In contrast, the correlation between groundwater exploitation and groundwater depth is positive. When runoff and irrigation increase, along with exploitation, the depth to groundwater will also increase. The influence of groundwater exploitation on groundwater depth is greater than that of surface water irrigation and runoff. These relationships also agree with the conclusions derived from the 2D Copula analysis. In terms of the actual situation of the YRBID, the surface water resources of the region mainly occur in the form of ice Note: GEþ1Mo is groundwater exploitation (10 4 m 3 ), SI is surface water irrigation (10 4 m 3 ), GD is groundwater depth (m), P is probability. The application of multidimensional Copula analysis allows the interrelations, restrictions, and detailed changes that occurred through time in the multivariate groundwater system to be more fully understood. Moreover, the probability that groundwater depth was in a defined range could be assessed. The analysis provided critical information on the probability of how the depth of the groundwater table would change in response to selected driving factors. The results provide a power management tool in that it provides insights into how groundwater exploitation and irrigation can be adjusted to control groundwater levels in the area.
For example, in the future, we will examine the ecological impacts of changing groundwater levels in the area, and establish Copula Functions to estimate the appropriate level of groundwater exploitation given a specific runoff volume and amount of irrigation. Similarly, if a certain amount of groundwater exploitation is planned, it can be determined if groundwater levels will remain appropriate given the amount of surface irrigation.
This study used 8 years of monthly monitoring data from 16 wells in the area, and these data can be used to assess monthly and seasonal changes in groundwater depth. In future research, the data can be further expanded to obtain a more accurate functional probability relation. Furthermore, although this investigation pertains to the arid oasis, the approach could be used in other semi-arid to arid regions consisting of similarly structured groundwater flow systems. The analyses showed that Copula Functions are an effective method to analyze the probability of changes in groundwater levels in an arid oasis, and the response of groundwater depth to multiple driving factors. Specifically, this study found that in YRBID, Frank Copula Functions produced the best fit for generated joint distribution functions between groundwater depth and its driving factors. A 3D-Vine Copula Function was proposed for the first time in groundwater analysis. The results showed that groundwater exploration is the main factor leading to changes in groundwater depth, whereas the influence of runoff and surface water irrigation on groundwater depth was relatively small.

CONCLUSIONS
Observations from the Irrigation District suggest that groundwater depth has become deeper in recent years than is typical. The model generated here can be used as a management tool to assess changes in groundwater depth as a function of groundwater exploitation, irrigation, and runoff. For example, the data could be used to adjust the volume of surface water irrigation during a given amount of groundwater exploitation to maintain groundwater levels within a suitable (defined) range.
This study provides an effective research method for the probability analysis of groundwater depth in irrigated areas of an arid region, serving as an effective theoretical and technical approach to the sustainable utilization and management of groundwater resources in a basin.