Water resources in China, especially in the major basins of the Songhua and Yangtze rivers, are characterized by uneven distribution both temporally and spatially, leading to notable challenges in per capita water availability. In this study, we investigate the ensemble probability distribution of annual runoff over the past 70 years in two of China's major watersheds: the Songhua River and the Yangtze River. By dividing each basin into several regions from upstream to downstream, based on annual mean discharge as a proxy for annual runoff, we observed a significant correlation between the annual runoff of the control sections and those of the upstream and downstream regions in these large watersheds. Consequently, this study establishes the probability distribution of annual runoff for each region within the basins, anchored on the design annual runoff of the control sections, using the joint bivariate logarithmic normal distribution of two interrelated random variables. Furthermore, we developed a relationship between the conditional probability distribution and correlation diagram data, and a regression equation correlating regional annual runoff with the watershed control section's runoff. This investigation into the historical patterns of runoff over the past 70 years provides a comprehensive understanding of the dynamics in these critical watersheds.

  • Runoff time series of 70 years or more (1954–2023 for the Songhua River and 1950–2023 for the Yangtze River) are analyzed.

  • The study area is two large watersheds in China: the Songhua River and the Yangtze River.

  • The results of this study offer a foundational reference for future water resource management strategies and planning in China.

Water resources in China are unevenly distributed in both time and space, and per capita water resources are limited (Jiang 2009; Niu et al. 2022; Wu et al. 2023). With rapid socio-economic development, conflicts between upstream and downstream areas over water supply, ecological needs, and other water uses have become increasingly prominent in many areas. In recent years, water allocation has been carried out on a yearly basis at the basin scale to address these problems. The long-term variation of runoff is highly uncertain and, because of its pronounced long-term autocorrelation, can be regarded as a stochastic process. This behavior, known as long-term persistence, is evident across paleoclimatic, climatic, and annual scales (Pizarro et al. 2022; Guo et al. 2024). Therefore, the probability distribution of future runoff is often used as the basis for water resources planning and allocation and for flood control regulation. Because rainfall and runoff are unevenly distributed in space, the probability distribution of annual runoff may differ among the regions of a watershed, especially in large basins. For basin-wide water allocation, the coordination of regional water transfers should account for the different probability distributions of annual runoff in each region of the basin, that is, the probability combination problem.

Commonly used methods for analyzing the probability combination of runoff, or the joint probability of different elements in different regions (rivers), include the following: the multivariate normal probability distribution, which has been used to analyze the probability combination of flood peak and flood volume (Morán-Vásquez et al. 2022, 2023); the multivariate joint probability distribution derived from the marginal log-normal distributions of runoff (flood) in different regions (rivers) (Hangshing & Dabral 2018; Liu et al. 2018; Latif & Mustafa 2020; Zhong et al. 2021); the two-dimensional normal distribution, which is used to calculate the coincidence probability of basin design floods (Chen et al. 2012; Jianping et al. 2018); the multivariate joint probability distribution of annual runoff in different watersheds, derived by converting runoff into one-dimensional variables (Wang et al. 2009; Jiang et al. 2017); and copula functions applied to multivariate joint probability distributions in hydrological analysis (Peng et al. 2017; Guo et al. 2018; Lilienthal et al. 2018; Liu et al. 2020; Xiang et al. 2020; Latif & Mustafa 2021; Wen et al. 2022). Koutsoyiannis et al. (2008) conducted a comprehensive comparison of stochastic and deterministic methods for medium-range flow prediction in the Nile River. These methods mostly address runoff (flood) problems involving different regions (rivers) or different joint probability elements.

However, the most common problem of the probability combination of annual runoff in basin-wide water allocation is the corresponding situation of runoff in different regions of the basin, that is, the probability combination of total runoff and runoff in each region of the basin when the runoff with design probability occurs at the control section of the basin. In the conventional design flood, there were two methods to deal with this problem: the typical year method and the regional composition method with the same probability (Jing et al. 2020; Xiong et al. 2020). In the typical year method, several representative years with unfavorable runoff conditions for regulation are selected from the measured data. The corresponding design runoff for each region is then calculated based on the runoff composition proportion of each region and the design runoff at the basin control section. This method assumes that the selected representative years adequately reflect the overall hydrological behavior of the basin, including the concept of concentration time (Giandotti 1934; Grimaldi et al. 2012). The regional composition method, on the other hand, assumes that each sub-region within the basin has the same probability of flood occurrence as the control section of the entire basin. Using this assumption, the corresponding runoff for each sub-region is calculated based on the water balance principle. This method provides a detailed analysis by considering the contributions and hydrological characteristics of each sub-region. Both methods are comparatively reliable and suitable for flood–prevention design, as they address different aspects of hydrological analysis and management.

The runoff at the basin control section is an aggregation of the runoff from different regions within the basin. In recent years, numerous studies have focused on the spatial correlation among flows in sub-basins. For instance, Grimaldi et al. (2012) explored the stochastic generation of spatially coherent river discharge peaks for continental event-based flood risk assessment. This study highlights the importance of understanding and predicting the spatial distribution of river discharge during large-scale flood events. Additionally, our research provides new insights and methods by analyzing the spatial correlation among sub-basins, further enhancing the understanding of complex hydrological processes within basins. At the same time, there is also a correlation between the runoff of the basin control section and the runoff of each region in the basin (Yang et al. 2022). Therefore, when the runoff of a basin control section is a specified design value, the runoff of each region in the basin may be randomly distributed in a relatively small range. Under this condition, the mean square deviation of the probability distribution is less than the corresponding value of the measured regional runoff data series in the basin (Fischer & Schumann 2021). This paper studies the probability combination problem of annual runoff in a river basin, which is the most commonly used index in water resources planning and allocation (Cai et al. 2021). This paper attempts to calculate the probability distribution of annual runoff of each region in the basin under the condition that the design value of the basin control section is given.

The probability distribution of annual runoff of each region in a given basin is derived from the two-dimensional joint probability distribution of two related random variables. The functions used for joint probability distribution are mainly the logarithmic normal distribution, exponential distribution, Gumbel function, and P-III function (Nerantzaki & Papalexiou 2022). The logarithmic normal distribution is a function that has been proved to be suitable for runoff probability distribution by long-term practices (Vivekanandan & Srishailam 2021; Lee et al. 2022; Scala et al. 2022). Previous research has extensively explored the dynamics of water resources, particularly in the context of annual runoff and its probability distribution (Herman et al. 2020; Zhu et al. 2020; Gao et al. 2021; Hu et al. 2021; Zhou et al. 2022). These studies laid the groundwork for understanding the intricacies of water allocation in large basins. However, as the challenges of water distribution grow more complex with increasing demand and climatic uncertainties, there is a pressing need for refined methodologies that consider both spatial and temporal variations. The present study builds on this foundation by offering a nuanced analysis of the Songhua River and Yangtze River basins, integrating the probability distribution of annual runoff across different regions of the basin. Such an approach not only provides a comprehensive understanding of the basins' water dynamics but also offers practical insights for policymakers and stakeholders in water resource management.

The regions of a small basin are usually affected by the same weather and rainfall events, so the correlation of annual runoff among them is generally higher than in large basins. Therefore, the Songhua River and Yangtze River basins, which have large drainage areas, are selected as the study areas in this paper. The Songhua River basin is located at 41°42′–51°48′ north latitude, with a drainage area of 556,800 km², a temperate continental monsoon climate, and an average annual precipitation of 525 mm. Jiamusi hydrological station, with a drainage area of 528,277 km², is the outlet of the basin and controls essentially all of the basin's runoff. The Yangtze River basin is located at 24°30′–35°45′ north latitude, with a drainage area of 1,800,000 km², a temperate to subtropical continental monsoon climate, and an average annual precipitation of 1,067 mm. Datong hydrological station, with a drainage area of 1,705,383 km², is the outlet of the basin and controls essentially all of the basin's runoff.

Because the Songhua River and Yangtze River basins are large and have complex tributary systems, key hydrological stations along the main stream are used as control nodes to divide each basin into regions from upstream to downstream. For example, the runoff between Yichang and Hankou on the Yangtze River mainly comprises the runoff of the Hanjiang River tributary and Dongting Lake. Three key hydrological stations on the main stream of the Songhua River, namely Jiangqiao, Harbin, and Jiamusi, are selected as control nodes, dividing the basin into three regions from upstream to downstream: above Jiangqiao, from Jiangqiao to Harbin, and from Harbin to Jiamusi. The annual runoff probability combination of the three regions and the Jiamusi control section is analyzed, and the conditional probability distribution of annual runoff in the three regions under given runoff values at Jiamusi is calculated. Four key hydrological stations on the main stream of the Yangtze River, namely Cuntan, Yichang, Hankou, and Datong, are selected as control nodes, dividing the basin into four regions from upstream to downstream: above Cuntan, from Cuntan to Yichang, from Yichang to Hankou, and from Hankou to Datong. The locations of the four hydrological stations are shown in Figure 1. The annual runoff probability combination of these four regions and the basin control section is analyzed, and the conditional probability distribution of annual runoff in these four regions under a given runoff value at Datong is derived.
Figure 1

(a) Overview map of the study area. (b) The Songhua River basin and its main channels. (c) The Yangtze River basin and its main channels.


The data series used cover the years 1954 to 2023 for the Songhua River basin and 1950 to 2023 for the Yangtze River basin. The hydrological year runs from May 1 to April 30 of the following year for the Songhua River basin, and from April 1 to March 31 of the following year for the Yangtze River basin. The annual mean discharge is used to represent the annual runoff.

The control nodes are the Jiangqiao, Harbin, and Jiamusi hydrological stations in the Songhua River basin, and the Cuntan, Yichang, Hankou, and Datong hydrological stations in the Yangtze River basin. The annual mean discharge series of the region between two stations is obtained by subtracting the annual mean discharge of the upstream station from that of the downstream station. The statistical characteristic values of annual runoff in the three regions and the control section of the Songhua River, and in the four regions and the control section of the Yangtze River, are shown in Table 1.
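
The differencing step described above can be sketched in a few lines of Python. This is only an illustration: the function name `regional_series` and the three-year sample values are hypothetical and are not measured data from the study.

```python
def regional_series(upstream, downstream):
    """Annual mean discharge (m3/s) contributed between two stations.

    For each hydrological year, the inter-station contribution is the
    downstream discharge minus the upstream discharge.
    """
    if len(upstream) != len(downstream):
        raise ValueError("both series must cover the same hydrological years")
    return [d - u for u, d in zip(upstream, downstream)]


# Hypothetical three-year sample, NOT measured data from the study.
jiangqiao = [650.0, 700.0, 600.0]      # upstream station
harbin = [1300.0, 1350.0, 1200.0]      # downstream station
jiangqiao_to_harbin = regional_series(jiangqiao, harbin)
print(jiangqiao_to_harbin)
```

A quick consistency check on this construction is that the regional means in Table 1 add up to the mean at the control section: 669 + 640 + 753 = 2,062 m³/s for the Songhua River, and 10,851 + 2,626 + 8,840 + 5,922 = 28,239 m³/s for the Yangtze River.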

Table 1

Regional characteristic statistics of the Songhua River and Yangtze River basins

| Basin | Station (region) | Area (km²) | Mean annual runoff (m³/s) | Mean square deviation | Skewness | Kurtosis | Lag-1 autocovariance | Lag-10 autocovariance |
|---|---|---|---|---|---|---|---|---|
| Songhua River | Above Jiangqiao | 162,569 | 669 | 327 | 1.05360 | 4.2429 | 20,546 | 6,577.6 |
| | Jiangqiao–Harbin | 227,200 | 640 | 255 | 0.20032 | 1.9661 | 13,257 | 7,161.9 |
| | Harbin–Jiamusi | 138,508 | 753 | 319 | 0.48030 | 2.7730 | 36,529 | −7,802 |
| | Jiamusi | 528,277 | 2,062 | 732 | 0.38642 | 2.5241 | 182,960 | −15,506 |
| Yangtze River | Above Cuntan | 866,559 | 10,851 | 1,291 | 0.17784 | 2.8389 | 245,570 | 313,310 |
| | Cuntan–Yichang | 138,942 | 2,626 | 632 | 0.00024 | 2.7037 | 86,452 | −13,614 |
| | Yichang–Hankou | 482,535 | 8,840 | 1,567 | 0.08571 | 2.7008 | 86,508 | −22,868 |
| | Hankou–Datong | 217,347 | 5,922 | 1,572 | 0.58998 | 3.2594 | 508,380 | −496,440 |
| | Datong | 1,705,383 | 28,239 | 3,745 | 0.43899 | 4.1316 | 1,245,900 | 150,300 |

In this paper, key hydrological stations in the Songhua River and Yangtze River basins were selected to analyze the correlation between the annual runoff of each region between adjacent stations and the annual runoff of the basin control section. The empirical cumulative probability of annual runoff at each of the two control sections was fitted with the log-normal distribution function to determine the discharges corresponding to cumulative probabilities of 50%, 75%, and 90%, and the corresponding virtual runoff series were then calculated for each of these discharges (Figure 2).
Figure 2

Flowchart of the study.


The probability combination relationship of annual runoff in the basin is analyzed using the annual mean discharge of the hydrological year to represent the annual runoff. The control section of the basin represents the runoff of the whole basin, and the annual runoff of the control section is an important indicator for the planning and allocation of water resources in the basin. Therefore, the probability combination relationship between the annual runoff of each region in the basin and the annual runoff of the basin control section is established. According to the correlation between the annual runoff of each region in the basin and the annual runoff of the basin control section, the probability distribution of annual runoff of each region in the basin is deduced when the annual runoff of the basin control section is a given design value.

The analysis starts from the correlation between the annual runoff of each region in the basin and that of the basin control section. The runoff at the basin control section is the sum of the runoff from each region of the basin, and runoff in different regions of the basin is often generated by rainfall from the same weather system. Therefore, the annual runoff of each region in a basin should be correlated to some degree with that of the basin control section.

For two linearly correlated data series X and Y, a linear regression equation can be established by the least squares method.
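
As a minimal sketch of this step, the following Python code fits a straight line by ordinary least squares and also returns the correlation coefficient. The function name `least_squares_line` and the sample discharges are hypothetical and serve only to illustrate the calculation.

```python
from statistics import fmean


def least_squares_line(x, y):
    """Fit y = a * x + b by ordinary least squares; return (a, b, r)."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                   # slope
    b = mean_y - a * mean_x         # intercept: the line passes through the means
    r = sxy / (sxx * syy) ** 0.5    # Pearson correlation coefficient
    return a, b, r


# Hypothetical annual mean discharges (m3/s), NOT data from the study.
control = [1500.0, 2000.0, 2500.0, 3000.0]   # X: basin control section
region = [480.0, 650.0, 790.0, 960.0]        # Y: one region of the basin
a, b, r = least_squares_line(control, region)
print(f"Y = {a:.4f} * X + {b:.2f}, R = {r:.3f}")
```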

In hydrological probability statistics, the empirical (expectation) cumulative probability of runoff is calculated from a measured data series as follows:
$$P(x_i) = \frac{i}{n+1} \times 100\% \tag{1}$$
where n is the length of the data sequence; i is the rank of a value when the data sequence is sorted from large to small; and P(x_i) is the probability of exceeding the ith value of the sorted sequence.
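
The ranking procedure can be sketched as follows, assuming that the expectation formula takes the common plotting-position form i/(n + 1). The function name and the sample values are hypothetical.

```python
def exceedance_probabilities(series):
    """Pair each value with its empirical exceedance probability i / (n + 1).

    Values are ranked from large to small, so rank i = 1 is the largest value.
    """
    ordered = sorted(series, reverse=True)
    n = len(ordered)
    return [(value, i / (n + 1)) for i, value in enumerate(ordered, start=1)]


# Hypothetical annual mean discharges (m3/s), NOT data from the study.
sample = [2100.0, 1500.0, 3200.0, 1800.0]
ranked = exceedance_probabilities(sample)
for value, p in ranked:
    print(f"{value:8.1f}  P = {p:.0%}")
```

With this form, no observed value is assigned a probability of 0% or 100%, which leaves room for events outside the measured range.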

To obtain the runoff value for a specified design probability, the empirical probabilities calculated by Equation (1) need to be fitted with a theoretical distribution function, such as the log-normal distribution.

According to the characteristics of the normal distribution, the median of the log-normal distribution, that is, the value at P(x) = 50%, corresponds to the mean value of the lnX sequence ($E_l$). Experience in hydrological probability analysis shows that, for runoff sequences in China, the median runoff $e^{E_l}$ corresponding to $E_l$ is less than the mean value of the runoff sequence.
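
This relationship can be checked against the rounded values reported in Tables 1 and 3 for the two control sections. The short sketch below only exponentiates the tabulated log-means; the dictionary name is hypothetical.

```python
import math

# (mean of ln X from Table 3, mean of X in m3/s from Table 1)
control_sections = {
    "Jiamusi": (7.566, 2062.0),
    "Datong": (10.240, 28239.0),
}

for name, (mean_log, mean_x) in control_sections.items():
    # Discharge at P(x) = 50% under the log-normal fit.
    median = math.exp(mean_log)
    print(f"{name}: median = {median:.0f} m3/s, mean = {mean_x:.0f} m3/s")
```

For both stations the median implied by the log-mean is lower than the arithmetic mean, as expected for a positively skewed series.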

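Based on the two-dimensional joint probability distribution of two related random variables, the probability distribution of annual runoff in each region of the basin is derived under the condition of a given annual runoff value at the basin control section. The log-normal distribution has been shown through long-term practice to be applicable to the probability distribution of river runoff in China, and it yields a general analytical form. The two-dimensional log-normal distribution function is therefore selected to derive the conditional probability distribution formula of regional annual runoff in the basin. The conditional cumulative probability distribution of variable y under the log-normal function, given x = x*, is:
$$F(y \mid x = x^*) = \int_0^{y} \frac{1}{t\,\sigma_{ly|x^*}\sqrt{2\pi}} \exp\left[-\frac{\left(\ln t - E_{ly|x^*}\right)^2}{2\sigma_{ly|x^*}^2}\right] \mathrm{d}t \tag{2}$$
Equation (2) is a one-dimensional log-normal distribution. The mean of the conditional probability distribution of ln y is:
$$E_{ly|x^*} = E_{ly} + R\,\frac{\sigma_{ly}}{\sigma_{lx}}\left(\ln x^* - E_{lx}\right) \tag{3}$$
The mean square deviation is:
$$\sigma_{ly|x^*} = \sigma_{ly}\sqrt{1 - R^2} \tag{4}$$
where $E_{lx}$ and $\sigma_{lx}$ are the mean and mean square deviation of the lnX series, $E_{ly}$ and $\sigma_{ly}$ are the mean and mean square deviation of the lnY series, and R is the correlation coefficient between the lnX and lnY series.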
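
As a hedged illustration, the conditional mean and mean square deviation of ln y can be computed with the standard formulas for a bivariate normal pair (lnX, lnY). The function name is hypothetical, and the inputs are the rounded values for the region above Jiangqiao and the Jiamusi control section from Tables 2 and 3, so small differences from the published figures are to be expected.

```python
import math


def conditional_log_params(x_star, mean_lx, sd_lx, mean_ly, sd_ly, r):
    """Mean and mean square deviation of ln y given x = x_star.

    Standard conditional moments of a bivariate normal pair (ln x, ln y).
    """
    mean_cond = mean_ly + r * (sd_ly / sd_lx) * (math.log(x_star) - mean_lx)
    sd_cond = sd_ly * math.sqrt(1.0 - r ** 2)
    return mean_cond, sd_cond


# Rounded inputs from Tables 2 and 3: Jiamusi (X) and the region above Jiangqiao (Y).
mean_c, sd_c = conditional_log_params(
    x_star=math.exp(7.566),   # median discharge at Jiamusi under the log-normal fit
    mean_lx=7.566, sd_lx=0.376,
    mean_ly=6.390, sd_ly=0.493,
    r=0.84,
)
print(f"conditional mean of ln y = {mean_c:.3f}, conditional deviation = {sd_c:.4f}")
```

With these rounded inputs the conditional deviation is close to the 0.26731 listed for this region in Table 6, and it is clearly smaller than the unconditional value of 0.493.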

If the lnX and lnY series are correlated, with correlation coefficient R > 0, then the mean square deviation of the conditional probability distribution of ln y is less than the mean square deviation of the lnY series, that is, $\sigma_{ly|x^*} < \sigma_{ly}$. If there is no correlation between the lnX and lnY series, that is, R = 0, then the mean and mean square deviation of the conditional probability distribution of ln y are equal to the mean and mean square deviation of the lnY series, respectively, that is, $E_{ly|x^*} = E_{ly}$ and $\sigma_{ly|x^*} = \sigma_{ly}$. In that case, the conditional probability distribution of ln y is equivalent to the conventional probability distribution of the lnY series.

The correlation coefficient R between the annual runoff series (X) of the basin control section and the annual runoff series (Y) of each region is calculated, and the regression equation is established. It is also necessary to establish the regression equation between the logarithmic annual runoff series (lnX) of the basin control section and the logarithmic annual runoff series (lnY) of each region:
$$\ln y = a_l \ln x + b_l \tag{5}$$
where $a_l$ and $b_l$ are parameters.
Equation (3) for the mean of the conditional probability distribution of the logarithmic annual runoff can be transformed into:
$$E_{ly|x^*} = R\,\frac{\sigma_{ly}}{\sigma_{lx}}\,\ln x^* + \left(E_{ly} - R\,\frac{\sigma_{ly}}{\sigma_{lx}}\,E_{lx}\right) \tag{6}$$
According to the theory of regression equations, the parameters of the logarithmic regression Equation (5) are $a_l = R\,\sigma_{ly}/\sigma_{lx}$ and $b_l = E_{ly} - a_l E_{lx}$. Thus:
$$E_{ly|x^*} = a_l \ln x^* + b_l \tag{7}$$
Equation (7) shows that substituting the given annual runoff of the basin control section into the logarithmic regression Equation (5) yields the mean value of the logarithmic conditional probability distribution of regional annual runoff for that given annual runoff value. Next, we examined whether the mean square deviation ($\sigma_{ly|x^*}$) of the logarithmic conditional probability distribution of regional annual runoff corresponds to the square root of the mean squared difference between the data points and the regression line (Equation (5)) in the correlation diagram of the regional logarithmic annual runoff and the logarithmic annual runoff of the control section. This value is denoted here as $S_l$:
$$S_l = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\ln y_i - \left(a_l \ln x_i + b_l\right)\right]^2} = \sigma_{ly}\sqrt{1 - R^2} = \sigma_{ly|x^*} \tag{8}$$

Thus, under the condition that the annual runoff value of the basin control section is given, the mean square deviation of the logarithmic conditional probability distribution of regional annual runoff ($\sigma_{ly|x^*}$) is equal to the square root of the mean squared difference between the data points and the regression line (Equation (5)) in the correlation diagram of the regional logarithmic annual runoff and the logarithmic annual runoff of the control section.
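
This equality can be verified numerically. The sketch below uses a small hypothetical sample (not data from the study) and assumes that the mean square deviation is computed in its population form, dividing by n; the variable names are illustrative only.

```python
import math
from statistics import fmean, pstdev

# Hypothetical annual mean discharges (m3/s), NOT data from the study.
x = [1200.0, 1800.0, 2100.0, 2600.0, 3300.0, 1500.0]   # control section
y = [350.0, 640.0, 600.0, 900.0, 1050.0, 560.0]        # one region

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
mean_lx, mean_ly = fmean(lx), fmean(ly)
sd_lx, sd_ly = pstdev(lx), pstdev(ly)   # population form (divide by n)

# Correlation coefficient and parameters of ln y = a_l * ln x + b_l.
cov = fmean([(u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly)])
r = cov / (sd_lx * sd_ly)
a_l = r * sd_ly / sd_lx
b_l = mean_ly - a_l * mean_lx

# Root-mean-square deviation of the data points from the regression line.
residuals = [v - (a_l * u + b_l) for u, v in zip(lx, ly)]
s_l = math.sqrt(fmean([e ** 2 for e in residuals]))

# Conditional mean square deviation from the bivariate log-normal formula.
sd_cond = sd_ly * math.sqrt(1.0 - r ** 2)
print(f"residual RMS = {s_l:.6f}, conditional deviation = {sd_cond:.6f}")
```

The two printed values agree to within floating-point rounding, and the residuals average to zero because the regression line passes through the mean point of the data.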

Based on the measured series, a virtual logarithmic series (lnY*) of regional annual runoff is created for the same years as the measured series. For each year, the difference between the logarithm of the measured annual runoff and the output of regression Equation (5) is calculated, and the mean value of the logarithmic conditional probability distribution of regional annual runoff is added to it, that is:
$$\ln y_i^* = \left[\ln y_i - \left(a_l \ln x_i + b_l\right)\right] + E_{ly|x^*} \tag{9}$$
The mean value of this series is:
$$\overline{\ln y^*} = \frac{1}{n}\sum_{i=1}^{n}\left[\ln y_i - \left(a_l \ln x_i + b_l\right)\right] + E_{ly|x^*} = E_{ly|x^*} \tag{10}$$
The mean value of the logarithmic series of virtual annual runoff is therefore equal to the mean value of the logarithmic conditional probability distribution of regional annual runoff. Since the regression line in Equation (5) passes through the center of the data points of the logarithmic annual runoff series of the region and the basin control section, the differences between the logarithms of the measured annual runoff and the regression Equation (5) are approximately symmetric about the regression line. The logarithmic series of virtual annual runoff is thus approximately symmetric about its mean, and the virtual annual runoff should follow a log-normal distribution. From Equations (7) and (9), we obtain:
$$\ln y_i^* = \ln y_i + a_l\left(\ln x^* - \ln x_i\right) \tag{11}$$
Therefore, the annual runoff series corresponding to the logarithmic virtual annual runoff series is:
$$y_i^* = y_i\left(\frac{x^*}{x_i}\right)^{a_l} \tag{12}$$
The mean value of the annual runoff series corresponding to the logarithmic virtual annual runoff series can be regarded as the mean value of the conditional probability distribution of regional annual runoff within the basin under the given annual runoff value of the basin control section, which is:
$$E_{y|x^*} = \frac{1}{n}\sum_{i=1}^{n} y_i\left(\frac{x^*}{x_i}\right)^{a_l} \tag{13}$$
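
The construction of the virtual series can be sketched as follows. The code assumes the power-law form y*_i = y_i (x*/x_i)^a that follows from shifting each log-residual to the conditional mean. The function names, the sample data, and the design discharge are hypothetical.

```python
import math
from statistics import fmean


def fit_log_regression(x, y):
    """Least-squares fit of ln y = a_l * ln x + b_l; return (a_l, b_l)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_lx, mean_ly = fmean(lx), fmean(ly)
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    a_l = sxy / sxx
    return a_l, mean_ly - a_l * mean_lx


def virtual_series(x, y, x_star, a_l):
    """Virtual regional runoff y*_i = y_i * (x_star / x_i) ** a_l."""
    return [yi * (x_star / xi) ** a_l for xi, yi in zip(x, y)]


# Hypothetical annual mean discharges (m3/s), NOT data from the study.
x = [1200.0, 1800.0, 2100.0, 2600.0, 3300.0, 1500.0]   # control section
y = [350.0, 640.0, 600.0, 900.0, 1050.0, 560.0]        # one region
x_star = 1700.0   # hypothetical design discharge at the control section

a_l, b_l = fit_log_regression(x, y)
y_star = virtual_series(x, y, x_star, a_l)

mean_log_virtual = fmean([math.log(v) for v in y_star])
cond_mean_log = a_l * math.log(x_star) + b_l   # regression value at x_star
cond_mean_runoff = fmean(y_star)               # mean of the virtual runoff series
print(f"{mean_log_virtual:.6f} {cond_mean_log:.6f} {cond_mean_runoff:.1f}")
```

The first two printed values coincide, which confirms that the mean of the logarithmic virtual series equals the regression value at the design discharge.
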
The correlation coefficients of the annual mean discharge series and the logarithmic series of annual mean discharge between the three regions of the Songhua River basin and the Jiamusi control section are shown in Table 2, and the correlation diagrams are shown in Figures 3 and 4. The corresponding correlation coefficients between the four regions of the Yangtze River basin and the Datong control section are also shown in Table 2, and the correlation diagrams are shown in Figures 5 and 6.
Table 2

Correlation coefficient of annual runoff of each region in the basin and the basin control section

| Region | Control section | Annual runoff correlation coefficient | Logarithmic correlation coefficient of annual runoff |
|---|---|---|---|
| Above Jiangqiao | Jiamusi | 0.83 | 0.84 |
| Jiangqiao–Harbin | Jiamusi | 0.76 | 0.73 |
| Harbin–Jiamusi | Jiamusi | 0.84 | 0.86 |
| Above Cuntan | Datong | 0.56 | 0.56 |
| Cuntan–Yichang | Datong | 0.58 | 0.56 |
| Yichang–Hankou | Datong | 0.87 | 0.87 |
| Hankou–Datong | Datong | 0.82 | 0.81 |
Figure 3

Correlation charts depicting linear regression for the annual mean discharge series in the Songhua River basin.

Figure 4

Correlation charts depicting linear regression for the logarithmic series of annual mean discharge in the Songhua River basin.

Figure 5

Correlation charts depicting linear regression for the annual mean discharge series in the Yangtze River basin.

Figure 6

Correlation charts depicting linear regression for the logarithmic series of annual mean discharge in the Yangtze River basin.


The above figures and tables show that there is a significant correlation between the annual mean discharge of each region in the two basins and that of the corresponding control section. For the three regions of the Songhua River basin, the correlation with the Jiamusi control section is relatively high for both the annual mean discharge series and the logarithmic series, with correlation coefficients of 0.73–0.86. In the Yangtze River basin, the correlation with the Datong control section is relatively low for the two regions above Yichang, with correlation coefficients of 0.56–0.58, and relatively high for the two regions below Yichang, with correlation coefficients of 0.81–0.87. For each region of the two basins, the correlation coefficients of the annual mean discharge series and of the logarithmic series are almost the same.

Table 3 lists the mean value ($E_{ly}$) and mean square deviation ($\sigma_{ly}$) of the logarithmic series of annual mean discharge in the three regions of the Songhua River basin and the four regions of the Yangtze River basin, together with the mean value ($E_{lx}$) and mean square deviation ($\sigma_{lx}$) of the logarithmic series of annual mean discharge at the Jiamusi and Datong control sections.

Table 3

Statistical parameters of the logarithm of annual runoff of each region in the basin and the basin control section

| Station (region) | Mean | Mean square deviation |
|---|---|---|
| Above Jiangqiao | 6.390 | 0.493 |
| Jiangqiao–Harbin | 6.374 | 0.441 |
| Harbin–Jiamusi | 6.528 | 0.460 |
| Jiamusi | 7.566 | 0.376 |
| Above Cuntan | 9.285 | 0.120 |
| Cuntan–Yichang | 7.842 | 0.261 |
| Yichang–Hankou | 9.071 | 0.182 |
| Hankou–Datong | 8.652 | 0.266 |
| Datong | 10.240 | 0.132 |

The cumulative probability distribution of the annual mean discharge series at the Jiamusi and Datong control sections is shown in Figure 7, where the points are the empirical probabilities and the curve is the log-normal distribution function. Additionally, the extreme tail of the runoff marginal structure should follow a Pareto distribution (Burr 1942; Dimitriadis et al. 2021). We fitted the Pareto–Burr–Feller (PBF) distribution to the annual mean discharge series at the Jiamusi and Datong control sections and compared it with the log-normal distribution. Both distributions fit the discharge series well, but the log-normal distribution provided a slightly better fit than the PBF distribution (Supplementary Figure S1).

Low-runoff probability indices are commonly used in water resources work. Therefore, ten annual mean discharge values ($x^*$) at each of the Jiamusi and Datong control sections, corresponding to cumulative probabilities of 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, and 95%, are extracted from the curves in Figure 7. The parameters of the logarithmic conditional probability distribution of annual mean discharge in each region of the two basins are calculated by Equations (3) and (4), using the correlation coefficient (R) of the logarithmic series of annual mean discharge. Table 4 shows the mean value of the logarithmic conditional probability distribution of annual mean discharge ($E_{ly|x^*}$) in the three regions of the Songhua River basin, and Table 5 shows the corresponding mean value in the four regions of the Yangtze River basin. The mean square deviation ($\sigma_{ly|x^*}$) for all seven regions is shown in Table 6.

Based on these parameters, the ten logarithmic conditional cumulative probability distributions of annual mean discharge in each of the seven regions of the two basins are obtained by Equation (2), one for each of the ten given cumulative probability values of annual mean discharge at the basin control section.
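
The calculation can be reproduced approximately from the rounded parameters in Tables 2 and 3. The sketch below assumes that the listed cumulative probabilities are exceedance probabilities, consistent with the definition given for Equation (1), and that the design discharge is taken from the fitted log-normal quantile rather than read from Figure 7. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def design_discharge(p_exceed, mean_lx, sd_lx):
    """Discharge exceeded with probability p_exceed under a log-normal fit."""
    z = STD_NORMAL.inv_cdf(1.0 - p_exceed)
    return math.exp(mean_lx + z * sd_lx)


def conditional_log_mean(x_star, mean_lx, sd_lx, mean_ly, sd_ly, r):
    """Mean of ln y given x = x_star for a bivariate log-normal pair."""
    return mean_ly + r * (sd_ly / sd_lx) * (math.log(x_star) - mean_lx)


# Rounded parameters from Tables 2 and 3: Jiamusi (X), region above Jiangqiao (Y).
MEAN_LX, SD_LX = 7.566, 0.376
MEAN_LY, SD_LY, R_LOG = 6.390, 0.493, 0.84

cond_means = {}
for p in (0.50, 0.75, 0.90, 0.95):
    x_star = design_discharge(p, MEAN_LX, SD_LX)
    cond_means[p] = conditional_log_mean(
        x_star, MEAN_LX, SD_LX, MEAN_LY, SD_LY, R_LOG
    )
    print(f"P = {p:.0%}: x* = {x_star:7.0f} m3/s, mean of ln y = {cond_means[p]:.3f}")
```

Under these assumptions the computed means agree with the "Above Jiangqiao" column of Table 4 to within a few thousandths, the remaining differences being attributable to rounding of the inputs.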
Table 4

Mean value of logarithmic conditional distribution of annual runoff in each region of the Songhua River basin

| Probability of annual runoff at Jiamusi station (%) | Above Jiangqiao | Jiangqiao–Harbin | Harbin–Jiamusi |
|---|---|---|---|
| 95 | 5.709 | 5.847 | 5.881 |
| 90 | 5.859 | 5.963 | 6.023 |
| 85 | 5.961 | 6.043 | 6.120 |
| 80 | 6.043 | 6.105 | 6.198 |
| 75 | 6.111 | 6.158 | 6.262 |
| 70 | 6.173 | 6.206 | 6.321 |
| 65 | 6.231 | 6.250 | 6.376 |
| 60 | 6.285 | 6.293 | 6.423 |
| 55 | 6.337 | 6.333 | 6.477 |
| 50 | 6.392 | 6.375 | 6.529 |
Table 5

Mean value of logarithmic conditional distribution of annual runoff in various regions of the Yangtze River basin

| Probability of annual runoff at Datong station (%) | Above Cuntan | Cuntan–Yichang | Yichang–Hankou | Hankou–Datong |
|---|---|---|---|---|
| 95 | 9.175 | 7.603 | 8.810 | 8.300 |
| 90 | 9.199 | 7.656 | 8.867 | 8.376 |
| 85 | 9.216 | 7.691 | 8.906 | 8.429 |
| 80 | 9.229 | 7.719 | 8.937 | 8.470 |
| 75 | 9.240 | 7.744 | 8.964 | 8.507 |
| 70 | 9.250 | 7.766 | 8.988 | 8.539 |
| 65 | 9.260 | 7.786 | 9.010 | 8.569 |
| 60 | 9.268 | 7.805 | 9.031 | 8.587 |
| 55 | 9.277 | 7.823 | 9.051 | 8.625 |
| 50 | 9.285 | 7.842 | 9.071 | 8.652 |
Table 6

Mean square deviation of the logarithmic conditional probability distribution of annual runoff in each region of the basin

| Region | Mean square deviation |
|---|---|
| Above Jiangqiao | 0.26731 |
| Jiangqiao–Harbin | 0.30382 |
| Harbin–Jiamusi | 0.23733 |
| Above Cuntan | 0.09940 |
| Cuntan–Yichang | 0.21733 |
| Yichang–Hankou | 0.08899 |
| Hankou–Datong | 0.15764 |
Figure 7

Empirical probability in relation to the logarithmic normal distribution function for annual mean discharge.

Cumulative probabilities of 50%, 75%, and 90% for the annual mean discharge at the basin control section were selected because they are commonly used in water resources utilization design. Figures 8 and 9 show the corresponding logarithmic conditional cumulative probability distribution curves of annual mean discharge for the three regions of the Songhua River basin and the four regions of the Yangtze River basin. For comparison, the empirical cumulative probability and the conventional (unconditional) logarithmic normal cumulative probability of the logarithmic annual mean discharge series were also calculated for each region of the two basins.
Figure 8

Logarithmic conditional cumulative probability distribution curves for the annual mean discharge in the Songhua River basin.

Figure 9

Logarithmic conditional cumulative probability distribution curves for the annual mean discharge in different regions of the Yangtze River basin.


As shown in Figures 8 and 9, the conditional cumulative probability distribution of annual mean discharge in each region is steeper than the conventional cumulative probability distribution. The greater the correlation between the annual mean discharge of a region and that of the basin control section, the steeper the conditional cumulative probability distribution of the regional annual mean discharge. In addition, as the given cumulative probability of annual mean discharge at the basin control section increases (that is, as the annual mean discharge decreases), the conditional cumulative probability distribution curve shifts toward lower discharge, while its shape changes little. This means that when the annual mean discharge of the control section is fixed at a given design probability value, the annual mean discharge of each region in the basin is randomly distributed within a relatively narrow range. As the given annual mean discharge of the control section decreases, the mean of the logarithmic conditional cumulative probability distribution of annual mean discharge in each region decreases, and the distribution curve shifts toward lower values with its shape approximately unchanged.
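
The link between correlation and steepness can be illustrated with a minimal sketch. It assumes the standard result for a bivariate normal pair of logarithms, in which the conditional standard deviation is the marginal one multiplied by sqrt(1 - R^2); the marginal value `sigma_ln_y = 0.5` and the function names are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist


def conditional_sigma(sigma_ln_y: float, r: float) -> float:
    """Std dev of ln(Y) given ln(X), assuming the logs are bivariate normal."""
    return sigma_ln_y * math.sqrt(1.0 - r * r)


def cdf_slope_at_centre(sigma: float) -> float:
    """Slope of a normal CDF at its centre; a larger value means a steeper curve."""
    return NormalDist(0.0, sigma).pdf(0.0)


sigma_ln_y = 0.5  # hypothetical marginal std dev of ln(regional discharge)
for r in (0.0, 0.5, 0.8, 0.95):
    s = conditional_sigma(sigma_ln_y, r)
    print(f"R={r:.2f}  conditional sigma={s:.3f}  "
          f"CDF slope at centre={cdf_slope_at_centre(s):.3f}")
```

With R = 0 the conditional curve coincides with the conventional one; as R grows the conditional standard deviation shrinks and the curve steepens, which is the behaviour described for Figures 8 and 9.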

The empirical cumulative probability of the virtual logarithmic annual mean discharge series (lnY*) in each region of the Songhua River and Yangtze River basins was calculated using Equation (1) and is plotted in Figures 8 and 9 together with the cumulative probability of the logarithmic conditional distribution. For all seven regions, the empirical cumulative probability of the virtual series agrees well with the logarithmic conditional cumulative probability. To test the feasibility of using the control section to predict sub-basin interval discharge, we used a copula function to predict the discharge for the Yihan interval and compared it with the observed data (Supplementary Figure S2). The predictions fit the observations well, with a Nash–Sutcliffe efficiency coefficient of 0.77 (Supplementary Figure S3).
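
For readers unfamiliar with the goodness-of-fit measure quoted above, the following sketch computes the Nash–Sutcliffe efficiency from its usual definition (one minus the ratio of the squared prediction error to the variance of the observations about their mean). The discharge values are hypothetical and only illustrate the calculation; they are not the Supplementary data.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical annual mean discharges, for illustration only
obs = [8100.0, 7600.0, 9050.0, 8400.0, 7900.0]
sim = [8000.0, 7800.0, 8900.0, 8300.0, 8100.0]
print(round(nash_sutcliffe(obs, sim), 3))
```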

It can be seen from Equation (4) that, because the annual runoff of each region in the basin is correlated with the annual runoff of the basin control section, the mean square deviation of the logarithmic conditional probability distribution of annual runoff in each region is smaller than the mean square deviation of the logarithmic annual runoff series of that region. Comparing the data in Tables 2, 3, and 6 confirms that the larger the correlation coefficient between the two, the smaller the mean square deviation of the conditional probability distribution of the regional annual runoff. Therefore, for a given annual runoff at the basin control section, the regional annual runoff is randomly distributed over a narrower range, and the conditional probability curve becomes steeper. Similarly, because the annual runoff of each region is positively correlated with that of the basin control section, if the given annual runoff (x*) at the control section decreases, the mean of the logarithmic conditional probability distribution of annual runoff in each region also decreases, and the greater the correlation coefficient, the larger the decrease. It follows from the properties of the logarithmic normal distribution in Equation (3) that, when the cumulative probability of the annual runoff of the basin control section is 50%, ln x* equals the mean of the logarithmic annual runoff series of the control section; the mean of the logarithmic conditional probability distribution of the regional annual runoff is then equal to the mean of its logarithmic series (see Tables 3–5). As the cumulative probability of annual runoff at the basin control section increases (the annual runoff x* decreases), the mean of the logarithmic conditional probability distribution of annual runoff in each region gradually decreases (see Tables 4 and 5).
Therefore, as the annual runoff of the basin control section decreases, the conditional cumulative probability distribution curve of the logarithmic annual runoff of each region shifts toward lower values. In addition, because the mean square deviation of the logarithmic conditional probability distribution of the regional annual runoff does not depend on the annual runoff given at the basin control section, the curve shifts toward low flow while its shape remains essentially unchanged.
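
The shift of the conditional mean with the design probability can be checked against Table 4 with a short sketch. It assumes that the conditional log mean is linear in the standard normal quantile of the control-section runoff (which holds for a bivariate lognormal pair) and that the tabulated probabilities are exceedance probabilities, so that 95% denotes a dry year. The slope is inferred here from the 50% and 95% rows rather than from the paper's Equation (4), and the function name is hypothetical.

```python
from statistics import NormalDist


def conditional_log_mean(mu_50: float, mu_95: float, prob_percent: float) -> float:
    """Interpolate the conditional mean of ln(Y) for a given design probability.

    mu_50 and mu_95 are the tabulated conditional log means at 50% and 95%.
    """
    std_normal = NormalDist()
    # Quantile of the control-section runoff, treating the probability as an
    # exceedance probability (assumption).
    z = std_normal.inv_cdf(1.0 - prob_percent / 100.0)
    z_95 = std_normal.inv_cdf(0.05)
    slope = (mu_95 - mu_50) / z_95  # plays the role of R * sigma_lnY
    return mu_50 + slope * z


# Table 4, "Above Jiangqiao": 6.392 at 50% and 5.709 at 95%
for p, tabulated in [(90, 5.859), (75, 6.111), (60, 6.285)]:
    print(p, round(conditional_log_mean(6.392, 5.709, p), 3), tabulated)
```

Under these assumptions the interpolated values should fall within a few thousandths of the tabulated ones, and at 50% the quantile is zero, so the conditional log mean reduces to the mean of the logarithmic series.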

As mentioned above, for a given annual runoff at the control section of the basin, the mean of the conditional distribution of the logarithmic regional annual runoff corresponds to a cumulative probability of 50%. This value is the conditional mean of the logarithmic regional annual runoff (ln y), not the conditional mean of the regional annual runoff (y) itself. The latter is therefore calculated by Equation (13). More intuitively, the annual runoff of the basin control section can be substituted into the regression equation between the annual runoff of each region in the basin and that of the basin control section to calculate the mean value, that is:
$$\bar{y} = a x^* + b \tag{14}$$
where $\bar{y}$ is the conditional mean of the regional annual runoff, $x^*$ is the given annual runoff of the basin control section, and $a$ and $b$ are the regression parameters.
Whether the two methods give consistent values for the mean of the conditional probability distribution of the regional annual runoff needs to be verified. For the seven regions of the Songhua River and Yangtze River basins, with the cumulative probability of annual runoff at the control section set to 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, and 95%, the mean values calculated by Equations (13) and (14) are essentially the same (see Tables 7 and 8). The mean of the conditional probability distribution of regional annual runoff can also be calculated by integrating the conditional probability density function:
$$\bar{y} = \int_{0}^{\infty} y \, f(y \mid x^*) \, \mathrm{d}y \tag{15}$$
where $f(y \mid x^*)$ is the conditional probability density function of the regional annual runoff $y$ given the annual runoff $x^*$ of the basin control section.
However, this integral is cumbersome to evaluate and is therefore not well suited to hydrologists engaged in practical water resources management.
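
The relation between the closed-form and integral routes can be sketched as follows. The sketch assumes that Equation (13) is the standard lognormal mean, exp(mu + sigma^2 / 2), which is not reproduced in this section; the inputs are the "Above Jiangqiao" values from Table 4 (50% row) and Table 6, and the function names are hypothetical.

```python
import math


def lognormal_mean_closed_form(mu: float, sigma: float) -> float:
    """Mean of Y when ln(Y) is normal with mean mu and std dev sigma."""
    return math.exp(mu + 0.5 * sigma * sigma)


def lognormal_mean_numeric(mu: float, sigma: float,
                           steps: int = 20000, width: float = 8.0) -> float:
    """Same mean by trapezoidal integration of y * f(y), done in log space."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * math.exp(t) * density
    return total * h


mu_c, sigma_c = 6.392, 0.26731  # Table 4 (50% row) and Table 6, Above Jiangqiao
print(lognormal_mean_closed_form(mu_c, sigma_c))
print(lognormal_mean_numeric(mu_c, sigma_c))
```

Under this assumption both routes give a value close to the 618 reported for Equation (13) in Table 7, which suggests that the closed form is sufficient for practical use.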
Table 7

Comparison of the two methods for calculating the mean annual runoff of the conditional probability distribution in each region of the Songhua River basin

| Probability of annual runoff at Jiamusi station (%) | Above Jiangqiao, Equation (14) | Above Jiangqiao, Equation (13) | Jiangqiao–Harbin, Equation (14) | Jiangqiao–Harbin, Equation (13) | Harbin–Jiamusi, Equation (14) | Harbin–Jiamusi, Equation (13) |
| --- | --- | --- | --- | --- | --- | --- |
| 95 | 293 | 312 | 369 | 361 | 378 | 367 |
| 90 | 348 | 363 | 409 | 405 | 434 | 424 |
| 85 | 391 | 402 | 440 | 438 | 476 | 467 |
| 80 | 428 | 436 | 467 | 467 | 513 | 504 |
| 75 | 461 | 467 | 491 | 492 | 546 | 538 |
| 70 | 493 | 497 | 514 | 516 | 578 | 571 |
| 65 | 524 | 526 | 536 | 540 | 609 | 603 |
| 60 | 556 | 555 | 559 | 563 | 641 | 635 |
| 55 | 587 | 585 | 581 | 586 | 672 | 667 |
| 50 | 621 | 618 | 606 | 611 | 706 | 703 |
Table 8

Comparison of the two methods for calculating the mean annual runoff of the conditional probability distribution in various regions of the Yangtze River basin

| Probability of annual runoff at Datong station (%) | Above Cuntan, Equation (14) | Above Cuntan, Equation (13) | Cuntan–Yichang, Equation (14) | Cuntan–Yichang, Equation (13) | Yichang–Hankou, Equation (14) | Yichang–Hankou, Equation (13) | Hankou–Datong, Equation (14) | Hankou–Datong, Equation (13) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 95 | 9,744 | 9,701 | 2,071 | 2,049 | 6,748 | 6,724 | 3,957 | 4,063 |
| 90 | 9,959 | 9,940 | 2,179 | 2,160 | 7,154 | 7,124 | 4,338 | 4,394 |
| 85 | 10,110 | 10,104 | 2,255 | 2,238 | 7,439 | 7,407 | 4,606 | 4,632 |
| 80 | 10,232 | 10,234 | 2,316 | 2,302 | 7,670 | 7,638 | 4,823 | 4,828 |
| 75 | 10,342 | 10,351 | 2,371 | 2,359 | 7,878 | 7,847 | 5,018 | 5,008 |
| 70 | 10,441 | 10,455 | 2,421 | 2,411 | 8,065 | 8,035 | 5,194 | 5,172 |
| 65 | 10,533 | 10,551 | 2,468 | 2,459 | 8,241 | 8,313 | 5,359 | 5,327 |
| 60 | 10,625 | 10,646 | 2,514 | 2,508 | 8,414 | 8,390 | 5,522 | 5,483 |
| 55 | 10,714 | 10,736 | 2,558 | 2,554 | 8,583 | 8,561 | 5,680 | 5,635 |
| 50 | 10,804 | 10,827 | 2,603 | 2,602 | 8,753 | 8,735 | 5,840 | 5,791 |
According to the principle of water balance, the sum over all regions of the mean of the conditional probability distribution of annual runoff should equal the given (design) annual runoff (x*) of the basin control section. Since the results calculated from Equations (13) and (14) are consistent, the simpler Equation (14) is used for the analysis. If the parameters a of all regions of the basin in Equation (14) sum to 1 and the parameters b sum to 0, then the regional mean values sum to x* of the control section. Suppose there are m regions in the basin, the annual runoff series of the kth region is $Y_k$, and the annual runoff series of the basin control section is X. The slope parameter of the annual runoff regression Equation (14) between the kth region and the basin control section is then:
$$a_k = R_k \frac{\sigma_{Y_k}}{\sigma_X} \tag{16}$$
where $R_k$ is the correlation coefficient between $Y_k$ and X, $\sigma_{Y_k}$ is the mean square deviation of the annual runoff series of the kth region, and $\sigma_X$ is the mean square deviation of the annual runoff series X at the basin control section. Substituting the expression for $R_k$ gives:
$$a_k = \frac{\sum_{i=1}^{n} (y_{k,i} - \bar{y}_k)(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{17}$$
where n is the length of the annual runoff series, $y_{k,i}$ is the ith runoff of the annual runoff series $Y_k$, $\bar{y}_k$ is the mean value of the series $Y_k$, $x_i$ is the ith runoff of the annual runoff series X, and $\bar{x}$ is the mean value of the series X. By water balance, $\sum_{k=1}^{m} y_{k,i} = x_i$ for every year and hence $\sum_{k=1}^{m} \bar{y}_k = \bar{x}$, so adding the values of $a_k$ for the m regions gives:
$$\sum_{k=1}^{m} a_k = \frac{\sum_{i=1}^{n} \left( \sum_{k=1}^{m} y_{k,i} - \sum_{k=1}^{m} \bar{y}_k \right)(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 1 \tag{18}$$
The intercept parameter of the annual runoff regression equation for the kth region is:
$$b_k = \bar{y}_k - a_k \bar{x} \tag{19}$$
Adding the values of $b_k$ for the m regions gives:
$$\sum_{k=1}^{m} b_k = \sum_{k=1}^{m} \bar{y}_k - \bar{x} \sum_{k=1}^{m} a_k = \bar{x} - \bar{x} = 0 \tag{20}$$

Thus, when the annual runoff of the basin control section is substituted into the annual runoff regression Equation (14) for each region, the resulting regional annual runoff values sum to the annual runoff of the basin control section.
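
This closure property can be checked numerically with a small sketch. It uses ordinary least squares on hypothetical, randomly generated regional series whose sum defines the control-section series; the data, the helper `ols`, and the design value `x_star` are illustrative assumptions, not values from the study.

```python
import random


def ols(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)
# Three hypothetical regional annual runoff series of 70 years each
regions = [[rng.uniform(200.0, 900.0) for _ in range(70)] for _ in range(3)]
# Water balance: the control-section runoff is the sum of the regional runoffs
control = [sum(values) for values in zip(*regions)]

coeffs = [ols(control, region) for region in regions]
sum_a = sum(a for a, _ in coeffs)
sum_b = sum(b for _, b in coeffs)

x_star = 1500.0  # hypothetical design runoff at the control section
regional_total = sum(a * x_star + b for a, b in coeffs)
print(sum_a, sum_b, regional_total)
```

Up to floating-point rounding, the slopes sum to 1, the intercepts sum to 0, and the regional values add up to `x_star`, whatever the individual correlations are.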

For a given annual runoff at the basin control section, the logarithmic conditional probability distribution of annual runoff in each region can be regarded as the probability distribution of a logarithmic annual runoff series. The value on the regression line (Equation (5)) of the correlation diagram between the logarithmic annual runoff of the region and that of the basin control section, evaluated at the given control-section runoff, is taken as the mean of this series. The deviations of the data points from the regression line (Equation (5)) in the correlation diagram give the year-to-year fluctuation of the series.
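
One reading of this construction is sketched below: the virtual logarithmic series is the regression value at the given control-section runoff plus the residuals of the observed points about the regression line. The data are synthetic and all names and parameter values are hypothetical; the sketch only shows that such a series has the conditional mean as its mean and a standard deviation equal to the marginal one times sqrt(1 - R^2).

```python
import math
import random

rng = random.Random(1)
n = 70
# Synthetic logarithmic annual runoff at the control section and in one region
ln_x = [rng.gauss(7.0, 0.4) for _ in range(n)]
ln_y = [0.9 * lx + rng.gauss(0.0, 0.2) for lx in ln_x]

mean_x = sum(ln_x) / n
mean_y = sum(ln_y) / n
sxx = sum((v - mean_x) ** 2 for v in ln_x)
syy = sum((v - mean_y) ** 2 for v in ln_y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(ln_x, ln_y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

ln_x_star = 6.5  # hypothetical log of the design runoff at the control section
cond_mean = intercept + slope * ln_x_star

# Residuals about the regression line, re-centred on the conditional mean
residuals = [ly - (intercept + slope * lx) for lx, ly in zip(ln_x, ln_y)]
virtual = [cond_mean + e for e in residuals]

virtual_mean = sum(virtual) / n
virtual_sd = math.sqrt(sum((v - virtual_mean) ** 2 for v in virtual) / n)
marginal_sd = math.sqrt(syy / n)
print(virtual_mean, cond_mean)
print(virtual_sd, marginal_sd * math.sqrt(1.0 - r * r))
```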

Overall, this paper analyzes the correlation and probability combination between the annual runoff at the control sections of the Songhua River and Yangtze River basins and the annual runoff in the regions of each basin. The main conclusions are as follows.

This study provides a comprehensive analysis of the annual runoff in the basin and its sub-basins, utilizing both the year method and the regional composition method. The integration of these methods allows for a detailed understanding of the temporal and spatial distribution of runoff, revealing the significant contributions of sub-basin flows to the overall basin runoff.

By fitting the entire empirical cumulative distribution function (CDF) using the log-normal distribution, our analysis captures the full range of variability in annual runoff. This approach provides a robust framework for understanding the probabilistic characteristics of runoff, which is crucial for effective water resource management and flood risk assessment.

The study confirms the presence of strong long-term auto-correlation in the runoff process, characterized by long-term persistence behavior. This finding aligns with previous studies and underscores the importance of considering temporal dependencies in hydrological modeling.

The authors thank the editor and the anonymous reviewers for their constructive comments and suggestions on the revision of this paper. This work was supported by the National Key R&D Program of China [grant number 2021YFB3900604-04/05] and the National Natural Science Foundation of China [grant number 42271084].

F.S.: Formal analysis, Writing – original draft. G.W.: Methodology. S.N.: Software, Supervision. Y.T.: Funding acquisition, Resources. J.Y.: Data curation. H.L.: Software. X.X.: Writing – review and editing. M.Z.: Software, Visualization. Y.C.: Software. All authors have read and agreed to the published version of the manuscript.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Burr I. W. (1942) Cumulative frequency functions, Annals of Mathematical Statistics, 13 (2), 215–232.
Chen L., Singh V. P., Shenglian G., Hao Z. & Li T. (2012) Flood coincidence risk analysis using multivariate copula functions, Journal of Hydrologic Engineering, 17 (6), 742–755. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000504
Dimitriadis P., Koutsoyiannis D., Iliopoulou T. & Papanicolaou P. (2021) A global-scale investigation of stochastic similarities in marginal distribution and dependence structure of key hydrological-cycle processes, Hydrology, 8 (2), 59. https://doi.org/10.3390/hydrology8020059
Fischer S. & Schumann A. H. (2021) Multivariate flood frequency analysis in large river basins considering tributary impacts and flood types, Water Resources Research, 57 (8), e2020WR029029. https://doi.org/10.1029/2020WR029029
Giandotti (1934) Previsione delle piene e delle magre dei corsi d'acqua, Memorie e Studi Idrografici, 8, 107–117.
Grimaldi S., Petroselli A., Tauro F. & Porfiri M. (2012) Time of concentration: a paradox in modern hydrology, Hydrological Sciences Journal, 57 (2), 217–228. https://doi.org/10.1080/02626667.2011.644244
Hangshing L. & Dabral P. P. (2018) Multivariate frequency analysis of meteorological drought using copula, Water Resources Management, 32 (5), 1741–1758. https://doi.org/10.1007/s11269-018-1901-0
Herman J. D., Quinn J. D., Steinschneider S., Giuliani M. & Fletcher S. (2020) Climate adaptation as a control problem: review and perspectives on dynamic water resources planning under uncertainty, Water Resources Research, 56 (2), e24389. https://doi.org/10.1029/2019WR025502
Hu Y. M., Liang Z. M., Solomatine D. P., Wang H. M. & Liu T. (2021) Assessing the impact of precipitation change on design annual runoff in the headwater region of Yellow River, China, Journal of Environmental Informatics, 37 (2), 122–129.
Jiang Y. (2009) China's water scarcity, Journal of Environmental Management, 90 (11), 3185–3196.
Jiang C., Xiong L., Guo S., Xia J. & Xu C. (2017) A process-based insight into nonstationarity of the probability distribution of annual runoff, Water Resources Research, 53 (5), 4214–4235. https://doi.org/10.1002/2016WR019863
Jianping B., Pengxin D., Xiang Z., Sunyun L., Marani M. & Yi X. (2018) Flood coincidence analysis of Poyang Lake and Yangtze River: risk and influencing factors, Stochastic Environmental Research and Risk Assessment, 32 (4), 879–891. https://doi.org/10.1007/s00477-018-1514-4
Latif S. & Mustafa F. (2020) Copula-based multivariate flood probability construction: a review, Arabian Journal of Geosciences, 13 (3), 132. https://doi.org/10.1007/s12517-020-5077-6
Lee M., An H., Jeon S., Kim S., Jung K. & Park D. (2022) Development of an analytical probabilistic model to estimate runoff event volumes in South Korea, Journal of Hydrology, 612, 128129.
Lilienthal J., Fried R. & Schumann A. (2018) Homogeneity testing for skewed and cross-correlated data in regional flood frequency analysis, Journal of Hydrology, 556, 557–571.
Morán-Vásquez R. A., Cataño Salazar D. H. & Nagar D. K. (2022) Some results on the truncated multivariate skew-normal distribution, Symmetry, 14 (5), 970.
Nerantzaki S. D. & Papalexiou S. M. (2022) Assessing extremes in hydroclimatology: a review on probabilistic methods, Journal of Hydrology, 605, 127302.
Niu C., Chang J., Wang Y., Shi X., Wang X., Guo A., Jin W. & Zhou S. (2022) A water resource equilibrium regulation model under water resource utilization conflict: a case study in the Yellow River Basin, Water Resources Research, 58 (6), e2021WR030779. https://doi.org/10.1029/2021WR030779
Peng Y., Chen K., Yan H. & Yu X. (2017) Improving flood-risk analysis for confluence flooding control downstream using Copula Monte Carlo method, Journal of Hydrologic Engineering, 22 (8), 04017018. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001526
Vivekanandan N. & SriShailam (2021) Selection of best fit probability distribution for extreme value analysis of rainfall, Water and Energy International, 63 (10), 13–19.
Wang Q. J., Robertson D. E. & Chiew F. H. S. (2009) A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites, Water Resources Research, 45 (5), W05407. https://doi.org/10.1029/2008WR007355
Wen Y., Yang A., Kong X. & Su Y. (2022) A Bayesian-model-averaging copula method for bivariate hydrologic correlation analysis, Frontiers in Environmental Science, 9, 744462.
Yang L., Zhao G., Tian P., Mu X., Tian X., Feng J. & Bai Y. (2022) Runoff changes in the major river basins of China and their responses to potential driving forces, Journal of Hydrology, 607, 127536.
Zhong M., Zeng T., Jiang T., Wu H., Chen X. & Hong Y. (2021) A copula-based multivariate probability analysis for flash flood risk under the compound effect of soil moisture and rainfall, Water Resources Management, 35 (1), 83–98. https://doi.org/10.1007/s11269-020-02709-y
Zhou Y., Cui Z., Lin K., Sheng S., Chen H., Guo S. & Xu C.-Y. (2022) Short-term flood probability density forecasting using a conceptual hydrological model with machine learning techniques, Journal of Hydrology, 604, 127255.
Zhu X., Wei Z., Dong W., Ji Z., Wen X., Zheng Z., Yan D. & Chen D. (2020) Dynamical downscaling simulation and projection for mean and extreme temperature and precipitation over central Asia, Climate Dynamics, 54 (7–8), 3279–3306. https://doi.org/10.1007/s00382-020-05170-0
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
