## Abstract

The length of record (LOR) method is an evaluation method that provides quantitative advice on the amount of data required to compute the indicators of hydrological alteration (IHA). The essence of the IHA method is the use of multiple hydrological indicators to reflect the hydrological–ecological characteristics of a river, so an LOR evaluation based on a single index has little practical value when the data volume available for IHA is limited. In this paper, we expand the LOR method from a single-index version to a multi-index version, apply it to comprehensively analyze the credibility of multiple hydrological alteration (HA) indicators under different data volumes, and explore the relationship between multi-index LOR results and data requirements. Drawing on hydrological–ecological relationships, we give practical application criteria for LOR dimension reduction under multiple HA indicators. The results show that the LOR results corresponding to each group of IHA indicators have different data requirements, so an in-depth understanding of the hydrological–ecological relationship is the key to applying LOR to IHA data dimension reduction. In addition, we discuss the limitations of multi-index LOR dimension reduction and its application value in IHA calculations.

## INTRODUCTION

Hydrologic regimes play major roles in determining the biotic composition, structure, and function of aquatic, wetland, and riparian ecosystems (Richter *et al.* 1996). The flow stability and flow rate change create habitat conditions for all kinds of organisms (Richter *et al.* 1997). Extreme hydrological conditions and the timing of occurrences of flow determine the integrity of the life cycles of living species and the direction of superiority in the survival competition (Lytle & Poff 2004). Moreover, the magnitude and duration of the pulse flow affect the success and failure of the breeding process (Lake 2008). In addition, hydrological regimes indirectly affect ecosystems, riparian systems, and wetland ecosystems by altering the water temperature, oxygen content, water chemistry, and sediment transport. Therefore, hydrological alterations are widely considered to be the main drivers of biotic and abiotic conditions in river ecosystems (Poff *et al.* 1997; Scarcella *et al.* 2016). In other words, the indicators of hydrological alteration (IHA) can be used to measure the degree of ecological damage (Richter *et al.* 1997; Poff *et al.* 2006). Among the many existing methods, the IHA method can be used worldwide. This is because the IHA method uses statistical methods, and analyzes the hydrological indicators associated with biological attributes to quantify how much the flow conditions deviate from natural conditions (Poff & Matthews 2013).

With the enhanced understanding of hydrological alterations and ecological responses, the IHA evaluation has rapidly changed from an ecological impact assessment of a single river hydrological alteration (HA) to the ecological impact assessment of the river network. Increasingly, smaller watersheds are gradually becoming the object of IHA evaluation, leading scientific researchers to increase their data volume demand. That is, scientists and scholars using the IHA evaluation method will face the problem of ‘missing data’ in the future (Carlisle *et al.* 2010). In some developed countries, almost complete scientific-data-sharing management mechanisms have been implemented. For example, the official website of the United States Geological Survey (USGS) fully discloses long-sequence daily flow data, from which it is relatively easy to obtain basic flow data (Isaacson & Coonrod 2011; Afshari *et al.* 2017). However, owing to monitoring equipment failures and the impacts of natural phenomena, such as earthquakes, floods, or landslides, and data transmission issues, a lack of data from the storage and retrieval process is inevitable (Makhuvha *et al.* 1997; Panu *et al.* 2000).

In some developing countries, the problem of missing data is further complicated. For example, the rate at which new hydrological monitoring stations are being constructed is not sufficient to meet the increasing data needs of scientists and scholars (Eng *et al.* 2013; Harrigan *et al.* 2018). Moreover, where relations among neighboring countries have long been strained, hydrological data for cross-border rivers are kept under strict national confidentiality. Even for inland rivers that are not subject to state confidentiality, scattered years of hydrological data have never been published in full. When applying IHA calculations, users usually define the amount of data to be used arbitrarily, according to their own subjective understanding, or ignore the data limits of the IHA calculation. Two major bottlenecks in the application of IHA thus exist (Giustarini *et al.* 2016). On the one hand, the results of an IHA evaluation based on inadequate hydrological data will be seriously questioned. On the other hand, no IHA evaluation can be performed in a basin where data are missing.

To accelerate the globalization of IHA applications, scientifically reliable results are increasingly being shared by developers and users. Many studies have been conducted to address the above issues. In earlier studies, some scholars used mathematical statistical interpolation to compensate for the missing data. Khalil *et al.* (1999) used simple interpolation methods (such as average interpolation and correlation analysis) to satisfy some data-filling requirements. Wang *et al.* (2015) used the regression relationship between adjacent hydrological stations in the Tarim River Basin in China to compensate for the missing data. The homogeneity of all interpolated flow time series was evaluated with the penalized maximal F (PMF) test. Harvey *et al.* (2012) evaluated 15 simple filling techniques, including regression, scaling, and equidistance methods, and found that the accuracy of the filled data can be improved by grouping the flow data seasonally or by modifying data sets to improve homogeneity. However, in many cases, the relationship between predictors is not truly linear (Govindaraju & Rao 2000; Yang *et al.* 2017).

The artificial neural network (ANN) technique has gradually replaced the data-filling method of hypothetical variables (simple linear, nonlinear, etc.) by representing the nonlinear mapping between variables (French *et al.* 1992). Khalil *et al.* (2001) applied the concept of a population and ANN to compensate for the peak flow and medium flow of missing data, and proved that it is reasonable to fill gaps in river flow data based on the neural network concept and group value data method. Ilunga & Stephenson (2005) used three different back propagation neural network learning methods to complete the missing data, and verified the difference in accuracy between them and the observed data.

Meanwhile, as researchers strive to improve the accuracy of data, basic mathematical statistical interpolation methods are increasingly being combined with hydrological feature conditions for data filling. Poff *et al.* (2010), Kennard *et al.* (2010), and others classified and graded rivers with similar flow characteristics to establish baseline flow conditions. The hydrological index was used to compensate for the lack of data at sites without hydrological stations and to solve the problem of missing IHA data. Hughes & Smakhtin (1996) supplemented the missing data for six sets of catchment areas in southern Africa based on 1-day flow duration curves for each month of the year. The approach assumes that flows occurring simultaneously at sites in reasonably close proximity to each other correspond to similar percentage points on their respective duration curves. Mobley *et al.* (2012) used the simple drainage area ratio (DAR) method to fill the data based on the assumption that the observed and simulated watersheds had the same physical characteristics. The results of the DAR method were shown to represent extremely low-flow conditions well. However, these methods still lack practical physical meaning, and their data accuracy is relatively low.

Other scholars compensated for the missing data with daily average runoff simulated by hydrological models. The general hydrological model uses relatively simple mathematical equations to describe the highly correlated processes of water, energy, and vegetation in complex spatial distributions in the watershed (Vrugt *et al.* 2005). A hydrological model with clear physical meaning greatly improves the accuracy of data filling. Shrestha *et al.* (2014) used the variable infiltration capacity (VIC) model, a distributed hydrological model, to simulate the 1991–2000 long-sequence flow of the Fraser River in British Columbia, Canada. They found that the VIC model could reproduce the IHA indicators for the annual flow, the median flow rate, the seasonal change of flow rate, and the annual minimum flow rate. However, uncertainties remained in the monthly average flow, pulse flow, and frequency indicators. Ryo *et al.* (2015) used a distributed hydrological model (DHM) and reached similar conclusions for the Sagami River basin in Japan.

To increase the ability of the model to portray extreme hydrological events, Kusangaya *et al.* (2018) used downscaled general circulation model (GCM) data and historical climate data, respectively, in hydrological models to compare the results. It was found that GCM data results perform poorly when simulating extremely high and low flows. However, IHA features can be more accurately described if the climatic data are downscaled at finer spatial scales and with adequate spatial detail. Nevertheless, in the establishment of hydrological models, natural geospatial characteristics, such as precipitation, canopy interception, evaporation, seepage, and groundwater flow, must be used (Engeland *et al.* 2001; Todini 2011; Jin *et al.* 2017). These characteristics are actually types of implicit data. Such data can greatly increase the initial workloads of IHA applications.

To solve the current bottleneck, Timpe & Kaplan (2017) proposed the concept of the length of record (LOR) method. LOR gives the reliability of the IHA calculated by the existing data volume through statistical methods, thereby avoiding collecting too much unnecessary data. Under the assumption of similar area, altitude, and flow, it considers that the reliability calculated by LOR can reflect the IHA calculation data demand within a specific geographical range around a given region. This method requires no addition to the data volume and simplifies the data processing workload for calculating IHA, such as data interpolation or hydrological models. At the same time, the credibility of the existing data is engendered by the fluctuation range and confidence interval of the average value in the IHA results, which reduces the data dimension. However, in that study (Timpe & Kaplan 2017), only the 1-day-max indicator in the IHA was used for the example of the dimensionality evaluation. The corresponding LOR dimensionality reduction effects of the 33 indicators in the IHA have not yet been fully elucidated. Further explanations are needed for the data volume selection and the LOR dimension reduction application method under the IHA multi-indicator.

So far, LOR is the only evaluation method that gives a clear estimate of the amount of data required for IHA calculations through mathematical statistics. However, a single indicator cannot comprehensively reflect changes in river hydrological characteristics, and, for identical amounts of data, the LOR results of different IHA indicators may differ. Therefore, the LOR result of a single indicator cannot provide substantial help in choosing the data volume for an IHA application. To overcome this shortcoming, we analyze the LOR results of all IHA indicators and examine how the structure of the hydrological data influences the LOR evaluation results. On the basis of the LOR evaluation results for the different IHA indicators, we propose the scope of applicability of the LOR method in multi-index IHA applications and demonstrate the importance of this method for dimension reduction. The research results can provide effective suggestions for data volume selection in eco-hydrological evaluation.

## MATERIALS AND METHODS

### Study area and data

The Jinsha River is in the upper reaches of the Yangtze River, China's largest river, with a total length of 3,481 km and a total drainage area of 572,300 km². The runoff and rainfall in the Jinsha River are concentrated in the flood season from June to October, and the runoff from July to September accounts for about 55% of the annual total. Human activities began to affect the basin relatively late: hydropower stations began to be built on the mainstream of the Jinsha River after 2005. The Panzhihua hydrological station, built in 1965, is the last station on the mainstream in the middle reaches of the Jinsha River. The daily hydrologic data from 1976 to 2015 used in this article were provided by the Hydrology Yearbook of the Ministry of Water Resources of the People's Republic of China. The Panzhihua hydrological station was chosen because its interannual runoff is stable and its daily runoff record is complete, which benefits the LOR calculation.

### IHA method

The IHA method is an open-source interface model developed by the Nature Conservancy (2009). It uses daily average flow data and calculates 33 indicators closely related to ecological attributes (Richter *et al.* 1996), as shown in Table 1. The 33 indices generated by IHA fall into five major categories: (1) magnitude of monthly water conditions; (2) magnitude and duration of annual extreme water conditions; (3) timing of annual extreme conditions; (4) frequency and duration of high and low pulses; and (5) rate and frequency of condition changes (Magilligan & Nislow 2005). Among them, the characteristics of extreme events are more closely related to the indicators in Groups 3 and 4, and the average characteristics of flow are more closely related to the indicators in Groups 1 and 5.

| IHA statistics group | Hydrologic parameters | Abbreviated name |
|---|---|---|
| Group 1: Magnitude of monthly water conditions (12 indices) | Mean value for January | January |
| | Mean value for February | February |
| | Mean value for March | March |
| | Mean value for April | April |
| | Mean value for May | May |
| | Mean value for June | June |
| | Mean value for July | July |
| | Mean value for August | August |
| | Mean value for September | September |
| | Mean value for October | October |
| | Mean value for November | November |
| | Mean value for December | December |
| Group 2: Magnitude and duration of annual extreme water conditions (12 indices) | Annual minima 1-day means | 1-day min |
| | Annual minima 3-day means | 3-day min |
| | Annual minima 7-day means | 7-day min |
| | Annual minima 30-day means | 30-day min |
| | Annual minima 90-day means | 90-day min |
| | Annual maxima 1-day means | 1-day max |
| | Annual maxima 3-day means | 3-day max |
| | Annual maxima 7-day means | 7-day max |
| | Annual maxima 30-day means | 30-day max |
| | Annual maxima 90-day means | 90-day max |
| | Base flow index | Base flow |
| | Number of zero flow days | Zero flow days |
| Group 3: Timing of annual extreme water conditions (2 indices) | Julian date of each annual 1-day minimum | Date min |
| | Julian date of each annual 1-day maximum | Date max |
| Group 4: Frequency and duration of high and low pulses (4 indices) | Number of low pulses within each water year | Lo pulse # |
| | Mean or median duration of low pulses (days) | Lo pulse L |
| | Number of high pulses within each water year | Hi pulse # |
| | Mean or median duration of high pulses (days) | Hi pulse L |
| Group 5: Rate and frequency of water condition changes (3 indices) | Rise rates: Mean or median of all positive differences between consecutive daily values | Rise rate |
| | Fall rates: Mean or median of all negative differences between consecutive daily values | Fall rate |
| | Number of hydrologic reversals | Reversals |

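To make a few of the definitions in Table 1 concrete, the following is a minimal sketch of how the Group 2 N-day extremes and the Group 5 rate indicators might be computed from one year of daily flows. The function names are hypothetical, and the sketch uses simple means and a sign-flip count for reversals; the IHA software's exact conventions (water-year boundaries, mean versus median, handling of unchanged days) may differ.

```python
from statistics import mean


def rolling_means(flows, n):
    """Mean of every window of n consecutive daily flows."""
    return [mean(flows[i:i + n]) for i in range(len(flows) - n + 1)]


def n_day_min(flows, n):
    """Annual minimum of the n-day means (Group 2)."""
    return min(rolling_means(flows, n))


def n_day_max(flows, n):
    """Annual maximum of the n-day means (Group 2)."""
    return max(rolling_means(flows, n))


def rate_indicators(flows):
    """Rise rate, fall rate, and number of reversals (Group 5).

    Assumption: days with no change are ignored when counting reversals.
    """
    diffs = [b - a for a, b in zip(flows, flows[1:])]
    rises = [d for d in diffs if d > 0]
    falls = [d for d in diffs if d < 0]
    # A reversal is counted each time the sign of the daily change flips.
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    reversals = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    rise_rate = mean(rises) if rises else 0.0
    fall_rate = mean(falls) if falls else 0.0
    return rise_rate, fall_rate, reversals
```

In practice each function would be applied to every year of the record, giving one annual value per indicator; those annual series are the input to the LOR analysis.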

### LOR analysis

The characterization of natural and altered flow regimes using IHA or other statistical methods requires adequate flow data. Richter *et al.* (1996) analyzed three rivers with different hydrological conditions and concluded that at least 20 years of flow data are needed to calculate IHA. However, no further research has been conducted on the reliability of the data required by IHA. Given this uncertainty, we modify Richter's definition of the data volume required for IHA analysis. The number of years corresponding to the LOR evaluation result is used as the lower limit of the data volume required for the IHA calculation, and this limit is applied to IHA calculations in areas with similar hydro-geographic spatial features (Hughes & Smakhtin 1996).

The numerator and denominator in the LOR results are normalized statistics representing the confidence interval and the confidence level, respectively. First, the LOR calculation selects the flow station with the least anthropogenic impact and the longest record length in the desired evaluation area. Then, we determine the IHA indicator that provides the LOR calculation reference. Next, we calculate the target indicator for each year in the data set along with the long-term mean of this indicator. The annual values of the reference indicator are randomly ordered and grouped into record-length increments ranging from two years to the full LOR. Finally, the mean of each record-length increment is calculated for comparison with the long-term mean. This process is repeated 50,000 times, from which the 95%, 90%, and 85% confidence intervals are calculated. Using these statistics, we calculate the LOR required to produce a specified confidence level for a given confidence interval around the long-term mean for the river in the study (Table 2).
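The resampling procedure described above can be sketched as follows. This is an illustrative reading of the steps, not the authors' implementation: the function names are hypothetical, each n-year sample is taken as the first n values of a random ordering of the annual series, and the required record length is taken as the shortest length whose confidence reaches the target level.

```python
import random
from statistics import mean


def lor_confidence(annual_values, interval=0.10, reps=50_000, seed=0):
    """Estimate, for each record length n (2..N), the fraction of random
    n-year samples whose mean lies within +/- interval of the long-term mean."""
    rng = random.Random(seed)
    long_term = mean(annual_values)
    tol = abs(long_term) * interval
    n_years = len(annual_values)
    hits = {n: 0 for n in range(2, n_years + 1)}
    values = list(annual_values)
    for _ in range(reps):
        rng.shuffle(values)          # random ordering of the annual values
        running = values[0]
        for n in range(2, n_years + 1):
            running += values[n - 1]  # sum of the first n shuffled years
            if abs(running / n - long_term) <= tol:
                hits[n] += 1
    return {n: count / reps for n, count in hits.items()}


def required_years(confidence_by_n, level=0.85):
    """Shortest record length whose confidence reaches `level`, else None."""
    for n in sorted(confidence_by_n):
        if confidence_by_n[n] >= level:
            return n
    return None
```

For example, `required_years(lor_confidence(values, interval=0.10), level=0.85)` would give the number of years behind a 10/85 result for the annual series `values`. A stricter variant could require the confidence to stay above the level for all longer record lengths as well.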

In this study, we use as a calculation example the flow record of the Panzhihua hydrological station in the middle reaches of the Jinsha River in China from 1976 to 2015. Using the 1-day maximum index as a reference, the 40-year LOR evaluation results at this station are shown in Figure 1. The data volume required to characterize the annual maximum flow within 10% of the long-term mean with 85% confidence is 10 years, and the LOR result is abbreviated as 10/85. Figure 1 illustrates this example, with dashed vertical lines indicating the data volume required for a desired calculation output at the intersection of a dashed horizontal line, representing a given confidence level, and a normal curve, representing a given confidence interval (as CI in Figure 1). Therefore, the LOR method can be deemed an effective method for reducing data requirements, because it gives the degree of reliability of the IHA result when the amount of data is limited.

## RESULTS AND DISCUSSION

### Analysis of LOR evaluation results and influencing factors of IHA indicators

The LOR dimensionality reduction results for the IHA indicators computed from the past 40 years of flow at the Panzhihua hydrological station on the Jinsha River are shown in Table 2. In Group 1 and Group 2, the amount of data required to fall within 5% of the long-term mean is two or three times the amount required to fall within 10% of the long-term mean at the same confidence level. For example, the amount of data required to produce a 5/95 LOR result for the January indicator is three times the amount required for a 10/95 LOR result. Likewise, more than twice as many years are required for a 5/90 LOR result as for a 10/90 LOR result for the 7-day maximum indicator. In addition, the amount of data required to produce a given LOR result for the N-day max indicators in Group 2 is almost twice the amount required for the N-day min indicators. The date min indicator in Group 3 has a higher data volume requirement (nearly 40 years) to produce a given LOR confidence, while the date max indicator shows a lower data volume requirement (less than 10 years). Although the two indicators are in the same group, there is a significant difference in the amount of data required for the same LOR result confidence. In Group 4 and Group 5, the amount of data required to produce a given LOR result for the reversals indicator is relatively low, whereas the amount required for the other indicators is significantly higher (nearly 40 years).

The amount of data required to produce a given LOR result for each indicator of HA showed a positive correlation with the size of the flow during the year. According to the average monthly flow rate of Panzhihua station, the flood season is concentrated from June to October, and floods are more frequent from July to September. The indicators in Group 1 of the IHA reflect the flow change during the year. Figure 2 shows that the greater the monthly average flow, the greater the amount of data required to produce a given LOR result. For example, for August, in the flood season, the data volume required to produce a 10/95 LOR result is 20 years; for March, in the dry season, the data volume required to produce a 10/95 LOR result is only four years.

The amount of data required to produce a given LOR result for each indicator in Group 2 has a consistent relationship with the flow rate during the wet and dry seasons. This is because the N-day maximum appears in the flood season and the N-day minimum appears in the dry season, and 70% of the water in the whole year appears in the wet season (Proctor *et al.* 2011). This again indicates that the larger the flow rate, the larger the amount of data required to produce a given LOR result in IHA.

On the other hand, regardless of the level of LOR credibility, the data volume requirements for the indicators in Group 3, Group 4, and Group 5 remain relatively high, which may be related to the uncertainty of the component parameter attributes. This finding also indicates that a record of only a few years cannot guarantee the reliability of the IHA results.

### Data variability affects the amount of data reflected by LOR

The coefficient of variation of the hydrological flow data and the size of the mutation intensity will have an impact on the amount of data reflected by the LOR. The LOR corresponding to IHA has its own variation characteristics. The coefficient of variation for the long series of data for each indicator of IHA is shown in the coefficient of variation column in Table 3, and the mutation intensity for the long series of data is shown in the mutation intensity column. We define Mutation intensity = Max/Average, which reflects the deviation intensity of the most catastrophic point. In this paper, the value of Mutation intensity >1.7 (Dimensionless) is used as the boundary of mutation points. In comparing the coefficient of variation of IHA for each group of indicators in Table 3, and combining them with data volumes reflected by the LOR results in Table 2, we find that the larger the coefficient of variation, the greater the number of years that is required for evaluation of the LOR results.
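As a small worked sketch of the two statistics used here, the following computes the coefficient of variation and the mutation intensity (Max/Average) of an annual indicator series. The function names are hypothetical, and two choices are assumptions rather than details given in the text: the population standard deviation is used for the coefficient of variation, and a mutation point is taken to be any year whose value divided by the series average exceeds the 1.7 boundary.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form assumed)."""
    return pstdev(values) / mean(values)


def mutation_intensity(values):
    """Max/Average of the series, as defined in the text (dimensionless)."""
    return max(values) / mean(values)


def mutation_points(values, threshold=1.7):
    """Indices of years whose value/average exceeds the threshold
    (an assumed reading of how the 1.7 boundary is applied)."""
    avg = mean(values)
    return [i for i, v in enumerate(values) if v / avg > threshold]
```

For a series such as `[1, 1, 1, 1, 6]`, the average is 2, so the mutation intensity is 3.0 and only the last year is flagged.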

In particular, the coefficients of variation of the date minimum, Lo pulse #, Lo pulse L, and Hi pulse L indicators in the IHA exceed 0.8, and the amount of data required at each LOR credibility level exceeds 30 years. Variability can also differ between two indicators in the same group: for the 10/85 LOR result, the data volumes for the date minimum and date maximum indicators in Group 3 are 38 years and two years, respectively.

The mutation intensity results are shown in the mutation intensity column in Table 3. By comparing the LOR results in Table 2 with the mutation intensities in Table 3, we can observe that the higher the intensity of the mutation, the greater the amount of data required to produce a given LOR result. For example, the mutation intensity in Group 4 exceeds 2.0, while the mutation intensity in Group 1 is less than 1.7. The data volume required to produce a given LOR result for Group 4 is 10 years more than that of Group 1.

After removing these high-intensity points, the amount of data required to produce a given LOR result was reduced by varying degrees among the 33 indicators in IHA. We define Drf as the amount of data required after the removal of the mutation point, Taf as the total amount of data after the removal of the mutation point, Dtb as the amount of data required before the removal of the mutation point, and Tab as the total amount of data before the removal of the mutation point. The y-axis percentage value in Figure 3 is Drf/Taf – Dtb/Tab, which reflects the degree of change in the amount of data before and after the removal of the mutation point. Taking the data volume at 5/95 LOR credibility as an example, the amount of data is significantly reduced for some indicators after these mutation points are removed (Figure 3). For example, although the LOR dimensionality reduction effect for fall rate improves by 30.83% after removal of the mutation point, this indicator still requires 24 years of long-sequence data. However, the reduction in data volume for the Lo pulse #, Hi pulse #, and Rise rate indicators is not significant. This finding shows that, for these indicators, the removal of the mutation point does not further improve the lower limit of data demand. In other words, the LOR evaluation itself has a lower limit value, and it is not possible to use just a few years of flow data to make the IHA calculation result extremely reliable.
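The y-axis quantity can be illustrated with a short calculation, writing Dtb for the amount of data required before removal. The function name is hypothetical, and the inputs below are illustrative values chosen for the example, not figures reported in the tables; they are merely consistent with the 24 years and the 30.83% change quoted for fall rate.

```python
def change_in_data_share(drf, taf, dtb, tab):
    """Drf/Taf - Dtb/Tab: change in the share of the record that is
    required, after versus before removing mutation points."""
    return drf / taf - dtb / tab


# Illustrative inputs (assumed): 39 of 40 years required before removal,
# 24 of 36 years required after removal.
change = change_in_data_share(drf=24, taf=36, dtb=39, tab=40)
print(f"{change:.2%}")
```

A negative value means that a smaller share of the available record is needed once the mutation points are removed.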

In addition, the September index and Lo pulse L showed a slight decrease in the amount of data required to produce a given LOR result after removing the mutation point. This result does not mean that the LOR dimensionality reduction effect of the two indicators has a significant downward trend after the mutation point removal. It is worth mentioning that, when performing LOR calculations, abnormal mutation points must be removed with care, and only results from which such errors have been eliminated can serve as a truly practical reference. However, the physical meaning of the mutation points must be considered before they are removed. For example, after a dam is built, the frequency of rising tides will significantly increase. In such cases, LOR calculations should retain these mutation points.

### Ecological standard for multi-index LOR dimension reduction evaluation application

The amount of data required to produce a given LOR result differs among the 33 indicators in IHA. Therefore, when the maximum data volume over all 33 indicators is selected as the standard data amount, the required data volume is still close to the total amount of data, which has no practical significance. In fact, the maximum amount of data required to produce a 5/95 LOR result using all 33 indicators in IHA is 39 years, a reduction of only one year. Even for the lower 10/85 LOR result, the amount of data required using all 33 indicators is 35 years. However, it is not necessary to use the maximum over all indicators as the lower limit of the data volume when producing a given LOR result, because the amount required varies among the 33 indicators in IHA. Instead, we should focus on the hydrological–ecological response that we must evaluate. If the amount of IHA data is taken as the maximum amount required to produce a given LOR result among only those key hydrological indicators that affect the biological life history process, then IHA data dimensionality reduction will no longer be difficult. The application of multi-indicator LOR dimension reduction can achieve wide recognition for the reliability of IHA evaluation results in the absence of data, because such data volume evaluation results are not only an evaluation by applied mathematical statistics, but also carry sufficient hydrological, ecological, and physical significance.

According to the ecological–hydrological response relationships that must be analyzed by IHA, we review the relevant literature and summarize a series of results obtained through experiments and observations (Barbour *et al.* 1999; Greig *et al.* 2007; Turkoglu 2010; Murgulet *et al.* 2016; Mwedzi *et al.* 2017; Xia *et al.* 2017). On this basis, we propose LOR dimensionality reduction reference indicators with ecological and hydrological significance, as shown in Table 4. Some ecological and hydrological relationships are closely related only to the indicators in a certain group. For example, if water temperature, sediment transport, and animal shelter area are related to Group 1, then all indicators in Group 1, or several of them, can be used as reference indicators for LOR. Such reference indicators are recorded as Key in Table 4. Taking the water temperature at Panzhihua station as an example, even the data volume required to produce a highly confident 5/90 LOR result can be reduced by 25%. Some ecological–hydrological relationships may be more closely related to individual indicators in two or three groups. For example, changes in the invertebrate community are closely associated with Groups 3 and 4, while changes in the aquatic plant community are associated with Groups 2, 4, and 5 in IHA. The amounts of data reflected by the LOR results of these indicators can then be used collectively as data constraints, and are recorded as Key in Table 4.
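The selection rule described in this section, taking the largest LOR requirement among only the indicators relevant to the ecological response under study, can be sketched as below. The response-to-group mapping follows the examples given in the text; the function name is hypothetical, and the per-group year counts are placeholders, not values from Table 2. For brevity the sketch works with one maximum per group, whereas in practice the maximum could be taken over a chosen subset of indicators within each group.

```python
# IHA groups the text associates with each ecological response.
RESPONSE_GROUPS = {
    "water temperature": [1],
    "invertebrate community": [3, 4],
    "aquatic plant community": [2, 4, 5],
}


def required_record_length(response, lor_years_by_group):
    """Largest LOR data requirement (years) among the key groups
    for the given ecological response."""
    groups = RESPONSE_GROUPS[response]
    return max(lor_years_by_group[g] for g in groups)


# Placeholder per-group maxima in years (hypothetical values).
example_years = {1: 30, 2: 33, 3: 39, 4: 38, 5: 36}

for response in RESPONSE_GROUPS:
    print(response, required_record_length(response, example_years))
```

With these placeholder inputs, a study focused on water temperature would need fewer years than one that uses the maximum across all five groups.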

### Study limitations and application

This study has several limitations. We assume that changes in mean flow are the driving force affecting habitat, and the reliability parameter calculated by the LOR method uses the sample mean as its evaluation criterion. For the same amount of data, a reference indicator whose variation amplitude is more uniform will therefore yield a more credible LOR result. A deeper understanding of the hydrological indicators is thus needed so that the fitted LOR curves reflect a close ecological response relationship.

In addition, the LOR results in this study are clearly positively correlated with flow. This is because Panzhihua station, used in this paper, is located in a high-altitude area where human activity is relatively low, and it has continuous daily flow data and a relatively long record (>40 years). For stations in other regions, or for stations affected by other factors such as stronger human activity, lower elevation, or a larger watershed area, more hydrological sample data will be needed to confirm the correlation between LOR results and monthly average flow.

Rational application of the LOR dimension reduction method does not, however, mean that the workload of data collection and statistics can be reduced in future work. When the IHA method is used to evaluate the ecological–hydrological relationship of a particular fish habitat, the ecologically significant reference indicators for LOR data reduction will vary across fish species. In the Yunnan region of China (Pan *et al.* 2015), owing to hydropower development and construction, fish reproduction depends mainly on artificial propagation and release. The indices of Groups 3, 4, and 5, which are closely related to the breeding season, are therefore not used as the main reference for LOR. In Missouri, United States (Westhoff *et al.* 2016), temperature mainly affects the life history of the smallmouth bass, so Group 1 is a more reasonable reference index for the LOR dimensionality reduction calculation. Thus, different organisms in different regions have different key influencing factors and are also subject to human influences (Paus *et al.* 2016). Future work should therefore focus on strengthening the statistics of key biological influencing factors. Table 4 presents only general statistics for some of the dimensionality reduction criteria and is not yet complete. The multi-index LOR dimension reduction method will achieve its practical value only when a large body of statistics and experiments is used to continuously supplement and refine the priorities of key hydrological indicators for the important life periods of various organisms, with the list of biological species, geographic characteristics, and data reduction criteria made as detailed as possible.

## CONCLUSIONS

In this paper, we discuss in detail the multi-indicator evaluation results of the LOR method and their application to data reduction in IHA calculations. Through LOR reliability analysis of all the indicators in IHA, we found that the amounts of data required to produce a given LOR result differ significantly among the 33 indicators, and that the amount required is consistently related to the magnitude of average monthly flow and the variability of the hydrological data. If the data used to produce LOR results from the IHA indicators are selected arbitrarily, the procedure has no scientific basis and yields no real data dimensionality reduction. Once the objects of IHA analysis are determined, the key hydrological indicators affecting the target biological activities can be used as reference indicators for LOR, which makes multi-indicator LOR evaluation practical for IHA data dimensionality reduction. Continuous improvement in the understanding of the relationship between biological activities and key hydrological indicators will be the basis for development and the future research direction of multi-index LOR data reduction.

## ACKNOWLEDGEMENTS

The authors would like to express their gratitude for the financial support of the National Basic Research Program of China (973 Program) (2015CB452701).