Abstract
To determine a suitable hydrological model structure for a specific application context using integrated modelling frameworks, modellers usually need to manually select the required hydrological processes, identify the appropriate algorithm for each process, and couple the algorithms' software components. However, these modelling steps are difficult and require corresponding knowledge. It is not easy for modellers to master all of the required knowledge. To alleviate this problem, a knowledge-based method is proposed to automatically determine hydrological model structures. First, modelling knowledge for process selection, algorithm identification, and component coupling is formalized in the formats of the Rule Markup Language (RuleML) and Resource Description Framework (RDF). Second, the formalized knowledge is applied to an inference engine to determine model structures. The method is applied to three hypothetical experiments and a real experiment. These experiments show how the knowledge-based method could support modellers in determining suitable model structures. The proposed method has the potential to reduce the knowledge burden on modellers and would be conducive to the promotion of integrated modelling frameworks.
INTRODUCTION
It is widely recognized that there is no universal hydrological model that can be applied in all application contexts (WMO 1975, 1992). To better understand the natural and human influences on watersheds, users usually need to determine a model structure that is suitable for the specific application context (Rigon et al. 2006; Baymani-Nezhad & Han 2013; Furusho et al. 2013; Lai et al. 2016; Vo & Gourbesville 2016). Integrated modelling frameworks are designed to support model construction through the dynamic coupling of fine-grained components. Examples of such frameworks include the Open Modelling Interface (OpenMI) (Moore & Tindall 2005), the Community Modelling Systems (CMS) (Lu & Piasecki 2012), the Object Modelling System (OMS) (David et al. 2013), the Community Surface Dynamics Modelling System (CSDMS) (Peckham et al. 2013), the Spatially Explicit Integrated Modelling System (SEIMS) (Liu et al. 2016), and the HydroInformatic Modelling System (HIMS) (Wang et al. 2018).
To determine a suitable model structure, modellers usually need to manually select the required processes, identify the appropriate algorithm for each process, and couple the software components of the algorithms. However, these modelling steps are difficult and always require corresponding knowledge (Elag & Goodall 2013; Peckham et al. 2013). It is not easy for modellers to master all of the required modelling knowledge.
Knowledge-based methods can be used to reduce the knowledge burden for hydrological modelling (Liu et al. 2013; Ward et al. 2019). Recently, knowledge-based methods, which incorporate modelling knowledge into integrated modelling frameworks, have been proposed to make component coupling or model selection easier.
To make component coupling easier, Islam & Piasecki (2008) proposed an ontology for the metadata of numerical models based on the Web Ontology Language (OWL), which can support the coupling of different components. Elag & Goodall (2013) presented an ontology for describing the core concepts and relationships for hydrological modelling, which could be further used for component selection and coupling. For automatic coupling of alternative algorithms for watershed modelling, Škerjanec et al. (2014) developed a knowledge library that described hydrological processes, calculation formulas, and their input/output variables using a domain-specific language. Peckham (2014) proposed a smart modelling framework for component coupling by a standardized model interface and corresponding metadata. Harpham & Danovaro (2015) sought to design standard metadata to describe environmental numerical models and their interfaces to other models. The standard metadata could be used for model coupling. Morsy et al. (2017) designed a model metadata framework, in which metadata elements were expressed as Resource Description Framework (RDF) triples, to support the sharing and reuse of hydrological models. Jiang et al. (2017) developed a service-oriented modelling framework to couple models based on the Basic Model Interface (BMI) (Peckham et al. 2013).
To make model selection easier, Chau (2007) proposed an ontology-based knowledge management system to assist users with the selection of the appropriate models for flow and water quality modelling. Qiu et al. (2017) proposed an ontology-based approach to describe environmental models and disaster-related data through semantics. Based on the ontology-based approach, the flood management system could recommend suitable models for users to apply when constructing a workflow.
The previous work mainly emphasized the usage of component-coupling knowledge or model-selection knowledge, whereas little attention has been paid to the usage of process-selection and algorithm-identification knowledge. It is still difficult for modellers to select appropriate processes and algorithms. Moreover, if modellers do not master the relevant knowledge, unsuitable models might be built (Voinov & Shugart 2013).
To alleviate this problem, this paper proposes a knowledge-based method to automatically determine hydrological model structures. Note that this study focuses on the determination of hydrological model structures. Although data preprocessing and parameter calculation are necessary after the hydrological model structure is determined, these topics are outside the scope of this study.
METHODS
Design of the knowledge-based method
To determine hydrological model structures automatically, a knowledge-based method is proposed (Figure 1). The framework of this method consists of three steps. First, process-selection and algorithm-identification knowledge are obtained from experts and literature, while component-coupling knowledge is extracted from the metadata of specific modelling frameworks. Second, the obtained knowledge is formally encoded using the Rule Markup Language (RuleML) (Boley et al. 2010) and RDF (Cyganiak et al. 2014), which can be used by an inference engine. Third, an inference engine is designed and implemented according to the typical procedures of model structure determination, and it is used to generate model structures. Each of the three steps will be discussed in detail in the following subsections.
Preparation of hydrological modelling knowledge
Knowledge of hydrological process selection
A hydrological model consists of multiple hydrological processes such as infiltration/surface runoff, depression, subsurface flow, snowmelt, and groundwater processes. Whether a hydrological process should be involved depends on the simulation purpose (i.e. expected model output), climatic and underlying surface conditions, geological conditions, and hydrological conditions of the region, as well as spatial and temporal scales. For example, when simulating a short periodic rainfall-runoff event in an arid region, where the aeration zone is not easily saturated, subsurface flow and groundwater processes can be omitted. To simulate the water balances in alpine regions where snowmelt might be a source of the discharge peak and a major cause of flooding, the snowmelt process should be involved. If special hydrological or geological conditions such as frozen soil or glaciers exist in a watershed, corresponding processes should be considered. Information regarding which processes are needed for certain application contexts can be listed. This type of knowledge can be obtained from hydrologists and literature.
Knowledge of algorithm identification
There are multiple algorithms that can be used to simulate one hydrological process, and each algorithm has specific application conditions. Algorithm selection depends on watershed physiographic conditions, temporal scale, and data availability.
Watershed physiographic conditions
Every algorithm has assumptions and can be used only under specific physiographic conditions. For example, regarding the infiltration/surface runoff process, the infiltration excess algorithm should be selected for arid watersheds, whereas the saturation excess algorithm should be selected for humid watersheds.
Temporal scale
The temporal scale should be considered in time-based physical algorithms. For example, the Soil and Water Assessment Tool (SWAT) contains two methods for simulating the surface runoff process, i.e. the Soil Conservation Service Curve Number (SCS-CN) method and the Green–Ampt method. The SCS-CN method is suitable for simulations at a daily scale, while the Green–Ampt method is suitable for simulations at an hourly scale or finer time steps (Grimaldi et al. 2013).
Data availability
Data availability limits the applicability of an algorithm. In real applications, due to the lack of data, it is common to replace an algorithm that can simulate a process accurately using detailed data with an algorithm that has fewer input data requirements. For example, among the methods for simulating the potential evapotranspiration (PET) process, the Penman–Monteith method requires the mean daily temperature, relative humidity, solar radiation, and wind speed as input (Allen et al. 1998); the method proposed by Hargreaves and Samani requires daily or monthly maximum and minimum temperatures as input (Hargreaves & Samani 1985), while Thornthwaite's method takes only the mean monthly temperature as input (Thornthwaite 1948). If only data for the mean monthly temperature are available, Thornthwaite's method can be applied.
Knowledge of component coupling
One hydrological modelling component is the software component of a hydrological algorithm in a specific modelling framework. There may exist slight differences in the input/output interfaces for different components of the same algorithm. After identifying the appropriate algorithm for each process, it is necessary to check whether the components of these algorithms are compatible and could be used to assemble a complete composite model (i.e. whether every component can obtain its input from the existing input data or other components' output) (Peckham 2014). For each component, the corresponding algorithm and its input and output should be clearly stated.
Encoding hydrological modelling knowledge
For the unified description and reuse of modelling knowledge, a standard-name library was built (accessible on https://github.com/lreis2415/SEIMS/tree/master/knowledge/rdfBase/variable). The library includes the names of processes, algorithms, variables, and keywords of application context descriptions. These standard names are used to encode hydrological modelling knowledge. Process-selection and algorithm-identification knowledge can be naturally expressed as conditional sentences in the form of ‘if …, then …’. Knowledge of this type is called procedural knowledge or a production rule (Anderson 1983). Component-coupling knowledge can be expressed as a statement called declarative knowledge (Anderson 1983). The encoding form for each type of hydrological modelling knowledge is described below.
Encoding process-selection knowledge and algorithm-identification knowledge
Both process-selection knowledge and algorithm-identification knowledge are types of procedural knowledge, and they can be encoded using RuleML, which is a markup language designed for the interchange of web rules in an XML format. The language is uniform across various rule languages and platforms (Boley et al. 2010). Figures 2 and 3 show two examples of encoding process-selection knowledge in RuleML. Figure 4 provides an example of encoding the algorithm-identification knowledge for the surface runoff process. For a more detailed description of RuleML, please refer to Boley et al. (2010).
Encoding component-coupling knowledge
Hydrological component-coupling knowledge is declarative; thus, the knowledge can be treated as a statement and formalized as an object-attribute-value triple. The triple can be encoded using RDF, a W3C standard for describing identifiable resources (Cyganiak et al. 2014). For example, the component ‘SCS-CN_Com’ has NEPR (i.e. net precipitation), DEPST (i.e. depression storage), and SOTE (i.e. soil temperature) as input variable names, it has EXCP (i.e. excess precipitation), SOMO (i.e. average soil moisture), and INFIL (i.e. infiltration) as output variable names, and its algorithm name is SCS-CN. Figure 5 gives an example of encoding the component ‘SCS-CN_Com’ in RDF. For a more detailed description of RDF, please refer to Cyganiak et al. (2014).
Inference engine
There are three main steps for the inference engine to determine hydrological model structures (Figure 6).
The first step is to select m required hydrological processes through the RuleML-based inference method according to the application purpose, spatial and temporal scales, and watershed physiographic conditions. The collection of selected processes can be considered to be an abstract conceptual model. The second step is to identify ni (i = 1, 2, …, m; ni ≥ 1) appropriate algorithms for each process through the RuleML-based inference method according to the watershed physiographic conditions, data availability, and time step. The collections of these selected algorithms can be considered to be specific conceptual models. This study assumes that each algorithm corresponds to only one component in the specific modelling framework. Therefore, there are a total of component combinations. The third step is to check whether all the inputs for every component in a component combination are available. If so, the component combination will be selected as an alternative one. If not, the component combination cannot be executed and will be excluded. If there are multiple alternative combinations, the inference engine will select one of them as the optimal one. Then, the components of this optimal component combination are sequenced into an optimal workflow. The source code of the inference engine is available on GitHub (https://github.com/lreis2415/SEIMS/tree/master/knowledge/inference_code). The detailed designs of the RuleML-based and RDF-based inference methods are as follows.
RuleML-based inference method for process selection and algorithm identification
The RuleML-based inference method is designed to be forward chaining. Taking process selection as an example to illustrate the steps of the RuleML-based inference method, the first step is to load the process-selection knowledge from the knowledge base and store each rule in a ‘ruleml’ class object. The fields of the ‘ruleml’ class include the name of the rule, the ‘if’ part and the ‘then’ part. The ‘if’ and ‘then’ parts are object instances of the ‘atom’ class. The fields of the ‘atom’ class include ‘var’ (variables), ‘rel’ (predicates), and ‘ind’ (constants). The second step is to initialize the application context and to store it in a HashMap <context keyword, context value> map_c. The third step is to determine for each ‘ruleml’ object whether the ‘if’ part can be triggered by the application context. This step aims to determine whether the ‘var’ value obtained by map_c can match the ‘rel’ value of the ‘if’ part. If the ‘if’ part is triggered, then the ‘then’ part is used to determine whether the corresponding hydrological process is selected. That is, according to the semantic description of ‘rel’, it is determined whether the hydrological process described by the ‘ind’ in ‘then’ parts is selected. For example, by using the RuleML-based inference method to the process-selection knowledge (Figure 2), the subsurface flow process will be selected if the climate type value in the application contexts is ‘humid’.
RDF-based inference method for component coupling
The RDF-based inference method is used to determine whether the components of the selected algorithms are compatible and can be used to assemble a complete composite model. The RDF-based inference method consists of three steps. The first step is to load the RDF knowledge of all components from the knowledge base and store the RDF knowledge in a HashMap <algorithm name, ‘rdf’ class object> map_a. The fields of the ‘rdf’ class include component name, algorithm name, input variable names, and output variable names. The second step is to obtain the ‘rdf’ object set according to map_a for a given algorithm name set. The third step is to determine whether each input variable of the object can be satisfied (i.e. whether the data required for each input variable can be obtained from the existing data or the output of other components) for each ‘rdf’ object. Specifically, for each input variable, if the name of the input variable is included in the existing data names or the output variable names of other ‘rdf’ objects, the input variable can be satisfied; otherwise, the input variable cannot be satisfied. If all the input variables of each ‘rdf’ object can be satisfied, the corresponding component set can be coupled into a workflow; otherwise, the corresponding component set will be excluded.
The components are sorted by the dataflow among components. The dataflow can be abstracted as a directed graph. For a dataflow without loops, a directed acyclic graph (DAG) is used to describe the dataflow. Each component is treated as a vertex of the DAG, and the input–output relationship of the variables between two components is treated as a unidirectional edge of the DAG. The topological sorting method is used to determine the linear ordering of the DAG's vertices (i.e. the execution sequence of the components). For a dataflow with loops, which can be described by a directed cyclic graph (DCG), each loop needs to be broken to convert the DCG into a DAG.
Figure 7 shows an example of the cyclic dependency in coupling SUR_EXCESS (i.e. runoff yield under the excess infiltration component) with DEP_FS (i.e. depression component). SUR_EXCESS needs DPST (i.e. distribution of depression storage) as an input, which is the output of the DEP_FS. Meanwhile, DEP_FS needs EXCP (i.e. excess precipitation) as an input, which is the output of SUR_EXCESS. Due to this loop between SUR_EXCESS and DEP_FS, the sequence between these two components is undecidable. Note that hydrologists usually assume that SUR_EXCESS happens before DEP_FS. Therefore, the loop between these two components can be broken by removing the ‘DPST’ edge from the current temporal step of the simulation. Thus, the DCG can be converted into a DAG, whose sequence can be determined by the topological sorting method. According to the DAG, the SUR_EXCESS and DEP_FS components will be executed sequentially within the current temporal step of the simulation. The output of DEP_FS, DPST, will be the input to SUR_EXCESS in the next temporal step of the simulation.
Applying the method to a specific modelling framework
In the proposed method, knowledge of process selection and algorithm identification is framework-independent (or generic knowledge), while metadata of components is framework-specific (or local knowledge). The metadata descriptions of the components in a specific modelling framework could be different from the ontological descriptions of the generic knowledge. To apply the proposed method to a specific modelling framework, the metadata descriptions of the local knowledge should be mapped to the ontological description of the generic knowledge. Specifically, the metadata information of each component, including the algorithm to which this component belongs and its input/output variables, is first extracted from the modelling framework. Second, the variable names and the algorithm names of the components are described using standard names, if they are inconsistent with the ones in the standard-names library. Then, the metadata of components is formalized as component-coupling knowledge and stored in the RDF knowledge base. Thus, the proposed method can be applied to a specific modelling framework.
RESULTS
Integrated modelling framework and modelling knowledge
To evaluate the applicability of the proposed knowledge-based method, we applied the method in a hydrological integrated modelling framework (i.e. SEIMS). SEIMS consists of a parallel hydrological module library, a runtime environment (including model initialization, model execution, and model calibration), auxiliary functions (such as preprocessing, postprocessing, and data management), and a hydrological database. The hydrological module library of SEIMS contains over 30 modules for the main hydrological processes covering hydrological, crop growth, and nutrient migration/transformation processes (Qin et al. 2018). Each module includes one or several components. These components can be coupled to build models. The source code of SEIMS is available on GitHub (https://github.com/lreis2415/SEIMS).
Currently, there are a total of 47 rules and 208 RDF triples in the knowledge base (available at https://github.com/lreis2415/SEIMS/tree/master/knowledge).
Software prototype of the knowledge-based method
A software prototype called the Intelligent Hydrological Modelling Customizing System (IHMCS) has been developed as the shell of the knowledge-based method to assist modellers in the generation of hydrological model structures. IHMCS is a browser-based system (Figure 8), and it can be accessed on the URL (http://114.215.153.178:8080/IHMCS/main.jsp). This interface consists of a list of radio boxes and check boxes. These boxes represent the application context and are linked to the knowledge base. After users fill in the application context through the interface, the inference engine will be invoked to generate suitable model structures. The names of the selected hydrological processes, algorithms, and components are displayed in a grid form. The model structure can be saved in a configuration file in a format that SEIMS can utilize. It should be noted that the browser-side user interface is loosely coupled with the server-side inference engine. Different user interfaces can be designed to meet the needs of different users.
Experiments
Three hypothetical experiments and a real experiment are used to illustrate the capability of the knowledge-based method in the automatic determination of hydrological model structures.
Hypothetical experiments
Three different hypothetical experiments are designed. Each hypothetical experiment has an application context consisting of the application purpose, spatial and temporal scales, watershed conditions, data availability, and particular processes (Table 1).
Experiment . | Purpose . | Time scale/step . | Spatial scale . | Watershed conditions . | Data availability . | Particular processes . |
---|---|---|---|---|---|---|
Experiment 1 | Rainfall-runoff | Event/hourly | Small-sized | Humid, rural region | Precipitation | |
Experiment 2 | Evapotranspiration | Continuous/daily | Medium-sized | Humid, rural region | Precipitation, minimum temperature, maximum temperature | Snow |
Experiment 3 | Soil erosion | Event/hourly | Small-sized | Semi-humid, rural region | Precipitation |
Experiment . | Purpose . | Time scale/step . | Spatial scale . | Watershed conditions . | Data availability . | Particular processes . |
---|---|---|---|---|---|---|
Experiment 1 | Rainfall-runoff | Event/hourly | Small-sized | Humid, rural region | Precipitation | |
Experiment 2 | Evapotranspiration | Continuous/daily | Medium-sized | Humid, rural region | Precipitation, minimum temperature, maximum temperature | Snow |
Experiment 3 | Soil erosion | Event/hourly | Small-sized | Semi-humid, rural region | Precipitation |
As an example, the modelling context and model structure for the first hypothetical experiment are shown in Figure 8. The model structures for the three hypothetical experiments are listed in Tables 2–4, respectively.
Process . | Algorithm . | Component ID . |
---|---|---|
Interception | Hourly interception algorithm in WetSpa | PI_STORM |
Infiltration | Green–Ampt method | SUR_SGA |
Percolation | Darcy method | PERCO_DARCY |
Ground water | Linear reservoir method | GWATER_RESERVOIR |
Depression | Fill and Spill method | DEP_FS |
Subsurface flow | Kinematic wave method | INTERFLOW_IKW |
Overland flow | One-dimension kinetic wave formula | IKW_OL |
Channel flow | Hourly Muskingum method | CH_MSK |
Process . | Algorithm . | Component ID . |
---|---|---|
Interception | Hourly interception algorithm in WetSpa | PI_STORM |
Infiltration | Green–Ampt method | SUR_SGA |
Percolation | Darcy method | PERCO_DARCY |
Ground water | Linear reservoir method | GWATER_RESERVOIR |
Depression | Fill and Spill method | DEP_FS |
Subsurface flow | Kinematic wave method | INTERFLOW_IKW |
Overland flow | One-dimension kinetic wave formula | IKW_OL |
Channel flow | Hourly Muskingum method | CH_MSK |
Process . | Algorithm . | Component ID . |
---|---|---|
PET | Hargreaves method | PET_H |
Interception | Daily interception algorithm in WetSpa | PI_MSM |
Snow melt | Degree-day method | SNO_DD |
Soil temperature | Finn and Plauborg method | STP_FP |
Infiltration | SCS-CN method | SUR_CN |
Percolation | Brooks and Corey formula | PER_PI |
Subsurface flow | Darcy formula | SSR_DA |
Depression | Linsley method | DEP_LINSLEY |
Soil evaporation | Evaporation method in WetSpa | SET_LM |
Process . | Algorithm . | Component ID . |
---|---|---|
PET | Hargreaves method | PET_H |
Interception | Daily interception algorithm in WetSpa | PI_MSM |
Snow melt | Degree-day method | SNO_DD |
Soil temperature | Finn and Plauborg method | STP_FP |
Infiltration | SCS-CN method | SUR_CN |
Percolation | Brooks and Corey formula | PER_PI |
Subsurface flow | Darcy formula | SSR_DA |
Depression | Linsley method | DEP_LINSLEY |
Soil evaporation | Evaporation method in WetSpa | SET_LM |
Process . | Algorithm . | Component ID . |
---|---|---|
Interception | Hourly interception algorithm in WetSpa | PI_STORM |
Infiltration | Green–Ampt method | SUR_SGA |
Percolation | Darcy method | PERCO_DARCY |
Ground water | Linear reservoir method | GWATER_RESERVOIR |
Depression | Fill and Spill method | DEP_FS |
Subsurface flow | Kinematic wave method | INTERFLOW_IKW |
Overland flow | One-dimension kinetic wave formula | IKW_OL |
Splash erosion | Park equation | SplashEro_Park |
Overland erosion | Erosion–Govers method | KinWavSed_OL |
Channel flow | Hourly Muskingum method | CH_MSK |
Channel erosion | Srinivasan Galvao function | KinWavSed_CH |
Process . | Algorithm . | Component ID . |
---|---|---|
Interception | Hourly interception algorithm in WetSpa | PI_STORM |
Infiltration | Green–Ampt method | SUR_SGA |
Percolation | Darcy method | PERCO_DARCY |
Ground water | Linear reservoir method | GWATER_RESERVOIR |
Depression | Fill and Spill method | DEP_FS |
Subsurface flow | Kinematic wave method | INTERFLOW_IKW |
Overland flow | One-dimension kinetic wave formula | IKW_OL |
Splash erosion | Park equation | SplashEro_Park |
Overland erosion | Erosion–Govers method | KinWavSed_OL |
Channel flow | Hourly Muskingum method | CH_MSK |
Channel erosion | Srinivasan Galvao function | KinWavSed_CH |
During this model building procedure by IHMCS, the complicated professional details (i.e. selecting the processes, identifying the algorithms, and checking the compatibilities of the components) are transparent to the modellers. The modellers can easily obtain model structures. In addition, the model structures determined by IHMCS show that the proposed method can generate different model structures for different modelling contexts.
Real experiment
Application context
This real experiment aims to simulate the daily rainfall-runoff in the Meichuan River watershed in Jiangxi Province, China (Figure 9). The watershed has a drainage area of approximately 6,366 km2, with an elevation ranging from 151 to 1,425 m. The climate is humid and subtropical, with an average annual precipitation of 1,706 mm and a mean annual temperature of 17 °C. The lowest air temperature in the watershed is below 0 °C. The available spatial dataset includes a gridded digital elevation map, land-use map, and soil-type map. The dataset has a spatial resolution of 90 m. The available time series dataset includes the precipitation, minimum temperature, maximum temperature, and discharge data at a daily time step from 1 January 2002 to 31 December 2005 and from 1 January 2007 to 31 December 2010. There was a lack of observation data in 2006.
Model structure
The modelling context and model structure for the real application context are shown in Figure 10. The processes, algorithms, and components of the model structure are listed in Table 5.
Process . | Algorithm . | Component ID . |
---|---|---|
PET | Hargreaves method | PET_H |
Interception | Daily interception algorithm in WetSpa | PI_MSM |
Snow melt | Degree-day method | SNO_DD |
Soil temperature | Finn and Plauborg method | STP_FP |
Infiltration | SCS-CN method | SUR_CN |
Percolation | Brooks and Corey formula | PER_PI |
Subsurface flow | Darcy formula | SSR_DA |
Depression | Linsley method | DEP_LINSLEY |
Soil evaporation | Evaporation method in WetSpa | SET_LM |
Ground water | Linear reservoir method | GWA_RE |
Water balance | Soil water balance method | SOL_WB |
Overland flow | Nash unit hydrograph formula | IUH_OL |
Channel flow | Muskingum method | MUSK_CH |
Process . | Algorithm . | Component ID . |
---|---|---|
PET | Hargreaves method | PET_H |
Interception | Daily interception algorithm in WetSpa | PI_MSM |
Snow melt | Degree-day method | SNO_DD |
Soil temperature | Finn and Plauborg method | STP_FP |
Infiltration | SCS-CN method | SUR_CN |
Percolation | Brooks and Corey formula | PER_PI |
Subsurface flow | Darcy formula | SSR_DA |
Depression | Linsley method | DEP_LINSLEY |
Soil evaporation | Evaporation method in WetSpa | SET_LM |
Ground water | Linear reservoir method | GWA_RE |
Water balance | Soil water balance method | SOL_WB |
Overland flow | Nash unit hydrograph formula | IUH_OL |
Channel flow | Muskingum method | MUSK_CH |
The related knowledge and application contexts for hydrological process selection and algorithm identification are shown in Tables 6 and 7, respectively. The RDF-based reasoning method confirms that the components of these selected algorithms can be assembled as a complete workflow.
Related knowledge . | Application context . | Selected processes . |
---|---|---|
Process rule 1: If the purpose is rainfall-runoff simulation and the time step is daily, then infiltration/surface runoff, depression, soil evaporation, potential evapotranspiration, overland flow, channel flow, water balance, and soil temperature are selected | Purpose: rainfall-runoff simulation; Time step: daily | Infiltration/surface runoff, depression, soil evaporation, potential evapotranspiration, overland flow, channel flow, water balance, and soil temperature |
Process rule 2: If the spatial scale is medium-sized and the application purpose is rainfall-runoff simulation, then percolation and ground water are selected | Spatial scale: medium-sized; Purpose: rainfall-runoff simulation | Percolation and ground water |
Process rule 3: If the particular process is snow, then snowmelt is selected | Particular process: snow | Snowmelt |
Process rule 4: If the underlying surface type is rural, then interception is selected | Underlying surface type: rural | Interception |
Process rule 5: If the climate type is humid, then the subsurface flow is selected | Climate type: humid | Subsurface flow |
Related knowledge . | Application context . | Selected processes . |
---|---|---|
Process rule 1: If the purpose is rainfall-runoff simulation and the time step is daily, then infiltration/surface runoff, depression, soil evaporation, potential evapotranspiration, overland flow, channel flow, water balance, and soil temperature are selected | Purpose: rainfall-runoff simulation; Time step: daily | Infiltration/surface runoff, depression, soil evaporation, potential evapotranspiration, overland flow, channel flow, water balance, and soil temperature |
Process rule 2: If the spatial scale is medium-sized and the application purpose is rainfall-runoff simulation, then percolation and ground water are selected | Spatial scale: medium-sized; Purpose: rainfall-runoff simulation | Percolation and ground water |
Process rule 3: If the particular process is snow, then snowmelt is selected | Particular process: snow | Snowmelt |
Process rule 4: If the underlying surface type is rural, then interception is selected | Underlying surface type: rural | Interception |
Process rule 5: If the climate type is humid, then the subsurface flow is selected | Climate type: humid | Subsurface flow |
Related knowledge . | Application context . | Identified algorithms . |
---|---|---|
Algorithm rule for the interception process: If the time step is daily, then the maximum storage method is selected | Time step: daily | Daily interception algorithm in WetSpa |
Algorithm rule for the snowmelt process: If the time step is daily, then the degree-day method is selected | Time step: daily | Degree-day method |
Algorithm rule for the soil temperature process: If the time step is daily, then the Finn and Plauborg method is selected | Time step: daily | Finn and Plauborg method |
Algorithm rule for the infiltration process: If the time step is daily, then the SCS-CN method is selected | Time step: daily | SCS-CN method |
Algorithm rule for the depression process: If the time step is daily, then the Linsley method is selected | Time step: daily | Linsley method |
Algorithm rule for the potential evapotranspiration process: If the climate inputs are maximum temperature and minimum temperature, then the Hargreaves method is selected | Climate input: maximum temperature and minimum temperature | Hargreaves method |
Algorithm rule for the percolation process: If the time step is daily, then the Brooks and Corey formula is selected | Time step: daily | Brooks and Corey formula |
Algorithm rule for the soil evaporation process: If the time step is daily, then the Thornthwaite and Mather formula is selected | Time step: daily | Thornthwaite and Mather formula |
Algorithm rule for the subsurface flow process: If the time step is daily or hourly, then the Darcy formula is selected | Time step: daily | Darcy formula |
Algorithm rule for the overland flow process: If the time step is daily or hourly, then the Nash unit hydrograph formula is selected | Time step: daily | Nash unit hydrograph formula |
Algorithm rule for the ground water process: If the time step is daily, then the linear reservoir method is selected | Time step: daily | Linear reservoir method |
Algorithm rule for the water balance process: If the time step is daily, then the soil water balance method is selected | Time step: daily | Soil water balance method |
Algorithm rule for the channel flow process: If the time step is daily or hourly, then the Muskingum method is selected | Time step: daily | Muskingum method |
Related knowledge . | Application context . | Identified algorithms . |
---|---|---|
Algorithm rule for the interception process: If the time step is daily, then the maximum storage method is selected | Time step: daily | Daily interception algorithm in WetSpa |
Algorithm rule for the snowmelt process: If the time step is daily, then the degree-day method is selected | Time step: daily | Degree-day method |
Algorithm rule for the soil temperature process: If the time step is daily, then the Finn and Plauborg method is selected | Time step: daily | Finn and Plauborg method |
Algorithm rule for the infiltration process: If the time step is daily, then the SCS-CN method is selected | Time step: daily | SCS-CN method |
Algorithm rule for the depression process: If the time step is daily, then the Linsley method is selected | Time step: daily | Linsley method |
Algorithm rule for the potential evapotranspiration process: If the climate inputs are maximum temperature and minimum temperature, then the Hargreaves method is selected | Climate input: maximum temperature and minimum temperature | Hargreaves method |
Algorithm rule for the percolation process: If the time step is daily, then the Brooks and Corey formula is selected | Time step: daily | Brooks and Corey formula |
Algorithm rule for the soil evaporation process: If the time step is daily, then the Thornthwaite and Mather formula is selected | Time step: daily | Thornthwaite and Mather formula |
Algorithm rule for the subsurface flow process: If the time step is daily or hourly, then the Darcy formula is selected | Time step: daily | Darcy formula |
Algorithm rule for the overland flow process: If the time step is daily or hourly, then the Nash unit hydrograph formula is selected | Time step: daily | Nash unit hydrograph formula |
Algorithm rule for the ground water process: If the time step is daily, then the linear reservoir method is selected | Time step: daily | Linear reservoir method |
Algorithm rule for the water balance process: If the time step is daily, then the soil water balance method is selected | Time step: daily | Soil water balance method |
Algorithm rule for the channel flow process: If the time step is daily or hourly, then the Muskingum method is selected | Time step: daily | Muskingum method |
Simulation results
The model parameters are prepared using the preprocessing tools in SEIMS. The model is calibrated using the discharge data from 2002 to 2005 and validated using the data from 2007 to 2010. Overall, it is found that the estimations agreed well with the observations, although the simulations in some certain high water periods are a little imprecise. The Nash–Sutcliffe efficiency (NSE) is used to evaluate the simulation accuracy. The NSE of the resulting hydrological model for the calibration period is 0.90 (Figure 11), and the NSE of the validation period is 0.88 (Figure 12). The simulation accuracy is acceptable.
DISCUSSION
The above results show how the proposed knowledge-based method could support modellers in the automatic determination of hydrological model structures for different application contexts. To further illustrate the adaptability of the proposed method, we changed some conditions in the real experiment and obtained different model structures accordingly. The conditions and corresponding changes of model structures are as follows:
- (a)
If the study area size is assumed to be small-sized rather than medium-sized and the climate type is assumed to be arid rather than humid, the subsurface flow, percolation, and ground water processes are not selected by the proposed method.
- (b)
If the modelling purpose is assumed to be soil erosion simulation rather than rainfall-runoff simulation, the sediment yield process is added to the model structure.
- (c)
If the modeller assigns the maximum temperature, minimum temperature, relative humidity, and solar radiation as the meteorological data input, the Priestley–Taylor algorithm rather than the Hargreaves algorithm is selected for the PET process.
The contrast of the model structures in the different situations is shown in Table 8.
Process . | Situation of a real experiment . | Situation a . | Situation b . | Situation c . |
---|---|---|---|---|
PET | √ | √ | √ | √ (→)PET |
Interception | √ | √ | √ | √ |
Snow melt | √ | √ | √ | √ |
Soil temperature | √ | √ | √ | √ |
Infiltration | √ | √ | √ | √ |
Percolation | √ | ○ | √ | √ |
Subsurface flow | √ | ○ | √ | √ |
Depression | √ | √ | √ | √ |
Sediment yield | ○ | ○ | √ | ○ |
Soil evaporation | √ | √ | √ | √ |
Ground water | √ | ○ | √ | √ |
Water balance | √ | √ | √ | √ |
Overland flow | √ | √ | √ | √ |
Channel flow | √ | √ | √ | √ |
Process . | Situation of a real experiment . | Situation a . | Situation b . | Situation c . |
---|---|---|---|---|
PET | √ | √ | √ | √ (→)PET |
Interception | √ | √ | √ | √ |
Snow melt | √ | √ | √ | √ |
Soil temperature | √ | √ | √ | √ |
Infiltration | √ | √ | √ | √ |
Percolation | √ | ○ | √ | √ |
Subsurface flow | √ | ○ | √ | √ |
Depression | √ | √ | √ | √ |
Sediment yield | ○ | ○ | √ | ○ |
Soil evaporation | √ | √ | √ | √ |
Ground water | √ | ○ | √ | √ |
Water balance | √ | √ | √ | √ |
Overland flow | √ | √ | √ | √ |
Channel flow | √ | √ | √ | √ |
√: the corresponding process is selected.
○: the corresponding process is unselected.
(→)PET: the Priestley–Taylor algorithm rather than the Hargreaves algorithm is selected for the PET process simulation.
Using this proposed method, modellers do not need to master all of the modelling knowledge, and it is easy for modellers to determine suitable models using integrated modelling frameworks. This method can reduce the modelling knowledge requirement of modellers and enable more users, especially when hydrology is not the core area of their expertise, to make use of hydrological modelling to serve their own work. With the intelligent modelling interface, modellers can use integrated modelling frameworks in a simpler and higher-level way. Users who are not familiar with modelling frameworks or components can also participate in modelling. The latter would be conducive to the promotion of integrated modelling frameworks. It can also help experts save the time required for modelling by providing primitive solutions for them.
Actually, the rationality of model structures depends on the quality and quantity of the knowledge base. In this study, algorithm selection is mainly constrained by time step and data availability, but other limiting factors for algorithm selection should also be considered in the future. The incompleteness of the knowledge base may cause the infeasibility of our proposed method in relatively complex environments. In addition, regional difference knowledge is very important, and it should also be included in the knowledge base. The knowledge base on GitHub needs to be constantly improved and expanded. Contributions to the knowledge base from the community are welcomed.
It should be noted that certain basic knowledge on hydrological modelling is always needed for hydrological modellers. For example, modellers should understand the general knowledge of hydrology (e.g. watershed conditions, how to discover and preprocess data, how to prepare parameters, and how to calibrate model parameters). Without the basic knowledge, it is still difficult to build hydrological models and interpret the simulation results properly.
CONCLUSIONS AND FUTURE WORK
In this paper, a knowledge-based method is proposed to automatically determine hydrological model structures. Specifically, the knowledge on process selection and algorithm identification is formalized in RuleML, and the knowledge on the component coupling is formalized in RDF. Then, an inference engine is implemented to generate suitable model structures according to the formalized knowledge. Three hypothetical experiments and a real experiment are used to demonstrate how the proposed knowledge-based method can assist modellers in the determination of model structures for different application contexts. This method has the potential to reduce the modelling knowledge burden on modellers and would be conducive to the promotion of integrated modelling frameworks.
The construction of the knowledge base is a long-term project. In the future, we will continue to expand the knowledge base to enhance the feasibility of the proposed knowledge-based method in complex environments. We will attempt to develop a knowledge management platform for modellers to use to share their hydrological modelling knowledge and experience. In addition, future work should utilize other types of knowledge, such as case-based knowledge, to make hydrological modelling smarter and more robust.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (Nos. 41601423, 41601413, and 41431177), the Innovation Project of LREIS (No. O88RA20CYA), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province. The support received by A-Xing Zhu through the Vilas Associate Award, the Hammel Faculty Fellow, and the Manasse Chair Professorship from the University of Wisconsin–Madison is greatly appreciated.