Abstract
Water reservoirs are essential to ensure water supply to both the population and agriculture, especially in the Mediterranean basin. In some cases, analyses of water intended for human consumption have detected high levels of agrochemicals. Knowing the possible origin of these products is complex because there may be many agricultural plots within the reservoir basin. In this paper, we introduce a methodology to obtain the set of agricultural plots whose rainwater reaches the reservoir and in what proportion they affect the points where chemical analyses are performed. The method implements an extension of the D8 algorithm for the calculation of the drainage network, in which additional information about the land-use type of the area, as well as rainfall maps, are also considered. In order to facilitate the user's analysis of the data, a plugin has been implemented in QGIS. This allows usability and easy interaction with the visual information. The Rumblar reservoir basin, located in Andalusia (Spain) has been studied as a use case, surrounded by olive orchards. The result is a replicable methodology for any other water reservoir and for carrying out an individualized study of agricultural plots.
HIGHLIGHTS
Water reaching a reservoir is of great environmental interest.
Farms that contribute the most water to the reservoir can be found out.
Modification of the D8 algorithm allows adding new functionality to the drainage network creation process.
The methodology allows for the addition of real information such as land use type and rainfall maps.
The end-users can use an interactive tool implemented as QGIS plugin.
INTRODUCTION
In the countries of the Mediterranean basin, such as Spain, many municipalities are supplied with drinking water stored in reservoirs. In this area, rainfall is scarce and mostly concentrated in autumn and spring. In general terms, the pluviometry is very changeable, with particularly dry periods that seriously affect the agricultural sector. Thanks to the reservoir infrastructure, the supply to the citizens has been assured so far. However, not only is the quantity of water of concern, but also its quality indexes.
For this work, we have studied a specific case, the Rumblar reservoir, located in the province of Jaén (Spain), which supplies freshwater to about 88,200 inhabitants belonging to the Rumblar Consortium. Company technicians in charge of the reservoir have observed that, after a period of rainfall, chemical analysis reveals a higher concentration of nitrates and other unwanted components. These substances come mostly from olive groves, which are predominant in the entire river basin of the area. This situation is aggravated when it rains after a long period of drought. At the beginning of the year 2023, the reservoir is at only 10% of its capacity. This water scarcity increases the concentration of fertilizers and phytosanitary products from agricultural activities.
This paper is focused on developing a methodology for the traceability of the water (and consequently of entrained chemicals) that reaches a reservoir, in order to determine their possible origin considering the cultivation areas of the basin that supplies water to the reservoir. We have focused our study on the previously described area of the Rumblar, but the methodology can be extended to any other reservoir or rush of water whose watershed is affected by agricultural activity. Preliminary studies carried out using Geographic Information Systems (GIS) tools do not allow a true traceability of the origin of these discharges, as they use only the digital elevation model (DEM) as input data. Nor have we obtained results using simulation tools even though realistic parameters are taken into account, since they do not allow customizing input files to better adapt the simulation to the realities of the area. As a consequence of the above, we have developed our own method based on the calculation of the drainage network. In this way, we have been able to control the entire path of rainwater throughout the basin that drains into the reservoir while considering its origin and the land use type that is carried out. Although the study area is not very large (the reservoir basin), rainfall maps have been considered as input data. These maps show certain differences in rainfall over a given period of time. We have also added parameters related to soil absorption and the orography of the terrain.
The results are promising since it is possible to obtain both the farms from which the water comes and the percentages that affect the sample. In this way, water samples taken in nearby runoffs can know the most probable origin and thus make a follow-up of the type of agricultural activity they carry out. This makes it possible to focus the study on certain sites to determine whether they are performing this activity in accordance with current regulations.
The paper is structured as follows. Section 2 elaborates on this topic and describes similar solutions in the literature. In Section 3, we describe the work area and the input data needed to run the process. The methodology carried out to obtain water traceability is described in Section 4. In terms of application usability, a plugin has been implemented, as introduced in Section 5. Section 6 analyzes the results and usability of the QGIS plugin. Section 7 shows conclusions and future work, and finally, Section 8 provides links to data and source code repositories.
STATE OF THE ART
Soil, along with air and water, is one of the environmental resources that most influences the environment and biodiversity. Soil conditions are mainly affected by the use made of it by human beings, especially in agriculture and livestock farming. In recent decades, the use of chemicals in agriculture has become widespread for various reasons, such as increasing soil fertility and eliminating pests and weeds. As a result, some areas have been affected by an excess of fertilizers, pesticides and herbicides (Lizaga et al. 2020).
In particular, nitrogen is one of the essential elements for plant growth. Nitrates are the natural result of atmospheric nitrogen fixation and the decomposition of organic matter by microorganisms. Low concentrations of nitrates are found naturally in surface and groundwater; however, the amounts of nitrogen deposited in the soil are due to human activities that contribute to elevated concentrations in the aquatic environment (Min et al. 2002; Yue et al. 2020; Kim et al. 2023; Qiu et al. 2023). Fertilizers are one of the main anthropogenic sources of nitrogen, used to improve growth and increase crop production. Due to its importance and the trans-boundary nature of nitrate pollution in Europe, Council Directive 91/676/EEC of 12 December 1991 concerning the protection of waters against pollution caused by nitrates from agricultural sources was adopted as early as 1991.1 However, it is noted that progress has been made in water protection, but that certain obligations need to be clarified or additional measures put in place in order to improve this objective.
The Spanish government has recently approved a national regulation (BOE 2022) for the protection of water against diffuse pollution caused by nitrates from agriculture. In this sense, the European Union and the United Nations are carrying out initiatives to promote efficient soil management to achieve the great challenge of increasing food production while minimizing soil degradation through sustainable development plans. According to the European Commission's proposals, sustainable soil management will be a crucial component of several objectives in the coming years, in particular those focusing on landscapes and biodiversity, natural resources and climate action. Due to its starting in 2023, the new common agricultural policy for 2023–2027 (CAP 2021) will include stronger support for healthy soil, in line with the goals of the European Green Deal. As indicated in the regulatory reports, the problem of the extensive use of these chemicals is both in their passage into the food chain and also into aquifers or water sources. This requires a thorough knowledge of the agricultural plots and their impact along the drainage basin.
In short, addressing the challenge of efficient land and water management requires action on several fronts, including the ICT (Information and Communication Technologies) capabilities. The European Commission is determined to make this Europe's ‘Digital Decade’. The Global Soil Partnership (GSP) of the Food and Agriculture Organization of the United Nations (FAO) has established as one of its objectives (Pillar 4): Improving the quantity and quality of soil data and information: data collection (generation), analysis, validation, reporting, monitoring and integration with other disciplines (FAO-ITPS 2020). In general terms, this involves the continuous processes of data collection from the physical sites, their storage in specific information systems and subsequent analysis to reach conclusions and perform concrete actions.
Depending on the specific problem to be addressed, diverse types of data can be captured, ranging from the chemical and biological properties of the soil, the terrain orography or the land use, whether agricultural or forestry. This heterogeneous information must be processed considering the geospatial nature of these variables that contribute to correlating crop inventory and water management (Mitran et al. 2021).
We are interested in monitoring not only the soil but also the water that flows through these territories and eventually reaches the water reservoirs that serve both for agricultural irrigation and human consumption. The scarcity and pollution of freshwater throughout the planet is a recurring concern, especially in the Mediterranean basin. It also should be considered that part of the rainwater that falls infiltrates into the ground, filling pores and fissures which, when saturated, cause the water to flow by gravity towards springs, rivers or seas, giving rise to underground runoff.
Water quality is measured with different indexes and techniques. Normally a chemical analysis is carried out to detect different types of substances in the water. However, this is often time-consuming, costly and gives the information on a specific date and position. In order to provide water analysis over wide areas and over time, multispectral satellite images are being used for mapping water quality considering parameters such as optical clarity or colored dissolved organic matter (Griffin et al. 2018; Sufia Sultana & Dewan 2021). The problem with satellite images is that sometimes the frequency or intervals of capture does not correspond with those required. For this reason, some authors choose to use other types of remote capture mechanisms using airplanes and Unmanned Aerial Vehicles (UAVs) (Sibanda et al. 2021; Isgró et al. 2022).
In any case, this tells us whether or not there is deterioration in the quality of the water and the type of substance that affects it, but not its most likely origin. In order to carry out this type of study, it is necessary to know the topography of the terrain and the land use throughout the basin that discharges water into the reservoir. Considering that chemicals are washed away by water, watersheds, streams and rivers, the drainage network represents an important data structure for modeling the distribution and movement of surface water over the terrain. The most important feature used in this case is the cartographic representation of the surface given by the DEM.
Traditionally, this and similar studies have been carried out using GIS tools. In particular, it is possible to find both free and commercial software performing hydrological analysis. Some examples are ESRI's ArcGIS, QGIS or Saga, which include standard tools to extract the stream segments and basin watersheds from DEMs under the premise that water flow is governed by the terrain topography. The drainage system of a particular drainage basin provides the patterns formed by rivers and streams which tend to discharge water into major rivers or marshes or lakes. Along these paths, the water carries soil and the organic and inorganic substances it contains.
The hydrological algorithms supported by GIS provide different outputs such as drainage basins and networks (Jasiewicz & Metz 2011). However, the solutions they provide are conditioned by pre-established parameters and variables that in some cases are not adapted to the reality of the characteristics of the work area or the type of analysis to be carried out (Li 2014). Often, the models of certain software are better adapted to some specific conditions of the study area and poorly adapted to others. This is shown by works such as Astagneau et al. (2021), which review the advantages and disadvantages of using hydrological models implemented within packages in the R environment and compare and evaluate their applicability and usability. Most of the algorithms implemented in GIS work well with satellite images, such as Landsat, where the spatial resolution is low (Still & Shih 1985). However, when a greater level of detail is required, they need input files including the drainage network but without any control over the calculation process workflow.
A hydrological model is often needed to analyze spatially variable hydrologic behavior. Such a model can be difficult to set up, especially for an ungauged basin, as it demands a massive amount of data. Moreover, there is an additional challenge of selecting a proper grid resolution, which generally leads to predictive uncertainty and also directly determines the amount of work required. In this sense, satellite Earth observations are an accurate and reliable source of data. Their increasing spatial and temporal resolution, as well as their availability, make them attractive for hydrological modeling. Data such as precipitation, evaporation, soil moisture or snow depth obtained by satellite are incorporated into the structure of the hydrological model through a data assimilation scheme (Alfieri et al. 2022). This type of model, although very effective, nevertheless requires multidisciplinary and not merely geometric knowledge of the territory, which is not always feasible. A model that integrates all these parameters but that can be inserted in a user-friendly way is one of the challenges to be met in the work presented here.
In order to generate the drainage network, traditional methods use the DEM as the only input data. That is, they are based solely on the geometry of the terrain and not on other types of factors such as the absorption capacity of the soil. Neither do they take into account the possible sources of groundwater. Additionally, the solution is sometimes a set of 2D maps and data tables that need to be interpreted by qualified personnel. Breaking the abstraction barrier through user-friendly interfaces is important when actions are to be carried out by farmers and technicians without GIS training.
On the other hand, fluid simulation software could also provide interesting results to this monitoring problem. However, these are also closed tools in which the process cannot be manipulated at each iteration of the algorithm. In addition, they do not provide real-time response when handling large areas, such as a real river basin. On the contrary, they are able to generate realistic 3D visualization and simulation, which is well received because it is easily understood by authorities and non-technical personnel (Ramos et al. 2022).
For all of the above, many researchers have opted for specific implementations to control those aspects of the hydrological process. ICT techniques are present in many different forms. Statistical analysis is increasingly used, but mostly to correlate different hydrochemical parameters (Yue et al. 2020). For instance by means of Bayesian models on the related information about chemicals present in water and the land-use types (Kim et al. 2023). The source of data is normally soil and water sampling. In any case, this implies chemical analysis in many different locations which does not give direct information about its possible origin. The methodology proposed in this work helps to perform a more efficient sampling, allowing it to go more directly to the source of the substances under study.
Accurate tracing methods are also based on hydrologic analysis, more focused on the study about the distribution of water in the interest area. The study of the drainage network has been used for the study of soil composition in floodplain soils (Kaur et al. 2023), or the generation of geochemical maps of alluvial sediments (Lipp et al. 2023). The drainage system is normally computed as a mathematical graph or by triangulated irregular networks (TIN) (Rheinwalt et al. 2019). One of the most used graph-based methods implements the D8 algorithm for the construction of the drainage network (O'Callaghan & Mark 1984). The original algorithm has been improved and modified in different ways, for instance, to consider also specific geomorphological characteristics (Dávila-Hernández et al. 2022). In particular, it has been adapted to solve definite problems where the original versions of the algorithm do not provide valid solutions, such as in flat or steep areas (Gwak & Kim 2013; Ye et al. 2022) or when seepage occurs in the subsoil (Sahoo & Sahoo 2019). At other times, the focus is more on particular applications, such as obtaining flood-risk areas (Omran et al. 2016; Shen & Zhang 2018). Discharges of different substances in storm-water are of great environmental concern. Drainage systems of specific locations have been studied in relation with the presence of chemicals such as nitrogen or phosphorus (Yue et al. 2020; Kim et al. 2023; Qiu et al. 2023), or microplastics in marine environments (Lutz et al. 2021). In these studies, the focus is on the chemical analysis of these substances, rather than on their origin. Thus, to the best of our knowledge, these approaches have not been used for determining the areas from which they initially come from.
In short, the traceability of water reaching a reservoir is an interesting issue, which is not directly solved by GIS or simulation software. We propose a methodology to solve this problem based on a modification of the D8 algorithm. This new extended version allows more parameters to be considered in the calculation of the drainage network. Thus, it allows to load additional input information such as land uses, precipitation maps, and the different absorption rates of the soil, in addition to the terrain orography. This gives the algorithm greater versatility than the other functionalities provided by the conventional D8 method. In addition, a QGIS plugin has been implemented to facilitate the user's task of analyzing the data obtained by the process.
STUDY AREA AND DATASET
In this work, the hydrographic elements of the study area are obtained from the DEM with a spatial resolution of 2 × 2 m that forms part of the Spanish Spatial Data Infrastructure (IDEE) platform which follows INSPIRE Regulation (EC) No 1205/2008. This model was downloaded from the Download Center of the Autonomous Section of the National Center for Geographic Information.3 It is a model obtained by interpolation from the terrain class of the LIDAR flights of the second coverage of the National Aerial Orthophotography Plan (PNOA).4 The covered area is 56,497 ha.
Another important layer of information to be integrated into the system, and which will allow the identification of areas to monitor the dumping of phytosanitary products, is the land use types in the Rumblar basin (forest, agricultural or urban area). The land-use types established in the information layer have been obtained from the Land Cover Map in Andalusia (SIOSE-Andalucía 2016). This information is compiled by CORINE Land Cover and follows the Guidelines of the High Geographical Council and INSPIRE directive (INSPIRE 2023), which were obtained by photo-interpretation on satellite images Sentinel 2, Landsat in false color and Spot.
The cartographic data on land occupation are available in WMS (Web Map Service) format. The information shown to the user has different levels of disaggregation depending on the visualization scale chosen. At small scales, a synthetic classification of only four categories is shown, corresponding to the main types of use (forest, agricultural, artificial areas and water surfaces); at medium scales, there is an intermediate level of disaggregation with 16 categories that specify the type of use within each synthetic group; and at large scales only the polygon contours are shown, which are classified into 182 occupancy classes that can be found by interactive consultation, using the GetFeatureInfo property of the WMS. This feature, which allows the detailed occupancy of each parcel to be queried, is available at any scale.
METHODOLOGY
First phase: obtaining representative points (RPs)
The first phase of the algorithm is described in Figure 3. Step 1 pre-processes the input DEM of the terrain and prepares it to obtain the drainage network, following the method of O'Callaghan & Mark (1984). Initially, each cell C(i,j) of the file is assigned with a specific direction codified as , representing the eight directions where the cell C(i,j) drains its water accumulation at each iteration of the process. This is codified, respectively, as {East(), South-East(), South(), South-West(), West(), North-West(), North() and North-East()}. This direction is computed considering which of the neighboring cells have the steepest slope regarding C(i,j). These directions will be used for obtaining the drainage network, as described below.
Once directions are fixed, the following process is detecting the so-called sinks, considered as those cells that do not discharge water to their neighbors, that is, they only accumulate water. Depending on the approach followed, these types of points are worked on differently. In our approach, it is assumed that no natural lakes are located in the current orography and therefore they are seen as data errors or artifacts. Thus, the method fills sinks in order to force them to drain into one of the surrounding neighbors as in Jenson & Domingue (1988). This approach does not generate any problems in the methodology and does prevent accumulation areas outside the reservoir.
Once the DEM is pre-processed, as Figure 3 describes, the D8 algorithm obtains a drainage network, similarly to the one implemented in O'Callaghan & Mark (1984). However, this new version of the algorithm also finds a set of representative points (RPs) following Algorithm 1. The cell C(i,j) is considered an RP if a significant accumulation of water has passed through at any time during the drainage calculation process. We are going to focus our study on these points in the second phase of the methodology. Sometimes these RPs are junction points between different torrents or streams. Definitely, they are strategic points in which to focus the study over time.
Algorithm 1 summarizes this process. As input data, the algorithm provides the grid of cells, , each of these cells with this information:
h: height of the terrain in the area associated with the cell.
d: direction of the cell where water is poured codified as .
lu: the code of the land use associated with this cell. As referred to in Section 3, each cultivation plot has a different codification. Along the drainage calculation process, this value is propagated to the RPs and the reservoir.
ab: the percentage of absorption associated with that particular land use type. This value also records the slope of the terrain in order to consider whether the water advances faster or slower in the cell area (see also Section 3).
rp: is a Boolean value initialized false that indicates if this cell is a Representative Point or not. This is considered the output of the algorithm.
Algorithm 1. Calculate Representative Points (RP)
Input:, for each
Input:
Output: Cell , for each cell
1:local variables
2: Integer, , i, j, k,
3: Boolean
4: Float oldf, newf
5:end local variables
6:BEGIN
7:for alldo
8:
9:
10:end for
11:while ()
12:
13: for alldo
14:
15: ifthen
16:
17:
18:
19:
20:
21: end if
22: end for
23: for alldo
24: ifthen
25:
26: end if
27:
28:
29: end for
30:end while
31:END
The method is an iterative process in which all cells with accumulated water yield in parallel their contents to one of their eight neighbors, according to the previously assigned direction. The algorithm uses two matrices in order to simulate this synchronous water transfer at an instant of time. This allows it to work sequentially to perform a process of a parallel nature. A complete transfer of water over the entire surface is considered a full iteration of the algorithm (Alg1:while loop). In the original algorithm, all cells are given one water unit to start the simulation process in matrix (Alg1:line 8). We have also used rainfall maps as input data instead, as described above, that are not included in the algorithm for simplicity. In case of being considered, as these maps have the same DEM resolution, the corresponding value is obtained from the corresponding cell of the image. On the other hand, and according to studies of the area, there are no natural springs in the area of interest. If this were to occur in any specific site, the method would add a higher value to the associated cell. The water that accumulates the matrix is discharged into the matrix at the end of each iteration of the process (while loop) in order to simulate the parallelism. At the end of each of these iterations, gets the values of to continue with the iterative process (Alg1:line 27), and the matrix is initialized to zero to obtain new values in the next loop.
As cells give their contents to other cells, many cells receive from one or more neighboring cells their accumulated contents. The algorithm finishes when no more water is moved in any cell during the course of an iteration. This condition is reflected with the variable , which determines if the process continues (Alg1:line 11). Once into the loop, the variable is set to false (Alg1:line 12) but again reverted to true in case any water movement is detected (Alg1:line 16). This implies that a new iteration of the algorithm is performed until no more water is transferred along the matrix.
The first loop inside the while (Alg1:lines 13–22) represents the synchronous movement of all cells in unison to pour the contents of into a neighboring according to the previously defined direction d. Each cell always pours in one cell (except those on the edge).
Not all content of is eventually discharged into the neighboring cell. The original method of the algorithm only provides the drainage network and all water units move through the drainage system as if they were particles that are not lost along the process. In a real river basin, not all rainwater is incorporated into the flow to the reservoir. Several factors are involved, such as the amount of water that falls, the slope of the land, its absorption capacity or evaporation factor. These parameters are considered as follows: (1) The area under study has no natural sources, so all runoff is rainwater from the watershed. (2) Evaporation is not considered in this method. In practice, this implies considering this factor as a constant throughout the study territory. (3) Taking advantage of the fact that we have zoned in plots the entire area of the site, and all of these polygonal areas have an associated land use type, we have associated this type of land with a specific absorption factor, which can be configured individually. This factor takes into account whether a forested area has a greater capacity to absorb rainfall than an area used for olive cultivation. (4) Rainfall is quite homogeneous in the study area, in any case, rainfall maps have been added to be considered the methodological level. These maps modify the original algorithm by increasing or decreasing the initial poured water in each cell. (5) The slope of the terrain is considered as it is a parameter associated with the topography of the terrain. For each cell and for each iteration of the drainage algorithm, it is considered what percentage of water is yielded to the neighboring cell and what percentage is lost by absorption. This reduces the percentage of water falling in the highlands that reaches the reservoir, but does not nullify it. All these factors are considered in order to take advantage of the knowledge of the terrain, described in Section 3, in order to be more accurate in assessing the origin of the flows.
The second loop (Alg1:lines 23–29) is executed once all cells have received water from their neighbors. Then, it is possible to know which of them have received a greater amount of water than others. If they exceed a fixed threshold value, they are considered as RP or representative points. This threshold value is defined in advance as input data to the algorithm and can be changed in order to obtain more or less of these points. A flow rate greater than the threshold may pass through the cell on more than one occasion. The first time this happens, the rp value is set to true forever (Alg1:line 25). Finally, the loop interchanges the role of with the matrices and initializes the latter to zero in order to start another iteration.
In summary, the first phase of the algorithm obtains the drainage network and obtains the RPs as described in Algorithm 1. It is possible to add specific additional representative points (ARP) that are treated as the original RPs. These are points expressly chosen by the technicians to carry out chemical analyses. All this information is the input data of the second phase of the methodology: the study of the traceability of the water reaching these points under study.
Water traceability study
Once RPs are obtained in Algorithm 1 (Step 1 in Figure 3), the next step is to add the information associated with the sequence of the drainage process at each of these RPs, represented as Step 2 in Figure 3. It is possible to include new RPs customized by the user as an annexed file (previously defined as ARPs). These can be useful if the technicians in charge of monitoring water quality want to follow up on a specific geographical point.
Algorithm 2. Traceability of RPs
Input: Cell , for each
Input:
Output: Cell , for each cell
1:local variables
2: Integer, , i, j, k, l,
3: Boolean
4: Float oldf, newf
5:end local variables
6:BEGIN
7:for all
8:
9:
10:end for
11:while() do
12:
13:
14: for alldo
15:
16: ifthen
17:
18:
19:
20:
21:
22: end if
23: end for
24: for alldo
25:
26: ifthen
27:
28: end if
29:
30:
31:
32: end for
33:end while
34:END
Algorithm 2 describes the process. The procedure is similar to that of Algorithm 1, but this time a specific data structure is associated with each RP in each iteration of the process. In both algorithms, the accumulated water from one cell is poured into the cell next to it, but now at each iteration of the process, it also adds information about the original plot from which the water comes. Summarizing, each RP point stores for each iteration:
The algorithm iteration number in order to know the temporal sequencing.
The water cumulus that passes through (which coincides with that of Algorithm 1).
The number of the parcel where the water comes from. This reference number is associated with a type of land use.
Thus, for N RPs and K iterations of the algorithm and with P plots cultivated inside the basin of the reservoir, we can obtain a data structure with a capacity of data. However, this is considered the worst case, since the information is only stored for those iterations where there has been flow movement. Additionally, only those plots pouring water into an RP in a specific iteration are considered.
Algorithm 2 works as follows: it initializes the accumulation flow matrices (lines 7–10). Then, it performs the accumulation values during the first for-loop (lines 14–23, process identical to that of Algorithm 1). The second for-loop (lines 24–32) adds to each tc data structure, and for each iteration the new set of two values: flow accumulation and land use type. If is an RP, then the TC data structure stores the temporal sequence. TC (in capital letters) is only available for RPs. Otherwise, the structure is finally reset in order to obtain new values in the next iteration.
The result is a set of RP which stores for each iteration both the cumulus water that has been passing through the cell and its land use origin. This data sequence is later analyzed in the following section of the process with the help of a QGIS plugin (see Figure 3).
QGIS PLUGIN FOR WATER TRACEABILITY
QGIS is a versatile open-source Geographic Information System for managing spatial data. Among its features, we highlight the capacity of adding Python plugins with new functionality. Plugins are developed by independent organizations and developers and can be used by the user community.
In order to facilitate the analysis task to technicians about the agricultural plots discharging rainwater on these RPs, we have developed a specific QGIS plugin. This way the data analysis requires only a few notions about this GIS tool.
The geo-referenced DEM file in tiff format within the terrain rectangle in which the RPs obtained by Algorithm 1 are included. This region should include the entire reservoir basin.
The drainage network image was obtained from Algorithm 1 (tiff format). This is not compulsory but interesting to observe the route of the water. Figure 5(a) overlaps the DEM and the drainage network.
The RPs also calculated by Algorithm 1 are specific points of this drainage network (csv format).
The geometry of agricultural plots within the basin of the reservoir area (.shp format). The geo-referenced information of this file includes the polygon of each plot, the type of land use (agricultural, forestry, etc.) and the parcel number according to the standard numbering. Figure 5(b) shows RPs and the plot polygons.
The output file (csv format) of Algorithm 2 with the information obtained associated with each RP. As mentioned in Section 4, each of these RPs has an entry for all those plots, which sooner or later, pour water into that specific point position.
The RP layer (.asc) with the information obtained from Algorithm 2 for all RPs, and ARPs if necessary.
The polygon layer of plots (.shp).
If the checkbox Load existing result layer is unchecked, the process of searching for plots associated with the selected RP is done from the beginning, generating a new layer with the found plots. Otherwise, the layer used has been previously generated in another execution of the plugin, and can be directly selected.
After the plugin execution, the list of plots whose water has passed through the selected RP is presented, as depicted in Figure 6(b). The next step is the analysis of the results.
RESULTS AND DISCUSSION
This section first analyzes the impact of cumulative rainfall maps with respect to the original O'Callaghan algorithm. Secondly, we see the process of analyzing data from the user view point by using the designed plugin.
Impact of rainfall mapping on methodology
The horizontal axis represents the plots that have the greatest influence when discharging water in that RP. The vertical axis provides water accumulation values passing through that RP. As observed, there is no appreciable difference between the three lines. The reason is that the variations in water accumulation are not very representative, partly because it is a short period of time and also because the work area is small. In any case, this methodology can be interesting for larger areas with more marked climatic variations, or in order to observe the behavior of rainfall over specific points in certain periods of time.
Percentage of water from plots
Study of individual plots in the Rumblar basin
Results in the Cazorla mountain area
In order to test the proposed methodology in another natural area and check its validity, we have run the algorithmic process in another study area. This is the Natural Park of Cazorla, Segura and Las Villas, a Biosphere Reserve located in the north-east of the province of Jaén (38° 07′52 ‘N 2° 45′52 ‘W, ETRS89). The reason for selecting an area also belonging to the province of Jaén is that the olive trees are treated with the same type of phytosanitary products, so the problem posed by the arrival of these residues in the drinking water reservoirs is the same. On the other hand, it is an extensive mountainous massif of abrupt relief, the source of the Guadalquivir and Segura rivers, in which karst formations abound. There are differences in the morphology of the terrain, with a topography and steeper slopes, up to 1,800 m. However, forestry and agricultural land use also coexist. An area of smaller land area has been selected, the Rumblar area was 63,186.50 Ha, and this second one is 1,098.71 Ha. A spatial resolution of 20 m has been selected, in order to evaluate the results when processing at the same level of detail.
CONCLUSIONS AND FUTURE WORK
In this work, we introduce the methodology for obtaining the information about those agricultural plots that end up discharging their waters into specific areas, such as a stream at the entrance to a reservoir. The process consists of modifying the classical algorithm to obtain the drainage network, the so-called O'Callaghan's D8. For this purpose, the origin of the water is tracked at all times along its path until it reaches the point of interest (or RP). Each individual agricultural parcel that discharges at that point is taken into account, also considering the proportion in which it discharges.
Users interested in this application may be ecologists or environmental technicians, so the usability of the process is important, in case they are not very familiar with GIS tools. To facilitate this task, a plugin has been implemented in QGIS that allows direct interaction with the maps. The user can click on points of interest and plots to obtain the information both graphically and textually. Many of the input data, as well as the parameters, are configurable to perform different types of simulation and also in different time phases. The latter allows the algorithm to be adapted to different study objectives
The initial premise of the work is to be able to control the agrochemicals that are carried by the water and reach the water reserves. The pending work consists of contrasting this methodology with chemical analysis of water at sensitive points. This should also be completed with the chemical analysis of the soil of the agricultural plots considered most representative by this methodology. With the analysis, it is possible to focus the study on the areas that are most likely to be affected.
DATA AND CODE AVAILABILITY
In order to make the data and code of this paper findable, accessible, interoperable and reusable according to the FAIR Guiding Principles for scientific data management and stewardship (Wilkinson et al. 2016), we have included both the data and the code in public repositories.
Data and QGIS plugin is available through this Drive link: https://drive.google.com/drive/folders/1yOJ_rarpUfYV-ylcdsYAgiOYmTidziNV.
Data available for the Rumblar use case:
oInput data: all data required to run the application.
oOutput data: the data generated after running the application in this area.
QGIS plugin: the plugin and installation instructions
The application is implemented in C + +. The source code including Algorithms 1 and 2 as well as the end-user interface is available through this GitHub public repository:https://github.com/lidiort/WaterTraceability_D8.git.
ACKNOWLEDGEMENTS
This research has been partially funded through the research projects PYC20-RE-005-UJA and PID2021-126339OB-I00. This first one is co-financed with the European Union (ERDF funds) and the second is partially supported by the Spanish Ministry of Science and Innovation and the European Union. We are also grateful for the support provided by the Ministry for Ecological Transition and the Demographic Challenge, Spanish Government (AEMET, Agencia Estatal de Meteorología). We would also like to thank SOMAJASA for their collaboration.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories.
CONFLICT OF INTEREST
The authors declare there is no conflict.