ABSTRACT
This paper explores the use of Echo State Networks (ESNs), a subset of reservoir computing, in modeling and predicting streamflow variability with a focus on biogeochemical patterns. Multiple ESNs were tested alongside a comparable long short-term memory (LSTM) model, another deep learning architecture commonly used in time-series modeling, in the hope of finding a more robust streamflow chemistry predictor. Testing revealed that for our specific modeling of water temperature and dissolved oxygen (DO) levels, ESNs outperform LSTMs in both model fit and the time needed for training and testing. We conclude that for hydrological tasks where the data form a chaotic time series, ESNs provide a useful and efficient alternative to LSTMs: they are quicker to train, produce better results, and are easier to apply to the task at hand.
HIGHLIGHTS
Echo State Networks combine good performance with ease of use to solve water quality problems.
Echo State Networks provide a useful alternative to more advanced deep learning architectures for streamflow forecasting problems.
Where time and computational resources are a concern, Echo State Networks provide quick, easy, reproducible results with little tuning necessary.
INTRODUCTION
The chemistry and flow of water through stream networks affect human health, economy, and ecological functioning on global scales (Díaz et al. 2019; Frei et al. 2021; Basu et al. 2022; Hannah et al. 2022). Hydrochemistry is in turn controlled by complex interactions in the contributing watershed and stream network, including vegetation, direct human disturbance, climate, groundwater dynamics, and water infrastructure (Dupas et al. 2019; Godsey et al. 2019; Barbarossa et al. 2020; Goeking & Tarboton 2022; Brown et al. 2023). Accurate prediction of variance in stream flow and chemistry is increasingly important as the global human footprint and disruption of climate put more humans and habitat at risk from flooding, pollution, and ecological collapse (Abbott et al. 2023; Hagen et al. 2023; Rockström et al. 2023; Willcock et al. 2023).
The fractal interactions of the factors controlling stream behavior make hydrochemical time series difficult to describe and predict (Blöschl et al. 2019; Kolbe et al. 2019; Brown et al. 2023). Seasonal and annual variation in both flow and nutrient flux, as well as the nature of disturbance (such as flood, drought, and fire) in chaotic systems like river networks, make accurate prediction of hydrochemical patterns very challenging. Some difficulty also lies in handling chaotic variation at varying time scales, where some factors, such as groundwater, vegetation succession, and precipitation and/or flood events, exert strong control over chemical behavior at specific times of year and appear non-existent for the rest of it (Saberi et al. 2021). Machine learning (ML) methods have recently been applied to hydrochemical problems with great success and have been shown to be more accurate than traditional physics-based models in some cases (Jimeno-Sáez et al. 2022). Consequently, there has been a surge in the use of ML tools for hydrochemical applications, particularly artificial neural networks (ANNs) and the more recent long short-term memory (LSTM) approaches (Asadollah et al. 2020). LSTMs are a type of recurrent neural network (RNN) that successfully avoids the vanishing- and exploding-gradient problems common in traditional RNNs, making them highly resistant to bifurcations, which historically have made RNNs difficult to train (Doya 1992). LSTMs also integrate internal interactions across time scales, which allows them to successfully model spatio-temporal datasets with time-dependent dynamics, such as when previous events influence current and future behavior. This makes them well suited to problems involving hydrochemical prediction (Shen et al. 2021).
METHODOLOGY
Although LSTMs have been applied successfully to streamflow forecasting (Hunt et al. 2022) and, more recently, to water quality prediction (Liu et al. 2019; Wang et al. 2017), they are complex and costly, making them difficult to apply where computational resources are limited. Echo State Networks (ESNs), a subset of Reservoir Computing (RC), are significantly simpler than LSTMs, yet retain many of their beneficial attributes and remain robust to the chaotic variation inherent in water quality time series. ESNs are commonly used as an alternative to RNNs because of their accuracy and ease of use. ESNs and LSTMs differ in model architecture, training methods, and simplicity in modeling and forecasting applications. LSTMs consist of a series of interconnected cells made up of ‘gates’ that handle signal propagation, enabling them to both forget unnecessary long-term information and retain important short-term information. ESNs, on the other hand, are composed of a single set of sparsely connected ‘nodes’, called a reservoir, which propagates a signal through to a single output layer that decodes the reservoir state into a final prediction. This output layer, called a readout, is the only trainable piece of the network, saving both time and space compared with other model architectures.
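For readers unfamiliar with the architecture, the following minimal sketch implements the core ESN update and ridge readout in plain NumPy; the sizes, scalings, and variable names are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative sizes (assumptions, not the study's configuration).
n_inputs, n_reservoir = 1, 300

W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))  # fixed input weights
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))  # fixed reservoir weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))               # rescale spectral radius to 0.9

def run_reservoir(u_seq, leak=0.9):
    """Propagate an input sequence through the fixed, untrained reservoir."""
    x = np.zeros(n_reservoir)
    states = []
    for u in u_seq:
        pre = W_in @ np.atleast_1d(u) + W @ x
        x = (1 - leak) * x + leak * np.tanh(pre)         # leaky-integrator update
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-7):
    """Ridge regression on collected states: the only trained weights."""
    A = states.T @ states + ridge * np.eye(n_reservoir)
    return np.linalg.solve(A, states.T @ targets)
```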
ESNs are notably simpler than more modern deep learning models, but are still commonly used for their efficiency and accuracy in spatio-temporal problems. When used for temporal problems, ESNs and LSTMs accept data in the form of a time series, where each data point represents a value, or set of values, at a specific point in time. Datasets are compilations of readings of the same set of features ordered through time. Both ESNs and LSTMs make use of feedback connections, which carry previous timesteps’ information forward into future timesteps’ predictions. While LSTMs possess non-linearity in each cell that helps capture chaotic signal behavior, they often need large networks to handle increasingly complex signals. ESNs possess inherent non-linearity arising from the connectivity between reservoir nodes, which allows them to successfully handle largely chaotic time series, and they are much easier to train on long-term natural signals (Jaeger & Haas 2004).
Where resources and time are not limitations, LSTMs have been shown to provide accurate predictions at the cost of time and complexity (Zhou et al. 2018). In cases where resources such as memory and compute power are limited, or quick training and prediction are needed, ESNs can serve as a useful alternative to LSTMs. Correctly initialized ESNs also possess (and get their name from) the Echo State Property (ESP), which is similar to the fading memory possessed by LSTMs. The ESP arises when the spectral radius, the largest absolute eigenvalue of the reservoir’s weight matrix, is close to 1. The ESP essentially guarantees that the long-term memory of the reservoir decays slowly enough to retain some control over the generated signal, while short-term memory is amplified appropriately to accurately predict short-term signal variation. For ESNs to effectively handle chaotic signals, they must have this property (Jaeger 2002). This differs from the fading memory of LSTM cell blocks by the nature of the connectedness of an ESN reservoir. Where each LSTM cell possesses its own trainable memory due to the connectivity of the cell gates, the state of the ESN reservoir at each timestep represents a high-dimensional mixture of both past and current signals (Ceni & Gallicchio 2023). Building an accurate model requires striking a balance between the non-linearity of the signal propagation and the memory capacity of the model (Antonelo et al. 2017). When initialized correctly, ESNs can be an efficient method for handling long-term, multivariate, temporal data. Part of this contribution is to serve as a user-friendly introduction to ESNs in the context of water quality time series.
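The ESP can be illustrated directly. In the sketch below (our own demonstration, with arbitrary sizes), two copies of the same reservoir are started from different random states and driven by the same input; their states converge as the initial conditions ‘echo’ away.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
W = rng.standard_normal((n, n))
W *= 0.95 / max(abs(np.linalg.eigvals(W)))  # spectral radius just below 1
W_in = rng.uniform(-1.0, 1.0, (n, 1))

u = rng.standard_normal((500, 1))           # a shared input signal
xa, xb = rng.standard_normal(n), rng.standard_normal(n)  # different initial states
for t in range(len(u)):
    xa = np.tanh(W_in @ u[t] + W @ xa)
    xb = np.tanh(W_in @ u[t] + W @ xb)

print(np.linalg.norm(xa - xb))              # ~0: initial conditions are forgotten
```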
ESN structure and hyperparameter tuning
To build our models, we used a Python library called reservoirpy, which makes building and optimizing ESNs straightforward and has many built-in tools to help fine-tune models for performance (Trouvain et al. 2020). Though the structure of an ESN is relatively simple, optimization is notoriously difficult and is often accomplished through trial and error. The reservoirpy library contains functionality built around the Hyperopt package, a framework designed for hyperparameter optimization across model types (Bergstra et al. 2013). Because of the nature of the ESN architecture, there are fewer hyperparameters of interest than in more complex deep learning architectures like LSTMs. The hyperparameters with the most impact on model performance were: units (reservoir size), leaking rate, reservoir connectivity, and ridge regularization. Hyperparameter tuning of both the temperature and DO ESN models returned very similar optimal settings, suggesting that ESNs are relatively robust to hyperparameter tuning, likely due to the nature of the relationship between the reservoir and the trainable output layer. Optimal reservoir size is highly task-dependent; a reservoir too big or too small dramatically impacts model success in generating an accurate signal. The ridge regularization parameter balances the stability and the fine detail of the generated signal, and prevents over-fitting. Reservoir connectivity, which governs the random connections between nodes in the reservoir, also influences the signal generated by the reservoir, which may be either too chaotic or not chaotic enough, in either case leading to inaccurate predictions. ESNs are commonly initialized with very sparse connectivity in the hope that less connectivity between nodes will increase the variation in reservoir response signals, which benefits training. Typically a connectivity rate of 1% is used, meaning each node is connected to approximately 1% of the other nodes in the reservoir. With the connectivity rate held constant, a network that is too large creates an insensitive signal, which cannot accurately predict minute daily variation. A network too small generates a signal that is too sensitive and becomes even more chaotic than the time series, which also gives inaccurate predictions. After testing multiple values for each of the hyperparameters above using reservoirpy’s built-in tuning functionality, both the temperature and DO models were initialized with similar values. Each model was initialized with a leaking rate of 0.9 and the default reservoir connectivity of 1%. Both model types showed an optimal ridge regularization parameter of 1 × 10⁻⁷ and a reservoir size of 1,000 nodes.
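Expressed in reservoirpy’s node API, the reported configuration looks approximately as follows (a sketch; the commented fit/run calls and variable names are illustrative):

```python
from reservoirpy.nodes import Reservoir, Ridge

reservoir = Reservoir(
    units=1000,           # reservoir size
    lr=0.9,               # leaking rate
    rc_connectivity=0.01, # each node connects to ~1% of the others
)
readout = Ridge(ridge=1e-7)  # the only trainable part of the network

esn = reservoir >> readout   # compose reservoir and readout into one model

# esn.fit(X_train, y_train)  # trains only the readout weights
# y_pred = esn.run(X_test)   # generates one-step-ahead predictions
```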
Various reservoir sizes and ridge regularization parameters and their effects on signal generation. Reservoir size is perhaps the most important component in developing an accurate model. Too small a reservoir, and the model cannot accurately generate large- or small-scale signal variation, while too large a reservoir places too much emphasis on daily variation, leading to highly chaotic and inaccurate signals. Ridge regularization had comparatively less impact than reservoir size, showing that reservoir size and connectivity were significantly more important to accurate signal generation than ridge regularization.
Data
Water temperature and DO are two of the most important hydrochemical variables affecting both stream habitat and human society (Hem 1985). Water temperature and DO are directly connected due to the temperature dependence of oxygen solubility and of oxygen production by primary producers. These variables have both daily and seasonal variation, the relationship between which is difficult to model accurately but key to understanding and predicting long-term changes to streamflow (Cao et al. 2021). Another important consideration relates to the availability of the chosen data. Water temperature has been reported daily by the United States Geological Survey (USGS) at many sites dating back to the 1950s or earlier. However, DO and other nutrient recordings are sporadic at most sites before the year 2018, which makes it difficult to find enough long-term data for both training and testing. We found in our initial testing that models trained on the limited DO data available were unreliable and inaccurate. To circumvent this problem, we added multiple random permutations of the same set of years to our dataset to simulate seasonal changes across a larger time-scale than was available. Our results here are useful as a proof of concept and as a tool for hypothesizing long-term changes to DO patterns. As will be highlighted below, as the signal to process becomes more complex, the amount of data needed to successfully train an ESN grows at a significant rate. This can significantly affect model performance in scenarios where the total amount of data is a limitation. If, on the other hand, the amount of data is not a limitation but the signal is extremely chaotic, and increased amounts of data only add more chaos, an ESN will not be able to successfully predict the signal without an extremely large reservoir. This makes the use of ESNs challenging in situations where compute power is not an issue, but system storage is a limitation.
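The augmentation step can be sketched as follows (our own minimal implementation of the idea described above, assuming the record is held in a pandas DataFrame with a DatetimeIndex):

```python
import numpy as np
import pandas as pd

def augment_by_year_permutation(df, n_permutations=3, seed=0):
    """Append whole-year blocks of the record in random order to extend a
    short training series. A sketch, not the study's exact implementation."""
    rng = np.random.default_rng(seed)
    years = df.index.year.unique()
    blocks = [df]
    for _ in range(n_permutations):
        order = rng.permutation(years)
        blocks.append(pd.concat([df[df.index.year == y] for y in order]))
    # The synthetic sequence has no meaningful calendar, so drop the index.
    return pd.concat(blocks, ignore_index=True)
```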
Multiple long-term monitoring sites were chosen for training and testing based on similar elevation, discharge, hydrochemical behavior, and general topography. Data for this project came from various USGS gauge stations on the Colorado and Green rivers near the Colorado–Utah border. The site numbers used were: USGS09095500, USGS09261000, and USGS09163500. These sites were selected because they were geographically close, allowing us to assume similar chemical responses and patterns for both temperature and DO across all sites, and because they contained sufficiently long time-series data for both temperature and DO. Because ESN accuracy is so closely tied to the amount of training data available, the final datasets for both temperature and DO came from the sites with the longest recording periods. Temperature data came from site USGS09163500, on the Colorado River near the Colorado–Utah border, while the DO dataset used to generate the results below came from data gathered at site USGS09095500 on the Colorado River near Cameo, Colorado. This site contained the longest period of daily DO recording, but was still not sufficient to generate an accurate signal, and so was augmented with random permutations, then fine-tuned on the real time series. DO data from the other sites were tested; however, we found that the recording periods were too short and the signal too chaotic for our model to accurately reproduce either seasonal or daily variation. Both temperature and DO had maximum, minimum, and mean values recorded daily by the USGS.
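For illustration, daily records like these can be pulled with the USGS dataretrieval package. The calls below are a sketch: the parameter codes are standard USGS codes, but the date ranges and the exact retrieval route used in this study are assumptions.

```python
from dataretrieval import nwis

# USGS parameter codes: 00010 = water temperature (deg C), 00300 = DO (mg/L).
# Date ranges below are placeholders, not the study's actual windows.
temp_df, _ = nwis.get_dv(sites="09163500", parameterCd="00010",
                         start="1980-01-01", end="2023-12-31")
do_df, _ = nwis.get_dv(sites="09095500", parameterCd="00300",
                       start="2008-01-01", end="2023-12-31")
```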
Training and testing
When predicting time-series data, especially chaotic natural signals like streamflow, it helps to isolate the chosen features and train the model separately on each feature of interest. This can help to highlight connections or relationships among tested features and help the model accurately predict some of the more chaotic relationships between chemical behavior and streamflow. One common use for ESNs is future signal generation, which can be extremely valuable for modeling flow regime and long-term hydrochemical patterns. Once the model has been sufficiently trained with long-term data, it can successfully highlight trends that occur over a long period of time (e.g., the growth of maximum water temperature in recent years (Pörtner & Roberts 2022)). This project demonstrates the use of ESNs in future signal generation, examines the impact of random reservoir initialization, and directly compares ESNs to a similar LSTM model.
To capture the effects of random reservoir initialization, 10 models each for temperature and DO were initialized with the hyperparameters described above, then trained and tested on the same datasets. A train/test split of approximately 70/30 was chosen (the first 70% of the recorded data was used to train the models and the remaining 30% was used for testing). After training, each model was used to predict the signal pattern for the test portion of the data. For each prediction, the model was given the previous day’s value for temperature or oxygen and asked to predict the next day’s value. Each model produced a new time series, which was compared to the withheld portion of the data. Model accuracy was recorded and stored for comparison to other models. These results were also plotted for visual comparison to the actual time series.
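The experiment can be summarized in a short loop (a sketch; `series` is assumed to be a (T, 1) array, and `make_esn` and `nse` are hypothetical helpers wrapping the reservoirpy construction and the NSE metric described elsewhere in this paper):

```python
import numpy as np

split = int(0.7 * len(series))                          # 70/30 chronological split
X_train, y_train = series[:split - 1], series[1:split]  # predict next day from previous
X_test, y_test = series[split - 1:-1], series[split:]

scores = []
for seed in range(10):
    esn = make_esn(seed=seed)        # fresh random reservoir per model
    esn.fit(X_train, y_train)        # trains only the readout
    y_pred = esn.run(X_test)         # one-step-ahead generated signal
    scores.append(nse(y_test, y_pred))

print(f"NSE: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```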
Comparison with a similar LSTM
Comparable LSTM models were trained and tested on the same data splits and generation periods for both temperature and DO. The goal of this comparison was not to prove that one model outperforms the other, but to accentuate where one model might be a better fit depending on user resources and time constraints. The nature of complex deep learning architectures like LSTMs is that as model fit increases, so does model complexity. With this in mind, our LSTM models were constructed to strike a balance between the complexity needed for accurate signal generation and the size and compute constraints that would justify considering an ESN as an alternative. Both LSTM models were designed to be only as large as necessary to deliver results comparable to the ESN models described above, so that the differences in resources and training time needed to achieve the same results could be compared.
For model training and testing, we used the Python library scalecast, which provides a wrapper over the commonly used TensorFlow Keras LSTM layer and streamlines LSTMs for use with time-series problems (Keith 2024). Scalecast automatically optimizes model performance based on chosen parameters for the given time series and provides built-in functionality for hyperparameter optimization (such as grid search) that is distinct for each model type. Our LSTM models were initialized on the same training period as the temperature and DO ESN models, and tuned using the grid-search tuning functionality contained in the scalecast LSTM implementation. When tuning a scalecast model, the optimal parameter values are stored in a model attribute. Tuning was done for both temperature and DO models, and it was found that the most influential variables in LSTM model performance were the time-lag (how many previous days each prediction takes into account) and the total number of training epochs.
Tuning both LSTMs revealed little difference in performance with a time-lag greater than 150 steps for the temperature model and a time-lag greater than 75 steps for the DO model. The optimal structure for the temperature model comprised two layers, the first containing 100 cells with a dropout rate of 0.3 and the second containing 50 cells with a dropout rate of 0.3. The optimal number of training epochs for the temperature model was 100. The optimal structure for the DO model contained three layers of 150, 100, and 50 cells, with dropout rates of 0.4, 0.3, and 0.2, respectively. The optimal number of training epochs for the DO model was 50. Both models used the same train/test split as our ESN models, a standard Adam optimizer, and an early-stopping criterion monitoring validation loss for efficient training. Testing with various lengths of time-lag showed that finding the optimal lag input is a difficult problem. To compare the simplest usable form of LSTM, a time-lag of 100 days was chosen for temperature and 50 days for DO, striking a balance between training time and quality of results. One important note is that these are likely not the hyperparameter settings that would maximize task accuracy; rather, they were the optimal settings under the constraint of minimizing model size and training time as much as possible while still providing adequate results.
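Expressed through scalecast, the temperature configuration above looks approximately as follows (a sketch; the series and date variables, the validation split, and the early-stopping patience are assumptions, and preprocessing steps are omitted):

```python
from tensorflow.keras.callbacks import EarlyStopping
from scalecast.Forecaster import Forecaster

f = Forecaster(y=temp_series, current_dates=temp_dates)
f.set_test_length(0.3)                  # same 70/30 split as the ESN models
f.set_estimator("lstm")
f.manual_forecast(
    lags=100,                           # time-lag: previous days per prediction
    epochs=100,
    lstm_layer_sizes=(100, 50),         # two layers of 100 and 50 cells
    dropout=(0.3, 0.3),
    optimizer="Adam",
    validation_split=0.2,               # assumed value
    callbacks=EarlyStopping(monitor="val_loss", patience=5),  # assumed patience
)
```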
RESULTS
Metrics
Model accuracy was tested using several metrics: Root Mean Square Error (RMSE), R-squared (R2), and Nash–Sutcliffe Efficiency (NSE) (Saqr et al. 2023; Abd-Elmaboud et al. 2024). RMSE is a commonly used regression metric measuring the standard deviation of model predictions from true values, with values closer to 0 representing a more accurate model. A weakness of RMSE is that the returned value is scale-dependent (anything between 0 and infinity can be returned), which makes it difficult to judge real-world model performance. R2 addresses this problem, typically returning a value between 0 and 1, where 1 represents a perfect correlation between predictions and true values and values closer to 0 represent weak or no correlation between predicted and observed values. NSE is very similar to R2; however, it is primarily used to judge model simulation fit and is commonly used to measure hydrological model accuracy. Together these metrics give a broad view of model performance and insight into real-world application.
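These metrics can be computed with standard libraries; hydroeval is listed under our software requirements, while scikit-learn is used below purely for convenience (a sketch):

```python
import numpy as np
import hydroeval as he
from sklearn.metrics import mean_squared_error, r2_score

def report_metrics(obs, sim):
    """Compute the three reported metrics for observed vs simulated series."""
    rmse = np.sqrt(mean_squared_error(obs, sim))
    r2 = r2_score(obs, sim)
    # hydroeval's evaluator takes simulations first, observations second
    nse = he.evaluator(he.nse, sim, obs)[0]
    return {"RMSE": rmse, "R2": r2, "NSE": nse}
```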
Temperature
Echo State Network (ESN) model Nash–Sutcliffe Efficiency (NSE) distribution. The small spread in model performance suggests that random reservoir initialization has little effect on model performance, showing that the ESN architecture is a good fit for chaotic, univariate time-series modeling.
ESN temperature model predicted vs actual time series. The signal accurately predicts daily and seasonal variation, but struggles to match peak variation in the coldest and warmest seasons. This generated signal matches the actual signal more closely than the LSTM temperature model does, showing ESNs to be a better fit for this problem than LSTMs when compute resources and training time are a concern.
LSTM temperature model predicted vs actual time-series. Despite ample data, the chaotic nature of the training signal prevented the LSTM from accurately generating a signal matching either seasonal or daily variation of the actual signal.
Performance comparison between one of the 10 ESN temperature models and an LSTM trained on the temperature dataset. The LSTM model not only had a worse model fit, but took almost 100 times longer to train than the average ESN model. This highlights the key advantage ESNs have over LSTMs.
Dissolved oxygen
ESN dissolved oxygen model predicted vs actual time series. The extra sensitivity to small-scale (daily) variation shows that the length of the training dataset is significant in determining the accuracy of the generated signal.
LSTM dissolved oxygen model predicted vs actual time series. Model performance shows adequate seasonal variation in the signal; however, it fails to correctly generate small-scale (daily) variation. Similar to the dissolved oxygen ESN above, this suggests that as training sets become smaller, the difficulty of generating accurate small-scale signals increases. This makes LSTM models a suboptimal choice for modeling time series where the amount of training data is a concern.
Performance comparison between one of the 10 dissolved oxygen models and an LSTM trained on the augmented dataset. Although the disparity in training time was much smaller, LSTM model fit was significantly worse. This shows that data quantity is a major concern when using LSTMs for time-series modeling.
As with the temperature recordings, the more dramatic variance during the spring and fall seasons made it hard for the model to differentiate between the more stable winter months and the rest of the year, when recorded levels varied greatly. DO levels are affected by more than just temperature, depending also on groundwater discharge, atmospheric exchange, and light levels, which affect the amount of oxygen primary producers (plants) add to the water. During the spring, summer, and fall seasons, these contributions from other sources could be responsible for the greater variance found in the recorded amounts. Because the variance of the signal differs from season to season, it is difficult to build a model that can accurately predict these trends without access to each contributing variable.
DISCUSSION AND ANALYSIS
This comparison highlights the major advantage ESNs have over LSTMs: to generate accurate time series, LSTM models must be deep enough, have a large enough training set, and train for sufficient time to handle the chaotic signal variance and balance between small- and large-scale signal behavior. This often means that a sufficiently trained model is too complex and costly to be realistic in a real-world scenario. ESNs excel in this regard primarily because they store only one large set of fixed weights for the reservoir and one small set of trained weights for the output layer, whereas LSTMs must contain a significant number of layers and cells to handle chaotic signals. ESNs also have an extremely low computational cost and memory requirement, because no back-propagation through time is required for training, unlike LSTMs. LSTMs store both the weight matrices and gate activations through the training phases, leading to much higher computational and memory requirements and significantly longer training time. ESNs have also been successfully developed for low-power devices with strict resource requirements, where LSTMs and other more complex architectures are an impractical choice (Jaeger 2002).
Perhaps the largest contrast between ESNs and LSTMs in training lies in their real-time adaptability. Because of the low compute cost and quick training time of ESNs, training, tuning, and testing models happens very quickly, and multiple models can be run simultaneously with varying hyperparameter values. LSTMs have no comparable real-time adaptability, instead requiring significantly more time and compute cost to adapt or re-train the model. The simplicity of ESNs allows almost any machine to build and run a model that provides accurate results, but they could struggle with extremely complex signals, where the reservoir would need to be unrealistically large to accurately capture long-term temporal patterns. A sufficiently deep LSTM would almost certainly be more accurate than our relatively simple ESN architecture, but it would take significantly more time to train and test. For the quality of results given (and the time and resources needed to obtain them), ESNs are a powerful choice for quick predictions and signal generation, especially where immediate results are needed. ESNs train quickly and efficiently and handle chaotic signals well with little optimization compared to more modern deep learning models.
Necessity of consistent data
As shown by our results, ESNs can provide effective modeling and generation in long-term streamflow and hydrochemistry prediction problems, and can perform better than state-of-the-art model architectures like LSTMs. The efficiency of their initialization and training makes them a good choice for hydrological modeling problems, and they can be extremely sensitive to changes in streamflow dynamics. The temperature models had markedly better results than the DO models, likely because of the length of the training sets; by augmenting the available DO data, we were able to produce reasonably good results with that model as well. This point merits discussion, because the use of augmented data comes with key advantages and disadvantages. In our case, the artificial data provided enough training stimulation for the models to produce adequate results, and with even more data available, we would likely be able to improve model results further. Including artificial data can be beneficial, improving data availability and model generalization, eliminating gaps in time-series data, and serving as an aid in transfer learning, but it comes at a cost, such as the risk of introducing bias, adding unrealistic variability to time series, and increased difficulty in model validation (Wen et al. 2020; Iglesias et al. 2023; Semenoglou et al. 2023). While the lack of sufficient training data inhibits successful real-world modeling in this specific watershed, our model could serve as a more accurate predictor than a traditional LSTM, with less time and effort needed for initialization, training, and tuning, in any watershed where longer-term DO records are available or where the risks stated above are acceptable. Where there are sufficiently long recording periods, a fully trained model could be used either as a control, tracking what a healthy watershed should look like, or as a model of watershed reaction to major events.
Other variables initially considered as key metrics were discharge, specific conductance, turbidity, and pH; however, no sites were found with enough consistent daily recordings to enable successful training. This highlights a potential use for data augmentation techniques like the ones discussed above. Many of the recorded periods were far apart, with inconsistent period lengths. These metrics were significantly less autocorrelated than temperature or DO, which, combined with the lack of consistent recording periods, made developing an accurate model unrealistic. More advanced deep learning architectures may have produced better results with the data available, and if training data were not an issue, ESNs trained on these parameters would likely generate results similar to the models developed in this experiment. For these variables, most sites with long periods of recorded data contained only seasonal recordings (e.g., daily recordings for the summer or winter season, or a few years of monitoring after a major event), which prevented our model from generating an accurate seasonal spread. This highlights the importance of finding consistent, long-term data in developing a model that holds real-world importance.
One model, many applications
Some water quality metrics are dependent on others that are more readily available in large quantities. In cases where some variables directly depend on one or more independent features in the data, it is worth exploring the use of a model fully trained on the independent feature and passed through a relational function to predict dependent variables of interest. In our case, DO levels directly depend on water temperature. Further experiments could use our fully trained temperature model along with salinity values and percent oxygen saturation levels to predict a range of DO levels for modeling or planning purposes. ESNs can also be used with higher-dimensional data, or to generate a single prediction based on multiple previous state outputs. With streamflow chemistry being a dynamic web of interactions between variables, it is worth future effort exploring how a model trained on a specific target could be used to predict other variables contained in the training set by switching the target with the desired variable for prediction.
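As an illustration of such a relational function, a trained temperature model could feed a solubility relation to bound DO. The cubic below is a classic approximation to freshwater oxygen-solubility tables at sea level; names like `esn_temperature` and `percent_saturation` are placeholders, and site-specific corrections for pressure and salinity would be needed in practice.

```python
def do_saturation_mgl(temp_c):
    """Approximate DO saturation (mg/L) in fresh water at 1 atm,
    from a classic cubic fit to solubility tables (illustrative only)."""
    t = temp_c
    return 14.652 - 0.41022 * t + 7.9910e-3 * t**2 - 7.7774e-5 * t**3

# Hypothetical chained prediction: temperature forecast -> DO estimate.
temp_pred = esn_temperature.run(X_test)              # predicted water temps (deg C)
do_estimate = do_saturation_mgl(temp_pred) * percent_saturation  # e.g., 0.9
```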
Analysis of ESNs
ESNs have been shown to be effective in signal processing applications as described above, and we have shown them to be effective in hydrological applications as well. In problems with temporal datasets, ESNs shine as a simple and efficient model architecture that provides accurate temporal predictions and time-series generation. When early RNN algorithms were introduced, they suffered from many problems related to gradient descent (such as bifurcations). This made them hard to apply in real-world scenarios and led many researchers to explore the use of ESNs as an alternative. Today, thanks to developments like automatic differentiation, RNNs are much more useful. Because of this, ESNs’ only advantage over modern RNN architectures is their adaptability and quicker training. RNNs today are very effective in solving highly complex signal processing problems like speech recognition (Graves et al. 2013). For problems like these, ESNs would likely need unrealistic amounts of memory to create a model sensitive enough to compete with an RNN. It remains to be seen whether ESNs will be subsumed or even made irrelevant by modern deep learning techniques in these types of applications. Regardless, in many signal processing problems, ESNs remain a simple, highly effective, and broadly applicable architecture. For their ease of use and accuracy alone, ESNs are an extremely viable ML architecture for time-series modeling, especially where compute power, storage, or time are limited resources.
With regard to streamflow dynamics and hydrochemical modeling, ESNs can be used to create realistic models of high-dimensional scenarios, as well as single-variable applications like the one shown here. Streamflow dynamics is a challenging area of hydrology, with individual watershed catchments having dramatically different reactions to similar weather events. It is worth exploring the differences in ESN model reaction to extreme weather events when models have been trained on different watershed catchments of similar landscape and topography. For this to work, there must be well-documented extreme event data on a scale large enough to compare models.
Ensemble learning for hydrological problems
There is significant potential for future work exploring the use of ESNs in conjunction with models like LSTMs as part of an ensemble to solve water quality problems. Ensemble learning is an effective approach that has been shown to be successful in hydrological applications (Zounemat-Kermani et al. 2021). Ensemble learning is a type of meta-learning in which multiple models’ predictions on a task are combined and the results are given to a parent model, which learns through training which model is best for the given problem. Models can be chosen based on some threshold or accuracy level to maximize performance on a difficult task, or based purely on predictions from the parent model. Because they are so efficient and easy to implement, ESNs can be used in collaboration with other models as part of an ensemble to maximize ensemble performance in difficult hydrological tasks, as sketched below.
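A minimal stacking sketch of this idea follows (our illustration, not a method from the cited studies; it assumes each base model, whether ESN or LSTM, exposes a common `predict` interface):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def stacked_forecast(base_models, X_holdout, y_holdout, X_new):
    """Combine base-model predictions with a simple meta-learner.

    Base-model predictions on a held-out period become the features
    on which the parent (meta) model is trained."""
    holdout_preds = np.column_stack([m.predict(X_holdout) for m in base_models])
    meta = LinearRegression().fit(holdout_preds, y_holdout)
    new_preds = np.column_stack([m.predict(X_new) for m in base_models])
    return meta.predict(new_preds)
```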
Ensembles can also be used to increase ESN performance by helping to stabilize the training and tuning process (Wu et al. 2018). One downside we found was that our ESN models were relatively unstable, with good results highly dependent on an optimal combination of hyperparameters. Because finding the perfect set of parameters was a very difficult problem, ensemble learning provides an opportunity to improve robustness and help stabilize model performance. Because of the natural simplicity of ESNs, many individual models of various layouts and levels of optimization, with different combinations of hyperparameters, can be combined in an ensemble to maximize performance on specific problems. In conjunction with other well-known machine learning models for hydrological problems, ESNs can provide insight and help validate findings gained from other models.
CONCLUSIONS
Importance of monitoring and prediction tools
As the effects of climate change become more visible around us, it becomes increasingly important to monitor vital resources in locations where those resources are strained. In the western United States, drought has significantly affected the lives of the approximately 80 million people who live there. To consciously and ethically manage resources and keep people safe, there is a great need for tools that can give accurate predictions of water resources. Streamflow chemistry is a key indicator of the quality of those resources, and its importance for biodiversity and overall ecosystem health makes successful prediction and monitoring tools an essential part of our efforts to understand and mitigate the effects of climate and land-use change. There is growing interest in applying machine learning tools to predict and model streamflow, which has proven to be a very effective combination and has helped to better manage limited water resources. Streamflow is made up of chaotic natural signals, which are difficult to model and predict with physics-based or statistical models. ESNs are another application of machine learning that can be used to create more robust streamflow predictors sensitive to these types of signals. ESNs handle chaotic signals well and provide another opportunity for real-world modeling and prediction that is accessible to a wider range of scientists due to their ease of use and broad application.
ESNs have already been proposed as an alternative to traditional neural networks and RNNs in rainfall forecasting (De Vos 2013), and while LSTMs have been shown to be effective under certain conditions (Hunt et al. 2022), this project gives an introductory comparison between the two, serving as an introduction to (1) predicting the hydrochemical behavior of streams and river systems, (2) long-term modeling of these systems, and (3) determining when ESNs provide a better fit than LSTMs and other model architectures for water quality time-series problems. The success we have shown in applying ESNs to this problem warrants further exploration of their use in the broader field of hydrology, and more specifically in the field of streamflow hydrochemistry.
Future work
One of the most impressive features of ESNs is their dynamic reservoir memory, and how that memory is affected by model feedback. Many forms of online training make special use of these feedback connections, which can be beneficial as the signals become more complex. Future efforts comparing and contrasting the use of these forms of training, and their effects on model feedback in cases with extremely complex signals, have the potential to benefit humanity by creating more robust and accurate tools for water quality prediction. It is also worth exploring the use of ESNs in predicting reaction patterns of DO to other key variables like turbidity, percent oxygen saturation, and primary producer activity in a more high-dimensional space. This problem is of particular interest in areas where flow regimes are affected by discharge from joining river systems, dam construction and regulation, and unique biochemical processes (Zhong et al. 2021). ESNs could provide key insights into this problem in areas where remote sensing and monitoring are essential to measuring watershed health.
Another key area of interest is the transferability of these models to other similar sites along the Colorado and Green rivers, or to other similar watersheds, to test for model generalization. While outside the scope of this comparison, our results affirm that ESNs are a powerful alternative to more complex deep learning architectures for hydrochemical time-series forecasting, though little research has been done on how ESNs or other RC architectures generalize across datasets with a shared problem type. If ESNs transfer well across datasets with minimal fine-tuning, they could solidify themselves as a key architecture for time-series forecasting in hydroinformatics problems.
ACKNOWLEDGEMENTS
This research was funded by the US National Science Foundation (grant numbers EAR-2012123 and EAR-2011439) and Utah’s Watershed Restoration Initiative. We thank those involved in gathering, analyzing and publishing the data necessary for this experiment. We also thank the Brigham Young University Honors Department, who established connections between the authors and provided the resources needed for the experiment itself.
CREDIT AUTHORSHIP CONTRIBUTION STATEMENT
P.A. conducted model training, testing, and analysis, and wrote the manuscript. B.W.A. advised on application of models to the data and revised the manuscript. B.B. advised on gathering data and revised the manuscript. C.G.-C. advised on model selection, application, and comparison, and revised the manuscript.
CODE AND DATA AVAILABILITY
Language: Python 3.11.0.
Software required: reservoirpy, scalecast, keras, tensorflow, pandas, hydroeval, matplotlib.
The source code, data, and manuscript are available for downloading at the link: https://doi.org/10.5281/zenodo.12584469.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.