The present study aims to streamline long-term spatiotemporal river water quality assessment and forecasting using three intuitive Python-based modules: (1) Python toolbox for scalable Outlier Detection (PyOD) to classify significant deviations from the expected water quality norms (outliers), (2) Statsmodels to decompose the river time series data into its trend, seasonal, and residual components, and (3) Automatic Time Series forecasting model (AutoTS) to forecast (and compare) the future water quality state of the Karun River (the case study) in Southern Iran. The findings indicate that outlier elimination has a remarkable impact on the outcomes of the Karun time series data analysis. Additionally, a significant increase in total dissolved solids (TDS) concentrations and a cyclic pattern were discernible in the decomposed time series. Furthermore, the water quality values were found to be clustered around the median of their datasets. Based on the forecasting validation metrics, the proposed automated forecasting model was found to be promising in predicting the future water quality state of the river.

  • An intuitive model for a rapid spatiotemporal river water quality analysis was proposed.

  • Python toolbox for scalable Outlier Detection was leveraged to identify outliers in the river water quality dataset.

  • Statsmodels was applied to decompose the data time series.

  • Automatic Time Series forecasting model (AutoTS) was trained to automatically forecast the future water quality state of the river.

  • Violin and Box plots were visualized to illustrate the quality variation patterns along the river.

n: number of data
STN. 1: station 1 (Mollasani)
STN. 2: station 2 (Ahvaz)
STN. 3: station 3 (Farsiat)
STN. 4: station 4 (Darkhoveyn)
AutoTS: automatic time series forecasting model
CV: coefficient of variation
LOESS: locally estimated scatterplot smoothing
ML: machine learning
WQI: water quality indicator
sMAPE: symmetric mean absolute percentage error
TDS: total dissolved solids
MAE: mean absolute error
PyOD: Python toolbox for scalable Outlier Detection
EC: electrical conductivity
SD: standard deviation
SPL: scaled pinball loss
STL: seasonal-trend decomposition using LOESS
k-NN: k-nearest neighbors

Water is crucial not only for the survival of all living organisms but also for preserving ecological balance and supporting human activities (Chintalapudi et al. 2022; Sajan & Christopher 2023). Over the past decades, the quality of water resources has steadily declined due to a wide range of anthropogenic factors, including rapid industrialization, unrestrained urbanization, and intensive agricultural practices. These factors contribute to the pollution of water bodies, posing a grave threat to both aquatic ecosystems and the well-being of human populations (Chung et al. 2021a; Arabameri et al. 2023; Guo et al. 2023; Locke 2024).

Surface waters, particularly rivers, as the primary sources of community water supply, assume a significant role in protecting the environment and human health (Barakat et al. 2016; Gil-Rodas et al. 2023). The main sources of river contamination consist of municipal and industrial wastewater, as well as agricultural drainage water containing a variety of physical, chemical, and biological pollutants (Ahmadmoazzam et al. 2021; Wu & Chen 2023). One of the effective solutions to ensure the safety of river ecosystems and the health of humans is the regular monitoring of river water quality (Singh et al. 2022). The monitoring process reveals plenty of valuable information, including water characteristics, trends and changes in components, and potential water quality issues. Subsequently, the obtained data assists decision-makers in developing pollution prevention scenarios, treatment plans, and in making informed decisions (Shi et al. 2021).

Spatial and temporal river water quality monitoring, analysis, and forecasting is a long-term process that has been globally conducted by hydrologists. While the spatial assessment is less complex, the temporal part comprises multiple arduous steps, including (1) data provisioning, (2) time series decomposition, and (3) future quality state prediction (Georgescu et al. 2023; Long et al. 2023; Mijošek et al. 2023; Qian et al. 2023; Dilekoğlu et al. 2024).

In the first step (data provisioning), outlier detection is among the most critical tasks (Garces & Sbarbaro 2009). Outliers are data points that stand out from the remainder of the dataset; consequently, they have the potential to significantly affect the outcomes of time series analysis. In this regard, we conducted a comprehensive literature review on the tools and methods that have been used by hydrologists to detect outliers in river water quality datasets. Accordingly, the functional depth method (Di Blasi et al. 2013), isolation forest, kernel density estimation (Liu et al. 2020; Uddin et al. 2024), quartiles and box plots (Alberdi Igartua et al. 2024; Qian et al. 2024), linear prediction correction filter, multivariate nearest neighbor (Nafsin & Li 2021), support vector machine (Vellingiri et al. 2023; Kushwaha et al. 2024), Grubbs's test (Casillas-García et al. 2021), the Scikit-learn library of Python (as the tool) (Mdegela et al. 2023), nearest neighbor-high dimensional algorithm, aggregated k-nearest neighbors (k-NN), sum of distance to k-NN, local distance-based outlier factor, density-based local outlier factor, connectivity-based outlier factor, influenced outlierness, and robust kernel-based outlier factor (Talagala et al. 2019; Mokua et al. 2021) are among the studied approaches for anomaly/outlier detection.

The present research, for the first time, examines the performance of a versatile Python library, namely Python toolbox for scalable Outlier Detection (PyOD) (PyOD 2.0.1 documentation 2024; Zhao et al. 2019), as the tool, and a proximity-based supervised learning algorithm, namely, k-NN (Beckmann et al. 2015; Wang et al. 2022a), as the algorithm, for rapidly detecting and eliminating outliers from a river water quality dataset. PyOD is open-source with detailed documentation and supports advanced models. This toolkit is comprehensive and scalable, which provides easy-to-use and quickly executable features for researchers to isolate outliers in a set of data (PyOD 2.0.1 documentation 2024). By using this module, we seek to reduce the amount of time, effort, and expertise needed to spot the outliers in a river water quality dataset.

The second step, time series decomposition, refers to the process of breaking down time series data into its individual components, i.e., trend, seasonal, and residual. It is a well-known technique applied in time series analysis to better understand the underlying patterns and characteristics of the data. There are several commonly used time series decomposition methods, including classical decomposition, moving averages, seasonal-trend decomposition using locally estimated scatterplot smoothing (LOESS), known as STL, the exponential smoothing state space model, and Fourier analysis (Hyndman & Athanasopoulos 2015).

Based on the literature review, the STL is among the most frequently employed methods used by hydrologists (Cheng et al. 2021; Deng et al. 2021; Dong et al. 2023; Huan 2023; Wu et al. 2023; Yin et al. 2023). Therefore, we tried to facilitate the use of this method by introducing a user-friendly Python module, namely Statsmodels (Seabold & Perktold 2010; Statsmodels 0.14.1 2024), to unveil the underlying patterns (trend, seasonal, and residual) of a river water quality dataset. Statsmodels is an open-source Python module that provides functions for the estimation of various statistical models, as well as for swiftly implementing statistical tests and data exploration. To the authors' best knowledge, there has been no investigation into the performance of this toolkit for river water quality assessment.

The third step, water quality forecasting, provides valuable information to support water resources management and protect public health (Yu et al. 2022). To date, various methods and techniques such as autoregressive integrated moving average, nonlinear autoregressive neural network, long short-term memory (Hien Than et al. 2021), support vector model (Liu & Lu 2014), cascade-forward network, radial basis function network (Georgescu et al. 2023), and Thomas-Fiering (Kurunç et al. 2005) have been applied to forecast the water quality of different rivers. However, to the best of our understanding, there is no study that applies 42 self-standing forecasting models as an integrated system to predict the water quality parameters of a river. In the present work, we propose an automated forecasting framework, namely Automatic Time Series forecasting model (AutoTS) (Intro – AutoTS 0.6.13 documentation 2024; Wang et al. 2022b), in Python. The module is particularly created for the rapid implementation of accurate forecasts at scale. The novelty of using this approach lies in the execution of 42 different forecasting models and an ensemble of them automatically and within an integral framework.

Regarding the case study, Iran, with an area of approximately 1,648,000 km², is located in the southwest of Asia and lies roughly between 25°N and 40°N in latitude and between 44°E and 64°E in longitude. Karun, Dez, Karkhe, Jarahi, and Maroon are the main rivers of this country, which are all located in Khuzestan province (Emamgholizadeh et al. 2014). The Karun River, with a length of 950 km and a catchment expanse of 67,000 km², is the longest and largest river by discharge in Iran. This only navigable river in the country collects runoff from extensive regions and conveys it to the Persian Gulf (Golshan et al. 2020). The river serves as a source of hydroelectric power generation, irrigation (covering an area of more than 280,000 ha), potable water supply for several cities, and also as a critical commercial waterway. However, in recent decades, significant contamination and ecological destruction have occurred due to the discharge of industrial, agricultural, and domestic wastewater into the Karun without appropriate (standard) treatment (Noori et al. 2010).

Despite the significance of the Karun River, limited studies have been conducted to investigate its long-term spatiotemporal variation in water quality. In two earlier studies, Naddafi et al. (2007) traced the changes in water quality along the Karun River by monitoring two stations of Gotvand and Khorramshahr from 1967 to 2005. Furthermore, a statistical technique, namely factor analysis, was employed by Zarei & Pourreza Bilondi (2013) to evaluate temporal variations in the Karun water quality from 1976 to 2005 in Gotvand station. Besides, a long-term (1968–2015) evaluation of water quality parameters for the Karun River was conducted by Mahmoodabadi & Rezaei Arshad (2018), which was limited to a single station, Ahvaz. Moreover, a univariate water quality (electrical conductivity) evaluation of the Karun was carried out by Ahmadmoazzam et al. (2017) spatially and temporally from 1968 to 2014 for six different stations (Gotvand, Shushtar-Gargar, Mollasani, Arab-Asad, Ahvaz, and Darkhoveyn). Additionally, the Ebadati & Hooshmandzadeh study group (2019), focusing exclusively on the Mollasani station, reported the water quality data over a period of 49 years for the Karun River.

Based on the literature review, there is no up-to-date long-term evaluation of the Karun River water quality, which includes both spatial and temporal studies on a comprehensive set of water quality indicators (WQIs). The current study presents an intuitive Pythonic framework for long-term spatiotemporal river water quality monitoring, analysis, and forecasting, using the Karun water quality dataset from 1985 to 2020. Accordingly, six WQIs, including total dissolved solids (TDS), electrical conductivity (EC), pH, calcium (Ca), magnesium (Mg), and sodium (Na), were taken into consideration. These indicators were measured at four different hydrometric stations along the river from upstream to downstream, i.e., Mollasani (STN. 1), Ahvaz (STN. 2), Farsiat (STN. 3), and Darkhoveyn (STN. 4).

The present study aims to streamline long-term spatiotemporal river water quality assessment and forecasting through user-centered Python-based methods and modules. First, we performed data provisioning and temporal analysis. Subsequently, spatial analysis was carried out by visualizing two types of graphs: (i) the Violin and Box plots, and (ii) the annual mean values of the Karun monthly dataset, to draw a water quality comparison among the four surveyed hydrometric stations. As mentioned, the temporal analysis process is much more complicated than the spatial one, so the majority of this paper is devoted to the temporal water quality evaluation.

Study area and data source

The Karun River (between ∼48.2°E and ∼52.5°E and between ∼30.2°N and ∼33.5°N) originates from the Zagros Mountains, passes through the Ahvaz metropolis, and finally flows into the Persian Gulf. For the present investigation, four hydrometric stations along the river, namely, Mollasani (STN. 1), Ahvaz (STN. 2), Farsiat (STN. 3), and Darkhoveyn (STN. 4), in order from upstream to downstream, were considered to measure the values of six riverine WQIs including TDS (mg/L), EC (μS/cm), pH, Ca (mg/L), Mg (mg/L), and Na (mg/L). Figure 1 shows the geographic locations and elevations of the four stations surveyed.
Figure 1

Location and elevation of the four surveyed hydrometric stations along the Karun River.

Monthly data covering a period of 36 years (1985–2020) were used to perform a spatiotemporal evaluation of the Karun River water quality. Table 1 presents basic descriptive statistics, including maximum, minimum, mean, standard deviation (SD), and number of observations (n). It is worth mentioning that the complete form of the dataset containing monthly sampling data will be available upon request.

Table 1

Descriptive statistics (maximum, minimum, mean, SD, and number) of the water quality dataset

STN.         WQI          Max.    Min.   Mean      SD      n
Mollasani    TDS (mg/L)   1,722   470    997.5     288.2   327
             EC (μS/cm)   2,690   716    1,533.4   448.8   324
             pH (-)       8.6     7.1    7.9       0.3     379
             Ca (mg/L)    159.7   53.1   93.4      24      339
             Mg (mg/L)    74.6    10.9   32.8      10.1    338
             Na (mg/L)    388.1   55.2   179.6     71.6    336
Ahvaz        TDS (mg/L)   1,850   478    1,073.6   317     326
             EC (μS/cm)   2,920   750    1,689.5   499.6   326
             pH (-)       8.6     7.2    7.9       0.3     361
             Ca (mg/L)    181     46.1   100       30.7    342
             Mg (mg/L)    69.3    10.3   36.7      11.8    330
             Na (mg/L)    388.1   69     199.2     73.2    332
Farsiat      TDS (mg/L)   1,983   530    1,093.1   316.3   327
             EC (μS/cm)   2,770   820    1,706     480.7   326
             pH (-)       8.6     7.2    7.9       0.3     383
             Ca (mg/L)    173.3   52.1   100.9     29.5    332
             Mg (mg/L)    72.9    13.4   37.6      12      331
             Na (mg/L)    411.5   64.4   201.1     73.7    326
Darkhoveyn   TDS (mg/L)   2,310   520    1,191.6   470.1   327
             EC (μS/cm)   3,610   855    1,871.2   742.3   328
             pH (-)       8.6     7.2    7.9       0.3     346
             Ca (mg/L)    187.8   53.1   107.8     35      331
             Mg (mg/L)    83.3    17     41.3      15.8    339
             Na (mg/L)    540.3   57.5   235.4     124.4   328

Proposed approach

Outlier detection

Within the collected water quality dataset, ‘outliers’ are the observations that are numerically distant from the rest of the data; hence, they have a high potential to distort the results of the time series analysis. Thus, their identification and elimination are crucial before stepping into the water quality data evaluation process. In the present study, the user-centered PyOD library (PyOD 2.0.1 documentation 2024; Zhao et al. 2019) was employed to detect and remove the outliers from the WQIs dataset. PyOD is an open-source Python library for performing scalable outlier detection on univariate and multivariate data. One of the most substantial advantages of this library is that it provides access to a wide range of detection algorithms such as Empirical Cumulative Distribution-based Outlier Detection (ECOD), Median Absolute Deviation (MAD), Stochastic Outlier Selection (SOS), Quasi-Monte Carlo Discrepancy outlier detection (QMCD), and k-NN. The algorithm used in this research, i.e., k-NN, is a non-parametric supervised machine learning (ML)-based method that utilizes proximity to make classifications about the grouping of an individual data point (Beckmann et al. 2015; Wang et al. 2022a).
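The proximity idea behind k-NN outlier detection can be sketched in a few lines of NumPy. This is only an illustration, not the PyOD implementation or the configuration used in the study: the function name `knn_outlier_flags`, the (time, value) embedding, the choice of five neighbors, the contamination fraction, and the synthetic series are all assumptions made for the example. Each point is scored by the distance to its k-th nearest neighbor, and the highest-scoring fraction is flagged.

```python
import numpy as np


def knn_outlier_flags(values, n_neighbors=5, contamination=0.1):
    """Flag points whose distance to their k-th nearest neighbor is unusually large.

    Each observation is embedded as (scaled time index, scaled value), so that
    neighbors are close in both time and magnitude. This embedding is an
    assumption of the sketch, not a detail taken from the study.
    """
    values = np.asarray(values, dtype=float)
    t = np.arange(values.size, dtype=float)
    # Standardize both axes so neither dominates the Euclidean distance.
    X = np.column_stack([
        (t - t.mean()) / t.std(),
        (values - values.mean()) / values.std(),
    ])
    # Pairwise Euclidean distances (fine for a few hundred monthly samples).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero distance of a point to itself,
    # so column k holds the distance to the k-th nearest neighbor.
    kth_dist = np.sort(dist, axis=1)[:, n_neighbors]
    # Flag the top `contamination` fraction of scores as outliers.
    threshold = np.quantile(kth_dist, 1.0 - contamination)
    return kth_dist > threshold, kth_dist


# Synthetic monthly TDS-like series (hypothetical values) with two injected spikes.
rng = np.random.default_rng(0)
tds = 1000 + 50 * rng.standard_normal(120)
tds[[30, 90]] = [1700.0, 1650.0]

flags, scores = knn_outlier_flags(tds, n_neighbors=5, contamination=0.05)
cleaned = tds[~flags]  # series with the flagged observations removed
```

Because time is part of the embedding, the same concentration can be flagged in one period and accepted in another, depending on what its neighbors in time look like.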

Time series decomposition

Hydrological data collected over time can display a variety of patterns. To understand hidden patterns, it is helpful to break down the time series into its components. A time series can be decomposed into three components: a trend, a seasonality, and a residual (remainder). The term ‘trend’ represents the consistent upward or downward movement in the series over an extended period. The component ‘seasonality’ refers to regular changes that recur over a fixed period of time, while the ‘residual’ is what remains after the trend and seasonal components have been removed. In our proposed framework, we used Statsmodels (a handy Python module that provides classes and functions for the estimation of many different statistical models) (Seabold & Perktold 2010; Statsmodels 0.14.1 2024) to decompose the WQIs time series into their components by the STL decomposition technique. This technique uses locally fitted regression models to decompose a time series (Cleveland et al. 1990).

Forecasting

Forecasting the quality of water bodies entails predicting the future state of miscellaneous WQIs, including physical, chemical, and biological factors, based on existing available data. These predictions play a vital role in assisting decision-making processes associated with water resource management, public health, and environmental protection (Ubah et al. 2021). There exist multiple methods of forecasting used across various disciplines and industries, namely, time series analysis, regression analysis, ML, qualitative and expert judgment, ensemble methods, scenario analysis, Delphi method, market research and surveys, simulation and modeling, and data mining (Petropoulos et al. 2022).

In this study, we employed an auto-forecasting system, namely the AutoTS library in Python (Wang et al. 2022b), which is specifically designed for rapidly deploying high-accuracy forecasts at scale. Accordingly, 42 models from the module (listed in Table S1), along with their ensembles, were tested in the automated learning stage, and the best-fitting model was selected. The invaluable advantage of the auto-forecasting library is that the system is capable of implementing and cross-validating the models. To learn about the details and features of the applied models, refer to the AutoTS Extended documentation (Intro – AutoTS 0.6.13 documentation 2024). The forecasting error of the models was calculated using three different metrics: (1) the symmetric mean absolute percentage error (sMAPE), (2) mean absolute error (MAE), and (3) scaled pinball loss (SPL). As the values obtained for sMAPE and MAE decrease, the forecasting results become more reliable. The sMAPE and MAE values were calculated as follows (Palani et al. 2008; Lu et al. 2023):
(1)
(2)
where WQIF and WQIA represent the forecasted and actual water quality indicators, respectively. Also, the SPL calculates the difference between the forecasted range and the actual range for each data point. The SPL is higher when the forecasted range is far from the actual range. The value for SPL was obtained by the following equation (Chung et al. 2021b):
(3)
where q is the quantile level, which is assumed to be 0.50 in the applied models.
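For orientation, the three error metrics can be sketched in NumPy as follows. The forms below are common textbook definitions and are assumptions here: sMAPE has several variants in use (this one divides by the mean of the absolute actual and forecasted values), and only the unscaled pinball loss is shown because the scaling term of SPL is not specified in the text. The function names and sample values are hypothetical.

```python
import numpy as np


def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |F - A| / ((|A| + |F|) / 2)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs(f - a) / ((np.abs(a) + np.abs(f)) / 2.0))


def mae(actual, forecast):
    """Mean absolute error, in the units of the indicator."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(f - a))


def pinball_loss(actual, forecast, q=0.5):
    """Unscaled pinball (quantile) loss at quantile level q."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    diff = a - f
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))


# Hypothetical actual and forecasted indicator values.
actual = np.array([1000.0, 1100.0, 900.0])
forecast = np.array([950.0, 1150.0, 900.0])

err_smape = smape(actual, forecast)
err_mae = mae(actual, forecast)
err_pinball = pinball_loss(actual, forecast, q=0.5)
```

At q = 0.50 the pinball loss reduces to half of the MAE, so the two metrics rank forecasts identically at that quantile level.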

Spatial tracing

Following the elimination of the outliers (part 2.2.1.), Violin and Box plots, alongside the annual mean value diagrams, were drawn to compare the variations in WQIs among four sampling stations. With the Violin and Box plots, the distribution of the data around five fundamental descriptive statistical points, i.e., minimum, maximum, first and third quartiles, and median, became visible and comparable. Moreover, the relative dispersion patterns of the contaminants along the river were revealed using the annual mean value curves. Figure 2 outlines the framework proposed in this study for the rapid analysis and forecasting of spatiotemporal variations in river water quality.
Figure 2

The proposed approach for the rapid spatiotemporal assessment and forecasting of river water quality.

Detected outliers

Outliers in the collected data were detected and removed to conduct a precise time series analysis. The outlier detection results of STN. 1 are shown in Figure 3. In this illustration, the red points represent the observations that lie at an abnormal distance from other values in the dataset. As can be seen, within a certain time span, adjacent points determine whether an observation is an abnormal measurement or not; i.e., the dataset is treated as a nearest neighbors-dependent function and not as a set of discrete values over the whole duration of 36 years. For instance, in Figure 3(a), a TDS level of 960 mg/L was detected as an outlier in May 1985, but the same TDS level in September 1989 was not recognized as an outlier (these two points are shown by black arrows in Figure 3(a)). The same procedure was applied to all WQIs at all stations. Figure 3(b)–3(f) demonstrate the results of outlier detection for EC, pH, Ca, Mg, and Na, respectively, in STN. 1. Furthermore, Figures S1–S3 show the outliers of six WQIs belonging to STN. 2, STN. 3, and STN. 4, respectively.
Figure 3

Detected outliers in the dataset of (a) TDS, (b) EC, (c) pH, (d) Ca, (e) Mg, and (f) Na at STN. 1.

Basic descriptive statistics, mean and SD, were adopted to explore the impact of removing outliers on the dataset. The mean value and SD of the dataset for TDS at STN. 1 were found to be 1,043.2 and 363.7 mg/L before and 997.5 and 288.2 mg/L after the outlier removal process, respectively. Also, the outlier elimination for EC at STN. 1 led to 6.1 and 21.7% reductions in the mean and SD, respectively. Moreover, at STN. 1, Ca experienced 7.8 and 34.7%, Mg 6.5 and 22.7%, and Na 9.1 and 20.8% alteration in their mean and SD, respectively. In contrast, the outlier detection and removal implementation had no impact on the dataset of pH; accordingly, the calculated values for the mean and SD for pH remained at 7.9 and 0.3 at STN. 1. Similar results were obtained for other sampling stations, i.e., the mean values and standard deviations for all WQIs shifted before and after the application of outlier removal, except for pH (data not shown here).
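The TDS figures quoted above can be put on the same percentage footing as the other indicators with a one-line calculation. The helper name `percent_reduction` is hypothetical; the input values are the before/after mean and SD reported for TDS at STN. 1.

```python
def percent_reduction(before, after):
    """Relative reduction, in percent, from `before` to `after`."""
    return 100.0 * (before - after) / before


# Mean and SD of TDS at STN. 1 before and after outlier removal (mg/L).
tds_mean_drop = percent_reduction(1043.2, 997.5)  # about 4.4%
tds_sd_drop = percent_reduction(363.7, 288.2)     # about 20.8%
```

As with the other indicators, the SD shrinks far more than the mean, which is the expected signature of removing a small number of extreme values.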

We compared the performance of the PyOD with two other similar research works that specifically investigated outlier detection approaches for a river water quality dataset. Accordingly, Talagala et al. (2019) introduced an automated procedure, namely oddwater (an open-source R package), which includes eight different detection algorithms. Besides, Di Blasi et al. (2013) applied the functional depth method. The results of both studies are significant and in accordance with the expectations; however, the paths used to achieve the results are either confined to a limited number of detection models (eight algorithms) or not user-centered (composed of complicated mathematical modeling). The proposed module in this study (PyOD) has the potential to use more than 50 various detection algorithms, including individuals and ensembles, to be applied depending on the type of problem. Moreover, the intuitive interface that this toolkit provides for the researchers removes the necessity for specialized mathematical and programming skills.

Decomposed time series

A detailed analysis, i.e., decomposition of the time series into its components (trend, seasonal, and residual), was carried out to unveil the hidden patterns in the data. As discussed, a trend is a pattern indicating the movement from relatively high (low) to relatively low (high) values over a long period of time. Seasonal fluctuations occur when a series demonstrates periodic variations (for example, every month, quarter, or year), and the remainder component contains anything else in the time series. Figure 4 illustrates the decomposition results of the TDS and EC time series for STN. 1. The three components (trend, seasonal, and residual) are graphed separately in the bottom three panels of each section. These components can be added together to reconstruct the data shown in the top panel (observed).
Figure 4

Time series decomposition of the dataset of TDS (a) and EC (b) at STN. 1.

Figure 4 reveals the presence of a significant upward trend in the dataset of TDS and EC, suggesting serious changes in the water quality of the Karun River throughout the study period (1985–2020). A similar increasing trend was found for Ca, Mg, and Na (data not shown here). Conversely, based on the considerable downward trend in the pH level of the Karun (data not shown here), the water of this river was becoming less alkaline (shifting toward acidity) during these 36 years.

A discernible annual periodic behavior is observed in these two indicators (TDS and EC), reflecting the influence of cyclic patterns of hydrological phenomena on the river environment. The pattern in the ‘seasonal’ panels barely changes over time, implying similar behavior in the periodic fluctuations of TDS and EC in the 36-year records. Moreover, given that the seasonal variations exhibit equal magnitude, the additive decomposition model (versus the multiplicative decomposition) proved to be the appropriate model for decomposing the WQIs dataset.

The residual component, shown in the bottom panel (Figure 4), represents what remains after subtracting the seasonal and trend components from the observed data. The same outcomes and evaluations were achieved for other WQIs within STN. 1 (data not shown here). Figures S4–S6 demonstrate the decomposition results of the WQIs associated with STN. 2, STN. 3, and STN. 4, respectively. The obtained patterns and outcomes for these three stations were found to be similar to STN. 1.

Forecasted values (from 2018 to 2020)

Considering the increasing pollution of the Karun River, reliable prediction of its water quality is critical for environmental protection, safeguarding human health, preserving ecosystems, guiding agricultural and industrial practices, and formulating effective policies and regulations. In the present study, we employed and experimented with a fast and straightforward automated time series module, i.e., the AutoTS library in Python, to forecast six WQIs of the Karun River from 2018 to 2020 (the last 36 months). The module was fed with the monthly dataset from the previous eight years (2010–2017) for the automated training/testing stage. Subsequently, the performance of the module was evaluated by comparing the forecasts with the actual data of the last 3 years (2018–2020). Accordingly, 42 models from the AutoTS library were called and examined. Finally, a combination of independent models (ensemble) demonstrated the best performance in forecasting the WQIs values.
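The evaluation protocol described here (fit on 2010–2017, hold out 2018–2020) can be sketched with pandas. The forecaster below is deliberately not AutoTS: a seasonal-naive baseline that repeats the last observed year is swapped in so that the example stays short and self-contained. The series and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly indicator series for 2010-2020 (132 months).
idx = pd.date_range("2010-01-01", "2020-12-01", freq="MS")
rng = np.random.default_rng(1)
m = np.arange(idx.size)
wqi = pd.Series(
    1000 + 100 * np.sin(2 * np.pi * m / 12) + 20 * rng.standard_normal(idx.size),
    index=idx,
)

train = wqi.loc["2010-01":"2017-12"]  # 96 months available to the model
test = wqi.loc["2018-01":"2020-12"]   # 36 months held out for evaluation

# Seasonal-naive stand-in for the forecasting model: repeat the last
# observed year across the three-year horizon.
last_year = train.iloc[-12:].to_numpy()
forecast = pd.Series(np.tile(last_year, 3), index=test.index)

# Compare the forecast with the held-out observations.
mae_holdout = float(np.mean(np.abs(forecast - test)))
```

A baseline of this kind also gives a useful reference point: an automated model search is only worthwhile if its selected model beats the seasonal-naive error on the same holdout.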

The accuracy of the forecasted results was appraised by calculating the sMAPE parameter. Accordingly, the smaller the sMAPE value, the better the accuracy of the forecasting model; to wit, the model can be categorized as excellent (sMAPE ≤ 10%), good (10% < sMAPE ≤ 20%), acceptable (20% < sMAPE ≤ 50%), or inaccurate (50% < sMAPE) (Aldrees et al. 2023). Moreover, two additional error metrics, MAE and SPL, were considered to further inspect the accuracy of the model. Both are expressed on the scale of the target value, and lower MAE and SPL values indicate better predictive performance.

In Figure 5, the actual and forecasted values for the six WQIs of STN. 1 are illustrated in black and green colors, respectively. Based on this figure, pH, Ca, TDS, EC, Na, and Mg showed the highest to lowest prediction accuracy with sMAPE values of 1.75% (excellent), 13.00% (good), 18.32% (good), 20.20% (acceptable), 20.25% (acceptable), and 33.68% (acceptable), sequentially. The high level of prediction accuracy for pH is mainly attributed to the low fluctuation range in its dataset (due to its measurement based on a logarithmic scale), while for Mg, the high rate of fluctuation came at the cost of low accuracy. Specifically, the coefficient of variation (CV) for pH and Mg was measured as 3.6 and 33.3%, respectively, making them the lowest and highest CV values in the dataset. A strong correlation between CV value and prediction accuracy was also established for the other WQIs of STN. 1 (data not stated here).
Figure 5

Actual versus forecasted values of (a) TDS, (b) EC, (c) pH, (d) Ca, (e) Mg, and (f) Na at STN. 1 for the years 2018, 2019, and 2020.
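The sMAPE accuracy classes can be expressed as a small lookup, applied here to the sMAPE values reported for STN. 1. The function name `smape_category` is hypothetical; the thresholds follow the classification cited from Aldrees et al. (2023).

```python
def smape_category(smape_percent):
    """Map an sMAPE value (in percent) to its accuracy class."""
    if smape_percent <= 10:
        return "excellent"
    if smape_percent <= 20:
        return "good"
    if smape_percent <= 50:
        return "acceptable"
    return "inaccurate"


# sMAPE values (%) reported for the six WQIs at STN. 1.
stn1_smape = {"pH": 1.75, "Ca": 13.00, "TDS": 18.32,
              "EC": 20.20, "Na": 20.25, "Mg": 33.68}
labels = {wqi: smape_category(value) for wqi, value in stn1_smape.items()}
```

Note that EC and Na sit only marginally above the 20% boundary, so their "acceptable" label is sensitive to small changes in the forecast.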

Figures S7–S9 depict the forecasting results for STN. 2, STN. 3, and STN. 4, respectively. According to these figures, the highest prediction accuracy was also associated with pH in the other three stations (Ahvaz, Farsiat, and Darkhoveyn), while the lowest accuracy belonged to Mg for the discussed reason. There was only one exception, which was related to STN. 4, where the fluctuations in the Na dataset were greater than the rest of the indicators, and this led to the lowest prediction accuracy. Hence, the CV parameter (relative spread) of the WQIs' dataset played the most crucial role in determining the accuracy level of the forecasting model. However, contributors such as multiple pollution sources and their variation, complex behaviors of the pollutants, and the involvement of unexpected climatic and hydrological phenomena strongly influence the future state of the Karun River water quality and affect the results of the predictive model.

In comparison to the studies conducted by Nouraki et al. (2021) and Salari et al. (2021) on the prediction of the Karun water quality parameters using ML methods, the present research offers an improved forecasting framework by eliminating the need for manual data division during the training/testing process. Moreover, Emamgholizadeh et al. (2014) examined three separate forecasting models, namely multi-layer perceptron, radial basis network, and adaptive neuro-fuzzy inference systems, without combining the models (no ensemble), to forecast the WQIs of the Karun, while the automated forecasting system proposed in this research applied an ensemble of forecasting models autonomously. Together, these two features, automatic data division and automatic ensembling, made the AutoTS framework a rapid tool for predicting the water quality of the Karun River.

Spatial patterns

The distribution of the WQI values around the minimum, maximum, median, and quartiles (25–75%) of the dataset was inspected using Violin and Box plots (Figure 6). In a Violin plot, wider sections correspond to a higher probability of the WQI taking a given value, while narrower sections represent a lower probability. From Figure 6(a), TDS values were found to be highly concentrated around the median and the first quartile (25%) across all stations. In addition, STN. 4 displayed a markedly larger interquartile range (represented by the black bar in the center of the Violin), indicating wider fluctuations in TDS values. As expected, the Violin and Box plots of TDS and EC (Figure 6(b)) showed similar patterns. In contrast, according to Figure 6(c), pH values were strongly clustered around the third quartile (75%) at all four stations, suggesting an alkaline tendency along the river. Figure 6(d)–6(f) illustrate the Violin and Box plots for Ca, Mg, and Na, respectively. Like the TDS and EC plots, these plots showed that values around the median and the first quartile recurred most often at all stations. For each WQI, the four Violin plots have almost the same shape, demonstrating that the data are distributed consistently across the four sampling stations.
Figure 6

Violin and Box plot of six WQIs ((a) TDS, (b) EC, (c) pH, (d) Ca, (e) Mg, and (f) Na) in four hydrometric stations.
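
The quantities read off a Box plot (quartiles and interquartile range) can be computed directly. The sketch below uses only the Python standard library; the two TDS samples are hypothetical values chosen to mimic a narrow upstream spread and a wide downstream spread, and the "inclusive" quantile method is an assumption.

```python
import statistics


def box_summary(values):
    """Five-number summary plus interquartile range, as drawn in a Box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Hypothetical TDS samples (mg/L), invented for illustration only.
tds_upstream = [900, 950, 1000, 1100, 1300]
tds_downstream = [1000, 1400, 1900, 2600, 3500]

print(box_summary(tds_upstream))
print(box_summary(tds_downstream))
```

A larger `iqr` corresponds to the longer central bar described for STN. 4.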

The annual mean values of the six WQIs at the four stations are compared in Figure 7. Based on this figure, water quality at the Darkhoveyn station is considerably lower than at the three upstream stations. There are two main reasons for this: (1) Darkhoveyn is located 85 km downstream of the metropolis of Ahvaz and is the most downstream station; (2) owing to the station's low altitude (1 m above sea level) and proximity to the sea, tides affect the flow rate and direction, at times even reversing the flow. This can lead to the accumulation of pollutants as well as dissolved and suspended solids. Moreover, according to Figure 7, the dispersion pattern of the contaminants was nearly identical across the first three stations (Mollasani, Ahvaz, and Farsiat), mainly because the discharge at these three stations is approximately equal (data not shown here). As expected, the highest water quality occurs at the most upstream station, STN. 1. Overall, in spatial terms, the Karun River water quality decreased steadily from STN. 1 to STN. 3, and the critical decline occurred at STN. 4. The only WQI that remained relatively constant among the four stations was pH, which implies that the factors affecting pH (both natural and man-made) were consistent along the river.
Figure 7

Annual mean value comparison of six WQIs ((a) TDS, (b) EC, (c) pH, (d) Ca, (e) Mg, and (f) Na) measured in four hydrometric stations.
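
An annual-mean comparison of this kind reduces to grouping samples by station and year and then averaging. The following sketch shows one way to do that with the standard library; the record layout `(station, year, value)` and the sample values are hypothetical.

```python
from collections import defaultdict
import statistics


def annual_means(records):
    """Average all values that share the same (station, year) pair."""
    groups = defaultdict(list)
    for station, year, value in records:
        groups[(station, year)].append(value)
    return {key: statistics.fmean(vals) for key, vals in groups.items()}


# Hypothetical TDS samples (mg/L), invented for illustration only.
records = [
    ("STN. 1", 2019, 900.0),
    ("STN. 1", 2019, 1100.0),
    ("STN. 4", 2019, 2000.0),
    ("STN. 4", 2019, 2400.0),
    ("STN. 1", 2020, 1000.0),
    ("STN. 4", 2020, 2500.0),
]

means = annual_means(records)
for (station, year), value in sorted(means.items()):
    print(station, year, value)
```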

The present study focused on a rapid spatiotemporal river water quality evaluation by applying user-centered Python-based modules, namely PyOD (for outlier detection), Statsmodels (for time series decomposition), and AutoTS (for water quality prediction). For the case study, we measured six riverine WQIs, namely TDS, EC, pH, Ca, Mg, and Na, in the Karun, the largest river in Iran. A dataset covering 36 years of sampling (1985–2020) from four hydrometric stations along the river, i.e., Mollasani, Ahvaz, Farsiat, and Darkhoveyn, was considered. In the spatial analysis, Violin and Box plots and a comparison of the annual mean WQI values were used to illustrate the water quality patterns along the Karun River.

The main conclusions of this paper are as follows:

  • The PyOD package provides an easy-to-use and quickly executable framework for researchers and hydrologists to spot abnormalities in a set of water quality data. In this study, we demonstrated that the k-NN algorithm of the PyOD module is a potent method for detecting and removing outliers from the dataset of the Karun River. Five WQIs, namely TDS, EC, Ca, Mg, and Na, were highly sensitive to outlier removal, meaning that outlier isolation had a significant impact on the results of the statistical analyses for these WQIs. In contrast, outlier removal had a negligible impact on the pH dataset, implying that pH fluctuated relatively smoothly over the 36 years.

  • Decomposing the water quality time series with Statsmodels unveiled underlying information about the long-term trends, seasonal variations, and irregular fluctuations in the WQI datasets. The long-term trends indicated a significant increase in TDS, EC, and Ca over the past four decades. In addition, an annual periodic behavior was discernible in all WQIs, reflecting the influence of cyclic hydrological phenomena on the river environment. The STL technique in Statsmodels is a practical and rapid method for decomposing a river water quality time series into its components. This method can be applied to any dataset, but meaningful results are attained only if a recurring temporal pattern exists in the data.

  • The auto-forecasting system (AutoTS), which packages 42 standalone forecasting models and their ensembles, was found to be promising in predicting the future water quality state of the river based on the obtained error metric values. This toolkit eliminates the need for manual training/testing in the forecasting process. The highest and lowest forecast accuracies for the Karun belonged to the WQIs with the minimum and maximum CV values, respectively, indicating that the degree of fluctuation in a dataset influences forecasting accuracy.

  • The Violin and Box plots, together with the annual mean value graphs, helped present a clear picture of how the WQIs change along the river. The spatial analysis revealed a nearly consistent distribution pattern of WQI values across the first three stations and a marked departure (more contaminated water) at Darkhoveyn. In addition, according to the Violin plots, the water quality values were clustered around the median of their datasets.
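
To illustrate the k-NN outlier idea named in the first conclusion, the sketch below scores each reading by the distance to its k-th nearest neighbour and flags the highest-scoring fraction. It is a hand-rolled stand-in written for one-dimensional data, not PyOD code; the values of `k` and `contamination` and the TDS readings are assumptions, not the settings or data used in the study.

```python
import statistics


def knn_outlier_scores(values, k=3):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(values, k=3, contamination=0.1):
    """Flag the top `contamination` fraction of points by k-NN score."""
    scores = knn_outlier_scores(values, k)
    n_flag = max(1, round(contamination * len(values)))
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [s >= cutoff for s in scores]


# Hypothetical TDS readings (mg/L) with one obvious spike.
tds = [980, 1010, 995, 1005, 990, 1020, 1000, 2500, 1015, 985]
flags = flag_outliers(tds)
cleaned = [v for v, is_outlier in zip(tds, flags) if not is_outlier]

print("mean with outlier:", statistics.fmean(tds))
print("mean without outlier:", statistics.fmean(cleaned))
```

Comparing the two means shows, on a small scale, why summary statistics for a spiky indicator shift after outlier removal while those for a smooth indicator barely move.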

The authors gratefully acknowledge the support and funding provided by the Shahrood University of Technology (grant number: 26631) to conduct this research.

A.A. conceptualized the project, developed the methodology, performed the work, rendered support in software and programming, validated and visualized the process, and wrote the original draft. S.E. and B.C. supervised the study, supported project administration, conceptualized the whole process, developed the methodology, and wrote, reviewed, and edited the article. E.Z. rendered support in data provisioning and project administration.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ahmadmoazzam M., Saki Malehi A., Jorfi S., Ramavandi B. & Ahmadi M. (2017) Evaluation of spatial and temporal variation in Karun River water quality during five decades (1968–2014), Environmental Quality Management, 27, 71–75. https://doi.org/10.1002/TQEM.21526.

Ahmadmoazzam M., Birgani Y. T., Molla-Norouzi M. & Dastoorpour M. (2021) Assessment of the water quality of Karun River catchment using artificial neural networks-self-organizing maps and K-means algorithm, Journal of Environmental Accounting and Management, 9, 43–58. https://doi.org/10.5890/JEAM.2021.03.005.

Alberdi Igartua X., Rodriguez-Iruretagoiena A., Gredilla A., Fdez-Ortiz de Vallejuelo S., Arana G., de Diego A. & Madariaga J. M. (2024) Geographical distribution of metals and metalloids along the estuary of the Oka River in the biosphere reserve of Urdaibai, Spain, Marine Pollution Bulletin, 199, 116010. https://doi.org/10.1016/J.MARPOLBUL.2023.116010.

Aldrees A., Javed M. F., Bakheit Taha A. T., Mustafa Mohamed A., Jasiński M. & Gono M. (2023) Evolutionary and ensemble machine learning predictive models for evaluation of water quality, Journal of Hydrology Regional Studies, 46, 101331. https://doi.org/10.1016/j.ejrh.2023.101331.

Arabameri A., Alavi Moghaddam M. R., Azadmehr A. R. & Karamati-Niaragh E. (2023) Determination of optimal operating conditions for AC-powered electrocoagulation process coupling green additive tartaric acid to remove Ni2+: pyomo and RSM approach, Journal of Environmental Management, 330, 117152. https://doi.org/10.1016/J.JENVMAN.2022.117152.

Barakat A., El Baghdadi M., Rais J., Aghezzaf B. & Slassi M. (2016) Assessment of spatial and seasonal water quality variation of Oum Er Rbia River (Morocco) using multivariate statistical techniques, International Soil and Water Conservation Research, 4, 284–292. https://doi.org/10.1016/J.ISWCR.2016.11.002.

Beckmann M., Ebecken N. F. F. & Lima B. S. L. P. d. (2015) A KNN undersampling approach for data balancing, Journal of Intelligent Learning Systems and Applications, 7, 104–116. https://doi.org/10.4236/JILSA.2015.74010.

Casillas-García L. F., de Anda J., Yebra-Montes C., Shear H., Díaz-Vázquez D. & Gradilla-Hernández M. S. (2021) Development of a specific water quality index for the protection of aquatic life of a highly polluted urban river, Ecological Indicators, 129, 107899. https://doi.org/10.1016/J.ECOLIND.2021.107899.

Cheng B., Zhang Y., Xia R., Wang L., Zhang N. & Zhang X. (2021) Spatiotemporal analysis and prediction of water quality in the Han River by an integrated nonparametric diagnosis approach, Journal of Cleaner Production, 328, 129583. https://doi.org/10.1016/J.JCLEPRO.2021.129583.

Chintalapudi V. K., Kanamarlapudi R. K. S. L., Mallu U. R. & Muddada S. (2022) Characterization of biosorption potential of Brevibacillus biomass isolated from contaminated water resources for removal of Pb (II) ions, Water Science & Technology, 85, 2358–2374. https://doi.org/10.2166/WST.2022.110.

Chung M. G., Frank K. A., Pokhrel Y., Dietz T. & Liu J. (2021) Natural infrastructure in sustaining global urban freshwater ecosystem services, Nature Sustainability, 4, 1068–1075. https://doi.org/10.1038/S41893-021-00786-4.

Chung Y., Neiswanger W., Char I. & Schneider J. (2021b) Beyond pinball loss: quantile methods for calibrated uncertainty quantification, Advances in Neural Information Processing Systems, 14, 10971–10984.

Cleveland R. B., Cleveland W. S., McRae J. E. & Terpenning I. (1990) STL: a seasonal-trend decomposition procedure based on loess, Journal of Official Statistics, 6 (1), 3–73.

Deng C., Liu L., Li H., Peng D., Wu Y., Xia H., Zhang Z. & Zhu Q. (2021) A data-driven framework for spatiotemporal characteristics, complexity dynamics, and environmental risk evaluation of river water quality, Science of The Total Environment, 785, 147134. https://doi.org/10.1016/J.SCITOTENV.2021.147134.

Di Blasi J. I. P., Martínez Torres J., García Nieto P. J., Alonso Fernández J. R., Díaz Muñiz C. & Taboada J. (2013) Analysis and detection of outliers in water quality parameters from different automated monitoring stations in the Miño river basin (NW Spain), Ecological Engineering, 60, 60–66. https://doi.org/10.1016/J.ECOLENG.2013.07.054.

Dilekoğlu M. F., Abdulhaq H. A. & Hussein M. H. (2024) Assessment of water quality in the Great Zab River (Erbil city, Iraq) exposed to wastewater from different industries using the WQI and GIS-based Python script approaches, Water Quality Research Journal, 59 (4), 308–326. https://doi.org/10.2166/wqrj.2024.032.

Dong W., Zhang Y., Zhang L., Ma W. & Luo L. (2023) What will the water quality of the Yangtze River be in the future?, Science of The Total Environment, 857, 159714. https://doi.org/10.1016/J.SCITOTENV.2022.159714.

Ebadati N. & Hooshmandzadeh M. (2019) Water quality assessment of river using RBF and MLP methods of artificial network analysis (case study: Karoon River Southwest of Iran), Environmental Earth Sciences, 78, 551. https://doi.org/10.1007/S12665-019-8472-0.

Emamgholizadeh S., Kashi H., Marofpoor I. & Zalaghi E. (2014) Prediction of water quality parameters of Karoon river (Iran) by artificial intelligence-based models, International Journal of Environmental Science and Technology, 11, 645–656. https://doi.org/10.1007/S13762-013-0378-X/METRICS.

Garces H. & Sbarbaro D. (2009) Outliers detection in environmental monitoring data, IFAC Proceedings Volumes, 42, 330–335. https://doi.org/10.3182/20091014-3-CL-4011.00060.

Georgescu P. L., Moldovanu S., Iticescu C., Calmuc M., Calmuc V., Topa C. & Moraru L. (2023) Assessing and forecasting water quality in the Danube River by using neural network approaches, Science of The Total Environment, 879, 162998. https://doi.org/10.1016/J.SCITOTENV.2023.162998.

Gil-Rodas N., Guevara-Mora M., Rivas G., Dávila G., García D., Contreras-Perdomo A., Alvizures P., Martínez M. & Calvo-Brenes G. (2023) A comparative study of several types of indices for river quality assessment, Water Quality Research Journal, 58, 169–183. https://doi.org/10.2166/wqrj.2023.029.

Golshan M., Dastoorpour M., Birgani Y. T., Golshan M., Dastoorpour M. & Birgani Y. T. (2020) Fuzzy environmental monitoring for the quality assessment: detailed feasibility study for the Karun River basin, Iran, Groundwater for Sustainable Development, 10, 100324. https://doi.org/10.1016/J.GSD.2019.100324.

Guo J., Bian R., Guan A., Cao X., Peng J., Wu X., Wang D., Qi W., Liu H. & Qu J. (2023) Assessment and identification of primary factors controlling Yangtze river water quality, ACS ES&T Water, 3, 1329–1340. https://doi.org/10.1021/ACSESTWATER.2C00645.

Hien Than N., Dinh Ly C. & Van Tat P. (2021) The performance of classification and forecasting Dong Nai River water quality for sustainable water resources management using neural network techniques, Journal of Hydrology, 596, 126099. https://doi.org/10.1016/J.JHYDROL.2021.126099.

Hyndman R. J. & Athanasopoulos G. (2015) Forecasting: Principles and Practice, 3rd edn. Melbourne, Australia: OTexts, Aust. OTexts.com/fpp3 292.

Intro – AutoTS 0.6.13 documentation (2024).

Kurunç A., Yürekli K. & Çevik O. (2005) Performance of two stochastic approaches for forecasting water quality and streamflow data from Yeşilırmak River, Turkey, Environmental Modelling & Software, 20, 1195–1200. https://doi.org/10.1016/J.ENVSOFT.2004.11.001.

Kushwaha N. L., Kudnar N. S., Vishwakarma D. K., Subeesh A., Jatav M. S., Gaddikeri V., Ahmed A. A. & Abdelaty I. (2024) Stacked hybridization to enhance the performance of artificial neural networks (ANN) for prediction of water quality index in the Bagh river basin, India, Heliyon, 10, e31085. https://doi.org/10.1016/j.heliyon.2024.e31085.

Liu M. & Lu J. (2014) Support vector machine – an alternative to artificial neuron network for water quality forecasting in an agricultural nonpoint source polluted river?, Environmental Science and Pollution Research, 21, 11036–11053. https://doi.org/10.1007/S11356-014-3046-X.

Liu J., Wang P., Jiang D., Nan J. & Zhu W. (2020) An integrated data-driven framework for surface water quality anomaly detection and early warning, Journal of Cleaner Production, 251, 119145. https://doi.org/10.1016/J.JCLEPRO.2019.119145.

Locke K. A. (2024) Impacts of land use/land cover on water quality: a contemporary review for researchers and policymakers, Water Quality Research Journal, 59, 89–106. https://doi.org/10.2166/wqrj.2024.002.

Long Y., Song L., Shu Y., Li B., Peijnenburg W. & Zheng C. (2023) Evaluating the spatial and temporal distribution of emerging contaminants in the Pearl River basin for regulating purposes, Ecotoxicology and Environmental Safety, 257, 114918. https://doi.org/10.1016/J.ECOENV.2023.114918.

Lu S., Chen Y., Duan X. & Yin S. (2023) Rainfall erosivity estimation models for the Tibetan Plateau, Catena, 229, 107186. https://doi.org/10.1016/j.catena.2023.107186.

Mahmoodabadi M. & Rezaei Arshad R. (2018) Long-term evaluation of water quality parameters of the Karoun River using a regression approach and the adaptive neuro-fuzzy inference system, Marine Pollution Bulletin, 126, 372–380. https://doi.org/10.1016/J.MARPOLBUL.2017.11.051.

Mdegela L., De Bock Y., Luhanga E., Leo J. & Mannens E. (2023) Monitoring Kikuletwa River levels in northern Tanzania: a data set unlocking insights for effective flood early warning systems, Data in Brief, 49, 109395. https://doi.org/10.1016/J.DIB.2023.109395.

Mijošek T., Kljaković-Gašpić Z., Kralj T., Valić D., Redžović Z., Šariri S., Karamatić I. & Filipović Marijić V. (2023) Spatial and temporal variability of dissolved metal(loid)s in water of the karst ecosystem: consequences of long-term exposure to wastewaters, Environmental Technology & Innovation, 32, 103254. https://doi.org/10.1016/J.ETI.2023.103254.

Mokua N., Maina C. W. & Kiragu H. (2021) Anomaly detection for raw water quality – a comparative analysis of the local outlier factor algorithm and the random forest algorithms, International Journal of Computer Applications, 174, 47–54. https://doi.org/10.5120/IJCA2021921196.

Naddafi K., Honari H. & Ahmadi M. (2007) Water quality trend analysis for the Karoon River in Iran, Environmental Monitoring and Assessment, 134, 305–312. https://doi.org/10.1007/S10661-007-9621-6/METRICS.

Nafsin N. & Li J. (2021) Using CANARY event detection software for water quality analysis in the Milwaukee river, Journal of Hydro-Environment Research, 38, 117–128. https://doi.org/10.1016/J.JHER.2021.06.003.

Noori R., Sabahi M. S., Karbassi A. R., Baghvand A. & Zadeh H. T. (2010) Multivariate statistical analysis of surface water quality based on correlations and variations in the data set, Desalination, 260, 129–136. https://doi.org/10.1016/J.DESAL.2010.04.053.

Nouraki A., Alavi M., Golabi M. & Albaji M. (2021) Prediction of water quality parameters using machine learning models: a case study of the Karun River, Iran, Environmental Science and Pollution Research, 28, 57060–57072. https://doi.org/10.1007/S11356-021-14560-8/METRICS.

Palani S., Liong S. Y. & Tkalich P. (2008) An ANN application for water quality forecasting, Marine Pollution Bulletin, 56, 1586–1597. https://doi.org/10.1016/j.marpolbul.2008.05.021.

Petropoulos F., Apiletti D., Assimakopoulos V., Babai M. Z., Barrow D. K., Ben Taieb S., Bergmeir C., Bessa R. J., Bijak J., Boylan J. E., Browell J., Carnevale C., Castle J. L., Cirillo P., Clements M. P., Cordeiro C., Cyrino Oliveira F. L., De Baets S., Dokumentov A., Ellison J., Fiszeder P., Franses P. H., Frazier D. T., Gilliland M., Gönül M. S., Goodwin P., Grossi L., Grushka-Cockayne Y., Guidolin M., Guidolin M., Gunter U., Guo X., Guseo R., Harvey N., Hendry D. F., Hollyman R., Januschowski T., Jeon J., Jose V. R. R., Kang Y., Koehler A. B., Kolassa S., Kourentzes N., Leva S., Li F., Litsiou K., Makridakis S., Martin G. M., Martinez A. B., Meeran S., Modis T., Nikolopoulos K., Önkal D., Paccagnini A., Panagiotelis A., Panapakidis I., Pavía J. M., Pedio M., Pedregal D. J., Pinson P., Ramos P., Rapach D. E., Reade J. J., Rostami-Tabar B., Rubaszek M., Sermpinis G., Shang H. L., Spiliotis E., Syntetos A. A., Talagala P. D., Talagala T. S., Tashman L., Thomakos D., Thorarinsdottir T., Todini E., Trapero Arenas J. R., Wang X., Winkler R. L., Yusupova A. & Ziel F. (2022) Forecasting: theory and practice, International Journal of Forecasting, 38, 705–871. https://doi.org/10.1016/J.IJFORECAST.2021.11.001.

PyOD 2.0.1 documentation (2024) Available at: https://pyod.readthedocs.io/en/latest/ (Accessed: 19 July 2024).

Qian Y., Shang Y., Zheng Y., Jia Y. & Wang F. (2023) Temporal and spatial variation of microplastics in Baotou section of Yellow River, China, Journal of Environmental Management, 338, 117803. https://doi.org/10.1016/J.JENVMAN.2023.117803.

Qian Q., He M., Sun F. & Liu X. (2024) Monitoring and evaluation of the water quality of the Lower Neches River, Texas, USA, Water Science and Engineering, 17, 21–32. https://doi.org/10.1016/J.WSE.2023.10.002.

Sajan R. I. & Christopher V. B. (2023) A fuzzy inference system for enhanced groundwater quality assessment and index determination, Water Quality Research Journal, 58, 230–246. https://doi.org/10.2166/wqrj.2023.031.

Salari M., Teymouri E. & Nassaj Z. (2021) Application of an artificial neural network model for estimating of water quality parameters in the Karun River, Iran, Journal of Environmental Treatment Techniques, 9, 720–727. https://doi.org/10.47277/JETT/9(4)727.

Seabold S. & Perktold J. (2010) ‘Statsmodels econometric and modeling with Python’, 9th Python in Science Conference. Austin, 28 June–3 July, 2010, pp. 57–61.

Singh S., Rai S., Singh P. & Mishra V. K. (2022) Real-time water quality monitoring of river Ganga (India) using internet of things, Ecological Informatics, 71, 101770. https://doi.org/10.1016/J.ECOINF.2022.101770.

Statsmodels 0.14.1 (2024) Available at: https://www.statsmodels.org/stable/index.html (Accessed: 20 July 2024).

Talagala P. D., Hyndman R. J., Leigh C., Mengersen K. & Smith-Miles K. (2019) A feature-based procedure for detecting technical outliers in water-quality data from in situ sensors, Water Resources Research, 55, 8547–8568. https://doi.org/10.1029/2019WR024906.

Ubah J. I., Orakwe L. C., Ogbu K. N., Awu J. I., Ahaneku I. E. & Chukwuma E. C. (2021) Forecasting water quality parameters using artificial neural network for irrigation purposes, Scientific Reports, 11, 1–13. https://doi.org/10.1038/s41598-021-04062-5.

Uddin M. G., Rahman A., Rosa Taghikhah F. & Olbert A. I. (2024) Data-driven evolution of water quality models: an in-depth investigation of innovative outlier detection approaches – a case study of Irish Water Quality Index (IEWQI) model, Water Research, 255, 121499. https://doi.org/10.1016/J.WATRES.2024.121499.

Vellingiri J., Kalaivanan K., Gopinath M. P., Gobinath C., Subramaniam P. R. & Rangarajan S. (2023) Strategies for classifying water quality in the Cauvery river using a federated learning technique, International Journal of Cognitive Computing in Engineering, 4, 187–193. https://doi.org/10.1016/J.IJCCE.2023.04.004.

Wang H., Xu P. & Zhao J. (2022a) Improved KNN algorithms of spherical regions based on clustering and region division, Alexandria Engineering Journal, 61, 3571–3585. https://doi.org/10.1016/J.AEJ.2021.09.004.

Wang C., Chen X., Wu C. & Wang H. (2022b) AutoTS: automatic time series forecasting model design based on Two-Stage pruning, arXiv Preprint ArXiv:2203.14169. https://doi.org/10.48550/arXiv.2203.14169.

Wu H. & Chen Q. (2023) An integrated approach using multi-source data for effective pollution risk monitoring of urban rivers: a case study of Hangzhou, Water Science & Technology, 88, 454–467. https://doi.org/10.2166/WST.2023.223.

Wu J., Cheng S. P., He L. Y., Wang Y. C., Yue Y., Zeng H. & Xu N. (2023) Assessing water quality in the Pearl river for the last decade based on clustering: characteristic, evolution and policy implications, Water Research, 244, 120492. https://doi.org/10.1016/J.WATRES.2023.120492.

Yin Y., Xia R., Chen Y., Jia R., Zhong N., Yan C., Hu Q., Li X. & Zhang H. (2023) Non-steady state fluctuations in water levels exacerbate long-term and seasonal degradation of water quality in river-connected lakes, Water Research, 242, 120247. https://doi.org/10.1016/J.WATRES.2023.120247.

Yu J. W., Kim J. S., Li X., Jong Y. C., Kim K. H. & Ryang G. I. (2022) Water quality forecasting based on data decomposition, fuzzy clustering and deep learning neural network, Environmental Pollution, 303, 119136. https://doi.org/10.1016/J.ENVPOL.2022.119136.

Zarei H. & Pourreza Bilondi M. (2013) Factor analysis of chemical composition in the Karoon river basin, southwest of Iran, Applied Water Science, 34, 753–761. https://doi.org/10.1007/S13201-013-0123-0.

Zhao Y., Nasrullah Z. & Li Z. (2019) PyOD: a python toolbox for scalable outlier detection, Journal of Machine Learning Research, 20, 1–7.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-ND 4.0), which permits copying and redistribution with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nd/4.0/).
