ABSTRACT
Accurate river flow prediction is critical for sustainable water resource management, particularly in arid and semi-arid regions. However, balancing model accuracy, computational efficiency, and interpretability remains a significant challenge due to the complex and nonlinear nature of hydrological systems. This study employs a granular computing (GRC) model to predict monthly inflows to the Alavian Dam in Iran. Principal component analysis (PCA) was used to reduce input dimensionality, identifying six key variables to enhance computational performance. The predictive performance of GRC was compared with artificial neural networks (ANNs) and support vector machines (SVMs), using R2, RMSE, and MAE as evaluation metrics. The GRC model achieved R2 values of 0.93 during calibration and 0.94 during validation, outperforming both ANN and SVM. Notably, GRC demonstrated superior accuracy in capturing extreme flow events, which are crucial for flood and drought management. This advantage is attributed to its rule-based structure and local learning approach, which enables effective modeling of nonlinearities and sparse data. Furthermore, the interpretability of the GRC model - facilitated by its use of granules and transparent if-then rules - offers valuable insights into variable influence. These strengths highlight GRC as a reliable and efficient tool for hydrological forecasting and climate-adaptive water resource planning.
HIGHLIGHTS
This study presents granular computing (GRC) as an innovative approach for monthly inflow prediction that outperforms traditional models such as artificial neural networks and support vector machines in both accuracy and interpretability.
With R2 values of 0.93 and 0.94 during calibration and validation, GRC demonstrated superior predictive performance while avoiding overfitting.
Unlike black-box models, GRC ensured consistent, interpretable, and reliable predictions.
INTRODUCTION
The sustainable management of water resources has become one of the most critical global challenges, especially in regions facing diminishing supplies due to rapid population growth and increasing demand across domestic, agricultural, and industrial sectors. Accurate river flow prediction is essential for effective water resource planning, helping decision-makers allocate resources more efficiently and mitigate risks associated with both water scarcity and extreme hydrological events. In developing countries, where water infrastructure is often limited, effective runoff modeling becomes particularly important in risk reduction efforts. Multiple runoff modeling techniques exist, and enhancing these models through accurate calibration is essential to address growing water demands and the intensifying impacts of climate change on hydrological systems (Jodhani et al. 2023a, b, 2024; Mazandarani Zadeh et al. 2023; Alisoltani et al. 2024; Vyas et al. 2024; Heshmati et al. 2025).
Over the past decades, researchers have developed a variety of approaches to address the complex nature of river flow prediction. These range from traditional regression methods and conceptual hydrological models to advanced black-box techniques such as artificial neural networks (ANN) and support vector machines (SVM). For instance, Fallah Kalaki et al. (2025) evaluated weather and river flow predictions in the Karun River Basin using the North American Multi-Model Ensemble (NMME) system in combination with the Soil & Water Assessment Tool (SWAT) model. The study demonstrated improved long-term forecasting performance, particularly in spring and autumn, by integrating multiple NMME models with statistical downscaling techniques such as multiple linear regression and K-nearest neighbors.
Neuro-fuzzy models have also gained attention for their potential to combine the strengths of fuzzy logic and neural networks. Early work by Roger & Gulley (1995) introduced this hybrid approach, showing its effectiveness in capturing complex relationships through both rule-based reasoning and adaptive learning. Further applications by Coulibaly et al. (2000), Nayak et al. (2004), and Farokhnia & Morid (2010) demonstrated the superior performance of neuro-fuzzy systems compared to traditional ANN models, particularly in reducing output uncertainty. Other studies, such as those by Anusree & Varghese (2016) and Nath et al. (2020), explored enhancements to Adaptive Neuro-Fuzzy Inference System (ANFIS) models using optimization algorithms like particle swarm optimization (PSO), leading to improved predictive accuracy and computational efficiency.
In addition to ANN and neuro-fuzzy models, SVMs have been successfully applied in river flow forecasting due to their solid theoretical foundation and capacity for handling nonlinear data. Notable contributions include those by Asefa et al. (2005), Yu et al. (2006), and He et al. (2014), who reported satisfactory results in seasonal and short-term flow prediction tasks. Complementary to this, Noori et al. (2009; 2010, 2011a) highlighted the benefits of preprocessing techniques such as principal component analysis (PCA) and wavelet transforms in improving model performance.
Despite the promise of ANN, SVM, and fuzzy models, these techniques often function as black boxes, offering limited insight into the physical relationships among input parameters. Their inability to interpret nonlinear hydrological behavior, especially during extreme events such as floods and droughts, remains a key limitation. Furthermore, their performance typically degrades when faced with unstable or highly uncertain input data. These challenges underscore the need for models that combine high accuracy and efficiency with interpretability and robustness across varying conditions.
In response to these challenges, granular computing (GRC) has emerged as a promising framework for modeling complex environmental systems. Compared to conventional data-driven models, GRC offers notable advantages, including better interpretability, flexible handling of sparse or imprecise data, and rule-based knowledge structuring. These attributes make GRC particularly effective in scenarios involving uncertainty, such as estimating the longitudinal dispersion coefficient in rivers. Hybrid models that integrate GRC with neural networks have shown enhanced performance without sacrificing model transparency, positioning GRC as a compelling alternative to black-box methods in environmental applications.
The GRC approach defines computational models through classes, clusters, groups, and intervals, making it well-suited for large, complex datasets (Zhao et al. 2007). It utilizes multi-criteria decision-making to extract classification rules while minimizing inconsistency, thereby reducing problem uncertainty. However, its limitation lies in the fixed interval assumption, which can restrict flexibility. In recent years, GRC has been applied to water science problems, including the estimation of riverbed particle dimensions and longitudinal dispersion coefficients, with encouraging results reported by Naghikhani et al. (2015), Noori et al. (2017a, b), and Ghiasi et al. (2019, 2022).
This study adopts the GRC model – a method more common in electrical engineering than in water resource applications – for streamflow prediction. Given the supervised nature of the current problem, unsupervised methods such as self-organizing maps, hierarchical clustering, and semi-supervised models are unsuitable. Similarly, evolutionary algorithms and reinforcement learning techniques are not optimal for developing a predictive model in this context, although they can support pattern recognition tasks. Among supervised learning models, linear and logistic regression approaches lack the complexity-handling capabilities required here, while decision-tree-based models risk providing suboptimal, localized results (Maimon & Rokach 2008; Ben-Gal et al. 2014). Neural networks, though powerful, are often non-interpretable, computationally intensive, and sensitive to initialization conditions. In contrast, the GRC model provides a consistent and interpretable framework where model behavior aligns with its underlying assumptions and is not dependent on random weight initialization. Furthermore, its precision can be fine-tuned through parameter adjustment, unlike neural networks that may yield variable outputs across different runs.
GRC is particularly suitable for predicting monthly inflow to the Alavian Dam, where input factors such as precipitation, climatic variability, and hydrological conditions are subject to high uncertainty and fluctuation. Traditional models may fall short in such settings, whereas GRC's hierarchical structure aids in uncovering hidden patterns within the data, enhancing both predictive accuracy and computational performance. By analyzing information at varying levels of granularity, GRC simplifies the processing of large datasets – an essential feature in big-data-driven hydrological modeling. Moreover, appropriate parameterization of the GRC model helps mitigate overfitting risks (Sheikhian et al. 2017).
Despite the high predictive power of ANN and SVM models, their lack of interpretability and high computational demand present limitations. GRC overcomes these issues by decomposing data into granules and establishing interpretable interrelationships. This transparency fosters greater stakeholder trust and enables more informed decision-making. Additionally, GRC's hierarchical design better accommodates the inherent uncertainty and complexity of hydrological data compared to black-box models, and it reduces computational costs through its efficient problem decomposition.
Nevertheless, GRC does face challenges, such as scalability and memory inefficiency when applied to extremely large datasets. Addressing these limitations requires the development of optimized algorithms, implementation of parallel processing techniques, and use of advanced computational infrastructures.
Consistent with recent research practice, PCA is employed in this study as a dimensionality reduction technique to manage the large number of candidate input variables and improve computational efficiency. By projecting the data onto a lower-dimensional space, PCA retains the dominant patterns of variability while streamlining model training; its effectiveness, however, depends on the structure and variability of the data. PCA was applied as a preprocessing step before the GRC model was implemented. The GRC model was then applied alongside ANN and SVM models to predict monthly river inflow, with particular attention to their performance under extreme flow conditions. The objective of this study is to evaluate the accuracy, robustness, and interpretability of the GRC model in comparison with conventional machine learning approaches.
MATERIALS AND METHODS
Study area
Meteorological data were obtained from the Maragheh synoptic station, and mean river discharge data were obtained from the Tazkand hydrometric station, located upstream of the Alavian Dam. Both stations have long operating records, and data from 1983 to 2005 were used. These data, comprising 18 variables with three time lags, form the basis of this study. Figure 1 shows the location of the watershed and the stations in East Azerbaijan Province. The aim of the calculations in this article is to predict future discharge from these data using the granular computing method. Because of the large number of candidate input variables, PCA, following Noori et al. (2011b), was used to select the most significant variables for the model. On this basis, six input variables were used to predict discharge at time t + 1: maximum temperature at t − 2, solar radiation at t − 1, discharge at t, discharge at t − 1, precipitation at t, and precipitation at t − 2. Table 1 summarizes the statistics of these data.
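The lagged inputs listed above can be assembled from the monthly series as in the following minimal Python sketch. The series names (`precip`, `tmax`, `rad`, `q`) and the function `build_lagged_dataset` are hypothetical, and the sketch assumes the series are aligned month by month with no gaps; it is not the authors' preprocessing code.

```python
def build_lagged_dataset(precip, tmax, rad, q):
    """Build (X, y) for predicting discharge at t + 1 (hypothetical helper).

    Each row of X holds the six selected inputs for month t, in this order:
    Tmax(t-2), Rad(t-1), Q(t), Q(t-1), R(t), R(t-2).
    All arguments are equal-length monthly series with no missing months (assumed).
    """
    n = len(q)
    if not (len(precip) == len(tmax) == len(rad) == n):
        raise ValueError("all series must have the same length")
    X, y = [], []
    # t starts at 2 so that the t-2 lag exists, and stops at n-2 so that t+1 exists
    for t in range(2, n - 1):
        X.append([tmax[t - 2], rad[t - 1], q[t], q[t - 1], precip[t], precip[t - 2]])
        y.append(q[t + 1])
    return X, y
```

Because the earliest predictor uses lag t − 2 and the target uses t + 1, a record of n months yields n − 3 usable rows.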
Statistical summary of the data used
Variables | R(t−2) (mm) | R(t) (mm) | Q(t−1) (m3/s) | Q(t) (m3/s) | Rad(t−1) (cal/cm2) | Tmax(t−2) (°C) | Q(t+1) (m3/s) |
---|---|---|---|---|---|---|---|
Maximum | 101.20 | 101.34 | 23.200 | 23.200 | 609.9 | 35.000 | 23.200 |
Median | 20.90 | 20.97 | 1.570 | 1.570 | 372.3 | 18.600 | 1.570 |
Minimum | 0.00 | 0.00 | 0.149 | 0.149 | 130.2 | −1.400 | 0.149 |
Variance | 566.73 | 625.01 | 20.043 | 20.737 | 22242.7 | 114.271 | 20.915 |
Std. dev. | 23.81 | 25.00 | 4.477 | 4.554 | 149.1 | 10.690 | 4.573 |
SE of mean | 1.62 | 1.71 | 0.305 | 0.311 | 10.2 | 0.729 | 0.312 |
Mean | 25.85 | 26.67 | 3.623 | 3.683 | 382.9 | 18.223 | 3.713 |
Skewness | 0.86 | 0.93 | 2.20 | 2.14 | −0.02 | −0.05 | 2.11 |
In this study, PCA was employed not only for dimensionality reduction but also to enhance the performance of the GRC model. High-dimensional input spaces in hydrological modeling often lead to overfitting, reducing the generalizability of traditional models. By applying PCA, we identified the most informative features while maintaining computational efficiency within the GRC framework.
Unlike previous studies where PCA was primarily used as a preprocessing step for neural networks or regression models, this research integrates PCA within a rule-based GRC approach, representing an innovative application in hydrological modeling. The findings indicate that this integration not only improves predictive accuracy but also enhances model interpretability by filtering out noise from less relevant variables.
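The exact rule used to turn the principal components into six retained variables is not spelled out here. One common approach, assumed in the sketch below, is to standardize the candidate inputs, keep the components whose eigenvalue is at least 1 (the Kaiser criterion), and retain the original variable with the largest absolute loading on each kept component. The sketch uses only the Python standard library and computes eigenpairs by power iteration with deflation; all function names are hypothetical.

```python
import math


def standardize(X):
    """Z-score each column of X (a list of rows); assumes no constant columns."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def correlation_matrix(Z):
    """Correlation matrix of already standardized data."""
    n, p = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(p)] for i in range(p)]


def leading_eigenpairs(C, k, iters=1000):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation."""
    p = len(C)
    A = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        # Uneven start vector, so it is unlikely to be orthogonal to the target eigenvector
        v = [1.0 + 0.1 * i for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining matrix is (numerically) zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflation: subtract this component so the next pass finds the next one
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def select_by_loadings(X, names, min_eigenvalue=1.0):
    """Keep components with eigenvalue >= min_eigenvalue (Kaiser rule, assumed) and,
    for each, pick the original variable with the largest absolute loading."""
    C = correlation_matrix(standardize(X))
    selected = []
    for lam, v in leading_eigenpairs(C, len(names)):
        if lam < min_eigenvalue:
            break  # eigenvalues arrive in descending order
        j = max(range(len(v)), key=lambda i: abs(v[i]))
        if names[j] not in selected:
            selected.append(names[j])
    return selected
```

In practice a linear algebra library (for example NumPy's `numpy.linalg.eigh`) would replace the hand-written power iteration; it is written out here only to keep the example dependency-free.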
Table 1 presents the data used to predict future discharge, reporting the maximum, median, minimum, variance, standard deviation, standard error of the mean, mean, and skewness of each variable. Because these statistics are given both for the lagged precipitation, radiation, temperature, and discharge inputs and for the target discharge at t + 1, the input and output distributions can be compared directly. All discharge series are strongly right-skewed (skewness above 2), with means more than twice their medians.
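The statistics in Table 1 can be reproduced with Python's `statistics` module, as in the sketch below. It assumes the sample (n − 1) variance and the adjusted Fisher–Pearson skewness coefficient, which many statistics packages report; the exact definitions used for Table 1 are not stated, so small differences in the skewness row are possible. The function name `describe` is hypothetical.

```python
import math
import statistics


def describe(values):
    """Summary statistics in the row order of Table 1 (definitions assumed)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    # Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd) ** 3)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "Maximum": max(values),
        "Median": statistics.median(values),
        "Minimum": min(values),
        "Variance": statistics.variance(values),
        "Std. dev.": sd,
        "SE of mean": sd / math.sqrt(n),
        "Mean": mean,
        "Skewness": skew,
    }
```

As a consistency check, the standard error of the mean is the standard deviation divided by √n: for Q(t+1) with n = 215, 4.573 / √215 ≈ 0.312 m3/s, which matches Table 1.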
Granular computing




In Equation (5), represents the true positive rate for the class.
Pattern extraction in the GRC model involves three main steps: first, candidate rules are extracted from the data; second, higher-quality rules are selected from this set; and third, the selected rules are prioritized. The strength of the GRC model lies in the criteria used to assess and rank rule quality, which are expressed in Equations (2)–(5). Rules with higher absolute support, higher coverage, and higher generality have higher confidence, whereas rules with higher conditional inconsistency have lower confidence. Absolute support ranges from zero to one, and values close to one are desirable. It measures how strongly the body (antecedent) of a rule supports its output class, that is, the probability of a specific output given the rule's input conditions; it therefore expresses a likelihood rather than definitively assigning an output to those conditions. Coverage expresses the reverse relationship: for a given output class, it is the proportion of that class's samples that the rule accounts for, i.e., the probability that the rule is invoked when that class occurs. Together, absolute support and coverage determine the strength of the two-way relationship between the input conditions and a specific output class. The generality of a rule reflects the proportion of the training dataset to which it applies: the more samples a rule matches, the higher its generality and its confidence. However, rules with very high generality often carry little specific information, and such rules are screened out through their absolute support and coverage.
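The sketch below illustrates the four criteria using the standard granular-computing definitions that match the verbal descriptions above: generality as the fraction of samples covered by a granule, absolute support as the fraction of the granule's samples that belong to the rule's class, and coverage as the fraction of that class's samples that fall in the granule. As an assumed stand-in for conditional inconsistency, it uses the base-2 Shannon entropy of the class labels within the granule. These definitions are an assumption and may differ in detail from Equations (2)–(5); inputs are assumed to be already discretized into labels such as "high" and "low", and the data and names are hypothetical.

```python
import math
from collections import Counter


def matches(attrs, granule):
    """True if the sample satisfies every attribute = value condition of the granule."""
    return all(attrs.get(name) == value for name, value in granule.items())


def rule_measures(data, granule, target_class):
    """Quality measures for the rule 'granule => target_class'.

    data: list of (attributes_dict, class_label) pairs with discretized values.
    """
    labels_in_granule = [label for attrs, label in data if matches(attrs, granule)]
    n_granule = len(labels_in_granule)
    n_class = sum(1 for _, label in data if label == target_class)
    n_both = labels_in_granule.count(target_class)
    # Entropy of the class labels inside the granule (assumed inconsistency measure)
    entropy = 0.0
    for count in Counter(labels_in_granule).values():
        p = count / n_granule
        entropy -= p * math.log2(p)
    return {
        "generality": n_granule / len(data),
        "support": n_both / n_granule if n_granule else 0.0,
        "coverage": n_both / n_class if n_class else 0.0,
        "inconsistency": entropy,
    }


# Hypothetical toy data: discretized discharge and precipitation at t, class of Q(t+1)
TOY_DATA = [
    ({"Q_t": "high", "R_t": "high"}, "high"),
    ({"Q_t": "high", "R_t": "high"}, "high"),
    ({"Q_t": "high", "R_t": "low"}, "high"),
    ({"Q_t": "high", "R_t": "low"}, "low"),
    ({"Q_t": "low", "R_t": "high"}, "low"),
    ({"Q_t": "low", "R_t": "low"}, "low"),
]
```

For the toy data, the rule `Q_t = high => high` has generality 0.67, absolute support 0.75, coverage 1.0, and an inconsistency of about 0.81, whereas the narrower rule that also requires `R_t = high` has support 1.0 and zero inconsistency but lower generality (0.33) and coverage (0.67).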
Conditional inconsistency measures how inconsistent a rule is with respect to the rest of the rule set. Under this criterion, the accuracy of a rule is questionable when the rule, although valid for a particular subset of the data, behaves as an exception: it is incompatible with the other extracted rules, and using it may introduce ambiguity into the model. This criterion is one of the model's strengths, because it favors rules that are reliable and accurate rather than merely precise (Sheikhian et al. 2015).
The GRC algorithm first extracts candidate rules from the data and then calculates these criteria for each rule. To retain accurate and reliable rules, the algorithm removes rules whose conditional inconsistency exceeds 0.5 and rules whose absolute support is below 0.7. It then ranks the remaining rules by generality and coverage, with higher values ranked first. To predict the output for new input data, the model uses one or more of the highest-ranked rules so that both prediction accuracy and precision remain at satisfactory levels (Sheikhian et al. 2017).
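The screening and ranking step can be sketched as follows, reading the thresholds as "remove a rule if its inconsistency exceeds 0.5 or its absolute support is below 0.7". Ranking by generality first and coverage second, and predicting with the single highest-ranked matching rule, are simplifying assumptions, since the text allows one or more rules to be combined. The rule dictionaries and their field names are hypothetical.

```python
def matches(attrs, granule):
    """True if the sample satisfies every attribute = value condition of the granule."""
    return all(attrs.get(name) == value for name, value in granule.items())


def select_and_rank(rules, max_inconsistency=0.5, min_support=0.7):
    """Drop rules with inconsistency > max_inconsistency or support < min_support,
    then sort by generality and, for ties, coverage (both descending, order assumed)."""
    kept = [
        r for r in rules
        if r["inconsistency"] <= max_inconsistency and r["support"] >= min_support
    ]
    return sorted(kept, key=lambda r: (r["generality"], r["coverage"]), reverse=True)


def predict(attrs, ranked_rules, default=None):
    """Return the output of the highest-ranked rule whose granule matches the sample."""
    for rule in ranked_rules:
        if matches(attrs, rule["granule"]):
            return rule["output"]
    return default  # no rule fires for this sample
```

Returning a default when no rule fires keeps the sketch explicit about a case that any rule-based predictor has to handle, for example by falling back to a broader rule.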
RESULTS AND DISCUSSION
The GRC model used the six input variables to predict future flow, with river discharge at time t + 1 as the output. Of the 215 data points, 145 (about 67% of the dataset) were allocated to calibration and 70 (about 33%) were reserved for validation. During calibration, the GRC model parameters were tuned to maximize accuracy; during validation, the model's performance was tested on unseen data.
Relationship between observed and estimated discharge values during calibration.
Relationship between observed and estimated discharge values during validation.
The sensitivity analysis indicated that river discharge at time t (Q(t)) was the most influential variable in predicting inflow at t + 1. This finding aligns with the well-established temporal dependency in river discharge, where past discharge levels strongly influence future inflow rates. Additionally, solar radiation (Rad(t − 1)) and discharge at t − 1 emerged as critical predictors, underscoring the role of evaporation and groundwater contributions in hydrological processes.
Notably, precipitation at t − 2 had a relatively minor impact on inflow predictions. This suggests that the hydrological response to precipitation is primarily governed by immediate conditions rather than past rainfall patterns. The weaker influence of older precipitation data can likely be attributed to watershed storage effects, where infiltration and delayed runoff processes mitigate the direct impact of earlier rainfall events.
These results emphasize the importance of incorporating recent hydrological variables in inflow prediction models. They further suggest that approaches focusing on short-term dependencies may achieve greater predictive accuracy than those relying on extended precipitation trends.
Performance of the ANN, SVM, and GRC models: R2, RMSE, and MAE statistics
Statistical index | ANN calibration | ANN validation | SVM calibration | SVM validation | GRC calibration | GRC validation |
---|---|---|---|---|---|---|
R2 | 0.92 | 0.81 | 1 | 0.82 | 0.93 | 0.94 |
RMSE (m3/s) | 1.36 | 1.69 | 0.04 | 1.69 | 1.33 | 0.87 |
MAE (m3/s) | 0.89 | 1.05 | 0.03 | 0.99 | 0.87 | 0.54 |
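The three statistics in the table can be computed as in the sketch below. How R2 was computed is not stated; the sketch assumes the squared Pearson correlation between observed and estimated discharge, a common choice in hydrology, and also shows the coefficient of determination 1 − SS_res/SS_tot for comparison. Function names are hypothetical.

```python
import math


def _check(obs, est):
    if len(obs) != len(est) or not obs:
        raise ValueError("obs and est must be non-empty and of equal length")


def rmse(obs, est):
    """Root mean square error, in the units of the data (m3/s for discharge)."""
    _check(obs, est)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Mean absolute error, in the units of the data."""
    _check(obs, est)
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def r2_pearson(obs, est):
    """R2 taken as the squared Pearson correlation (assumed definition)."""
    _check(obs, est)
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov ** 2 / (var_o * var_e)


def r2_determination(obs, est):
    """Alternative definition: coefficient of determination 1 - SS_res / SS_tot."""
    _check(obs, est)
    mo = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

The two R2 definitions can diverge sharply for biased estimates: for observations [1, 2, 3, 4] and estimates [2, 4, 6, 8], the squared correlation is 1 while 1 − SS_res/SS_tot is −5, so the definition matters when comparing models.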
Heatmap of performance evaluation metrics for SVM, ANN, and GRC methods in the calibration and validation steps.
Comparison of R2 and RMSE values for three SVM, ANN, and GRC methods in the calibration and validation steps.
Residual plot (differences between observed and estimated discharge values) versus the estimated discharge for GRC methods.
As Figure 6 shows, the SVM method reaches an R2 of 1 in the calibration stage but only 0.820 in validation. The ANN method yields 0.920 in calibration and 0.810 in validation, whereas the GRC method yields 0.930 in calibration and 0.940 in validation. The GRC method therefore performs more consistently across the calibration and validation stages and, on average, agrees more closely with the observations than the other two methods. For MAE, overfitting of the SVM method reduced the calibration value to nearly zero (0.030), but the validation value rose to 0.990. The ANN method gives MAE values of 0.890 in calibration and 1.050 in validation, while the GRC method gives 0.870 and 0.540, respectively. Here again, the GRC method clearly outperforms the other methods.
Similarly, Figure 5 shows that the SVM method achieves an RMSE close to zero (0.040) in calibration but 1.690 in validation. The ANN method yields RMSE values of 1.360 in calibration and 1.690 in validation, and the GRC method yields 1.330 and 0.870, respectively, so the GRC method again outperforms the other two. Overall, the SVM model performs poorly on new data, a sign of overfitting to the training set. In contrast, the GRC model shows very little degradation between calibration and validation (its validation errors are in fact lower), indicating that its performance does not depend heavily on the training data. The superior overall performance of the GRC model relative to the SVM and ANN models can be attributed to its consistent behavior in both stages, its weak dependence on the training data, and its transparent, non-black-box structure, which reduces uncertainty.
Unlike ANN and SVM, the GRC model offers interpretable rules that provide insights into the underlying relationships between input and output variables. This transparency is particularly advantageous for water resource management, as it enables decision-makers to understand and trust the model's predictions. The model's robustness and interpretability make it well-suited for practical applications, including flood forecasting, reservoir management, and climate adaptation strategies.
The residual plot (Figure 7) shows good performance for most predicted values, with no systematic pattern in the errors; the random scatter of the residuals supports the model's overall adequacy. The dense cluster of points at low Q(Est) values, with residuals close to zero, reflects the right-skewed discharge distribution (Table 1: median 1.570 m3/s, mean 3.713 m3/s, and skewness 2.11 for Q(t+1)), which places most observations in the low-flow range.
Evaluation of model accuracy at extreme flow values
Comparative evaluation of GRC, ANN, and SVM models for extreme streamflow prediction.
Complementary error metrics, including RMSE and MAE, further confirm the advantage of GRC. For low extreme flows, GRC records the lowest RMSE (0.51) and MAE (0.38), while maintaining superior performance for high extremes with RMSE and MAE values of 1.87 and 1.16, respectively. These quantitative results underscore the robustness of the GRC model in modeling nonlinear and unstable streamflow dynamics.
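The thresholds used to define extreme flows are not stated in this section. The sketch below assumes, purely for illustration, that low and high extremes are the observed discharges at or below the 10th percentile and at or above the 90th percentile, and computes RMSE and MAE within each subset. The function names and the percentile choice are hypothetical.

```python
import math
import statistics


def _errors(pairs):
    """RMSE and MAE over (observed, estimated) pairs, or None if the subset is empty."""
    if not pairs:
        return None
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in pairs) / len(pairs))
    mae = sum(abs(o - e) for o, e in pairs) / len(pairs)
    return {"RMSE": rmse, "MAE": mae, "n": len(pairs)}


def extreme_flow_errors(obs, est, n_quantiles=10):
    """Errors on low and high extremes, defined by percentile thresholds (assumed)."""
    cuts = statistics.quantiles(obs, n=n_quantiles)  # default 'exclusive' method
    low_thr, high_thr = cuts[0], cuts[-1]  # 10th and 90th percentiles when n_quantiles=10
    low = [(o, e) for o, e in zip(obs, est) if o <= low_thr]
    high = [(o, e) for o, e in zip(obs, est) if o >= high_thr]
    return {"low": _errors(low), "high": _errors(high), "thresholds": (low_thr, high_thr)}
```

Because the extreme subsets are small by construction (about 10% of the record each), their RMSE and MAE values are more sensitive to individual months than the full-record statistics.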
The observed superiority of the GRC model can be attributed to its rule-based structure, which enables it to decompose the input space into multiple localized granules. This granulation process allows the model to capture subtle variations and nonlinear dependencies within the data – particularly critical in extreme hydrological conditions where conventional global modeling approaches often fail. Moreover, GRC's flexibility in rule generation and selection promotes adaptability to data uncertainty and sparsity, which are common challenges in hydrological modeling. This local learning capability contrasts with the ANN and SVM models, which rely more heavily on global approximations and often struggle with overfitting or poor generalization in edge-case scenarios.
Therefore, incorporating GRC techniques not only enhances predictive accuracy but also provides greater model stability and interpretability, especially under extreme flow conditions. These results suggest that GRC offers a reliable and effective framework for hydrological forecasting applications, including flood and drought risk assessment.
Interpretability of the GRC model
A key advantage of the GRC model over black-box approaches such as ANNs and SVMs is its interpretability. Unlike ANN and SVM, which rely on intricate mathematical transformations and weight distributions that are often challenging to interpret, the GRC model organizes data into meaningful granules. Each granule represents a subset of similar data points, facilitating a clearer understanding of how input variables influence predictions.
In this study, the GRC model identified distinct granules based on key input variables, including river discharge at time t, solar radiation, and precipitation. These granules were then used to generate if-then rules, which make the model transparent. For instance, a rule may take the following form:
‘If precipitation at t − 1 is high and discharge at t exceeds the median threshold, then inflow at t + 1 is likely to be high’
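A rule of this kind can be stored as a list of threshold conditions and checked transparently, as in the hypothetical sketch below. The discharge threshold of 1.570 m3/s is the median discharge from Table 1; the precipitation threshold `HIGH_PRECIP_MM` is a placeholder, not a value from the study, and the lag of the precipitation term is left to the caller.

```python
import operator

# Comparison operators allowed in a rule condition
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical encoding of the quoted rule. 1.570 m3/s is the median discharge in Table 1;
# HIGH_PRECIP_MM is a placeholder threshold for "high" precipitation, not a study value.
HIGH_PRECIP_MM = 40.0
EXAMPLE_RULE = {
    "conditions": [("precip", ">", HIGH_PRECIP_MM), ("q_t", ">", 1.570)],
    "then": "high inflow at t + 1",
}


def fires(rule, sample):
    """Return True if every condition of the rule holds for the sample."""
    return all(OPS[op](sample[var], thr) for var, op, thr in rule["conditions"])


def explain(rule, sample):
    """List each condition with its outcome, so the decision can be traced."""
    return [f"{var} {op} {thr}: {OPS[op](sample[var], thr)}"
            for var, op, thr in rule["conditions"]]
```

Because `explain` returns each condition with its outcome, a user can see which part of the rule caused it to fire or not, the kind of traceability that ANN weights or SVM kernels do not offer directly.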
CONCLUSION
Effective water resource management is becoming increasingly critical in addressing challenges related to water scarcity, especially in arid and semi-arid regions such as Iran. Accurate prediction of river flow is a key component in ensuring optimal allocation, operational planning, and disaster mitigation. In this study, the GRC approach was used to model monthly inflows to the Alavian Dam. PCA was applied to identify the most influential input variables, reducing the data dimensionality and enhancing model efficiency without compromising accuracy. The GRC model achieved high predictive performance, with R2 values of 0.93 and 0.94 during calibration and validation, respectively, outperforming both ANN and SVM models.
Importantly, the GRC model demonstrated a superior ability to predict extreme flow conditions. Unlike ANN and SVM, which often struggle with data sparsity or overfitting in such cases, GRC's rule-based and granule-oriented structure allowed it to effectively model localized and nonlinear relationships. This makes it particularly valuable for flood and drought forecasting, where capturing rare events is crucial. The model's transparency and interpretability – through explicit rule formulation – also offer practical advantages for operational use. Water managers and policymakers can gain insight not only into the outputs but also into the driving input variables, enabling informed and adaptive decision-making.
Looking ahead, integrating GRC with optimization algorithms (e.g., Genetic Algorithm (GA), PSO) and real-time monitoring systems can further enhance its adaptability to changing hydrological patterns. Additionally, GRC's potential for integration with climate models positions it as a promising tool for long-term water resource planning under climate change scenarios. Overall, the findings suggest that GRC offers a balanced solution combining accuracy, efficiency, and interpretability, and can play a pivotal role in sustainable water management and hydroclimatic resilience planning.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.