Abstract
The teleconnection modeling of hydro-climatic events is a complex problem with highly uncertain circumstances. In contrast to the classic fuzzy logic methods, by using the Z-number in addition to the constraint of information, and by evaluating the data reliability, it is possible to characterize the degree of ambiguity of data. In this regard, this study investigates the performance of the Z-number-based model (ZBM) in prediction of classified monthly precipitation (MP) events of two synoptic stations in Iran (up to five months in advance). To this end, the sea surface temperature (SST) of adjacent seas was used as a predictor. The suggested model, by using Z-number directly and applying fuzzy Hausdorff distance to determine weights of if-then rules, predicted MP events of both the stations with over 70% confidence. Analysis of the results in the test step showed that the ZBM compared to the traditional fuzzy approach improved the results by 69% for Kermanshah and 112% for Tabriz. Overall, the Z-number concept by assessing events reliability can be used in various sectors of water resources management such as decision-making and drought monitoring.
HIGHLIGHTS
In this study, the performance of the Z-number-based model in teleconnection modeling is investigated.
In contrast to classic fuzzy logic, Z-numbers consist of both restraint and reliability of data.
The performance of the Z-number-based model and the conventional fuzzy model are compared.
The Z-number-based model can predict classified monthly precipitation based on SST variations.
INTRODUCTION
Oceanic-atmospheric teleconnection patterns can affect hydro-climatic events over large distances across the world. The accurate prediction of hydro-climatic events (such as maximum precipitation or drought events) can help decision-makers to improve planning to mitigate the adverse impacts and take advantage of beneficial conditions (Dhanya & Nagesh Kumar 2009; Moser & Hart 2015). Since the early 1900s, various climatic and oceanic parameters have been used as predictors for hydro-climatic events. Thus, if the association of the hydro-climatic events with the climatic and oceanic parameters is identified, it can be used for designing an effective risk management system for facing the extremes of adverse impacts (Webster et al. 1998).
Given the significance of the hydro-climatic events, previous studies have looked into the effects of large-scale ocean-atmospheric factors on these events. For example, the influence of persistent positive phases of the North Atlantic Oscillation (NAO) on Romania's drought was reported by Stefan et al. (2004). In the research by Ghasemi & Khalili (2008), wet conditions in Iran were found to be characterized by a negative sea surface temperature (SST) anomaly in the Mediterranean and the Black Sea, while dry conditions were found to be characterized by a positive SST anomaly in the same seas. The substantial link between southwest Iran's streamflow and the Mediterranean Sea's SST was reported by Meidani & Araghinejad (2014). The findings of that study revealed that utilizing SST (as a predictor of streamflow in southwest Iran) produced improved outcomes compared to using other indices like NAO. The influence of regional SST variations on tropical rainfall was recently reported by Ying et al. (2019). In such studies, the ocean-atmospheric factors have been found to be important in coping with hydro-climatic occurrences.
Traditional approaches (e.g., linear and nonlinear regression or correlation) were employed in the above-mentioned studies (as well as the majority of prior efforts) merely to uncover the possible teleconnection between hydro-climatic parameters, and almost no prediction has been made. However, to deal with large-scale hydro-climatic events with highly uncertain circumstances (e.g., maximum monthly precipitation) and to predict their long-term states, fuzzy logic approaches might be a good alternative to traditional methods (Dhanya & Nagesh Kumar 2009; Nourani et al. 2021). Teleconnection patterns among hydro-climatic factors are complicated, and precise forecasting of their future conditions is challenging; in such situations, fuzzy logic might partially represent such uncertainty. Fuzzy logic has been increasingly utilized to describe complex systems in recent decades, due to its high ability to cope with the uncertainty of systems, which seems to be prevalent in hydro-climatic issues (e.g., see Ashrafi et al. 2019; Malik et al. 2019). To model with fuzzy logic methods, it is essential to determine the if-then rules. The construction of if-then rules is a difficult task due to the intricacy and uncertainty of far-away teleconnection mechanisms. In this regard, data mining (e.g., association mining) can be an appropriate way of extracting the patterns and constructing the if-then rules (Tadesse et al. 2004). For example, Dadaser-Celik et al. (2013) utilized association mining to investigate the connections between streamflow and meteorological factors for the Kızılırmak River Basin in Turkey.
The main flaw of the traditional fuzzy-based technique is that it struggles to deal with the ambiguous situations that are common in real-world scenarios (Aliev et al. 2016; Glukhoded & Smetanin 2016; Zadeh 2011). Since traditional fuzzy techniques merely contain restrictions and do not express reliability, it is important to discuss the reliability of the studied data. In this regard, researchers are now interested in Zadeh's proposal of the Z-number, introduced in 2011. The Z-number is an ordered pair of fuzzy numbers, denoted by Z = (A, B). The first element, A, sets a constraint on the ambiguous variable X. The second element, B, is a measure of the reliability of the first element. The majority of known approaches for mathematical computations on such linguistic variables have focused on turning Z-numbers into conventional fuzzy numbers (Glukhoded & Smetanin 2016; Kang et al. 2018). However, they may miss valuable information, and they might not be applicable for all fuzzy numbers. In this study, by applying the concept of Z+-numbers, a Z-number is directly used for computation (Zadeh 2011; Aliev et al. 2016). Also, the concept of fuzzy Hausdorff distance is applied to allocate weights to the rules (Aliev et al. 2016). The information loss is reduced in this method, but it is more complex than traditional fuzzy logic and needs non-linear optimization procedures. As a result, a comprehensive comparison between the suggested Z-number-based model (ZBM) and the classical fuzzy logic method is needed to confirm the effectiveness and validity of the suggested model. Regarding contributions and innovations, this research, by evaluating the data reliability, investigated the use of SSTs as predictors to predict the MP events up to five months in advance. To this end, the association mining tool was used to extract (explain) the teleconnection pattern between SSTs and MP events.
In the suggested ZBM, by applying fuzzy Hausdorff distance, a Z-number was directly used for computation.
In this study, a system was designed to predict the long-term MP events of two stations in northwestern Iran (up to five months in advance). In this regard, the ZBM was employed and a comparison study was performed with the conventional fuzzy model.
MATERIALS AND METHODS
Study area and data
MP data from the two synoptic stations in northwestern Iran, situated at 38.05°N, 46.17°E (Tabriz) and 34.21°N, 47.90°E (Kermanshah), as well as SSTs from the adjacent seas (Black, Mediterranean, and Red), were used to apply the suggested approach (see Figure 1). The main reasons behind the selection of the Tabriz and Kermanshah stations are their locations and the length of available data, which provide an appropriate situation for both temporal and spatial assessments of the results. The distance between the cities of Tabriz and Kermanshah is approximately 422 km and they have different climatic regimes. The elevations of these cities are 1361 m and 1318.6 m above sea level, respectively. The mean monthly precipitation at the Kermanshah synoptic station is about 12 mm higher than that at the Tabriz synoptic station, and the mean monthly temperature at the Kermanshah synoptic station is about 2.5 °C higher than that at Tabriz. Tabriz has a semi-arid climate with regular seasons, but the Kermanshah climate is heavily influenced by the proximity of the Zagros Mountains and is classified as a hot dry summer Mediterranean climate.
Overview of the study region, adjacent seas, and Tabriz and Kermanshah synoptic stations.
The MP time series were obtained from the Iran Meteorological Organization, and the monthly SST data were downloaded from the National Oceanic and Atmospheric Administration (NOAA) website (http://www.esrl.noaa.gov/psd/cgi-bin/data/timeseries/timeseries1.pl), where the monthly SST values are available for 1° grid squares of the seas. The SST, as a fundamental physical parameter in the earth's climate system, can be used for long-term precipitation modeling. The modeling approach included time series spanning 65 years, from 1955 to 2019. Of these values, 75% were used for training, while the last 25% (from 2003 to 2019) were used for testing (see Table 1).
An overview of the data's statistical analysis (for 1955–2019)
Statistical parameter | Black Sea SST (°C), train | Black Sea SST (°C), test | Mediterranean Sea SST (°C), train | Mediterranean Sea SST (°C), test | Red Sea SST (°C), train | Red Sea SST (°C), test | Tabriz MP (mm), train | Tabriz MP (mm), test | Kermanshah MP (mm), train | Kermanshah MP (mm), test |
---|---|---|---|---|---|---|---|---|---|---|
Mean | 11.598 | 12.659 | 19.864 | 20.661 | 25.194 | 26.045 | 24.278 | 21.504 | 37.587 | 34.754 |
Maximum | 24.814 | 26.102 | 27.435 | 28.028 | 31.340 | 31.754 | 128.400 | 91.300 | 295.400 | 163.600 |
Minimum | −1.587 | −0.635 | 14.171 | 14.975 | 16.754 | 17.973 | 0.000 | 0.000 | 0.000 | 0.000 |
Standard deviation | 7.417 | 7.823 | 4.010 | 4.191 | 4.105 | 4.136 | 23.972 | 21.393 | 44.081 | 38.726 |
Coefficient of variation (dimensionless) | 0.639 | 0.618 | 0.202 | 0.203 | 0.163 | 0.159 | 0.987 | 0.995 | 1.173 | 1.114 |
A threshold of T = 35% (65th percentile) was applied to classify the MP data into two categories, high (H) and low (L) (other threshold values may also be used). The T = 35% threshold, or 65th percentile, is defined as the lowest value that is greater than 65% of the values, computed by sorting the MP values of all months from high to low. Accordingly, threshold precipitation values of T35% = 27 mm for the Tabriz and T35% = 47.3 mm for the Kermanshah synoptic stations were determined and applied to the data.
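The thresholding step can be illustrated with a short sketch. It assumes a literal reading of the definition above (the smallest observed value that exceeds at least 65% of the observations) and labels a month as H when it is at or above the threshold; the function names and the sample values are hypothetical and are not taken from the study's data or code.

```python
from bisect import bisect_left

def exceedance_threshold(values, t_percent=35.0):
    """Smallest observed value that exceeds at least (100 - t_percent)% of the data."""
    ordered = sorted(values)
    needed = (100.0 - t_percent) / 100.0 * len(ordered)
    for v in ordered:
        # bisect_left gives the count of observations strictly below v.
        if bisect_left(ordered, v) >= needed:
            return v
    return ordered[-1]  # degenerate case, e.g. all values equal

def classify_mp(values, threshold):
    """Label each monthly total: 'H' at or above the threshold, otherwise 'L' (assumed convention)."""
    return ["H" if v >= threshold else "L" for v in values]

# Hypothetical monthly precipitation totals (mm), for illustration only.
mp = [0.0, 3.2, 10.5, 18.0, 22.4, 27.0, 31.8, 40.1, 55.6, 72.3]
threshold = exceedance_threshold(mp, t_percent=35.0)
labels = classify_mp(mp, threshold)
print(threshold, labels.count("H"))
```

Other percentile conventions (for example, interpolated percentiles) would give slightly different thresholds on small samples.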
Proposed methodology
This study's suggested technique comprises four phases (data pre-processing, association rule mining, modeling with ZBM and traditional fuzzy tools, and lastly comparing and assessing the results). Figure 2 depicts the suggested methodology's schematic approach.
Schematic representation of the modeling process with suggested ZBM and conventional fuzzy method.
Firstly, the monthly SST data were categorized into five categories: very high (VH), high (H), medium (M), low (L) and very low (VL), within the boundaries of μ ± iσ (where μ and σ are respectively the mean and standard deviation of the data and i = 0.5, 1.5). The rules derived from the association mining depend on these categories, and appropriate rules (in terms of confidence and support criteria) may not be derived if the categorization is inappropriate. This categorization was performed using expert judgment (as well as a trial-and-error procedure) and earlier work; however, the number of categories in the inputs and outputs might be lower or higher. To this end, different methods were explored to determine the optimal thresholds. According to the literature (e.g., see Tadesse et al. 2004; Danandeh Mehr et al. 2017; Nourani et al. 2021), different values for i were tried, including i = 1, 1.5 (5 categories), i = 0.5, 1.5 (5 categories), and i = 0.5, 1, 1.5 (7 categories). The configuration with i = 0.5, 1.5 as category thresholds generated rules of better quality. Similarly, the reliability was divided into different classes; based on performance analysis, seven categories were used for the degree of reliability. In addition, the MP time series were categorized into H and L classes (binary classification). Usually, there is no exact threshold for the determination of extreme events. For example, Rahimikhoob (2010) considered T = 25% (or the 75th percentile) as extreme events, while Danandeh Mehr et al. (2017) examined different thresholds (15, 25 and 35%) as extreme events for precipitation monitoring. In this regard, due to the arid to semi-arid conditions of western Iran, the threshold precipitation of T = 35% (or the 65th percentile) was determined and applied to the data (but other threshold values may also be tried within the suggested methodology).
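As a rough illustration of the μ ± iσ categorization with i = 0.5, 1.5, the sketch below derives the four cut points from the training-set mean and standard deviation of the Black Sea SST in Table 1. The function names are hypothetical, and the resulting cut points are close to, but not identical with, the class limits reported in Table 2, which may rest on slightly different statistics.

```python
from bisect import bisect_right

LABELS = ["VL", "L", "M", "H", "VH"]

def class_bounds(mean, std):
    """Cut points mu - 1.5*sigma, mu - 0.5*sigma, mu + 0.5*sigma, mu + 1.5*sigma."""
    return [mean - 1.5 * std, mean - 0.5 * std, mean + 0.5 * std, mean + 1.5 * std]

def categorize(value, bounds):
    """Map a value to one of the five classes by counting the cut points it exceeds."""
    return LABELS[bisect_right(bounds, value)]

# Training-set mean and standard deviation of Black Sea SST (Table 1).
bounds = class_bounds(11.598, 7.417)
print([round(b, 2) for b in bounds])
print(categorize(14.365, bounds))  # Black Sea SST used in the worked example later in the text
```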
In the second phase, the association rule mining approach was utilized to find patterns between the SST and MP categories. To this end, the teleconnection patterns between the binary MP(t) and SST(t-i) data (SST at different lags) were discovered using the training data set. After generating the patterns, the confidence measure of the association rule was calculated to assess the degree of rules reliability (for the consequent part). In addition, the degree of the reliability for the antecedent part of the rules was determined with the help of the probability of the occurrence of SST categories (according to past events).
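The support and confidence measures mentioned above can be sketched as follows. This is a minimal illustration of the standard definitions applied to lagged class sequences, not the study's mining tool; the helper names and the toy sequences are hypothetical.

```python
from collections import Counter

def lagged_pairs(sst_classes, mp_classes, lag):
    """Pair the SST classes at month t - lag with the MP class at month t."""
    return sst_classes[:len(sst_classes) - lag], mp_classes[lag:]

def rule_metrics(antecedents, consequents):
    """Return {(antecedent, consequent): (support, confidence)} for every observed pair."""
    n = len(antecedents)
    antecedent_counts = Counter(antecedents)
    pair_counts = Counter(zip(antecedents, consequents))
    return {
        (a, c): (count / n, count / antecedent_counts[a])
        for (a, c), count in pair_counts.items()
    }

# Hypothetical class sequences: (Black, Mediterranean, Red) SST classes and MP classes.
A = ("M", "M", "VL")
B = ("L", "M", "M")
sst = [A, A, B, A, B, A]
mp = ["L", "H", "L", "L", "H", "L"]

antecedents, consequents = lagged_pairs(sst, mp, lag=1)
metrics = rule_metrics(antecedents, consequents)
support, confidence = metrics[(A, "H")]
print(support, confidence)
```

In this toy data the pattern "A then H" occurs in 2 of 5 lagged pairs and in 2 of the 3 pairs whose antecedent is A, so its confidence would pass the 0.6 acceptance level used later in the text.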
In the third phase, the traditional fuzzy model and the ZBM were run. To this end, if-then rules were created using the patterns discovered in the second phase. Then the rules were weighted using the fuzzy Hausdorff distance and the chosen rules were aggregated.
Finally, depending on the efficiency criteria employed, the outcomes of both approaches were reviewed and compared.
The major component of the suggested technique (i.e., Z-number) is briefly described in the next sub-section, and brief descriptions of association rules, the addition of discrete Z-numbers and efficiency measures are provided in Appendix A.
Description of Z-number concept
The concept of a Z-number relates to the issue of reliability of information and is utilized to conduct computations using information that is not very reliable. A Z-number has two components, Z = (A, B). The first component, A, is a restriction (constraint) on the values that a real-valued uncertain variable, X, is allowed to take. The second component, B, is a measure of reliability (certainty) of the first component. The Z-number definitions are briefly discussed in the following; for additional information, readers can refer to Aliev et al. (2016) and Zadeh (2011).
Discrete Z-number
A fuzzy subset A of the real line R with a convex membership function μA: R → [0, 1] is a discrete fuzzy number if its support is finite; that is, there exist x1, …, xs ∈ R with x1 < x2 < … < xs such that supp(A) = {x1, …, xs}. A discrete Z-number is an ordered pair Z = (A, B) of discrete fuzzy numbers A and B. A plays the role of a fuzzy constraint on the values that a random variable X may take. B is a discrete fuzzy number with a membership function μB: {b1, …, bs} → [0, 1], {b1, …, bs} ⊂ [0, 1], playing the role of a fuzzy constraint on the probability measure of A, so that P(A) ∈ supp(B).
The definition of a discrete Z+-number is similar to that of the discrete Z-number. The Z+-number, Z+ = (A, R), is a pair consisting of a fuzzy number, A, and a random number, R, where A plays the same role as it does in a discrete Z-number and R plays the role of the probability distribution for B (Aliev et al. 2016).
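To make these definitions more concrete, the sketch below represents a discrete fuzzy number as a mapping from support points to membership degrees and evaluates the probability measure of A as P(A) = Σ μA(xi)·p(xi), which is the usual probability measure of a fuzzy event in the cited Z-number literature. The representation, the function names and the numbers are illustrative assumptions.

```python
def probability_measure(A, p):
    """P(A) = sum over the support of mu_A(x_i) * p(x_i)."""
    return sum(mu * p.get(x, 0.0) for x, mu in A.items())

def is_consistent(z, p, tol=1e-9):
    """Check that P(A) under distribution p falls on a support point of B."""
    A, B = z
    pa = probability_measure(A, p)
    return any(abs(pa - b) <= tol for b, mu in B.items() if mu > 0.0)

# Hypothetical discrete Z-number Z = (A, B) and a candidate distribution p for X.
A = {1.0: 0.5, 2.0: 1.0, 3.0: 0.5}        # fuzzy constraint on the values of X
B = {0.70: 0.5, 0.75: 1.0, 0.80: 0.5}     # fuzzy constraint on P(A)
p = {1.0: 0.25, 2.0: 0.5, 3.0: 0.25}      # plays the role of R in a Z+-number

pa = probability_measure(A, p)
print(pa, is_consistent((A, B), p))
```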
Z-valued if-then rules-based reasoning


The Z-numbers of the selected rules are multiplied by the rule weights. It is worth noting that Zy = λ·Zx with Zx = (Ax, Bx) is the same as Zy = (λ·Ax, Bx). As a result, multiplying by λ has no effect on Bx (for more details, see Appendix A).
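The scalar multiplication property can be sketched in a few lines. The example assumes a positive weight λ and a dictionary representation of discrete fuzzy numbers (support point mapped to membership degree); both the representation and the values are hypothetical.

```python
def scale_z(weight, z):
    """Multiply a discrete Z-number (A, B) by a positive scalar: A is scaled, B is kept."""
    A, B = z
    scaled_A = {weight * x: mu for x, mu in A.items()}
    return scaled_A, B

# Hypothetical Z-number: A in precipitation units, B a reliability on [0, 1].
A_x = {10.0: 0.5, 20.0: 1.0, 30.0: 0.5}
B_x = {0.6: 0.5, 0.7: 1.0, 0.8: 0.5}

A_y, B_y = scale_z(0.5, (A_x, B_x))
print(A_y, B_y)
```

Only the support points of A move; the membership degrees and the whole reliability part stay as they were.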
RESULTS AND DISCUSSION
In this study, a MATLAB code was developed to extract the teleconnection patterns between the SSTs of the surrounding seas (Black, Mediterranean, and Red Seas) and MP events, and a novel method based on Z-number theory is suggested.
Accordingly, the SSTs were categorized into interval sets such as VL, VH, etc. (see Table 2). Also, the degree of reliability was categorized into seven categories (VL, L, low medium (LM), high medium (HM), H, VH, and extremely high (EH)), and the MP time series into two (H or L). Then, after dividing the data into training and testing sets, association rule mining was utilized to find patterns between the SSTs and MP events (training data set) to create Z-rules. Figure 3 shows the codebook (or fuzzy sets), which was created by expert judgment. In this regard, the intervals with rigid limits were transformed into fuzzy sets and used to create if-then rules. Finally, the considered models were run, their performances were investigated, and the findings were compared with each other.
The classes of monthly SSTs
Sea | VL | L | M | H | VH |
---|---|---|---|---|---|
Black Sea temperature (°C) | < 0.46 | [0.46–7.89] | [7.89–15.31] | [15.31–22.73] | > 22.73 |
Mediterranean Sea temperature (°C) | < 13.84 | [13.84–17.86] | [17.86–21.87] | [21.87–25.85] | > 25.85 |
Red Sea temperature (°C) | < 19.03 | [19.03–23.14] | [23.14–27.25] | [27.25–31.36] | > 31.36 |
The codebook for antecedents, consequences and degrees of reliability: (a) the Black Sea, (b) the Mediterranean Sea, (c) the Red Sea, (d) reliability of Z-numbers, (e) Tabriz precipitation and (f) Kermanshah precipitation.
Results of suggested ZBM and conventional fuzzy method
It is worth noting that the association rule technique treats the linguistic terms as intervals, but these interval sets must be transformed to fuzzy sets for fuzzy logic-based modeling. In this regard, +5% and −5% (95% confidence) were applied to the minimum and maximum values of each interval, respectively. Furthermore, trapezoidal membership functions were used for the min and max fuzzy numbers, but triangular membership functions were considered for the other fuzzy numbers (see Figure 3). Figure 3 illustrates a codebook for antecedents, consequents, and degrees of reliability. For example, according to the rigid bounds in Table 2, for the Mediterranean Sea the interval [13.84, 17.86] could be considered as low (L), but after conversion to a fuzzy set, the corresponding triangular fuzzy number in Figure 3 should be considered as L.
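The two membership function shapes named above can be sketched as follows. The breakpoints in the usage lines are hypothetical placeholders, since the actual fuzzy numbers are those of the codebook in Figure 3.

```python
def triangular(x, a, b, c):
    """Membership of x in the triangular fuzzy number (a, b, c), with a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Membership of x in the trapezoidal fuzzy number (a, b, c, d), with a <= b <= c <= d."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical breakpoints (degrees C), for illustration only.
print(triangular(14.5, 13.0, 16.0, 19.0))         # interior class, rising edge
print(trapezoidal(12.0, 10.0, 10.0, 13.0, 14.0))  # shoulder class at the low end
```

Setting a = b (or c = d) in the trapezoid gives the flat shoulder typically used for the lowest and highest classes.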
The teleconnection patterns between the MP(t) and SSTs data were discovered by association mining, and then if-then rules were created. In Z-rules, for the consequent part, the confidence measure of each association rule was determined to specify the degree of rule reliability. In addition, for the antecedent part, the degree of reliability was determined with the probability of occurrence of SST categories. So, the degrees of reliability for SSTs categories of all three seas were determined similarly as:
- I. If the class of SST is VL or VH, the degree of reliability is VL.
- II. If the class of SST is L or H, the degree of reliability is L.
- III. If the class of SST is M, the degree of reliability is HM.
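The three assignments above amount to a lookup table. The sketch below (with hypothetical names) attaches the reliability label to each SST class to form the Z-valued antecedent of a rule.

```python
# Reliability assigned to each SST class, following rules I to III above.
RELIABILITY_BY_CLASS = {"VL": "VL", "VH": "VL", "L": "L", "H": "L", "M": "HM"}

def to_z_antecedent(sst_classes):
    """Turn SST class labels into (class, reliability) pairs."""
    return tuple((c, RELIABILITY_BY_CLASS[c]) for c in sst_classes)

z_inputs = to_z_antecedent(("M", "H", "H"))
print(z_inputs)
```

For the classes (M, H, H) this yields ((M, HM), (H, L), (H, L)), the same form as the antecedents listed in Table 3.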
To discover the most dominant lags between SSTs and MP events, numerous input combinations with varied lagged inputs were explored to calibrate and verify the models. The dominant lags were first sought using cross-correlation functions (CCFs) between the MP(t) and SST(t-i) time series. In addition, seasonal differencing (SST(t) − SST(t-12)) was employed to eliminate trends from the SST series, and various combinations of the de-trended SST lags were utilized in the simulation. However, no significant rule was extracted for these lags by the association mining. Because of the CCF's linear characteristics, there is no guarantee that the lags it identifies are the best options for accounting for non-linear connections. The comparative analysis revealed that modeling SST with no pre-processing (and utilizing the same time delays for all three seas) might lead to better outcomes than modeling with de-trended data. This might be owing to the nonlinear filters used in the ZBM (e.g., membership functions), which eliminate the need for other data pre-processing approaches (e.g., de-trending). Therefore, through a trial-and-error procedure, the best delays were identified to be 1 to 5 months and were utilized in the simulation.
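The two pre-processing tools mentioned in this paragraph, seasonal differencing and lagged cross-correlation, can be sketched with the standard library alone. The series below is synthetic (a 12-month sine wave and a copy delayed by three months), and the function names are hypothetical.

```python
import math
from statistics import mean

def seasonal_difference(series, period=12):
    """Return series[t] - series[t - period] for every t where both exist."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

def lagged_correlation(sst, mp, lag):
    """Pearson correlation between SST(t - lag) and MP(t)."""
    x = sst[:len(sst) - lag]
    y = mp[lag:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic monthly series: the "MP" signal is the "SST" signal delayed by 3 months.
sst = [math.sin(2.0 * math.pi * t / 12.0) for t in range(48)]
mp = [0.0, 0.0, 0.0] + sst[:-3]

best_lag = max(range(1, 6), key=lambda k: abs(lagged_correlation(sst, mp, k)))
detrended = seasonal_difference(sst)
print(best_lag)
```

Because the correlation is a linear measure, a lag that scores poorly here can still matter in a non-linear rule base, which is the caveat raised in the text.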
As a result, to construct Z-rules, the potential of the patterns extracted via association mining was investigated. If the extracted patterns were acceptable in terms of confidence and support criteria (i.e., existing patterns with confidence > 0.6), the required Z-rules for modeling with Z-numbers were constructed. The Z-rules were created using the previously established linguistic variables (see Figure 3). Examples of examined rules are given in Table 3.
Examples for Z if-then rules for Kermanshah precipitation station, lag 3
Rule No. | If Black Sea (t − 3) temp is | and Mediterranean Sea (t − 3) temp is | and Red Sea (t − 3) temp is | Then Kermanshah precipitation (t) is |
---|---|---|---|---|
1 | (M,HM) | (M,HM) | (VL,VL) | (H,EH) |
2 | (L,L) | (M,HM) | (M,HM) | (H,EH) |
3 | (M,HM) | (VH,VL) | (H,L) | (H,EH) |
4 | (VH,VL) | (M,HM) | (H,L) | (H,EH) |
5 | (L,L) | (M,HM) | (L,L) | (H,H) |
6 | (M,HM) | (H,L) | (M,HM) | (H,H) |
7 | (H,L) | (VH,VL) | (M,HM) | (H,H) |
8 | (L,L) | (M,HM) | (VL,VL) | (H,HM) |
9 | (M,HM) | (M,HM) | (L,L) | (H,HM) |
10 | (L,L) | (L,L) | (VL,VL) | (H,HM) |
11 | (VL,VL) | (L,L) | (L,L) | (H,LM) |
12 | (H,L) | (H,L) | (H,L) | (H,LM) |
13 | (VH,VL) | (H,L) | (H,L) | (H,LM) |
14 | (VH,VL) | (VH,VL) | (H,L) | (H,LM) |
15 | (M,HM) | (M,HM) | (M,HM) | (H,LM) |
16 | (VL,VL) | (L,L) | (VL,VL) | (H,LM) |
17 | (M,HM) | (L,L) | (L,L) | (H,L) |
18 | (H,L) | (H,L) | (M,HM) | (H,L) |
19 | (H,L) | (VH,VL) | (H,L) | (H,L) |
20 | (L,L) | (L,L) | (L,L) | (H,L) |
21 | (M,HM) | (L,L) | (M,HM) | (H,VL) |
22 | (M,HM) | (M,HM) | (H,L) | (H,VL) |
23 | (L,L) | (L,L) | (M,HM) | (H,VL) |
24 | (H,L) | (M,HM) | (H,L) | (H,VL) |
25 | (H,L) | (M,HM) | (M,HM) | (H,VL) |
26 | (VL,VL) | (M,HM) | (VL,VL) | (H,VL) |
In this work, a Mamdani fuzzy inference system (FIS) with min implication, max aggregation, and the centroid technique for de-fuzzification was employed (Jayawardena et al. 2014). In general, assuming n inputs with m classes each, the number of rules might be up to m^n (in this example, up to 5^3 = 125) in traditional fuzzy logic approaches. However, owing to the association rule mining, only around 26 rules were evaluated and analyzed for each model in this study (e.g., see Table 3 as an example). In such an instance (lack of rules), the classical reasoning procedures are ineffective in generating an outcome for a sample covered by no rules (Aliev et al. 2016). In this study, inference approaches (weighting the Z-rules) were utilized to conduct approximate reasoning in the absence of matching rules (named the Z-interpolated method).
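For the benchmark model, the Mamdani scheme with min implication, max aggregation and centroid de-fuzzification can be sketched on a discretized output universe. The two-input rule base, the membership breakpoints and all names below are hypothetical and much smaller than the rule bases used in the study.

```python
def make_tri(a, b, c):
    """Return a triangular membership function with breakpoints a < b < c."""
    def mf(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mf

def mamdani(inputs, rules, out_sets, universe):
    """Min implication, max aggregation, centroid de-fuzzification."""
    aggregated = [0.0] * len(universe)
    for antecedents, label in rules:
        # Firing strength: min over the antecedent memberships (AND).
        strength = min(mf(x) for mf, x in zip(antecedents, inputs))
        for k, y in enumerate(universe):
            clipped = min(strength, out_sets[label](y))   # min implication
            aggregated[k] = max(aggregated[k], clipped)   # max aggregation
    total = sum(aggregated)
    if total == 0.0:
        return None  # the sample is covered by no rule
    return sum(y * m for y, m in zip(universe, aggregated)) / total

# Hypothetical two-input rule base and output sets.
low, medium = make_tri(0, 5, 10), make_tri(5, 12, 19)
low2, medium2 = make_tri(5, 10, 15), make_tri(15, 20, 25)
rules = [((medium, medium2), "H"), ((low, low2), "L")]
out_sets = {"L": make_tri(0, 20, 40), "H": make_tri(40, 60, 80)}
universe = list(range(0, 101))

covered = mamdani((12.0, 20.0), rules, out_sets, universe)
uncovered = mamdani((30.0, 30.0), rules, out_sets, universe)
print(covered, uncovered)
```

The second call returns no value because no rule fires, which is the situation that motivates the weighted (interpolative) reasoning described above.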
For verification and comparison purposes, the ZBM output, a Z-number (pair of fuzzy numbers), must be transformed to a single value. Based on its reliability, the output Z-number can be transformed to an interval value. According to the codebook, if the middle value of the reliability part (in this case, a triangular fuzzy number) is greater than 0.49, then the first part of the Z-number is approved. For instance, if the middle value of the output's reliability part is 0.7, Tabriz precipitation will be high (0.7 > 0.49), whereas if it is 0.3, Tabriz precipitation will be low (0.3 < 0.49).
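This decision rule can be sketched as a small function. It assumes that the output is a pair of a class label and a triangular reliability (a, b, c), and that a label whose reliability is not approved is reported as the opposite class, as the second example suggests; the triangular values used below are hypothetical.

```python
def crisp_class(z_output, cutoff=0.49):
    """Approve the label if the middle value of the reliability exceeds the cutoff."""
    label, (_, middle, _) = z_output
    if middle > cutoff:
        return label
    # Assumed convention for the binary case: a rejected label flips to the other class.
    return "L" if label == "H" else "H"

print(crisp_class(("H", (0.6, 0.7, 0.8))))  # approved
print(crisp_class(("H", (0.2, 0.3, 0.4))))  # not approved
```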
Here, the prediction of the future state with the suggested ZBM is illustrated with an example. Assume the calculation of Kermanshah precipitation at time t according to values of SSTs at time t-3 (Black Sea = 14.365 °C, Mediterranean Sea = 24.881 °C, Red Sea = 27.805 °C) observed in October 2015:
- I. Based on the SST categorization and Figure 3, the numerical inputs are transformed to fuzzy numbers. As a result, the numbers (14.365, 24.881, 27.805) are transformed to (M, H, H).
- II. By evaluating the reliability of the fuzzy sets, (M, H, H) is transformed to the Z-numbers ((M,HM), (H,L), (H,L)).
- III. By weighting the rules, the most appropriate rules are chosen (based on Equations (1)–(7)). In this case, three rules are selected.
- IV. Using the approach given in Appendix A, the consequents of the selected rules (as Z-numbers) are aggregated by taking into account their estimated weights. The consequents of the three selected rules are (precipitation = H, reliability = EH), (precipitation = H, reliability = H) and (precipitation = H, reliability = VL), and they are combined into a single output Z-number. According to the codebook (Figure 3), this output, which has very high reliability, indicates that the Kermanshah precipitation in January 2016 will be high.
To fully explain how to apply the suggested ZBM to model MP, an example has been presented in Appendix A.
In this research as a binary type modeling, two types of output could be achieved (i.e., high or low MP). So the results of the modeling could be expressed as:
- I. True High (TH) is the number of high predictions that are correct.
- II. True Low (TL) is the number of low predictions that are correct.
- III. False Low (FL) is the number of low predictions that are incorrect, i.e., the number of high observations predicted as low.
- IV. False High (FH) is the number of high predictions that are incorrect.
As shown in Table 4, these four possible results form a matrix called the confusion matrix (Danandeh Mehr et al. 2017). Its rows represent the observed data, while its columns represent the predicted data. In addition, f indicates the total number of predictions.
Confusion matrix for binary modeling
f | Predicted as High | Predicted as Low |
---|---|---|
Observed High | TH | FL |
Observed Low | FH | TL |
To evaluate the performances of the ZBM and the traditional fuzzy method (as a benchmark model), the total accuracy (TA) and Heidke Skill Score (HSS) were computed and are presented in Table 5. Because they require numerical (crisp) values, traditional assessment measures such as the determination coefficient cannot be employed in binary classification situations (e.g., see Sharghi et al. 2018). The TA (ranging from 0 to 100%) and HSS (ranging from −∞ to 1) may be suitable alternatives in this case (Nourani et al. 2021). HSS = 1 denotes the best model, 0 means no skill, and negative values indicate that the chance forecast is better (see Appendix A for more details about the efficiency criteria used). It is worth noting that the TA criterion may not accurately reflect the model's performance on its own. With the given threshold (T = 35%), the number of H data (only 35% of the total data) is lower than that of L data. Therefore, biased predictions may have an impact on the TA. For example, the TA for low events (the majority group) is expected to be always greater than that for H events (the minority group). As a result, the HSS was used to compare the suggested ZBM to the traditional fuzzy model. To this end, the confusion matrices for both stations and all considered lags were provided (see Table 6, as an example). The most common outcome, as shown in Table 6, is a correctly predicted low event. This indicates the potential of the TA findings to be biased, so the TA criterion should be used in conjunction with the HSS to assess the ability of the models accurately.
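As a worked check of these two criteria, the sketch below computes TA and HSS from the confusion matrix of the Kermanshah training set at lag 1 (Table 6). The exact formulas used in the study are given in Appendix A; the code assumes the common two-class form HSS = 2(TH·TL − FH·FL) / [(TH + FL)(FL + TL) + (TH + FH)(FH + TL)], and the function names are hypothetical.

```python
def total_accuracy(th, fl, fh, tl):
    """Share of correct predictions, in percent."""
    return 100.0 * (th + tl) / (th + fl + fh + tl)

def heidke_skill_score(th, fl, fh, tl):
    """Two-class Heidke Skill Score (assumed standard form)."""
    numerator = 2.0 * (th * tl - fh * fl)
    denominator = (th + fl) * (fl + tl) + (th + fh) * (fh + tl)
    return numerator / denominator

# Kermanshah, lag 1, training set (Table 6): TH = 156, FL = 48, FH = 79, TL = 293.
ta = total_accuracy(156, 48, 79, 293)
hss = heidke_skill_score(156, 48, 79, 293)
print(round(ta, 2), round(hss, 2))  # 77.95 0.53
```

These values agree with the training-set entries for Kermanshah at lag 1 in Table 5.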
The performance results of the ZBM and traditional fuzzy model
Station | Lag No. | ZBM TA train (%) | ZBM TA test (%) | ZBM HSS train | ZBM HSS test | Traditional fuzzy TA train (%) | Traditional fuzzy TA test (%) | Traditional fuzzy HSS train | Traditional fuzzy HSS test |
---|---|---|---|---|---|---|---|---|---|
Kermanshah | 1 | 77.95 | 70.27 | 0.53 | 0.37 | 61.98 | 57.81 | 0.32 | 0.28 |
Kermanshah | 2 | 70.49 | 69.79 | 0.35 | 0.29 | 56.94 | 53.13 | 0.25 | 0.19 |
Kermanshah | 3 | 72.74 | 70.31 | 0.36 | 0.26 | 53.47 | 50.52 | 0.21 | 0.18 |
Kermanshah | 4 | 74.31 | 69.79 | 0.45 | 0.34 | 55.73 | 46.88 | 0.24 | 0.14 |
Kermanshah | 5 | 72.92 | 70.06 | 0.46 | 0.36 | 65.80 | 58.33 | 0.37 | 0.27 |
Tabriz | 1 | 76.14 | 75.25 | 0.44 | 0.42 | 60.59 | 60.49 | 0.30 | 0.29 |
Tabriz | 2 | 74.83 | 72.64 | 0.35 | 0.30 | 65.10 | 56.99 | 0.34 | 0.24 |
Tabriz | 3 | 74.83 | 73.40 | 0.39 | 0.38 | 69.44 | 65.05 | 0.38 | 0.29 |
Tabriz | 4 | 76.53 | 75.37 | 0.37 | 0.35 | 48.26 | 46.24 | 0.13 | 0.12 |
Tabriz | 5 | 69.27 | 68.97 | 0.38 | 0.37 | 55.90 | 43.55 | 0.23 | 0.10 |
Table 6. The confusion matrices of ZBM for the Kermanshah station (train: fᵃ = 576; test: fᵇ = 192)
Lag No. | Observed MP | Train, predicted High | Train, predicted Low | Test, predicted High | Test, predicted Low |
---|---|---|---|---|---|
1 | High | 156 | 48 | 40 | 22 |
1 | Low | 79 | 293 | 33 | 97 |
2 | High | 116 | 89 | 29 | 33 |
2 | Low | 81 | 290 | 25 | 105 |
3 | High | 95 | 110 | 23 | 39 |
3 | Low | 47 | 324 | 18 | 112 |
4 | High | 140 | 64 | 38 | 25 |
4 | Low | 84 | 288 | 33 | 96 |
5 | High | 168 | 35 | 51 | 12 |
5 | Low | 121 | 252 | 50 | 79 |
ᵃ Denotes the number of train samples.
ᵇ Denotes the number of test samples.
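As an illustration of how the two criteria relate to a confusion matrix, the following minimal sketch computes TA and HSS from the four cell counts. It assumes the standard 2×2 form of the Heidke Skill Score; the exact expressions used in this study are given in Appendix A and may be written differently. The function and variable names are illustrative only. The first case uses the lag 1 training counts for Kermanshah from Table 6. The second case is a hypothetical forecast that always predicts L, built from the lag 1 test-step class counts (62 H and 130 L), to show why TA alone can be misleading.

```python
def total_accuracy(hits, misses, false_alarms, correct_negatives):
    """Total accuracy (%) = share of correctly predicted events."""
    n = hits + misses + false_alarms + correct_negatives
    return 100.0 * (hits + correct_negatives) / n


def heidke_skill_score(hits, misses, false_alarms, correct_negatives):
    """Standard 2x2 Heidke Skill Score (assumed form, see lead-in)."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    numerator = 2.0 * (a * d - b * c)
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    return numerator / denominator


# Table 6, Kermanshah, lag 1, training step ("High" treated as the event).
train_lag1 = dict(hits=156, misses=48, false_alarms=79, correct_negatives=293)
ta_train = total_accuracy(**train_lag1)
hss_train = heidke_skill_score(**train_lag1)

# Hypothetical forecast that always predicts "Low" (not a model from the study),
# using the lag 1 test-step class counts: 62 observed High, 130 observed Low.
always_low = dict(hits=0, misses=62, false_alarms=0, correct_negatives=130)
ta_baseline = total_accuracy(**always_low)
hss_baseline = heidke_skill_score(**always_low)

print(f"Lag 1 train: TA = {ta_train:.2f}%, HSS = {hss_train:.2f}")
print(f"Always-Low:  TA = {ta_baseline:.2f}%, HSS = {hss_baseline:.2f}")
```

Under these assumptions, the training counts give the TA and HSS listed for lag 1 in Table 5, while the always-Low forecast reaches a TA of roughly two thirds with an HSS of zero, which is the bias the HSS is meant to expose.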
Comparing the obtained results
According to the HSS criterion (Table 5), at the test step for unseen data, the ZBM outperformed the traditional fuzzy model by an average of 69% for Kermanshah and 112% for Tabriz. This indicates that the ZBM, in addition to providing the reliability of the output, outperforms the traditional fuzzy model even when its output is transformed to a class (H or L). For example, a Z-number output of (H, EH) indicates a high MP event with extremely high reliability for Tabriz (see Figure 3).
As a result, the ZBM can be a credible alternative for prediction from a practical standpoint, especially for unseen data, owing to its aforementioned benefits in dealing with the lack of rules and evaluating the data reliability. At the test phase, the HSS had maximum values of 0.42 for Tabriz and 0.37 for Kermanshah, both at lag 1. The TA values for these lags, as the best ZBMs, were 75% for Tabriz and 70% for Kermanshah. Thus, the ZBM described H and L occurrences of MP by simultaneously using SSTs from the Black, Mediterranean, and Red Seas, with over 70% confidence.
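The text does not state how the average improvement was formed. One plausible reading, sketched below as an assumption, is the mean over lags 1 to 5 of the relative gain in test-step HSS, (HSS_ZBM - HSS_fuzzy) / HSS_fuzzy, using the rounded values printed in Table 5. The function name `mean_relative_improvement` is illustrative and does not come from the study.

```python
# Test-step HSS values copied from Table 5 (lags 1 to 5).
hss_test = {
    "Kermanshah": {
        "zbm": [0.37, 0.29, 0.26, 0.34, 0.36],
        "fuzzy": [0.28, 0.19, 0.18, 0.14, 0.27],
    },
    "Tabriz": {
        "zbm": [0.42, 0.30, 0.38, 0.35, 0.37],
        "fuzzy": [0.29, 0.24, 0.29, 0.12, 0.10],
    },
}


def mean_relative_improvement(zbm, fuzzy):
    """Mean over lags of the relative HSS gain of the ZBM, in percent."""
    gains = [(z - f) / f for z, f in zip(zbm, fuzzy)]
    return 100.0 * sum(gains) / len(gains)


improvement = {
    station: mean_relative_improvement(scores["zbm"], scores["fuzzy"])
    for station, scores in hss_test.items()
}

for station, value in improvement.items():
    print(f"{station}: {value:.1f}%")
```

With the rounded Table 5 values, this reading gives about 112% for Tabriz, in line with the reported figure. For Kermanshah it gives about 61%, somewhat below the reported 69%, so the reported average may rest on unrounded HSS values or on a different averaging scheme.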
The most effective lag of the SST time series for the ZBM may be found by comparing the HSS values obtained using SSTs with distinct lags. Table 5 indicates that, for both stations, the optimum ZBM was obtained at lag 1. This might be because the overall distances between the stations and the adjacent seas are similar (about 5200 km). The findings therefore not only show the dynamic teleconnections between MP events and SSTs of adjacent seas but also confirm the validity of the chosen lags (1–5).
SST is a very important variable in the earth's climate system. Being at the interface of the ocean and the atmosphere, SST is critical to both, and to the exchanges of heat, moisture, momentum, and gases between the two (O'Carroll et al. 2019). According to the rules derived for lag 1 with reliability greater than HM (i.e., H, VH, and EH), high MP events at both stations occurred with combinations of the M, L, and VL categories of SSTs, and no H or VH category of SSTs appeared in the derived rules. This confirms previous results (e.g., Ghasemi & Khalili 2008) that wet conditions in Iran are often accompanied by a negative SST anomaly in the Mediterranean and Black Seas. For Tabriz station, the frequencies of high MP events (with T = 35%) were 32% for winter, 40% for spring, 2.5% for summer and 25.5% for autumn. For Kermanshah station, in contrast, the frequencies of high MP events were 49% for winter, 17.5% for spring, 0% for summer and 33.5% for autumn. In summer, when H and VH SSTs occur, high MP events were infrequent at both stations, and no rule with an H or VH SST category was derived. Consequently, for both stations, high MP events occurred with a combination of the M, L and VL SST classes in the antecedents. It is notable that, under the binary assumption for precipitation in this study (L or H), a rule consequence of precipitation = ‘H’ with reliability = ‘VL’ means precipitation = ‘L’ with reliability = ‘EH’. Furthermore, because high MP events (the MP values higher than 90th percentile) were infrequent compared to L events, the rules with high reliability for L precipitation events were far more numerous than those for high MP events.
In terms of geographical analysis, the ZBM performed better for Tabriz than for Kermanshah at all examined lags (see Table 5). Higher values of the coefficient of variation and standard deviation for Kermanshah compared to Tabriz indicate that the Kermanshah data are more irregular (see Table 1). This resulted in better modeling performance for Tabriz. Also, as shown in Table 5, the increase in modeling efficiency of the ZBM over the traditional fuzzy model was larger for Tabriz (an average of 112%) than for Kermanshah (an average of 69%). This might be related to the linear correlation between SSTs and MP, which is greater in Kermanshah (∼ 0.6) than in Tabriz (∼ 0.4). The ZBM improves model performance by using strong nonlinear filters, and this gain was higher for Tabriz, where the linear correlation is weaker, than for Kermanshah.
The Zagros and Alborz Mountains, the two large mountain chains of Iran, are situated in the northwest, west, and north of the country. Precipitation fluctuations in most regions of Iran are not significant, and such fluctuations can be found only in western Iran (Raziei et al. 2009). In addition, only the west of Iran is affected by the SSTs of the Red, Mediterranean and Black Seas. For this reason, the data of Tabriz station (in northwestern Iran) and Kermanshah station (close to the center of western Iran), which are of appropriate quality and quantity, were used in this study as representative of western Iran. Previous studies (e.g., Nazemosadat et al. 2006; Raziei et al. 2009; Dezfuli et al. 2010; Hosseinzadeh Talaee et al. 2014) have mostly focused on the correlation of the El Niño-Southern Oscillation (ENSO) and NAO with Iran's regional climate. However, Iran is surrounded by seas, and using their SSTs in modeling might enhance the forecasts. The results of this study confirmed this hypothesis and indicated that SSTs, even without any other indices, are appropriate predictors of the future states of MP events in western Iran. This may be because approximately 70% of Iran's precipitation originates in either the Black Sea or the Mediterranean Sea. The other 30% originates in North Africa and the Red Sea and reaches Iran via Saudi Arabia and the Persian Gulf (Kendrew 1922; Ghasemi & Khalili 2008).
This research used binary modeling, in which two output classes were considered (i.e., high or low MP). However, the output could be classified into more classes if needed. In this study, the MP events of only two stations were investigated. To further explore the ability of the ZBM for teleconnection modeling between MP events and SSTs, it is recommended to apply it to multiple precipitation series with diverse characteristics over the whole country. In addition, this study applied only SSTs as predictors, but other indices such as NAO and ENSO could be used for teleconnection modeling of hydro-climatic events.
CONCLUSIONS
Due to the uncertainties of hydro-climatic systems, fuzzy logic has been increasingly utilized to describe the ambiguity of such systems. The classic fuzzy logic methods do not consider the reliability of the information. However, by using the Z-number in addition to the constraint of information, it is possible to characterize the degree of reliability of data. Predicting MP events (H or L), as a complex natural process, is associated with high uncertainty. It seems necessary to develop models that can handle this uncertainty, especially in regions such as Iran, which is categorized as arid to semi-arid. In this regard, by developing the ZBM, monthly SSTs of the Black, Mediterranean, and Red Seas were used to predict the classified MP of two stations in western Iran. The outcomes were compared to those of the classic fuzzy approach using the TA and HSS criteria.
According to the obtained results, teleconnection parameters such as the SSTs of surrounding seas at different lags could be applied as predictors of MP events. The results indicated that, even for test (unseen) data, by evaluating the data reliability and assigning weights to the if-then rules, the ZBM improved the results compared to the conventional fuzzy model by 69% for Kermanshah and 112% for Tabriz. In addition, the performance of the ZBM was better for Tabriz than for Kermanshah because of the distinct precipitation patterns over these regions. Therefore, the ZBM can be an effective prediction tool, especially for unseen test data when matching rules are incomplete or missing, because it uses inference techniques for approximate reasoning.
Consequently, the Z-number concept, by assessing event reliability, can be used in various sectors of water resources management, where it can improve the modeling efficiency.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories (https://psl.noaa.gov/cgi-bin/data/timeseries/timeseries1.pl; for monthly precipitation data: https://www.irimo.ir/).