Missing streamflow data is a common issue in Peninsular Malaysia, as the technologies used in hydrological studies often fail to collect data accurately. Additionally, conventional methods, which are less accurate than artificial intelligence (AI) methods in estimating missing streamflow data, are still widely used in the region. Therefore, this study aims to estimate the missing streamflow data from 11 stations in Peninsular Malaysia by using different AI methods and to determine the most appropriate method. Four homogeneity tests were applied to check the quality of the data, and the results indicated that the streamflow data at most stations were homogeneous. Two AI methods were applied in this study: artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). The proposed AI methods were compared with five different conventional methods. The missing streamflow data, constituting 30% of the data from each year, were estimated on a daily time scale and evaluated using root mean square error, mean absolute error and correlation coefficient values. The results indicated that ANFIS performed best owing to its learning abilities and its fuzzy inference system, which enable it to handle complicated input–output patterns and provide highly accurate estimation results.

  • Various estimation methods were compared and evaluated.

  • Homogeneity tests were applied to assess the quality of historical streamflow data from 11 stations.

  • The estimation results were rigorously evaluated using three evaluation methods: root mean square error, mean absolute error and correlation coefficient.

  • Adaptive neuro-fuzzy inference system emerged as the best method for estimating missing streamflow data.

Complete streamflow data are vital to hydrological applications, including weather forecasting, water resources management and flood and drought prediction. However, missing streamflow data are frequently encountered during data collection because the gauging devices used in hydrological studies rely heavily on physical systems and sensors (Li et al. 2020). Data collected by physical sensor systems are often partial and inaccurate over extended collection periods owing to exposure to various hazards, including battery depletion, physical damage and extreme environmental conditions (Hamzah et al. 2020). In addition, the monsoon period in Malaysia can significantly affect streamflow discharge, leading to potential inaccuracies in streamflow measurement: the high velocity and turbulence of the streamflow during this period make measurement more challenging (Higgins et al. 2022). Furthermore, conventional methods such as the inverse distance weighting (IDW) method, arithmetic average (AM) method and normal ratio (NR) method are still applied in Peninsular Malaysia. These conventional methods are less accurate in estimating missing streamflow data than artificial intelligence (AI) methods (Ismail et al. 2017). Conventional methods are suitable for stable environments without floods and heavy rainfall. Although they are commonly used for estimating missing data, advanced methods are recommended where the proportion of missing data is significant or the patterns are complex (Gao et al. 2018). Incomplete streamflow data heavily affect the analysis, prediction and forecasting activities conducted by hydrologists and engineers.

Several studies have applied conventional methods, including AM, NR, IDW and coefficient of correlation (CR), to estimate missing streamflow data. Ismail et al. (2017) applied AM, CR, IDW and NR to estimate missing streamflow and rainfall data in Terengganu by assuming that 5, 10, 15 and 20% of the datasets were missing. The results showed that NR outperformed the other methods for missing streamflow data estimation at most of the stations. Meanwhile, Yilmaz & Bihrat (2019) evaluated the performance of regression analysis (REG), standardization with mean (SM), single donor station-based drainage area ratio (DAR), standardization with mean and standard deviation (SMS), multiple donor stations-based drainage area ratio (MDAR) and IDW for missing streamflow estimation at stations located in the Porsuk River Basin. The results showed that IDW was the best method, while DAR was the poorest-performing method at most of the stations. Kamwaga et al. (2018) applied linear regression, multi-donor linear regression, the rainfall–runoff relationship using the double mass curve method, flow duration matching, drainage area ratio and rainfall–runoff modelling using Hydrologiska Byråns Vattenbalansavdelning (HBV)-light to estimate missing streamflow data in the Little Ruaha catchment in Tanzania. The results showed that the multiple linear regression method was the best among all the methods, while rainfall–runoff modelling using HBV-light had the poorest performance. Al-Taiee (2008) applied trend analysis to the time series of river stage data at the Mosul station and concluded that Winters' multiplicative model is suitable for forecasting missing streamflow data for the Tigris River.

Studies evaluating the performance of AI methods for estimating missing streamflow data have also been conducted in Malaysia. Gao et al. (2018) employed the autoregressive integrated moving average (ARIMA) and autoregressive conditional heteroscedasticity (ARCH) models to fill gaps in streamflow data and compared their performance with conventional methods including listwise and pairwise deletion, single imputation, arithmetic mean and median imputation, regression-based imputation, principal component analysis and multiple imputation. The results showed that ARIMA and ARCH were the more suitable methods for filling the missing data. Norazizi & Deni (2019) applied the multivariate imputation by chained equations method, bootstrapping with the expectation maximization algorithm and artificial neural network (ANN) to estimate the missing data at eight meteorological stations in Kuantan. The results showed that ANN performed best among these methods.

Many studies have compared the performance of conventional and AI methods. Mesta et al. (2020) applied the Takagi–Sugeno fuzzy rule-based (FRB) model to estimate the missing daily streamflow data of the Meric–Ergene River in Turkey. The performance of FRB was evaluated by comparing it with the physically based Hydrologic Engineering Centre-Hydrologic Modelling System (HEC-HMS). The results showed that the data-based FRB model performed better than HEC-HMS. Saplioglu & Kucukerdem (2018) evaluated the performance of the adaptive neuro-fuzzy inference system (ANFIS), multiple regression and NR in estimating missing streamflow data at the Yeşilırmak River, Turkey. The results indicated that ANFIS was the best method, with the lowest mean squared error in most of the data sets. Hamzah et al. (2020) reviewed the performance, advantages and disadvantages of different conventional and AI methods. Their review indicated that conventional methods such as the deletion method have several drawbacks: removing variables that contain missing values causes the loss of valuable data, reduces the size of the data sample and leads to inaccurate results. The review also showed that AI methods based on machine learning techniques delivered better performance in missing streamflow data estimation than conventional methods. Belotti et al. (2021) investigated the effectiveness of linear autoregressive (AR) models, periodic autoregressive (PAR) models and extreme learning machines (ELM) in streamflow forecasting. The accuracy of each model was analysed using mean square error (MSE), mean absolute error (MAE), correlation coefficient (CC) and Nash–Sutcliffe efficiency. The Friedman test was applied to verify whether the differences in the errors of the forecast models were significant, and ELM was found to perform better than AR and PAR.

In general, missing streamflow data is an issue that needs to be treated seriously, as streamflow plays an important role for people and for many hydrological applications (Heddam & Kisi 2021). In engineering, complete streamflow data provide engineers with sufficient information to design infrastructure, including reservoirs, drainage systems, bridges and roads. Complete data help the engineer to provide a suitable design and to avoid over- or under-design, which can result in excessive costs or the risk of structural collapse. In addition, a complete set of data enables hydrologists to conduct hydrological analyses, including streamflow, flood and drought prediction, more accurately. More accurate analyses enable the related authorities to make sound decisions on water resources management. This results in a stable water supply and better planning for natural disasters such as floods and drought (Fentaw et al. 2019), which eventually benefits people and industries that rely strongly on water supply, such as agriculture and meat production. Although previous studies have compared various methods for estimating missing data, the performance of these methods may differ between regions owing to their distinct geographical and meteorological characteristics. Hence, this study aims to investigate the performance of different AI methods in estimating missing streamflow data and how they perform compared to conventional methods in Peninsular Malaysia. At the end of this study, the most appropriate method for estimating missing streamflow data will be determined.

Data collection and study area

The study area is Peninsular Malaysia, which is also known as West Malaysia and is located at latitude 3.9743° N, longitude 102.4381° E. Its climatic conditions are highly affected by the Southwest monsoon from May to September and the Northeast monsoon from November to March. Peninsular Malaysia experiences high humidity, heavy rainfall and high temperatures throughout the year: the rainfall amount is between 2,000 and 2,500 mm per year, and the average daily temperature is around 21 to 32°C (Kerishnan et al. 2020; Ahmed et al. 2021). The tropical climatic conditions in Peninsular Malaysia lead to rapid variations in streamflow, making it difficult to maintain consistent and systematic data collection. Also, the Northeast monsoon usually brings heavy rainfall, leading to severe disruptions to data collection infrastructure (Ng et al. 2024). The sudden spikes in streamflow can lead to frequent flooding, causing inconvenience to the public. In contrast, during the Southwest monsoon, Peninsular Malaysia experiences a drier atmosphere and reduced streamflow owing to the decrease in rainfall. Low streamflow usually leads to dry periods and drought, which negatively affect agricultural production. The streamflow data used in this study were provided by the Department of Irrigation and Drainage (DID). Twenty years of data ranging from 1970 to 2015 from 11 stations were collected, and the location of each station is plotted on the map of Peninsular Malaysia shown in Figure 1. The detailed information on each station is listed in Table 1.
Table 1

Information on each streamflow station

| Station code | Station name | Study period | Duration | Latitude | Longitude |
|---|---|---|---|---|---|
| 1,737,451 | Sungai Johor at Rantau Panjang | 1972–1992 | 20 years | 01° 46′ 50″ N | 103° 44′ 45″ E |
| 5,606,410 | Sungai Muda at Jambatan Syed Omar | 1974–1994 | 20 years | 05° 36′ 35″ N | 100° 37′ 35″ E |
| 5,721,442 | Sungai Kelantan at Jambatan Guillemard | 1973–1993 | 20 years | 05° 45′ 45″ N | 102° 09′ 00″ E |
| 2,224,432 | Sungai Kesang at Chin Chin | 1960–1980 | 20 years | 02° 17′ 25″ N | 102° 29′ 35″ E |
| 2,723,401 | Sungai Kepis at Jambatan Kayu Lama | 1979–1999 | 20 years | 02° 42′ 20″ N | 102° 21′ 20″ E |
| 3,519,426 | Sungai Bentong at Kuala Marong | 1970–1990 | 20 years | 03° 30′ 45″ N | 101° 54′ 55″ E |
| 6,503,401 | Sungai Arau at Ladang Tebu Felda | 1984–2004 | 20 years | 06° 30′ 10″ N | 100° 21′ 05″ E |
| 3,116,430 | Sungai Klang at Jambatan Sulaiman | 1995–2015 | 20 years | 03° 08′ 20″ N | 101° 41′ 50″ E |
| 4,930,401 | Sungai Berang at Menerong | 1998–2018 | 20 years | 04° 56′ 20″ N | 103° 03′ 45″ E |
| 3,813,411 | Sungai Bernam at Jambatan Skc | 1984–2004 | 20 years | 03° 48′ 27″ N | 101° 21′ 70″ E |
| 4,907,422 | Sungai Kurau at Bt. 14 Jalan Taiping | 1975–1995 | 20 years | 04° 58′ 40″ N | 100° 46′ 50″ E |
Figure 1

Location of the streamflow stations in each state.


Conventional methods in missing streamflow data estimation

Arithmetic average method

AM is the simplest method and is commonly applied to fill gaps in the data in hydrological studies. AM is suitable when the data from each individual gauge do not vary greatly from the average value and the gauges are evenly distributed over the area. With AM, the missing data are estimated from the data obtained at the nearest stations.

The estimated value of the missing data, $V_0$, can be defined as:
$$V_0 = \frac{1}{N}\sum_{i=1}^{N} V_i \tag{1}$$
where $N$ is the total number of closest stations and $V_i$ is the value of the same parameter at the $i$th closest station (Sattari et al. 2016).

Inverse distance weighting method

IDW is widely preferred by researchers for missing data estimation. The estimation is performed by assigning weights to neighbouring stations based on their distance from the target station and interpolating the missing data values. The estimated value of the missing data, $V_0$, can be defined as:
$$V_0 = \frac{\sum_{i=1}^{N} V_i / D_i}{\sum_{i=1}^{N} 1 / D_i} \tag{2}$$
where $D_i$ is the distance between the station with missing data and the $i$th closest station (Sattari et al. 2016).
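
As a minimal illustration of the AM and IDW estimates described above, the following Python sketch computes both for a single day. The station flows, the distances and the `power` parameter are hypothetical; the exponent on the distance is an assumption of this sketch (1 here, although 2 is also common in practice).

```python
def arithmetic_average(values):
    """AM estimate: plain mean of the neighbouring-station values."""
    return sum(values) / len(values)


def inverse_distance_weighting(values, distances, power=1.0):
    """IDW estimate: each neighbour is weighted by 1 / distance**power.

    power=1.0 is an assumption of this sketch, not a value taken from the study.
    """
    if len(values) != len(distances):
        raise ValueError("values and distances must have the same length")
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical flows (m^3/s) at three neighbouring stations and their
# distances (km) from the target station.
flows = [12.0, 18.0, 30.0]
dist_km = [10.0, 20.0, 40.0]

am_estimate = arithmetic_average(flows)
idw_estimate = inverse_distance_weighting(flows, dist_km)
```

With equal distances the IDW estimate reduces to the AM estimate, which is a quick sanity check for any implementation.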

Normal ratio method

The NR is one of the frequently applied methods for missing data estimation. It is applied when one or more of the streamflow stations data exceed the targeted streamflow data by 10% or more. The average ratio of the available data between the targeted station and the ith surrounding station is used to weight in the NR method. The ratios of entire streamflow data at the targeted and surrounding stations are applied to determine the concurrent streamflow data at neighbouring stations.

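The estimated value of the missing data, $V_0$, can be expressed as:
$$V_0 = \frac{\sum_{i=1}^{N} W_i V_i}{\sum_{i=1}^{N} W_i} \tag{3}$$
$$W_i = R_i^2 \left( \frac{N_i - 2}{1 - R_i^2} \right) \tag{4}$$
where $W_i$ is the weight of the $i$th neighbouring station, $R_i$ is the CC between the targeted station and the $i$th neighbouring station, and $N_i$ is the number of points applied to derive the CC (Sattari et al. 2016; Hamzah et al. 2020).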
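
The sketch below illustrates one common correlation-weighted form of the NR method, in which each neighbouring station receives the weight R² (N − 2) / (1 − R²). This weighting is an assumption of the sketch, and the input values are hypothetical.

```python
def normal_ratio(values, correlations, n_points):
    """Correlation-weighted NR estimate.

    values       : flows at the neighbouring stations
    correlations : CC between the target and each neighbouring station
    n_points     : number of paired points used to derive each CC
    """
    weights = []
    for r, n in zip(correlations, n_points):
        if abs(r) >= 1.0:
            raise ValueError("correlation must lie strictly between -1 and 1")
        weights.append(r * r * (n - 2) / (1.0 - r * r))
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical example: the better-correlated station dominates the estimate.
nr_estimate = normal_ratio([10.0, 20.0], [0.9, 0.5], [102, 102])
```

Because the weight grows sharply as the correlation approaches 1, the estimate stays close to the value at the best-correlated neighbour.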

Coefficient of correlation

CR is another method applied to estimate missing data values. It has the benefit of being simple to implement and understand. The CC measures the level of correlation between two variables. The missing value can be determined by replacing the distance-based weight between the targeted and nearby stations with the CC, and it can be expressed as:
$$V_0 = \frac{\sum_{i=1}^{N} R_{it} V_i}{\sum_{i=1}^{N} R_{it}} \tag{5}$$
where $R_{it}$ is the CC of the daily time series data between the targeted station and the $i$th surrounding station (Hamzah et al. 2020).
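
A minimal sketch of correlation-based weighting is given below. It assumes that each neighbouring station is weighted directly by its CC with the target station; the values used are hypothetical.

```python
def correlation_weighting(values, correlations):
    """CR estimate: neighbours weighted by their CC with the target station."""
    total = sum(correlations)
    if total == 0:
        raise ValueError("sum of correlation weights must be non-zero")
    return sum(r * v for r, v in zip(correlations, values)) / total


# Hypothetical flows at two neighbours with CCs of 0.9 and 0.3.
cr_estimate = correlation_weighting([10.0, 20.0], [0.9, 0.3])
```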

Linear interpolation

Linear interpolation (LI) is one of the interpolation methods used to estimate missing data. It is easy to apply: two known data points are connected by a straight line, and the missing value is read from that line. It can be defined as:
$$V_0 = V_a + (V_b - V_a)\,\frac{x_0 - x_a}{x_b - x_a} \tag{6}$$
where $x_0$ is the value of the independent variable at which the estimate is required, $x_a$ and $x_b$ are known values of the independent variable, $V_a$ and $V_b$ are the corresponding known values of the dependent variable, and $V_0$ is the value of the dependent variable at $x_0$ (Noor et al. 2014; Hamzah et al. 2020).
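
The following sketch shows how LI could be used to fill gaps in a daily series. The helper `fill_gaps` is hypothetical; it treats the day index as the independent variable and leaves leading or trailing gaps untouched, since they have no known value on one side.

```python
def linear_interpolate(x0, xa, va, xb, vb):
    """Value at x0 on the straight line through (xa, va) and (xb, vb)."""
    return va + (vb - va) * (x0 - xa) / (xb - xa)


def fill_gaps(series):
    """Fill interior None gaps using the nearest known value on each side."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            filled[i] = linear_interpolate(i, a, series[a], b, series[b])
    return filled


# Hypothetical daily flows with a three-day gap.
filled_series = fill_gaps([4.0, None, None, None, 8.0])
```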

Auto regressive integrated moving average

The ARIMA is an analytical model commonly applied to process the input time series data to estimate the missing data. ARIMA is the combination of the autoregressive model (AR) and the moving average (MA) model. The AR in ARIMA indicates that the time series is regressed on its own historical data. The MA in ARIMA shows that the prediction error is a linear mixture of previous individual errors. The integrated (I) component of ARIMA indicates that the data values are replaced with the difference between the previous values and current data values in order to create stable data. The missing value Vt can be defined as:
(7)
where Yt is the streamflow data at time t, p, d and q are the orders of the autoregressive (AR), integrated (I) and moving average (MA) terms, respectively, and P, D and Q are the seasonal orders of the AR, I and MA terms, respectively. C is a constant term and εt is the error term. The subscript s indicates the number of time periods per season (Kotu & Deshpande 2019; Kermorvant et al. 2021).

Artificial intelligence methods in missing streamflow data estimation

Artificial neural network

ANN is a technique that processes data from one or more inputs through multiple layers to generate an output. ANN offers several benefits: it can be trained to learn in a broad range of situations, including non-linear and complicated relationships in the data set. The missing value Vi+1 can be defined as:
(8)
where αj and βij are parameters of the model, i1 is the number of neurons of the input layer, r is the number of hidden neurons, g is the hidden-layer transfer function from the hidden layer and ε represents the error or residual term (Lepot et al. 2017; Souza et al. 2020).
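
To make the roles of the parameters concrete, the sketch below evaluates a single-hidden-layer feed-forward network for one set of inputs. It assumes a tanh transfer function and explicit bias terms (`alpha0`, `beta0`), which are choices of this sketch; the weights are hypothetical and no training step is shown.

```python
import math


def ann_predict(inputs, alpha0, alphas, beta0, betas, g=math.tanh):
    """Forward pass of a single-hidden-layer network.

    inputs : values fed to the input layer
    alpha0 : output bias (assumed)
    alphas : hidden-to-output weights, one per hidden neuron
    beta0  : hidden biases (assumed), one per hidden neuron
    betas  : input-to-hidden weights, betas[j][i] links input i to neuron j
    g      : hidden-layer transfer function (tanh assumed here)
    """
    hidden = [
        g(b0 + sum(b * v for b, v in zip(row, inputs)))
        for b0, row in zip(beta0, betas)
    ]
    return alpha0 + sum(a * h for a, h in zip(alphas, hidden))


# Hypothetical weights: two inputs, two hidden neurons.
prediction = ann_predict(
    inputs=[1.0, 2.0],
    alpha0=0.5,
    alphas=[1.0, -1.0],
    beta0=[0.0, 0.0],
    betas=[[0.1, 0.2], [0.1, 0.2]],
)
```

In this example the two hidden neurons produce identical activations with opposite output weights, so the prediction equals the output bias.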

Adaptive neuro-fuzzy inference systems

ANFIS is a type of adaptive network that is functionally equivalent to a fuzzy inference system and similar to the ANN model. The ANFIS structure combines the learning capabilities of ANN with fuzzy logic inference. ANFIS identifies all feasible rules or enables the missing values to be formed using input and output values. There are five layers in ANFIS, as shown in Figure 2, and each layer plays a different role in the system. Each layer contains numerous nodes, all functioning in a similar way. The functions of each layer are listed below (Nayak et al. 2004; Saplioglu & Kucukerdem 2018).
Figure 2

Overall structure of ANFIS (Saplioglu & Kucukerdem 2018).

Layer 1
Every node in this layer runs the same equation to compute the membership grades of an input variable; it can be expressed as:
$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \tag{9}$$
$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \tag{10}$$
where $O_{1,i}$ is the node's output, $x$ and $y$ are the inputs to the node, $A_i$ and $B_{i-2}$ are the fuzzy sets associated with this node, $\mu$ represents the degree of membership of each input variable and $i$ is the index that refers to a specific rule in the rule base.
Layer 2
Every node in this layer multiplies the incoming signals and generates an output that represents the firing strength of a rule; it can be defined as:
$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{11}$$
Layer 3
The $i$th node of this layer is labelled as N. The nodes in this layer generate the normalized firing strengths, which can be expressed as:
$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{12}$$
Layer 4
In this layer, node $i$ generates the contribution of the $i$th rule to the model output, and it can be defined as:
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right) \tag{13}$$

In Equation (13), $\bar{w}_i$ is the output obtained from layer 3, and $p_i$, $q_i$ and $r_i$ are the parameter set.

Layer 5
The only node in this layer generates the final output of ANFIS, which is the missing value, $V$, and it can be expressed as:
$$O_{5} = V = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i} \tag{14}$$
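
The five layers can be traced in a short forward-pass sketch. It assumes a two-input, two-rule first-order Sugeno system with Gaussian membership functions; the membership shape and all parameter values are hypothetical, and the training of the premise and consequent parameters is not shown.

```python
import math


def gaussian_mf(value, centre, sigma):
    """Gaussian membership grade (the membership shape is an assumption)."""
    return math.exp(-0.5 * ((value - centre) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass through the five ANFIS layers.

    premise    : per rule, ((centre_x, sigma_x), (centre_y, sigma_y))
    consequent : per rule, (p, q, r) of the linear output p*x + q*y + r
    """
    # Layer 1: membership grades of each input for each rule.
    mu_a = [gaussian_mf(x, cx, sx) for (cx, sx), _ in premise]
    mu_b = [gaussian_mf(y, cy, sy) for _, (cy, sy) in premise]
    # Layer 2: firing strength of each rule (product of the grades).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: contribution of each rule to the output.
    contributions = [
        wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)
    ]
    # Layer 5: overall output is the sum of the rule contributions.
    return sum(contributions), w_bar


# Hypothetical parameters: the input (1, 1) lies midway between both rules.
premise = [((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 5.0)]
output, normalized = anfis_forward(1.0, 1.0, premise, consequent)
```

Because the example input is equidistant from both rule centres, each rule receives a normalized strength of 0.5 and the output is the average of the two rule outputs.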

Performance evaluation of different estimation methods

To effectively evaluate the performance of all estimation methods, the daily streamflow value computed from each method is compared to the actual daily streamflow data. Using estimated values, which constitute 30% of the data from each year, a comparison was made with the observed values. The longest periods without records are different across different streamflow stations, with some stations experiencing gaps of up to 7 days. Three performance evaluation methods were applied to the comparative analysis, including MAE, root mean square error (RMSE) and CC statistics, in order to evaluate each missing data estimation method. These evaluation methods provided us with the error of each estimation method, based on comparing the estimation result and the relevant observed value (Ismail et al. 2017).

Root mean square error

RMSE is one of the commonly used methods for measuring the difference between the estimated and observed values. The lower the RMSE value, the smaller the difference between the estimated and observed values. The RMSE can be defined as (Ismail et al. 2017):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \tag{15}$$
where $x_i$ is the observed streamflow data from the neighbouring station, $\hat{x}_i$ is the estimated streamflow data and $n$ is the number of data points.

Mean absolute error

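MAE is a method applied to calculate the average absolute difference between the estimated and observed values. The lower the MAE, the higher the accuracy of the estimation method. The MAE can be expressed as (Ismail et al. 2017):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right| \tag{16}$$
where $x_i$, $\hat{x}_i$ and $n$ are as defined for the RMSE.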

Correlation coefficient

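CC is a method that determines the level of correlation between the estimated and observed data. A higher positive CC value indicates a stronger correlation between the estimated and observed data and therefore a better result. CC can be defined as:
$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(\hat{x}_i - \bar{\hat{x}}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \sum_{i=1}^{n}\left(\hat{x}_i - \bar{\hat{x}}\right)^2}} \tag{17}$$
where $x_i$ is the observed streamflow data from the neighbouring station, $\hat{x}_i$ is the estimated streamflow data, $\bar{x}$ and $\bar{\hat{x}}$ are the mean observed and mean estimated streamflow values, respectively, and $n$ is the number of data points.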
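
The three evaluation measures can be computed as in the sketch below. It assumes that CC is the Pearson correlation coefficient; the observed and estimated values are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((x - e) ** 2 for x, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error between observed and estimated values."""
    n = len(observed)
    return sum(abs(x - e) for x, e in zip(observed, estimated)) / n


def correlation_coefficient(observed, estimated):
    """Pearson correlation coefficient (assumed form of CC)."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((x - mean_x) * (e - mean_e) for x, e in zip(observed, estimated))
    var_x = sum((x - mean_x) ** 2 for x in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_x == 0 or var_e == 0:
        raise ValueError("CC is undefined for a constant series")
    return cov / math.sqrt(var_x * var_e)


# Hypothetical observed and estimated daily flows.
obs = [1.0, 2.0, 3.0, 4.0]
est = [1.5, 2.5, 2.5, 4.5]
scores = (rmse(obs, est), mae(obs, est), correlation_coefficient(obs, est))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how the errors are distributed.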

Homogeneity test

Buishand range test

The Buishand range test (BRT) tests the null hypothesis that the data are independent and identically distributed. This test is derived from the cumulative deviations, or adjusted partial sums, from the mean. The adjusted partial sum is given by (Singh & Choudhary 2023):
$$S_0^{*} = 0, \qquad S_k^{*} = \sum_{i=1}^{k}\left(Y_i - \bar{Y}\right), \quad k = 1, 2, \ldots, n \tag{18}$$
where $\bar{Y}$ is the sample mean, $Y_i$ is the $i$th observation in the dataset and $n$ is the number of data points in the dataset. The rescaled adjusted partial sums ($S_k^{**}$) can be expressed as:
$$S_k^{**} = \frac{S_k^{*}}{D_x}, \quad k = 0, 1, \ldots, n \tag{19}$$
where $D_x$ represents the sample standard deviation, and it can be expressed as:
$$D_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} \tag{20}$$
The Q statistic used to determine the sensitivity to departures from homogeneity can be expressed as:
$$Q = \max_{0 \le k \le n}\left|S_k^{**}\right| \tag{21}$$

Higher Q statistic values imply that the data in the time series is not homogeneous. To assess the significance of the Q statistic, a p-value can be calculated by comparing the observed Q value to the distribution of Q values under the null hypothesis. The p-value represents the probability of observing a Q statistic as extreme as or more extreme than the observed value if the null hypothesis of homogeneity is true. If the p-value is below a predetermined significance level, the null hypothesis of homogeneity is rejected, indicating non-homogeneity.
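
A minimal sketch of the Q statistic is shown below, assuming the standard deviation is computed with divisor n. The two test series are hypothetical, and the p-value computation is not included.

```python
import math


def buishand_q(series):
    """Q statistic: largest absolute rescaled adjusted partial sum."""
    n = len(series)
    mean = sum(series) / n
    d_x = math.sqrt(sum((y - mean) ** 2 for y in series) / n)
    if d_x == 0:
        raise ValueError("Q is undefined for a constant series")
    partial = 0.0
    q = 0.0
    for y in series:
        partial += y - mean  # adjusted partial sum up to this observation
        q = max(q, abs(partial) / d_x)
    return q


# Hypothetical series: one with a clear shift in the mean, one without.
q_shift = buishand_q([1.0] * 5 + [3.0] * 5)
q_no_shift = buishand_q([1.0, 3.0] * 5)
```

The shifted series gives a much larger Q than the alternating series, even though both have the same mean and variance.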

Von Neumann ratio test

The von Neumann ratio test (VNRT) is a widely used non-parametric test for detecting non-homogeneity in data. The test does not offer information regarding the time of the break, but it does indicate the overall degree of non-homogeneity in the data. The von Neumann ratio (N) can be defined as (Kabbilawsh et al. 2023):
$$N = \frac{\sum_{i=1}^{n-1}\left(Y_i - Y_{i+1}\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} \tag{22}$$
where $n$ is the sample size, $Y_i$ is the $i$th observation in the dataset and $\bar{Y}$ is the sample mean. If the calculated N value is equal to 2, the series is homogeneous; if it is less than 2, it is non-homogeneous. N might reach a value larger than 2 if the data exhibit rapid variations in the mean. To assess the significance of the N value, a p-value can be calculated by comparing it to the distribution of N values under the null hypothesis. If the p-value is below the predetermined significance level, the null hypothesis of homogeneity is rejected.
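
The ratio can be computed directly from its definition, as in the sketch below. The two series are hypothetical and were chosen to show values below and above 2.

```python
def von_neumann_ratio(series):
    """Sum of squared successive differences over sum of squared deviations."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((y - mean) ** 2 for y in series)
    if denominator == 0:
        raise ValueError("ratio is undefined for a constant series")
    numerator = sum((a - b) ** 2 for a, b in zip(series, series[1:]))
    return numerator / denominator


# Hypothetical series: a single shift in the mean versus rapid alternation.
ratio_shift = von_neumann_ratio([1.0] * 5 + [3.0] * 5)
ratio_alternating = von_neumann_ratio([1.0, 3.0] * 5)
```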

Standard normal homogeneity test

The standard normal homogeneity test (SNHT) is widely used in climatic studies. This test was established by Alexandersson & Moberg (1997) and has been successfully utilized to evaluate a wide range of hydrological and climatic data. The test is adaptable and straightforward. Additionally, the SNHT is more sensitive to breaks that occur towards the start or end of the time series. The T(k) statistic can be calculated as:
$$T(k) = k\,\bar{z}_1^{\,2} + (n - k)\,\bar{z}_2^{\,2}, \quad k = 1, 2, \ldots, n - 1 \tag{23}$$
$$\bar{z}_1 = \frac{1}{k}\sum_{i=1}^{k}\frac{Y_i - \bar{Y}}{s}, \qquad \bar{z}_2 = \frac{1}{n - k}\sum_{i=k+1}^{n}\frac{Y_i - \bar{Y}}{s} \tag{24}$$
where $\bar{z}_1$ and $\bar{z}_2$ are the parameters used in Equation (23) to calculate the value of T(k), $k$ is the number of the year of record, $n$ is the number of observations, $Y_i$ is the $i$th observation in the dataset, $\bar{Y}$ is the mean of the time series and $s$ is the standard deviation of the time series. If a break occurs in year $k$, the T(k) statistic is expected to reach its maximum value at that $k$. In the SNHT, the T0 statistic can be defined as:
$$T_0 = \max_{1 \le k < n} T(k) \tag{25}$$

To assess the significance of the T0 statistic, a p-value can be calculated by comparing it to the distribution of T0 values under the null hypothesis. If the p-value is below the predetermined significance level, the null hypothesis of homogeneity is rejected.
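
The following sketch evaluates T(k) for every candidate break position and returns the maximum, T0, together with its position. It assumes the standard deviation is the sample value with divisor n − 1; the test series is hypothetical.

```python
import math


def snht_statistic(series):
    """Return (T0, k) where k is the break position that maximizes T(k)."""
    n = len(series)
    mean = sum(series) / n
    # Sample standard deviation with divisor n - 1 (an assumption here).
    s = math.sqrt(sum((y - mean) ** 2 for y in series) / (n - 1))
    if s == 0:
        raise ValueError("T(k) is undefined for a constant series")
    z = [(y - mean) / s for y in series]
    total = sum(z)
    best_t, best_k = 0.0, 0
    running = 0.0
    for k in range(1, n):
        running += z[k - 1]
        z1 = running / k                  # mean of the first k standardized values
        z2 = (total - running) / (n - k)  # mean of the remaining n - k values
        t_k = k * z1 ** 2 + (n - k) * z2 ** 2
        if t_k > best_t:
            best_t, best_k = t_k, k
    return best_t, best_k


# Hypothetical series with a shift in the mean after the fifth value.
t0, break_position = snht_statistic([1.0] * 5 + [3.0] * 5)
```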

Pettitt Test

The Pettitt test (PT) is a non-parametric method that may be used to identify breaks in the centre of a time series. This method is based on the Wilcoxon test. The Xk statistic in the PT is derived using the ranks $r_1, r_2, \ldots, r_n$ of the observations $Y_1, Y_2, \ldots, Y_n$. The Xk statistic can be expressed as (Kabbilawsh et al. 2023):
$$X_k = 2\sum_{i=1}^{k} r_i - k(n + 1), \quad k = 1, 2, \ldots, n \tag{26}$$
where $n$ is the number of observations. If a break occurs in a particular year, the absolute value of the Xk statistic reaches its maximum value in that year, according to the PT. It is defined as:
$$X_E = \max_{1 \le k \le n}\left|X_k\right| \tag{27}$$

To assess the significance of the Xk statistic, critical values can be determined based on the null distribution of the test statistic. If the observed Xk statistic exceeds the critical values, the null hypothesis of homogeneity is rejected.
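
The rank-based statistic can be sketched as follows. Tied values are given their average rank, which is an assumption of this sketch; the test series is hypothetical and the critical values are not included.

```python
def average_ranks(series):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    ranks = [0.0] * len(series)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and series[order[end + 1]] == series[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def pettitt_statistic(series):
    """Return (max |X_k|, k) for X_k = 2 * sum(r_1..r_k) - k * (n + 1)."""
    n = len(series)
    ranks = average_ranks(series)
    best_abs, best_k = 0.0, 0
    running = 0.0
    for k in range(1, n + 1):
        running += ranks[k - 1]
        x_k = 2 * running - k * (n + 1)
        if abs(x_k) > best_abs:
            best_abs, best_k = abs(x_k), k
    return best_abs, best_k


# Hypothetical series with a jump after the fifth value.
x_max, break_position = pettitt_statistic(
    [1.0, 2.0, 3.0, 4.0, 5.0, 11.0, 12.0, 13.0, 14.0, 15.0]
)
```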

The estimation results were generated using the data from stations in different states of Peninsular Malaysia. The conventional methods (AM, IDW, NR, CR, LI and ARIMA) and the two AI methods (ANN and ANFIS) were applied to estimate the missing data at the 11 targeted stations. The performances of the estimation methods were evaluated using RMSE, MAE and CC.

Root mean square error

Table 2 shows the RMSE values of each method in different stations where a lower RMSE value indicates higher performance. ANFIS obtained the lowest RMSE values at five stations: 2,723,401, 3,116,430, 3,813,411, 5,721,442 and 3,519,426. ANN obtained the lowest RMSE values at stations 2,224,432, 4,907,422 and 4,930,401. ARIMA obtained the lowest RMSE values at stations 1,737,451, 5,606,410 and 6,503,401. As shown in Figure 3, ANFIS was the best method with the lowest average RMSE value of 1.87, while ANN and ARIMA followed with average RMSE values of 2.07 and 2.51, respectively. These findings are consistent with the study of Saplioglu & Kucukerdem (2018), who found the ANFIS method to be the best method to estimate missing streamflow data at the Yeşilırmak River. ANFIS excels because its structure is constructed by using the fuzzy logic inference and the learning capabilities of ANNs. Hence it performs better than just applying ANN alone. ANFIS identifies all feasible rules or enables them to be developed using input and output values.
Table 2

Performance of each estimation method based on RMSE results

| Stations | AM | IDW | NR | CR | LI | ARIMA | ANN | ANFIS |
|---|---|---|---|---|---|---|---|---|
| 1,737,451 | 9.96 | 7.65 | 6.14 | 15.11 | 8.92 | **1.37** | 2.14 | 1.74 |
| 2,224,432 | 5.43 | 0.63 | 0.20 | 7.37 | 5.59 | 0.08 | **0.05** | 0.26 |
| 2,723,401 | 3.50 | 2.79 | 2.24 | 4.08 | 3.47 | 0.23 | 0.26 | **0.14** |
| 3,116,430 | 14.83 | 9.78 | 2.53 | 23.69 | 17.37 | 1.59 | 2.62 | **1.46** |
| 3,813,411 | 8.18 | 9.43 | 6.76 | 9.70 | 5.43 | 2.58 | 3.25 | **1.58** |
| 4,907,422 | 5.92 | 1.16 | 0.34 | 2.71 | 6.57 | 0.43 | **0.06** | 0.21 |
| 5,606,410 | 20.42 | 26.76 | 7.76 | 22.10 | 10.98 | **0.37** | 0.46 | 0.44 |
| 6,503,401 | 7.23 | 0.10 | 0.13 | 21.90 | 7.29 | **0.07** | 0.0997 | 0.1031 |
| 5,721,442 | 143.55 | 147.22 | 13.27 | 145.65 | 75.86 | 9.97 | 9.96 | **9.85** |
| 4,930,401 | 64.75 | 11.02 | 5.11 | 68.70 | 62.99 | 9.91 | **3.18** | 4.04 |
| 3,519,426 | 42.04 | 54.71 | 1.95 | 29.84 | 43.35 | 1.06 | 0.7313 | **0.7309** |
| Average | 29.62 | 24.66 | 4.22 | 31.90 | 22.53 | 2.51 | 2.07 | **1.87** |

Note: AM, IDW, NR, CR, LI and ARIMA are conventional methods; ANN and ANFIS are AI methods. Bolded value indicates the best estimation method.

Figure 3

Average RMSE of each estimation method.


Among the conventional methods, ARIMA was the best, with the lowest average RMSE value of 2.51, as shown in Figure 3. NR, LI, IDW and AM followed with average RMSE values of 4.22, 22.53, 24.66 and 29.62, respectively. CR was the worst, as it had the highest average RMSE value of 31.90. On average, the RMSE values of the conventional methods were considerably higher than those of the AI methods, although ARIMA obtained the lowest RMSE at three stations (Table 2). Hence, it can be concluded that the AI methods generally performed better than the conventional methods. These findings agree with the study by Hamzah et al. (2020), which showed that the performance of conventional methods was poorer than that of AI methods. This is because conventional methods suffer from different drawbacks, including the removal of correlation between variables, leading to the loss of important information and inaccuracy in the results (Faizin et al. 2019; Ratolojanahary et al. 2019).

Mean absolute error

The MAE values of each method at the different stations are shown in Table 3, where a lower MAE value denotes better performance. ANFIS obtained the lowest MAE values at five stations: 2,723,401, 3,116,430, 3,813,411, 5,721,442 and 3,519,426. ANN obtained the lowest MAE at stations 2,224,432, 4,907,422 and 4,930,401. ARIMA obtained the lowest MAE at stations 1,737,451, 5,606,410 and 6,503,401. Based on Figure 4, ANFIS was the best of all the methods with the lowest average MAE value of 1.08, while ANN came second with an average MAE value of 1.20. These findings match the study of Mosavi et al. (2018), which showed that the performance of ANFIS was better than that of other AI and conventional methods. This is mainly because the ANFIS method offers a broad framework that combines ANNs and fuzzy systems. By employing the fuzzy system, ANFIS gains non-linear modelling capabilities for analysing the relationship between the input and output data. As a result, ANFIS is well suited to handling large and complicated input–output patterns and provides an accurate estimation result. In addition, Table 3 shows that ARIMA was the best conventional method with the lowest average MAE value of 1.45. NR, LI, IDW and AM were the second to fifth best with average MAE values of 2.44, 13.01, 14.24 and 17.10, respectively. CR was the worst, as it had the highest average MAE value of 18.42. On average, the MAE values of the conventional methods were considerably higher than those of the AI methods, although ARIMA obtained the lowest MAE at three stations. This leads to the conclusion that the conventional methods generally performed more poorly than the AI methods. These findings are in accordance with the study of Gao et al. (2018), which indicated that the estimation results obtained using conventional methods were not as accurate as those obtained using AI methods. This is because conventional methods such as AM, LI, CR, NR and IDW are only suitable when the data series contains a minimal number of missing values, as these methods reduce the size of the data set and ignore the correlation between the data, which may generate biased parameters and lead to inaccurate estimation results (Hamzah et al. 2020).
Table 3

Performance of each estimation method based on MAE results

Conventional methods: AM, IDW, NR, CR, LI and ARIMA. AI methods: ANN and ANFIS.

| Stations | AM | IDW | NR | CR | LI | ARIMA | ANN | ANFIS |
|---|---|---|---|---|---|---|---|---|
| 1,737,451 | 5.75 | 4.41 | 3.55 | 8.73 | 5.15 | **0.79** | 1.23 | 1.00 |
| 2,224,432 | 3.14 | 0.36 | 0.12 | 4.25 | 3.23 | 0.05 | **0.03** | 0.15 |
| 2,723,401 | 2.02 | 1.61 | 1.29 | 2.36 | 2.01 | 0.13 | 0.15 | **0.08** |
| 3,116,430 | 8.56 | 5.64 | 1.46 | 13.67 | 10.03 | 0.92 | 1.51 | **0.84** |
| 3,813,411 | 4.72 | 5.44 | 3.91 | 5.60 | 3.14 | 1.49 | 1.87 | **0.91** |
| 4,907,422 | 3.42 | 0.67 | 0.20 | 1.57 | 3.79 | 0.25 | **0.04** | 0.12 |
| 5,606,410 | 11.79 | 15.45 | 4.48 | 12.76 | 6.34 | **0.21** | 0.26 | 0.25 |
| 6,503,401 | 4.17 | 0.06 | 0.07 | 12.64 | 4.21 | **0.04** | 0.057 | 0.059 |
| 5,721,442 | 82.88 | 85.00 | 7.66 | 84.09 | 43.80 | 5.76 | 5.75 | **5.68** |
| 4,930,401 | 37.39 | 6.36 | 2.95 | 39.67 | 36.37 | 5.72 | **1.84** | 2.33 |
| 3,519,426 | 24.27 | 31.59 | 1.13 | 17.23 | 25.03 | 0.61 | 0.4222 | **0.4220** |
| Average | 17.10 | 14.24 | 2.44 | 18.42 | 13.01 | 1.45 | 1.20 | **1.08** |

Note: Bolded value indicates the best estimation method.
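The summary statistics quoted for Table 3 can be reproduced from its rows. The sketch below is only a reader's check, not part of the authors' workflow: it recomputes the per-method average MAE and counts the stations at which each method attains the lowest MAE. The variable names are hypothetical; the numbers are copied from Table 3.

```python
from collections import Counter

METHODS = ["AM", "IDW", "NR", "CR", "LI", "ARIMA", "ANN", "ANFIS"]

# MAE values per station, in the column order of Table 3.
MAE = {
    "1,737,451": [5.75, 4.41, 3.55, 8.73, 5.15, 0.79, 1.23, 1.00],
    "2,224,432": [3.14, 0.36, 0.12, 4.25, 3.23, 0.05, 0.03, 0.15],
    "2,723,401": [2.02, 1.61, 1.29, 2.36, 2.01, 0.13, 0.15, 0.08],
    "3,116,430": [8.56, 5.64, 1.46, 13.67, 10.03, 0.92, 1.51, 0.84],
    "3,813,411": [4.72, 5.44, 3.91, 5.60, 3.14, 1.49, 1.87, 0.91],
    "4,907,422": [3.42, 0.67, 0.20, 1.57, 3.79, 0.25, 0.04, 0.12],
    "5,606,410": [11.79, 15.45, 4.48, 12.76, 6.34, 0.21, 0.26, 0.25],
    "6,503,401": [4.17, 0.06, 0.07, 12.64, 4.21, 0.04, 0.057, 0.059],
    "5,721,442": [82.88, 85.00, 7.66, 84.09, 43.80, 5.76, 5.75, 5.68],
    "4,930,401": [37.39, 6.36, 2.95, 39.67, 36.37, 5.72, 1.84, 2.33],
    "3,519,426": [24.27, 31.59, 1.13, 17.23, 25.03, 0.61, 0.4222, 0.4220],
}

# Average MAE of each method across the 11 stations.
n_stations = len(MAE)
averages = {
    method: sum(row[i] for row in MAE.values()) / n_stations
    for i, method in enumerate(METHODS)
}

# Number of stations at which each method has the lowest MAE.
wins = Counter(METHODS[row.index(min(row))] for row in MAE.values())

for method in METHODS:
    print(f"{method:6s} average MAE = {averages[method]:.2f}, best at {wins[method]} stations")
```

The same pattern applies to the CC results by replacing `min` with `max`, since a higher CC denotes better performance.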

Figure 4

Average MAE of each estimation method.


Correlation coefficient

Table 4 shows the CC values of each method at the different stations, where a higher CC value indicates better performance. ANFIS obtained the highest CC values at eight stations: 2,224,432, 2,723,401, 3,116,430, 3,813,411, 5,606,410, 5,721,442, 4,930,401 and 3,519,426. ANN obtained the highest CC values at stations 1,737,451, 4,907,422 and 6,503,401. Figure 5 also indicates that ANFIS was the best estimation method, with the highest average CC value of 5,473.09, while ANN was second best with an average CC value of 949.70. These results are in line with Anusree & Varghese (2016), who indicated that ANFIS provided the highest accuracy in missing data estimation. This is primarily because ANFIS uses a fuzzy-logic neural network in which the fuzzy model learns from the data set, enabling the associated fuzzy inference system to analyse the input and output data. Because it integrates the ideas of neural networks with fuzzy logic, ANFIS combines the learning capability of ANN and the fuzzy inference system in a single framework. Its inference system is a collection of fuzzy IF–THEN rules that can approximate non-linear functions through learning.

As shown in Table 4, ARIMA was the best of the conventional methods, with the highest average CC value of 615.9. NR, LI, CR and IDW were second to fifth best, with average CC values of 168.69, 53.40, 30.34 and 27.96, respectively. AM was the worst, with the lowest average CC value of 27.87. The average CC values of the conventional methods were considerably lower than those of the AI methods. Hence, it can be concluded that AI methods performed better than conventional methods at most of the stations. Moreover, the poor performance of the conventional methods is consistent with Ahmed et al. (2019). Conventional methods rely heavily on input data sets, which often include a large proportion of unidentified values; because of these inconsistencies in the collected data, conventional methods are unable to provide accurate estimates.
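The fuzzy IF–THEN rules mentioned above can be illustrated with a minimal forward pass of a first-order Sugeno-type fuzzy inference system, the structure on which ANFIS is commonly built. This is a sketch under stated assumptions, not the model configuration used in the study: it takes a single input, uses Gaussian membership functions, and relies on hand-picked, hypothetical rule parameters. In ANFIS these parameters would be learned from the data rather than fixed by hand.

```python
import math

def gaussian_membership(x, centre, sigma):
    """Degree (0 to 1) to which x belongs to a fuzzy set centred at `centre`."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sugeno_forward(x, rules):
    """Evaluate rules of the form: IF x is A_i THEN y_i = p_i * x + r_i.

    Each rule is a tuple (centre, sigma, p, r). The output is the average of
    the rule outputs, weighted by the normalised firing strengths.
    """
    strengths = [gaussian_membership(x, c, s) for c, s, _, _ in rules]
    total = sum(strengths)
    outputs = [p * x + r for _, _, p, r in rules]
    return sum(w / total * y for w, y in zip(strengths, outputs))

# Hypothetical rules: one for "low flow" and one for "high flow".
rules = [
    (0.0, 2.0, 1.0, 0.0),   # IF x is low  THEN y = 1.0 * x + 0.0
    (10.0, 2.0, 2.0, 5.0),  # IF x is high THEN y = 2.0 * x + 5.0
]

print(sugeno_forward(5.0, rules))  # both rules fire equally at the midpoint
```

Blending locally linear rules in this way yields a non-linear input–output mapping, which is the property the text attributes to ANFIS.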
Table 4

Performance of each estimation method based on CC results

Conventional methods: AM, IDW, NR, CR, LI and ARIMA. AI methods: ANN and ANFIS.

| Location | AM | IDW | NR | CR | LI | ARIMA | ANN | ANFIS |
|---|---|---|---|---|---|---|---|---|
| 1,737,451 | 16.0 | 6.7 | 34.2 | 16.6 | 25.4 | 103.4 | **267.3** | 162.5 |
| 2,224,432 | 1.5 | 1.5 | 2.3 | 1.5 | 1.5 | 5.2 | 19.8 | **153.6** |
| 2,723,401 | 8.8 | 2.8 | 8.1 | 8.5 | 13.8 | 47.0 | 44.8 | **648.3** |
| 3,116,430 | 27.3 | 37.1 | 43.1 | 25.8 | 27.9 | 90.3 | 43.8 | **1,611.8** |
| 3,813,411 | 82.0 | 104.4 | 103.2 | 120.0 | 122.1 | 244.5 | 215.2 | **2,436.3** |
| 4,907,422 | 4.3 | 1.5 | 10.4 | 4.6 | 4.4 | 50.9 | **3,633.4** | 362.6 |
| 5,606,410 | 20.1 | 3.2 | 108.5 | 15.3 | 56.6 | 3,275.7 | 998.0 | **7,517.5** |
| 6,503,401 | 0.26 | 0.00 | 0.01 | 0.25 | 0.27 | 0.24 | **2.82** | 0.00 |
| 5,721,442 | 99.0 | 89.4 | 1,473.9 | 93.5 | 285.4 | 2,846.5 | 3,662.6 | **39,186.9** |
| 4,930,401 | 36.5 | 50.2 | 54.4 | 36.5 | 39.0 | 77.0 | 1,502.7 | **5,018.3** |
| 3,519,426 | 10.8 | 10.6 | 17.5 | 11.0 | 10.9 | 34.2 | 56.3 | **3,106.1** |
| Average | 27.9 | 28.0 | 168.7 | 30.3 | 53.4 | 615.9 | 949.7 | **5,473.1** |

Note: Bolded value indicates the best estimation method.

Figure 5

Average CC of each estimation method.


Ranking of each method

Based on the RMSE, MAE and CC values, each estimation method was ranked. Table 5 shows the ranking of each method, where a lower total number of points indicates better performance. According to Table 5, ANFIS was the best of all the methods, while ANN was the second best. These findings are similar to those of Jimeno-Sáez et al. (2017), who indicated that ANFIS provided superior performance in peninsular Spain. ANFIS combines the learning abilities of neural networks with fuzzy inference systems to generate a series of fuzzy IF–THEN rules by analysing the patterns in different data sets. By merging the neural network's learning ability with fuzzy inference systems, ANFIS can estimate missing streamflow data better than the other methods (Zakaria et al. 2021). As shown in Table 5, ARIMA was ranked third, which also made it the best of the conventional methods. ARIMA takes into account the autocorrelation present in the streamflow data series, which means that the model makes its estimates by capturing the trend and pattern of the data.

Table 5

Ranking of each estimation method

| Estimation method | Total points | Ranking |
|---|---|---|
| **ANFIS** | **60** | **1** |
| ANN | 73 | 2 |
| ARIMA | 81 | 3 |
| NR | 136 | 4 |
| LI | 193 | 5 |
| IDW | 206 | 6 |
| AM | 210 | 7 |
| CR | 229 | 8 |

Note: Bolded value indicates the best estimation method.
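One plausible way to obtain a points-based ranking of this kind is a rank-sum scheme: at each station the methods are ranked on a metric, and the ranks are added up so that the lowest total wins. The text does not spell out how the total points in Table 5 were computed, so the scheme below is an assumption made purely for illustration. It uses the MAE values of three methods at three stations from Table 3 and does not attempt to reproduce the totals in Table 5.

```python
def rank_sum(scores_by_station, lower_is_better=True):
    """Sum the per-station ranks of each method (rank 1 = best).

    scores_by_station maps a station to a {method: score} dictionary.
    Returns (method, total points) pairs sorted from best to worst.
    """
    totals = {}
    for scores in scores_by_station.values():
        ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
        for rank, method in enumerate(ordered, start=1):
            totals[method] = totals.get(method, 0) + rank
    return sorted(totals.items(), key=lambda item: item[1])

# MAE subset copied from Table 3 (three methods, three stations).
mae_subset = {
    "1,737,451": {"ARIMA": 0.79, "ANN": 1.23, "ANFIS": 1.00},
    "2,723,401": {"ARIMA": 0.13, "ANN": 0.15, "ANFIS": 0.08},
    "4,930,401": {"ARIMA": 5.72, "ANN": 1.84, "ANFIS": 2.33},
}

ranking = rank_sum(mae_subset)
for method, points in ranking:
    print(method, points)
```

For metrics where a higher value is better, such as CC, the same function can be called with `lower_is_better=False`, and the totals from all metrics can then be added together.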

Homogeneity test

The homogeneity tests, namely PT, SNHT, BRT and VNRT, were applied to the daily streamflow data collected from DID. The streamflow data are considered homogeneous when the null hypothesis is accepted, that is, when the p-value is larger than 0.05. The results of the four tests were grouped into three categories: ‘useful’, ‘doubtful’ and ‘suspect’. A series was categorized as ‘useful’ when three or all four tests accepted the null hypothesis, ‘doubtful’ when two tests accepted it, and ‘suspect’ when one or none of the tests accepted it. As shown in Table 6, the PT accepted the null hypothesis for the time series at all stations. The SNHT rejected it at three stations: 2,224,432, 3,813,411 and 5,721,442. The BRT rejected it at station 2,723,401 only. The VNRT rejected it at every station except 2,723,401. Because the VNRT tests whether the time series is randomly distributed, the rejections were interpreted as indicating that the streamflow was normally distributed. This could be due to the influence of atmospheric pressure, temperature, humidity and wind speed. Previous research suggested that rainfall data in Peninsular Malaysia often exhibit a bell-shaped curve, which is one of the characteristics of a normal distribution (Tan et al. 2020; Ghani et al. 2022).

Table 6

Summary of homogeneity test for daily time series

Values under PT, SNHT, BRT and VNRT are the p-values of each test.

| Stations | PT | SNHT | BRT | VNRT | Results |
|---|---|---|---|---|---|
| 1,737,451 | 0.459 | 0.215 | 0.635 | **< 0.0001** | Useful |
| 2,224,432 | 0.956 | **0.0003** | 1.000 | **< 0.0001** | Doubtful |
| 2,723,401 | 0.125 | 0.083 | **0.041** | 0.351 | Useful |
| 3,116,430 | 0.714 | 0.138 | 0.947 | **0.038** | Useful |
| 3,813,411 | 0.592 | **0.001** | 0.589 | **< 0.0001** | Doubtful |
| 4,907,422 | 0.258 | 0.072 | 0.376 | **0.003** | Useful |
| 5,606,410 | 0.959 | 0.061 | 0.951 | **< 0.0001** | Useful |
| 6,503,401 | 0.822 | 0.150 | 0.857 | **0.001** | Useful |
| 5,721,442 | 0.672 | **< 0.0001** | 0.789 | **< 0.0001** | Doubtful |
| 4,930,401 | 0.966 | 0.119 | 0.973 | **0.044** | Useful |
| 3,519,426 | 0.244 | 0.330 | 0.599 | **0.004** | Useful |

Note: Bolded value indicates that the null hypothesis of homogeneity is rejected.
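The classification rule described above can be expressed as a short function. The sketch below is illustrative only: the function name is hypothetical, the p-values are copied from three rows of Table 6, and entries reported as "< 0.0001" are represented by 0.0001, which is an assumption that does not affect the outcome at the 0.05 threshold.

```python
def classify_station(p_values, alpha=0.05):
    """Classify a station from the p-values of the four homogeneity tests.

    A test accepts the null hypothesis of homogeneity when p > alpha.
    Three or four acceptances give 'useful', two give 'doubtful',
    and one or none give 'suspect'.
    """
    accepted = sum(1 for p in p_values.values() if p > alpha)
    if accepted >= 3:
        return "useful"
    if accepted == 2:
        return "doubtful"
    return "suspect"

# p-values from Table 6; "< 0.0001" is entered as 0.0001 (assumption).
table6_subset = {
    "1,737,451": {"PT": 0.459, "SNHT": 0.215, "BRT": 0.635, "VNRT": 0.0001},
    "2,224,432": {"PT": 0.956, "SNHT": 0.0003, "BRT": 1.000, "VNRT": 0.0001},
    "2,723,401": {"PT": 0.125, "SNHT": 0.083, "BRT": 0.041, "VNRT": 0.351},
}

labels = {station: classify_station(p) for station, p in table6_subset.items()}
for station, label in labels.items():
    print(station, label)
```

Applied to these three rows, the rule gives the same categories as the Results column of Table 6.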

In total, eight stations were categorized as ‘useful’, three stations as ‘doubtful’ and none as ‘suspect’. These results indicate that the streamflow data at most of the stations are homogeneous. The daily time series categorized as ‘useful’ can be used in future studies on missing streamflow data estimation. The daily time series categorized as ‘doubtful’ require closer attention, as non-homogeneity was detected in them. Even so, the streamflow data from these three stations can still be used if they are treated carefully. It is important to note that, while homogeneity tests are useful for detecting changes in streamflow data, they do not account for the uncertainty associated with missing streamflow values. In addition, these tests can be sensitive to the magnitude of the streamflow values, which makes it difficult to capture extreme events or rare hydrological phenomena.

Research on missing data estimation has received much attention from researchers worldwide, as accurate and complete datasets are important for reliable analysis and better decision-making.

In this study, missing streamflow data were estimated at 11 stations in Peninsular Malaysia using six conventional methods (AM, IDW, NR, CR, LI and ARIMA) and two AI methods (ANN and ANFIS). Four homogeneity tests (PT, SNHT, BRT and VNRT) were applied to assess the quality of the streamflow data sets. Based on the performance evaluation results, ANFIS was ranked as the best method, while ANN, ARIMA, NR, LI, IDW, AM and CR were ranked second to eighth, respectively. This can be explained by the fact that ANFIS combines learning abilities with a fuzzy inference system, which allows it to model the non-linear patterns of input–output data. This study demonstrates that AI methods outperform conventional methods because they are better at dealing with complicated relationships in data and at adapting to new data patterns. The conventional methods fall short in situations that require more complex handling of missing data. Accurate estimation of missing streamflow data is important for hydrological analysis, such as flood and drought prediction, and the findings are useful for other regions with similar tropical climatic conditions. More accurate hydrological analysis can help the relevant authorities to make sound decisions on water resources management.

It is recommended that future studies use longer streamflow records, such as data sets spanning more than 30 years. Longer records can provide a more comprehensive view of streamflow characteristics and improve the reliability of the results. It is also recommended to apply hybrid methodologies that integrate multiple AI models, such as the multilayer perceptron neural network model (Narimani et al. 2023) and the hybrid deep neural network approach (Sharma et al. 2022), to improve the precision of missing data estimation. Such hybrid approaches have the potential to combine the advantages of diverse stand-alone AI models and offer a powerful tool for handling ambiguous and incomplete data.

The authors would like to express their gratitude towards the Department of Irrigation and Drainage (DID) of Malaysia for providing the streamflow data for this study.

All authors equally contributed to the preparation of this manuscript. All authors read and approved the final manuscript.

This research received no external funding.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ahmed, A. N., Othman, F. B., Afan, H. A., Ibrahim, R. K., Fai, C. M., Hossain, M. S., Ehteram, M. & Elshafie, A. (2019) Machine learning methods for better water quality prediction, Journal of Hydrology, 578, 124084. doi:10.1016/j.jhydrol.2019.124084.
Ahmed, A., Ishak, M. Y., Uddin, M. K., Abd Samad, M. Y., Mukhtar, S. & Danhassan, S. S. (2021) Effects of Some Weather Parameters on Oil Palm Production in the Peninsular Malaysia. Preprints, pp. 1–17. doi:10.20944/preprints202106.0456.v1.
Alexandersson, H. & Moberg, A. (1997) Homogenization of Swedish temperature data. Part I: Homogeneity test for linear trends, International Journal of Climatology: A Journal of the Royal Meteorological Society, 17(1), 25–34.
Al-Taiee, T. M. (2008) Statistical prediction of Tigris River levels at Mosul hydrological station, North Iraq, Journal of Hydrology and Hydromechanics, 56(4), 272.
Anusree, K. & Varghese, K. (2016) Streamflow prediction of Karuvannur river basin using ANFIS, ANN and MNLR models, Procedia Technology, 24, 101–108. doi:10.1016/j.protcy.2016.05.015.
Belotti, J., Mendes, J. J. Jr., Leme, M., Trojan, F., Stevan, S. L. Jr. & Siqueira, H. (2021) Comparative study of forecasting approaches in monthly streamflow series from Brazilian hydroelectric plants using extreme learning machines and Box & Jenkins models, Journal of Hydrology and Hydromechanics, 69(2), 180–195. doi:10.2478/johh-2021-0001.
Faizin, R. N., Riasetiawan, M. & Ashari, A. (2019) A review of missing sensor data imputation methods, 2019 5th International Conference on Science and Technology (ICST), 1, 1–6. doi:10.1109/icst47872.2019.9166287.
Fentaw, F., Melesse, A. M., Hailu, D. & Nigussie, A. (2019) Precipitation and streamflow variability in Tekeze River basin, Ethiopia. In: Melesse, A. M., Abtew, W. & Senay, G. (eds) Extreme Hydrology and Climate Variability, Oxford, UK: Elsevier, pp. 103–121. doi:10.1016/b978-0-12-815998-9.00010-5.
Gao, Y., Merz, C., Lischeid, G. & Schneider, M. (2018) A review on missing hydrological data processing, Environmental Earth Sciences, 77(2), 1–12. doi:10.1007/s12665-018-7228-6.
Ghani, N. A. A. A., Senawi, A. & Subramaniam, R. (2022) A feasibility study of fitting the normal distribution and gamma distribution to rainfall data at Kuantan River Basin, Proceedings of the 5th International Conference on Water Resources (ICWR)–Volume 1: Current Research in Water Resources, Coastal and Environment, 293, 27–35. https://doi.org/10.1007/978-981-19-5947-9_3.
Hamzah, F. B., Mohd Hamzah, F., Mohd Razali, S. F., Jaafar, O. & Abdul Jamil, N. (2020) Imputation methods for recovering streamflow observation: A methodological review, Cogent Environmental Science, 6(1), 1745133. doi:10.1080/23311843.2020.1745133.
Heddam, S. & Kişi, Z. (2021) A new heuristic model for monthly streamflow forecasting. In: Sharma, P. & Machiwal, D. (eds) Advances in Streamflow Forecasting, Amsterdam: Elsevier, pp. 281–303. doi:10.1016/b978-0-12-820673-7.00005-6.
Higgins, P. A., Palmer, J. G., Rao, M. P., Andersen, M. S., Turney, C. S. & Johnson, F. (2022) Unprecedented high northern Australian streamflow linked to an intensification of the Indo-Australian monsoon, Water Resources Research, 58(3), e2021WR030881.
Ismail, W. W., Zin, W. Z. W. & Ibrahim, W. (2017) Estimation of rainfall and stream flow missing data for Terengganu, Malaysia by using interpolation technique methods, Malaysian Journal of Fundamental and Applied Sciences, 13, 214–218. doi:10.11113/mjfas.v13n3.578.
Jimeno-Sáez, P., Senent-Aparicio, J., Pérez-Sánchez, J., Pulido-Velazquez, D. & Cecilia, J. (2017) Estimation of instantaneous peak flow using machine-learning models and empirical formula in peninsular Spain, Water, 9(5), 347. doi:10.3390/w9050347.
Kabbilawsh, P., Kumar, D. S. & Chithra, N. R. (2023) Assessment of temporal homogeneity of long-term rainfall time-series datasets by applying classical homogeneity tests, Environment, Development and Sustainability, 26, 16757–16801.
Kamwaga, S., Mulungu, D. M. & Valimba, P. (2018) Assessment of empirical and regression methods for infilling missing streamflow data in Little Ruaha catchment Tanzania, Physics and Chemistry of the Earth, Parts A/B/C, 106, 17–28. doi:10.1016/j.pce.2018.05.008.
Kerishnan, P. B., Maruthaveeran, S. & Maulan, S. (2020) Investigating the usability pattern and constraints of pocket parks in Kuala Lumpur, Malaysia, Urban Forestry & Urban Greening, 50, 126647. doi:10.1016/j.ufug.2020.126647.
Kermorvant, C., Liquet, B., Litt, G., Jones, J. B., Mengersen, K., Peterson, E. E., Hyndman, R. J. & Leigh, C. (2021) Reconstructing missing and anomalous data collected from high-frequency in-situ sensors in fresh waters, International Journal of Environmental Research and Public Health, 18(23), 12803. doi:10.3390/ijerph182312803.
Kotu, V. & Deshpande, B. (2019) Time Series Forecasting. In: Kotu, V. & Deshpande, B. (eds) Data Science, Burlington, MA: Morgan Kaufmann Publishers, pp. 395–445. doi:10.1016/b978-0-12-814761-0.00012-5.
Li, L., Liu, Y., Wei, T. & Li, X. (2020) Exploring inter-sensor correlation for missing data estimation, In: IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, pp. 2108–2114. doi:10.1109/iecon43393.2020.9254904.
Mesta, B., Akgun, O. B. & Kentel, E. (2020) Alternative solutions for long missing streamflow data for sustainable water resources management, International Journal of Water Resources Development, 37(5), 882–905. doi:10.1080/07900627.2020.1799763.
Mosavi, A., Ozturk, P. & Chau, K. W. (2018) Flood prediction using machine learning models: Literature review, Water, 10(11), 1536. doi:10.3390/w10111536.
Narimani, R., Jun, C., De Michele, C., Gan, T. Y., Nezhad, S. M. & Byun, J. (2023) Multilayer perceptron-based predictive model using wavelet transform for the reconstruction of missing rainfall data, Stochastic Environmental Research and Risk Assessment, 37(7), 2791–2802.
Nayak, P., Sudheer, K., Rangan, D. & Ramasastri, K. (2004) A neuro-fuzzy computing technique for modeling hydrological time series, Journal of Hydrology, 291(1–2), 52–66. doi:10.1016/j.jhydrol.2003.12.010.
Ng, J. L., Huang, Y. F., Yong, S. L. S., Lee, J. C., Ahmed, A. N. & Mirzaei, M. (2024) Analysing the variability of non-stationary extreme rainfall events amidst climate change in East Malaysia, AQUA – Water Infrastructure, Ecosystems and Society, 73, 1494–1509.
Noor, M., Yahaya, A., Ramli, N. & al Bakri, A. M. M. (2014) Filling missing data using interpolation methods: Study on the effect of fitting distribution, Key Engineering Materials, 594–595, 889–895. doi:10.4028/www.scientific.net/kem.594-595.889.
Norazizi, N. A. A. & Deni, S. M. (2019) Comparison of artificial neural network (ANN) and other imputation methods in estimating missing rainfall data at Kuantan station. In: Berry, M., Yap, B., Mohamed, A. & Köppen, M. (eds) Communications in Computer and Information Science, Singapore: Springer Science, pp. 298–306. doi:10.1007/978-981-15-0399-3_24.
Ratolojanahary, R., Houé Ngouna, R., Medjaher, K., Junca-Bourié, J., Dauriac, F. & Sebilo, M. (2019) Model selection to improve multiple imputation for handling high rate missingness in a water quality dataset, Expert Systems with Applications, 131, 299–307. doi:10.1016/j.eswa.2019.04.049.
Saplioglu, K. & Kucukerdem, T. S. (2018) Estimation of streamflow data using ANFIS models and determination of the number of datasets for ANFIS: The case of Yesirmak river, Applied Ecology and Environmental Research, 16(3), 3583–3594. doi:10.15666/aeer/1603_35833594.
Sattari, M. T., Rezazadeh-Joudi, A. & Kusiak, A. (2016) Assessment of different methods for estimation of missing data in precipitation studies, Hydrology Research, 48(4), 1032–1044. doi:10.2166/nh.2016.364.
Sharma, G., Singh, A. & Jain, S. (2022) A hybrid deep neural network approach to estimate reference evapotranspiration using limited climate data, Neural Computing and Applications, 34, 4013–4032.
Souza, G. R. D., Bello, I. P., Corrêa, F. V. & Oliveira, L. F. C. D. (2020) Artificial neural networks for filling missing streamflow data in Rio do Carmo Basin, Minas Gerais, Brazil, Brazilian Archives of Biology and Technology, 63, 1–8. doi:10.1590/1678-4324-2020180522.
Tan, Y. X., Ng, J. L. & Huang, Y. F. (2020) Estimation of missing daily rainfall during monsoon seasons for tropical region: A comparison between ANN and conventional methods, Carpathian Journal of Earth and Environmental Sciences, 15(1), 103–112. doi:10.26471/cjees/2020/015/113.
Yilmaz, M. U. & Bihrat, Ö. N. Ö. Z. (2019) Evaluation of statistical methods for estimating missing daily streamflow data, Teknik Dergi, 30(6), 9597–9620. doi:10.18400/tekderg.421091.
Zakaria, M. N. A., Malek, M. A., Zolkepli, M. & Ahmed, A. N. (2021) Application of artificial intelligence algorithms for hourly river level forecast: A case study of Muda River, Malaysia, Alexandria Engineering Journal, 60(4), 4015–4028. doi:10.1016/j.aej.2021.02.046.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).