This paper describes a novel application of a pattern recognition technique for predicting boundary shear stress distribution in open channels. In this approach, a synthetic database of images representing normalized shear stress distributions is formed from a training data set using recurrence plot (RP) analysis. The face recognition algorithm is then employed to synthesize the RPs and transform the original database into short-dimension vectors containing similarity weights proportional to the principal components of the distribution of images. These vectors capture the intrinsic properties of the boundary shear stress distribution of the cases in the training set, and are sensitive to variations of the corresponding hydraulic parameters. The process of transforming one-dimensional data series into vectors of weights is invertible, and therefore, shear stress distributions for unseen cases can be predicted. The developed method is applied to predict boundary shear stress distributions in smooth trapezoidal and circular channels and the results show a cross correlation coefficient above 92%, mean square errors within 0.04% and 4.48%, respectively, and average shear stress fluctuations within 2% and 5%, respectively, thus indicating that the proposed method is capable of providing accurate estimations of the boundary shear stress distribution in open channels.

## INTRODUCTION

Boundary shear stress is the result of the tangential component of the hydraulic forces that act in the direction parallel to the channel's boundaries and transfer momentum to its bed and walls (Chow 1959). Excessive shear stress can undermine channel stability by eroding bank sides and cause changes in the river morphology by affecting the transport and deposition of sediments (Julien 1995). Erosion often results in higher levels of turbidity and lower water quality levels. Furthermore, an increase in sediment movement and deposition can cause a decrease in channel capacity and, consequently, higher flood risk. Computation of flow resistance, side-wall correction, sediment discharge, channel erosion or deposition, cavitation problems, and design of stable channels are among the problems which require accurate estimates of the boundary shear stress distribution (Yang & Lim 1997; Guo & Julien 2005; Blanckaert *et al.* 2010).

The distribution of boundary shear stress over the wetted perimeter of a channel cross-section is non-uniform. This is true even for steady flows in straight prismatic channels with a simple cross-sectional geometry. This non-uniformity is mainly due to the anisotropy of the turbulence which produces transverse gradients of Reynolds stresses and secondary circulations (Gessner 1973). Tominaga *et al.* (1989) and Knight & Demetriou (1983) showed that the boundary shear stress increases where the secondary currents flow towards the wall, and decreases when they flow away from the wall. Other factors that govern the distribution of shear stress are the geometry of the cross-section, lateral and longitudinal boundary roughness distributions (Blanckaert *et al.* 2010) and sediment concentration (Khodashenas *et al.* 2008).

To date, numerous investigations have been conducted and various mechanistic and empirical methods have been developed for understanding and estimating the magnitude and distribution of boundary shear stress. However, due to the complexities involved, boundary shear stress has proven to be one of the most challenging parameters to quantify and measure, even for simple smooth prismatic channels with uniform flow.

^{−2}),

*A*is the channel's cross-section (m

^{2}),

*P*is the channel's wetted perimeter (m),

*L*is the reach length (m),

*γ*is water's specific weight (kg m

^{−3}) and

*α*is the slope angle of the channel bed plane. Rearranging Equation (1) gives:where

*R*is the hydraulic radius of the channel (m). This simple equation, often referred to as the slope method, is valid for both laminar and turbulent flow regimes, but only provides the average boundary shear stress.

*u*is the time-averaged (mean) streamwise velocity profile (ms

^{−1}),

*z*is the vertical coordinate (m), is the shear velocity (ms

^{−1}) given by ,

*ν*is the kinematic viscosity (m

^{2}s

^{−1}), is the von Karman's constant = 0.41 and

*C*is a dimensionless integration constant related to the thickness of the viscous sublayer. Although the log law is strictly valid for the turbulent sublayer (approximately the lower 20% of the depth), it is commonly extended over the entire flow depth in rivers and channels (Petrie & Diplas 2015). If the mean velocity profile,

*u*(

*z*), is known, then a simple linear regression (e.g. least squares) can be applied to fit the velocity profile to Equation (3) and calculate the log law parameters, the shear velocity, and consequently the shear stress. The advantage of this approach is that it does not need detailed information about bed roughness, however it requires measurements of the streamwise velocity profile, and making assumptions for the viscous sublayer thickness, which to some extent limits its applicability and accuracy.

Preston's (1954) method is the most widely practised technique for measuring boundary shear stress in smooth channels. In this method, a Preston tube is used to infer the velocity of the water flow by recording the difference between static and total pressures. A non-dimensional calibration function is then established based on the ‘law of the wall’, Equation (3), and used to determine the boundary shear stress from the differential pressures. The simplicity of the experimental setup and its operation are the main reasons behind the popularity of this method. However, for rough boundaries, application of the technique is substantially more complicated, due to the absence of a viscous sublayer. A number of studies (Hwang & Laursen 1963; Ghosh & Roy 1970; Hollick 1976; Hollingshead & Rajaratnam 1980) have attempted to extend the use of this technique to rough surfaces, and have calibrated curves for the Preston tube by using Nikuradse's (1933) model of velocity distribution over rough boundaries. Although promising, these methods can only be applied when the sand equivalent roughness height of the surface is known, which makes them unsuitable for application to a variety of open channels. Other methods based on fitting the log law of the wall such as Clauser's (1956) method and the boundary characteristics method (Hinze 1975; Papanicolaou *et al.* 2012) have been developed and applied to gradually (Afzalimehr & Anctil 2000) and rapidly varying flows over spatially varying boundaries (Papanicolaou *et al.* 2012).

Geometrical methods for estimating shear stress distribution (Leighly 1932; Einstein 1942; Lundgren & Jonsson 1964; Yang & Lim 1997; Khodashenas & Paquier 1999; 2005; Yu & Tan 2007; Abderrezzak *et al.* 2008) consist of splitting the channel cross-section into sub-regions where the shear force along each segment of the boundary is calculated by balancing the forces against the weight of fluid in the corresponding subregion. In these approximations, mapping and discretising the wetted perimeter is often a complicated and sensitive task, however, they have the advantage of requiring relatively low computational effort.

Where abundant experimental data existed, researchers (e.g. Knight 1981; Knight *et al.* 1984, 1994; Flintham & Carling 1988; Pizzuto 1991; Olivero *et al.* 1999) have used regression and correlation analysis to derive empirical and semi-empirical equations for boundary shear stress. These equations are capable of only calculating mean, maximum and percentages of shear stress carried on the channel's walls and beds with relatively good accuracy, but are unable to provide the distribution of shear stress along the entire wetted perimeter. Some other researchers (e.g. Zheng & Jin 1998; Jin *et al.* 2004; Bilgil 2005; Guo & Julien 2005) have solved the governing energy transport, continuity, and momentum equations to formulate analytical and semi-analytical solutions for calculating the boundary shear stress. These methods often rely on a number of subjective and controversial assumptions and require a large amount of computing resources which make them impractical. With the advent of more powerful computers, computational fluid dynamic techniques have also been used (e.g. Christensen & Fredsoe 1998; De Cacqueray *et al.* 2009) to solve the referred set of equations and calculate the boundary shear stress distribution. Nonetheless, these methods are computationally expensive and the model outputs are extremely sensitive to mesh size, the turbulence closure model, and other internal parameters defined by the user.

Recently, information theory and machine-learning techniques have been used to tackle this problem. For instance, the principle of maximum entropy has been used (e.g. Sterling & Knight 2002; Li & Zhang 2008; Bonakdari & Moazaminia 2015) to establish relationships for the boundary shear stress. A comparison with experimental data has shown that these approximations provide relatively flat shear stress distributions which make them unreliable. The divergence between the numerical and experimental results increases at the regions around the corners of the sections where secondary flow structures are more pronounced. Cobaner *et al.* (2010) used a neural network with four hidden layers to predict the percentage of the shear force acting on the walls of smooth rectangular channels and ducts. The study concluded that the artificial neural network (ANN) predictions were less biased and slightly more accurate than the classic empirical models suggested by Knight *et al.* (1984) and Knight & Patel (1985).

Measuring the actual local shear stress along the channel's boundaries is difficult and costly owing to the complexity of the turbulent velocity field, presence of flow structures, and the small magnitude of the stress. Shear stress also represents a difficult parameter to calculate due to the variability of channel slope, geometry and flow structures, which are the main influencing factors in the complex flow process. To date, all the developed methods are inherently based on some sort of simplifying assumption, and therefore, the problem of accurately estimating these stresses has only been partially resolved (Zheng & Jin 1998).

The recent relative abundance in available experimental data has offered an opportunity to test and validate novel techniques for describing the boundary shear stress distribution. In this paper, an advanced pattern recognition technique is employed to predict the distribution of boundary shear stress in open channels. This technique, which results from merging two existing algorithms (recurrence plots (RPs) and Eigenfaces for Recognition), is combined with a standard regression model for the prediction of data series representing shear stress distribution of flows with known attributes (i.e. Froude numbers, flow depths and channel slopes).

In the following sections the RP analysis and its adaptation to the Eigenfaces for Recognition is explained. This is followed by a description of the experimental data used in the study and details of the proposed methodology. Next, the prediction of boundary shear stress distributions in trapezoidal and circular channels are presented and critically discussed. The paper concludes with a discussion on the advantages of the method and suggestions for improvement.

## BACKGROUND

The proposed approach for predicting boundary shear stress distribution combines RP analysis (Eckmann *et al.* 1987) and Eigenfaces for Recognition (Turk & Pentland 1991). RP is used to transform one-dimensional data series into two-dimensional arrays which can be graphically represented. Eigenfaces for Recognition is then used as a means of identifying patterns in the arrays and to transform these into short-dimension vectors which can then be used to predict boundary shear stress distributions. It is to note that despite using a method that was originally developed for the recognition of human faces using 2D still images; no ‘recognition’ is involved in the proposed methodology. Instead, the technique is used to filter the original data and reduce their dimensionality whilst preserving intrinsic qualities. These reduced databases are then used to produce shear stress distribution for unseen cases.

### Recurrence plots

*et al.*2007). In this technique, starting from the first point of a data series,

*d*-dimensional vectors are formed by taking a sample of

*d*consecutive points in the data series:where subscripts

*j*and

*k*represent the

*j*th and

*k*th data points in the data series. The

*d*-dimensional vectors are then correlated by calculating the Euclidean distance between them. This parameter can then be used to form the RP matrix:where

*e*is the distance between vectors and and

_{jk}*N*is the total number of vectors, which define the number of data points in each column and row of the matrix. Note that the values of

*e*in the RP matrix vary with

_{jk}*d*whilst any value of

*d*would result in a matrix that could be used in the recognition method as described further below. However by letting

*d*= 1 each row in the RP matrix effectively becomes a normalized version of the original one-dimensional data series which has the benefit of maintaining its basic structure throughout the pattern recognition process.

By projecting the RP matrix on a Cartesian space, a map of the data can be generated. In this case each pixel on the map has the coordinates {*j,k*} with *j*, *k* = 1, 2, … , *N*, as well as a numerical value that is proportional to its associated distance, *e _{jk}*. In an 8-bit grayscale image representation, the values of

*e*can take values within the range 0 to 255, where the brightness intensity of each pixel indicates a larger

_{jk}*e*. Through the calculation of the

_{jk}*RP*matrix, the correlation between all data points within the data series is established whilst preserving the basic structure of the database. Such transformation and visualization helps to make explicit features of data which otherwise would be difficult to observe in the original series.

*RP*s helps to visualize characteristic patterns of the data, although for numerical analysis, its elements need to be sorted in a

*N*dimension vector where all rows of the

^{2}*RP*matrix are assembled in sequence:The arrangement of rows of the RP matrix into the Г vector is shown schematically in Figure 1. The unique configuration of RPs makes them particularly suitable for machine learning. That is because pattern recognition methods are capable of identifying data sets with unique features such as those made explicit through the RP method. In the following sections the post-processing of Г vectors and their relationship with the hydraulic parameters that control shear stress distributions is discussed in detail.

### Eigenfaces for Recognition

*N*×

*N*can be represented by an

*N*

^{2}size vector Г, where each element is a real number that represents an individual pixel in the image. If the training set consists of

*M*images, then the average face of the training set,

*ϑ*, is defined by:and hence, the difference between each image, Г

*, and the average face,*

_{i}*ϑ,*is given by:Performing principal component analysis (PCA) on the collection of all

*ϕ*

_{i}, would result in a set of

*M*orthonormal vectors which best describe the distribution of data. PCA is a statistical procedure that is able to identify orthogonal modes or degrees of freedom within a numerical array, and transform a number of possibly correlated variables into a smaller number of uncorrelated variables, which are called the principal components. The eigenvectors and eigenvalues of these principal components can be determined from the covariance matrix:whereMatrix

*C*is of size

*N*

^{2}, and finding its eigenvectors and eigenvalues is computationally expensive. If the number of training images,

*M*, is less than the dimension of the space,

*N*

^{2}, then there will only be

*M*-1 meaningful eigenvectors (Turk & Pentland 1991), and hence, to reduce the calculations, an

*M*by

*M*matrix

*L*can be constructed to find the meaningful eigenvectors:whereThe principal components of the distribution of images are called the eigenfaces,

*u*, which can be calculated from a linear combination of the images and eigenvectors:where

_{l}*v*is the

_{li}*i*

^{th}eigenvector of the covariance matrix. The collection of eigenface vectors defines a subspace of training images which is called the ‘

*face space*’. Any input image expressed in vector form Г, can be projected into the

*face space*through the following operation:where

*ω*

_{i}is a weight factor that describes the contribution of the

*i*

^{th}eigenface in representing the image, and

*M′*is the number of significant eigenvectors, associated with the

*M*largest eigenvalues, i.e.

*M′*

*≤*

*M*. Furthermore, the set of weights ordered in a short-dimension vector

*Ω*

^{T}*=*

*(ω*

_{1}

*, ω*

_{2}

*…*

*, ω*

_{M}′) can be used to project any new image, Г′, into the

*face space*by:where

*U*= {

*u*

_{i}} is the collection of eigenfaces. Equation (15) suggests that the process of encoding data into Ω vectors can be inverted for prediction purposes. If a reliable estimation of weights factors is available to conform a new vector Ω′, then a prediction of its associated image can be made through:

As will be further explained in the following sections, a reliable estimation of the weight factors, i.e. vector Ω′, would be based on the known vectors obtained through Equation (15) together with the parameters that characterize the original data sets. These vectors and parameters can be typically related with the aid of a simple regression model or more complicated method such as an artificial neural network, and the output Ω*′* can be considered to be reliable if the modelling error is less than 5%. Note that the validity of Equation (16) is provided in Appendix A (available with the online version of this paper).

## EXPERIMENTAL DATA SETS

In this study, laboratory measurements of flow velocity and boundary shear stress in trapezoidal and circular open channels were taken directly from the University of Birmingham's Flow Database (www.flowdata.bham.ac.uk).

### Trapezoidal data sets

^{−3}to 2.337 × 10

^{−2}in order to observe shear stress distribution for Froude (Fr) and Reynolds (Re) numbers within ranges of 0.58 ≤ Fr ≤ 3.59 and 0.46 × 10

^{5}≤ Re ≤ 6.18 × 10

^{5}, respectively, which derives from flow velocities (

*V*) between 0.39 and 2.69 ms

^{−1}. Measurements of velocity and shear stress were taken on average every 20 mm along the wetted perimeter (i.e. between 16 and 32 measurement points for each case), and measurement accuracy was estimated to be within ±5% (Yuen 1989).

To obtain homogenous subsets suitable for pattern recognition, and to test the sensitivity of the approach to the size of the training set, k-nearest neighbors (k-nn) clustering analysis (Fix & Hodges 1951) was first performed. The fundamental idea of the k-nn algorithm is to simply separate the data based on the assumed similarities between various clusters. Here, the Euclidean distance metric was used to measure the similarity between clusters, and shear stress data were non-dimensionalized by the average shear stress to eliminate the scale effects. K-nn was run with different *k* values, and consequently, three clusters (subsets) were identified by investigating the resultant dendrograms, i.e. graphical tree-structures that show the hierarchical relationships among clusters, ensuring highest similarity within each cluster (homogeneity) and lowest similarity between clusters were achieved.

Table 1 lists the geometric and hydraulic parameters of all the experiments. In each subset, one experiment (highlighted in Table 1) was randomly selected and excluded to be used for validation whilst the remaining were considered for training. Since the method requires all data series in the set to have the same number of measurements taken at relatively even distances, for each experiment, the horizontal coordinates of all data points were normalized using a perimetric distance defined as *Pd**=**s/p*, where *s* is the distance along the wetted perimeter starting at the left bank at the free surface, moving around the wetted perimeter, and *p* is the total length of the wetted perimeter. The shear stress measurements in each experiment were also non-dimensionalized by the average shear stress. Where required, linear interpolation was used to obtain shear stress values from adjacent neighboring points. It is notable that at regions where shear stress varied at higher rates, experimental measurements were taken at smaller increments, thus resulting in the standardization of the degree of accuracy across the wetted perimeter.

. | . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . | 10 . | 11 . |
---|---|---|---|---|---|---|---|---|---|---|---|---|

. | ID . | 2b (m)
. | h (m)
. | A (m^{2})
. | S
. _{0} | P (m)
. | R
. | V (ms^{−1})
. | Q (m^{3} s^{−1})
. | (Nm^{−2})
. | Fr . | Re . |

Set 1 | #1 | 0.15 | 0.030 | 0.005 | 0.0040 | 0.235 | 0.023 | 0.565 | 0.003 | 0.894 | 1.12 | 11,392 |

#2 | 0.15 | 0.058 | 0.012 | 0.0087 | 0.313 | 0.038 | 1.308 | 0.016 | 3.256 | 1.97 | 43,656 | |

#3 | 0.15 | 0.037 | 0.007 | 0.0234 | 0.255 | 0.027 | 1.843 | 0.013 | 6.223 | 3.35 | 43,800 | |

(TVc-1) | #4 | 0.15 | 0.042 | 0.008 | 0.0234 | 0.269 | 0.030 | 1.901 | 0.015 | 6.871 | 3.27 | 50,251 |

#5 | 0.15 | 0.050 | 0.010 | 0.0234 | 0.291 | 0.034 | 2.080 | 0.021 | 7.859 | 3.32 | 62,609 | |

#6 | 0.15 | 0.057 | 0.012 | 0.0234 | 0.310 | 0.038 | 2.190 | 0.026 | 8.625 | 3.32 | 72,546 | |

Set 2 | #7 | 0.45 | 0.044 | 0.022 | 0.0040 | 0.574 | 0.038 | 0.893 | 0.019 | 1.472 | 1.42 | 29,574 |

#8 | 0.45 | 0.050 | 0.025 | 0.0010 | 0.591 | 0.042 | 0.398 | 0.010 | 0.414 | 0.60 | 14,743 | |

#9 | 0.45 | 0.056 | 0.029 | 0.0010 | 0.609 | 0.047 | 0.428 | 0.012 | 0.459 | 0.61 | 17,563 | |

#10 | 0.45 | 0.060 | 0.031 | 0.0010 | 0.620 | 0.049 | 0.439 | 0.013 | 0.484 | 0.60 | 18,998 | |

#11 | 0.15 | 0.029 | 0.005 | 0.0087 | 0.231 | 0.022 | 0.924 | 0.005 | 1.882 | 1.88 | 17,923 | |

#12 | 0.15 | 0.036 | 0.007 | 0.0087 | 0.250 | 0.026 | 1.010 | 0.007 | 2.244 | 1.87 | 23,347 | |

#13 | 0.15 | 0.041 | 0.008 | 0.0088 | 0.266 | 0.029 | 1.113 | 0.009 | 2.521 | 1.94 | 28,663 | |

#14 | 0.15 | 0.048 | 0.009 | 0.0087 | 0.284 | 0.033 | 1.231 | 0.012 | 2.815 | 2.01 | 35,702 | |

(TVc-2) | #15 | 0.45 | 0.044 | 0.022 | 0.0087 | 0.574 | 0.038 | 1.381 | 0.030 | 3.228 | 2.19 | 45,751 |

#16 | 0.45 | 0.059 | 0.030 | 0.0087 | 0.615 | 0.048 | 1.553 | 0.046 | 4.124 | 2.17 | 65,743 | |

#17 | 0.45 | 0.044 | 0.022 | 0.0145 | 0.574 | 0.038 | 1.822 | 0.040 | 5.384 | 2.89 | 60,371 | |

#18 | 0.15 | 0.029 | 0.005 | 0.0234 | 0.231 | 0.022 | 1.592 | 0.008 | 5.052 | 3.24 | 30,888 | |

#19 | 0.45 | 0.045 | 0.022 | 0.0234 | 0.576 | 0.038 | 2.272 | 0.050 | 8.752 | 3.59 | 76,145 | |

#20 | 0.45 | 0.059 | 0.030 | 0.0234 | 0.615 | 0.048 | 2.427 | 0.072 | 11.067 | 3.38 | 102,739 | |

Set 3 | #21 | 0.15 | 0.075 | 0.017 | 0.0040 | 0.362 | 0.047 | 0.960 | 0.016 | 1.812 | 1.29 | 39,299 |

#22 | 0.15 | 0.107 | 0.028 | 0.0010 | 0.453 | 0.061 | 0.497 | 0.014 | 0.596 | 0.58 | 26,583 | |

(TVc-3) | #23 | 0.15 | 0.125 | 0.034 | 0.0010 | 0.504 | 0.068 | 0.544 | 0.019 | 0.669 | 0.59 | 32,599 |

#24 | 0.15 | 0.150 | 0.045 | 0.0010 | 0.574 | 0.078 | 0.584 | 0.026 | 0.768 | 0.59 | 40,170 | |

#25 | 0.45 | 0.075 | 0.039 | 0.0010 | 0.662 | 0.060 | 0.514 | 0.020 | 0.583 | 0.64 | 26,845 | |

#26 | 0.15 | 0.073 | 0.016 | 0.0087 | 0.356 | 0.046 | 1.468 | 0.024 | 3.896 | 2.00 | 58,886 | |

#27 | 0.15 | 0.099 | 0.025 | 0.0087 | 0.430 | 0.057 | 1.667 | 0.041 | 4.891 | 2.00 | 84,008 | |

#28 | 0.15 | 0.030 | 0.005 | 0.0145 | 0.235 | 0.023 | 1.296 | 0.007 | 3.272 | 2.58 | 26,146 | |

#29 | 0.15 | 0.075 | 0.017 | 0.0145 | 0.361 | 0.046 | 1.943 | 0.033 | 6.598 | 2.62 | 78,915 | |

#30 | 0.15 | 0.099 | 0.025 | 0.0234 | 0.430 | 0.057 | 2.690 | 0.066 | 13.129 | 3.23 | 135,513 | |

#31 | 0.45 | 0.044 | 0.004 | 0.0234 | 0.574 | 0.007 | 1.679 | 0.007 | 5.264 | 3.13 | 9,997 | |

#32 | 0.45 | 0.058 | 0.006 | 0.0234 | 0.614 | 0.010 | 1.785 | 0.011 | 6.513 | 2.96 | 15,045 |

. | . | 1 . | 2 . | 3 . | 4 . | 5 . | 6 . | 7 . | 8 . | 9 . | 10 . | 11 . |
---|---|---|---|---|---|---|---|---|---|---|---|---|

. | ID . | 2b (m)
. | h (m)
. | A (m^{2})
. | S
. _{0} | P (m)
. | R
. | V (ms^{−1})
. | Q (m^{3} s^{−1})
. | (Nm^{−2})
. | Fr . | Re . |

Set 1 | #1 | 0.15 | 0.030 | 0.005 | 0.0040 | 0.235 | 0.023 | 0.565 | 0.003 | 0.894 | 1.12 | 11,392 |

#2 | 0.15 | 0.058 | 0.012 | 0.0087 | 0.313 | 0.038 | 1.308 | 0.016 | 3.256 | 1.97 | 43,656 | |

#3 | 0.15 | 0.037 | 0.007 | 0.0234 | 0.255 | 0.027 | 1.843 | 0.013 | 6.223 | 3.35 | 43,800 | |

(TVc-1) | #4 | 0.15 | 0.042 | 0.008 | 0.0234 | 0.269 | 0.030 | 1.901 | 0.015 | 6.871 | 3.27 | 50,251 |

#5 | 0.15 | 0.050 | 0.010 | 0.0234 | 0.291 | 0.034 | 2.080 | 0.021 | 7.859 | 3.32 | 62,609 | |

#6 | 0.15 | 0.057 | 0.012 | 0.0234 | 0.310 | 0.038 | 2.190 | 0.026 | 8.625 | 3.32 | 72,546 | |

Set 2 | #7 | 0.45 | 0.044 | 0.022 | 0.0040 | 0.574 | 0.038 | 0.893 | 0.019 | 1.472 | 1.42 | 29,574 |

#8 | 0.45 | 0.050 | 0.025 | 0.0010 | 0.591 | 0.042 | 0.398 | 0.010 | 0.414 | 0.60 | 14,743 | |

#9 | 0.45 | 0.056 | 0.029 | 0.0010 | 0.609 | 0.047 | 0.428 | 0.012 | 0.459 | 0.61 | 17,563 | |

#10 | 0.45 | 0.060 | 0.031 | 0.0010 | 0.620 | 0.049 | 0.439 | 0.013 | 0.484 | 0.60 | 18,998 | |

#11 | 0.15 | 0.029 | 0.005 | 0.0087 | 0.231 | 0.022 | 0.924 | 0.005 | 1.882 | 1.88 | 17,923 | |

#12 | 0.15 | 0.036 | 0.007 | 0.0087 | 0.250 | 0.026 | 1.010 | 0.007 | 2.244 | 1.87 | 23,347 | |

#13 | 0.15 | 0.041 | 0.008 | 0.0088 | 0.266 | 0.029 | 1.113 | 0.009 | 2.521 | 1.94 | 28,663 | |

#14 | 0.15 | 0.048 | 0.009 | 0.0087 | 0.284 | 0.033 | 1.231 | 0.012 | 2.815 | 2.01 | 35,702 | |

(TVc-2) | #15 | 0.45 | 0.044 | 0.022 | 0.0087 | 0.574 | 0.038 | 1.381 | 0.030 | 3.228 | 2.19 | 45,751 |

#16 | 0.45 | 0.059 | 0.030 | 0.0087 | 0.615 | 0.048 | 1.553 | 0.046 | 4.124 | 2.17 | 65,743 | |

#17 | 0.45 | 0.044 | 0.022 | 0.0145 | 0.574 | 0.038 | 1.822 | 0.040 | 5.384 | 2.89 | 60,371 | |

#18 | 0.15 | 0.029 | 0.005 | 0.0234 | 0.231 | 0.022 | 1.592 | 0.008 | 5.052 | 3.24 | 30,888 | |

#19 | 0.45 | 0.045 | 0.022 | 0.0234 | 0.576 | 0.038 | 2.272 | 0.050 | 8.752 | 3.59 | 76,145 | |

#20 | 0.45 | 0.059 | 0.030 | 0.0234 | 0.615 | 0.048 | 2.427 | 0.072 | 11.067 | 3.38 | 102,739 | |

Set 3 | #21 | 0.15 | 0.075 | 0.017 | 0.0040 | 0.362 | 0.047 | 0.960 | 0.016 | 1.812 | 1.29 | 39,299 |

#22 | 0.15 | 0.107 | 0.028 | 0.0010 | 0.453 | 0.061 | 0.497 | 0.014 | 0.596 | 0.58 | 26,583 | |

(TVc-3) | #23 | 0.15 | 0.125 | 0.034 | 0.0010 | 0.504 | 0.068 | 0.544 | 0.019 | 0.669 | 0.59 | 32,599 |

#24 | 0.15 | 0.150 | 0.045 | 0.0010 | 0.574 | 0.078 | 0.584 | 0.026 | 0.768 | 0.59 | 40,170 | |

#25 | 0.45 | 0.075 | 0.039 | 0.0010 | 0.662 | 0.060 | 0.514 | 0.020 | 0.583 | 0.64 | 26,845 | |

#26 | 0.15 | 0.073 | 0.016 | 0.0087 | 0.356 | 0.046 | 1.468 | 0.024 | 3.896 | 2.00 | 58,886 | |

#27 | 0.15 | 0.099 | 0.025 | 0.0087 | 0.430 | 0.057 | 1.667 | 0.041 | 4.891 | 2.00 | 84,008 | |

#28 | 0.15 | 0.030 | 0.005 | 0.0145 | 0.235 | 0.023 | 1.296 | 0.007 | 3.272 | 2.58 | 26,146 | |

#29 | 0.15 | 0.075 | 0.017 | 0.0145 | 0.361 | 0.046 | 1.943 | 0.033 | 6.598 | 2.62 | 78,915 | |

#30 | 0.15 | 0.099 | 0.025 | 0.0234 | 0.430 | 0.057 | 2.690 | 0.066 | 13.129 | 3.23 | 135,513 | |

#31 | 0.45 | 0.044 | 0.004 | 0.0234 | 0.574 | 0.007 | 1.679 | 0.007 | 5.264 | 3.13 | 9,997 | |

#32 | 0.45 | 0.058 | 0.006 | 0.0234 | 0.614 | 0.010 | 1.785 | 0.011 | 6.513 | 2.96 | 15,045 |

### Circular channels

*Pd*, was used to standardize the number of measurements along the wetted perimeter. Where data points did not exist in the original series, linear interpolation was used to infer local boundary shear stress from adjacent neighboring points. Figure 5 depicts the boundary shear stress distribution of the cases in the training set.

. | . | 1 . | . | . | 4 . | 5 . | 6 . | . | 8 . | 9 . | 10 . | . | . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

. | ID . | D
. | 2 . | 3 . | A . | S
. _{0} | P
. | 7 . | V . | Q . | . | 11 . | 12 . |

. | . | m . | t/D
. | h/D
. | (m^{2})
. | × 10^{−2}
. | (m) . | R
. | (ms^{−1})
. | (m^{3} s^{−1})
. | (Nm^{−2})
. | Fr . | Re . |

#1 | 0.24 | 0 | 0.33 | 0.014 | 0.1 | 0.300 | 0.045 | 0.394 | 0.005 | 0.441 | 0.52 | 15,687 | |

#2 | 0.24 | 0 | 0.51 | 0.024 | 0.1 | 0.386 | 0.061 | 0.493 | 0.012 | 0.597 | 0.51 | 26,580 | |

#3 | 0.24 | 0 | 0.83 | 0.041 | 0.1 | 0.557 | 0.074 | 0.554 | 0.023 | 0.721 | 0.38 | 36,068 | |

CVc-1 | #4 | 0.24 | 0.25 | 0.15 | 0.008 | 0.196 | 0.078 | 0.106 | 0.403 | 0.003 | 0.545 | 0.70 | 37,377 |

#5 | 0.24 | 0.25 | 0.08 | 0.004 | 0.196 | 0.044 | 0.100 | 0.294 | 0.001 | 0.337 | 0.67 | 25,865 | |

#6 | 0.24 | 0.25 | 0.25 | 0.014 | 0.862 | 0.127 | 0.111 | 1.283 | 0.018 | 3.538 | 1.70 | 125,381 | |

#7 | 0.24 | 0.25 | 0.42 | 0.024 | 0.862 | 0.210 | 0.114 | 1.625 | 0.039 | 4.804 | 1.59 | 162,219 | |

#8 | 0.24 | 0.25 | 0.55 | 0.031 | 0.196 | 0.282 | 0.109 | 0.775 | 0.024 | 1.198 | 0.63 | 74,130 | |

#9 | 0.24 | 0.33 | 0.17 | 0.010 | 0.2 | 0.083 | 0.117 | 0.449 | 0.004 | 0.612 | 0.72 | 46,203 | |

#10 | 0.24 | 0.33 | 0.33 | 0.020 | 0.2 | 0.166 | 0.117 | 0.625 | 0.012 | 0.967 | 0.69 | 64,359 | |

#11 | 0.24 | 0.33 | 0.47 | 0.027 | 0.2 | 0.241 | 0.110 | 0.833 | 0.022 | 1.106 | 0.72 | 80,571 | |

#12 | 0.24 | 0.5 | 0.16 | 0.009 | 0.9 | 0.081 | 0.117 | 0.886 | 0.008 | 2.571 | 1.40 | 91,194 | |

CVc-2 | #13 | 0.24 | 0.5 | 0.25 | 0.014 | 0.88 | 0.126 | 0.111 | 1.143 | 0.016 | 3.341 | 1.42 | 111,576 |

. | . | 1 . | . | . | 4 . | 5 . | 6 . | . | 8 . | 9 . | 10 . | . | . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

. | ID . | D
. | 2 . | 3 . | A . | S
. _{0} | P
. | 7 . | V . | Q . | . | 11 . | 12 . |

. | . | m . | t/D
. | h/D
. | (m^{2})
. | × 10^{−2}
. | (m) . | R
. | (ms^{−1})
. | (m^{3} s^{−1})
. | (Nm^{−2})
. | Fr . | Re . |

#1 | 0.24 | 0 | 0.33 | 0.014 | 0.1 | 0.300 | 0.045 | 0.394 | 0.005 | 0.441 | 0.52 | 15,687 | |

#2 | 0.24 | 0 | 0.51 | 0.024 | 0.1 | 0.386 | 0.061 | 0.493 | 0.012 | 0.597 | 0.51 | 26,580 | |

#3 | 0.24 | 0 | 0.83 | 0.041 | 0.1 | 0.557 | 0.074 | 0.554 | 0.023 | 0.721 | 0.38 | 36,068 | |

CVc-1 | #4 | 0.24 | 0.25 | 0.15 | 0.008 | 0.196 | 0.078 | 0.106 | 0.403 | 0.003 | 0.545 | 0.70 | 37,377 |

#5 | 0.24 | 0.25 | 0.08 | 0.004 | 0.196 | 0.044 | 0.100 | 0.294 | 0.001 | 0.337 | 0.67 | 25,865 | |

#6 | 0.24 | 0.25 | 0.25 | 0.014 | 0.862 | 0.127 | 0.111 | 1.283 | 0.018 | 3.538 | 1.70 | 125,381 | |

#7 | 0.24 | 0.25 | 0.42 | 0.024 | 0.862 | 0.210 | 0.114 | 1.625 | 0.039 | 4.804 | 1.59 | 162,219 | |

#8 | 0.24 | 0.25 | 0.55 | 0.031 | 0.196 | 0.282 | 0.109 | 0.775 | 0.024 | 1.198 | 0.63 | 74,130 | |

#9 | 0.24 | 0.33 | 0.17 | 0.010 | 0.2 | 0.083 | 0.117 | 0.449 | 0.004 | 0.612 | 0.72 | 46,203 | |

#10 | 0.24 | 0.33 | 0.33 | 0.020 | 0.2 | 0.166 | 0.117 | 0.625 | 0.012 | 0.967 | 0.69 | 64,359 | |

#11 | 0.24 | 0.33 | 0.47 | 0.027 | 0.2 | 0.241 | 0.110 | 0.833 | 0.022 | 1.106 | 0.72 | 80,571 | |

#12 | 0.24 | 0.5 | 0.16 | 0.009 | 0.9 | 0.081 | 0.117 | 0.886 | 0.008 | 2.571 | 1.40 | 91,194 | |

CVc-2 | #13 | 0.24 | 0.5 | 0.25 | 0.014 | 0.88 | 0.126 | 0.111 | 1.143 | 0.016 | 3.341 | 1.42 | 111,576 |

## METHODOLOGY

The initial step in using RPs and Eigenface Recognition for predicting boundary shear stress distribution is forming a training set from available experimental data. As mentioned in the previous section, the raw experimental data sets typically consist of a number of local boundary sheer stress measurements, taken along the wetted perimeter of a channel, and presented as a data series. If measurements are not taken at the same relative locations across the different channels, then, to make the raw data suitable for use in the data mining algorithm, interpolation is carried out to find shear stresses at the same relative distances along the wetted perimeter, for all data series.

As the Eigenfaces for Recognition algorithm accepts two-dimensional arrays of numbers, the original (one-dimensional) data series in the training set has to be pre-processed. This can be done through the RP algorithm. To this end, the dimension of the vectors identified in Equation (4) is set to be equal to 1, so that each vector would contain numerical differences of shear stress between consecutive points in the data series, effectively resulting in each row of the RP matrix becoming a normalized version of the original data series. In line with the image recognition algorithm, once the RP matrix is formed, Equation (6) is used to construct a unique Г* _{i}* vector representing the

*i*-th experimental data set.

*ϑ*, and consequently, the difference between each image in the training set and the average face,

*ϕ*

_{i}, are found by using Equation (8). Then, performing PCA, the eigenvectors that characterize the face space are computed, and consequently, the set of weights, Ω, are determined by using Equation (14). Figure 7 outlines the steps involved for encoding the training data sets and obtaining the vectors of weights, Ω.

Trapezoidal channels: 2

*b*/*h*, 2*bh*/*A*, Fr, ReCircular channels:

*(h**+**t*)/*D*,*Q*/(*VD*^{2}), Fr, Re

*h*is the water depth,

*V*is the mean velocity,

*Q*represents discharge, and Fr and Re are the Froude and Reynolds numbers, respectively. The bottom width for trapezoidal channels is represented by 2b whilst t/D is the base height to diameter ratio for circular channels. In order to relate weights and non-dimensional attributes, a simple regression model can be established:where

*ω*

_{i}represents a regression estimation of the

*i*-th weighting factor,

*x*

_{ij}is the

*j*-th non-dimensional hydraulic/geometric parameter of the

*i-th*training experiment and

*β*are regression parameters. The solution to Equation (17) is given by:where is the best estimator vector of the target

_{j}*β*, factors

**represents the matrix of hydraulic/geometric non-dimensional parameters, and**

*X**Ω*is the vector of target weighting factors. Once the set of

*β*factors is determined, the prediction model can be established. After investigating a number of non-dimensional attributes, the ones listed above were found to strongly influence the established relationship with the target weighting factors.

Figure 7 shows the steps involved in the process of encoding the original data series. This process can be reversed to obtain a new set of weighting values, *ω _{ɩ}*, e.g. for test cases not included in the training set. Equation (17) enables finding those weighting factors which can then be stored in the short-dimension vector (

*Ω*′). Following, the vector

*Γ′*can be predicted by applying Equation (16).

To help the reader better understand the entire modelling process, a simple step-by-step guide to using the proposed approach is presented in Appendix B (available with the online version of this paper). Furthermore, a copy of the code written in C ++ is available at: http://shear-stress-using-face-recog.sourceforge.net.

## RESULTS

### Trapezoidal channels

*β*regression parameters were then obtained using Equation (18) and the vector of estimated weights for the unseen test case, Ω′ was constructed. All weighting and

*β*factors are provided in Appendix C (available with the online version of this paper). Subsequently, Equation (16) was used to obtain the Г′ vector, and its elements were transformed to find non-dimensionalized shear stress values across the channel. It should be noted that the first component of the Г′ vector, i.e.

*e*

_{11}, which corresponds to the boundary shear stress value at

*Pd*= 0 is always zero. The boundary shear stress at this point was obtained through a separate linear regression between the predicted values of local shear stresses:where

*τ*represents a regression estimation of the

_{0i}*i*-th shear stress at

*Pd*= 0,

*x*

_{ij}is the

*j*-th non-dimensional hydraulic/geometric parameter of the

*i-th*training experiment and

*β*are regression parameters.

_{j}Finally, the predicted series were rescaled by multiplying their ordinates by the estimated average shear stress obtained by the Slope method (Equation (1)). For practical purposes, this parameter could be estimated using any other prediction model, such as the ones suggested by Knight (1981), Knight *et al.* (1984, 1994) and Flintham & Carling (1988).

_{SKM}is the mean square error for the SKM model. The table also provides the ratio of those divergence parameters with respect to the average stress, in addition to the relative location of

*Δτ*along the wetted perimeter covering an interval [0-1], with 0 and 1 corresponding to the utmost left and right edges of the wetted perimeter, respectively. Furthermore, the cross correlation, , between the observed and predicted time series is presented for each case:where is the average shear stress and

_{peak}*σ*is the corresponding standard deviation.

Validation case . | Fr . | MSE
. | |Δτ| (Nm^{−2})
. | . | |Δτ_{max}| (Nm^{−2})
. | . | Relative location of |Δτ_{max}|
. | ρ_{τ}
. | MSE_{SKM}
. |
---|---|---|---|---|---|---|---|---|---|

TVc-1 | 3.2701 | 0.0448 | 0.1404 | 0.0204 | 0.7682 | 0.1118 | 1 | 0.994 | 0.6589 |

TVc-2 | 2.1934 | 0.0088 | 0.0694 | 0.0215 | 0.2340 | 0.0725 | 0.946 | 0.984 | 0.1107 |

TVc-3 | 0.5926 | 0.0004 | 0.0167 | 0.0249 | 0.0641 | 0.0958 | 1 | 0.975 | 0.0008 |

Validation case . | Fr . | MSE
. | |Δτ| (Nm^{−2})
. | . | |Δτ_{max}| (Nm^{−2})
. | . | Relative location of |Δτ_{max}|
. | ρ_{τ}
. | MSE_{SKM}
. |
---|---|---|---|---|---|---|---|---|---|

TVc-1 | 3.2701 | 0.0448 | 0.1404 | 0.0204 | 0.7682 | 0.1118 | 1 | 0.994 | 0.6589 |

TVc-2 | 2.1934 | 0.0088 | 0.0694 | 0.0215 | 0.2340 | 0.0725 | 0.946 | 0.984 | 0.1107 |

TVc-3 | 0.5926 | 0.0004 | 0.0167 | 0.0249 | 0.0641 | 0.0958 | 1 | 0.975 | 0.0008 |

### Circular channels

Validation case . | Fr . | MSE
. | |Δ̄τ| (Nm^{−2})
. | . | |Δτ_{max}| (Nm^{−2})
. | . | Relative location of |Δτ_{max}|
. | ρ_{τ}
. |
---|---|---|---|---|---|---|---|---|

CVc-1 | 0.696 | 0.0012 | 0.0245 | 0.0448 | 0.1239 | 0.2264 | 1 | 0.922 |

CVc-2 | 1.420 | 0.0494 | 0.1710 | 0.0512 | 0.7048 | 0.2110 | 1 | 0.944 |

Validation case . | Fr . | MSE
. | |Δ̄τ| (Nm^{−2})
. | . | |Δτ_{max}| (Nm^{−2})
. | . | Relative location of |Δτ_{max}|
. | ρ_{τ}
. |
---|---|---|---|---|---|---|---|---|

CVc-1 | 0.696 | 0.0012 | 0.0245 | 0.0448 | 0.1239 | 0.2264 | 1 | 0.922 |

CVc-2 | 1.420 | 0.0494 | 0.1710 | 0.0512 | 0.7048 | 0.2110 | 1 | 0.944 |

## SUMMARY AND CONCLUSIONS

RP analysis and Eigenface for Recognition were used to predict the distribution of boundary shear stress in trapezoidal and circular channels. In this approach, first, the RPs of all training set members are constructed and the differences between them and the average RPs are computed. PCA is then performed and weight factors proportional to the eigenvectors are obtained. To obtain predictions of boundary shear stress, a simple regression equation is established to relate the weight factors to non-dimensional attributes of the training set's hydraulic and geometric characteristics. For each validation case, corresponding weights are obtained by the regression equations, and the reverse of the process is performed to obtain the distribution of the boundary shear stress.

The method was applied to two trapezoidal data sets and one circular data set. The results showed that: The technique is capable of capturing the intrinsic patterns of the RPs which makes it suitable for the prediction of shear stress distributions; the method is valid for both sub- and supercritical flow condition; the average error obtained across all predicted series is 2.09% and the cross correlation is within 92% of accuracy for all trapezoidal and circular verification cases.

The accuracy of the predictions was found to be somewhat higher for trapezoidal channels compared to circular channels. This can be a reflection of the consistency of the input information which in the case of circular channels is less, i.e. the distributions are less uniform. The variation of the shape of the wetted section with the increase of the water level appears to be the reason of such variability.

The present investigation was based on a database formed by a limited number of experimental test cases, particularly for the case of circular sections. Nevertheless, the prediction results were satisfactory. The robustness of the methodology should be further tested with a larger training database containing further combinations of hydraulic parameters and section dimensions, and additional validation cases. This would help to ensure the generality of the weighting factors, and therefore, the overall accuracy of the prediction models. Based on the analysis presented here it is clear that the method works for a relatively low number of input data series which in this research ranged between 5 and 13 data series in the clusters. Furthermore, the linear regression models were demonstrated to be adequate estimators for the relatively smooth bed shear stresses studied, as they were able to capture the rates of shear stress variation with accuracy. It is notable that such simple estimators might not be accurate when modelling more complex configurations and patterns, and a more robust estimator (e.g. artificial neural network) may be more suitable depending on the degree of non-linearity observed in the input data.