CHAPTER THREE
1 METHODOLOGY
This chapter introduces the two intelligent methods used in this project, the artificial neural network and fuzzy ARTMAP. A summary of the case study is then presented. Finally, the overall process of the project is illustrated in a flowchart.
1.1 Time Series Prediction
A time series is a sequence of data points measured over time, such as vectors x(t), t = 0, 1, . . . , where t represents elapsed time. For simplicity, only sequences of scalars are considered here, although the techniques considered generalize readily to vector series. Theoretically, x may be a value which varies continuously with t, such as a temperature. In practice, for any given physical system, x will be sampled to give a series of discrete data points, equally spaced in time [32].
A data sequence is created by measuring the input and output parameters of a process at discrete, regular time intervals. The process may receive an input at every time step, which affects its future behavior, or it may be purely generative, receiving no input at all. The choice of model depends on the data that are available. Some popular linear models are the Finite Impulse Response (FIR), AutoRegressive eXogenous (ARX), and AutoRegressive Moving Average eXogenous (ARMAX) models [33]. The FIR model is the simplest input-output relationship, with a polynomial in the backward shift operator q^{-1} for the input and a white noise error term. The ARX model is obtained when an AutoRegressive part is added to the FIR model. A further extension is obtained when the error term is modeled as a moving average of white noise; because of this Moving Average part, it is called an ARMAX model. The structure of the ARX model is described in the following.
ARX Model: This simple input–output relationship, extended with a noise term, is given by the linear difference equation
$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) + e(t)$ (3.1)
Rewriting Equation (3.1) in transfer-function form gives:
$A(q)\, y(t) = B(q)\, u(t) + e(t)$ (3.2)
where
$A(q) = \sum_{k=0}^{n_a} a_k q^{-k} = a_0 + a_1 q^{-1} + a_2 q^{-2} + \dots + a_{n_a} q^{-n_a}$, with $a_0 = 1$,
and
$B(q) = \sum_{k=1}^{n_b} b_k q^{-k} = b_1 q^{-1} + b_2 q^{-2} + \dots + b_{n_b} q^{-n_b}$.
The adjustable parameters in this structure are:
$\theta = [\, a_1 \;\; a_2 \;\; \dots \;\; a_{n_a} \;\; b_1 \;\; b_2 \;\; \dots \;\; b_{n_b} \,]^T$ (3.3)
Notice that Equation (3.2) has an AutoRegressive part A(q)y(t) and an eXogenous part B(q)u(t). Therefore, this model structure is referred to as an ARX model, which can be rewritten in explicit form as
$y(t) = \dfrac{B(q)}{A(q)}\, u(t) + \dfrac{1}{A(q)}\, e(t)$ (3.4)
More specifically, ARX model structures are also denoted as ARX(n_{a}, n_{b}, n_{k}), where n_{k} indicates the number of sampling intervals related to dead time. Consequently, in the case of dead time, b_{1} = ··· = b_{n_{k}} = 0.
A special case is obtained when n_{a} = 0, which reduces the ARX to an FIR model structure.
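The parameter vector of Equation (3.3) can be estimated by ordinary least squares, because Equation (3.1) is linear in the parameters. The following is a minimal sketch of that idea, not the procedure used in this project: the function name `fit_arx`, the synthetic data, and the choice of no additional dead time are assumptions made for illustration.

```python
import numpy as np

def fit_arx(u, y, na, nb):
    """Least-squares estimate of theta = [a_1..a_na, b_1..b_nb] in Equation (3.1)."""
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_y = [-y[t - k] for k in range(1, na + 1)]  # -y(t-1), ..., -y(t-na)
        past_u = [u[t - k] for k in range(1, nb + 1)]   # u(t-1), ..., u(t-nb)
        rows.append(past_y + past_u)
    phi = np.array(rows)                                # regression matrix
    theta, *_ = np.linalg.lstsq(phi, y[start:], rcond=None)
    return theta

# Synthetic, noise-free check: y(t) - 0.5 y(t-1) = 2 u(t-1),
# which corresponds to a_1 = -0.5 and b_1 = 2 in Equation (3.1).
rng = np.random.default_rng(0)
u = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 2.0 * u[t - 1]

theta = fit_arx(u, y, na=1, nb=1)
```

Because the synthetic data contain no noise term e(t), the estimate recovers the generating coefficients; with measured data the estimate would only approximate them.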
The objective is to forecast future values of the time series from values of x up to the current time. Formally, this can be stated as: find a function F: R^{N} → R that predicts x at time t + d from the N time samples of x from time t − N + 1 to t. In this project, the ARX model structure is combined with a neural network to form a nonlinear model, which is used to monitor the condition of a gas turbine.
1.2 Artificial Neural Network
Artificial Intelligence (AI), or computational intelligence, refers to computer programs that demonstrate intelligent behavior. The purpose is to artificially reproduce the brain's capacity to learn, draw conclusions, plan, and solve problems.
An Artificial Neural Network (ANN) is a nonlinear data modeling tool that mimics the neural structure of the human brain and learns from examples. ANNs can be used for a variety of tasks, including classification, regression, and general estimation problems. An ANN consists of a group of interconnected artificial neurons that process information in parallel [34].
1.2.1 Model of a Neuron
A neuron is an information processing unit that is fundamental to the operation of a neural network. Figure 3.1 shows the model of a neuron, which forms the basis for designing artificial neural networks. The four basic elements of the neuron model are the weights (w_{kj}), the bias (b_{k}), the adder (which sums the weighted input signals), and the activation function. A signal x_{j} at the input connected to neuron k is multiplied by the weight w_{kj}. The first subscript refers to the neuron and the second subscript refers to the input.
In mathematical terms, we may describe the neuron k depicted in Figure 3.1 by writing the pair of equations:
$v_k = \sum_{j=0}^{m} w_{kj}\, x_j$ (3.5)
and
$y_k = \varphi(v_k)$ (3.6)
where x_{1}, x_{2}, …, x_{m} are the input signals; w_{k1}, w_{k2}, …, w_{km} are the respective weights of neuron k; b_{k} (w_{k0}) is the bias and its input is x_{0 }= +1; v_{k} is the linear combiner output due to the input signals and bias input; φ(.) is the activation function; and y_{k} is the output signal of the neuron.
Figure 3.1. Nonlinear model of a neuron [35].
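Equations (3.5) and (3.6) amount to a weighted sum followed by an activation function. The following minimal sketch illustrates this; the function name `neuron_output`, the example numbers, and the choice of tanh as the activation function are assumptions made for illustration.

```python
import numpy as np

def neuron_output(x, w, phi=np.tanh):
    """Output y_k of a single neuron.

    x : input signals x_1..x_m
    w : weights w_k0..w_km, where w_k0 is the bias b_k
    """
    x_aug = np.concatenate(([1.0], x))  # prepend the fixed input x_0 = +1
    v = np.dot(w, x_aug)                # linear combiner output, Equation (3.5)
    return phi(v)                       # activation, Equation (3.6)

# Bias 0.1 and weights (0.4, 0.3) applied to inputs (0.5, -1.0):
# v = 0.1 + 0.4 * 0.5 + 0.3 * (-1.0), which is zero up to rounding.
y_k = neuron_output(np.array([0.5, -1.0]), np.array([0.1, 0.4, 0.3]))
```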
1.2.2 Types of Activation Function
The activation function, denoted by φ(v), defines the output of a neuron in terms of the induced local field v. It is used to limit the amplitude of the output of a neuron. The most commonly used activation functions for neural networks are the threshold function, the linear function, the sigmoid function, and the hyperbolic tangent function (tanh), whose graphs and equations are given in Table 3.1.
Table 3.1. Most commonly used activation functions.
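Activation function  Equation
Threshold function  $\varphi(v) = 1$ if $v \ge 0$; $\varphi(v) = 0$ if $v < 0$
Linear function  $\varphi(v) = \mathrm{purelin}(v) = v$
Sigmoid function  $\varphi(v) = \dfrac{1}{1 + \exp(-a v)}$
Hyperbolic tangent function  $\varphi(v) = \dfrac{2}{1 + \exp(-2 a v)} - 1$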
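The four activation functions of Table 3.1 can be written directly in code. This is a minimal sketch; the function names and the default slope parameter a = 1 are assumptions made for illustration.

```python
import numpy as np

def threshold(v):
    """Threshold function: 1 if v >= 0, otherwise 0."""
    return np.where(np.asarray(v) >= 0, 1.0, 0.0)

def linear(v):
    """Linear function: output equals the induced local field."""
    return np.asarray(v, dtype=float)

def sigmoid(v, a=1.0):
    """Sigmoid function with slope parameter a."""
    return 1.0 / (1.0 + np.exp(-a * np.asarray(v)))

def tanh_activation(v, a=1.0):
    """Hyperbolic tangent in the form given in Table 3.1."""
    return 2.0 / (1.0 + np.exp(-2.0 * a * np.asarray(v))) - 1.0

v = np.array([-2.0, 0.0, 2.0])
outputs = {
    "threshold": threshold(v),
    "linear": linear(v),
    "sigmoid": sigmoid(v),
    "tanh": tanh_activation(v),
}
```

For a = 1, the last expression is algebraically identical to the standard tanh(v), which provides a simple consistency check on the form given in the table.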
1.2.3 Learning process
Learning is an iterative process that involves modifying the strength of the connections between the elements. The two main learning paradigms are:
• Supervised learning, in which both the inputs and the desired outputs are known. The network can therefore measure its predictive performance for given inputs; in other words, the network learns with a teacher.
• Unsupervised learning, in which the targets are unknown and the ANN has to find the underlying relationships within the data set by itself (without a teacher) and build clusters of data.
Supervised learning is used for classification and regression tasks, whereas unsupervised learning is more suitable for data clustering, compression, and filtering tasks [34].
1.2.4 Network Architectures
The manner in which the neurons of a neural network are structured is intimately linked with the learning algorithm used to train the network. Two fundamentally different types of network architecture are considered here:
1.2.4.1 Multilayer Feedforward Networks
This class of feedforward neural network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons. The term hidden refers to the fact that this part of the neural network is not seen directly from either the input or the output of the network. The function of hidden neurons is to intervene between the external input and the network output in some useful manner. By adding one or more hidden layers, the network is enabled to extract higher-order statistics from its input. Multilayer feedforward networks are often called MultiLayer Perceptrons (MLP) because of their similarity to the Perceptron.
The source nodes in the input layer of the network supply respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network. Typically, the neurons in each layer of the network have as their inputs the output signals of the preceding layer only. The set of output signals of the neurons in the output (final) layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer. The architectural graph in Figure 3.2 illustrates the layout of a multilayer feedforward neural network for the case of a single hidden layer. For the sake of brevity, the network in Figure 3.2 is referred to as a 10–4–2 network because it has 10 source nodes, four hidden neurons, and two output neurons [35].
An MLP can have more than one hidden layer, but it has been proven that a single hidden layer is enough to approximate any continuous function, provided that this layer has a sufficient number of units and that the transfer functions of these units are nonlinear. Finding the sufficient number of hidden neurons is a trial-and-error process [34].
Figure 3.2. Fully connected feedforward network with one hidden layer.
The neural network in Figure 3.2 is said to be fully connected in the sense that every node in each layer of the network is connected to every other node in the adjacent forward layer. If, however, some of the communication links are missing from the network, the network is partially connected.
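The signal flow through a fully connected 10–4–2 network can be sketched as two matrix products. This is only an illustration: the random weights, the tanh hidden activation, and the linear output layer are assumptions, and the names `W_hidden`, `W_output`, and `mlp_forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each weight matrix has one extra column for the bias (fixed input +1).
W_hidden = rng.normal(scale=0.1, size=(4, 11))  # 4 hidden neurons, 10 inputs + bias
W_output = rng.normal(scale=0.1, size=(2, 5))   # 2 output neurons, 4 hidden + bias

def mlp_forward(x, W_hidden, W_output):
    """Forward pass of a fully connected network with one hidden layer."""
    hidden = np.tanh(W_hidden @ np.concatenate(([1.0], x)))  # hidden neuron outputs
    return W_output @ np.concatenate(([1.0], hidden))        # linear output neurons

x = rng.uniform(size=10)          # activation pattern from the 10 source nodes
y = mlp_forward(x, W_hidden, W_output)
```

Every entry of each weight matrix corresponds to one link of the fully connected network; a partially connected network would correspond to fixing some of these entries at zero.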
1.2.4.2 Recurrent Networks
A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. For the purpose of time series prediction, a neural network can be thought of as a general nonlinear mapping between a subset of the past time series values and the future time series values. The recurrent neural network models considered here are the Jordan Recurrent Neural Network (JRNN), the Elman Recurrent Neural Network (ERNN), and the Fully Connected Recurrent Neural Network (FCRNN), or extended Recurrent Neural Network.
The Jordan network is a multilayered network in which there are feedback connections from the output layer to the context layer. In the structure depicted in Figure 3.3, there are no self-feedback loops in the network [27].
Figure 3.3. Recurrent network with no self-feedback loops and no hidden neurons.
The Elman network is another class of recurrent neural networks which has three layers with feedback connections from the hidden layer to the context layer. In Figure 3.4 the structure of this model is illustrated [27].
The presence of feedback loops, whether in the recurrent structure of Figure 3.3 or in that of Figure 3.4, has a profound impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-time delay elements (denoted by z^{-1}), which result in a nonlinear dynamic behavior, assuming that the neural network contains nonlinear units.
Figure 3.4. Recurrent network with hidden neurons.
In an FCRNN, each neuron receives feedback from all the neurons. That is, the outputs of all the neurons in the current iteration are fed back to each neuron in the next iteration. Self-feedback refers to a situation in which the output of a neuron is fed back into its own input.
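The role of the context layer and the unit-time delay can be illustrated with one time step of an Elman-type network, in which the previous hidden outputs are fed back as additional inputs. This is a minimal sketch with random weights; the function name `elman_step`, the layer sizes, and the omission of bias terms are assumptions made for brevity.

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, W_out):
    """One time step of an Elman-type recurrent network.

    context holds the hidden outputs of the previous step (unit delay z^-1).
    """
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = W_out @ hidden
    return output, hidden            # hidden becomes the next context

rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.5, size=(3, 2))    # 2 inputs -> 3 hidden neurons
W_ctx = rng.normal(scale=0.5, size=(3, 3))   # context layer -> hidden neurons
W_out = rng.normal(scale=0.5, size=(1, 3))   # hidden neurons -> 1 output

context = np.zeros(3)                        # empty context at the first step
outputs = []
for x in rng.normal(size=(5, 2)):            # a short input sequence
    y, context = elman_step(x, context, W_in, W_ctx, W_out)
    outputs.append(float(y[0]))
```

A Jordan-type network would follow the same pattern, except that the value stored in the context would be the previous network output rather than the previous hidden outputs.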
1.2.5 Back Propagation Algorithm
The back propagation algorithm is a supervised training method for multilayer perceptrons. To describe this algorithm, consider Figure 3.5, which depicts neuron j being fed by a set of function signals produced by a layer of neurons to its left.
Figure 3.5. Signal-flow graph highlighting the details of output neuron j [35].
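The induced local field v_{j}(n) produced at the input of the activation function associated with neuron j is therefore
$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$ (3.7)
where m is the total number of inputs (excluding the bias) applied to neuron j. The synaptic weight w_{j0} (corresponding to the fixed input y_{0} = +1) equals the bias b_{j} applied to neuron j. Hence, the function signal y_{j}(n) appearing at the output of neuron j at iteration n is
$y_j(n) = \varphi_j(v_j(n))$ (3.8)
The error signal at the output of neuron j is the difference between the desired response d_{j}(n) and the function signal y_{j}(n),
$e_j(n) = d_j(n) - y_j(n)$ (3.9)
and the instantaneous error energy, summed over the output neurons j, is
$E(n) = \dfrac{1}{2} \sum_{j} e_j^2(n)$ (3.10)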
The backpropagation algorithm applies a correction Δw_{ji}(n) to the weight w_{ji}(n), which is proportional to the partial derivative ∂E(n)/∂w_{ji}(n). According to the chain rule of calculus, this gradient is expressed as
$\dfrac{\partial E(n)}{\partial w_{ji}(n)} = \dfrac{\partial E(n)}{\partial e_j(n)}\, \dfrac{\partial e_j(n)}{\partial y_j(n)}\, \dfrac{\partial y_j(n)}{\partial v_j(n)}\, \dfrac{\partial v_j(n)}{\partial w_{ji}(n)}$ (3.11)
The partial derivative ∂E(n)/∂w_{ji}(n) represents a sensitivity factor, determining the direction of search in weight space for the synaptic weight w_{ji}.
Differentiating both sides of Equation (3.10) with respect to e_{j}(n), we get:
$\dfrac{\partial E(n)}{\partial e_j(n)} = e_j(n)$ (3.12)
Differentiating both sides of Equation (3.9) with respect to y_{j}(n), we get
$\dfrac{\partial e_j(n)}{\partial y_j(n)} = -1$ (3.13)
Next, differentiating Equation (3.8) with respect to v_{j}(n), we get
$\dfrac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'(v_j(n))$ (3.14)
where the use of the prime (on the right-hand side) signifies differentiation with respect to the argument. Finally, differentiating Equation (3.7) with respect to w_{ji}(n) yields
$\dfrac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$ (3.15)
The use of Equations (3.12) to (3.15) in Equation (3.11) yields
$\dfrac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\, \varphi_j'(v_j(n))\, y_i(n)$ (3.16)
The correction Δw_{ji}(n) applied to w_{ji}(n) is defined by the delta rule, or
$\Delta w_{ji}(n) = -\eta\, \dfrac{\partial E(n)}{\partial w_{ji}(n)}$ (3.17)
where η is the learning-rate parameter of the backpropagation algorithm. The use of the minus sign in Equation (3.17) accounts for gradient descent in weight space (i.e., seeking a direction for weight change that reduces the value of E(n)). Accordingly, the use of Equation (3.16) in Equation (3.17) yields
$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$ (3.18)
where the local gradient δ_{j}(n) is defined by
$\delta_j(n) = -\dfrac{\partial E(n)}{\partial v_j(n)} = -\dfrac{\partial E(n)}{\partial e_j(n)}\, \dfrac{\partial e_j(n)}{\partial y_j(n)}\, \dfrac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\, \varphi_j'(v_j(n))$ (3.19)
The local gradient points to required changes in synaptic weights. According to Equation (3.19), the local gradient δ_{j}(n) for output neuron j is equal to the product of the corresponding error signal e_{j}(n) for that neuron and the derivative φˊ_{j}(v_{j}(n)) of the associated activation function.
The gradient-based learning method is illustrated in Figure 3.6.
Figure 3.6. A gradient-based updating of the weights [34].
From Equations (3.18) and (3.19), we note that a key factor involved in the calculation of the weight adjustment Δw_{ji}(n) is the error signal e_{j}(n) at the output of neuron j. Two distinct cases are identified, depending on where in the network neuron j is located.
• Case 1: Neuron j is an output node
When neuron j is located in the output layer of the network, it is supplied with a desired response of its own. We may use Equation (3.9) to compute the error signal e_{j}(n) associated with this neuron. Having determined e_{j}(n), it is a straightforward matter to compute the local gradient δ_{j}(n) by using Equation (3.19).
• Case 2: Neuron j is a hidden node
When neuron j is located in a hidden layer of the network, there is no specified desired response for that neuron. Accordingly, the error signal for a hidden neuron has to be determined recursively, working backwards in terms of the error signals of all the neurons to which that hidden neuron is directly connected.
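The two cases can be sketched for a network with one tanh hidden layer and linear output neurons. The output-layer update follows Equations (3.9), (3.18), and (3.19); for the hidden layer, the sketch uses the standard backpropagation result that the local gradient of a hidden neuron is its activation derivative multiplied by the weighted sum of the local gradients of the neurons it feeds. That formula is not derived in this section and is stated here as an assumption, as are the function names, network sizes, and learning rate.

```python
import numpy as np

def forward(x, W1, W2):
    y0 = np.concatenate(([1.0], x))      # inputs with the fixed bias input +1
    h = np.tanh(W1 @ y0)                 # hidden neuron outputs
    y1 = np.concatenate(([1.0], h))      # hidden outputs with bias input
    out = W2 @ y1                        # linear output neurons
    return y0, h, y1, out

def backprop_step(x, d, W1, W2, eta=0.1):
    """One weight update for a single training example (x, d)."""
    y0, h, y1, out = forward(x, W1, W2)
    e = d - out                                          # Equation (3.9)
    delta_out = e                                        # Equation (3.19), phi' = 1
    # Case 2 (assumed standard result): propagate local gradients backwards.
    delta_hid = (1.0 - h**2) * (W2[:, 1:].T @ delta_out)
    W2_new = W2 + eta * np.outer(delta_out, y1)          # Equation (3.18)
    W1_new = W1 + eta * np.outer(delta_hid, y0)          # Equation (3.18)
    return W1_new, W2_new, 0.5 * float(np.sum(e**2))     # Equation (3.10)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))   # 3 hidden neurons, 2 inputs + bias
W2 = rng.normal(scale=0.5, size=(1, 4))   # 1 output neuron, 3 hidden + bias
x = np.array([0.2, -0.4])
d = np.array([0.7])

errors = []
for _ in range(50):
    W1, W2, E = backprop_step(x, d, W1, W2)
    errors.append(E)
```

The list `errors` records E(n) before each update, so comparing its first and last entries shows whether the repeated corrections reduce the error on this example.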
1.2.6 Levenberg–Marquardt Method
The Levenberg–Marquardt (LM) method is a compromise between the following two methods:
• Newton’s method, which converges rapidly near a local or global minimum, but may also diverge.
• Gradient Descent (GD), which is assured of convergence through a proper selection of the step-size parameter, but converges slowly.
To be specific, consider the optimization of a second-order function F(w), and let g be its gradient vector and H be its Hessian. According to the Levenberg–Marquardt method, the optimum adjustment Δw applied to the parameter vector w is defined by
$\Delta w = -[H + \lambda I]^{-1} g$ (3.20)
where I is the identity matrix of the same dimensions as H and λ is a regularizing, or loading, parameter that forces the sum matrix (H + λI) to be positive definite and safely well-conditioned throughout the computation.
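With this background, consider a multilayer perceptron with a single output neuron. The network is trained by minimizing the cost function
$E_{av}(w) = \dfrac{1}{2N} \sum_{i=1}^{N} \left[ d(i) - F(x(i); w) \right]^2$ (3.21)
where [x(i), d(i)] for i = 1, …, N is the training sample and F(x(i); w) is the approximating function realized by the network; the synaptic weights of the network are arranged in some orderly manner to form the weight vector w. The gradient and the Hessian of the cost function E_{av}(w) are respectively defined by
$g(w) = \dfrac{\partial E_{av}(w)}{\partial w} = -\dfrac{1}{N} \sum_{i=1}^{N} \left[ d(i) - F(x(i); w) \right] \dfrac{\partial F(x(i); w)}{\partial w}$ (3.22)
and
$H(w) = \dfrac{\partial^2 E_{av}(w)}{\partial w^2} = \dfrac{1}{N} \sum_{i=1}^{N} \left[ \dfrac{\partial F(x(i); w)}{\partial w} \right] \left[ \dfrac{\partial F(x(i); w)}{\partial w} \right]^T - \dfrac{1}{N} \sum_{i=1}^{N} \left[ d(i) - F(x(i); w) \right] \dfrac{\partial^2 F(x(i); w)}{\partial w^2}$ (3.23)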
Thus, substituting Equations (3.22) and (3.23) into Equation (3.20), the desired adjustment Δw is computed for each iteration of the Levenberg–Marquardt algorithm.
However, from a practical perspective, the computational complexity of Equation (3.23) can be demanding, particularly when the dimensionality of the weight vector w is high; the computational difficulty is attributed to the complex nature of the Hessian H(w). To mitigate this difficulty, the recommended procedure is to ignore the second term on the right-hand side of Equation (3.23), thereby approximating the Hessian simply as
$H(w) \approx \dfrac{1}{N} \sum_{i=1}^{N} \left[ \dfrac{\partial F(x(i); w)}{\partial w} \right] \left[ \dfrac{\partial F(x(i); w)}{\partial w} \right]^T$ (3.24)
This approximation is recognized as the outer product of the partial derivative ∂F(x(i); w)/∂w with itself, averaged over the training sample; accordingly, it is referred to as the outer-product approximation of the Hessian. The use of this approximation is justified when the Levenberg–Marquardt algorithm is operating in the neighborhood of a local or global minimum.
Clearly, the approximate version of the Levenberg–Marquardt algorithm, based on the gradient vector of Equation (3.22) and the Hessian of Equation (3.24), is a first-order method of optimization that is well suited for nonlinear least-squares estimation problems. Moreover, because both of these equations involve averaging over the training sample, the algorithm is of a batch form.
The regularizing parameter λ plays a critical role in the way the Levenberg–Marquardt algorithm functions. If we set λ equal to zero, then the formula of Equation (3.20) reduces to Newton’s method. On the other hand, if we assign a large value to λ such that λI overpowers the Hessian H, the Levenberg–Marquardt algorithm functions effectively as a gradient descent method. From these two observations, it follows that at each iteration of the algorithm, the value assigned to λ should be just large enough to maintain the sum matrix (H + λI) in its positive-definite form. In specific terms, the recommended Marquardt recipe for the selection of λ is as follows:
1. Compute E_{av}(w) at iteration n − 1.
2. Choose a modest value for λ, say λ = 10^{-3}.
3. Solve Equation (3.20) for the adjustment Δw at iteration n and evaluate E_{av}(w + Δw).
4. If E_{av}(w + Δw) ≥ E_{av}(w), increase λ by a factor of 10 (or any other substantial factor) and go back to step 3.
5. If, on the other hand, E_{av}(w + Δw) < E_{av}(w), decrease λ by a factor of 10, update the trial solution w → w + Δw, and go back to step 3.
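The recipe above can be sketched on a small nonlinear least-squares problem. The example below is only an illustration under stated assumptions: the two-parameter model w_0 tanh(w_1 x) stands in for the network function F(x; w), the data are synthetic and noise-free, the descent step is taken as Δw = −(H + λI)^{-1} g with the outer-product Hessian, and the iteration limit and the stopping bound on λ are arbitrary choices.

```python
import numpy as np

def model(x, w):
    """Stand-in for the network function F(x; w)."""
    return w[0] * np.tanh(w[1] * x)

def jacobian(x, w):
    """Partial derivatives of F with respect to w, one row per sample."""
    t = np.tanh(w[1] * x)
    return np.column_stack((t, w[0] * x * (1.0 - t**2)))

def cost(x, d, w):
    return 0.5 * np.mean((d - model(x, w)) ** 2)        # Equation (3.21)

def levenberg_marquardt(x, d, w, lam=1e-3, n_iter=50):
    for _ in range(n_iter):
        J = jacobian(x, w)
        residual = d - model(x, w)
        g = -J.T @ residual / len(x)                    # gradient of the cost
        H = J.T @ J / len(x)                            # outer-product Hessian
        while True:
            dw = -np.linalg.solve(H + lam * np.eye(len(w)), g)
            if cost(x, d, w + dw) < cost(x, d, w):      # step 5: accept
                w, lam = w + dw, lam / 10.0
                break
            lam *= 10.0                                 # step 4: reject, retry
            if lam > 1e10:                              # no improving step left
                return w
    return w

x = np.linspace(-3.0, 3.0, 40)
d = model(x, np.array([2.0, 0.5]))      # synthetic targets from known parameters
w_start = np.array([1.0, 1.0])
w_fit = levenberg_marquardt(x, d, w_start)
```

The inner loop corresponds to steps 3 and 4 of the recipe, and the accepted update to step 5; the bound on λ is an added safeguard so that the loop terminates once no step reduces the cost.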
1.3 Adaptive Resonance Theory (ART)
Adaptive Resonance Theory (ART) architectures are neural networks that carry out stable self-organization of recognition codes for arbitrary sequences of input patterns [36]. ART first emerged from an analysis of the instabilities inherent in feedforward adaptive coding structures [37]. ART1 self-organizes recognition categories for arbitrary sequences of binary input patterns [38]. ART2 was subsequently introduced for analog inputs [39]. A fuzzy ART model, capable of rapid and stable learning of recognition categories in response to arbitrary sequences of analog or binary input patterns, was introduced by Carpenter [40]. Fuzzy ART incorporates computations from fuzzy set theory into the ART1 neural network, which by itself learns to categorize only binary input patterns. Figure 3.7 shows the fuzzy ART block diagram. The generalization to learning both analog and binary input patterns is achieved by replacing appearances of the intersection operator (∩) in ART1 by the MIN operator (˄) of fuzzy set theory. The MIN operator reduces to the intersection operator in the binary case. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (˄) and the MAX operator (˅) of fuzzy set theory play complementary roles [40]. Fuzzy ARTMAP was developed subsequently [41]; it is a generalized ARTMAP system that learns to classify inputs by a fuzzy set of features, that is, a pattern of fuzzy membership values between 0 and 1 that indicate the extent to which each feature is present.
Figure 3.7. A block diagram of the Fuzzy ART architecture [42].
1.3.1 Fuzzy ARTMAP
The Fuzzy ARTMAP generally consists of two fuzzy ART modules, ARTa and ARTb, which create stable recognition categories in response to arbitrary sequences of input patterns. The two modules are linked together by a map ﬁeld which is a learning network and internal controller that ensures autonomous system operation in real time [43]. The overall architecture of fuzzy ARTMAP is shown in Figure 3.8.
Figure 3.8. Fuzzy ARTMAP architecture [43].
The ARTa and ARTb produce compressed recognition codes to represent the class of their input vector a and b, respectively. Vector a is the measured data vector and vector b is the prediction of a [44].
Each fuzzy ART module (ARTa or ARTb) has three layers of nodes: F_{0}, F_{1}, and F_{2}. The nodes of the input layer F_{0} represent the current input vector, and the F_{0} activity vector is denoted by a = (a_{1}, . . ., a_{M}), with each component a_{i} in the interval [0, 1], i = 1, . . ., M.
The proliferation of categories in fuzzy ART is avoided if the inputs are normalized using the method of complement coding. Thus, the complement-coded input A to the field F_{1} is the 2M-dimensional vector
$A = (a, a^c)$ (3.25)
where
$a_i^c = 1 - a_i$ (3.26)
The weight vector w_{j} connects the F_{1} layer nodes to the F_{2} output layer nodes. For each F_{2} category node j (j = 1, . . ., N), there is a vector w_{j} = (w_{j,1}, . . ., w_{j,2M}) of adaptive weights associated with the F_{1} nodes. The initial condition of all weights is
$w_{j,1}(0) = \dots = w_{j,2M}(0) = 1$ (3.27)
which indicates that each category is initially uncommitted.
For the input vector A and each node j in the F_{2} layer, the choice function T_{j} is computed as
$T_j(A) = \dfrac{|A \wedge w_j|}{\alpha + |w_j|}$ (3.28)
where ˄ is the fuzzy AND operator, defined by
$(p \wedge q)_i = \min(p_i, q_i)$ (3.29)
and |·| is the norm operator of fuzzy set theory, defined by
$|p| = \sum_{i=1}^{M} |p_i|$ (3.30)
for any M-dimensional vectors p and q, and α > 0 is the choice parameter. For simplicity, T_{j}(A) in Equation (3.28) is written as T_{j} when the input A is fixed. A category choice is made when one F_{2} node becomes active at a given time. The category choice is indexed by J, where
$T_J = \max\{ T_j : j = 1, \dots, N \}$ (3.31)
When more than one T_{j} is maximal, the category with the smallest index is chosen.
Resonance occurs if the match function |A ˄ w_{J}| / |A| of the chosen category meets the vigilance criterion:
$\dfrac{|A \wedge w_J|}{|A|} \ge \rho$ (3.32)
where ρ is the vigilance parameter. When resonance occurs, learning starts, as explained below. Mismatch reset occurs if the condition in Equation (3.32) is not met; the value of the choice function T_{J} is then set to 0 and a new index J is chosen by Equation (3.31).
The search process continues until the chosen J satisfies Equation (3.32). If no existing node in F_{2} satisfies the condition, a new node is created in F_{2}. Once the search is completed, the weight vector w_{J} is updated according to the equation
$w_J^{(new)} = \beta \left( A \wedge w_J^{(old)} \right) + (1 - \beta)\, w_J^{(old)}$ (3.33)
where β ∈ [0, 1] is the learning rate parameter; fast learning corresponds to setting β = 1.
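The presentation of one input to a fuzzy ART module, from complement coding through category choice, vigilance test, and learning, can be sketched as follows. This is an illustration rather than the implementation used in this project: the function names, the parameter values (ρ = 0.75, α = 0.001, β = 1), and the example inputs are assumptions. Searching the nodes in order of decreasing T_{j} is used as an equivalent of resetting T_{J} to 0 after each mismatch.

```python
import numpy as np

def complement_code(a):
    """Equations (3.25) and (3.26): A = (a, 1 - a)."""
    return np.concatenate((a, 1.0 - a))

def fuzzy_art_present(A, weights, rho=0.75, alpha=0.001, beta=1.0):
    """Present one complement-coded input; returns (category index, weights)."""
    # Choice function of Equation (3.28) for every committed node.
    T = np.array([np.minimum(A, w).sum() / (alpha + w.sum()) for w in weights])
    # Stable sort: ties are resolved in favor of the smallest index.
    for J in np.argsort(-T, kind="stable"):
        match = np.minimum(A, weights[J]).sum() / A.sum()
        if match >= rho:                                  # Equation (3.32)
            weights[J] = (beta * np.minimum(A, weights[J])
                          + (1.0 - beta) * weights[J])    # Equation (3.33)
            return int(J), weights
    # No committed node resonates: commit a new node, starting from the
    # all-ones weights of Equation (3.27).
    w_new = np.ones_like(A)
    weights.append(beta * np.minimum(A, w_new) + (1.0 - beta) * w_new)
    return len(weights) - 1, weights

weights = []
j1, weights = fuzzy_art_present(complement_code(np.array([0.10, 0.20])), weights)
j2, weights = fuzzy_art_present(complement_code(np.array([0.12, 0.22])), weights)
j3, weights = fuzzy_art_present(complement_code(np.array([0.90, 0.80])), weights)
```

In this example the second input is close to the first and joins its category, while the third input fails the vigilance test for that category and commits a new node.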
The same search process happens in ARTb to find a prototype node in F_{2}^{b} that best matches B.
The map field F^{ab} is activated whenever one of the ARTa or ARTb categories is active. If node J of F_{2}^{a} is chosen, then its weights w_{J}^{ab} activate F^{ab}. If node K in F_{2}^{b} is active, then node K in F^{ab} is activated by one-to-one pathways between F_{2}^{b} and F^{ab}. If both ARTa and ARTb are active, then F^{ab} becomes active only if ARTa predicts the same category as ARTb via the weights w_{J}^{ab}. The map field output vector x^{ab} equals 0 if the prediction w_{J}^{ab} is disconfirmed by the F_{2}^{b} output vector y^{b}. Such a mismatch event triggers an ARTa search for a better category, as follows.
At the start of each input presentation, the ARTa vigilance parameter ρ^{a} equals a baseline vigilance. The map field vigilance parameter is ρ^{ab}. If
$|x^{ab}| < \rho^{ab}\, |y^{b}|$ (3.34)
then ρ^{a} is increased until it is slightly larger than |A ˄ w_{J}^{a}| |A|^{-1}, where A is the input to F_{1}^{a}, in complement coding form. Then
$|x^{a}| = |A \wedge w_J^{a}| < \rho^{a}\, |A|$ (3.35)
where J is the index of the active F_{2}^{a} node, as in Equation (3.32). When this occurs, the ARTa search leads either to activation of another F_{2}^{a} node J with
$|x^{a}| = |A \wedge w_J^{a}| \ge \rho^{a}\, |A|$ (3.36)
and
$|x^{ab}| = |y^{b} \wedge w_J^{ab}| \ge \rho^{ab}\, |y^{b}|$ (3.37)
or, if no such node exists, to the shutdown of F_{2}^{a} for the remainder of the input presentation [41].
Learning rules determine how the map field weights w_{jk}^{ab} change through time, as follows. The weights w_{jk}^{ab} in the F_{2}^{a} → F^{ab} paths initially satisfy
$w_{jk}^{ab}(0) = 1$ (3.38)
During resonance with the ARTa category J active, w_{J}^{ab} approaches the map field vector x^{ab}. With fast learning, once J learns to predict the ARTb category K, that association is permanent; i.e., w_{JK}^{ab} = 1 for all time [41].
1.4 Brief Description of Industrial Gas Turbine
The machine investigated in this case study is a Ruston TA 1500 gas turbine, which is made in the United States. The turbine is used to transmit oil from the plant to the Omidiyeh booster, raising the oil pressure from 14 psi to 200 psi. The compressor is of the axial flow type and has 15 stages: nine stages are fixed blades and six stages are guidance blades. The machine has 18 thermocouples to measure temperature in different parts. Inlet systems of this machine are 80. The inlet gas pressure is 32 bar, with a fixed input flow rate. The annular combustion chamber uses a lean premix combustion system and is designed for operation on natural gas fuel. A High Pressure Turbine (HPT) extracts energy from the gas stream to drive the compressor. The power turbine consists of two blade stages. The power generated by this machine is 8.39 MW.
CBM is applied to this gas turbine through vibration analysis of the compressor shaft. To perform CBM, the vibration and speed of the shaft are required. The ranges of speed and vibration for this gas turbine are:
Speed ∈ (13600 rpm, 14500 rpm)
Vibration ∈ (0.07 mm, 0.22 mm]
The value of 0.22 mm is the vibration threshold that separates the normal condition from the faulty condition. A vibration value greater than 0.22 mm indicates a fault, and the gas turbine is shut down.
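The threshold rule can be stated as a one-line classifier. This is a trivial sketch; the names `VIBRATION_LIMIT_MM` and `classify_condition` are hypothetical, and the input is assumed to be a single vibration reading in millimetres.

```python
VIBRATION_LIMIT_MM = 0.22   # threshold value given in the text

def classify_condition(vibration_mm):
    """Label a vibration reading as 'normal' or 'faulty'."""
    return "faulty" if vibration_mm > VIBRATION_LIMIT_MM else "normal"

labels = [classify_condition(v) for v in (0.10, 0.22, 0.25)]
```

A reading exactly at the threshold is labelled normal, in line with the closed upper end of the stated vibration range.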
1.5 General Flow Chart of the Project
Figure 3.9 shows the general scheme of the project. The models and the process of the current study are illustrated in the flow chart.
Generally, a condition monitoring program is carried out in three steps [4]: data acquisition, data processing, and decision-making (see Figure 2.3). According to Figure 3.9, the first step of condition monitoring is the acquisition of appropriate data. The collected vibration data are used in the time domain to predict the state at the next step. The sampling time of the available data is two hours, which is large, so some information is lost. To reduce the sampling time and improve the information available to the models, an interpolation operation is used. In the last part (decision-making), the data are divided into a training set and a test set, both of which contain normal and faulty conditions. A Recurrent Neural Network and Fuzzy ARTMAP are the two intelligent methods used to predict the condition of the compressor of the gas turbine. The results of each model indicate the condition of the machine and whether a maintenance action is necessary.
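The interpolation step can be sketched as resampling a two-hour series onto a 30-minute grid. The text does not state which interpolation method was used, so linear interpolation is assumed here, and the vibration values are made-up numbers for illustration only.

```python
import numpy as np

# Measurements every two hours (hypothetical values, in mm).
t_hours = np.arange(0.0, 10.0, 2.0)                     # 0, 2, 4, 6, 8 h
vibration = np.array([0.10, 0.12, 0.18, 0.15, 0.11])

# Target grid with a 30-minute sampling time over the same period.
t_fine = np.arange(t_hours[0], t_hours[-1] + 0.25, 0.5)

# Linear interpolation (an assumption, see the paragraph above).
vibration_fine = np.interp(t_fine, t_hours, vibration)
```

Each two-hour interval is divided into four 30-minute steps, and the original measurements are preserved at every fourth point of the new series. The interpolated points add no new measurements; they only provide a denser series for training.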
CHAPTER FOUR
2 RESULTS AND DISCUSSION
The objective of this chapter is to find a model that monitors the condition of the gas turbine. Two conditions are investigated: normal and faulty. The neural network and fuzzy ARTMAP are used to monitor these conditions.
2.1 Jordan Recurrent Neural Network
A multilayer perceptron (MLP) network with feedback from the output layer is designed. In this model, the inputs are the current speed and its delayed values (u(t), u(t−1), …) and the delayed values of vibration (y(t−1), y(t−2), …), which are used to predict the current vibration (one-step-ahead prediction). The ranges of speed and vibration for this gas turbine are:
Speed ∈ (13600 rpm, 14500 rpm)
Vibration ∈ (0.07 mm, 0.22 mm]
The value of 0.22 mm is the vibration threshold used to predict whether the condition is normal or faulty. A vibration value greater than 0.22 mm indicates a fault, and the gas turbine is shut down.
The use of delayed values of vibration (output feedback) makes this a Jordan recurrent neural network (JRNN) model. Figure 4.1 illustrates the overall scheme of the open loop JRNN model.
Figure 4.1. An open loop JRNN.
The number of hidden layer nodes and the delays for both the external input (speed) and the feedback input (vibration) are varied by trial and error to find the optimum values. The optimum values and a summary of the model information are given in Table 4.1. The performance criterion is the Mean Square Error (MSE). The activation functions of the hidden layer and the output layer are hyperbolic tangent (tanh) and linear, respectively. The network has one hidden layer with 10 nodes. The model is trained with backpropagation using the Levenberg–Marquardt (LM) training algorithm.
Table 4.1. Information of Jordan recurrent neural network.
Parameter  Information
Optimum external input delay  3
Optimum feedback delay  3
Optimum number of hidden layer nodes  10
Hidden layer activation function  tanh function
Output layer activation function  Linear function
Training method  LM
Sampling time of data  2 hours
Performance measure  MSE
The JRNN is run on the operational data. Figure 4.2 shows the one-step-ahead prediction of the JRNN for the training, validation, and test data, which make up 80%, 10%, and 10% of the data, respectively. The overall MSE of this model is 6.6568e-04. Compared with the range of vibration, this performance is not acceptable: assuming the MSE is expressed in mm², it corresponds to a root-mean-square error of about 0.026 mm, which is a sizeable fraction of the 0.15 mm vibration range. The main cause is the large sampling time, over which some of the information is lost. The accuracy (MSE) of the model is presented in Figure 4.3.
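The data division and the performance measure can be sketched as follows. The text gives only the 80%, 10%, and 10% proportions, so the chronological (unshuffled) split used here is an assumption, as are the function names and the example numbers.

```python
import numpy as np

def chronological_split(n_samples):
    """Indices of the first 80%, next 10%, and last 10% of the samples."""
    n_train = n_samples * 8 // 10
    n_val = n_samples * 9 // 10
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:n_val], idx[n_val:]

def mse(target, prediction):
    """Mean square error between measured and predicted values."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.mean((target - prediction) ** 2))

train_idx, val_idx, test_idx = chronological_split(100)

# Hypothetical measured and predicted vibration values, in mm.
example_mse = mse([0.10, 0.12, 0.20], [0.11, 0.12, 0.18])
```

Taking the square root of the MSE gives an error in the same unit as the vibration, which makes it easier to compare the error with the vibration range.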
Figure 4.2. One-step-ahead prediction using JRNN.
In Figure 4.2, the real data are shown with a red dashed line and the values predicted by the JRNN with a continuous blue line.
Figure 4.3. Accuracy of prediction for JRNN.
The MSE values of the network are presented in Table 4.2. All of the MSE values are large for the vibration data.
MSE  Train MSE  Validation MSE  Test MSE  Closed-loop MSE  Multi-step MSE
6.6568e-04  5.7681e-04  0.0011  9.7165e-04  6.9016e-04  0.0011
Reducing the sampling time is recommended to improve the prediction performance. For this purpose, an interpolation operation is used, and the sampling time is changed to 30 minutes. The network information is presented in Table 4.3; it is the same as for the previous network, except that the sampling time is 30 minutes. Figure 4.4 shows the predicted values of vibration.
Table 4.3. Information of Jordan recurrent neural network.
Parameter  Information
Optimum external input delay  3
Optimum feedback delay  3
Optimum number of hidden layer nodes  10
Hidden layer activation function  tanh function
Output layer activation function  Linear function
Training method  LM
Sampling time of data  30 minutes
Performance measure  MSE
The JRNN is run again on the operational data with the sampling time reduced to 30 minutes. Figure 4.4 shows the one-step-ahead prediction of the JRNN for the training, validation, and test data. The overall MSE of the JRNN model decreases from 6.6568e-04 to 5.3344e-05. The predicted values follow the real data very well, but, as can be seen in Figure 4.4, the errors are high at the peak values.
Figure 4.4. One-step-ahead prediction using JRNN.
A closed loop network is used to predict vibration in the long term. A long-term prediction model is needed for reliable condition monitoring, because it provides information further into the future. In the closed loop network, the estimated output replaces the real data in the input layer. The general structure of the closed loop network is shown in Figure 4.5.
Figure 4.5. Closed loop JRNN.
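The closed loop operation can be sketched as a loop in which each prediction is appended to the history used for the next prediction. This is an illustration only: `closed_loop_predict` is a hypothetical name, and the simple linear `toy_model` stands in for the trained network, whose weights are not given in the text.

```python
import numpy as np

def closed_loop_predict(one_step_model, u, y_init, n_steps, n_delay=3):
    """Multi-step prediction by feeding estimates back as inputs.

    one_step_model(u_t, y_past) returns the predicted output at time t
    from the current input and the last n_delay outputs.
    """
    y_hist = list(y_init[-n_delay:])         # last measured outputs
    predictions = []
    for t in range(n_steps):
        y_hat = one_step_model(u[t], np.array(y_hist[-n_delay:]))
        predictions.append(y_hat)
        y_hist.append(y_hat)                 # estimate replaces the measurement
    return np.array(predictions)

def toy_model(u_t, y_past):
    """Stand-in one-step predictor (not the trained JRNN)."""
    return 0.5 * y_past[-1] + 0.1 * u_t

predictions = closed_loop_predict(toy_model, u=np.ones(5),
                                  y_init=[0.0, 0.0, 0.0], n_steps=5)
```

Because each estimate is reused as an input, any error in an early prediction is carried into the later ones, which is why closed loop errors tend to be larger than one-step-ahead errors.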
The closed loop JRNN is used to predict five steps ahead. The error increases in the closed loop network. It can be seen in Figure 4.6 that the predicted values cannot follow the real data. With the JRNN, the accuracy of the long-term prediction model is lower than that of the one-step-ahead prediction model.
Figure 4.6. Long-term prediction using closed loop network.
Figure 4.7. Accuracy of prediction for JRNN.
In Figure 4.7, the accuracy (MSE) of the JRNN model is presented. The MSE values decrease when the sampling time is reduced to 30 minutes, although the MSE is still large at some points. The MSE values of the network are presented in Table 4.4.
MSE  Train MSE  Validation MSE  Test MSE  Closed-loop MSE  Multi-step MSE
5.3344e-05  2.3430e-05  3.7031e-05  2.9791e-05  6.7290e-04  0.0021
2.2 Fully Connected Recurrent Neural Network
Elman-type feedback connections from the hidden layer are added to the JRNN to create a Fully Connected Recurrent Neural Network (FCRNN). The FCRNN is used to reduce the error and improve the performance. The overall structure of the FCRNN is illustrated in Figure 4.8.
Figure 4.8. The whole structure of open loop FCRNN.
The FCRNN information is given in Table 4.5: the optimum number of hidden layer nodes, the delays for both the external input (speed) and the feedback input (vibration), the activation function of each layer, the training method, the performance criterion, and the sampling time. The network has one hidden layer with 10 nodes.
Table 4.5. Information of FCRNN.
Parameter  Information
Optimum external input delay  2
Optimum feedback delay  2
Optimum number of hidden layer nodes  10
Hidden layer activation function  tanh function
Output layer activation function  Linear function
Training method  LM
Sampling time  30 minutes
Performance measure  MSE
In Figure 4.9, the real data are shown with a red dashed line and the values predicted by the FCRNN with a continuous blue line. The predicted values follow the real data very well. The errors at the peak values are reduced, and the performance is improved with the FCRNN: the MSE decreases from 5.3344e-05 for the JRNN to 1.7409e-05 for the FCRNN.