Artificial Neural Network and fuzzy ARTMAP Case Study

CHAPTER THREE

 

1         METHODOLOGY

This chapter introduces the two intelligent methods used in this project, the artificial neural network and fuzzy ARTMAP. It then summarizes the case study and, finally, illustrates the process of the project in a flowchart.

1.1        Time Series Prediction

A time series is a sequence of data points measured over time, such as vectors x(t), t = 0, 1, …, where t represents elapsed time. For simplicity, only sequences of scalars are considered here, although the techniques generalize readily to vector series. Theoretically, x may be a value that varies continuously with t, such as a temperature. In practice, for any given physical system, x is sampled to give a series of discrete data points, equally spaced in time [32].

A data sequence is created by measuring the input and output parameters of a process at discrete, regular time intervals. The process may receive an input at every time step, which affects its future behavior, or it may be purely generative, receiving no input at all. The choice of model depends on the data available. Popular linear models include the Finite Impulse Response (FIR), AutoRegressive eXogenous (ARX), and AutoRegressive Moving Average eXogenous (ARMAX) models [33]. The FIR model is the simplest input-output relationship, with a polynomial in the backward shift operator $q^{-1}$ for the input and a white-noise error term. The ARX model is obtained by adding an autoregressive part to the FIR model. A further extension is obtained when the error term is modeled as a moving average of white noise; because of this moving-average part, it is called an ARMAX model. The structure of the ARX model is described below.

  • ARX Model: A simple input–output relationship, extended with a noise term, is given by the linear difference equation

$$y(t) + a_1 y(t-1) + \dots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) + e(t) \tag{3.1}$$

Rewriting Equation (3.1) in transfer-function form gives:

$$A(q)\,y(t) = B(q)\,u(t) + e(t) \tag{3.2}$$

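where

$$A(q) = \sum_{k=0}^{n_a} a_k q^{-k} = a_0 + a_1 q^{-1} + a_2 q^{-2} + \dots + a_{n_a} q^{-n_a}, \qquad a_0 = 1,$$

and

$$B(q) = \sum_{k=1}^{n_b} b_k q^{-k} = b_1 q^{-1} + b_2 q^{-2} + \dots + b_{n_b} q^{-n_b}.$$

The adjustable parameters of this structure are

$$\theta = [\,a_1 \;\; a_2 \;\; \dots \;\; a_{n_a} \;\; b_1 \;\; b_2 \;\; \dots \;\; b_{n_b}\,]^T \tag{3.3}$$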

Notice that Equation (3.2) has an AutoRegressive part A(q)y(t) and an eXogenous part B(q)u(t). Therefore, this model structure is also indicated as an ARX model, which can be rewritten in explicit form as

$$y(t) = \frac{B(q)}{A(q)}\,u(t) + \frac{1}{A(q)}\,e(t) \tag{3.4}$$

More specifically, ARX model structures are also denoted as ARX (na, nb, nk), where nk indicates the number of sampling intervals related to dead time. Consequently, in case of dead time b1 = ··· = bnk = 0.

A special case is obtained when na = 0, which reduces the ARX to an FIR model structure.
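As an illustrative sketch (not part of the original study), the following Python code simulates a first-order ARX(1, 1, 1) process and recovers θ = [a1 b1]^T by ordinary least squares. The coefficient values, the noise-free setting, and the function names `simulate_arx` and `fit_arx_1_1` are assumptions chosen only for this example.

```python
import random

def simulate_arx(a1, b1, u, noise_std=0.0, seed=0):
    """Simulate y(t) + a1*y(t-1) = b1*u(t-1) + e(t), starting from y(0) = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for t in range(1, len(u)):
        e = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        y.append(-a1 * y[t - 1] + b1 * u[t - 1] + e)
    return y

def fit_arx_1_1(u, y):
    """Least-squares estimate of (a1, b1) from the regression
    y(t) = a1*(-y(t-1)) + b1*u(t-1) + e(t), using the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(y)):
        p1, p2 = -y[t - 1], u[t - 1]      # regressor vector phi(t)
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[t]
        r2 += p2 * y[t]
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - s12 * r2) / det      # Cramer's rule
    b1 = (s11 * r2 - s12 * r1) / det
    return a1, b1

rng = random.Random(1)
u = [rng.uniform(-1.0, 1.0) for _ in range(200)]   # hypothetical input signal
y = simulate_arx(a1=-0.7, b1=0.5, u=u)             # hypothetical true parameters
a1_hat, b1_hat = fit_arx_1_1(u, y)
print(a1_hat, b1_hat)
```

Because the simulated data contain no noise, the estimates should match the assumed parameters up to rounding error; with noise, the estimates would only be approximate.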

The objective is to forecast future values of the time series from values of x up to the current time. Formally, this can be stated as: find a function $F: \mathbb{R}^N \to \mathbb{R}$ that predicts x at time t + d from the N time samples of x from time t-N+1 to t. In this project, the ARX model structure is combined with a neural network to form a nonlinear model, which is used to monitor the condition of a gas turbine.
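The prediction problem stated above can be turned into a supervised data set by sliding a window of length N over the series. The sketch below is a minimal illustration; the function name `make_windows` and the sample values are hypothetical and are not taken from the study data.

```python
def make_windows(x, n_lags, horizon=1):
    """Return (inputs, targets): each input holds x(t-N+1), ..., x(t)
    and the matching target is x(t+d), with N = n_lags and d = horizon."""
    inputs, targets = [], []
    for t in range(n_lags - 1, len(x) - horizon):
        inputs.append(x[t - n_lags + 1 : t + 1])
        targets.append(x[t + horizon])
    return inputs, targets

series = [0.10, 0.12, 0.15, 0.11, 0.14, 0.18]   # made-up values
X, d = make_windows(series, n_lags=3, horizon=1)
for window, target in zip(X, d):
    print(window, "->", target)
```

Any function approximator F, linear or nonlinear, can then be fitted to the pairs (window, target).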

1.2        Artificial Neural Network

Artificial Intelligence (AI), or computational intelligence, refers to computer programs that demonstrate intelligent behavior. The purpose is to artificially reproduce the brain’s capacity to learn, draw conclusions, plan, and solve problems.

An artificial neural network (ANN) is a nonlinear data modeling tool that mimics the neural structure of the human brain and learns from examples. ANNs can be used to solve a variety of tasks, including classification, regression, and general estimation problems. An ANN consists of a group of interconnected artificial neurons that process information in parallel [34].

1.2.1     Model of a Neuron

A neuron is an information-processing unit that is fundamental to the operation of a neural network. Figure 3.1 shows the model of a neuron, which forms the basis for designing artificial neural networks. The four basic elements of the neuron model are the synaptic weights (wkj), the bias (bk), the adder (which sums the weighted input signals), and the activation function. A signal xj at the input connected to neuron k is multiplied by the weight wkj. The first subscript refers to the neuron and the second subscript refers to the input.

In mathematical terms, we may describe the neuron k depicted in Figure 3.1 by writing the pair of equations:

$$v_k = \sum_{j=0}^{m} w_{kj} x_j \tag{3.5}$$

and

$$y_k = \varphi(v_k) \tag{3.6}$$

where x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the respective weights of neuron k; bk = wk0 is the bias, whose input is fixed at x0 = +1; vk is the linear combiner output due to the input signals and the bias input; φ(·) is the activation function; and yk is the output signal of the neuron.

Figure 3.1. Nonlinear model of a neuron [35].
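Equations (3.5) and (3.6) can be sketched in a few lines of Python. The function name `neuron_output` and the numerical values are assumptions for illustration; the bias is passed separately rather than as the weight of a fixed input of +1, which is equivalent.

```python
import math

def neuron_output(x, w, b, phi=math.tanh):
    """Compute y_k = phi(v_k), where v_k = b_k + sum_j w_kj * x_j."""
    v = b + sum(wj * xj for wj, xj in zip(w, x))
    return phi(v)

# Hypothetical neuron with two inputs.
y_linear = neuron_output(x=[1.0, 2.0], w=[0.5, 0.25], b=-0.5, phi=lambda v: v)
y_tanh = neuron_output(x=[1.0, 2.0], w=[0.5, 0.25], b=-0.5)
print(y_linear, y_tanh)
```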

1.2.2     Types of Activation Function

The activation function, denoted by φ(v), defines the output of a neuron in terms of the induced local field v. It is used to limit the amplitude of the output of a neuron. The most commonly used activation functions for neural networks are the threshold function, the linear function, the sigmoid function, and the hyperbolic tangent function (tanh). Their equations are given in Table 3.1.

Table 3.1. Most commonly used activation functions.

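| Activation function | Equation |
|---|---|
| Threshold function | $\varphi(v) = 1$ if $v \ge 0$; $\varphi(v) = 0$ if $v < 0$ |
| Linear function | $\varphi(v) = \mathrm{purelin}(v) = v$ |
| Sigmoid function | $\varphi(v) = \dfrac{1}{1 + \exp(-av)}$ |
| Hyperbolic tangent function | $\varphi(v) = \dfrac{2}{1 + \exp(-2av)} - 1$ |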
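The four functions of Table 3.1 can be written directly in Python, as in the minimal sketch below. The slope parameter a defaults to 1, which is an assumption for the example; with a = 1 the last expression is identical to the standard tanh function.

```python
import math

def threshold(v):
    """Threshold function: 1 for v >= 0, otherwise 0."""
    return 1.0 if v >= 0 else 0.0

def linear(v):
    """Linear (purelin) function: output equals the induced local field."""
    return v

def sigmoid(v, a=1.0):
    """Logistic sigmoid with slope parameter a; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * v))

def tanh_act(v, a=1.0):
    """Hyperbolic tangent written as in Table 3.1; output lies in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * a * v)) - 1.0

for v in (-1.0, 0.0, 1.0):
    print(v, threshold(v), linear(v), sigmoid(v), tanh_act(v))
```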

1.2.3     Learning process

Learning is an iterative process that modifies the strength of the connections between the elements. The two main learning paradigms are:

  • Supervised learning, in which both the inputs and the desired outputs are known. The network can therefore measure its predictive performance for given inputs; in other words, the network learns with a teacher.
  • Unsupervised learning, in which the targets are unknown and the ANN has to find the underlying relationships within the data set by itself (without a teacher) and build clusters of data.

Supervised learning is used for tasks of classification and regression, whereas unsupervised learning is more suitable for data clustering, compression and filtering tasks [34].

1.2.4     Network Architectures

The manner in which the neurons of a neural network are structured is intimately linked with the learning algorithm used to train the network. Two types of network architecture are considered here:

1.2.4.1   Multilayer Feedforward Networks

A multilayer feedforward network is distinguished by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons. The term hidden refers to the fact that this part of the neural network is not seen directly from either the input or the output of the network. The function of the hidden neurons is to intervene between the external input and the network output in some useful manner. By adding one or more hidden layers, the network is enabled to extract higher-order statistics from its input. Multilayer feedforward networks are often called Multi-Layer Perceptrons (MLP) because of their similarity to the Perceptron.

The source nodes in the input layer of the network supply respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network. Typically, the neurons in each layer of the network have as their inputs the output signals of the preceding layer only. The set of output signals of the neurons in the output (final) layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer. The architectural graph in Figure 3.2 illustrates the layout of a multilayer feedforward neural network for the case of a single hidden layer. For the sake of brevity, the network in Figure 3.2 is referred to as a 10–4–2 network because it has 10 source nodes, four hidden neurons, and two output neurons [35]. In certain cases an MLP can have more than one hidden layer, but it has been proven that a single hidden layer is enough to approximate any continuous function provided that this layer has a sufficient number of units and that the transfer functions of these units are non-linear. Finding the sufficient number of hidden neurons is a trial-and-error process [34].


Figure 3.2. Fully connected feedforward network with one hidden layer.

The neural network in Figure 3.2 is said to be fully connected in the sense that every node in each layer of the network is connected to every other node in the adjacent forward layer. If, however, some of the communication links are missing from the network, the network is partially connected.
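To make the layer-by-layer signal flow concrete, the sketch below builds a fully connected 10–4–2 network with randomly initialized weights and computes one forward pass. The tanh hidden layer, linear output layer, weight range, and all function names are assumptions for illustration only.

```python
import math
import random

def layer_forward(inputs, weights, biases, phi):
    """Each neuron applies phi to its bias plus the weighted sum of all inputs."""
    return [phi(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Random weights in [-0.5, 0.5] and zero biases (hypothetical choice)."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W1": rand_matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": rand_matrix(n_out, n_hidden), "b2": [0.0] * n_out}

def mlp_forward(x, params):
    """Input layer -> tanh hidden layer -> linear output layer."""
    hidden = layer_forward(x, params["W1"], params["b1"], math.tanh)
    return layer_forward(hidden, params["W2"], params["b2"], lambda v: v)

params = init_params(n_in=10, n_hidden=4, n_out=2)
outputs = mlp_forward([0.1] * 10, params)
print(outputs)
```

Every hidden neuron has one weight per source node and every output neuron has one weight per hidden neuron, which is what "fully connected" means here.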

1.2.4.2   Recurrent Networks

A recurrent neural network differs from a feedforward neural network in that it has at least one feedback loop. For the purpose of time series prediction, a neural network can be thought of as a general nonlinear mapping between a subset of the past time series values and the future time series values. The recurrent models considered here are the Jordan Recurrent Neural Network (JRNN), the Elman Recurrent Neural Network (ERNN), and the Fully Connected Recurrent Neural Network (FCRNN), also called the extended recurrent neural network.

The Jordan network is a multi-layered network in which there are feedback connections from the output layer to the context layer. In the structure depicted in Figure 3.3, there are no self-feedback loops in the network [27].


Figure 3.3. Recurrent network with no self-feedback loops and no hidden neurons.

The Elman network is another class of recurrent neural networks which has three layers with feedback connections from the hidden layer to the context layer. In Figure 3.4 the structure of this model is illustrated [27].

The presence of feedback loops, whether in the recurrent structure of Figure 3.3 or in that of Figure 3.4, has a profound impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-time delay elements (denoted by $z^{-1}$), which result in nonlinear dynamic behavior, assuming that the neural network contains nonlinear units.


Figure 3.4. Recurrent network with hidden neurons.

In an FCRNN, each neuron receives feedback from all the neurons. That is, the outputs of all the neurons in the current iteration are fed back to each neuron in the next iteration. Self-feedback refers to a situation where the output of a neuron is fed back into its own input.
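The role of the context layer and the unit delay can be illustrated with a deliberately small Jordan-type network that has one hidden neuron and one output. This is a sketch under simplifying assumptions; the scalar weights and the function name `jordan_run` are hypothetical.

```python
import math

def jordan_run(inputs, w_in, w_ctx, w_out, b_h=0.0, b_o=0.0):
    """Run a one-hidden-neuron Jordan network over an input sequence.
    The context unit stores the previous network output (unit delay)."""
    context = 0.0
    outputs = []
    for u in inputs:
        h = math.tanh(w_in * u + w_ctx * context + b_h)   # hidden neuron
        y = w_out * h + b_o                               # linear output neuron
        outputs.append(y)
        context = y            # output is fed back for the next time step
    return outputs

# The same input is presented twice; only the feedback makes the outputs differ.
print(jordan_run([0.5, 0.5], w_in=1.0, w_ctx=0.0, w_out=1.0))
print(jordan_run([0.5, 0.5], w_in=1.0, w_ctx=0.8, w_out=1.0))
```

An Elman-type network would differ only in what is stored in the context unit: the hidden activation h instead of the output y.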

1.2.5     Back Propagation Algorithm

The back-propagation algorithm is a supervised method for training multilayer perceptrons. To describe this algorithm, consider Figure 3.5, which depicts neuron j being fed by a set of function signals produced by a layer of neurons to its left.


Figure 3.5. Signal-flow graph highlighting the details of output neuron j [35].

The induced local field vj(n) produced at the input of the activation function associated with neuron j is therefore

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n) \tag{3.7}$$

where m is the total number of inputs (excluding the bias) applied to neuron j. The synaptic weight wj0 (corresponding to the fixed input y0 = +1) equals the bias bj applied to neuron j. Hence, the function signal yj(n) appearing at the output of neuron j at iteration n is

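$$y_j(n) = \varphi_j(v_j(n)) \tag{3.8}$$

The error signal produced at the output of neuron j is the difference between the desired response $d_j(n)$ and the function signal $y_j(n)$:

$$e_j(n) = d_j(n) - y_j(n) \tag{3.9}$$

The instantaneous error energy of the network, summed over the output neurons, is

$$E(n) = \frac{1}{2} \sum_{j} e_j^2(n) \tag{3.10}$$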

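The back-propagation algorithm applies a correction $\Delta w_{ji}(n)$ to the weight $w_{ji}(n)$, which is proportional to the partial derivative $\partial E(n)/\partial w_{ji}(n)$. According to the chain rule of calculus, this gradient is expressed as

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_j(n)}\, \frac{\partial e_j(n)}{\partial y_j(n)}\, \frac{\partial y_j(n)}{\partial v_j(n)}\, \frac{\partial v_j(n)}{\partial w_{ji}(n)} \tag{3.11}$$

The partial derivative $\partial E(n)/\partial w_{ji}(n)$ represents a sensitivity factor, determining the direction of search in weight space for the synaptic weight $w_{ji}$.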

Differentiating both sides of Equation (3.10) with respect to $e_j(n)$, we get

$$\frac{\partial E(n)}{\partial e_j(n)} = e_j(n) \tag{3.12}$$

Differentiating both sides of Equation (3.9) with respect to $y_j(n)$, we get

$$\frac{\partial e_j(n)}{\partial y_j(n)} = -1 \tag{3.13}$$

Next, differentiating Equation (3.8) with respect to $v_j(n)$, we get

$$\frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'(v_j(n)) \tag{3.14}$$

where the use of the prime (on the right-hand side) signifies differentiation with respect to the argument. Finally, differentiating Equation (3.7) with respect to $w_{ji}(n)$ yields

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n) \tag{3.15}$$

The use of Equations (3.12) to (3.15) in Equation (3.11) yields

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\, \varphi_j'(v_j(n))\, y_i(n) \tag{3.16}$$

The correction $\Delta w_{ji}(n)$ applied to $w_{ji}(n)$ is defined by the delta rule, or

$$\Delta w_{ji}(n) = -\eta\, \frac{\partial E(n)}{\partial w_{ji}(n)} \tag{3.17}$$

where η is the learning-rate parameter of the back-propagation algorithm. The use of the minus sign in Equation (3.17) accounts for gradient descent in weight space (i.e., seeking a direction for weight change that reduces the value of E(n)). Accordingly, the use of Equation (3.16) in Equation (3.17) yields

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n) \tag{3.18}$$

where the local gradient $\delta_j(n)$ is defined by

$$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial e_j(n)}\, \frac{\partial e_j(n)}{\partial y_j(n)}\, \frac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\, \varphi_j'(v_j(n)) \tag{3.19}$$

The local gradient points to required changes in synaptic weights. According to Equation (3.19), the local gradient $\delta_j(n)$ for output neuron j is equal to the product of the corresponding error signal $e_j(n)$ for that neuron and the derivative $\varphi_j'(v_j(n))$ of the associated activation function.
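The delta rule for a single output neuron can be checked numerically. The sketch below assumes a tanh activation (so that φ'(v) = 1 - y²) and compares the analytic gradient of Equation (3.16) with a central finite-difference estimate; all names and numerical values are hypothetical.

```python
import math

def forward(w, y_in):
    """v_j = sum_i w_ji * y_i (y_in[0] = +1 carries the bias), y_j = tanh(v_j)."""
    v = sum(wi * yi for wi, yi in zip(w, y_in))
    return v, math.tanh(v)

def error_energy(w, y_in, d):
    """E(n) = 0.5 * e_j(n)^2 for a single output neuron, Eq. (3.10)."""
    _, y = forward(w, y_in)
    return 0.5 * (d - y) ** 2

def delta_rule_step(w, y_in, d, eta):
    """Return the updated weights and the analytic gradient dE/dw."""
    _, y = forward(w, y_in)
    e = d - y                                  # error signal, Eq. (3.9)
    delta = e * (1.0 - y * y)                  # local gradient, Eq. (3.19)
    grad = [-delta * yi for yi in y_in]        # Eq. (3.16)
    w_new = [wi + eta * delta * yi for wi, yi in zip(w, y_in)]   # Eq. (3.18)
    return w_new, grad

def numerical_gradient(w, y_in, d, h=1e-6):
    """Central finite-difference estimate of dE/dw, used only as a check."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((error_energy(w_plus, y_in, d)
                     - error_energy(w_minus, y_in, d)) / (2.0 * h))
    return grad

w0 = [0.1, -0.3, 0.2]          # hypothetical weights (first one is the bias)
inputs = [1.0, 0.5, -1.0]      # y_0 = +1, then two input signals
w1, analytic = delta_rule_step(w0, inputs, d=0.8, eta=0.1)
print(analytic, numerical_gradient(w0, inputs, d=0.8))
print(error_energy(w0, inputs, 0.8), error_energy(w1, inputs, 0.8))
```

The second printed line shows the error energy before and after one weight update, which should decrease for a sufficiently small learning rate.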

The gradient-based learning method is illustrated in Figure 3.6.

Figure 3.6. A gradient-based updating of the weights [34].

From Equations (3.18) and (3.19), we note that a key factor involved in the calculation of the weight adjustment Δwji(n) is the error signal ej(n) at the output of neuron j. Two distinct cases are identified depending on where in the network neuron j is located.

  • Case 1 Neuron j Is an Output Node

When neuron j is located in the output layer of the network, it is supplied with a desired response of its own. We may use Equation (3.9) to compute the error signal $e_j(n)$ associated with this neuron. Having determined $e_j(n)$, it is a straightforward matter to compute the local gradient $\delta_j(n)$ by using Equation (3.19).

  • Case 2 Neuron j Is a Hidden Node

When neuron j is located in a hidden layer of the network, there is no specified desired response for that neuron. Accordingly, the error signal for a hidden neuron would have to be determined recursively and working backwards in terms of the error signals of all the neurons to which that hidden neuron is directly connected.

1.2.6     Levenberg–Marquardt Method

The Levenberg–Marquardt (LM) method is a compromise between the following two methods:

  • Newton’s method, which converges rapidly near a local or global minimum, but may also diverge.
  • Gradient Descent (GD), which is assured of convergence through a proper selection of the step-size parameter, but converges slowly.

To be specific, consider the optimization of a second-order function F(w), and let g be its gradient vector and H be its Hessian. According to the Levenberg–Marquardt method, the optimum adjustment Δw applied to the parameter vector w is defined by

$$\Delta w = -[H + \lambda I]^{-1} g \tag{3.20}$$

where I is the identity matrix of the same dimensions as H and λ is a regularizing, or loading, parameter that forces the sum matrix (H+λI) to be positive definite and safely well-conditioned throughout the computation.

With this background, consider a multilayer perceptron with a single output neuron. The network is trained by minimizing the cost function

$$E_{av}(w) = \frac{1}{2N} \sum_{i=1}^{N} \left[ d(i) - F(x(i); w) \right]^2 \tag{3.21}$$

where [x(i), d(i)] for i=1,…,N is the training sample and F(x(i); w) is the approximating function realized by the network; the synaptic weights of the network are arranged in some orderly manner to form the weight vector w. The gradient and the Hessian of the cost function Eav(w) are respectively defined by

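$$g(w) = \frac{\partial E_{av}(w)}{\partial w} = -\frac{1}{N} \sum_{i=1}^{N} \left[ d(i) - F(x(i); w) \right] \frac{\partial F(x(i); w)}{\partial w} \tag{3.22}$$

and

$$H(w) = \frac{\partial^2 E_{av}(w)}{\partial w^2} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{\partial F(x(i); w)}{\partial w} \right] \left[ \frac{\partial F(x(i); w)}{\partial w} \right]^T - \frac{1}{N} \sum_{i=1}^{N} \left[ d(i) - F(x(i); w) \right] \frac{\partial^2 F(x(i); w)}{\partial w^2} \tag{3.23}$$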

Thus, substituting Equation (3.22) and (3.23) into Equation (3.20), the desired adjustment Δw is computed for each iteration of the Levenberg-Marquardt algorithm.

However, from a practical perspective, the computational complexity of Equation (3.23) can be demanding, particularly when the dimensionality of the weight vector w is high; the computational difficulty is attributed to the complex nature of the Hessian H(w). To mitigate this difficulty, the recommended procedure is to ignore the second term on the right-hand side of Equation (3.23), thereby approximating the Hessian simply as

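$$H(w) \approx \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{\partial F(x(i); w)}{\partial w} \right] \left[ \frac{\partial F(x(i); w)}{\partial w} \right]^T \tag{3.24}$$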

This approximation is recognized as the outer product of the partial derivative $\partial F(x(i); w)/\partial w$ with itself, averaged over the training sample; accordingly, it is referred to as the outer-product approximation of the Hessian. The use of this approximation is justified when the Levenberg–Marquardt algorithm is operating in the neighborhood of a local or global minimum.

Clearly, the approximate version of the Levenberg–Marquardt algorithm, based on the gradient vector of Equation (3.22) and the Hessian of Equation (3.24), is a first-order method of optimization that is well suited for nonlinear least-squares estimation problems. Moreover, because of the fact that both of these equations involve averaging over the training sample, the algorithm is of a batch form.

The regularizing parameter λ plays a critical role in the way the Levenberg–Marquardt algorithm functions. If we set λ equal to zero, then the formula of Equation (3.20) reduces to Newton’s method. On the other hand, if we assign a large value to λ such that λI overpowers the Hessian H, the Levenberg–Marquardt algorithm functions effectively as a gradient descent method. From these two observations, it follows that at each iteration of the algorithm, the value assigned to λ should be just large enough to maintain the sum matrix (H+λI) in its positive-definite form. In specific terms, the recommended Marquardt recipe for the selection of λ is as follows:

  1. Compute Eav(w) at iteration n-1.
  2. Choose a modest value for λ, say $\lambda = 10^{-3}$.
  3. Solve Equation (3.20) for the adjustment Δw at iteration n and evaluate Eav(w+Δw).
  4. If Eav(w+Δw) ≥ Eav(w), increase λ by a factor of 10 (or any other substantial factor) and go back to step 3.
  5. If, on the other hand, Eav(w+Δw) < Eav(w), decrease λ by a factor of 10, accept the trial solution by setting w to w+Δw, and go back to step 3.
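The recipe above can be sketched for a small two-parameter problem. The code below is an illustration only: it replaces the multilayer perceptron by a hypothetical approximating function F(x; w) = w0·exp(w1·x), uses the outer-product Hessian of Equation (3.24), and takes the step as the solution of (H + λI)Δw = -g, with g the gradient of Equation (3.22), so that the step is a descent direction. The stopping thresholds are assumptions.

```python
import math

def model(x, w):
    """Hypothetical approximating function F(x; w) = w0 * exp(w1 * x)."""
    return w[0] * math.exp(w[1] * x)

def model_grad(x, w):
    """Partial derivatives of F with respect to w0 and w1."""
    e = math.exp(w[1] * x)
    return [e, w[0] * x * e]

def average_cost(xs, ds, w):
    """E_av(w) of Eq. (3.21)."""
    return sum((d - model(x, w)) ** 2 for x, d in zip(xs, ds)) / (2.0 * len(xs))

def gradient_and_hessian(xs, ds, w):
    """Gradient of Eq. (3.22) and outer-product Hessian of Eq. (3.24)."""
    n = len(xs)
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, d in zip(xs, ds):
        r = d - model(x, w)
        j = model_grad(x, w)
        for a in range(2):
            g[a] -= r * j[a] / n
            for b in range(2):
                H[a][b] += j[a] * j[b] / n
    return g, H

def lm_step(g, H, lam):
    """Solve (H + lam*I) dw = -g for two parameters by Cramer's rule."""
    a, b = H[0][0] + lam, H[0][1]
    c, d = H[1][0], H[1][1] + lam
    det = a * d - b * c
    return [(-g[0] * d + b * g[1]) / det, (-a * g[1] + c * g[0]) / det]

def levenberg_marquardt(xs, ds, w, lam=1e-3, max_iter=200, tol=1e-14):
    cost = average_cost(xs, ds, w)                     # step 1
    for _ in range(max_iter):
        g, H = gradient_and_hessian(xs, ds, w)
        dw = lm_step(g, H, lam)                        # step 3
        trial = [w[0] + dw[0], w[1] + dw[1]]
        trial_cost = average_cost(xs, ds, trial)
        if trial_cost >= cost:
            lam *= 10.0                                # step 4: reject the trial
        else:
            w, cost = trial, trial_cost                # step 5: accept the trial
            lam /= 10.0
        if cost < tol or lam > 1e12:                   # assumed stopping rule
            break
    return w, cost

xs = [0.25 * i for i in range(9)]
ds = [2.0 * math.exp(-x) for x in xs]                  # noise-free synthetic targets
w_fit, final_cost = levenberg_marquardt(xs, ds, w=[1.0, 0.0])
print(w_fit, final_cost)
```

Because a trial point is accepted only when it lowers the cost, the cost never increases from one iteration to the next; the synthetic targets were generated with w = (2, -1), so the fitted parameters should approach those values.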

1.3        Adaptive Resonance Theory (ART)

Adaptive Resonance Theory (ART) architectures are neural networks that carry out stable self-organization of recognition codes for arbitrary sequences of input patterns [36]. ART first emerged from an analysis of the instabilities inherent in feedforward adaptive coding structures [37]. ART1 self-organizes recognition categories for arbitrary sequences of binary input patterns [38]. ART2 was then introduced for analog inputs [39]. A fuzzy ART model, capable of rapid, stable learning of recognition categories in response to arbitrary sequences of analog or binary input patterns, was introduced by Carpenter [40]. Fuzzy ART incorporates computations from fuzzy set theory into the ART1 neural network, which by itself learns to categorize only binary input patterns. Figure 3.7 shows the fuzzy ART block diagram. The generalization to both analog and binary input patterns is achieved by replacing the intersection operator (∩) in ART1 with the MIN operator (∧) of fuzzy set theory; the MIN operator reduces to the intersection operator in the binary case. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy set theory play complementary roles [40]. Fuzzy ARTMAP was developed subsequently [41]. It is a generalized ARTMAP system that learns to classify inputs by a fuzzy set of features, that is, a pattern of fuzzy membership values between 0 and 1 that indicate the extent to which each feature is present.

Figure 3.7. A block diagram of the Fuzzy ART architecture [42].

1.3.1     Fuzzy ARTMAP

The Fuzzy ARTMAP generally consists of two fuzzy ART modules, ARTa and ARTb, which create stable recognition categories in response to arbitrary sequences of input patterns. The two modules are linked together by a map field which is a learning network and internal controller that ensures autonomous system operation in real time [43]. The overall architecture of fuzzy ARTMAP is shown in Figure 3.8.

Figure 3.8. Fuzzy ARTMAP architecture [43].

ARTa and ARTb produce compressed recognition codes that represent the classes of their input vectors a and b, respectively. Vector a is the measured data vector and vector b is the prediction of a [44].

Each fuzzy ART module (ARTa or ARTb) has three layers of nodes: F0, F1, and F2. The input layer F0 represents the current input vector, denoted by $a = (a_1, \dots, a_M)$, with each component $a_i$ in the interval [0, 1], $i = 1, \dots, M$.

The proliferation of categories in Fuzzy ART is avoided if the inputs are normalized using the method of complement coding. Thus, the complement coded input A to the field F1 is the 2M dimensional vector

$$A = (a, a^c) \tag{3.25}$$

where

$$a_i^c = 1 - a_i \tag{3.26}$$
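Complement coding as defined by Equations (3.25) and (3.26) can be sketched as follows; the function name `complement_code` and the sample vector are hypothetical. A useful consequence is that the sum of the components of A always equals M, so every coded input has the same norm.

```python
def complement_code(a):
    """Return A = (a, a^c) with a_i^c = 1 - a_i; each a_i must lie in [0, 1]."""
    if any(not 0.0 <= ai <= 1.0 for ai in a):
        raise ValueError("all components must lie in the interval [0, 1]")
    return list(a) + [1.0 - ai for ai in a]

A = complement_code([0.25, 0.5])
print(A, sum(A))
```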

The weight vector $w_j$ connects the F1 layer nodes to the output layer F2 nodes. For each F2 category node j (j = 1, …, N), there is a vector $w_j = (w_{j1}, \dots, w_{j,2M})$ of adaptive weights associated with the F1 layer nodes. Initially, all weights are set to

$$w_{j1}(0) = \dots = w_{j,2M}(0) = 1 \tag{3.27}$$

which means that every category is initially uncommitted.

For the input vector A and each node j in the F2 layer, the choice function $T_j$ is computed as

$$T_j(A) = \frac{|A \wedge w_j|}{\alpha + |w_j|} \tag{3.28}$$

where ∧ is the fuzzy AND operator, defined by

$$(p \wedge q)_i = \min(p_i, q_i) \tag{3.29}$$

and |·| is the norm operator of fuzzy set theory, defined by

$$|p| = \sum_{i=1}^{M} |p_i| \tag{3.30}$$

for any M-dimensional vectors p and q, and α > 0 is the choice parameter. For simplicity, $T_j(A)$ in Equation (3.28) is written as $T_j$ when the input A is fixed. A category choice is made when one F2 node becomes active at a given time. The category choice is indexed by J, where

$$T_J = \max\{T_j : j = 1, \dots, N\} \tag{3.31}$$

When more than one Tj is maximal, the category with the smallest index is chosen.

Resonance occurs if the match function $|A \wedge w_J| / |A|$ of the chosen category meets the vigilance criterion:

$$\frac{|A \wedge w_J|}{|A|} \ge \rho \tag{3.32}$$

where ρ is the vigilance parameter. With resonance, learning starts, as explained below. Mismatch reset occurs if the condition in Equation (3.32) is not met; the value of the choice function $T_J$ is then set to 0 and a new index J is chosen by Equation (3.31).

The search process continues until the chosen J satisfies Equation (3.32). If no existing node in F2 satisfies the condition, a new node is created in F2. Once the search is completed, the weight vector $w_J$ is updated according to the equation

$$w_J^{(\text{new})} = \beta \left( A \wedge w_J^{(\text{old})} \right) + (1 - \beta)\, w_J^{(\text{old})} \tag{3.33}$$

where β ∈ [0, 1] is the learning rate parameter; fast learning corresponds to setting β = 1.
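The category choice, vigilance test, and weight update of Equations (3.28) to (3.33) can be combined into one presentation step, sketched below. The function names, the value α = 0.001, and the sample inputs are assumptions for illustration. Visiting the committed nodes in order of decreasing choice value is equivalent to repeatedly resetting the rejected node and choosing again.

```python
def fuzzy_and(p, q):
    """Component-wise minimum, Eq. (3.29)."""
    return [min(pi, qi) for pi, qi in zip(p, q)]

def norm1(p):
    """Sum of absolute values, Eq. (3.30)."""
    return sum(abs(pi) for pi in p)

def fuzzy_art_step(A, weights, rho, alpha=0.001, beta=1.0):
    """Present one complement-coded input A (so |A| > 0) and return the index
    of the resonating category. `weights` holds the committed weight vectors
    and is modified in place."""
    scores = [norm1(fuzzy_and(A, w)) / (alpha + norm1(w)) for w in weights]   # Eq. (3.28)
    # Highest choice value first; ties are broken by the smallest index.
    order = sorted(range(len(weights)), key=lambda j: (-scores[j], j))
    for J in order:
        match = norm1(fuzzy_and(A, weights[J])) / norm1(A)
        if match >= rho:                                   # vigilance test, Eq. (3.32)
            old = weights[J]
            weights[J] = [beta * m + (1.0 - beta) * o
                          for m, o in zip(fuzzy_and(A, old), old)]   # Eq. (3.33)
            return J
    # No committed node passes: commit a new node whose weights start at 1,
    # so that A AND w equals A in the update rule.
    weights.append([beta * a + (1.0 - beta) * 1.0 for a in A])
    return len(weights) - 1

categories = []
for A in ([0.2, 0.8, 0.8, 0.2], [0.25, 0.75, 0.75, 0.25], [0.9, 0.1, 0.1, 0.9]):
    print(fuzzy_art_step(A, categories, rho=0.75), categories)
```

In this example the first two inputs are similar enough to share one category, while the third fails the vigilance test and commits a second category.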

The same search process happens in ARTb to find a prototype node in F2b that best matches B.

The map field $F^{ab}$ is activated whenever one of the ARTa or ARTb categories is active. If node J of $F_2^a$ is chosen, then its weights $w_J^{ab}$ activate $F^{ab}$. If node K in $F_2^b$ is active, then node K in $F^{ab}$ is activated by one-to-one pathways between $F_2^b$ and $F^{ab}$. If both ARTa and ARTb are active, then $F^{ab}$ becomes active only if ARTa predicts the same category as ARTb via the weights $w_J^{ab}$. The map field output is $x^{ab} = 0$ if the prediction $w_J^{ab}$ is disconfirmed by $y^b$. Such a mismatch event triggers an ARTa search for a better category, as follows.

At the start of each input presentation, the ARTa vigilance parameter $\rho_a$ equals the baseline vigilance. The map field vigilance parameter is $\rho_{ab}$. If

$$|x^{ab}| < \rho_{ab}\, |v^b| \tag{3.34}$$

then $\rho_a$ is increased until it is slightly larger than $|A \wedge w_J^a|\, |A|^{-1}$, where A is the input to $F_1^a$, in complement coding form. Then

$$|x^a| = |A \wedge w_J^a| < \rho_a\, |A| \tag{3.35}$$

where J is the index of the active $F_2^a$ node, as in (3.32). When this occurs, ARTa search leads either to activation of another $F_2^a$ node J with

$$|x^a| = |A \wedge w_J^a| \ge \rho_a\, |A| \tag{3.36}$$

and

$$|x^{ab}| = |v^b \wedge w_J^{ab}| \ge \rho_{ab}\, |v^b| \tag{3.37}$$

or, if no such node exists, to the shutdown of $F_2^a$ for the remainder of the input presentation [41].

Learning rules determine how the map field weights $w_{jk}^{ab}$ change through time, as follows. Weights $w_{jk}^{ab}$ in $F_2^a \to F^{ab}$ paths initially satisfy

$$w_{jk}^{ab}(0) = 1 \tag{3.38}$$

During resonance with the ARTa category J active, $w_J^{ab}$ approaches the map field vector $x^{ab}$. With fast learning, once J learns to predict the ARTb category K, that association is permanent; i.e., $w_{JK}^{ab} = 1$ for all time [41].

1.4        Brief Description of Industrial Gas Turbine

The machine investigated in this case study is a Ruston TA 1500 gas turbine, which is made in the United States. This turbine is used to transmit oil from the plant to the Omidiyeh booster station and to raise the oil pressure from 14 psi to 200 psi. The compressor is of the axial-flow type and has 15 stages: nine stages of fixed blades and six stages of guide blades. The machine has 18 thermocouples to measure temperature in different parts. Inlet systems of this machine are 80. The inlet gas pressure is 32 bar, with a fixed input flow rate. The annular combustion chamber uses a lean premix combustion system and is designed to operate on natural gas fuel. The High Pressure Turbine (HPT) extracts energy from the gas stream to drive the compressor. The power turbine has two blade stages. The power generated by this machine is 8.3 to 9 MW.

Condition-based maintenance (CBM) is applied to this gas turbine on the basis of vibration analysis of the compressor shaft. Performing CBM requires the vibration and the speed of the shaft. The ranges of speed and vibration for this gas turbine are:

Speed ∈ (13600 rpm, 14500 rpm)

Vibration ∈ (0.07 mm, 0.22 mm]

The value of 0.22 mm is the vibration threshold that separates the normal condition from the faulty condition. A vibration value greater than 0.22 mm indicates a fault, and the gas turbine is shut down.

1.5        General Flow Chart of the Project

Figure 3.9 shows the general scheme of the project. The models and the process of the current study are illustrated in the flow chart.

In general, a condition monitoring program is carried out in three steps [4]: data acquisition, data processing, and decision-making (see Figure 2.3). According to Figure 3.9, the first step of condition monitoring is the acquisition of appropriate data. The collected vibration data are used in the time domain to predict the state at the next time step. The sampling time of the available data is two hours, which is large, so some information is lost. To reduce the sampling interval, an interpolation operation is used. In the last part (decision-making), the data are divided into a training set and a test set, both of which contain normal and faulty conditions. A recurrent neural network and fuzzy ARTMAP are the two intelligent methods used to predict the condition of the compressor of the gas turbine. The results of each model are used to interpret the condition of the machine and to recommend whether maintenance action is necessary.
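The text does not state which interpolation method is used. As one possible sketch, assuming simple linear interpolation, a series sampled every two hours can be resampled to 30-minute steps by inserting three equally spaced points between consecutive samples. The function name `resample_linear` and the vibration values are hypothetical.

```python
def resample_linear(values, factor):
    """Insert factor-1 linearly interpolated points between consecutive samples."""
    out = []
    for left, right in zip(values, values[1:]):
        for k in range(factor):
            out.append(left + (right - left) * k / factor)
    out.append(values[-1])
    return out

vibration_2h = [0.10, 0.14, 0.12]                 # made-up readings in mm
vibration_30min = resample_linear(vibration_2h, factor=4)
print(len(vibration_30min), vibration_30min)
```

Interpolated points carry no new measurements; they only provide a denser series for the one-step-ahead predictor.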

Figure 3.9. General scheme of the project.

CHAPTER FOUR

2         RESULTS AND DISCUSSION

The objective of this chapter is to find a model to monitor the condition of the gas turbine. Two conditions are investigated: normal and faulty. The neural network and fuzzy ARTMAP are used to monitor these conditions.

2.1        Jordan Recurrent Neural Network

A multilayer perceptron (MLP) network with feedback from the output layer is designed. In this model, the inputs are the current speed and its delayed values (u(t), u(t-1), …) and the delayed values of vibration (y(t-1), y(t-2), …), which are used to predict the current vibration (one-step-ahead prediction). The ranges of speed and vibration for this gas turbine are:

Speed ∈ (13600 rpm, 14500 rpm)

Vibration ∈ (0.07 mm, 0.22 mm]

The value of 0.22 mm is the vibration threshold used to decide whether the condition is normal or faulty. A vibration value greater than 0.22 mm indicates a fault, and the gas turbine is shut down.

The use of delayed values of vibration (output feedback) makes the model a Jordan recurrent neural network (JRNN). Figure 4.1 illustrates the overall scheme of the open-loop JRNN model.


Figure 4.1. An open loop JRNN.

The number of hidden layer nodes and the delays for both the external input (speed) and the feedback input (vibration) are varied by trial and error to find the optimum values. The optimum values and a summary of the model information are given in Table 4.1. The performance criterion is the Mean Square Error (MSE). The activation functions of the hidden layer and the output layer are hyperbolic tangent (tanh) and linear, respectively. The network has one hidden layer with 10 nodes. The model is trained with back-propagation using the Levenberg–Marquardt (LM) training algorithm.

Table 4.1. Information of Jordan recurrent neural network.

Parameter                            Information
Optimum external input delay         3
Optimum feedback delay               3
Optimum hidden layer node            10
Hidden layer activation function     tanh function
Output layer activation function     Linear function
Training method                      LM
Sampling time range of data          2 hours
Performance measure                  MSE
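The structure summarized in Table 4.1 (one hidden layer of 10 tanh nodes, a linear output, and MSE as the performance measure) can be sketched as follows. The weights are random placeholders rather than trained values, LM training is not shown, and the input size of 7 assumes four speed values u(t), …, u(t-3) and three vibration values y(t-1), …, y(t-3).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def forward(X, W1, b1, W2, b2):
    """One hidden tanh layer followed by a linear output layer."""
    hidden = np.tanh(X @ W1 + b1)   # shape (n_samples, n_hidden)
    return hidden @ W2 + b2         # shape (n_samples,)

rng = np.random.default_rng(0)
n_inputs, n_hidden = 7, 10          # assumed input size; 10 hidden nodes per Table 4.1
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))  # placeholder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0
X = rng.normal(size=(5, n_inputs))  # five hypothetical regressor rows
y_hat = forward(X, W1, b1, W2, b2)
```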

The JRNN is run on operational data. Figure 4.2 shows the one-step-ahead prediction for the training, validation, and test data, which make up 80%, 10%, and 10% of the data set respectively. The overall MSE of this model is 6.6568e-04. Relative to the range of vibration, this performance is not acceptable. The main cause of the poor prediction is the large sampling time, during which some of the information is lost. Figure 4.3 presents the accuracy (MSE) of the model.
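The 80%, 10%, 10% division can be sketched as below. The text gives only the proportions; whether samples were assigned randomly or in contiguous blocks is not stated, so contiguous blocks are assumed here, and the function name `split_indices` is illustrative.

```python
import numpy as np

def split_indices(n_samples, train=0.8, val=0.1):
    """Split sample indices into contiguous train/validation/test blocks.

    Assumption: contiguous blocks in time order; the remainder is the test set.
    """
    n_train = int(round(n_samples * train))
    n_val = int(round(n_samples * val))
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(100)
```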

Figure 4.2. One step-ahead prediction using JRNN.

In Figure 4.2, the real data are shown with a red dashed line and the values predicted by the JRNN with a continuous blue line.

Figure 4.3. Accuracy of prediction for JRNN.

The MSE values of the network are presented in Table 4.2. All of the MSE values are large relative to the range of the vibration data.

Table 4.2. JRNN performance.

MSE          Train MSE    Validation MSE   Test MSE     ClosedLoop MSE   Multistep MSE
6.6568e-04   5.7681e-04   0.0011           9.7165e-04   6.9016e-04       0.0011

Reducing the sampling time is recommended to improve prediction performance. To do so, an interpolation operation is used, which changes the sampling time to 30 minutes. Figure 4.4 shows the predicted values of vibration. The network information is presented in Table 4.3; all settings are the same as for the previous network except the sampling time, which is now 30 minutes.

Table 4.3. Information of Jordan recurrent neural network.

Parameter                            Information
Optimum external input delay         3
Optimum feedback delay               3
Optimum hidden layer node            10
Hidden layer activation function     tanh function
Output layer activation function     Linear function
Training method                      LM
Sampling time range of data          30 minutes
Performance measure                  MSE

The JRNN is run again on the operational data with the sampling time reduced to 30 minutes. Figure 4.4 shows the one-step-ahead prediction for the training, validation, and test data. The overall MSE of the JRNN model decreases from 6.6568e-04 to 5.3344e-05. The predicted values follow the real data very well but, as Figure 4.4 shows, the errors are high at peak values.

Figure 4.4. One step-ahead prediction using JRNN.

A closed-loop network is used to predict vibration over the long term. A long-term prediction model is needed for reliable condition monitoring, because it provides information further into the future. In the closed loop, the estimated output replaces the real data at the input layer. The general structure of the closed-loop network is shown in Figure 4.5.

Figure 4.5. Closed loop JRNN.
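The closed-loop idea, in which each estimate is fed back in place of the measurement, can be sketched as follows. This is an illustration under assumptions: `closed_loop_predict` is a hypothetical name, `model` stands for any trained one-step predictor, and the regressor layout u(t), …, u(t-d), y(t-1), …, y(t-d) is assumed.

```python
import numpy as np

def closed_loop_predict(model, u, y_init, steps, input_delay=3, feedback_delay=3):
    """Predict `steps` values ahead, feeding each estimate back as an input.

    model: callable mapping a 1-D regressor vector to a scalar prediction.
    u: speed values covering the known history and the prediction horizon,
       so len(u) >= len(y_init) + steps.
    y_init: last measured vibration values, oldest first; its length must be
            at least max(input_delay, feedback_delay).
    """
    y_hist = list(y_init)
    t0 = len(y_hist)                  # index of the first predicted sample
    preds = []
    for k in range(steps):
        t = t0 + k
        past_u = np.asarray(u[t - input_delay:t + 1], dtype=float)[::-1]
        past_y = np.asarray(y_hist[-feedback_delay:], dtype=float)[::-1]
        y_hat = float(model(np.concatenate([past_u, past_y])))
        preds.append(y_hat)
        y_hist.append(y_hat)          # the estimate replaces the measurement
    return preds

# Toy stand-in for a trained network: previous vibration plus a small drift
toy_model = lambda x: x[4] + 0.01
preds = closed_loop_predict(toy_model, [14000.0] * 8, [0.10, 0.10, 0.10], steps=5)
```

Because every prediction depends on earlier predictions, errors can accumulate over the horizon, which is one reason multi-step accuracy is usually lower than one-step accuracy.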

The closed-loop JRNN is used to predict five steps ahead. The error increases in the closed-loop network: Figure 4.6 shows that the predicted values cannot follow the real data. With the JRNN, the accuracy of the long-term prediction model is lower than that of the one-step-ahead prediction model.

Figure 4.6. Long term prediction using closed loop network.

Figure 4.7. Accuracy of prediction for JRNN.

Figure 4.7 presents the accuracy (MSE) of the JRNN model. The MSE values decrease when the sampling time is reduced to 30 minutes, although the MSE is still large at some points. The MSE values of the network are presented in Table 4.4.

Table 4.4. JRNN performance.

MSE          Train MSE    Validation MSE   Test MSE     ClosedLoop MSE   Multistep MSE
5.3344e-05   2.3430e-05   3.7031e-05       2.9791e-05   6.7290e-04       0.0021

2.2        Fully Connected Recurrent Neural Network

An Elman network, with feedback connections from the hidden layer, is added to the JRNN to create a fully connected recurrent neural network (FCRNN). The FCRNN is used to reduce the error and improve the performance. The overall structure of the FCRNN is illustrated in Figure 4.8.

Figure 4.8. The whole structure of open loop FCRNN.
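One time step of such a network can be sketched as below. The exact wiring of the FCRNN is not given in the text, so this is an assumed form: the vector `x` carries the speed inputs and the delayed vibration values (the Jordan part), and the previous hidden state `h_prev` is fed back into the hidden layer (the Elman part). All names and the weight shapes are illustrative.

```python
import numpy as np

def fcrnn_step(x, h_prev, Wx, Wh, b, w_out, b_out):
    """One time step with output feedback in x and hidden-state feedback.

    x: regressor vector, e.g. u(t), u(t-1), u(t-2), y(t-1), y(t-2).
    h_prev: hidden-layer activations from the previous time step.
    """
    h = np.tanh(Wx @ x + Wh @ h_prev + b)   # hidden layer, tanh activation
    y = float(w_out @ h + b_out)            # linear output layer
    return h, y

rng = np.random.default_rng(1)
n_inputs, n_hidden = 5, 10                  # assumed input size; 10 hidden nodes
Wx = rng.normal(scale=0.1, size=(n_hidden, n_inputs))   # placeholder weights
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=n_hidden)
h, y_hat = fcrnn_step(rng.normal(size=n_inputs), np.zeros(n_hidden), Wx, Wh, b, w_out, 0.0)
```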

The FCRNN information is given in Table 4.5: the optimum number of hidden-layer nodes, the delays for both the external input (speed) and the feedback input (vibration), the activation function of each layer, the training method, the performance criterion, and the sampling time. The network has one hidden layer with 10 nodes.

Table 4.5. Information of FCRNN.

Parameter                            Information
Optimum external input delay         2
Optimum feedback delay               2
Optimum hidden layer node            10
Hidden layer activation function     tanh function
Output layer activation function     Linear function
Training method                      LM
Sampling time range                  30 minutes
Performance measure                  MSE

In Figure 4.9, the real data are shown with a red dashed line and the values predicted by the FCRNN with a continuous blue line. The predicted values follow the real data very well. The errors at peak values are reduced and the performance improves with the FCRNN: the MSE decreases from 5.3344e-05 for the JRNN to 1.7409e-05 for the FCRNN.
