Probability density estimation techniques for gas turbine diagnosis.
Introduction
It is a standard worldwide practice to apply health monitoring systems that detect, identify, and predict gas turbine faults. Diagnostic algorithms based on gas path models and measured variables constitute an important part of these systems. Many of these algorithms apply pattern classification techniques, mostly different artificial neural networks [1-3].
The effectiveness of a monitoring system strongly depends on the accuracy of its diagnostic decisions. That is why all system algorithms, including the classification technique used, should be optimized.
Among the neural networks applied to diagnose gas turbines, the multilayer perceptron (MLP) is the most widely used [3]. In our previous studies [4, 5], the diagnostic accuracy of the perceptron and some other classification techniques was examined. It was found that, on average, four techniques including the MLP provide equally good results. Thus, to choose the best technique, an additional criterion is required. The ability to accompany every diagnostic decision with a confidence measure is an important property of some classification techniques and can be accepted as such a criterion.
The present paper deals with two classification methods, Parzen Windows (PW) and K-Nearest Neighbors (K-NN), described, for example, in [6]. For a given pattern, they compute the probability of each considered class and then assign the pattern to the class of highest probability. The class probabilities are determined through the class probability densities at the point of the pattern. The densities, in turn, are estimated by counting nearby patterns of each class.
The paper compares the MLP, the K-NN, and some variations of the PW under different diagnostic conditions. To this end, each technique is embedded in turn into a special testing procedure that repeats numerous cycles of gas turbine diagnosis and finally computes an average probability of correct diagnostic decisions. The testing procedure has been developed, and the comparative calculations carried out, in MATLAB® (MathWorks, Inc.).
A gas turbine driver for a natural gas pumping unit has been chosen as a test case to perform the comparative calculations. A nonlinear mathematical model of this engine was employed for simulating faults and building fault classes.
The next section describes the classification techniques examined in the paper.
1. Classification techniques
Foundations of the chosen techniques can be found in many books on classification theory, for example, in [6]. The following subsections include only the brief description required to understand the present paper.
1.1. Multilayer perceptron
The MLP is a feed-forward network, i.e., signals propagate from its input to its output with no feedback. Figure 1 helps to describe this neural network.
The MLP shown in the figure consists of two principal layers: a hidden layer and an output layer. The input to each neuron of the hidden layer is a sum of the perceptron inputs (elements of a vector $\vec{x}$) multiplied by the weight coefficients of a matrix $W_1$, with a bias (an element of a vector $\vec{b}_1$) added. The neuron input is transformed by an activation function $f_1$ into a neuron output (an element of a vector $\vec{y}_1$). The output layer processes signals in a similar manner, considering the vector $\vec{y}_1$ as its input. Thus, signal processing in the perceptron is expressed as

$$\vec{y} = f_2\big(W_2\, f_1(W_1 \vec{x} + \vec{b}_1) + \vec{b}_2\big).$$

Fig. 1. Multilayer perceptron
Each output $y_k$ is a closeness measure between the input pattern $\vec{x}$ and the class $D_k$, and the pattern is assigned to the class of maximal closeness.

A back-propagation algorithm is usually applied to train the MLP. In this algorithm, a network output error is propagated backwards to change the unknown perceptron quantities $W_1$, $\vec{b}_1$, $W_2$, and $\vec{b}_2$ in the direction that reduces the error. The learning cycles repeat until the process converges to a global error minimum. The back-propagation algorithm needs differentiable activation functions, and usually they are of a sigmoid type.
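As an illustration of this signal flow, a minimal Python sketch of the forward pass is given below; the layer sizes and random weights are placeholders, not the network trained in this study (the paper's computations were done in MATLAB).

```python
import numpy as np

def sigmoid(v):
    """Sigmoid activation: differentiable, as back-propagation requires."""
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer perceptron: y = f2(W2 f1(W1 x + b1) + b2)."""
    y1 = sigmoid(W1 @ x + b1)      # hidden-layer outputs
    return sigmoid(W2 @ y1 + b2)   # output-layer closeness measures

# Placeholder dimensions: m = 6 features, 27 hidden nodes, q = 9 classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(27, 6)), rng.normal(size=27)
W2, b2 = rng.normal(size=(9, 27)), rng.normal(size=9)
x = rng.normal(size=6)             # one feature vector (pattern)
y = mlp_forward(x, W1, b1, W2, b2)
print(int(np.argmax(y)))           # the pattern goes to the closest class
```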
The other techniques analyzed in the present paper are based on probability density estimation.
1.2. Probability density estimation
A conditional probability $P(D_j\,|\,\vec{x})$ is a perfect criterion to classify patterns because it minimizes classification errors and also provides a probabilistic measure of confidence in a classification decision. We can compute this probability through the Bayes formula

$$P(D_j\,|\,\vec{x}) = \frac{\rho(\vec{x}\,|\,D_j)\, P(D_j)}{\rho(\vec{x})}. \qquad (1)$$

If a priori information on possible faults is not available and we accept that all a priori probabilities $P(D_j)$ are equal to each other, the probability densities $\rho(\vec{x}\,|\,D_j)$ are sufficient to determine the a posteriori probabilities $P(D_j\,|\,\vec{x})$.
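Under this equal-priors assumption, formula (1) reduces to normalizing the class densities; a minimal sketch (the class densities are assumed to be already estimated):

```python
import numpy as np

def posteriors_equal_priors(densities):
    """A posteriori probabilities P(D_j|x) from class densities rho(x|D_j),
    equation (1) with equal priors P(D_j) = 1/q."""
    densities = np.asarray(densities, dtype=float)
    total = densities.sum()
    if total == 0.0:   # no learning patterns near x: no preference
        return np.full(densities.shape, 1.0 / densities.size)
    return densities / total

probs = posteriors_equal_priors([0.02, 0.11, 0.05])  # illustrative densities
print(probs.argmax(), probs.max())  # decision and its confidence measure
```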
Since common parametric distribution functions rarely fit real distributions, let us consider nonparametric procedures. For a given point (pattern) $\vec{x}$, they use nearby class patterns (learning patterns) to estimate the necessary densities. The estimation formula is simple:

$$\rho(\vec{x}) \approx \frac{k}{nV}, \qquad (2)$$

where $n$ is the total number of class patterns, $V$ is the volume of a selected region around the point $\vec{x}$, and $k$ is the number of learning patterns that fall into the region.

If the estimate $\rho$ is to converge to the exact density, the quantity $n$ should increase while ensuring that

$$V \to 0, \qquad k \to \infty, \qquad k/n \to 0. \qquad (3)$$

There are two ways to determine $V$ and $k$. The first is to fix $V$ and look for $k$; this is the principle of the Parzen Windows method. The second is to specify $k$ and seek $V$; it is realized in the K-Nearest Neighbors method.
1.3. Parzen Windows
Different types of the region (window) for counting the patterns are employed in the PW method. For each type, a specific parameter, the window spread s, characterizing the region volume can be introduced.
To better describe the PW method, let us temporarily assume that the region is a hypercube with its center situated at the point $\vec{x}$ and the cube edge length as the spread parameter $s$. Obviously, the region volume in an $m$-dimensional classification space will be

$$V = s^m.$$

To formalize counting the patterns, let us introduce the following window function:

$$\varphi(\vec{u}) = \begin{cases} 1, & |u_j| \le 1/2, \; j = 1, \dots, m, \\ 0, & \text{otherwise.} \end{cases} \qquad (4)$$

With this function, the number of patterns inside the cube is

$$k = \sum_{i=1}^{n} \varphi\!\left(\frac{\vec{x} - \vec{x}_i}{s}\right) \qquad (5)$$

and the necessary density is given by

$$\rho(\vec{x}) = \frac{1}{nV} \sum_{i=1}^{n} \varphi\!\left(\frac{\vec{x} - \vec{x}_i}{s}\right), \qquad (6)$$

where $\vec{x}_i$ are training patterns. Although each window type has its own window function and spread parameter $s$, the function argument $\vec{u} = (\vec{x} - \vec{x}_i)/s$ as well as the computational equations (5) and (6) remain the same.
Since for the given point $\vec{x}$ we intend to count the nearest patterns, a hypercube is not an ideal region because the points on its surface are at variable distances from the center. Following this logic, we can consider a hypersphere as a better choice. For a sphere window, the spread parameter $s$ is the radius, and the window function is expressed as

$$\varphi(\vec{u}) = \begin{cases} 1, & \|\vec{u}\| \le 1, \\ 0, & \text{otherwise.} \end{cases} \qquad (7)$$
Thus, the variation of the PW with the sphere window may have a better classification performance because of a more exact density estimation according to equation (6).
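The cube and sphere windows of equations (4)-(7) can be sketched as follows; the 2-D data and the spread values are illustrative only:

```python
import numpy as np

def window_cube(u):
    """Cube window, eq. (4): 1 inside the unit hypercube, 0 outside."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def window_sphere(u):
    """Sphere window, eq. (7): 1 inside the unit hypersphere, 0 outside."""
    return (np.linalg.norm(u, axis=-1) <= 1.0).astype(float)

def parzen_density(x, patterns, s, window, volume):
    """Parzen estimate, eq. (6): rho(x) = sum_i phi((x - x_i)/s) / (n V)."""
    u = (x - patterns) / s   # scaled offsets to all learning patterns
    return window(u).sum() / (len(patterns) * volume)

# Illustrative 2-D class of n = 1000 patterns
rng = np.random.default_rng(1)
Z = rng.normal(size=(1000, 2))
x = np.zeros(2)
rho_c = parzen_density(x, Z, s=1.14, window=window_cube, volume=1.14**2)         # V = s^m
rho_s = parzen_density(x, Z, s=0.86, window=window_sphere, volume=np.pi * 0.86**2)  # 2-D sphere volume
print(rho_c, rho_s)
```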
As described above, for the considered cube and sphere window functions, the contribution of all inside patterns is equal to one, while the outside patterns have zero contribution. Such rigid pattern separation looks somewhat artificial. It seems more natural to adopt the following rule: the closer the pattern is situated to the window center, the greater its contribution will be. To realize this rule, a Gaussian window function

$$\varphi(\vec{u}) = \exp\!\left(-\frac{\|\vec{u}\|^2}{2}\right) \qquad (8)$$

is usually used in the PW. The spread parameter $s$ of the Gaussian window determines its action area. To estimate the probability density, the same equation (6) is employed.
On the basis of the above reasoning, we can suppose that the variation of the PW with the Gaussian window will provide the best classification performance.
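A Gaussian window drops into the same sketch; only the window function changes. The unit volume below is an implementation shortcut, not a statement from the paper: with equal spreads, the Gaussian normalizing constant cancels when class densities are compared.

```python
def window_gauss(u):
    """Gaussian window, eq. (8): weight decays smoothly with distance."""
    return np.exp(-0.5 * np.sum(np.asarray(u)**2, axis=-1))

# Reusing parzen_density, Z, and x from the previous sketch
rho_g = parzen_density(x, Z, s=0.30, window=window_gauss, volume=1.0)
```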
The last technique to analyze and compare is the K-NN method.
1.4. K-Nearest Neighbors
All PW variations use a constant window size during the classification process. If the actual density is low, no patterns may fall into the window, resulting in a zero density estimate and misclassification. A potential remedy for this difficulty is to let the window size be a function of the training data. In particular, in the K-NN method we let the window grow until it captures k patterns, called the k nearest neighbors.
The number k is set beforehand. Then, for the patterns of each class, the sphere is determined that embraces exactly k patterns. The greater the sphere radius and volume are, the lower the density estimated by equation (2) will be.
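A minimal K-NN density sketch under the same assumptions; the ball-volume formula is the standard one for an m-dimensional space:

```python
import numpy as np
from math import gamma, pi

def knn_density(x, patterns, k):
    """K-NN estimate, eq. (2): grow a sphere around x until it captures
    exactly k patterns, then rho = k / (n V)."""
    dists = np.sort(np.linalg.norm(patterns - x, axis=1))
    r = dists[k - 1]                            # radius holding k patterns
    m = patterns.shape[1]
    V = pi**(m / 2) / gamma(m / 2 + 1) * r**m   # volume of an m-D ball
    return k / (len(patterns) * V)

# Illustrative class of 1000 patterns in a 6-D diagnostic space, k = 21
rng = np.random.default_rng(2)
Z = rng.normal(size=(1000, 6))
print(knn_density(np.zeros(6), Z, k=21))
```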
To examine and compare the classification techniques described in this section, a special testing procedure has been developed.
2. Testing procedure
This procedure simulates the whole diagnostic process, including the steps of fault simulation, feature extraction, fault classification formation, making a classification decision, and classification accuracy estimation.
2.1. Fault simulation
Within the scope of the paper, faults of engine components (compressor, turbine, combustor, etc.) are simulated by means of a nonlinear gas turbine thermodynamic model

$$\vec{Y} = F(\vec{U}, \vec{\Theta}). \qquad (9)$$

The model computes the monitored variables $\vec{Y}$ (temperature, pressure, rotation speed, fuel consumption, etc.) as a function of the steady-state operating conditions $\vec{U}$ and the engine health parameters $\vec{\Theta}$. Nominal values $\vec{\Theta}_0$ correspond to a healthy engine, whereas fault parameters $\Delta\vec{\Theta}$ imitate fault influence by shifting the component performance maps.
2.2. Feature extraction
Although gas turbine monitored variables are
affected by engine deterioration, the influence of the operating
conditions
is much more significant. To extract
diagnostic information from raw measured data, a
deviation (fault feature) is computed for each monitored variable as
a difference between actual and baseline values. With the
thermodynamic model, the deviations Zi
i=1,m induced by the fault parameters are
calculated for all m monitored variables according to an expression
.
(10)
A random error
makes the deviation more realistic. A parameter
normalizes the deviations errors resulting that they will be
localized within the interval [-1,1] for all monitored variables.
Such normalization simplifies fault class description.
An (m×1)-vector
(feature vector) forms a diagnostic space. A value of this vector is
a point in this space and a pattern to be recognized.
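A sketch of this feature-extraction step; the baseline values are invented for illustration, the error spread σ = 1/3 is an assumption chosen so that the normalized errors rarely exceed amplitude 1, and the thermodynamic model itself is outside the scope of the sketch:

```python
import numpy as np

def deviations(y, y0, a, rng):
    """Normalized deviations, eq. (10): relative shifts of the monitored
    variables from the baseline plus a normalized random error."""
    eps = rng.normal(0.0, 1.0 / 3.0, size=y.shape)  # assumed Gaussian error;
    # ~99.7% of its values fall within the normalized interval [-1, 1]
    return (y - y0) / (a * y0) + eps

# Normalization parameters a_i from Table 1; baselines y0 are illustrative
a = np.array([0.015, 0.015, 0.025, 0.015, 0.020, 0.020])
y0 = np.array([10.0, 1.05, 600.0, 800.0, 900.0, 1.2])
y = y0 * 0.99   # monitored variables shifted by 1% due to a fault
Z_pattern = deviations(y, y0, a, np.random.default_rng(3))
```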
2.3. Fault classification formation
Numerous gas turbine faults are divided into a
limited number q of classes
.
In the present paper, each class corresponds to varying severity
faults of one engine component. The class is described by
component’s fault parameters
.
Two class types are analyzed. A class of single faults is formed by
changing one fault parameter. To create a class of multiple faults,
two parameters of the same component are varied independently.
Each class is composed from numerous patterns
.
They are computed according to expression (10) where the necessary
quantities
and
are generated by the uniform and Gaussian distributions accordingly.
To ensure high computational precision, each class is composed from
many patterns. A learning set Z1
uniting patterns of all classes
presents a whole fault classification.
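Continuing the sketch, a fault class and the learning set Z1 might be formed as follows; the linear fault-influence stub and the random influence vectors stand in for the nonlinear thermodynamic model (9):

```python
def build_class(influence, n, a, y0, rng):
    """Generate n patterns of one class: fault severities are drawn from a
    uniform distribution (errors inside `deviations` are Gaussian)."""
    severities = rng.uniform(0.0, 0.05, size=n)        # 0..5% severity range
    y = y0 * (1.0 + np.outer(severities, influence))   # model stand-in
    return np.array([deviations(row, y0, a, rng) for row in y])

# Learning set Z1: q = 9 illustrative classes with random influence vectors
rng = np.random.default_rng(4)
influences = rng.normal(scale=0.5, size=(9, 6))
Z1 = [build_class(v, 1000, a, y0, rng) for v in influences]
```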
2.4. Making a classification decision
In addition to the given (observed) pattern $\vec{Z}^*$ and the constructed fault classification $Z_1$, one of the chosen classification techniques is an integral part of the whole diagnostic process.

To apply and test the classification techniques, a validation set $Z_2$ is also created in the same way as the set $Z_1$. The difference between the sets consists in other random numbers generated within the same distributions.

In the considered testing procedure, the examined technique uses in turn each pattern of the set $Z_2$ to set an actual point $\vec{Z}^*$, and the set $Z_1$ patterns to compute the probability densities and make a classification decision.
2.5. Classification accuracy estimation
Although most of the considered techniques provide a confidence estimate for every pattern $\vec{Z}^*$ and classification decision $d_l$, it is of practical interest to know the classification accuracy averaged over each fault class and over the whole engine. To this end, the testing procedure successively applies the classification technique to all patterns of the set $Z_2$, producing diagnoses $d_l$. Since the true fault classes $D_j$ are also known, the probabilities of correct diagnosis (true positive rates) $P_j$ can be calculated for all classes, resulting in a probability vector $\vec{P}$. A mean value $\bar{P}$ of these probabilities characterizes the accuracy of engine diagnosis by the applied technique. In the present paper, the probability $\bar{P}$ is employed as a criterion to compare the techniques described in section 1.
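The decision and accuracy steps of sections 2.4-2.5 can then be sketched end to end; `classify_parzen` is a hypothetical PW-cube classifier assembled from the earlier pieces:

```python
def classify_parzen(x, Z1, s=1.14):
    """Hypothetical PW-cube classifier: the highest class density wins."""
    rhos = [parzen_density(x, Z, s, window_cube, s**6) for Z in Z1]  # V = s^m, m = 6
    return int(np.argmax(rhos))

def true_positive_rates(classify, Z2):
    """Vector P of correct-diagnosis probabilities, one per fault class."""
    P = []
    for j, patterns in enumerate(Z2):   # true class D_j of these patterns
        decisions = np.array([classify(x) for x in patterns])
        P.append(np.mean(decisions == j))
    return np.array(P)

# Validation set Z2: same construction as Z1, fresh random numbers
Z2 = [build_class(v, 1000, a, y0, rng) for v in influences]
P = true_positive_rates(lambda x: classify_parzen(x, Z1), Z2)
print(P.mean())   # mean probability of correct diagnosis, the criterion
```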
3. Comparison conditions
For comparative calculations within the present study, a gas turbine for driving a natural gas centrifugal compressor has been chosen as a test case. It is an aeroderivative engine with a power turbine. Its thermodynamic model necessary to compute fault patterns is available. The engine operating point is close to the maximum regime and is set by the gas generator rotation speed and standard ambient conditions.
Apart from these operating conditions, 6 other measured variables are monitored and used to compute the patterns. These variables and their normalization parameters $a_i$ are specified in Table 1.
Table 1
Monitored variables

| № | Variable name             | $a_i$ |
|---|---------------------------|-------|
| 1 | Compressor pressure       | 0.015 |
| 2 | Exhaust gas pressure      | 0.015 |
| 3 | Compressor temperature    | 0.025 |
| 4 | Exhaust gas temperature   | 0.015 |
| 5 | Power turbine temperature | 0.020 |
| 6 | Fuel consumption          | 0.020 |
The faults are simulated through 9 fault parameters embedded into the model. They change from 0 to -5%. As shown in Table 2, 9 single fault classes and 4 multiple fault classes are formed. Regardless of the simulated fault type, single or multiple, each class is presented by n = 1000 patterns.
4. Techniques comparison
According to the description in section 1, five classification techniques will be compared: the Multilayer Perceptron (MLP), three variations of the Parzen Windows method (briefly called PW-cube, PW-sphere, and PW-Gauss), and the K-Nearest Neighbors (K-NN) method. The true positive rate $\bar{P}$ is the criterion to choose the best technique.
Table 2
Fault parameters and fault classes

| № | Fault parameter                                      | Single fault classes | Multiple fault classes |
|---|------------------------------------------------------|----|----|
| 1 | Compressor flow parameter                            | D1 | D1 |
| 2 | Compressor efficiency parameter                      | D2 |    |
| 3 | High pressure turbine flow parameter                 | D3 | D2 |
| 4 | High pressure turbine efficiency parameter           | D4 |    |
| 5 | Power turbine flow parameter                         | D5 | D3 |
| 6 | Power turbine efficiency parameter                   | D6 |    |
| 7 | Combustion chamber total pressure recovery parameter | D7 | D4 |
| 8 | Combustion efficiency parameter                      | D8 |    |
| 9 | Inlet device total pressure recovery factor          | D9 |    |
4.1. Technique adjustment
For the sake of correct comparison, each technique should be tuned to the problem solved, the diagnosis of the chosen engine. The MLP was tuned for a diagnostic application in our previous works [see, for example, 4]. In particular, 27 hidden layer nodes and a resilient back-propagation training algorithm were found to be the best and were accepted for the present study.
Now we need to tune the spread s for the variations of the PW and the number of nearest neighbors k for the K-NN. The criterion to determine the best values of these parameters is the same probability $\bar{P}$.

For the PW-cube technique and single fault classes, calculations with different values of the hypercube edge length s have been performed. Three groups of calculations were executed with varying seeds: with Seed 1, with Seed 2, and with 10 different seeds averaging the probabilities $\bar{P}$. A seed means here a specific parameter that determines a series of random numbers of the used uniform and normal distributions. Figure 2 shows the resulting probabilities $\bar{P}$ as functions of the spread parameter and its optimal value s = 1.14.
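A tuning scan of this kind can be sketched in a few lines, reusing the pieces above; the candidate grid is arbitrary:

```python
# Hypothetical spread tuning: scan candidate s values, keep the best mean P
candidates = np.linspace(0.6, 1.6, 11)
scores = [true_positive_rates(lambda x, s=s: classify_parzen(x, Z1, s), Z2).mean()
          for s in candidates]
best_s = candidates[int(np.argmax(scores))]
print(best_s)
```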
Similar tuning calculations were repeated for all the techniques and both classification types. The resulting optimal values are given in Table 3. To better imagine the proportion between a window and a fault class region, remember that the maximum amplitude of the pattern random errors is 1 and the total number of class patterns is 1000.
One can see from the table data that the optimal spread values for the multiple faults are greater than the corresponding values for the single faults. Additionally, the optimal sphere diameter 2s is greater than the corresponding cube edge s. From our point of view, these facts reflect a general rule: in all cases, an approximately constant proportion is conserved between the number of patterns inside the optimal window and the total number of class patterns.

Fig. 2. Tuning the Parzen Windows method (single faults)
Table 3

| Technique | Parameter | Single classes | Multiple classes |
|-----------|-----------|------|------|
| PW-cube   | s         | 1.14 | 1.30 |
| PW-sphere | s         | 0.86 | 0.95 |
| PW-Gauss  | s         | 0.30 | 0.35 |
| K-NN      | k         | 21   | 18   |
4.2. Comparison results
With the known values of the tuning parameters, the calculations of the correct diagnosis probability have been executed once more for each technique and for both class types. The results are given in Table 4, where the techniques are arranged in order of increasing probability. As can be seen, the PW-sphere technique is approximately equal to the PW-cube for the single faults and is more accurate for the multiple faults. In turn, the PW-Gauss classifies the fault patterns better than the PW-sphere for both class types. These conclusions about technique accuracy coincide with the suppositions made in section 1. As to the K-NN technique, it is more or less equal to the PW-Gauss: the K-NN gains for the single faults but loses for the multiple faults. However, these two best density-based techniques perform worse than the MLP.
It can also be seen that the techniques do not differ a lot: the maximum probability change within the same class type is only 0.014 (1.4%). On the other hand, it was shown in [4] that the computational errors are rather great, about ±0.01. This means that the differences between the techniques can be partly explained by low computational precision.
Table 4
Probabilities $\bar{P}$ for different techniques

| Technique | Single classes | Multiple classes |
|-----------|--------|--------|
| PW-cube   | 0.8101 | 0.8648 |
| PW-sphere | 0.8098 | 0.8698 |
| PW-Gauss  | 0.8131 | 0.8748 |
| K-NN      | 0.8160 | 0.8720 |
| MLP       | 0.8238 | 0.8760 |
Preliminarily, we can state that the PW-Gauss and K-NN techniques do not lose much to the MLP. Because these two classification techniques have the advantage of providing a confidence measure for every classification decision, they can be recommended for real applications.
Discussion
The present paper can be considered only as a preliminary study. In spite of the results obtained, it revealed important issues to be solved in the future.
First, to draw final conclusions on technique efficiency, the comparative calculations should be repeated with higher precision. We find it possible to decrease the computational errors by a factor of 10.
Second, since estimating diagnostic decision confidence is an important property of the analyzed techniques, it seems to be of practical interest to determine the estimation precision.
Third, in the present study the techniques were examined at one static gas turbine operating point, i.e., for one-point diagnosis. Because multipoint diagnosis and diagnosis at transients promise more accurate results, it seems important to examine the techniques for these promising diagnostic options.
Conclusions
Thus, in the present paper four techniques have been examined that classify gas turbine faults through estimating the probability densities of the considered classes. They were compared with each other and with the Multilayer Perceptron (MLP) using the criterion of the mean probability of correct diagnosis. It was found that the two best techniques, Parzen Windows with the Gaussian window and K-Nearest Neighbors, lose just a little to the MLP. These two techniques are recommended for gas turbine diagnosis because they provide a confidence estimation for each diagnostic decision, a property very valuable in practice.
The present study also revealed some issues to be solved in future investigations. They are related to more precise probability computation and to extending the study to multipoint and transient diagnosis.
Acknowledgments
The work has been carried out with the support of the National Polytechnic Institute of Mexico (research project 20131509).
1. Roemer M. J. Advanced diagnostics and prognostics for gas turbine engine risk assessment [Text] / M. J. Roemer, G. J. Kacprzynski // Proc. ASME Turbo Expo 2000, Munich, Germany, May 8-11, 2000. − 10 p.
2. Ogaji S.O.T. Gas path fault diagnosis of a turbofan engine from transient data using artificial neural networks [Text] / S.O.T. Ogaji, Y. G. Li, S. Sampath, et al. // Proc. ASME Turbo Expo 2003, Atlanta, Georgia, USA, June 16-19, 2003. − 10 p.
3. Volponi A.J. The use of Kalman filter and neural network methodologies in gas turbine performance diagnostics: a comparative study [Text] / A.J. Volponi, H. DePold, R. Ganguli // Journal of Engineering for Gas Turbines and Power. − 2003. − Vol. 125, Issue 4. − P. 917-924.
4. Loboda I. Neural networks for gas turbine fault identification: multilayer perceptron or radial basis network [Text] / I. Loboda, Ya. Feldshteyn, V. Ponomaryov // International Journal of Turbo & Jet Engines. − 2012. − Vol.29, Is. 1. − P. 37-48 (ASME Paper No. GT2011-46752).
5. Loboda I. On the selection of an optimal pattern recognition technique for gas turbine diagnosis [Text] / I. Loboda, S. Yepifanov // Proc. ASME Turbo Expo 2013, San Antonio, Texas, USA, June 3-7, 2013. − 11 p., ASME Paper No. GT2013-95198.
6. Duda, R.O. Pattern Classification [Text] / R.O. Duda, P.E. Hart, D.G. Stork. − New York: Wiley-Interscience, 2001. − 654 p.