wind turbine high-凯发k8网页登录

this example shows how to build an exponential degradation model to predict the remaining useful life (rul) of a wind turbine bearing in real time. the exponential degradation model predicts the rul based on its parameter priors and the latest measurements (historical run-to-failure data can help estimate the model parameters priors, but they are not required). the model is able to detect the significant degradation trend in real time and updates its parameter priors when a new observation becomes available. the example follows a typical prognosis workflow: data import and exploration, feature extraction and postprocessing, feature importance ranking and fusion, model fitting and prediction, and performance analysis.

dataset

the dataset is collected from a 2mw wind turbine high-speed shaft driven by a 20-tooth pinion gear [1]. a vibration signal of 6 seconds was acquired each day for 50 consecutive days (there are 2 measurements on march 17, which are treated as two days in this example). an inner race fault developed and caused the failure of the bearing across the 50-day period.

a compact version of the dataset is available in the toolbox. to use the compact dataset, copy the dataset to the current folder and enable its write permission.

copyfile(...
    fullfile(matlabroot, 'toolbox', 'predmaint', ...
    'predmaintdemos', 'windturbinehighspeedbearingprognosis'), ...
    'windturbinehighspeedbearingprognosis-data-master')
fileattrib(fullfile('windturbinehighspeedbearingprognosis-data-master', '*.mat'), ' w')

the measurement time step for the compact dataset is 5 days.

timeunit = '\times 5 day';

for the full dataset, go to this link , download the entire repository as a zip file and save it in the same directory as this live script. unzip the file using this command. the measurement time step for the full dataset is 1 day.

if exist('windturbinehighspeedbearingprognosis-data-master.zip', 'file')
    unzip('windturbinehighspeedbearingprognosis-data-master.zip')
    timeunit = 'day';
end

the results in this example are generated from the full dataset. it is highly recommended to download the full dataset to run this example. results generated from the compact dataset might not be meaningful.

data import

create a fileensembledatastore of the wind turbine data. the data contains a vibration signal and a tachometer signal. the fileensembledatastore will parse the file name and extract the date information as independentvariables. see the helper functions in the supporting files associated with this example for more details.

hsbearing = fileensembledatastore(...
    fullfile('.', 'windturbinehighspeedbearingprognosis-data-master'), ...
    '.mat');
hsbearing.datavariables = ["vibration", "tach"];
hsbearing.independentvariables = "date";
hsbearing.selectedvariables = ["date", "vibration", "tach"];
hsbearing.readfcn = @helperreaddata;
hsbearing.writetomemberfcn = @helperwritetohsbearing;
tall(hsbearing)

ans =
  m×3 tall table
            date                vibration             tach      
    ____________________    _________________    _______________
    07-mar-2013 01:57:46    [585936×1 double]    [2446×1 double]
    08-mar-2013 02:34:21    [585936×1 double]    [2411×1 double]
    09-mar-2013 02:33:43    [585936×1 double]    [2465×1 double]
    10-mar-2013 03:01:02    [585936×1 double]    [2461×1 double]
    11-mar-2013 03:00:24    [585936×1 double]    [2506×1 double]
    12-mar-2013 06:17:10    [585936×1 double]    [2447×1 double]
    13-mar-2013 06:34:04    [585936×1 double]    [2438×1 double]
    14-mar-2013 06:50:41    [585936×1 double]    [2390×1 double]
             :                      :                   :
             :                      :                   :

sample rate of vibration signal is 97656 hz.

fs = 97656; % hz

data exploration

this section explores the data in both time domain and frequency domain and seeks inspiration of what features to extract for prognosis purposes.

first visualize the vibration signals in the time domain. in this dataset, there are 50 vibration signals of 6 seconds measured in 50 consecutive days. now plot the 50 vibration signals one after each other.

reset(hsbearing)
tstart = 0;
figure
hold on
while hasdata(hsbearing)
    data = read(hsbearing);
    v = data.vibration{1};
    t = tstart   (1:length(v))/fs;
    % downsample the signal to reduce memory usage
    plot(t(1:10:end), v(1:10:end))
    tstart = t(end);
end
hold off
xlabel('time (s), 6 second per day, 50 days in total')
ylabel('acceleration (g)')

the vibration signals in time domain reveals an increasing trend of the signal impulsiveness. indicators quantifying the impulsiveness of the signal, such as kurtosis, peak-to-peak value, crest factors etc., are potential prognostic features for this wind turbine bearing dataset [2].

on the other hand, spectral kurtosis is considered powerful tool for wind turbine prognosis in frequency domain [3]. to visualize the spectral kurtosis changes along time, plot the spectral kurtosis values as a function of frequency and the measurement day.

hsbearing.datavariables = ["vibration", "tach", "spectralkurtosis"];
colors = parula(50);
figure
hold on
reset(hsbearing)
day = 1;
while hasdata(hsbearing)
    data = read(hsbearing);
    data2add = table;
    
    % get vibration signal and measurement date
    v = data.vibration{1};
    
    % compute spectral kurtosis with window size = 128
    wc = 128;
    [sk, f] = pkurtosis(v, fs, wc);
    data2add.spectralkurtosis = {table(f, sk)};
    
    % plot the spectral kurtosis
    plot3(f, day*ones(size(f)), sk, 'color', colors(day, :))
    
    % write spectral kurtosis values
    writetolastmemberread(hsbearing, data2add);
    
    % increment the number of days
    day = day   1;
end
hold off
xlabel('frequency (hz)')
ylabel('time (day)')
zlabel('spectral kurtosis')
grid on
view(-45, 30)
cbar = colorbar;
ylabel(cbar, 'fault severity (0 - healthy, 1 - faulty)')

fault severity indicated in colorbar is the measurement date normalized into 0 to 1 scale. it is observed that the spectral kurtosis value around 10 khz gradually increases as the machine condition degrades. statistical features of the spectral kurtosis, such as mean, standard deviation etc., will be potential indicators of the bearing degradation [3].

feature extraction

based on the analysis in the previous section, a collection of statistical features derived from time-domain signal and spectral kurtosis are going to be extracted. more mathematical details about the features are provided in [2-3].

first, pre-assign the feature names in datavariables before writing them into the fileensembledatastore.

hsbearing.datavariables = [hsbearing.datavariables; ...
    "mean"; "std"; "skewness"; "kurtosis"; "peak2peak"; ...
    "rms"; "crestfactor"; "shapefactor"; "impulsefactor"; "marginfactor"; "energy"; ...
    "skmean"; "skstd"; "skskewness"; "skkurtosis"];

compute feature values for each ensemble member.

hsbearing.selectedvariables = ["vibration", "spectralkurtosis"];
reset(hsbearing)
while hasdata(hsbearing)
    data = read(hsbearing);
    v = data.vibration{1};
    sk = data.spectralkurtosis{1}.sk;
    features = table;
    
    % time domain features
    features.mean = mean(v);
    features.std = std(v);
    features.skewness = skewness(v);
    features.kurtosis = kurtosis(v);
    features.peak2peak = peak2peak(v);
    features.rms = rms(v);
    features.crestfactor = max(v)/features.rms;
    features.shapefactor = features.rms/mean(abs(v));
    features.impulsefactor = max(v)/mean(abs(v));
    features.marginfactor = max(v)/mean(abs(v))^2;
    features.energy = sum(v.^2);
    
    % spectral kurtosis related features
    features.skmean = mean(sk);
    features.skstd = std(sk);
    features.skskewness = skewness(sk);
    features.skkurtosis = kurtosis(sk);
    
    % write the derived features to the corresponding file
    writetolastmemberread(hsbearing, features);
end

select the independent variable date and all the extracted features to construct the feature table.

hsbearing.selectedvariables = ["date", "mean", "std", "skewness", "kurtosis", "peak2peak", ...
    "rms", "crestfactor", "shapefactor", "impulsefactor", "marginfactor", "energy", ...
    "skmean", "skstd", "skskewness", "skkurtosis"];

since the feature table is small enough to fit in memory (50 by 15), gather the table before processing. for big data, it is recommended to perform operations in tall format until you are confident that the output is small enough to fit in memory.

featuretable = gather(tall(hsbearing));

evaluating tall expression using the parallel pool 'local':
- pass 1 of 1: completed in 1 sec
evaluation completed in 1 sec

convert the table to timetable so that the time information is always associated with the feature values.

featuretable = table2timetable(featuretable)

featuretable=50×15 timetable
            date             mean       std       skewness      kurtosis    peak2peak     rms      crestfactor    shapefactor    impulsefactor    marginfactor      energy        skmean       skstd      skskewness    skkurtosis
    ____________________    _______    ______    ___________    ________    _________    ______    ___________    ___________    _____________    ____________    __________    __________    ________    __________    __________
    07-mar-2013 01:57:46    0.34605    2.2705      0.0038699     2.9956      21.621      2.2967      4.9147         1.2535          6.1607           3.3625       3.0907e 06      0.001253    0.025674     -0.22882        3.362  
    08-mar-2013 02:34:21    0.24409    2.0621      0.0030103     3.0195       19.31      2.0765      4.9129         1.2545           6.163           3.7231       2.5266e 06     0.0046823    0.020888     0.057651       3.3508  
    09-mar-2013 02:33:43    0.21873    2.1036     -0.0010289     3.0224      21.474      2.1149      5.2143         1.2539          6.5384           3.8766       2.6208e 06    -0.0011084    0.022705     -0.50004       4.9953  
    10-mar-2013 03:01:02    0.21372    2.0081       0.001477     3.0415       19.52      2.0194       5.286         1.2556           6.637           4.1266       2.3894e 06     0.0087035    0.034456       1.4705       8.1235  
    11-mar-2013 03:00:24    0.21518    2.0606      0.0010116     3.0445      21.217      2.0718      5.0058         1.2554          6.2841           3.8078        2.515e 06      0.013559    0.032193      0.11355        3.848  
    12-mar-2013 06:17:10    0.29335    2.0791      -0.008428      3.018       20.05      2.0997      4.7966         1.2537          6.0136           3.5907       2.5833e 06     0.0017539    0.028326      -0.1288       3.8072  
    13-mar-2013 06:34:04    0.21293     1.972     -0.0014294     3.0174      18.837      1.9834      4.8496         1.2539          6.0808           3.8441       2.3051e 06     0.0039353    0.029292      -1.4734       8.1242  
    14-mar-2013 06:50:41    0.24401    1.8114      0.0022161     3.0057      17.862      1.8278      4.8638         1.2536          6.0975           4.1821       1.9575e 06     0.0013107    0.022468      0.27438       2.8597  
    15-mar-2013 06:50:03    0.20984    1.9973       0.001559     3.0711       21.12      2.0083      5.4323         1.2568          6.8272           4.2724       2.3633e 06     0.0023769    0.047767      -2.5832       20.171  
    16-mar-2013 06:56:43    0.23318    1.9842     -0.0019594     3.0072      18.832      1.9979      5.0483          1.254          6.3304           3.9734       2.3387e 06     0.0076327    0.030418      0.52322       4.0082  
    17-mar-2013 06:56:04    0.21657     2.113     -0.0013711     3.1247      21.858      2.1241      5.4857         1.2587          6.9048           4.0916       2.6437e 06     0.0084907    0.037876       2.3753       11.491  
    17-mar-2013 18:47:56    0.19381    2.1335      -0.012744     3.0934      21.589      2.1423      4.7574         1.2575          5.9825           3.5117       2.6891e 06      0.019624    0.055537       3.1986       17.796  
    18-mar-2013 18:47:15    0.21919    2.1284     -0.0002039     3.1647      24.051      2.1396      5.7883         1.2595          7.2902           4.2914       2.6824e 06      0.016315    0.064516       2.8735       11.632  
    20-mar-2013 00:33:54    0.35865    2.2536      -0.002308     3.0817      22.633      2.2819      5.2771         1.2571          6.6339           3.6546       3.0511e 06     0.0011097    0.051539    -0.056774       7.0712  
    21-mar-2013 00:33:14     0.1908    2.1782    -0.00019286     3.1548      25.515      2.1865      6.0521           1.26          7.6258           4.3945       2.8013e 06     0.0040392    0.066254     -0.39587       12.111  
    22-mar-2013 00:39:50    0.20569    2.1861      0.0020375     3.2691      26.439      2.1958      6.1753         1.2633          7.8011           4.4882        2.825e 06      0.020676    0.077728       2.6038       11.088  
      ⋮

feature postprocessing

extracted features are usually associated with noise. the noise with opposite trend can sometimes be harmful to the rul prediction. in addition, one of the feature performance metrics, monotonicity, to be introduced next is not robust to noise. therefore, a causal moving mean filter with a lag window of 5 steps is applied to the extracted features, where "causal" means no future value is used in the moving mean filtering.

variablenames = featuretable.properties.variablenames;
featuretablesmooth = varfun(@(x) movmean(x, [5 0]), featuretable);
featuretablesmooth.properties.variablenames = variablenames;

here is an example showing the feature before and after smoothing.

figure
hold on
plot(featuretable.date, featuretable.skmean)
plot(featuretablesmooth.date, featuretablesmooth.skmean)
hold off
xlabel('time')
ylabel('feature value')
legend('before smoothing', 'after smoothing')
title('skmean')

moving mean smoothing introduces a time delay of the signal, but the delay effect can be mitigated by selecting proper threshold in the rul prediction.

training data

in practice, the data of the whole life cycle is not available when developing the prognostic algorithm, but it is reasonable to assume that some data in the early stage of the life cycle has been collected. hence data collected in the first 20 days (40% of the life cycle) is treated as training data. the following feature importance ranking and fusion is only based on the training data.

breaktime = datetime(2013, 3, 27);
breakpoint = find(featuretablesmooth.date < breaktime, 1, 'last');
traindata = featuretablesmooth(1:breakpoint, :);

feature importance ranking

in this example, monotonicity proposed by [3] is used to quantify the merit of the features for prognosis purpose.

monotonicity of $i$ th feature $x_{i}$ is computed as

$monotonicity (x_{i}) = \frac{1}{m} \sum_{j = 1}^{m} \frac{| number of positive diff (x_{i}^{j}) - number of negative diff (x_{i}^{j}) |}{n - 1}$

where $n$ is the number of measurement points, in this case $n = 50$ . $m$ is the number of machines monitored, in this case $m = 1$ . $x_{i}^{j}$ is the $i$ th feature measured on $j$ th machine. $diff (x_{i}^{j}) = x_{i}^{j} (t) - x_{i}^{j} (t - 1)$ , i.e. the difference of the signal $x_{i}^{j}$ .

% since moving window smoothing is already done, set 'windowsize' to 0 to
% turn off the smoothing within the function
featureimportance = monotonicity(traindata, 'windowsize', 0);
helpersortedbarplot(featureimportance, 'monotonicity');

kurtosis of the signal is the top feature based on the monotonicity.

features with feature importance score larger than 0.3 are selected for feature fusion in the next section.

traindataselected = traindata(:, featureimportance{:,:}>0.3);
featureselected = featuretablesmooth(:, featureimportance{:,:}>0.3)

featureselected=50×5 timetable
            date             mean      kurtosis    shapefactor    marginfactor     skstd  
    ____________________    _______    ________    ___________    ____________    ________
    07-mar-2013 01:57:46    0.34605     2.9956       1.2535          3.3625       0.025674
    08-mar-2013 02:34:21    0.29507     3.0075        1.254          3.5428       0.023281
    09-mar-2013 02:33:43    0.26962     3.0125        1.254          3.6541       0.023089
    10-mar-2013 03:01:02    0.25565     3.0197       1.2544          3.7722       0.025931
    11-mar-2013 03:00:24    0.24756     3.0247       1.2546          3.7793       0.027183
    12-mar-2013 06:17:10    0.25519     3.0236       1.2544          3.7479       0.027374
    13-mar-2013 06:34:04      0.233     3.0272       1.2545          3.8282       0.027977
    14-mar-2013 06:50:41    0.23299     3.0249       1.2544          3.9047        0.02824
    15-mar-2013 06:50:03     0.2315      3.033       1.2548          3.9706       0.032417
    16-mar-2013 06:56:43    0.23475     3.0273       1.2546          3.9451       0.031744
    17-mar-2013 06:56:04    0.23498     3.0407       1.2551          3.9924       0.032691
    17-mar-2013 18:47:56    0.21839     3.0533       1.2557          3.9792       0.037226
    18-mar-2013 18:47:15    0.21943     3.0778       1.2567          4.0538       0.043097
    20-mar-2013 00:33:54    0.23854     3.0905       1.2573          3.9658       0.047942
    21-mar-2013 00:33:14    0.23537     3.1044       1.2578          3.9862       0.051023
    22-mar-2013 00:39:50    0.23079     3.1481       1.2593           4.072       0.058908
      ⋮

dimension reduction and feature fusion

principal component analysis (pca) is used for dimension reduction and feature fusion in this example. before performing pca, it is a good practice to normalize the features into the same scale. note that pca coefficients and the mean and standard deviation used in normalization are obtained from training data, and applied to the entire dataset.

meantrain = mean(traindataselected{:,:});
sdtrain = std(traindataselected{:,:});
traindatanormalized = (traindataselected{:,:} - meantrain)./sdtrain;
coef = pca(traindatanormalized);

the mean, standard deviation and pca coefficients are used to process the entire data set.

pca1 = (featureselected{:,:} - meantrain) ./ sdtrain * coef(:, 1);
pca2 = (featureselected{:,:} - meantrain) ./ sdtrain * coef(:, 2);

visualize the data in the space of the first two principal components.

figure
numdata = size(featuretable, 1);
scatter(pca1, pca2, [], 1:numdata, 'filled')
xlabel('pca 1')
ylabel('pca 2')
cbar = colorbar;
ylabel(cbar, ['time (' timeunit ')'])

the plot indicates that the first principal component is increasing as the machine approaches to failure. therefore, the first principal component is a promising fused health indicator.

healthindicator = pca1;

visualize the health indicator.

figure
plot(featureselected.date, healthindicator, '-o')
xlabel('time')
title('health indicator')

fit exponential degradation models for remaining useful life (rul) estimation

exponential degradation model is defined as

$h (t) = ϕ θ \exp (β t ϵ - \frac{σ^{2}}{2})$

where $h (t)$ is the health indicator as a function of time. $ϕ$ is the intercept term considered as a constant. $θ$ and $β$ are random parameters determining the slope of the model, where $θ$ is lognormal-distributed and $β$ is gaussian-distributed. at each time step $t$ , the distribution of $θ$ and $β$ is updated to the posterior based on the latest observation of $h (t)$ . $ϵ$ is a gaussian white noise yielding to $n (0, σ^{2})$ . the $- \frac{σ^{2}}{2}$ term in the exponential is to make the expectation of $h (t)$ satisfy

$e [h (t) | θ, β] = ϕ θ \exp (β t)$ .

here an exponential degradation model is fit to the health indicator extracted in the last section, and the performances is evaluated in the next section.

first shift the health indicator so that it starts from 0.

healthindicator = healthindicator - healthindicator(1);

the selection of threshold is usually based on the historical records of the machine or some domain-specific knowledge. since no historical data is available in this dataset, the last value of the health indicator is chosen as the threshold. it is recommended to choose the threshold based on the smoothed (historical) data so that the delay effect of smoothing will be partially mitigated.

threshold = healthindicator(end);

if historical data is available, use fit method provided by exponentialdegradationmodel to estimate the priors and intercept. however, historical data is not available for this wind turbine bearing dataset. the prior of the slope parameters are chosen arbitrarily with large variances ( $e (θ) = 1, var (θ) = 10^{6}, e (β) = 1, var (β) = 10^{6}$ ) so that the model is mostly relying on the observed data. based on $e [h (0)] = ϕ e (θ)$ , intercept $ϕ$ is set to $- 1$ so that the model will start from 0 as well.

the relationship between the variation of health indicator and the variation of noise can be derived as

$δ h (t) \approx (h (t) - ϕ) δ ϵ (t)$

here the standard deviation of the noise is assumed to cause 10% of variation of the health indicator when it is near the threshold. therefore, the standard deviation of the noise can be represented as $\frac{10 % \cdot threshold}{threshold - ϕ}$ .

the exponential degradation model also provides a functionality to evaluate the significance of the slope. once a significant slope of the health indicator is detected, the model will forget the previous observations and restart the estimation based on the original priors. the sensitivity of the detection algorithm can be tuned by specifying slopedetectionlevel. if p value is less than slopedetectionlevel, the slope is declared to be detected. here slopedetectionlevel is set to 0.05.

now create an exponential degradation model with the parameters discussed above.

mdl = exponentialdegradationmodel(...
    'theta', 1, ...
    'thetavariance', 1e6, ...
    'beta', 1, ...
    'betavariance', 1e6, ...
    'phi', -1, ...
    'noisevariance', (0.1*threshold/(threshold   1))^2, ...
    'slopedetectionlevel', 0.05);

use predictrul and update methods to predict the rul and update the parameter distribution in real time.

% keep records at each iteration
totalday = length(healthindicator) - 1;
estruls = zeros(totalday, 1);
trueruls = zeros(totalday, 1);
ciruls = zeros(totalday, 2);
pdfruls = cell(totalday, 1);
% create figures and axes for plot updating
figure
ax1 = subplot(2, 1, 1);
ax2 = subplot(2, 1, 2);
for currentday = 1:totalday
    
    % update model parameter posterior distribution
    update(mdl, [currentday healthindicator(currentday)])
    
    % predict remaining useful life
    [estrul, cirul, pdfrul] = predictrul(mdl, ...
                                         [currentday healthindicator(currentday)], ...
                                         threshold);
    truerul = totalday - currentday   1;
    
    % updating rul distribution plot
    helperplottrend(ax1, currentday, healthindicator, mdl, threshold, timeunit);
    helperplotrul(ax2, truerul, estrul, cirul, pdfrul, timeunit)
    
    % keep prediction results
    estruls(currentday) = estrul;
    trueruls(currentday) = truerul;
    ciruls(currentday, :) = cirul;
    pdfruls{currentday} = pdfrul;
    
    % pause 0.1 seconds to make the animation visible
    pause(0.1)
end

here is the animation of the real-time rul estimation.

performance analysis

$α$ - $λ$ plot is used for prognostic performance analysis [5], where $α$ bound is set to 20%. the probability that the estimated rul is between the $α$ bound of the true rul is calculated as a performance metric of the model:

$pr (r^{*} (t) - α r^{*} (t) < r (t) < r^{*} (t) α r^{*} (t) | θ (t))$

where $r (t)$ is the estimated rul at time $t$ , $r^{*} (t)$ is the true rul at time $t$ , $θ (t)$ is the estimated model parameters at time $t$ .

alpha = 0.2;
detecttime = mdl.slopedetectioninstant;
prob = helperalphalambdaplot(alpha, trueruls, estruls, ciruls, ...
    pdfruls, detecttime, breakpoint, timeunit);
title('\alpha-\lambda plot')

since the preset prior does not reflect the true prior, the model usually need a few time steps to adjust to a proper parameter distribution. the prediction becomes more accurate as more data points are available.

visualize the probability of the predicted rul within the $α$ bound.

figure
t = 1:totalday;
hold on
plot(t, prob)
plot([breakpoint breakpoint], [0 1], 'k-.')
hold off
xlabel(['time (' timeunit ')'])
ylabel('probability')
legend('probability of predicted rul within \alpha bound', 'train-test breakpoint')
title(['probability within \alpha bound, \alpha = ' num2str(alpha*100) '%'])

references

[1] bechhoefer, eric, brandon van hecke, and david he. "processing for improved spectral analysis." annual conference of the prognostics and health management society, new orleans, la, oct. 2013.

[2] ali, jaouher ben, et al. "online automatic diagnosis of wind turbine bearings progressive degradations under real experimental conditions based on unsupervised machine learning." applied acoustics 132 (2018): 167-181.

[3] saidi, lotfi, et al. "wind turbine high-speed shaft bearings health prognosis through a spectral kurtosis-derived indices and svr." applied acoustics 120 (2017): 1-8.

[4] coble, jamie baalis. "merging data sources to predict remaining useful life–an automated method to identify prognostic parameters." (2010).

[5] saxena, abhinav, et al. "metrics for offline evaluation of prognostic performance." international journal of prognostics and health management 1.1 (2010): 4-23.

wind turbine high-凯发k8网页登录

dataset

data import

data exploration

feature extraction

feature postprocessing

training data

feature importance ranking

dimension reduction and feature fusion

fit exponential degradation models for remaining useful life (rul) estimation

performance analysis

references

see also

related topics

wind turbine high-凯发k8网页登录

dataset

data import

data exploration

feature extraction

feature postprocessing

training data

feature importance ranking

dimension reduction and feature fusion

fit exponential degradation models for remaining useful life (rul) estimation

performance analysis

references

see also

related topics

wechat