I am working on a final year project on an unlabelled dataset consisting of vibration data from multiple components inside a wind turbine.

Datasets:

I have data from 4 wind turbines, each consisting of 415 10-second intervals.

About the 10 second interval data:

  • Each of the 415 10-second intervals consists of vibration data for the generator, gearbox etc. (14 features in total)
  • The vibration data (the 14 features) are sampled at 25.6 kHz (262144 rows in each interval)
  • One 10-second interval is recorded each day, at a different time of day, giving a little more than 1 year's worth of data
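As a quick sanity check on these numbers, the sketch below works out what the stated sampling rate and row count imply for an FFT. It assumes the rate is exactly 25.6 kHz and that every interval has exactly 262144 rows; the variable names are only illustrative.

```python
fs = 25_600        # sampling rate in Hz (25.6 kHz, as stated above)
n_rows = 262_144   # rows per interval (as stated above); equal to 2**18

duration_s = n_rows / fs            # length of one recording in seconds
freq_resolution_hz = fs / n_rows    # spacing between neighbouring FFT bins
nyquist_hz = fs / 2                 # highest frequency the FFT can represent

print(f"duration:        {duration_s} s")
print(f"bin spacing:     {freq_resolution_hz} Hz")
print(f"Nyquist limit:   {nyquist_hz} Hz")
```

Under these assumptions each recording is about 10.24 s long rather than exactly 10 s, and a full-length FFT has bins roughly 0.1 Hz apart up to 12.8 kHz.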

Head of dataframe with some of the features shown:

[image: head of the dataframe]

Plan:

My current plan is to

  1. Do a Fast Fourier Transform (FFT) of the time-domain signal for each sensor (gearbox, generator etc.) in each of the 415 intervals. From the FFT I can extract frequency information, such as spectral RMS per bin, to put in a dataframe.

  2. Build different data sets for different components.

  3. Add features such as wind speed, wind direction, power produced etc.

  4. I will then build unsupervised ML models that can detect anomalies.
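Step 1 of the plan could be sketched roughly as follows. This is only an illustration with NumPy, not a fixed recipe: the function name `band_rms_features`, the choice of 8 equal-width bands, the Hann window and the scaling of the spectrum are all assumptions, and the signal here is synthetic rather than real turbine data.

```python
import numpy as np

def band_rms_features(signal, fs, n_bands=8):
    """Return band edges (Hz) and the spectral RMS in n_bands equal-width bands."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()            # remove the DC offset
    window = np.hanning(signal.size)           # Hann window to reduce leakage
    spectrum = np.fft.rfft(signal * window)    # one-sided FFT of a real signal
    # Scaling is arbitrary here; it only needs to be the same for every interval.
    power = np.abs(spectrum) ** 2 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2, n_bands + 1)
    # Map every FFT bin to a band index in 0 .. n_bands - 1.
    band_idx = np.clip(np.digitize(freqs, edges) - 1, 0, n_bands - 1)
    feats = np.array(
        [np.sqrt(power[band_idx == b].mean()) for b in range(n_bands)]
    )
    return edges, feats

# Synthetic 1-second test signal: a 3 kHz tone plus a little noise.
rng = np.random.default_rng(0)
fs = 25_600
t = np.arange(fs) / fs
demo = np.sin(2 * np.pi * 3000 * t) + 0.1 * rng.standard_normal(t.size)

edges, feats = band_rms_features(demo, fs)
for lo, hi, value in zip(edges[:-1], edges[1:], feats):
    print(f"{lo:7.0f} - {hi:7.0f} Hz: {value:.3f}")
```

Each interval and sensor then contributes one row of `n_bands` values, which can be joined with wind speed, power and the other operating data from step 3.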

The unsupervised models I am considering are Encoder-Decoder models and clustering.
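To make the clustering idea concrete, here is a minimal sketch that scores each row by its distance to the nearest k-means centroid, so that rows far from every cluster look anomalous. It is a plain NumPy k-means written for illustration only: the function name `kmeans_anomaly_scores`, the value `k=3`, the evenly spaced initial centroids and the synthetic 1660 x 20 table are all assumptions, not results from the turbine data.

```python
import numpy as np

def kmeans_anomaly_scores(X, k=3, n_iter=50):
    """Distance of each row to its nearest k-means centroid (larger = more unusual)."""
    X = np.asarray(X, dtype=float)
    # Standardise so that no single feature dominates the Euclidean distance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # Simplistic deterministic start: k evenly spaced rows (an assumption;
    # a smarter initialisation such as k-means++ is usually preferable).
    centroids = Z[np.linspace(0, len(Z) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1)

# Synthetic stand-in for the feature table: 1660 rows, 20 features.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((1660, 20))
outlier_idx = 5
X_demo[outlier_idx] += 8.0      # plant one obviously unusual row

scores = kmeans_anomaly_scores(X_demo)
print("most anomalous row:", int(np.argmax(scores)))
```

In practice a threshold on `scores` (for example a high percentile) would have to be chosen, and with unlabelled data that choice needs checking against known maintenance events or domain knowledge.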

Questions:

  1. Does it look like I have enough data for this type of task? 415 intervals x 4 different turbines = 1660 rows and approx. 20 features
  2. Should the data be treated as a time series? (It is sampled for 10 seconds once a day, at random times.)
  3. What other unsupervised ML models/approaches could be good for this task?

I hope this was clearly written. Thanks in advance for any input!

meerkat