I am working on a final year project on an unlabelled dataset consisting of vibration data from multiple components inside a wind turbine.
Datasets:
I have data from 4 wind turbines, each with 415 10-second intervals.
About the 10 second interval data:
- Each of the 415 10-second intervals consists of vibration data for the generator, gearbox etc. (14 features in total)
- The vibration data (the 14 features) are sampled at 25.6 kHz (262144 rows in each interval, i.e. 2^18 samples, or about 10.24 s)
- One 10-second interval is recorded per day, at a different time each day => a little more than 1 year's worth of data
Head of dataframe with some of the features shown:
Plan:
My current plan is to:
- Compute a Fast Fourier Transform (FFT) of the time-domain signal for each sensor (gearbox, generator etc.) in each of the 415 intervals, and extract frequency information from it to put in a dataframe (statistics from the FFT, such as spectral RMS per bin).
- Build separate datasets for the different components.
- Add features such as wind speed, wind direction, power produced etc.
- Build unsupervised ML models that can detect anomalies.
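To make the FFT step in the plan concrete, here is a minimal sketch of turning one sensor's raw interval into a handful of spectral features. It assumes NumPy, a Hann window, and equal-width frequency bands. The function name `band_rms_features` and the choice of 8 bands are hypothetical, picked only for illustration, and the demo signal is a synthetic sine wave rather than real turbine data.

```python
import numpy as np

FS = 25_600  # sampling rate in Hz, as stated in the post


def band_rms_features(signal, fs=FS, n_bands=8):
    """Relative RMS spectral magnitude in n_bands equal-width bands.

    Hypothetical helper: the band layout and window are assumptions,
    and the values are unnormalised (useful for comparison, not calibrated).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # remove the DC offset
    window = np.hanning(x.size)               # Hann window to limit leakage
    spectrum = np.fft.rfft(x * window)        # one-sided FFT of the real signal
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(spectrum) ** 2

    # Map every FFT bin to a band index in [0, n_bands - 1];
    # the Nyquist bin is folded into the last band.
    band_idx = np.minimum((freqs / (fs / 2) * n_bands).astype(int), n_bands - 1)

    # Root of the mean power inside each band.
    return np.array([np.sqrt(power[band_idx == b].mean()) for b in range(n_bands)])


# Synthetic 1-second demo: a 5 kHz tone should dominate the 4.8-6.4 kHz band.
t = np.arange(FS) / FS
demo = np.sin(2 * np.pi * 5000 * t)
feats = band_rms_features(demo)
strongest_band = int(np.argmax(feats))
```

Applied to each of the 14 channels in each interval, this would give one fixed-length feature row per interval. In practice the bands would more likely be chosen around known shaft, gear-mesh and bearing frequencies than spaced evenly.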
The unsupervised models I am considering are an Encoder-Decoder and clustering.
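As a simple baseline to compare against, the reconstruction-error idea behind an encoder-decoder can be sketched with plain PCA, which behaves like a linear encoder-decoder. This is a swapped-in stand-in, not the neural model itself. The sketch assumes NumPy only; the function name `pca_reconstruction_error`, the number of components, and the synthetic data with one injected outlier are all hypothetical.

```python
import numpy as np


def pca_reconstruction_error(X, n_components=3):
    """Per-row reconstruction error after projecting onto the top principal components."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0                   # guard against constant columns
    Z = (X - mu) / sigma                      # standardise each feature

    # Rows of Vt are the principal directions of the standardised data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components]

    Z_hat = Z @ W.T @ W                       # "encode" then "decode"
    return np.sqrt(((Z - Z_hat) ** 2).sum(axis=1))


# Synthetic stand-in for a feature table: 200 rows, 20 features,
# generated from 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))
X[17] += rng.normal(size=20)                  # inject one off-pattern row

scores = pca_reconstruction_error(X, n_components=3)
suspect = int(np.argmax(scores))              # row with the largest error
```

Rows with unusually high scores are candidates for closer inspection. Any threshold would still need to be chosen by hand, since the data are unlabelled.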
Questions:
- Does it look like I have enough data for this type of task? 415 intervals x 4 different turbines = 1660 rows and approx. 20 features
- Should the data be treated as a time series? (It is sampled for 10 seconds once a day, at random times.)
- What other unsupervised ML models/approaches could be good for this task?
I hope this was clearly written. Thanks in advance for any input!