Simulating run-to-failure SCADA time series to enhance wind turbine fault detection and prognosis
Abstract. Wind turbine Supervisory Control and Data Acquisition (SCADA) datasets available for research usually contain a limited number of failure events. This scarcity hinders the successful application of Deep Learning (DL) methods for fault detection and prognosis, which require large datasets for robust training and generalisation. This work proposes a method that uses Conditional Generative Adversarial Networks (cGANs) to generate synthetic SCADA time series replicating wind turbine behaviour under controllable operational, environmental, and degradation conditions. Given a set of SCADA time series representing these conditions, the cGAN generates temperature and pressure time series that simulate gearbox operation. Results show that augmenting the training set of an Artificial Neural Network (ANN) fault detection model with synthetic time series reduces false positives in gearbox fault detection by 84 % on average, enabling the model to detect a fault in a test wind turbine without prior knowledge of the event. Furthermore, training a Convolutional Autoencoder-based unsupervised health indicator (HI) model on both real and synthetic SCADA time series yields an HI that more accurately captures the expected degradation trend. Using this HI, the gearbox's remaining useful life (RUL) can be predicted within the defined error bounds from around 4.5 months before the fault is detected, whereas the HI obtained without the synthetic data fails to produce reliable RUL estimates.
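To make the conditioning idea concrete, the following minimal sketch illustrates how a conditional generator might receive its inputs: a noise sequence is concatenated, time step by time step, with the SCADA series that encode the operating conditions, and the result is mapped to two output channels standing in for gearbox temperature and pressure. This is only an illustration of the data flow, not the architecture used in this work. The function names, the three conditioning channels, the window length of 144 samples, the noise dimension, and the single untrained linear layer that replaces the generator network are all assumptions made for the example.

```python
import numpy as np


def make_generator_input(conditions, noise_dim, rng):
    """Concatenate per-time-step noise with the conditioning series.

    conditions: array of shape (n_steps, n_conditions), assumed scaled to [0, 1].
    Returns an array of shape (n_steps, noise_dim + n_conditions).
    """
    n_steps = conditions.shape[0]
    noise = rng.standard_normal((n_steps, noise_dim))
    return np.concatenate([noise, conditions], axis=1)


def toy_generator(gen_input, weights, bias):
    """Hypothetical stand-in for a trained generator: one linear layer + tanh."""
    return np.tanh(gen_input @ weights + bias)


rng = np.random.default_rng(0)

# Assumed sizes for illustration only.
n_steps, n_conditions, noise_dim, n_outputs = 144, 3, 8, 2

# Hypothetical conditioning channels, e.g. wind speed, ambient temperature
# and a degradation level, each already scaled to [0, 1].
conditions = rng.uniform(0.0, 1.0, size=(n_steps, n_conditions))

# Untrained random weights: the output is NOT realistic SCADA data.
weights = 0.1 * rng.standard_normal((noise_dim + n_conditions, n_outputs))
bias = np.zeros(n_outputs)

gen_input = make_generator_input(conditions, noise_dim, rng)
synthetic = toy_generator(gen_input, weights, bias)

print(gen_input.shape, synthetic.shape)
```

In an actual cGAN, the discriminator would receive the same conditioning series alongside the real or generated outputs, so that the generator is penalised for outputs that are implausible under the given conditions. Drawing a fresh noise sequence for fixed conditions would then yield a different synthetic series for the same operating scenario.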