Scalable SCADA-driven Failure Prediction for Offshore Wind Turbines Using Autoencoder-Based NBM and Fleet-Median Filtering

Vervlimmeren, Ivo; Chesterman, Xavier; Verstraeten, Timothy; Nowé, Ann; Helsen, Jan

doi: https://doi.org/10.5194/wes-2025-49

Preprints

20 May 2025

Status: a revised version of this preprint is currently under review for the journal WES.

Abstract. Offshore wind turbines are crucial for sustainable energy production but face significant challenges in operational reliability and maintenance costs. In particular, the scalability and practicality of failure detection systems are a key challenge in large-scale wind farms. This paper presents a scalable, comprehensive approach to failure prediction based on the Normal Behavior Modeling (NBM) framework that integrates three components: a cloud-based pipeline, an undercomplete autoencoder for temperature-based anomaly detection, and a physics-informed, time-aware anomaly filtering method. The pipeline enables dynamic scaling and streamlined deployment across multiple wind farms. The autoencoder was trained exclusively on healthy 10-minute SCADA data and produces detailed anomaly scores that serve as the input for our filtering technique. It was trained on four years of data from a large offshore wind farm in the Dutch-Belgian zone and achieved UHH-ratios (UnHealthy-Healthy) of up to 1.69 and 1.21 for the generator and gearbox models, respectively. The filtering method refines the raw anomaly scores by comparing turbine signals to a windowed fleet median. By aggregating scores via sliding windows and employing robust distance metrics, the method reduces the volume of anomaly scores by up to 65 % without sacrificing predictive accuracy. This selective filtering effectively minimizes noise and non-relevant anomalies, enhancing the efficiency of maintenance analysis.
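The fleet-median filtering idea summarized in the abstract (compare each turbine to the fleet median, aggregate over sliding windows, use a robust distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout, the Euclidean distance, the trailing median window, and the thresholds `score_thr` and `dist_thr` are hypothetical choices, and the paper compares several sub-methods whose exact definitions are not reproduced here.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical data layout: for each turbine, one vector of signed anomaly scores
# (one entry per monitored temperature signal) per 10-minute timestamp.
# All turbines are assumed to share the same timestamps.
Vector = Tuple[float, ...]
FleetScores = Dict[str, List[Vector]]


def fleet_median_series(scores: FleetScores) -> List[Vector]:
    """Per-timestamp, per-signal median of the anomaly scores across the fleet."""
    series = list(scores.values())
    n_steps, n_signals = len(series[0]), len(series[0][0])
    return [
        tuple(median(s[t][k] for s in series) for k in range(n_signals))
        for t in range(n_steps)
    ]


def windowed_distance(own: List[Vector], fleet: List[Vector], window: int) -> List[float]:
    """Euclidean distance to the fleet median, smoothed by a trailing median window."""
    dists = [math.dist(x, m) for x, m in zip(own, fleet)]
    return [median(dists[max(0, t - window + 1): t + 1]) for t in range(len(dists))]


def filter_anomalies(scores: FleetScores, window: int = 3,
                     score_thr: float = 1.0, dist_thr: float = 1.0) -> Dict[str, List[int]]:
    """Keep a raw anomaly only if the turbine also deviates from the fleet median.

    A timestamp is a raw anomaly when any signal's |score| exceeds score_thr; it
    survives filtering only if the windowed distance to the fleet exceeds dist_thr.
    """
    fleet = fleet_median_series(scores)
    kept: Dict[str, List[int]] = {}
    for turbine, own in scores.items():
        wd = windowed_distance(own, fleet, window)
        kept[turbine] = [
            t for t, x in enumerate(own)
            if max(abs(v) for v in x) > score_thr and wd[t] > dist_thr
        ]
    return kept


if __name__ == "__main__":
    # Toy fleet: a fleet-wide event at t=1 (e.g. a hot day) hits every turbine,
    # while T5 alone drifts from t=3 onwards.
    normal = [(0.0, 0.0), (2.0, 2.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
    drifting = [(0.0, 0.0), (2.0, 2.0), (0.0, 0.0), (3.0, 3.0), (3.0, 3.0), (3.0, 3.0)]
    fleet_scores = {f"T{i}": list(normal) for i in range(1, 5)}
    fleet_scores["T5"] = drifting
    print(filter_anomalies(fleet_scores))
    # {'T1': [], 'T2': [], 'T3': [], 'T4': [], 'T5': [4, 5]}
```

In this toy run, 8 raw anomalies are reduced to 2: the fleet-wide event at t=1 is discarded for every turbine, while T5's individual drift survives. The trailing median window trades a short detection delay (T5 is only kept from t=4, not t=3) for robustness against single-step spikes.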

Received: 20 Mar 2025 – Discussion started: 20 May 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on wes-2025-49', Anonymous Referee #1, 09 Jun 2025

This manuscript presents a well-structured and methodologically rigorous approach to scalable failure prediction in offshore wind turbines using SCADA data, autoencoder-based normal behavior modeling, and fleet median filtering. The authors have developed and validated a cloud-based, modular pipeline and propose a post-processing technique to reduce false positives in anomaly detection. While the work is timely and technically sound, several aspects could benefit from further clarification.

1. The filtering method is described as novel; however, similar fleet-based anomaly filtering strategies have been discussed in prior work (Hendrickx et al. 2020, Li et al. 2020). A clearer articulation of what distinguishes this work is needed.

2. The fleet median filtering method assumes that most turbines operate under the same conditions at any given time. This assumption may break down when turbines are shut down for maintenance. Furthermore, in region I downstream turbines produce less power due to wake losses, hence their generator and gearbox temperatures are lower than those of upstream turbines. The authors should discuss how such conditions might affect the effectiveness of the filtering method.

3. The scalability of the pipeline is asserted and architecturally supported, but not empirically demonstrated in the manuscript. If this is claimed as a major contribution, the authors should include, for example:

- Report runtime performance under different fleet sizes

- Demonstrate linear or sublinear scaling

- Show cost, memory or latency metrics as functions of load

Citation: https://doi.org/10.5194/wes-2025-49-RC1
RC2: 'Comment on wes-2025-49', Anonymous Referee #2, 16 Jun 2025
This paper presents an autoencoder-based anomaly detection approach for failure prediction in offshore wind turbines that analyzes temperature signals from SCADA data. The proposed approach also uses a fleet-level median filtering technique to reduce non-relevant anomalies. While the work addresses important challenges in wind turbine condition monitoring, several aspects require clarification and additional validation.
The term "physics-informed" used to describe the filtering method could benefit from further clarification. The description of the filtering method in section 3.3 (distance to fleet median, windowing, multidimensional distances) appears to be primarily statistical and temporal, rather than directly incorporating physical models or principles. It would enhance clarity if the authors could explicitly detail how "physics-informed" aspects are integrated into the filtering logic.

The paper describes its cloud-based pipeline, highlighting its modularity and scalability for managing anomaly detection across wind farms. However, the contribution of this solution remains unclear as the results section focuses solely on the autoencoder and filtering methods. There are no empirical data or quantitative metrics presented to validate the pipeline's actual performance, scalability, or efficiency.

There is not enough detail about the specific failure types examined in this work. The authors mention gearbox and generator failures, but more information is needed about the failure sub-types and their locations for enhanced clarity.

While the paper acknowledges that data can differ greatly across the fleet and emphasizes the importance of having a large enough fleet for reliable median calculation, I think more discussion is needed about specific sources of variability that could affect the fleet median approach. The paper assumes that a large fleet size will normalize variations, but factors like seasonal variation, turbine location within the wind farm (wake effects, wind exposure differences), and individual operational patterns might create systematic rather than random variations. It would be helpful to have more analysis of how these location-based and operational differences are distinguished from actual anomalies, especially since some turbines might consistently operate differently due to their position rather than equipment issues.

Figures 5-7 show negative reconstruction errors and Figures 12-14 show negative anomaly scores. Since reconstruction errors are typically positive differences between predicted and actual values, it's unclear how to interpret negative values in this anomaly detection context.

I would recommend comparing the autoencoder model with other normal behavior modeling approaches, especially since a cloud-based solution has been provided in this work and deploying autoencoder models could be expensive. Alternative models like isolation forest, one-class SVM, or statistical approaches might offer better cost-effectiveness and computational efficiency for cloud deployment while achieving similar anomaly detection performance.
Citation: https://doi.org/10.5194/wes-2025-49-RC2
AC1: 'Answer to RC1 and RC2', Ivo Vervlimmeren, 25 Jul 2025
We are grateful for the thorough feedback. Below we respond to each point raised in each comment separately, noting what we did to address it and which changes we made to the manuscript.
Comment 1 (RC1)
1. The filtering method is described as novel; however, similar fleet-based anomaly filtering strategies have been discussed in prior work (Hendrickx et al. 2020, Li et al. 2020). A clearer articulation of what distinguishes this work is needed.

Response: We have adjusted the Introduction and Related Work sections to clarify what makes our filtering method novel and how exactly it differs from the prior work of Hendrickx et al. 2020 and Li et al. 2020.

2. The fleet median filtering method assumes that most turbines operate under the same conditions at any given time. This assumption may break down when turbines are shut down for maintenance. Furthermore, in region I downstream turbines produce less power due to wake losses, hence their generator and gearbox temperatures are lower than those of upstream turbines. The authors should discuss how such conditions might affect the effectiveness of the filtering method.

Response: We have clarified in the Anomaly Filtering Methodology section that the chance of the fleet median being affected by turbine shutdowns is negligible, citing additional papers that investigate the failure rates and availability of offshore wind turbines. In the same section, we added a discussion of the impact such an event would have if it did occur. We similarly explain how individual operating patterns, whether caused by wake effects or not, influence the effectiveness of the filtering method, and why our method does not directly account for them. Finally, we clarify that this has been accounted for in our results.
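Why a few stopped turbines barely move the fleet median can be illustrated with the hedged sketch below. It assumes, hypothetically, that offline turbines either report no value (None or NaN), in which case they are skipped, or report strongly negative anomaly scores because they are cold. The function `robust_fleet_median` and its `min_reporting` guard are illustrative names, not taken from the manuscript, and the manuscript does not state how missing turbines are handled.

```python
import math
from statistics import mean, median
from typing import Optional, Sequence


def robust_fleet_median(values: Sequence[Optional[float]],
                        min_reporting: int = 5) -> Optional[float]:
    """Fleet median at one timestamp, skipping offline turbines (None or NaN).

    Returns None when fewer than `min_reporting` turbines report, since the
    median of a very small fleet is no longer a trustworthy reference.
    """
    online = [v for v in values if v is not None and not math.isnan(v)]
    if len(online) < min_reporting:
        return None
    return median(online)


if __name__ == "__main__":
    # Two turbines offline with missing data: they are simply ignored.
    missing = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, None, float("nan"), -0.3, 0.1]
    # Two stopped turbines reporting cold (strongly negative) scores instead.
    cold = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -5.0, -5.0, -0.3, 0.1]
    print(robust_fleet_median(missing))            # ~0.05
    print(robust_fleet_median(cold), mean(cold))   # ~-0.05 versus a mean of ~-0.99
    print(robust_fleet_median([0.1, None, None]))  # None: too few turbines report
```

The robustness only holds while fewer than half of the turbines are affected, which is the breakdown point of the median; a shutdown of most of the fleet at once would still shift the reference.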

3. The scalability of the pipeline is asserted and architecturally supported, but not empirically demonstrated in the manuscript. If this is claimed as a major contribution, the authors should include, for example:

- Report runtime performance under different fleet sizes

- Demonstrate linear or sublinear scaling

- Show cost, memory or latency metrics as functions of load

Response: We have added a Framework subsection to the Results section that describes the deployments of the pipeline framework, reports the available metrics, and demonstrates the scaling capability of the framework features used.

Comment 2 (RC2)

1. The term "physics-informed" used to describe the filtering method could benefit from further clarification. The description of the filtering method in section 3.3 (distance to fleet median, windowing, multidimensional distances) appears to be primarily statistical and temporal, rather than directly incorporating physical models or principles. It would enhance clarity if the authors could explicitly detail how "physics-informed" aspects are integrated into the filtering logic.

Response: Thank you for pointing this out; this term is insufficiently supported, and we have removed it from our descriptions of the filtering method.

2. The paper describes its cloud-based pipeline, highlighting its modularity and scalability for managing anomaly detection across wind farms. However, the contribution of this solution remains unclear as the results section focuses solely on the autoencoder and filtering methods. There are no empirical data or quantitative metrics presented to validate the pipeline's actual performance, scalability, or efficiency.

Response: We have added a Framework subsection to the Results section that describes the deployments of the pipeline framework, reports the available metrics, and demonstrates the scaling capability of the framework features used.

3. There is not enough detail about the specific failure types examined in this work. The authors mention gearbox and generator failures, but more information is needed about the failure sub-types and their locations for enhanced clarity.

Response: We have extended the Turbine data section of the Results with all non-confidential information about the failure types shown in the paper.

4. While the paper acknowledges that data can differ greatly across the fleet and emphasizes the importance of having a large enough fleet for reliable median calculation, I think more discussion is needed about specific sources of variability that could affect the fleet median approach. The paper assumes that a large fleet size will normalize variations, but factors like seasonal variation, turbine location within the wind farm (wake effects, wind exposure differences), and individual operational patterns might create systematic rather than random variations. It would be helpful to have more analysis of how these location-based and operational differences are distinguished from actual anomalies, especially since some turbines might consistently operate differently due to their position rather than equipment issues.

Response: We have further clarified in the Anomaly Filtering Methodology section that fleet-wide events, such as weather anomalies and seasonal variation, are automatically accounted for by our method. We also elaborate there on how individual operating patterns, caused by wake effects or turbine quirks, affect our method, and on how this has been accounted for in our results.

5. Figures 5-7 show negative reconstruction errors and Figures 12-14 show negative anomaly scores. Since reconstruction errors are typically positive differences between predicted and actual values, it's unclear how to interpret negative values in this anomaly detection context.

Response: We have added a paragraph to the manuscript that clarifies this point. A positive difference means that the observed temperature is higher than the predicted temperature; a negative difference means the opposite. Distinguishing the two situations is more informative. For more details, see Section 3.1.3 of the manuscript.
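The sign convention described in this response can be made concrete with a small sketch. The autoencoder is replaced here by given predicted temperatures, and the temperature values, the tolerance `tol`, and the label strings are hypothetical illustrations rather than values from the manuscript.

```python
from typing import List, Sequence


def signed_errors(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Signed reconstruction error: observed minus predicted temperature."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


def describe(error: float, tol: float = 1.5) -> str:
    """Map a signed error to a reading; the tolerance is illustrative only."""
    if error > tol:
        return "hotter than expected"
    if error < -tol:
        return "colder than expected"
    return "as expected"


if __name__ == "__main__":
    observed = [65.0, 70.5, 58.0]   # hypothetical generator temperatures (degC)
    predicted = [64.0, 66.0, 60.5]  # hypothetical autoencoder reconstructions
    errors = signed_errors(observed, predicted)
    print(errors)                         # [1.0, 4.5, -2.5]
    print([describe(e) for e in errors])  # ['as expected', 'hotter than expected', 'colder than expected']
    print([abs(e) for e in errors])       # [1.0, 4.5, 2.5]: the direction is lost
```

Taking the absolute value would merge an overheating component and an unusually cool one into the same kind of score, which is why keeping the sign carries more information.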

6. I would recommend comparing the autoencoder model with other normal behavior modeling approaches, especially since a cloud-based solution has been provided in this work and deploying autoencoder models could be expensive. Alternative models like isolation forest, one-class SVM, or statistical approaches might offer better cost-effectiveness and computational efficiency for cloud deployment while achieving similar anomaly detection performance.

Response: In section 3.1.3, we have clarified why we used the autoencoder-based NBM and referred to previous research comparing the performance of various NBM implementations. We also included a brief comparison of the computational cost of running the autoencoder model in the new Framework result section, where we discuss the pipeline framework results.
Citation: https://doi.org/10.5194/wes-2025-49-AC1

Viewed

Total article views: 331 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
228	68	35	331	14	26

Views and downloads (calculated since 20 May 2025)

Month	HTML	PDF	XML	Total
May 2025	74	13	3	90
Jun 2025	53	32	10	95
Jul 2025	82	12	19	113
Aug 2025	19	11	3	33

Latest update: 20 Aug 2025

Short summary

We introduce a new method to refine failure prediction for wind turbines, leading to better and more efficient alarming. We do this by filtering detected anomalies based on the anomalies from the whole fleet. We compare submethods and find one that removes up to 65 % of detected anomalies while retaining the failure-predicting ones. We also detail how we trained the model that generated these anomalies and discuss the construction of the scalable pipeline that was used to deploy such models.

