Inferring Wind Turbine Operational State and Fatigue from High-Frequency Acceleration using Self-Supervised Learning for SCADA-free Monitoring

Bel-Hadj, Yacine; de Nolasco Santos, Francisco; Weijtjens, Wout; Devriendt, Christof

doi:10.5194/wes-2025-255

Preprints

https://doi.org/10.5194/wes-2025-255

01 Dec 2025

Status: this preprint is currently under review for the journal WES.

Abstract. Wind-turbine operation is commonly described using Supervisory Control and Data Acquisition (SCADA) systems. While high-frequency SCADA data (e.g. 1 s resolution) exist, the vast majority of fleet-wide records available for analysis consist of 10-minute aggregates. These coarse aggregates are insensitive to short transients. Additionally, access is often restricted by proprietary control systems, and the records frequently contain gaps. To address these limitations, a SCADA-free approach is developed in which operational states are inferred directly from high-frequency nacelle acceleration, measured by sensors that are increasingly being installed across wind farms, e.g. to monitor loads. The proposed method is based on a denoising autoencoder, augmented with a Domain-Adversarial Neural Network (DANN) mechanism and a Deep Embedded Clustering (DEC) self-supervision objective. Compact eight-dimensional representations of one-minute vibration spectra between 0 and 3 Hz are learned. Turbine-specific signatures are suppressed through domain-adversarial regularization, leading to turbine-invariant embeddings that capture a generalized representation of turbine dynamics. A self-supervised DEC objective structures the latent space into discrete and physically meaningful operational regimes. DEC also facilitates the post-hoc analysis of the learned embedding. Training is performed on data from 22 of the 44 turbines of an offshore wind farm, sampled at 31.25 Hz, while SCADA signals are used only for validation. Strong correspondence is observed between the learned embeddings and pitch, rotor speed, power, and wind speed, with normalized mutual information above 0.8. Turbine invariance is verified through mutual-information analysis between embeddings and turbine identity. This analysis also reveals clusters within the wind farm and indicates whether the learned representation can be consistently applied across different turbines.
As an auxiliary validation, regression models were trained on the learned embeddings to predict 10-minute damage-equivalent moments (DEM). The regressors were fitted using data from only five strain-instrumented turbines and then applied fleet-wide. Accurate fatigue predictions were obtained across all turbines (R² = 0.96), surpassing SCADA-based baselines. This demonstrates that the learned embeddings generalize beyond operational description and contain sufficient load-related information to support fleet-wide fatigue estimation, enabling high-resolution monitoring without dependence on SCADA.
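
The spectral input described in the abstract (one-minute windows of nacelle acceleration sampled at 31.25 Hz, restricted to the band from 0 to 3 Hz) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing: the non-overlapping windows, the mean removal, the Hann taper, the use of amplitude spectra, and the names `one_minute_spectra`, `FS_HZ`, `WINDOW_S` and `F_MAX_HZ` are hypothetical choices that the abstract does not specify.

```python
import numpy as np

# Values taken from the abstract; the constant names are hypothetical.
FS_HZ = 31.25      # sampling rate of the nacelle accelerometer
WINDOW_S = 60      # one-minute analysis window
F_MAX_HZ = 3.0     # upper edge of the retained frequency band


def one_minute_spectra(acc, fs=FS_HZ, window_s=WINDOW_S, f_max=F_MAX_HZ):
    """Split a 1-D acceleration record into non-overlapping windows and
    return (freqs, spectra), where spectra has shape (n_windows, n_bins)
    and holds amplitude spectra restricted to 0..f_max Hz.

    Assumption: Hann taper and mean removal per window; the paper may use
    a different estimator (e.g. Welch averaging or log scaling).
    """
    n = int(round(fs * window_s))              # samples per window
    n_windows = len(acc) // n                  # drop the incomplete tail
    segments = np.asarray(acc[: n_windows * n], dtype=float)
    segments = segments.reshape(n_windows, n)
    # Remove the static offset of each window before tapering.
    segments = segments - segments.mean(axis=1, keepdims=True)
    taper = np.hanning(n)
    amplitude = np.abs(np.fft.rfft(segments * taper, axis=1))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Small tolerance so the bin at exactly f_max survives rounding.
    keep = freqs <= f_max + 1e-9
    return freqs[keep], amplitude[:, keep]


if __name__ == "__main__":
    # Synthetic check: three minutes of a pure 1 Hz oscillation.
    t = np.arange(3 * 1875) / FS_HZ
    freqs, spectra = one_minute_spectra(np.sin(2 * np.pi * 1.0 * t))
    print(spectra.shape, freqs[np.argmax(spectra[0])])
```

At 31.25 Hz a one-minute window holds 1875 samples, so the frequency resolution is 1/60 Hz and the band from 0 to 3 Hz spans 181 bins under these assumptions. Spectra of this kind would be the input that the autoencoder compresses to eight dimensions.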

Received: 14 Nov 2025 – Discussion started: 01 Dec 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on wes-2025-255', Anonymous Referee #1, 02 Jan 2026

Summary

The manuscript "Inferring Wind Turbine Operational State and Fatigue from High-Frequency Acceleration using Self-Supervised Learning for SCADA-free Monitoring" presents a methodology for estimating operational regimes and damage-equivalent moments (DEM) from high-frequency acceleration data, without relying on SCADA.
Acceleration data are mapped onto a lower-dimensional latent space with an autoencoder. Regularization is also applied to suppress turbine-specific signatures and improve generalization. A clustering approach is incorporated into the learning process in order to separate operating regimes. Finally, a separate model is trained to predict DEM from the latent representation. The method is validated using 10-minute SCADA data.

The paper presents a new and well-performing method that does not rely on SCADA, which is a valuable contribution.
The step-by-step approach is clearly defined, and the paper is easy to read.
Therefore, the paper should be published in the journal after the following remarks have been considered.

Comments and questions:

The first paragraph of the abstract "Wind turbine operation (...) contain gaps" is repeated twice.

Line 15 of the abstract: a full stop is missing after "DEC facilitates the post-hoc analysis of the learned embedding".

Section 2.2.2: Denoting the dimension of the latent space by L (line 232) is ambiguous, since L is already used for the length of the time window in Section 2.2.1, line 213. This is also inconsistent with line 240, where the dimension of the latent space is denoted by d.

Line 232 : L<

Section 2.2.7: How did you choose the number of pretraining epochs t_warm? Has the loss converged after 100 epochs, or is it still decreasing?

How do the individual terms contribute to the loss? Are they weighted equally?

Lines 320 and 368 state that "This sequencing avoids competition between objectives". Could you clarify why?

Did you visualize how each loss changes as training progresses? Optimizing the composite loss does not necessarily mean that all three losses are decreasing. One loss may drive the composite loss while another stagnates or even increases.

Section 2.4: When you mention "A linear regression head", does it refer to the last layer of the LSTM network, or to an independent model? In the latter case, how is the LSTM trained?

Line 352 : Unless I am mistaken, the abbreviation MLP (Multilayer perceptron?) is not introduced explicitly, which could confuse the reader.

Section 2.6 : Isn't two weeks a short period for testing, compared to one year for training ? Is the autoencoder performance similar in all periods of the year ?

Lines 488-490: "Since SCADA signals were assumed constant within each 10-minute interval, these values should be regarded as conservative lower bounds of the attainable correspondence; in other words, the model likely performs better than these metrics suggest." Is this true? One could argue that high-frequency data are more complex and therefore more difficult to estimate from acceleration data.

Line 500 : If we were to perform post-hoc clustering, would we get clusters similar to DEC ? One also has to make sure that the clustering loss does not affect the latent space mapping negatively, in particular for DEM estimation which is its other application.

Line 512 : spelling mistake : "there" instead of "their" (twice).

What model is used for the SCADA baseline ?

Line 535: "The baseline combines SCADA variables with handcrafted features derived from acceleration data". Why not use only SCADA as a baseline? If acceleration is included, then the information received by the autoencoder+LSTM approach is partially included in the baseline. As a result, the comparison evaluates the performance of the model architecture and data processing more than the difference between SCADA and acceleration data.
Citation: https://doi.org/10.5194/wes-2025-255-RC1
RC2: 'Comment on wes-2025-255', Anonymous Referee #2, 14 Jan 2026

The paper presents an impressive combination of advanced tools for monitoring a population of wind turbines, based solely on acceleration data. To my knowledge, although none of the methods used are new in themselves, their combination for farm-level monitoring is innovative and warrants publication.

Motivations and methods are well introduced and described with an appropriate level of detail. Very interesting results are shown regarding three aspects: the learning of invariant features within a population of turbines, the classification of operational states and virtual sensing for fatigue estimation.
Overall, I would recommend acceptance after minor revisions.
Major comments

--------------
(1) Some methodological choices would benefit from sharper justification. In §2.2.6, the choice of a deep embedded clustering approach over simpler methods (e.g. k-means) is not really justified. Can you elaborate on why this approach is preferable here? For instance, are the cluster boundaries stable when the number of requested clusters is varied?
(3) In §3.3, I am not sure I understand what you mean by "train-on-4/test-on-1": do you train the models on 4/5 of the data and test on 1/5 of the data, per fleet-leader (FL) turbine? Please clarify this point. Assuming that my understanding is correct (correct me if I am wrong!), this raises some questions for me when you state that "the learned embeddings preserve sufficient load-related information to enable fleet-wide fatigue estimation without dependence on SCADA or manually engineered features" and follow with the claim of "[feeling] more comfortable deploying the model to the whole windfarm".

* For each FL turbine, how does the DEM prediction performance compare with and without domain-adversarial training? This would help to quantify the risk of over-suppression of physically meaningful variability (e.g. soil–structure interaction differences) for DEMs.

* Practically, a very valuable workflow would be to learn the prediction model on some fleet leader turbines, then evaluate the approach on other turbines (not instrumented with strain gauges). In your case, learning e.g. on a subset of the FL turbines and evaluating on the remaining FL turbine(s), what would be the R2/MSE metrics (since you have the reference data)? This would show the generalization you refer to.

Minor comments

--------------
lines 230-235: the respective meanings of dimensions M vs. F are not very clear, please clarify.
line 420: print typo in "R2".
line 500: missing "there is".
line 540: ends suddenly.

Citation: https://doi.org/10.5194/wes-2025-255-RC2

Viewed

Total article views: 400 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
279	101	20	400	25	24

Views and downloads (calculated since 01 Dec 2025)

Month	HTML	PDF	XML	Total
Dec 2025	126	42	13	181
Jan 2026	139	50	7	196
Feb 2026	14	9	0	23

Viewed (geographical distribution)

Total article views: 382 (including HTML, PDF, and XML) Thereof 382 with geography defined and 0 with unknown origin.

Latest update: 07 Feb 2026

Short summary

We show that simple vibration sensors on wind turbines can reveal how each machine is operating without relying on control system data. By learning patterns from short acceleration segments, our model identifies turbine behavior, detects changes in operation, and tracks events over time. These patterns also support estimating fatigue, providing a new way to understand turbine performance using only vibration measurements.

