A machine-learning-based approach for better prediction of fatigue life of offshore wind turbine foundations using smaller data sizes

Mujtaba, Ahmed; Weijtjens, Wout; Sadeghi, Negin; Devriendt, Christof

doi:10.5194/wes-11-443-2026

Articles | Volume 11, issue 2

Research article

10 Feb 2026

Abstract

As offshore wind turbine (OWT) foundations approach the end of their design life, the industry is increasingly focused on strategies for lifetime extension. As fatigue is the design driver for foundations of OWTs, reliable fatigue damage predictions are essential to support informed decisions for lifetime extensions. While simulation-based fatigue life reassessments are common, data-driven approaches using measured strain data have emerged as an alternative that can reduce modeling uncertainties. However, data-driven approaches face challenges, as access to strain data over the entire past lifetime is not an industry standard. Often, measurement campaigns are only started when a lifetime extension is considered, thus limiting the availability of strain data. In contrast, environmental and operational conditions (EOCs) of the wind turbines are usually recorded during the whole operational period. Using limited strain measurements and long-term EOCs to estimate fatigue damage in unmonitored periods during the lifetime of the turbine requires temporal extrapolation techniques. Existing work on this topic presents several extrapolation methods, including linear time-based extrapolation, binning based on correlations between EOCs and average damage, and machine learning (ML) models. The accuracy of these methods depends on factors such as the selected EOC parameters, the duration and starting point of available strain data, the power rating and the type of the wind turbine, as well as the type and architecture of the extrapolation model used. This study presents a novel machine-learning-based extrapolation model using random forest (RF) for the temporal extrapolation of strain measurements and compares it with previously identified binning models.
The extrapolation performance is validated using 5 years of measured strain, Supervisory Control and Data Acquisition (SCADA), and wave data from a 3 MW and a 9 MW OWT installed on monopile foundations in the Belgian North Sea. Using a sliding window approach on the available monitoring data, we estimate and compare the statistical uncertainty in fatigue life predictions of various extrapolation models. The results indicate that wave parameters play a more significant role in fatigue prediction for larger turbines of 9 MW compared to smaller ones of 3 MW power rating. For limited data sizes – less than 12 months – the proposed RF model demonstrates superior performance, offering more reliable fatigue life predictions with reduced statistical uncertainty. However, for longer datasets of greater than 12 months, the performance advantage of the RF model over binning methods becomes less pronounced. For 3 MW OWTs with datasets greater than 18 months, the RF model is outperformed by binning methods.

Received: 16 Sep 2025 – Discussion started: 25 Sep 2025 – Revised: 11 Dec 2025 – Accepted: 16 Jan 2026 – Published: 10 Feb 2026

1 Introduction

As offshore wind farms mature, extending the operational lifetime of wind turbines has become a pressing concern for the wind energy industry. According to WindEurope (2017), a substantial share of the installed wind capacity in the European Union is expected to reach the end of its design life between 2020 and 2030. Decommissioning these assets without extending their service would hinder progress toward the EU's 2030 target of achieving 50 % of electricity from renewable sources. Therefore, lifetime extension, alongside re-powering and new installations, is vital for meeting long-term sustainability goals. As highlighted by Shafiee (2024), extending the operational life of OWTs offers significant economic and environmental benefits, including reduced levelized cost of energy (LCOE) and lower emissions.

Fatigue is a governing factor in the structural design of wind turbine support structures, which are primarily optimized for dynamic- rather than static-loading conditions (Sparrevik, 2019). Monopile foundations are typically designed for a service life of 20–25 years. However, findings from some structural health monitoring (SHM) campaigns suggest that actual fatigue loads may be lower than anticipated, revealing unexploited fatigue capacity and motivating detailed reassessments for lifetime extension purposes (Tewolde et al., 2018). For example, in the case of lifetime extension of the Samsø offshore wind farm in Denmark, the Danish Energy Agency required detailed fatigue reassessments using updated load conditions to support lifetime extension permits (Buljan, 2025). International guidelines for lifetime extension, such as those outlined in DNV (2016), recommend fatigue life reassessments based on updated models that incorporate measured environmental and operational data. Although traditional fatigue reassessments rely heavily on simulation-based models, the availability of measured strain, as typically provided by SHM systems, and long-term EOC data open the door for data-driven alternatives that can reduce uncertainties associated with initial design assumptions in the simulation-based models (Kinne and Thöns, 2023).

To support lifetime assessments, two primary approaches exist: simulation-based reassessments using updated models, and data-driven methods using measured strains. The latter mitigates modeling assumptions but introduces new challenges, such as the high cost of installation and maintenance of strain gauges on OWTs (Bezziccheri et al., 2017) and limited data availability (Pacheco et al., 2023). Strain gauges are typically installed at a few critical locations, requiring spatial extrapolation to predict strains at locations where no direct measurements are available. Strain measurements are also limited in duration, for two reasons. First, having strain data over the entire past lifetime is not an industry standard. Second, traditional sensor deployment in harsh marine environments is challenging: sensors fail due to corrosion caused by saltwater and humidity, and complex installation logistics lead to high costs (Encalada-Dávila et al., 2025). Often, measurement campaigns are only started when a lifetime extension is considered, thus limiting the availability of strain data. On the other hand, a Supervisory Control and Data Acquisition (SCADA) system is typically installed on OWTs to capture and record EOCs such as wind speed, power, and rotational speed, enabling wind farm operators to track and control the turbine performance in real time (Moynihan et al., 2024). The available data pave the way for temporal extrapolation techniques that use limited strain measurements and long-term EOCs to estimate fatigue damage in unmonitored periods during the lifetime of a turbine.

Prior research has extensively explored the extrapolation of strain and load data for offshore wind turbines, both spatially, to uninstrumented locations, and temporally, beyond the measurement window. Spatial extrapolation studies include those targeting different positions on the same turbine (Ziegler et al., 2019; Moynihan et al., 2024; Fallais et al., 2025; Ziegler et al., 2017; Zou et al., 2023; Zhang et al., 2024; Simpson et al., 2025; Encalada-Dávila et al., 2025), as well as farm-wide extrapolation approaches where measurements from one or a few instrumented turbines, called fleet leaders, are extended to others across the farm (de N Santos et al., 2024; Weijtens et al., 2016; Noppe et al., 2020; Pacheco et al., 2023).

Temporal extrapolation methods vary in complexity, ranging from simple linear techniques, through binning strategies that relate fatigue damage to EOCs, to more advanced ML approaches. Notable studies have assessed the accuracy of these methods using various datasets (Hübler and Rolfes, 2022; Ziegler and Muskulus, 2016; Weijtens et al., 2016). For example, Hübler et al. (2018) evaluate three approaches using strain measurements from a 3 MW OWT installed on a monopile: linear extrapolation, also called 0-dimensional (0D) binning; 1D binning, where 10 min fatigue damage is split into wind speed bins; and 2D binning, where fatigue damage is split into wind speed and wind direction bins. Hübler et al. (2018) conclude that strain measurements of 9 to 10 months provide a representative and unbiased dataset, with 1D binning using wind speeds giving the most reliable fatigue life estimations. Hübler and Rolfes (2022) study the performance of multi-dimensional binning extrapolation, artificial neural networks (ANNs) and Gaussian process regression (GPR) trained using multiple 1-year periods of measured strains. In Hübler and Rolfes (2022), binning approaches using wind speed correlations provide the best results. Pacheco et al. (2022) summarize the steps for calculating and extrapolating fatigue damage in wind turbines using strain measurements. Apart from estimating the fatigue life of an instrumented turbine, Pacheco et al. (2022) conclude that the results from instrumented turbines can be extrapolated to uninstrumented turbines in the same wind farm. Pacheco et al. (2023) use a damage capture matrix formed using 2D binning in wind speed and turbulence intensity to extrapolate strain measurements from one turbine to other turbines in an onshore wind farm. Pacheco et al. (2023) recommend using 1-year monitoring periods and conclude that the uncertainties in extrapolations decrease with increasing monitoring periods.
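The difference between linear (0D) extrapolation and 1D binning in wind speed can be sketched in a few lines. This is only a minimal illustration, not the implementation used in the cited studies: the function names, the 2 m/s bin width, the fallback for unseen bins, and the toy values (damage in arbitrary units) are all assumptions made for this example.

```python
from collections import defaultdict


def bin_index(wind_speed, bin_width=2.0):
    """Map a 10 min mean wind speed (m/s) to an integer bin index."""
    return int(wind_speed // bin_width)


def extrapolate_0d(monitored_damage, n_target_intervals):
    """Linear (0D) extrapolation: mean damage per interval times interval count."""
    mean_damage = sum(monitored_damage) / len(monitored_damage)
    return mean_damage * n_target_intervals


def extrapolate_1d(monitored_ws, monitored_damage, target_ws, bin_width=2.0):
    """1D binning: mean damage per wind speed bin, summed over the target intervals."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ws, damage in zip(monitored_ws, monitored_damage):
        b = bin_index(ws, bin_width)
        sums[b] += damage
        counts[b] += 1
    mean_per_bin = {b: sums[b] / counts[b] for b in sums}
    overall_mean = sum(monitored_damage) / len(monitored_damage)
    # Assumption: bins never observed during monitoring fall back to the overall mean.
    return sum(
        mean_per_bin.get(bin_index(ws, bin_width), overall_mean) for ws in target_ws
    )


# Toy monitored period: two calm and two windy 10 min intervals (made-up values).
monitored_ws = [4.5, 5.0, 10.2, 11.5]
monitored_damage = [1.0, 3.0, 6.0, 10.0]

# Toy unmonitored period that is windier than the monitored one.
target_ws = [10.5, 11.0, 10.1, 4.2]

damage_0d = extrapolate_0d(monitored_damage, len(target_ws))
damage_1d = extrapolate_1d(monitored_ws, monitored_damage, target_ws)
```

In this toy case the 0D estimate ignores that the unmonitored period is windier, whereas the 1D estimate weights the per-bin mean damage by how often each wind speed bin occurs in the target period.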

The rapid development of computer science and its applications in wind energy makes machine learning models a favorable option for the temporal extrapolation of fatigue damage in wind turbines (He et al., 2022; Raju et al., 2025). The literature focuses mainly on using ML models as surrogates trained on simulated data to predict fatigue loads on each turbine in a wind farm (Bossanyi, 2022; Gasparis et al., 2020; Singh et al., 2022). Literature on ML models that are trained on strain measurements and used for fatigue damage prediction is limited. One example is de N Santos et al. (2023), who use physics-guided learning of neural networks trained on 9 months of strain measurements for long-term fatigue damage estimation using SCADA and acceleration data. In wind energy applications, the random forest (RF) model has proven to be a powerful ensemble learning method, widely used for regression and classification tasks. RF models adapt well to high-dimensional data and large-scale datasets, and are robust in dealing with missing values and unbalanced datasets (Karadeniz, 2025). For example, Karadeniz (2025) compares RF, long short-term memory network (LSTM) and gated recurrent unit (GRU) models for predicting total harmonic distortion voltage (THDV) of offshore wind farms and concludes that RF models outperform LSTM and GRU, achieving the lowest root mean squared error (RMSE). Similarly, Zhou et al. (2016) use an RF regression model to predict short-term power production of a wind farm, and Rouholahnejad and Gottschall (2025) use an RF model to extrapolate near-surface wind speed up to 200 m.

A literature review highlights the fact that the sensitivity of ML- and binning-based extrapolation models to different turbine power ratings remains poorly understood. Many studies assume homogeneity in turbine design and operating conditions, often focusing on a single turbine type and neglecting spatial variability across the farm (Bouty et al., 2017). Consequently, it is unclear how extrapolation performance varies across turbines with different power ratings or environmental exposures.

Moreover, the validation of these models is often limited to short timescales (e.g., several months), which restricts confidence in their long-term predictive capability. Although several binning-based and regression-based methods have been proposed (Ziegler et al., 2017; Pacheco et al., 2023; Sadeghi et al., 2023b), their comparative effectiveness and robustness under varying data availability, input features, and turbine power ratings are not yet well established. Methodologies that require shorter measurement campaigns are especially advantageous in discussions of lifetime extension for aging turbines. This raises two key questions. What is the minimum duration of the monitoring period required for statistically reliable fatigue life predictions? And can machine learning models reduce this monitoring period without sacrificing accuracy or increasing uncertainty?

Despite the increasing attention to data-driven extrapolation, key gaps remain in the literature:

A systematic comparison of binning and machine learning models, specifically random forest models, for fatigue life prediction across multiple turbine sizes is missing.
The relative importance of SCADA and wave parameter selection for these models to predict fatigue damage has not been thoroughly assessed for turbines of different power ratings.
The ability of these models to predict fatigue damage across different directions (fore–aft, side–side, or single-sensor measurements) has not been thoroughly evaluated for turbines of varying capacities.
The sensitivity of these models to data availability, including variations in dataset size and measurement start time, is underexplored.

While previous studies have compared binning approaches with machine learning models such as ANNs and GPR (Hübler and Rolfes, 2022; de N Santos et al., 2023), these comparisons were generally performed for a single turbine and often using relatively limited datasets. Similarly, earlier work has examined the sensitivity of data-driven fatigue extrapolation to dataset length and measurement start time (Hübler and Rolfes, 2022) but again typically for a single turbine. Consequently, a systematic evaluation of RF models across multiple turbine sizes using extensive monitoring data, together with a multi-directional assessment of model performance and parameter relevance, remains missing.

This study addresses these gaps by first introducing a novel ML-based extrapolation model using random forest for the temporal extrapolation of strain measurements. This model is compared with state-of-the-art temporal extrapolation techniques for fatigue life prediction and validated using 5 years of measured strain, SCADA, and wave data from two offshore wind turbines: a 3 MW and a 9 MW turbine, both installed on monopiles in the Belgian North Sea. We analyze the influence of dataset size, measurement start time, model dimensionality, and feature selection on extrapolation accuracy. Multiple configurations of random forest models are tested, including recursive feature elimination with cross-validation (RFECV) and state-specific modeling, to explore trade-offs between model complexity and predictive performance.
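The influence of dataset size and measurement start time can be illustrated with a sliding window over a damage series. The sketch below is only an illustration under stated assumptions and does not reproduce the models of this study: it uses monthly damage values, plain linear extrapolation rather than the RF or binning models, failure at a cumulative damage of 1 (Miner's rule), and a synthetic seasonal series. All function and variable names are hypothetical.

```python
import math
import statistics


def lifetime_from_window(monthly_damage, start, length):
    """Linearly extrapolate the mean monthly damage of one window to a lifetime in years.

    Assumes failure when the cumulative damage reaches 1 (Miner's rule).
    """
    window = monthly_damage[start:start + length]
    mean_monthly = sum(window) / len(window)
    return 1.0 / (mean_monthly * 12.0)


def sliding_window_lifetimes(monthly_damage, length):
    """Lifetime estimates for every possible start month of a window of the given length."""
    n_starts = len(monthly_damage) - length + 1
    return [lifetime_from_window(monthly_damage, s, length) for s in range(n_starts)]


# Synthetic 5-year series with a seasonal cycle (higher damage in winter).
# The values are made up so that the long-term mean corresponds to a 30-year lifetime.
monthly_damage = [
    (1.0 + 0.5 * math.cos(2.0 * math.pi * month / 12.0)) / (12.0 * 30.0)
    for month in range(60)
]

# Spread of the lifetime estimates over all start months, per window length in months.
spread = {}
for length in (3, 6, 12, 24):
    estimates = sliding_window_lifetimes(monthly_damage, length)
    spread[length] = statistics.pstdev(estimates)
```

In this synthetic case the spread of the estimates shrinks as the window grows and nearly vanishes once a window covers a full seasonal cycle. Measured data would also contain year-to-year variability, which this sketch does not model.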

The findings provide new insights into the reliability, convergence behavior, and practical limitations of data-driven fatigue extrapolation models, offering valuable guidance for their deployment in support of lifetime extension assessments.

The remainder of the paper is structured as follows. It begins with the objective, followed by the measurement setup for the strain, SCADA, and wave data. The methodology section then covers model development, with a focus on feature selection, binning extrapolation, and the RF model. It also outlines the technique used to estimate the statistical uncertainty of these extrapolation models and details the damage extrapolation approach used to estimate fatigue life. The paper concludes with the results of the fatigue lifetime predictions in various directions, along with a discussion of the results.

2 Objective

Consider initiating a strain measurement campaign aimed at reassessing the fatigue life of OWTs with the goal of extending their lifetime. How long should measurements be conducted to ensure a reliable estimate of fatigue life? The growing number of aging OWTs demands fast, reliable, and data-efficient methods to accurately predict their end of life (EOL). Addressing this question is important because shorter measurement campaigns cost less and deliver fatigue life estimates sooner. Precise predictions of fatigue life are essential for making informed lifetime extension decisions; however, this requires balancing the duration of measurement campaigns against both the financial costs of data acquisition and the accuracy of the predictions.

https://wes.copernicus.org/articles/11/443/2026/wes-11-443-2026-f01

Figure 1. A timeline showing the model developed using monitoring data, which is used for hind-casting and forecasting with SCADA and wave data where available, and for forecasting into the future using expected distributions of EOCs.
