Research article 17 Aug 2020
A surrogate model approach for associating wind farm load variations with turbine failures
 DTU Wind Energy, Risø Campus, Frederiksborgvej 399, 4000 Roskilde, Denmark
Correspondence: Laura Schröder (lausc@dtu.dk)
In order to ensure structural reliability, wind turbine design is typically based on the assumption of gradual degradation of material properties (fatigue loading). Nevertheless, the relation between the wake-induced load exposure of turbines and the reliability of their major components has not been sufficiently well defined and demonstrated. This study suggests a methodology that makes it possible to correlate loads with reliability of turbines in wind farms in a computationally efficient way by combining physical modeling with machine learning. It can be used for estimating the current health state of a turbine and enables a more precise prediction of the “load budget”, i.e., the effect of load-induced degradation and faults on the operating costs of wind farms. The suggested approach is demonstrated on an offshore wind farm for comparing performance, loads and lifetime estimations against recorded main bearing failures from maintenance reports. The validation of the estimated power against the 10 min supervisory control and data acquisition (SCADA) power signals shows that the surrogate model is able to capture the power performance relatively well with a 1.5 % average error in the prediction of the annual energy production (AEP). It is found that turbines positioned at the border of the wind farm with a higher expected AEP are estimated to experience earlier main bearing failures. However, a clear connection between the load estimations and failure observations could not be confirmed in this study. Finally, the analysis stresses that more failure data are required in future work to enable statistically significant associations of the observed main bearing lifetimes with load exposures across the wind farm and to validate and generalize the suggested approach and its associated findings.
1.1 Motivation
For the past decades, wind energy has been one of the world’s fastest-growing sources of renewable energy, and it is expected to show a similar trend of growth in the future. The development of wind energy with increased wind turbine size and rated capacity has a significant influence on the operation and maintenance (O & M) costs (Gonzalez et al., 2016). Combined with poor site accessibility, as for offshore installations where wind turbines might be inaccessible for 4–5 months per year (Van Bussel and Zaaijer, 2001), failures cause severe consequences in terms of downtime and maintenance costs (Bangalore and Patriksson, 2018). Therefore, optimizing the wind farm operation by improving performance and reliability in order to minimize the levelized cost of energy (LCoE) is gaining more and more importance. The O & M costs of wind turbines amount to around 25 % of the LCoE for onshore wind turbines and 35 % for offshore wind turbines (Dinwoodie et al., 2012). For reducing the O & M costs, monitoring and predicting the condition of the turbine's components in terms of operational health, material degradation and remaining lifetime plays an important role. For instance, improving the detection rate of a monitoring system for blades, drive train, tower and grout from 60 % to 99 % results in an increase of lifetime levelized savings by 32 % (May et al., 2015).
Most current wind turbine maintenance strategies are time-based and assume a reliability degradation dependent on the system age (Reder and Melero, 2018). Throughout the lifetime of a turbine, its failure rate is assumed to follow a Weibull distribution with a higher failure frequency in the first years of operation, followed by a longer period of a lower constant failure rate. Towards the end of life, an increasing failure rate can be observed again due to wear and damage accumulation caused by fatigue loading (Mudholkar and Srivastava, 1993; Hahn et al., 2007).
However, the relation between the load exposure of turbines in a wind farm and their component reliability has not been sufficiently well defined and demonstrated. Characterizing this relation would make it possible to assess the current health state of a turbine and help to better understand the effect of load-induced degradation and faults on the operating costs of wind farms. Especially for offshore wind farms, where failures can lead to long downtimes, this plays an important role in reducing the LCoE.
1.2 Objective
The objective of the present paper is twofold:

Firstly, the aim is to suggest a methodology that makes it possible to investigate the correlation between loads and component reliability of turbines in wind farms, by combining data (10 min averages from a supervisory control and data acquisition, SCADA, system) and physical modeling (HAWC2 aeroelastic load simulations) with machine learning.

Secondly, the suggested approach is demonstrated on a case study to investigate whether the loading conditions can be clearly associated with the observed reliability of the main bearing.
1.3 Background and related work
Information about the turbine reliability can be derived either by modeling structural reliability parameters (e.g., failure frequency, likelihood of observing failure over a reference period) or by using collected data from inspection and maintenance reports (e.g., observed failure rates, observed time to failure). Contrary to the assumption that turbine reliability only decreases with operational time, several studies have demonstrated the effect of meteorological conditions on the turbine reliability, such as Reder and Melero (2016) and Tavner et al. (2006). Example studies of the influence of wake effects on the turbine reliability can be found in Kim et al. (2012) and Huang and Chiang (2006). Previous work aimed at defining a relationship between fatigue and extreme loading conditions on a turbine and its reliability can be found in Colone et al. (2018) and Scott et al. (2012). Colone et al. (2018) modeled the impact of turbulence-induced loads on the fatigue reliability of offshore wind turbine monopiles. In Scott et al. (2012) the damage effect of extreme and transient loads on the drivetrain reliability is estimated. However, these studies focus on modeling the reliability, rather than investigating observed failure rates from measurement data. Therefore, the present paper aims at suggesting a methodology for modeling various wake-induced loads, performance and estimated lifetime, and comparing them against measured failure rates and times to failure. The suggested approach can be used for modeling various performance and load variables under different operating conditions.
Modeling wake-induced loads in wind farms is a crucial step for fatigue load assessments both in the design process and during the operational phase of a wind farm where load measurements are costly and therefore rarely conducted. Carrying out aeroelastic simulations each time a load assessment is required is impractical. Therefore, various methods have been developed to reduce the number of computations required. A popular approach is the use of so-called surrogate models, which are reduced-order models that are trained on a limited number of aeroelastic simulations. Once the surrogate model has been trained, multiple site-specific load assessments at arbitrary sites can be obtained at a low computational cost and without the need for new aeroelastic simulations. Examples are Toft et al. (2016) and Müller et al. (2017), who propose a methodology based on response surface (RS) for site-specific load estimations. Teixeira et al. (2017) demonstrate the use of kriging surfaces for fatigue load estimations of offshore wind turbines.
These approaches focus solely on one surrogate model and use a relatively small variable space. In Dimitrov et al. (2018) the surrogate model framework is expanded with the motivation to fully characterize the wind field conditions, as well as to enable comparing different surrogate models within the framework. Based on this framework, a benchmark of different surrogate models in Schröder et al. (2018) has shown that an artificial neural network (ANN)-based surrogate model outperforms other methods using polynomial chaos expansion and RS in terms of model accuracy, computational time and convergence stability.
The above-mentioned approaches are only applicable for estimations on single turbines. In Dimitrov (2019) the surrogate modeling framework is extended in order to estimate wake-induced loads for a wind farm with arbitrary layout. In this approach the number of simulations required for modeling different wake conditions is reduced by parametrizing the wake effects. This method has been demonstrated in a case study on the Horns Rev I wind farm (Galinos et al., 2016) and further validated against measurement data in Dimitrov and Natarajan (2019). In the present study, the above-mentioned wind farm surrogate modeling framework is expanded for estimating further performance and lifetime parameters under additional operating conditions, and its predictions are compared against observed failures.
The suggested methodology for comparing wake-induced loads against the component reliability of turbines is illustrated in Fig. 1. It can be used to estimate various performance, loading and lifetime characteristics of a wind farm. The approach can be applied to any wind farm with arbitrary layout and turbine type as long as data recorded from the SCADA system are available together with observed failure events, e.g., from inspection and maintenance reports. The framework can be split into six main steps which are more thoroughly described in the following sections:

1. define variable input space and create samples X_{1} from predefined distributions and boundaries;

2. create high-fidelity simulation database for normal operation S_{1}(X_{1}), to be used as training inputs for a surrogate model;

3. train a surrogate model M(X) (an ANN) mapping undisturbed environmental conditions to load and power outputs;

4. obtain site-specific load and power estimations under normal operating conditions, M(X_{site}), by sampling the surrogate model over the joint distribution of site-specific environmental conditions X_{site};
 a. establish a site-specific joint probability distribution of undisturbed wind conditions by analyzing measured data;
 b. carry out a Monte Carlo (MC) simulation with the surrogate model, drawing samples X_{site} from the site-specific joint distribution;

5. add other operational conditions (e.g., transients such as startups and shutdowns);
 a. simulate scenarios with the selected (transient) operating conditions S_{2}(X_{2});
 b. analyze SCADA data and fault and event logs to establish the annual frequency of the events;
 c. weight estimates according to the probabilities of the operational states w_{1} and w_{2} obtained from data;

6. compute a summary statistic S_{sum} to be considered as a proxy for component reliability and compare estimates against observed failure events.
2.1 Define variable input space and sample from predefined distributions
Selecting the variable input space is a crucial step in the creation of the simulation database. The performance and mechanical load variations of turbines within a wind farm mainly depend on the wake-induced turbulence. In this analysis, the wake-induced turbulence is characterized by variables that can be grouped into ambient conditions X_{amb}, turbine position X_{pos} and wake-induced effects X_{farm} based on the study by Dimitrov (2019):

X_{amb}=[u, σ_{U}, α, H_{s}, T_{p}, Δ] (mean wind speed, turbulence, wind shear, significant wave height, wave peak period and wind‐wave misalignment);

X_{pos}=[Z_{w}] (water depth);

X_{farm}=[R_{D}, γ, N_{rows}] (row spacing, wake incidence angle and number of disturbing turbines).
The environmental variables from X_{amb} and X_{pos} include the most relevant factors that affect mechanical loads on both the nacelle and the foundation. The variables from X_{farm} are intended to describe the relative position of the wake source(s) with respect to the disturbed turbine such that the model is generalized for arbitrary wind farm layouts. The choice of the three wake-induced variables of X_{farm} is explained in more detail in Dimitrov (2019).
To make sure that the model is able to cover a wide range of conditions, the distributions and boundary functions of each variable have to be defined accordingly. Since some of the ambient variables are conditional on each other, the variable space is generated by sampling from their joint probability distribution using a Rosenblatt transformation (Rosenblatt, 1952) that takes into account the predefined distributions and bounding functions (Dimitrov, 2019).
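The conditional sampling idea can be sketched as follows. This is a minimal illustration, not the implementation used in the study: only the wind speed range of 4 to 30 m/s comes from the text, while the bounding function `sigma_bounds` and its coefficients are hypothetical placeholders. A Rosenblatt-style transformation maps independent uniform numbers to dependent variables by applying the inverse marginal CDF of the first variable and then the inverse conditional CDF of each following variable.

```python
import random


def sigma_bounds(u):
    """Hypothetical wind-speed-dependent bounds on the turbulence (m/s).

    The coefficients are placeholders for illustration only.
    """
    lower = 0.02 * u
    upper = 0.16 * (0.75 * u + 5.6)
    return lower, upper


def rosenblatt_sample(v1, v2, u_min=4.0, u_max=30.0):
    """Map two independent uniforms on [0, 1] to a dependent (u, sigma_u) pair."""
    # Inverse CDF of the (uniform) marginal distribution of the wind speed.
    u = u_min + v1 * (u_max - u_min)
    # Inverse CDF of the turbulence conditional on the sampled wind speed.
    lower, upper = sigma_bounds(u)
    sigma_u = lower + v2 * (upper - lower)
    return u, sigma_u


rng = random.Random(0)
samples = [rosenblatt_sample(rng.random(), rng.random()) for _ in range(1000)]
```

Each further conditional variable would be added in the same way, with its bounds evaluated at the values already drawn.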
It should be noted that the variable space should be defined specifically for each use case. For instance, if only nacelle load estimates are of interest, the variables for wave-induced loads (H_{s}, T_{p}, Δ) can be neglected since they will most likely not affect the final estimates.
2.2 Create aeroelastic simulation database
The set of sampled input variables, which can be represented as X=[X_{amb}, X_{pos}, X_{farm}], is then used for simulating the desired output variables S(X) (see Fig. 2). Running aeroelastic time series simulations requires a wind flow model as well as a wake model that allows the superposition of multiple wake sources N_{rows} for modeling wake-induced effects. Furthermore, a structural model, an aerodynamic model and the controller of the turbine need to be included in order to model the structural response. If this approach is applied to offshore turbines, a hydrodynamic model and a soil model (or alternatively a simplified apparent fixity model) are also necessary for including the effects of hydrodynamic and soil forces.
Subsequently, the time series simulations S_{t}(X) are post-processed in order to obtain 10 min statistics, lifetime indicators and damage-equivalent fatigue loads (DELs) for assessing performance, lifetime or fatigue. By applying the Palmgren–Miner rule, the lifetime DEL can be formulated for a given Wöhler exponent m (Eq. 1). The formulation uses the 1 Hz equivalent fatigue load R_{eq}, which is simulated, e.g., for 600 s corresponding to n_{eq}=1 Hz ⋅ 600 s = 600 equivalent cycles; the joint probability p(u, θ) of the wind speed u and wind direction θ; and the number of equivalent cycles n_{eq,L} corresponding to operation over the intended lifetime of the wind farm.
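As a rough illustration of the Palmgren–Miner aggregation described above, the following sketch combines 1 Hz equivalent loads from discrete (u, θ) bins into a lifetime DEL. It is an assumption-based sketch rather than the exact expression of the paper: the joint probability is assumed to be discretized into bins that sum to one, and the function and argument names are hypothetical.

```python
def lifetime_del(r_eq, prob, m, lifetime_seconds, n_eq_lifetime):
    """Lifetime damage-equivalent load from per-bin 1 Hz equivalent loads.

    r_eq             -- 1 Hz equivalent load for each (u, theta) bin
    prob             -- probability of each bin (assumed to sum to 1)
    m                -- Woehler exponent
    lifetime_seconds -- operating time over the intended lifetime
    n_eq_lifetime    -- number of equivalent cycles n_eq,L
    """
    if abs(sum(prob) - 1.0) > 1e-9:
        raise ValueError("bin probabilities must sum to 1")
    # Palmgren-Miner: damage is linear in (number of cycles) * (range ** m).
    # At 1 Hz every second of operation contributes one cycle of range r_eq.
    range_sum = lifetime_seconds * sum(p * r ** m for p, r in zip(prob, r_eq))
    # Express the accumulated damage as one constant range over n_eq,L cycles.
    return (range_sum / n_eq_lifetime) ** (1.0 / m)
```

With n_{eq,L} equal to the number of seconds in the lifetime, the result reduces to the probability-weighted power mean of the bin values.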
For assessing the component reliability of a main bearing, the fatigue life indicator L_{10}, i.e., the lifetime that 10 % of the bearings would not survive (Calderon, 2015), can be calculated with Eq. (2), where n is the rotational speed, a_{i}, i=1, 2, 3 are life correction coefficients, C is the dynamic bearing rating and $p=10/3$ is the life exponent for roller bearings. A high value indicates a longer main bearing lifetime. The dynamic equivalent force P_{d} is defined as a hypothetical force that, acting on the bearing center as a pure radial load (for a radial bearing) or a pure axial load (for a thrust bearing), results in the same lifetime (NTN, 2009). It can be calculated from the radial force F_{r} and the axial force F_{a} with Eq. (3), using calculation factors b_{x} and b_{y} that depend on the specific roller bearing type: if $F_{a}/F_{r} \le 0.27$, then b_{x}=1 and b_{y}=2.5; otherwise, if $F_{a}/F_{r} > 0.27$, then b_{x}=0.67 and b_{y}=3.7.
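The bearing calculation can be sketched as below. The factors b_{x} and b_{y} and the exponent p are taken from the text. The two formulas themselves are assumptions: P_{d} is taken as the linear combination b_{x} F_{r} + b_{y} F_{a}, and L_{10} follows the common ISO 281-style rating life in hours, with the rotational speed in revolutions per minute. Function names are hypothetical.

```python
def dynamic_equivalent_force(f_r, f_a):
    """Assumed form P_d = b_x * F_r + b_y * F_a with the factors from the text."""
    if f_r <= 0.0:
        raise ValueError("radial force must be positive")
    if f_a / f_r <= 0.27:
        b_x, b_y = 1.0, 2.5
    else:
        b_x, b_y = 0.67, 3.7
    return b_x * f_r + b_y * f_a


def l10_hours(n_rpm, c, p_d, a1=1.0, a2=1.0, a3=1.0, p=10.0 / 3.0):
    """Assumed ISO 281-style rating life in hours.

    c and p_d must share the same force unit (e.g. kN); n_rpm is in 1/min.
    """
    return a1 * a2 * a3 * 1.0e6 / (60.0 * n_rpm) * (c / p_d) ** p


# Example with made-up forces in kN: a lightly loaded bearing lasts longer.
life_low_load = l10_hours(12.0, 19600.0, dynamic_equivalent_force(800.0, 100.0))
life_high_load = l10_hours(12.0, 19600.0, dynamic_equivalent_force(1600.0, 200.0))
```

Because the load ratio enters with the exponent 10/3, doubling P_{d} shortens the estimated life by roughly a factor of ten.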
2.3 Train surrogate model
Once the simulation database is created, the surrogate model can be trained using the set of input variables X and set of target variables S(X) as shown in Fig. 3. As mentioned before, the selection of which variables should be included in the target set S(X) depends on the intention of the specific use case.
The transfer function for mapping the input variables to the targets can be any type of regression model. However, this study suggests using feedforward ANNs, since they were found to be the most suitable method for the task of site-specific load estimations in terms of prediction time, accuracy and convergence robustness with smaller training samples (Schröder et al., 2018).
Feedforward ANNs (Goodfellow et al., 2016) consist of multiple fully connected layers. In each layer the input x is transformed linearly to $z = \mathbf{W}x + b$ with weight matrix W and bias b. After the result is passed through a nonlinear activation function σ(z), it serves as input to the next layer, $x^{[i+1]} = \sigma(z^{[i]}) = \sigma(\mathbf{W}^{[i]} x^{[i]} + b^{[i]})$. When training an ANN, the weight parameters W and bias parameters b are estimated by minimizing the cost function J(W, b). The cost function is a measure of the difference between the model prediction g(W, b, x) and the observed output y. When using a least-squares approach the cost function can be calculated as shown in Eq. (4), where N_{e} is the number of training samples.
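A minimal sketch of the layer recursion and a least-squares cost is given below in plain Python. Several choices are assumptions made for illustration: tanh as the activation function, a linear output layer (common for regression) and normalization of the cost by N_{e} without a factor of 1/2.

```python
import math


def forward(x, layers):
    """Propagate an input vector through fully connected layers.

    layers is a list of (W, b) pairs, where W is a list of weight rows.
    """
    for idx, (weights, bias) in enumerate(layers):
        # Linear transformation z = W x + b.
        z = [sum(w * x_j for w, x_j in zip(row, x)) + b_i
             for row, b_i in zip(weights, bias)]
        # Hidden layers apply the activation; the output layer stays linear.
        is_output = idx == len(layers) - 1
        x = z if is_output else [math.tanh(z_i) for z_i in z]
    return x


def least_squares_cost(layers, inputs, targets):
    """Mean of the squared prediction errors over the N_e training samples."""
    n_e = len(inputs)
    total = 0.0
    for x, y in zip(inputs, targets):
        prediction = forward(x, layers)
        total += sum((g - y_k) ** 2 for g, y_k in zip(prediction, y))
    return total / n_e


# Tiny network: two inputs, two hidden units, one output.
example_layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
```

Training then amounts to adjusting every W and b so that `least_squares_cost` decreases on the training samples.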
2.4 Sitespecific estimations using surrogate model
In order to deploy the trained surrogate model to give estimations for the desired offshore wind farm, a new input data set has to be generated that includes the site-specific ambient environmental conditions, as well as farm-related parameters for the specific wind farm. As in Sect. 2.1, the ambient input variables X_{amb} are sampled with a Monte Carlo simulation using a Rosenblatt transformation in order to construct the site-specific joint probability distributions, with the wind direction θ being the first independent variable. The distributions of these ambient conditions can be obtained from any available measured or modeled source, such as SCADA data or a meteorological mast. Since the input variables X_{pos} and X_{farm}, on the other hand, depend on the turbine position within the wind farm, they have to be generated for each turbine separately. Regarding the wake-related input X_{farm}, the row spacing R_{D}, wake incidence angle γ and number of upstream turbines N_{rows} have to be collected for each wind direction sector separately. The trained ANN is then applied using these input variables X_{amb}, X_{pos} and X_{farm} for estimating the output S(X_{site}). In case there are several lines of turbine rows upstream, the output is estimated for each equally spaced turbine line and the most conservative estimate is selected. Algorithm 1 shows the implementation steps required for the above-mentioned procedure. For a more detailed explanation of this approach including an implemented example case, see Dimitrov (2019).
The estimations from the ANN are then simply summed up for each turbine. A probability weighting of the samples is not necessary since they are already generated taking into account the probability distributions of the input space. The annual energy production (AEP) of each turbine can be calculated using Eq. (5) with the number of Monte Carlo samples N_{sim}, estimated electrical power $\widehat{P}_{i}$ and the number of operating hours per year N_{hours,y}. The DEL values can be summed up according to Eq. (6). Note that before the summation, the estimations $\widehat{L}_{i}$ need to be inverted to 1 Hz fatigue range sums $L_{\mathrm{inv},i} = 600 \cdot \widehat{L}_{i}^{m}$. Afterwards the sum can be converted back to lifetime DEL using the number of 1 Hz equivalent load cycles corresponding to 25 years N_{sec,L}.
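The summation steps can be illustrated with the sketch below. It follows the description in the text (equal weighting of the Monte Carlo samples, inversion to 600 s range sums, conversion back with N_{sec,L}), but the exact scaling is an assumption: the simulated duration of N_{sim} ⋅ 600 s is scaled linearly up to the lifetime. The default of 8760 operating hours per year is also an assumption.

```python
def annual_energy_production(power, hours_per_year=8760.0):
    """AEP as the mean estimated power times the operating hours per year."""
    return hours_per_year * sum(power) / len(power)


def lifetime_del_from_samples(del_10min, m, n_sec_lifetime):
    """Lifetime DEL from equally weighted 10 min DEL estimates."""
    n_sim = len(del_10min)
    # Invert each 1 Hz DEL to a fatigue range sum over its 600 s window.
    range_sums = [600.0 * load ** m for load in del_10min]
    # Scale the simulated duration (n_sim * 600 s) up to the full lifetime.
    lifetime_sum = sum(range_sums) * n_sec_lifetime / (600.0 * n_sim)
    # Convert back to a DEL over n_sec_lifetime cycles at 1 Hz.
    return (lifetime_sum / n_sec_lifetime) ** (1.0 / m)


N_SEC_25_YEARS = 25 * 365 * 24 * 3600
example_del = lifetime_del_from_samples([40.0, 50.0, 60.0], m=4,
                                        n_sec_lifetime=N_SEC_25_YEARS)
```

Because of the exponent m, the result lies above the arithmetic mean of the samples: high-load samples dominate the accumulated damage.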
2.5 Add other operational conditions (e.g., transients)
Further scenarios can be included by simulating selected operating conditions (e.g., startup, shutdown events). When summing up estimations for normal operation with these selected conditions, weights for the probability of the operational state need to be included in Eqs. (5) and (6). The probability of the turbine operating in normal, startup and shutdown condition varies per wind speed and can be extracted from SCADA data or fault and event logs. For transient events the probability-weighted AEP and DEL can be calculated using the number of transient events per year N_{TR,y}.
It follows that the probability-weighted AEP and DEL for normal operation can be calculated using Eqs. (9) and (10).
Finally, the weighted AEP and lifetime DELs can simply be added.
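One way to picture the weighting is sketched below. It is an illustration under stated assumptions, not the paper's equations: the weight of the transient state w_{2} is taken as the fraction of the year spent in transients (events per year times an assumed event duration), w_{1} = 1 − w_{2}, and the DELs are combined in the fatigue-range-sum domain (load raised to the power m) rather than by adding DEL values directly. All function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.0 * 24.0 * 3600.0


def state_weights(n_transients_per_year, transient_duration_s):
    """Return (w1, w2): time fractions in normal and transient operation."""
    w2 = n_transients_per_year * transient_duration_s / SECONDS_PER_YEAR
    if not 0.0 <= w2 <= 1.0:
        raise ValueError("transients cannot cover more than one year")
    return 1.0 - w2, w2


def combine_aep(aep_normal, aep_transient, w1, w2):
    """Weighted AEP; each input is the AEP if that state lasted all year."""
    return w1 * aep_normal + w2 * aep_transient


def combine_del(del_normal, del_transient, w1, w2, m):
    """Weighted DEL, added in the range-sum domain (assumption)."""
    return (w1 * del_normal ** m + w2 * del_transient ** m) ** (1.0 / m)


# Example: 500 transients per year, each lasting 250 s as in the simulations.
w1, w2 = state_weights(500, 250.0)
```

With these example numbers the turbine spends about 0.4 % of the year in transient states, so transients only matter for the total if their loads are much higher than in normal operation.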
In the following case study the suggested methodology is applied to an offshore wind farm to assess which conditions might be correlated with the component reliability of a main bearing. Main bearings support the rotor shaft, which transfers the aerodynamic torque from the rotor into the gearbox while reducing non-torque loads entering the gearbox (Calderon, 2015). With around USD 150 000 to 300 000 per failure (Dvorak, 2013), unplanned bearing replacement costs are a significant part of the total yearly O & M expenses, which can be approximately USD 645 000 for an offshore 5 MW turbine (Stehly and Beiter, 2020). Figure 4 illustrates the loads considered in this study, which are expected to have the highest impact on the main bearing.
The performance, fatigue loads and main bearing lifetime are estimated within the offshore wind farm and compared against the observed failure records. The data used in this study consist of a 5-year SCADA data set with a sampling rate of 10 min. The bearing type observed in this study is an SKF CARB toroidal roller bearing in non-locating position. The main bearing failure records are available from inspection and maintenance reports for the same period. Figure 5 shows the normalized failure rate of the main bearing, i.e., the frequency at which the main bearing has failed. Figure 6 illustrates the inverted time to failure (TTF), $1/\mathrm{TTF}$, where TTF is defined as the time between start of uptime and start of downtime of the main bearing. A higher inverted TTF therefore indicates earlier failures and shorter lifetimes of the main bearing.
3.1 Define variable input space and sample from predefined distributions
The variable space used for creating the simulation database in this analysis is generated following the approach described in Sect. 2.1. The wind speed is sampled from a uniform distribution ranging between 4 and 30 m s^{−1}, covering the power production range of the wind turbine. For each wind speed sample the remaining variables are drawn from a uniform distribution as well, with the selected boundaries as presented in Table 1. It should be noted, however, that the input variables can be sampled following any suitable distribution function without influencing the power and load estimations of the resulting model, as the sampling only influences the training process. The boundary functions of the wind speed standard deviation are based on the IEC class IA for offshore conditions and result in a range of 0.16 to 3.89 m s^{−1}. The wind shear boundaries are hard-coded based on Dimitrov et al. (2018). Regular waves are modeled with wind-speed-dependent deterministic functions for the significant wave height H_{s} and wave peak period T_{p}. However, the wind shear and wave conditions are not used as input variables for the surrogate model later on, since the database is simulated using a constant wind shear of 0.14 and the study only observes loads that are expected not to be influenced by waves. The boundaries for the wind–wave misalignment Δ are selected based on Van Vledder (2013). The selected boundaries of the water depth and row spacing are based on the wind farm layout. Studies have shown that a turbine does not seem to experience wake conditions for wind–wake angles outside a range of ±25° (Dimitrov, 2019; Frandsen, 2007). Finally, up to four upstream turbines are considered for generating multiple wake conditions, based on Dimitrov (2019) showing that including more wake sources does not have a significant effect on the resulting load estimations.
A 2000-point pseudo-Monte Carlo approach based on a low-discrepancy Halton sequence is used to generate the variable space. The resulting samples can be seen in Fig. 7.
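A low-discrepancy Halton sequence can be generated with a few lines of plain Python, as sketched below. The two bounds used here (wind speed of 4 to 30 m/s and wind–wake angle of ±25°) are the ones quoted in the text; the remaining variables of Table 1 are omitted, and the choice of prime bases 2 and 3 is an assumption of this sketch.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in a given base."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points, bases):
    """First n_points of the Halton sequence, one prime base per dimension."""
    return [[radical_inverse(i, b) for b in bases]
            for i in range(1, n_points + 1)]


# Bounds quoted in the text; other variables are left out of this sketch.
BOUNDS = {"u": (4.0, 30.0), "gamma": (-25.0, 25.0)}

unit_points = halton(2000, bases=(2, 3))
samples = [
    {name: lo + v * (hi - lo)
     for v, (name, (lo, hi)) in zip(point, BOUNDS.items())}
    for point in unit_points
]
```

Unlike independent random draws, consecutive Halton points fill the unit hypercube evenly, which is why a comparatively small sample can cover the variable space.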
3.2 Aeroelastic simulations for normal operation and transients
A total of 32 output channels are simulated using the aeroelastic tool HAWC2 (Larsen and Hansen, 2019; Madsen et al., 2020) with the NREL offshore 5 MW reference turbine on a jacket structure (Vorpahl et al., 2011). The simulation settings and turbine model are chosen in order to be representative of the actual wind farm. Turbulence is included with the help of so-called turbulence boxes, which are “random realizations of three-dimensional, stationary and homogeneous turbulent wind fields” (Dimitrov, 2019). Under exactly the same conditions, the simulated time series will differ from realization to realization due to this effect of the turbulence, which is called the seed-to-seed uncertainty. However, by using a large Monte Carlo sample as in this approach, the effect of seed-to-seed uncertainty is reduced (Dimitrov et al., 2018). For simulating the wake effects the dynamic wake meandering (DWM) model (Larsen et al., 2008) is used. It models the wake effects by generating three turbulence boxes for each simulation: the “ambient wind field over the rotor area” (Larsen et al., 2008) is introduced by a standard turbulence box on which the wake deficit, introduced by a micro-turbulence box, is superimposed (Larsen and Hansen, 2019). The relative position of these two turbulence boxes depends on the meandering of the wake, which is introduced by a large-scale turbulence field.
The simulations are carried out on each of the 2000 samples and repeated for three different yaw misalignments (−10, 0 and +10°) including from zero up to four wake sources, which results in a total of 30 000 simulations for each output channel. These time series simulations are carried out for 600 s for normal operation and 250 s for startup and shutdown operation. A total of 19 startup simulations are carried out according to the standard DLC 3.1 (IEC, 2019) for each wind speed ranging between 4 and 22 m s^{−1}. Higher wind speeds are not considered as the controller would trigger an emergency shutdown due to an exceedance of the maximum rotor speed. A total of 27 shutdown simulations are carried out according to DLC 4.1 (IEC, 2019) for wind speeds between 4 and 30 m s^{−1}.
Subsequently, the time series are post-processed in order to obtain the desired 10 min statistics, DELs and bearing lifetime. For calculating the DELs of the simulated loads the rainflow counting method (Matsuishi and Endo, 1968) is used with a Wöhler exponent of 4 for the tower top, 8 for the shaft and 10 for the blade root. In order to calculate the lifetime indicator of the main bearing, first the time series of the radial force on the main bearing is calculated as the resultant of the simulated lateral and vertical forces.
With the radial force F_{r} the equivalent dynamic force on the main bearing P_{d} is calculated using Eq. (3), and next the lifetime L_{10} is calculated using Eq. (2). A dynamic bearing rating of C=19 600 kN is used, which is the recommended value for the specific bearing type with the specific inner diameter and mass based on the SKF handbook on roller bearings (SKF, 2018). A factor of a_{1}=0.21 is used corresponding to a 99 % probability of surviving the estimated lifetime. The factor a_{2} refers to the bearing material and is set to 1 based on Harris (2001). Finally, the factor a_{3} representing the bearing condition, including lubrication and cleanness conditions amongst other things, is set to 1 since the necessary information is not available.
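The radial force time series mentioned above can be sketched as the magnitude of the two force components perpendicular to the shaft. The function and argument names are hypothetical, and the input values are made up for illustration.

```python
import math


def radial_force(f_lateral, f_vertical):
    """Radial force time series from lateral and vertical force components."""
    return [math.hypot(f_y, f_z) for f_y, f_z in zip(f_lateral, f_vertical)]


# Made-up force samples in kN; the sign of a component does not matter.
f_r = radial_force([3.0, -6.0, 0.0], [4.0, 8.0, -2.5])
```

Each value of the resulting series can then be passed on to the dynamic equivalent force and lifetime calculations.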
3.3 Train and validate surrogate model (ANN)
The surrogate model is calibrated for estimating 11 output variables S(X) as shown in Fig. 4. However, only estimations for the power, main bearing lifetime, torsional moment at the main bearing and blade-root flapwise bending moment are presented in this paper since the remaining loads show similar resulting patterns.
Various ANN architectures have been trained and evaluated on the test set. After hyperparameter tuning the most suitable settings as shown in Table 2 are selected. The data set of 30 000 samples is divided into a 90 % training, 5 % validation and 5 % testing set. Since the number of samples is relatively large, using other ratios for the train–test split did not affect the model performance. The model parameters are estimated with error backpropagation using the adaptive moment estimation (Adam) (Kingma and Ba, 2014) as an adaptive learning rate optimization algorithm for minimizing the cost function J(W, b). Instead of calculating the cost function for the complete data set, at each iteration a minibatch optimization is used in order to increase computational efficiency and to achieve a more robust convergence. Furthermore, a regularization factor is included in the parameter estimations to avoid overfitting to the training data.
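The data split and the mini-batch iteration can be sketched with the standard library alone. The 90 % / 5 % / 5 % ratios and the 30 000 samples come from the text; the random seed and the batch size of 512 are arbitrary assumptions of this sketch.

```python
import random


def split_indices(n_samples, fractions=(0.90, 0.05, 0.05), seed=0):
    """Shuffle sample indices and split them into train, validation and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def minibatches(indices, batch_size):
    """Yield consecutive chunks of indices; the last chunk may be smaller."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


train_idx, val_idx, test_idx = split_indices(30000)
batches = list(minibatches(train_idx, batch_size=512))
```

In each training epoch the optimizer would update the parameters once per batch instead of once per pass over all 27 000 training samples.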
The model performance is then evaluated by calculating the accuracy of the model predictions on the test set (see Table 2). Figures 8 and 9 show a one-to-one plot for the estimated power P and main bearing lifetime L_{10} on the test set against the simulation data from HAWC2.
3.4 Sitespecific estimations
In order to exclude outliers from the SCADA data, the OpenOA filtering toolkit developed at NREL (Optis et al., 2019) is applied. Figure 10 shows the probability of each wind direction sector that is obtained from the filtered freestream SCADA data.
For each wind direction sector a Weibull distribution is fit to the wind speed measurements, and a lognormal distribution is fit per wind speed bin to the wind speed standard deviation measurements. The wind–wave misalignment, which describes the difference between the wind direction and the wave direction of wind-generated waves, can depend on the wind speed and significant wave height (Van Vledder, 2013). However, since the bearing in the rotor is hardly affected by the wave conditions, the wind–wave misalignment is assumed to be normally distributed with a mean μ=0 and standard deviation σ=5 based on the distributions presented in Van Vledder (2013). These three site-specific input variables of the environmental conditions X_{amb} are generated using a 20 000-point pseudo-Monte Carlo simulation based on Sobol sequences following the approach described in Sect. 2.4. The final input samples for the surrogate model are shown in Fig. 11 for the wind speed, Fig. 12 for the wind speed standard deviation and Fig. 13 for the wind–wave misalignment.
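Drawing site-specific samples from the fitted distributions can be sketched with inverse CDFs. The normal distribution parameters (μ=0, σ=5) come from the text. The Weibull scale and shape in the example are hypothetical, and ordinary pseudo-random numbers are used here in place of the Sobol sequence of the study.

```python
import math
import random
from statistics import NormalDist


def weibull_inverse_cdf(q, scale, shape):
    """Wind speed at cumulative probability q for a two-parameter Weibull."""
    return scale * (-math.log(1.0 - q)) ** (1.0 / shape)


# Wind-wave misalignment distribution with the parameters quoted in the text.
MISALIGNMENT = NormalDist(mu=0.0, sigma=5.0)


def sample_site_conditions(n_samples, scale, shape, seed=0):
    """Draw (wind speed, misalignment) pairs for one wind direction sector."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Keep the uniforms strictly inside (0, 1) so both inverses are finite.
        q_u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        q_d = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        samples.append((weibull_inverse_cdf(q_u, scale, shape),
                        MISALIGNMENT.inv_cdf(q_d)))
    return samples


# Hypothetical Weibull parameters for a single sector.
site_samples = sample_site_conditions(20000, scale=10.0, shape=2.2)
```

The wind speed standard deviation would be drawn in the same way from the lognormal distribution of the matching wind speed bin.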
For summing up the model predictions of both normal and transient operation, the model predictions are weighted according to their probability of operational state. Figure 15 shows the probabilities of startup and shutdown events for an example turbine. The annual number of transients over the whole wind farm can be seen in Fig. 14.
3.5 Operation-state weighted sum
The final resulting probability-weighted outputs for the offshore wind farm for the AEP, main bearing lifetime, blade-root flapwise DEL and torsional bearing DEL are shown in Figs. 16 to 19. These results should be analyzed in comparison with Figs. 5 and 6, which show the actual failure maps over the wind farm.
For validation purposes, the surrogate model is used to estimate the power time series of each turbine for a time period of 1 year under normal operation and compared against the measured power from the SCADA system (see Fig. 20). The coefficient of determination R^{2} of the power predictions for the single turbines ranges between 0.89 and 0.93 (see Fig. 21). The power for the northernmost turbine could not be calculated since its measurement data were not available. The AEP is calculated for each turbine showing a relative error between the measured and the estimated normal behavior AEP between 0.1 % and 3.4 % (see Fig. 22). The mean relative error of the AEP estimation for all 29 turbines is 1.5 %.
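The two validation metrics used here can be sketched as follows. The function names are hypothetical and the numbers in the example are made up; they are not data from the study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination between measured and predicted values."""
    mean = sum(measured) / len(measured)
    ss_res = sum((y - f) ** 2 for y, f in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def relative_error_percent(measured, estimated):
    """Absolute relative error of an estimate in percent of the measurement."""
    return abs(estimated - measured) / measured * 100.0


# Made-up 10 min power values in kW for illustration.
measured_power = [1200.0, 2500.0, 4100.0, 5000.0]
predicted_power = [1300.0, 2400.0, 4000.0, 4900.0]
r2 = r_squared(measured_power, predicted_power)
```

An R^{2} of 1 means the predictions reproduce the measurements exactly, while a value of 0 means they explain no more variance than the mean of the measurements.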
The results show that the ANN is able to accurately model the simulated power, DEL and L_{10} with a coefficient of determination R^{2} between 0.95 and 0.99. The validation of the estimated time series against the measured 10 min SCADA statistics shows that the power is modeled with a mean prediction error of 1.5 % and an average R^{2} value of 0.91. The time series predictions show a consistent offset at rated power (see Fig. 20). A reason for the difference might be that a generic turbine model had to be used since the more accurate model by the turbine manufacturer was not available. Furthermore, higher uncertainty can be observed for the eastern turbines, i.e., the turbines which experience wake conditions more often (see Fig. 21).
The surrogate modeling approach discussed in this study includes several assumptions and uncertainties which are propagated to the final predictions. The uncertainties in the final model predictions depend on various factors, such as the defined variable space, the wake model used, the selection of environmental input parameters, the assumptions for modeling the wake effects in the surrogate model and the surrogate model performance. Investigating different model setups has shown that the results are sensitive to the way in which the wake is observed (i.e., the size of the wind direction sector) and to how the wake is defined through the input variables (i.e., considering upstream turbines, which results in the most conservative estimates). Despite these uncertainties and data limitations, the model is able to capture the relative differences in the power and fatigue load accumulation over the wind farm well.
The DEL predictions of the blade-root flapwise bending moment and the torsional bearing moment in Figs. 18 and 19 seem to increase when moving east within the farm. This is expected, as those turbines experience multiple wake conditions with the prevailing wind from the southwest. The model predictions in Fig. 16 show that the highest AEP is observed at turbines positioned at the outer border of the wind farm. This is also plausible because these turbines are more likely to experience free-stream conditions and therefore higher wind speeds than turbines positioned further inside the farm. Comparing the AEP map (Fig. 16) with the main bearing lifetime predictions (Fig. 17), it can be seen that these outer turbines with increased AEP are estimated to have a shorter main bearing lifetime. This indicates a possible correlation: turbines located at positions of higher expected AEP within a wind farm might be prone to experiencing earlier main bearing failures than the rest of the wind farm.
Although the lifetime L_{10} is a rather simplistic indicator and misses additional condition information (e.g., about the lubrication status), the lifetime estimations (Fig. 17) do not contradict the observed main bearing lifetime (Fig. 6): while turbines at the outer border of the wind farm are estimated to have a shorter main bearing lifetime, most turbines that were observed to have a premature failure already within the first 3 years of operation are positioned on the border as well, with only one exception. A comparison of the observed failure rates (Fig. 5) with the DEL estimations does not show any clear patterns or correlations, except that 2 out of 12 turbines with higher failure rates are positioned in the region of highest estimated blade-root DEL M_{x} and main bearing DEL M_{z}. Furthermore, there does not seem to be an obvious connection between the prevailing wind direction of around 240^{∘} (Fig. 10) and the failures.
The reference value by SKF for the required L_{10} lifetime of a wind turbine roller bearing ranges between 30 000 and 100 000 h of operation (SKF, 2018). This means that 10 % of a sufficiently large number of identical main bearings under identical conditions are expected to fail within the first 3.4 to 11.4 years of operation. Given that 40 % of the turbines of the studied wind farm have already experienced a main bearing failure by the sixth year of operation, the observations might indicate an unexpectedly high failure rate. However, when interpreting the results and drawing conclusions about possible correlations, it is also important to keep in mind the limitations of the model and data. Since the number of recorded failures is rather limited, it might not be representative of the underlying main bearing failure statistics. More observations are necessary in order to demonstrate a statistically significant difference in the averages of the main bearing lifetime (mean TTF) per turbine or subgroup of turbines. More failure data from the same wind farm as well as from other wind farms are therefore needed to validate and generalize the possible relationships. Furthermore, the observed main bearing failures are not necessarily fatigue-induced and might have been caused by other factors that are not included in the analysis (e.g., faults during the manufacturing process). Finally, the case study shows model estimations for a limited number of operational states, i.e., normal operation and startup and shutdown behavior. Other operational states or wind conditions could have an impact on the main bearing reliability (e.g., parking, curtailment, wake steering, wind gusts, faults, emergency shutdown).
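The conversion of the L_{10} reference range from operating hours to years can be reproduced with a short calculation. It assumes continuous operation with 8760 h per year (leap years ignored); the helper name `hours_to_years` is introduced only for this illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, leap years ignored


def hours_to_years(hours):
    """Convert operating hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR


lower_years = round(hours_to_years(30_000), 1)
upper_years = round(hours_to_years(100_000), 1)
print(lower_years, upper_years)  # 3.4 11.4
```

If a turbine operates for fewer hours per year, the same number of operating hours corresponds to a longer calendar time, so the range above is a lower bound in calendar years.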
This study presents a procedure that makes it possible to correlate performance and loading conditions within a wind farm with its component reliability in a computationally efficient way. It can be used for assessing the health state of turbines in a wind farm and for getting a better understanding and definition of how fatigue loading can lead to failures. In the demonstration on an offshore wind farm with the focus on observed main bearing failures, the following was found:

The ANN is able to predict the electrical power, blade-root flapwise DEL, torsional bearing DEL and main bearing lifetime accurately with an R^{2} value higher than 0.95 compared to the simulated values.

The validation of the estimated power time series against the 10 min SCADA power signals shows that the surrogate model is able to capture the power performance relatively well with a 1.5 % average error in the AEP prediction.

Turbines at the border of the wind farm are estimated to have a shorter bearing lifetime. These estimations do not contradict the observed bearing lifetime from inspection and maintenance reports.

A clear connection between the load estimations and failure observations could not be confirmed.

Future work can expand the case study to more operating states which could affect the bearing reliability, such as parking conditions. Also, valuable insights can be gained by including other types of data sources, e.g., SCADA alarms.
Finally, the analysis stresses that more failure data are needed in order to validate and generalize the suggested approach and its associated findings.
The HAWC2 simulation database used for training a surrogate model is available at https://doi.org/10.11583/DTU.12245978 (Schröder, 2020). It contains the turbine model, HAWC2 input files, as well as the postprocessed simulation results.
LS carried out the aeroelastic simulations, SCADA data processing, training of the surrogate model and wind farm estimations, and wrote the paper. NKD participated in the conceptual development, contributed elements of the programming code and provided critical review. DRV provided critical review and support for carrying out the aeroelastic simulations.
The authors declare that they have no conflict of interest.
We would like to thank Vattenfall for the close collaboration and for sharing the operational data for this study.
This paper was edited by Athanasios Kolios and reviewed by two anonymous referees.
Bangalore, P. and Patriksson, M.: Analysis of SCADA data for early fault detection, with application to the maintenance management of wind turbines, Renew. Energ., 115, 521–532, 2018.
Calderon, J. F. G.: Electromechanical drivetrain simulation, DTU Wind Energy, Roskilde, Denmark, 2015.
Colone, L., Natarajan, A., and Dimitrov, N.: Impact of turbulence induced loads and wave kinematic models on fatigue reliability estimates of offshore wind turbine monopiles, Ocean Eng., 155, 295–309, 2018.
Dimitrov, N.: Surrogate models for parameterized representation of wake-induced loads in wind farms, Wind Energy, 22, 1371–1389, https://doi.org/10.1002/we.2362, 2019.
Dimitrov, N. and Natarajan, A.: From SCADA to lifetime assessment and performance optimization: how to use models and machine learning to extract useful insights from limited data, J. Phys. Conf. Ser., 1222, 012032, https://doi.org/10.1088/1742-6596/1222/1/012032, 2019.
Dimitrov, N., Kelly, M. C., Vignaroli, A., and Berg, J.: From wind to loads: wind turbine site-specific load estimation with surrogate models trained on high-fidelity load databases, Wind Energ. Sci., 3, 767–790, https://doi.org/10.5194/wes-3-767-2018, 2018.
Dinwoodie, I., Quail, F., and McMillan, D.: Analysis of offshore wind turbine operation and maintenance using a novel time domain meteo-ocean modeling approach, in: ASME Turbo Expo 2012: Turbine Technical Conference and Exposition, American Society of Mechanical Engineers, 11–15 June 2012, Copenhagen, Denmark, 847–857, 2012.
Dvorak, P.: Establishing failure modes for bearings in wind turbines, Tech. rep., Windpower Engineering and Development, https://www.windpowerengineering.com/establishingfailuremodesforbearingsinwindturbines/ (last access: 9 August 2020), 2013.
Frandsen, S. T.: Turbulence and turbulence-generated structural loading in wind turbine clusters, report number: Risø-R No. 1188(EN), DTU Wind Energy, Roskilde, Denmark, ISBN 8755034586, 2007.
Galinos, C., Dimitrov, N., Larsen, T. J., Natarajan, A., and Hansen, K. S.: Mapping Wind Farm Loads and Power Production – A Case Study on Horns Rev 1, J. Phys. Conf. Ser., 753, 032010, https://doi.org/10.1088/1742-6596/753/3/032010, 2016.
Gonzalez, E., Reder, M., and Melero, J. J.: SCADA alarms processing for wind turbine component failure detection, J. Phys. Conf. Ser., 753, 072019, https://doi.org/10.1088/1742-6596/753/7/072019, 2016.
Goodfellow, I., Bengio, Y., and Courville, A.: Deep learning, MIT Press, Cambridge, MA, 2016.
Hahn, B., Durstewitz, M., and Rohrig, K.: Reliability of wind turbines, experiences of 15 years with 1,500 WTs, in: Wind Energy, Proceedings of the Euromech Colloquium, Springer, Berlin, Heidelberg, Germany, 329–332, 2007.
Harris, T. A.: Rolling bearing analysis, John Wiley & Sons, New York, NY, ISBN 9780471354574, 2001.
Huang, H. and Chiang, C.: Reliability worth assessment of distribution system with large wind farm considering wake effect, in: 2006 IEEE Power India Conference, IEEE, 10–12 April 2006, New Delhi, India, 366–370, 2006.
IEC: 61400-1 Ed. 3, Wind Turbines, Part 1: Design Requirements, Tech. rep., International Electrotechnical Commission, Geneva, 2019.
Johannessen, K., Meling, T. S., Hayer, S., et al.: Joint distribution for wind and waves in the northern north sea, in: The Eleventh International Offshore and Polar Engineering Conference, International Society of Offshore and Polar Engineers, Vol. 12, 17–22 June 2001, Stavanger, Norway, ISSN 10535381, 2001.
Kim, H., Singh, C., and Sprintson, A.: Simulation and estimation of reliability in a wind farm considering the wake effect, IEEE T. Sustain. Energ., 3, 274–282, 2012.
Kingma, D. P. and Ba, J.: Adam: A method for stochastic optimization, preprint: arXiv, https://arxiv.org/abs/1412.6980 (last access: 9 August 2020), 2014.
Larsen, G. C., Madsen, H. A., Thomsen, K., and Larsen, T. J.: Wake meandering: a pragmatic approach, Wind Energy, 11, 377–395, 2008.
Larsen, T. J. and Hansen, A. M.: How 2 HAWC2, the user's manual, Target, 2, 2, Risø-R-1597 (ver. 12.7), 2019.
Madsen, H. A., Larsen, T. J., Pirrung, G. R., Li, A., and Zahle, F.: Implementation of the blade element momentum model on a polar grid and its aeroelastic load impact, Wind Energ. Sci., 5, 1–27, https://doi.org/10.5194/wes-5-1-2020, 2020.
Matsuishi, M. and Endo, T.: Fatigue of metals subjected to varying stress, Japan Society of Mechanical Engineers, Fukuoka, Japan, 37–40, 1968.
May, A., McMillan, D., and Thöns, S.: Economic analysis of condition monitoring systems for offshore wind turbine subsystems, IET Renew. Power Gen., 9, 900–907, 2015.
Mudholkar, G. S. and Srivastava, D. K.: Exponentiated Weibull family for analyzing bathtub failure-rate data, IEEE T. Reliab., 42, 299–302, 1993.
Müller, K., Dazer, M., and Cheng, P. W.: Damage assessment of floating offshore wind turbines using response surface modeling, Enrgy. Proced., 137, 119–133, 2017.
NTN: Ball and roller bearings, NTN corporation, available at: http://www.ntnamericas.com/en/website/documents/brochuresandliterature/catalogs/ntn_2202ixe.pdf (last access: 9 August 2020), 2009.
Optis, M., Perr-Sauer, J., Philips, C., Craig, A. E., Lee, J. C. Y., Kemper, T., Sheng, S., Simley, E., Williams, L., Lunacek, M., Meissner, J., and Fields, M. J.: OpenOA: An Open-Source Code Base for Operational Analysis of Wind Power Plants, Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2019-12, 2019.
Reder, M. and Melero, J.: A Bayesian Approach for Predicting Wind Turbine Failures based on Meteorological Conditions, J. Phys. Conf. Ser., 1037, 062003, https://doi.org/10.1088/1742-6596/1037/6/062003, 2018.
Reder, M. and Melero, J. J.: Assessing wind speed effects on wind turbine reliability, Wind Europe Summit, 27–29 September 2016, Hamburg, Germany, 2016.
Rosenblatt, M.: Remarks on a multivariate transformation, Ann. Math. Stat., 23, 470–472, 1952.
Schröder, L.: HAWC2 simulations for creating a wind farm surrogate model of a 5 MW offshore wind turbine, https://doi.org/10.11583/DTU.12245978, 2020.
Schröder, L., Dimitrov, N. K., Verelst, D. R., and Sørensen, J. A.: Wind turbine site-specific load estimation using artificial neural networks calibrated by means of high-fidelity load simulations, J. Phys. Conf. Ser., 1037, 062027, https://doi.org/10.1088/1742-6596/1037/6/062027, 2018.
Scott, K., Infield, D., Barltrop, N., Coultate, J., and Shahaj, A.: Effects of extreme and transient loads on wind turbine drive trains, in: 50th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, 9–12 January 2012, Nashville, Tennessee, https://doi.org/10.2514/6.2012-1293, 2012.
SKF: Rolling bearing catalogue, Tech. rep., SKF Group, available at: https://www.skf.com/group/products/rollingbearings/erratapages/rbc17000 (last access: 10 August 2020), 2018.
Stehly, T. J. and Beiter, P. C.: 2018 Cost of Wind Energy Review, Tech. rep., National Renewable Energy Laboratory (NREL), Golden, CO, USA, 2020.
Tavner, P., Edwards, C., Brinkman, A., and Spinato, F.: Influence of wind speed on wind turbine reliability, Wind Eng., 30, 55–72, 2006.
Teixeira, R., O'Connor, A., Nogal, M., Krishnan, N., and Nichols, J.: Analysis of the design of experiments of offshore wind turbine fatigue reliability design with Kriging surfaces, Procedia Struct. Integr., 5, 951–958, 2017.
Toft, H. S., Svenningsen, L., Moser, W., Sørensen, J. D., and Thøgersen, M. L.: Assessment of wind turbine structural integrity using response surface methodology, Eng. Struct., 106, 471–483, 2016.
Van Bussel, G. and Zaaijer, M.: Reliability, availability and maintenance aspects of large-scale offshore wind farms, a concepts study, in: Vol. 113, Proceedings of MAREC, Marine Renewable Energies Conference (MAREC), Newcastle, UK, 119–126, ISBN 1902536436, 2001.
Van Vledder, G. P.: On wind-wave misalignment, directional spreading and wave loads, in: ASME 2013 32nd International Conference on Ocean, Offshore and Arctic Engineering, V005T06A087–V005T06A087, American Society of Mechanical Engineers, 9–14 June 2013, Nantes, France, 2013.
Vorpahl, F., Popko, W., and Kaufer, D.: Description of a basic model of the “UpWind reference jacket” for code comparison in the OC4 project under IEA Wind Annex XXX, Technical report, Fraunhofer Institute for Wind Energy and Energy System Technology (IWES), Bremerhaven, Germany, 2011.