Development of a comprehensive data basis of scattering environmental conditions and simulation constraints for offshore wind turbines

For the design and optimisation of offshore wind turbines, knowledge of realistic environmental conditions and the use of well-founded simulation constraints are very important, as both influence the structural behaviour and power output in numerical simulations. However, real high-quality data, especially for research purposes, is scarcely available. This is why, in this work, a comprehensive data basis of thirteen environmental conditions at wind turbine locations in the North and Baltic Sea is derived using data of the FINO research platforms. For simulation constraints, like the simulation length and start-up time, well-founded recommendations in literature are also rare. Nevertheless, it is known that the choice of simulation lengths and start-up times fundamentally affects the quality and computing time of simulations. For this reason, studies of convergence for both parameters are conducted to determine adequate values depending on the type of substructure, the wind speed and the considered loading (fatigue or ultimate). As the main purpose of both the data basis and the simulation constraints is to provide realistic data for probabilistic design approaches and to serve as guidance for further studies in order to enable more realistic and accurate simulations, all results are freely available and easy to apply.


Introduction
Although the share of offshore wind energy in overall energy production has been steadily growing over the last years, the cost of offshore wind energy is still high compared to other renewable energies (Kost et al., 2013). In order to achieve potential cost reductions of about 30 % in the next ten years (Prognos AG and Fichtner, 2013), a realistic and accurate simulation of offshore wind turbines and their substructures is beneficial. On the one hand, for realistic simulations, the knowledge of scattering environmental conditions is a central point. In this context, scattering conditions are non-constant parameters that exhibit stochastic variations and aleatoric uncertainties, and therefore, should be modelled as statistically distributed. On the other hand, carefully chosen simulation constraints, like the simulation length or the time of initial transients, are essential to obtain accurate results.
Here, the simulation length is defined as the usable time for the post-processing. The time of initial transients is the time that is removed from each simulation to exclude initial transients resulting from starting a calculation with a set of initial turbine conditions (like rotor speed). Simulation length plus initial transient time make up the overall length.
Regarding the first point, current guidelines (IEC, 2009) already define that simulations should mirror the changing environmental conditions at the precise site of a wind turbine. However, for academic research, real site data is rarely available, and even for industrial purposes, data quality might be poor for some parameters or long-term data is missing. As a result, various research projects characterised environmental conditions at specific sites or entire areas, and published statistical distributions as a reference. Probably the most frequently used example is the UPWIND design basis. Further examples are the work of Stewart et al. (2015), the PSA-OWT project (Hansen et al., 2015), and the investigations by Häfele et al. (2017). All these reference conditions have some limitations. The design basis of Stewart et al. (2015) is only for deep water sites off the coasts of the United States of America. The wave state of deep water sites is not comparable to shallow water conditions in the North Sea, as significant wave heights generally increase with the water depth (Hansen et al., 2015). Additionally, wind speeds are not measured at hub height, and therefore, have to be extrapolated, which increases uncertainties. For the UPWIND design basis, the wind speed is likewise given only at a reference height of 10 m and not at hub height. Furthermore, no statistical distributions for conditional parameters (e.g. the wave height $H_s$ depends on the wind speed $v_s$) are given, but only scatter plots. In the PSA-OWT project, data of the research platform FINO1 in the North Sea is used. Here, the wind speed is measured at hub height, but shadow effects can occur if sensors are positioned behind the measuring mast. Häfele et al. (2017) use data of the research platform FINO3, which has several sensors at each height to reduce shadow effects. However, only five environmental parameters (wind speed and direction, wave height, period and direction) are analysed, and the data period is only five years. Hence, the need for a comprehensive data basis, covering several sites and the most important parameters, becomes obvious in order to enable future research that is based on realistic data. Missing conditions are, for example, the turbulence intensity, the wind shear or ocean currents.
As to the second point, simulation constraints are frequently chosen based on experience, literature values or recommendations in current standards. However, considering the simulation length and time of initial transients, recommendations in the guidelines are mostly fairly vague (GL, 2012; IEC, 2009). Simulation lengths of 10 minutes for fatigue calculations (FLS), and one hour or less for ultimate loads (ULS) are frequently recommended. For the initial transients, it is advised to discard 5 seconds or more. Literature values partly differ significantly. To reduce the effects of initial transients, the first 20, 30 or 60 seconds are discarded, for example (Vemula et al., 2010; Jonkman and Musial, 2010; Hübler et al., 2017), and simulation lengths of 10 minutes and one hour are common practice (Jonkman and Musial, 2010; Popko et al., 2012; Cheng, 2002). However, longer simulation lengths are partly used as well, especially in the oil and gas industry or for floating substructures (DNV, 2013). Still, all these recommendations are not underpinned with detailed analyses. For floating offshore wind turbines, such investigations were conducted for the simulation length by Stewart et al. (2015), Stewart et al. (2013) and Haid et al. (2013). It is shown that simulation lengths of 10 minutes are sufficient for ULS and FLS loads. The observation that ULS and FLS loads tend to be higher for longer simulations is not due to physical reasons, but to unclosed cycles in the Rainflow counting for the FLS case and to the averaging technique in case of ULS loads. Both can be handled by adapting the algorithms.

Concerning the time of initial transients, Haid et al. (2013) recommend 60 seconds and the utilisation of initial conditions. This recommendation is based on an analysis which has not been further specified. For a jacket foundation, Zwick and Muskulus (2015) conducted a study investigating lengths of simulations and initial transients and also concluded that 10 minutes are sufficient, as long as 10-minute time series are merged before the Rainflow counting is applied. The required time of initial transients is determined by checking whether the rotor speed has reached a steady state. However, initial conditions are not applied, and a steady rotor speed does not guarantee that all transients are damped out. Therefore, the need for well-founded guidance on simulation lengths and times of initial transients for bottom fixed substructures becomes clear. For the simulation length, useful preliminary work is available, but it is limited to jacket substructures. Concerning initial transients, extensive studies are rare, and do not concentrate on the convergence of the relevant loads (FLS and ULS). Furthermore, scattering environmental conditions are not taken into account. This is a simplification, especially in case of the initial transients, as this variation might lead to more pronounced resonance effects (e.g. rarely occurring low wave peak periods that are close to the natural period of the structure; cf. Sect. 2.4) and therefore to more pronounced initial transients.
After all, the listed shortcomings in state-of-the-art modelling assistance motivated the current work, which focuses on the following aspects:
(1) Derive an open access data basis for various scattering environmental conditions at different sites to enable more realistic modelling.
(2) Give well-founded guidance on simulation length requirements and the time needed to exclude initial transients, when these realistic conditions are applied, to improve the accuracy of numerical simulations.
In order to address these topics, firstly, a data basis for all significant environmental conditions is derived from real data of the FINO research platforms. In this work, the data source is introduced, the analysis is described, and the resulting distributions and some interesting findings are presented. Secondly, required simulation lengths and times of initial transients are determined. For this purpose, the probabilistic simulation approach and the simulation model are explained. Then, studies of convergence are conducted for the simulation length and the time of initial transients. A monopile and a jacket substructure, FLS and ULS loads, and different wind speeds are considered. Recommendations are summarised. Lastly, the benefits and limitations of the current approach are discussed, and a conclusion is drawn.

Comprehensive data basis

Raw data
Environmental conditions can vary significantly among various turbine sites. As these conditions affect loads, and therefore, the design of offshore wind turbines, precise data of the specific turbine location is valuable. Real site data is scarce, which is the reason for the formerly mentioned reference data bases (Hansen et al., 2015; Stewart et al., 2015; Häfele et al., 2017). These data bases define conditional, statistical distributions for some of the most important environmental conditions: wind speed and direction, wave height, direction and peak period. However, other conditions are fixed for each wind speed or are set completely constant. The states of the frequently used UPWIND design basis are summarised in Table 1 as an example.
In this study, scattering conditions are derived directly from offshore measurement data. The raw data is taken from the three FINO platforms, and conditional distributions for the following 13 environmental parameters are determined: wind speed and direction, wave height, peak period and direction, turbulence intensity, wind shear exponent, speed and direction of the sub- [...]

Table 1. Environmental conditions (wind speed $v_s$, significant wave height $H_s$, wave peak period $T_p$, and turbulence intensity TI) of the K13 shallow water site (UPWIND design basis). The wind shear exponent is $\alpha = 0.14$, and wind and wave directions are usually set to zero, but scatter plots are available.

[...] predictions. Data of incomplete years is not taken into account in order not to introduce bias due to seasonal effects.

In this work, raw data of the FINO measurement masts is used to set up a data base for correlated, scattering environmental conditions. As the post-processing of raw data is time-consuming and need not be repeated each time environmental conditions are used, conditional probability distributions (i.e. $P(Y = y \mid X = x)$ with $X$ being the independent random variable, $Y$ the dependent one, and $P$ the probability function) for environmental conditions are derived to make the data base easy to use. Firstly, post-processing is carried out to identify sensor failures (missing data) and measurement failures (outliers). Missing data is not interpolated, but left out, in order not to introduce any bias. As sufficient data of proper signal quality is available (e.g. more than 350 000 data points for the wind speed even for FINO3), this approach is practicable. Wind speed data is synchronised with the wind direction data. This enables a selection of the anemometer in front of the mast for FINO3. For FINO1 and 2, wind speed values are discarded if the jib is located directly in the tower shadow. The turbulence intensity (TI) can be computed as the quotient of the standard deviation of the wind speed in a 10-minute interval ($\sigma_v$) and the mean wind speed in this interval ($v_s$) according to Eq. (1):

$$\mathrm{TI} = \frac{\sigma_v}{v_s} \tag{1}$$

For the wind shear, Eq. (2) applies according to the standard IEC 61400-1 (2005):

$$v_s(z) = v_s(z_0) \left(\frac{z}{z_0}\right)^{\alpha} \tag{2}$$

where $z$ is the height above mean sea level, $z_0$ is a reference height, $v_s(z)$ and $v_s(z_0)$ are wind speeds at the specified heights and $\alpha$ is the wind shear exponent. At the FINO platforms, the wind speed is measured at eight different heights. Therefore, it is possible to determine the wind shear exponent for every 10-minute interval by assuming $z_0 = 90$ m and applying a nonlinear regression. The air density can be calculated using the ideal gas law in Eq. (3) and the measurements of humidity ($\varphi$), air pressure ($p_{humid}$), and temperature in degree Celsius ($T_{air}$):

$$\rho_{air} = \frac{p_{humid}}{R_{humid} \left(T_{air} + 273.15\ \mathrm{K}\right)} \tag{3}$$

As humid air can be regarded as a mixture of ideal gases, the following equation applies for $R_{humid}$:

$$R_{humid} = \frac{R_{dry}}{1 - \varphi\,\dfrac{p_{sat}}{p_{humid}} \left(1 - \dfrac{R_{dry}}{R_{vapour}}\right)} \tag{4}$$

where $R_{dry} = 287.1$ J kg$^{-1}$ K$^{-1}$ is the specific gas constant for dry air, $R_{vapour} = 461.5$ J kg$^{-1}$ K$^{-1}$ that for water vapour, and $p_{sat}$ is the saturation vapour pressure that can, for example, be calculated using the August-Roche-Magnus formula:

$$p_{sat} = 6.1094\ \mathrm{hPa} \times \exp\left(\frac{17.625\,T_{air}}{T_{air} + 243.04}\right) \tag{5}$$

For the water density, a semi-analytical approach by Millero and Poisson (1981) of the following form is applied:

$$\rho_{water} = A(T_{water}) + B(T_{water})\,S + C(T_{water})\,S^{3/2} + D\,S^{2} \tag{6}$$

where $S$ is the salinity, $T_{water}$ is the water temperature at the surface, $A$, $B$ and $C$ are polynomial functions of the water temperature and $D$ is a constant. As constant salinity is assumed, the water density is a function of the water temperature. For all wave parameters, three-hour mean values are calculated, as wave conditions stay stationary for a duration of about three hours (GL, 2012). For the speeds and directions of sub- and near-surface currents, measured current values ($v_m$ and $\theta_m$) have to be converted in order to separate sub- and near-surface components. According to, for example, IEC (2009), the following two equations apply for sub- and near-surface currents, respectively:

$$v_{SS}(z) = v_{SS,0} \left(\frac{d - z}{d}\right)^{1/7} \tag{7}$$

$$v_{NS}(z) = v_{NS,0} \left(1 - \frac{z}{20\ \mathrm{m}}\right) \quad \text{for } 0 \le z \le 20\ \mathrm{m} \tag{8}$$

Here, $v_{SS}(z)$ and $v_{NS}(z)$ are the sub- and near-surface current speeds at a position $z$ below the water surface (counted positive downwards), and $d$ is the water depth. For reasons of clarity, the following notation is introduced: $v_{SS}(z) = v_{SS,z}$. The velocity profiles are shown in Fig. 2.
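As a hedged illustration of the atmospheric post-processing described above (turbulence intensity as a quotient, power-law wind profile, humid-air gas constant and Magnus formula), the following Python sketch evaluates these quantities for one 10-minute interval. The function names, the one-parameter Gauss-Newton fit, the interpolation of $v_s(z_0)$, the sample standard deviation (`ddof=1`) and the conversion of $T_{air}$ to kelvin are assumptions made for illustration; the text itself only states that a nonlinear regression with $z_0 = 90$ m is applied.

```python
import math
import numpy as np

R_DRY = 287.1     # J/(kg K), specific gas constant of dry air (value from the text)
R_VAPOUR = 461.5  # J/(kg K), specific gas constant of water vapour (value from the text)


def turbulence_intensity(v):
    """Standard deviation divided by mean of the wind speed samples of one interval."""
    v = np.asarray(v, dtype=float)
    return v.std(ddof=1) / v.mean()  # ddof=1 is an assumption


def fit_shear_exponent(z, v, z0=90.0, iterations=50):
    """Fit alpha in v(z) = v(z0) * (z / z0)**alpha by Gauss-Newton iteration.

    z must be sorted in increasing order; v(z0) is interpolated from the data.
    """
    z = np.asarray(z, dtype=float)
    v = np.asarray(v, dtype=float)
    v0 = float(np.interp(z0, z, v))
    x = np.log(z / z0)
    alpha = 0.1  # starting value (assumption)
    for _ in range(iterations):
        model = v0 * np.exp(alpha * x)
        jac = model * x  # derivative of the model with respect to alpha
        denom = float(jac @ jac)
        if denom == 0.0:
            break
        step = float(jac @ (v - model)) / denom
        alpha += step
        if abs(step) < 1e-12:
            break
    return alpha


def saturation_pressure_hpa(t_air_c):
    """August-Roche-Magnus formula, temperature in degree Celsius, result in hPa."""
    return 6.1094 * math.exp(17.625 * t_air_c / (t_air_c + 243.04))


def air_density(rel_humidity, p_humid_pa, t_air_c):
    """Humid-air density in kg/m^3; rel_humidity in 0..1, pressure in Pa."""
    p_sat_pa = 100.0 * saturation_pressure_hpa(t_air_c)
    r_humid = R_DRY / (
        1.0 - rel_humidity * p_sat_pa / p_humid_pa * (1.0 - R_DRY / R_VAPOUR)
    )
    return p_humid_pa / (r_humid * (t_air_c + 273.15))
```

Fixing $v_s(z_0)$ and fitting only $\alpha$ is one possible reading of "assuming $z_0 = 90$ m"; a two-parameter fit would be an equally plausible alternative.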
Obviously, the near-surface current does not exist below a reference depth of 20 m. Hence, it is possible to use measurement data of a depth of 20 m (or more) to directly get the sub-surface direction ($\theta_{SS,20} = \theta_{m,20}$) and to calculate the speed, for example $v_{SS,0} = v_{m,20} \left(d / (d - 20\ \mathrm{m})\right)^{1/7}$. For the near-surface current, measurements close to the surface (e.g. $v_{m,2}$) can be used. However, these measurements include sub- and near-surface components, as shown in Fig. 3. Therefore, the sub-surface component at 2 m has to be calculated using Eq. (7), and the sub-surface direction is assumed to be constant over depth ($\theta_{SS,20} = \theta_{SS,2} = \theta_{SS,0}$). Then, trigonometrical relationships can be applied to calculate the near-surface current at 2 m, i.e. the sub-surface component is subtracted vectorially from the measured current. Lastly, the reference near-surface current is given by $v_{NS,0} = v_{NS,2} / (1 - 2\ \mathrm{m} / 20\ \mathrm{m})$. A depth-independent near-surface direction is assumed, and therefore, $\theta_{NS,0} = \theta_{NS,2}$.
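The separation of sub- and near-surface currents can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes depths counted positive downwards, directions in degrees, a 1/7 power law for the sub-surface profile and a linear near-surface profile vanishing at 20 m, and it realises the "trigonometrical relationships" as a plain vector subtraction. The function name and argument names are hypothetical.

```python
import math


def decompose_current(v_m20, theta_m20, v_m2, theta_m2, depth, ref_depth=20.0):
    """Split measured currents into sub-surface and near-surface reference values.

    Speeds in m/s, directions in degrees, depths in m (positive downwards).
    depth must exceed ref_depth. Returns (v_ss0, theta_ss, v_ns0, theta_ns).
    """
    # Sub-surface current: the measurement at 20 m contains no near-surface part.
    v_ss0 = v_m20 * (depth / (depth - ref_depth)) ** (1.0 / 7.0)
    theta_ss = theta_m20  # assumed constant over depth
    v_ss2 = v_ss0 * ((depth - 2.0) / depth) ** (1.0 / 7.0)

    # Near-surface current at 2 m: measured vector minus sub-surface vector.
    dx = (v_m2 * math.cos(math.radians(theta_m2))
          - v_ss2 * math.cos(math.radians(theta_ss)))
    dy = (v_m2 * math.sin(math.radians(theta_m2))
          - v_ss2 * math.sin(math.radians(theta_ss)))
    v_ns2 = math.hypot(dx, dy)
    theta_ns = math.degrees(math.atan2(dy, dx)) % 360.0

    # Linear near-surface profile: scale the 2 m value back to the surface.
    v_ns0 = v_ns2 / (1.0 - 2.0 / ref_depth)
    return v_ss0, theta_ss, v_ns0, theta_ns
```

Because only differences of vectors are used, the result does not depend on whether directions are measured from north or from east, as long as one convention is used throughout.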
After having post-processed the measurement raw data, maximum likelihood estimations are applied to the processed data of the regarded 13 environmental conditions in order to fit several statistical distributions. In addition to unimodal distributions, and if several distinct peaks are distinguishable, multimodal distributions are fitted as well, as it is assumed that the peaks are due to physical phenomena. However, as multimodal approaches have more degrees of freedom, they always fit the data better, even in case of a physically unimodal shape. Therefore, they have to be chosen with care in order not to fit physically unimodal distributions with multimodal approaches.
Considering the example of wind speed and wave height, it is self-evident that some environmental parameters are conditioned by others, and dependencies have to be defined. For example, the case of a calm sea during a storm is very unlikely. Analysing scatter plots of the environmental inputs and taking a literature review into account, the dependencies in Table 2 are defined, although it is possible to define them differently (cf. Stewart, 2016), as mainly the correlation is significant, and the determination of cause and effect is secondary.
One of the most common ways to include dependencies in statistical distributions is to split up the data of the dependent parameters into several bins of the independent parameters (e.g. Stewart, 2016; Johannessen et al., 2002; Li et al., 2015). To illustrate this approach, the wave peak period is, for example, fitted in several bins of 0.5 m wave height (e.g. $P(T_p) = P(T_p \mid 1.5\ \mathrm{m} \le H_s < 2\ \mathrm{m})$). The bin widths for the dependent parameters are summarised in Table 2 as well. For highly correlated parameters, an alternative to the binning procedure is to model only the deviation between the parameters. Here, the direction of the near-surface current, which is highly dependent on the wind direction, is an example. Therefore, the deviation $\Delta_{NS}$ between the two directions is modelled according to Table 2, and the near-surface current direction follows from the wind direction and this deviation. Visual inspections and objective criteria using Kolmogorov-Smirnov tests (KS tests) and chi-squared tests ($\chi^2$ tests) are used to [...]
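The binning procedure can be illustrated with a short Python sketch that fits one distribution of the wave peak period per 0.5 m bin of the significant wave height and reports a Kolmogorov-Smirnov statistic for each fit. The choice of a log-normal distribution, the minimum sample count per bin and all function names are assumptions for illustration only; the distributions actually selected for each parameter are those given in the supplementary material.

```python
import math
import numpy as np


def fit_lognormal_mle(x):
    """Maximum likelihood estimates (mu, sigma) of a log-normal distribution."""
    logx = np.log(np.asarray(x, dtype=float))
    return float(logx.mean()), float(logx.std(ddof=0))  # MLE uses 1/n


def ks_statistic(x, mu, sigma):
    """Largest distance between the empirical and the fitted log-normal CDF."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    cdf = np.array([
        0.5 * (1.0 + math.erf((math.log(v) - mu) / (sigma * math.sqrt(2.0))))
        for v in xs
    ])
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return float(max(d_plus, d_minus))


def fit_conditional(hs, tp, bin_width=0.5, min_count=30):
    """Fit P(T_p | H_s in bin) for every sufficiently populated H_s bin.

    Returns {(lower, upper): (mu, sigma, ks)} with bin edges in metres.
    """
    hs = np.asarray(hs, dtype=float)
    tp = np.asarray(tp, dtype=float)
    idx = np.floor(hs / bin_width).astype(int)
    fits = {}
    for i in np.unique(idx):
        sample = tp[idx == i]
        if sample.size < min_count:  # skip sparsely populated bins (assumption)
            continue
        mu, sigma = fit_lognormal_mle(sample)
        key = (float(i * bin_width), float((i + 1) * bin_width))
        fits[key] = (mu, sigma, ks_statistic(sample, mu, sigma))
    return fits
```

A call such as `fit_conditional(hs, tp)[(1.5, 2.0)]` then returns the parameters for the bin $1.5\ \mathrm{m} \le H_s < 2\ \mathrm{m}$ used as an example in the text.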

Resulting distributions
In order to establish a full data basis, statistical distributions and their parameters have to be provided for all thirteen environmental conditions, the three sites and all bins (if necessary). Furthermore, for non-parametric distributions, the underlying data is needed. The main ideas are explained here. However, due to the comprehensiveness of the data, detailed and additional information is provided in an easily applicable form in the supplementary material. At this point, only two examples are shown in Figs. 4 and 5.

Special findings
In this section, some noteworthy findings of this data basis, mainly resulting from the consideration of scattering, are pointed out. Three examples are presented: the importance of wave peak periods, the high scattering of wind shear exponents, and the behaviour of the turbulence intensity.
Wave loads are of particular importance if the wave frequency is close to the first natural frequency of the structure. Standard offshore wind turbines have first bending frequencies of about 0.25 to 0.3 Hz (Jonkman and Musial, 2010; Popko et al., 2012), corresponding to eigenperiods of about 3.3 to 4 s. If state-of-the-art data bases are used (cf. Table 1), there will be no resonance. However, real data suggests that resonance effects are problematic even for higher wind speeds, as wave peak periods of less than 4 s occur (see Fig. 6).
Concerning the wind shear exponent, in the standards and most current data bases (e.g. GL, 2012; Fischer et al., 2010), constant values for all wind speeds are proposed. However, this assumption is a massive simplification. Ernst and Seume (2012) showed that the wind shear exponent significantly depends on the wind speed. Here, it is shown (see Fig. 7) that it does not only vary between wind speeds, but scatters remarkably within each bin as well, and might even be negative.
For the turbulence intensity, this data basis reveals that state-of-the-art approaches are mainly conservative, as too high turbulence intensities are assumed. This is shown in Fig. 8, where the turbulence intensity for all three sites is compared to a standard data basis and to current standards (IEC, 2009). All three sites exhibit similar mean turbulence intensities and 90 % percentile values ($Q_{0.9}$). For the comparison with literature values, the 90 % percentile is of importance, as standards require simulations with this percentile value. However, even for the 90 % percentile, the UPWIND data basis is very conservative. The least conservative case (category C) in IEC (2009) fits the $Q_{0.9}$ values relatively well, but predicts slightly higher turbulence intensities for wind speeds above about 10 m s$^{-1}$. Considering the fact that using the 90 % percentile is a conservative assumption and that the measurements include some wake effects due to wind farms near all measurement masts, it can be concluded that state-of-the-art assumptions for turbulence intensities are probably unnecessarily conservative. The wake effects are depicted in Fig. 9, where turbulence intensity measurements of FINO1 from 2011 to 2016 are shown. In this period, the wind farm Alpha Ventus was operating on the east side of FINO1. Therefore, west wind leads to free stream conditions and east wind to wake conditions. Obviously, free stream conditions lead to even lower turbulence intensities, whereas wake conditions increase the turbulence, especially for smaller wind speeds, as also detected by Hansen et al. (2012).

Simulation assistance
In the previous section, a comprehensive data basis for scattering environmental offshore conditions was developed. However, even with realistic input parameters, the accuracy of numerical simulations is significantly influenced by constraints like their lengths and the time eliminated to exclude initial transients. Therefore, in this section, efficient simulation lengths and times of initial transients for varying wind speeds and different types of loading and substructures are determined. This is achieved by analysing the convergence of relevant quantities (i.e. FLS and ULS loads). Before conducting these studies, the overall probabilistic simulation approach is explained, as it differs from the approach in the standards. Subsequently, the utilised simulation model and the chosen environmental conditions are briefly presented.

Probabilistic simulation approach
In this work, statistically scattering environmental conditions are applied, and therefore, a probabilistic simulation approach is used. This probabilistic approach differs from the deterministic load case based approach. For the probabilistic approach or "real-life" approach, it is not necessary to simulate any load cases of extreme environmental conditions (e.g. DLC 1.3 to 1.6), but the use of scattering conditions leads directly to simulations that represent the real lifetime of the turbine (without fault, start-up or other special situations). Hence, the simulations cover a realistic period of power production and idling, e.g. about 2.3 months of turbine lifetime for 10 000 simulations of 10 minutes each. As environmental conditions scatter, effects like high turbulences, extreme wind shear, high waves, small wave periods, and others are covered, and do not have to be considered separately. Load cases are not simulated explicitly, but are covered implicitly by conducting probabilistic simulations. That is why for FLS, the two approaches do not differ significantly. The "real-life" approach covers DLC 1.2 and 6.4. For ULS, the "real-life" approach covers all power production cases (DLC 1.1-1.6) and DLC 6.1 by applying scattering environmental conditions. As the "real-life" approach cannot simulate 20 years of turbine lifetime (or even a return period of 50 years), a load extrapolation, as required for DLC 1.1, is needed in order to calculate a ULS design. However, this extrapolation is not needed here, as it does not influence the investigated simulation constraints.
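The covered lifetime quoted above can be checked with a few lines of Python. The sketch assumes that each simulation contributes 10 minutes of usable time and that a month has 30.44 days on average; both values and the function name are assumptions for illustration.

```python
def covered_lifetime_months(n_simulations, minutes_per_simulation=10.0,
                            days_per_month=30.44):
    """Turbine lifetime, in months, represented by a set of simulations."""
    total_minutes = n_simulations * minutes_per_simulation
    return total_minutes / (60.0 * 24.0 * days_per_month)


def fraction_of_design_life(n_simulations, design_life_years=20.0, **kwargs):
    """Share of the design life covered by the simulations."""
    months = covered_lifetime_months(n_simulations, **kwargs)
    return months / (12.0 * design_life_years)
```

Under these assumptions, 10 000 simulations represent roughly 1 % of a 20-year lifetime, which illustrates why a load extrapolation is required for a full ULS design.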
As is common in academia, only power production and idling are simulated. Fault cases, start-up, etc. are not taken into account for several reasons. Firstly, at least for the jacket, fault cases are less relevant (Vemula et al., 2010). Secondly, these load cases are very controller and design dependent and need special treatment (e.g. there is no need to remove initial transients for start-up load cases). And thirdly, this work is not intended to calculate exact fatigue damages or ultimate loads for the whole turbine lifetime, as no turbine design or optimisation is done. The exclusion of some load cases does not affect the recommendations on simulation constraints that are given for power production and idling conditions. As there is no need for exact FLS and ULS lifetime loads in this study, an assessment of the probabilistic approach concerning accordance with the standards is neither conducted nor needed, but would be valuable for further applications of probabilistic approaches.

Simulation setup
As environmental conditions vary for various turbine sites, a data basis to be used for the studies of convergence has to be chosen. The basis developed in this work is appropriate, and the FINO3 site is chosen. Some conditions, like air and water density, are kept fixed, as it was shown that their variation is of minor importance. The convergence study is kept as simple as possible and focuses on the most relevant parameters. Hence, for the probabilistic approach, statistically scattering values according to the determined distributions of wind speed and direction, wave height, direction and period, turbulence intensity, and wind shear exponent are used in all simulations. In addition, the following assumptions are made for all simulations:
- The turbulent wind field is computed according to the Kaimal model using the software TurbSim (Jonkman, 2009) with a different wind seed for each simulation.
- Irregular waves are calculated according to the Pierson-Moskowitz spectrum using varying wave seeds for all simulations.
- Soil conditions of the OC3 model (Jonkman and Musial, 2010) are applied.
- The current, second-order and breaking waves, wave spreading effects, marine growth, local vibration effects of braces, joint stiffnesses, and degradation effects are neglected.
The time domain simulations of the convergence study are conducted using the aero-servo-hydro-elastic simulation framework FASTv8 (Jonkman, 2013). A soil model (Häfele et al., 2016) applying linearised soil-structure interaction matrices enhances this code. The NREL 5 MW reference wind turbine (Jonkman et al., 2009) with two different substructures is investigated: firstly, the OC3 monopile (Jonkman and Musial, 2010) and secondly, the OC4 jacket (Vorpahl et al., 2013). The outcomes of the FAST simulations are, inter alia, time series of forces, moments, and stresses for each element of the substructure.
Since the convergence of fatigue and ultimate loads is investigated in the next step, the calculation concept of these two loads is briefly explained. [...] al., 2010). For all stresses, a Rainflow counting evaluates the stress cycles. As recommended by the current standards, the conservative damage accumulation according to the Palmgren-Miner rule is assumed using a slope of the S-N curve of three before and five after the fatigue limit for both substructures. The separate fatigue calculation (and summation over all simulations) for each connection of each joint is necessary, as damages in each connection and joint are different for each simulation, and the highest values do not always occur in the same joint (for example due to the probabilistic variation of the wind direction).

Finally, the decisive damage for the jacket is the highest accumulated value of all connections of all joints.
For the monopile, the fatigue procedure is similar, but is done according to Eurocode 3, part 1-9 (2010), where a detail category of 71 MPa for transverse butt welds and an additional reduction due to the size effect (t > 25 mm) are recommended. Differing from the recommendations in Eurocode 3, part 1-9 (2010), the same slopes of the S-N curves as for the jacket are used.
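The damage accumulation described above can be sketched in Python as a Palmgren-Miner sum over counted stress ranges with a bilinear S-N curve (slope three before and five after the knee). The reference point of 2 million cycles for the detail category and the knee at 5 million cycles are assumptions made here for illustration, as are the function names; the size-effect reduction and any cut-off limit are deliberately left out.

```python
def cycles_to_failure(stress_range, detail_category=71.0, n_ref=2.0e6,
                      n_knee=5.0e6, m1=3.0, m2=5.0):
    """Endurable cycles for a stress range in MPa on a bilinear S-N curve.

    detail_category is the stress range at n_ref cycles; the slope changes
    from m1 to m2 at n_knee cycles (both reference points are assumptions).
    """
    knee_range = detail_category * (n_ref / n_knee) ** (1.0 / m1)
    if stress_range >= knee_range:
        return n_ref * (detail_category / stress_range) ** m1
    return n_knee * (knee_range / stress_range) ** m2


def miner_damage(stress_ranges, counts, **sn_parameters):
    """Linear damage sum; counts may be fractional (0.5 for half cycles)."""
    return sum(
        n / cycles_to_failure(s, **sn_parameters)
        for s, n in zip(stress_ranges, counts)
        if s > 0.0
    )
```

The stress ranges and counts would come from the Rainflow counting of each stress time series; the damage of each connection is then summed over all simulations.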
For the ULS analysis, maximum stresses are decisive and extracted from the time series. For the monopile, Eurocode 3, part 1-6 (2010), is used to analyse the plastic limit state, cyclic plasticity limit state, and buckling limit state (LS1-3). For the jacket, NORSOK N-004 is applied for tubular members and joints, which takes combined axial, shear, bending and hydrostatic loadings into account. In both cases, the yield stress is 355 MPa.
Additionally, ultimate limit state proofs for the foundation piles are performed, including axial and lateral soil proofs according to GEO2 (DIN 1054, 2010) and a plastic limit state proof (LS1) for the steel pile below mudline. Especially for the monopile, the last proof might be decisive, as the bending moment frequently reaches its maximum below mudline. For all ULS proofs, utilisation factors, being the percentage of the maximum admissible loads that is reached, are the outcomes.

Simulation length
The simulation length significantly influences the overall computing time of the load assessment. However, there is no conclusive consensus concerning the length needed. Current standards recommend, for example, 10-minute or one-hour calculations. The offshore oil and gas industry prefers simulation lengths of six hours to cover all low-frequency hydrodynamic effects. The use of 10-minute simulations can potentially reduce the computing time by a factor of about 36 compared to six-hour simulations. Hence, a study of convergence for bottom fixed offshore wind turbines is conducted here. For floating wind turbines, the reader is referred to Stewart (2016), who showed that for floating structures all physical effects can be covered with 10-minute simulations.
The presented outcomes of this study focus on the monopile substructure, but a jacket is analysed as well and results (not shown) are generally comparable. For several wind speed bins, 500 simulations with a total length of ten hours each are conducted. As the initial transient behaviour is analysed subsequently, a clearly sufficient time of four hours is discarded to exclude the initial transients. Eliminating these four hours, the total length of 10 h reduces to a maximum available length (simulation length) of 6 h for the convergence study. In a first step, the convergence of FLS loads is analysed. Afterwards, the ULS case is investigated.
The procedure to calculate the mean fatigue damage for each wind speed bin is the following: From the basis of the 500 ten-hour simulations having different random seeds and varying environmental conditions, 500 cases are selected (with replacement). For each simulation, the fatigue damage is calculated and weighted with the simulation length. The mean value of all cases is calculated. This procedure is repeated 10 000 times (bootstrapping) to assess the associated uncertainty. Figure 10 displays the normalised mean fatigue damages for different wind speeds and simulation lengths between ten minutes and six hours. The values are normalised with the six-hour values, and error bars show the ±σ confidence intervals (68 %) that are estimated using the bootstrap procedure with 10 000 resamplings.
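The bootstrap procedure for the mean fatigue damage can be sketched as follows. The sketch assumes that the damages are already weighted with the simulation length (i.e. given per unit time) and that NumPy's default random generator with a fixed seed is acceptable; the function name and the seed are hypothetical.

```python
import numpy as np


def bootstrap_mean(damages, n_boot=10_000, seed=0):
    """Bootstrap estimate of the mean damage and its standard deviation.

    damages: one length-weighted fatigue damage per simulation.
    Returns (mean of bootstrap means, standard deviation of bootstrap means).
    """
    d = np.asarray(damages, dtype=float)
    rng = np.random.default_rng(seed)
    means = np.empty(n_boot)
    for i in range(n_boot):
        # Select as many cases as there are simulations, with replacement.
        sample = rng.choice(d, size=d.size, replace=True)
        means[i] = sample.mean()
    return float(means.mean()), float(means.std(ddof=1))
```

The second return value corresponds to the half-width of a ±σ error bar; dividing both values by the six-hour mean would give normalised results of the kind shown in Fig. 10.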

It is apparent that due to scattering environmental conditions and the limited number of simulations the uncertainty is relatively high. A detailed investigation of the fatigue load uncertainty, when scattering environmental conditions are applied, is valuable, but out of the scope of this work (cf. Sec. 4). Nevertheless, from Fig. 10 it is apparent that there are no pronounced trends for changing simulation lengths. A slight increase of fatigue loads for higher simulation lengths might be suspected given the fact that such behaviour was observed for floating substructures by Stewart (2016). In order to focus on the simulation length 10 effects, the variation of environmental conditions is neglected in a second step (only varying random seeds). This reduces the uncertainty making it possible to clearly identify a slight increase of FLS loads of about 5 % for higher simulations lengths (see Fig. 11, not merge case). However, as shown by Stewart (2016) for floating substructures, the increasing fatigue loads are not due to any physical effect (all important low-frequency effects of waves are already covered by 10-minute simulations), but can be explained by the effect of unclosed cycles in the Rainflow counting. Cycles that are not completed at the end of the 15 simulation are approximated by counting them as half cycles. The longer the simulation, the less influential is this approximation, as the number of half cycles compared to the number of full cycles reduces. A quite straightforward approach to reduce the problem of half cycles is to merge several shorter simulations (e.g. 10-minute simulations) to a longer one (e.g. six-hour simulation). This means fatigue damages are not calculated for each time series separately, but for longer time series consisting of several shorter ones that are just appended to each other. 
Either different 10-minute time series are appended to each other, or each time series is duplicated and appended several times to itself. If scattering environmental conditions are assumed, fairly different load levels occur in some simulations. In these cases, the load levels of the simulations might not fit, and additional cycles can be introduced by merging different time series, leading to unreasonably increased fatigue damages.
Merging each time series with itself guarantees fitting load levels. On the downside, the computing time of the post-processing is slightly increased. The effect of merging shorter simulations with themselves to generic and repetitive six-hour time series (e.g. each 10-minute time series is duplicated 36 times and appended to itself to create a six-hour time series) is demonstrated in Fig. 11. It can be seen that the simulation error of about 5 % too low FLS loads for not merged 10-minute simulations can be compensated by merging time series in the post-processing.
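To illustrate how unclosed cycles affect short time series and what appending a time series to itself changes, the following sketch implements a simplified three-point rainflow count in Python. It is an illustration under stated assumptions, not the counting routine used for this work: the function names, the toy load signal, and the S-N exponent m = 3 are hypothetical, and the damage is expressed only as a relative sum of cycle count times range to the power m.

```python
def turning_points(series):
    """Reduce a series to its reversals (local extrema), keeping both ends."""
    pts = [series[0]]
    for x in series[1:]:
        if x == pts[-1]:
            continue  # skip repeated values, e.g. at the seam of merged copies
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x  # same direction: extend the current ramp
        else:
            pts.append(x)
    return pts


def rainflow(series):
    """Three-point rainflow count; returns (range, count) with count 1.0 or 0.5."""
    stack, cycles = [], []
    for p in turning_points(series):
        stack.append(p)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])
            y = abs(stack[-2] - stack[-3])
            if x < y:
                break
            if len(stack) == 3:
                # Range y contains the starting point: count as half cycle.
                cycles.append((y, 0.5))
                stack.pop(0)
            else:
                # Range y is a closed cycle: count it and drop its two points.
                cycles.append((y, 1.0))
                del stack[-3:-1]
    # Everything left on the stack is unclosed and counted as half cycles.
    for a, b in zip(stack, stack[1:]):
        cycles.append((abs(b - a), 0.5))
    return cycles


def relative_damage(series, m=3):
    """Relative damage sum(count * range**m); m = 3 is an assumed S-N slope."""
    return sum(count * rng ** m for rng, count in rainflow(series))


if __name__ == "__main__":
    signal = [0, 4, 1, 3, -2, 2, 0]  # hypothetical toy load signal
    single = relative_damage(signal)
    merged = relative_damage(signal * 36) / 36  # damage per copy after merging
    print(single, merged)
```

In this toy case, the damage per copy is higher for the merged series than for the single one, which is qualitatively in line with the too low FLS loads reported for not merged simulations. The size of the effect in the toy signal is not representative of real load time series.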
For the ULS loads, the calculation procedure is similar. From the basis of the 500 ten-hour simulations, 500 cases are selected (with replacement). The maximum value of all selected simulations is taken as the decisive utilisation factor. This procedure is repeated 10 000 times (bootstrapping) to assess the associated uncertainty.
The convergence is shown in Fig. 12. ULS loads are higher for longer simulations. Again, this increase is not due to any physical phenomenon, but a result of different overall computing times. Clearly, 500 10-minute simulations should not be compared to 500 six-hour simulations, but to about 14 six-hour simulations. Therefore, in a second step, the ULS calculation procedure is slightly adapted. Now, 500 cases are only selected for 10-minute simulations. For all other simulation lengths, the number of cases is reduced to keep the overall simulation length constant at 5 000 minutes (i.e. 250 cases for 20-minute simulations, etc.). This comparison is displayed in Fig. 13 and makes clear that ULS loads do not depend on the simulation length but only on the overall computing time. A second fact visible in Fig. 13 is the higher uncertainty for longer simulation lengths. Since 10-minute simulations lead to a higher number of cases than six-hour simulations for the same total length (i.e. 500 and 14), shorter simulations better cover rare cases and, therefore, scattering environmental conditions, leading to less uncertainty.
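The adapted ULS procedure with a constant overall simulation length can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for this work: the function names and the rounding of the number of cases are hypothetical choices, and the list of utilisation factors is assumed to hold one maximum value per simulation of the given length.

```python
import random
import statistics

TOTAL_MINUTES = 5000  # overall simulated time kept constant (500 x 10 minutes)


def n_cases(length_min, total_min=TOTAL_MINUTES):
    """Number of cases to select so that the total simulated time matches."""
    return max(1, round(total_min / length_min))


def bootstrap_max(utilisations, length_min, n_boot=10_000, seed=0):
    """Bootstrap the decisive (maximum) utilisation factor at constant total time."""
    rng = random.Random(seed)
    k = n_cases(length_min)
    # Each resample draws k cases with replacement and keeps their maximum.
    maxima = [max(rng.choices(utilisations, k=k)) for _ in range(n_boot)]
    return statistics.fmean(maxima), statistics.stdev(maxima)


if __name__ == "__main__":
    for length in (10, 20, 360):
        print(length, "min ->", n_cases(length), "cases")
```

With this rounding, 10-minute, 20-minute, and six-hour simulations are assigned 500, 250, and 14 cases, respectively, matching the numbers given in the text.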
Overall, the investigations of this section suggest that simulations of ten minutes length are sufficient, independent of the type of load, the investigated substructure, and the wind speed. It has to be noted, however, that only two types of substructures and environmental conditions typical for the North Sea are analysed. For significantly different substructures or locations, the validity might be limited. Notwithstanding the above, for ULS loads, the same overall time has to be compared in order to achieve reliable results. By keeping the simulation length short, more simulations can be conducted in the same overall computing time, leading to a better convergence of ULS loads. For FLS loads, simulation errors due to the simulation length can be reduced by merging the time series.

Initial transients
For the analysis of the simulation length, the first four hours of each simulation were discarded to guarantee a steady state operation of the turbine. However, removing four hours of initial transients and only using ten minutes of simulation is computationally very expensive. Therefore, the convergence of FLS and ULS loads with respect to the time of initial transients is analysed. As initial conditions, like an initial rotor speed, influence the initial transient behaviour, initial rotor speeds and blade pitches depending on the wind speed are set here. These initial conditions are quasi-static states determined using prior simulations.
As the initial transient behaviour is affected by the type of substructure and the load condition, the time that has to be removed is analysed in each wind speed bin for FLS and ULS loads and for both types of substructures separately. Commonly, time series are investigated to estimate times of initial transients (Zwick and Muskulus, 2015). Although this is a straightforward approach, it is not considered expedient here. For a fatigue assessment, the convergence of the fatigue damage has to be analysed, and for the ULS analysis, maximum loads or utilisation factors have to be considered.
For each wind speed bin, 10 000 simulations for the monopile and 500 for the jacket were conducted according to the simulation setup in section 3.2. This means: Each simulation has its own random seed for irregular waves and turbulent wind, and   behaviour dominates, have shorter initial transients.
The convergence of ULS utilisation factors for both substructures is shown in Figs. 16 and 17. It becomes apparent that initial transients are short, independent of the type of substructure and wind speed. The cycles with high amplitudes occurring at the beginning of each simulation are damped out within a few seconds and, hence, do not influence the ULS behaviour. More problematic are less damped cycles with smaller amplitudes, which lead to the longer times of initial transients presented above for FLS loads.
The recommended times that should be discarded to exclude initial transients for both substructures, being always a compromise between computing time and accuracy (here, errors below 5 %), are summarised in Table 3. It has to be mentioned that the general validity is limited, as these times of initial transients might vary, for example, for different aero-elastic codes, numerical solvers, time constants of the aero-elastic models, or substantially different substructures. For example, jackets for 10 MW turbines might behave differently due to larger diameters of legs and braces increasing wave effects. However, for similar applications (e.g. FASTv8, NREL 5 MW turbine, OC3 monopile or OC4 jacket, etc.) that are not rare in academia (e.g. Zwick and Muskulus (2015) or Morató et al. (2017)), the given values represent a well-founded guidance for simulation setups. Furthermore, these results shall sensitise the research community to the problem of initial transients, especially in case of fatigue. For fatigue, the time of initial transients might be higher than frequently presumed in literature. This is due to weakly damped cycles with small amplitudes that cannot directly be identified when looking at time series.
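The selection of a discarded time from a convergence curve can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the evaluation used to derive Table 3: the function name, the use of the longest discarded time as reference, and the example numbers are hypothetical, while the tolerance of 5 % mirrors the error level stated above.

```python
def required_discard_time(discard_times, damages, tol=0.05):
    """Smallest discarded time from which the damage stays within tol of the reference.

    discard_times: increasing candidate times (in s) removed at the start.
    damages: mean fatigue damage obtained for each discarded time.
    The value for the longest discarded time serves as reference (assumption).
    """
    reference = damages[-1]
    required = discard_times[-1]
    # Walk backwards and stop at the first value outside the tolerance band,
    # so that a single early value inside the band is not accepted by chance.
    for t, d in zip(reversed(discard_times), reversed(damages)):
        if abs(d / reference - 1.0) > tol:
            break
        required = t
    return required


if __name__ == "__main__":
    # Hypothetical convergence curve, for illustration only.
    times = [0, 60, 120, 240, 480]
    damages = [1.30, 1.10, 1.04, 1.01, 1.00]
    print(required_discard_time(times, damages))
```

The same function could be applied to ULS utilisation factors instead of fatigue damages, evaluated separately for each wind speed bin and substructure.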

Benefits and limitations
The benefit of the current work is twofold. Firstly, a comprehensive data basis for scattering environmental conditions was set up, which is freely available and easy to use. Secondly, two simulation constraints (simulation length and time of initial transients) were analysed, and well-founded recommendations are given.
The main advantages over existing data bases are the following: The data basis covers several different sites situated in different seas. It has to be admitted that the sites are fairly similar, as they are all in shallow water conditions. Additionally, the data basis contains statistical distributions for many more environmental conditions than existing ones. As was shown, for example, by Hübler et al. (2017), not only main conditions like the wind speed influence the dynamic behaviour of offshore wind turbines, so that knowledge of additional parameters is beneficial. Current data bases frequently consist of raw data that needs to be post-processed, which is a time-consuming process. Here, on the one hand, easily applicable statistical distributions are given. On the other hand, the complexity of dependent environmental conditions is still covered by utilising conditional distributions and multimodal and non-parametric approaches. In contrast to many existing data bases, the raw data is of good quality. For example, wind speeds are measured at heights comparable to hub heights of current turbines, and there is no need for extrapolations, as is the case for buoy measurements. Still, more data would be valuable in order to achieve more reliable distributions in high wind speed bins that rarely occur. Overall, the developed data basis is capable of improving offshore wind turbine modelling by providing more realistic inputs for simulations in academia, where real site data is scarce. One example of improved offshore wind turbine modelling is given in Sec. 3.3 and 3.4. The inclusion of probabilistic inputs leads to a significant and realistic increase of fatigue damage scattering, requiring high numbers of simulations. Hence, deterministic inputs underestimating this scattering can lead to biased fatigue values.
Detailed analyses of the effect of scattering environmental conditions on fatigue damage, and therefore, of the needed number of simulations are part of upcoming work of the authors.
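How conditional distributions of this kind could be applied to generate probabilistic simulation inputs is sketched below. This is a purely illustrative Python example: the distribution types (Weibull for the wind speed, log-normal for the turbulence intensity), the bin edges, and all parameter values are placeholders chosen for this sketch and are not taken from the supplementary material; the names `WIND_WEIBULL`, `TI_LOGNORMAL_BY_BIN`, and `sample_conditions` are hypothetical.

```python
import random

# Placeholder parameters for illustration only. They are NOT taken from the
# supplementary material of this work.
WIND_WEIBULL = {"scale": 10.0, "shape": 2.0}  # wind speed in m/s
TI_LOGNORMAL_BY_BIN = [
    # (upper bin edge in m/s, mu, sigma) of a log-normal turbulence intensity
    (8.0, -2.6, 0.30),
    (16.0, -2.8, 0.25),
    (float("inf"), -2.7, 0.20),
]


def sample_conditions(n, seed=0):
    """Draw (wind speed, turbulence intensity) pairs from conditional distributions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Independent parameter first: the wind speed.
        v = rng.weibullvariate(WIND_WEIBULL["scale"], WIND_WEIBULL["shape"])
        # Dependent parameter: use the distribution of the matching wind speed bin.
        for upper, mu, sigma in TI_LOGNORMAL_BY_BIN:
            if v < upper:
                ti = rng.lognormvariate(mu, sigma)
                break
        samples.append((v, ti))
    return samples


if __name__ == "__main__":
    for v, ti in sample_conditions(3):
        print(f"v = {v:.2f} m/s, TI = {ti:.3f}")
```

Each sampled pair would define the inputs of one simulation. Further dependent parameters could be added in the same way by conditioning on the already sampled values.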

Concerning the second benefit, the simulation constraints, it has to be kept in mind that not only realistic modelling, but also small simulation errors are important in order to model accurately. In this context, the chosen simulation length and time of initial transients matter. So far, these values are frequently chosen without profound knowledge. Some approaches to gain a deeper insight into these constraints (Stewart, 2016; Zwick and Muskulus, 2015) concentrate on simulation lengths or specific types of substructures and do not take realistically scattering environmental conditions into account. In this work, the scattering of the conditions is addressed and different bottom-fixed substructures are analysed. This enables recommendations for simulation lengths and times of initial transients depending on the wind speed, the type of substructure and the considered load case (ULS or FLS). However, the general validity of the current results has to be slightly restricted, as only one design of each type of substructure was investigated. Therefore, the initial transient behaviour might be slightly different for significantly different designs. Furthermore, the times to be removed to exclude initial transients might also differ between different simulation codes and are only tested for the FASTv8 code. Different numerical solvers or time constants of the aero-elastic models might also influence the time of initial transients. Nevertheless, even in these cases, the given recommendations can be regarded as a well-founded starting point for further investigations, and, even more important, clarify the challenge of a well-founded choice.

This work aims to help future simulation work to be more realistic and accurate. In order to achieve this objective, a freely available and comprehensive data basis for scattering environmental conditions was set up. This data basis consists of conditional statistical distributions for many parameters and can be applied without further post-processing. All needed information (statistical distributions and their parameters) is given in the supplementary material. In academia, this data basis enables simulations with probabilistic environmental conditions, making them more realistic. For industry purposes, this work might lead to a reconsideration of the current practice. This study shows that the use of deterministic values being either only dependent on the wind speed (e.g. turbulence intensity) or even totally constant (e.g. wind shear) does not represent realistic offshore conditions. However, for a well-founded reconsideration of the current practice, a detailed assessment of probabilistic approaches compared to deterministic load case based ones is needed.
Additionally, scientifically sound recommendations are given for the choice of simulation lengths and times to be removed to exclude initial transients. Simulation lengths of 10 minutes are generally sufficient, and can even help to reduce uncertainties. However, in case of FLS loads, time series should be merged, and for ULS situations, the overall computing time has to be kept constant. Recommendations concerning the initial transients have to be handled with care due to limitations of the general validity. The values are summarised in Table 3 and can help to improve the accuracy of simulations and to reduce computing times. It should be noted that, in part, significantly longer initial transients were detected than the values in literature, which are mainly based on educated guesses.
An extension of the current data basis to include additional offshore sites, other types or designs of substructures, or investigations for other simulation codes and numerical solvers would definitely be valuable to increase the general validity.

Furthermore, even for the utilised FAST code, additional investigations concerning the number of eigenmodes representing the substructure would be beneficial, as a reduction of retained eigenmodes might reduce the time of initial transients.
Data availability. The raw data is taken from the FINO platforms, operated on behalf of the Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (BMUB), and is freely available for research purposes (www.fino-offshore.de/en/). The derived data basis, consisting of statistical distributions for thirteen partly dependent environmental conditions and three offshore sites, is freely available. All needed information concerning the statistical distributions and their parameters is given in the published supplementary material to this work.
Competing interests. The authors declare that they have no conflict of interest.