Understanding and mitigating the impact of data gaps on offshore wind resource estimates
 ^{1}Fraunhofer Institute for Wind Energy Systems IWES, Am Seedeich 45, 27572 Bremerhaven, Germany
 ^{2}Fraunhofer Institute for Wind Energy Systems IWES, Küpkersweg 70, 26129 Oldenburg, Germany
Correspondence: Julia Gottschall (julia.gottschall@iwes.fraunhofer.de)
Like almost all measurement datasets, wind energy siting data are subject to data gaps that can, for instance, originate from a failure of the measurement devices or data loggers. This is particularly true for offshore wind energy sites, where the harsh climate can restrict the accessibility of the measurement platform, which can also lead to much longer gaps than onshore. In this study, we investigate the impact of data gaps, in terms of a bias in the estimation of siting parameters, and its mitigation by correlation and filling with mesoscale model data. Investigations are performed for three offshore sites in Europe, considering 2 years of parallel measurement data at the sites, and are based on typical wind energy siting statistics. We find a mitigation of the data gaps' impact, i.e. a reduction of the observed biases, by up to a factor of 10 on mean wind speed, direction and Weibull scale parameter and up to a factor of 3 on Weibull shape parameter. With increasing gap length, the gaps' impact increases linearly for the overall measurement period, while this behaviour is more complex when investigated in terms of seasons. This considerable reduction of the impact of the gaps found for the statistics of the measurement time series almost vanishes when considering long-term corrected data, for which we refer to 30 years of reanalysis data.
A wind resource assessment is performed at the beginning of every wind energy project. The wind resource is estimated for the preselected site with respect to the expected lifetime of the project, i.e. for the 20–30 years in the future during which the wind turbines will be operated at the site (Rohrig et al., 2019). During the lifetime of a wind farm, reassessments are also typically done that can be based on wind turbine or further wind measurement data. Based on the wind resource estimate, an expected energy yield is derived, which serves as a basis for any economic considerations of the project. Consequently, uncertainties and a possible bias in the wind resource estimate propagate up to the financing of a wind project, with the percentage uncertainty increasing from wind speed to wind farm production to the expected return on investment. Reducing these uncertainties, starting with the wind measurements, is therefore of high interest and relevance.
A wind resource assessment is typically based on a short-term measurement on site, which is conducted several years prior to the installation of the wind farm and has a duration on the order of a year (MEASNET, 2016; FGW e.V., 2017). The campaign duration is in most cases a compromise between informative value – defined by the representativeness of the measurements for the lifetime of the wind farm, i.e. those 20–30 years in the future – and the costs of the measurement campaign. In a later step, short-term measurements are long-term extrapolated making use of a reference dataset that is either a longer multi-year measurement in the surrounding area of the site or data from a reanalysis, sometimes downscaled with the use of a mesoscale model, with a resolution of several (tens of) kilometres around the site (Carta et al., 2013). In the case of complex terrain or differing measurement heights, horizontal and vertical interpolation is done using numerical computational fluid dynamics (CFD) and/or simplified engineering models (Rohrig et al., 2019).
Almost all measured time series have data gaps due to failures of the sensors themselves, a data logger or the power supply or due to adverse conditions such as a low aerosol concentration or unwanted fixed echoes for remote sensing devices (MEASNET, 2016). In the case of offshore measurements, data gaps are often longer due to limited accessibility of the measurement installation, in particular in high wind and wave conditions that may last for several weeks or even prevent access for a whole season. Additionally, many offshore wind measurements are not fully redundant due to their high costs, which is particularly the case for many floating lidar applications that are becoming increasingly common in the offshore wind industry as a more cost-efficient alternative to fixed offshore meteorological (met) masts (Gottschall et al., 2017).
Up to a certain threshold of frequency and length of data gaps, the long-term extrapolation, which in the standard procedures involves some correlation of measured and reference data for the overlapping period, is often applied to the not fully continuous time series. MEASNET (2016), for instance, considers a measurement to be incomplete only when the availability of filtered data is less than 90 %. As an alternative, the time series can be “filled” before the application of the correlation analysis. Gap-filling procedures typically use reanalysis data, e.g. from MERRA-2 (Gelaro et al., 2017) or ERA5 (Hersbach and Dick, 2016), often downscaled with a mesoscale model such as WRF (Skamarock et al., 2019). Such a gap filling is, in particular, applied when the gap corresponds to a substantial discontinuity in a measurement time series of several days, weeks or even months, not just a few data points that can be filled by statistical approaches or even interpolation only.
Gap filling is a task that is not specific to the wind resource assessment application but can be of relevance for any measured time series or collected dataset where data gaps may significantly impact the outcome of the following data analysis. In the most general context, procedures to compensate for missing values in a dataset are referred to as imputation. There are a number of different imputation procedures that have in common that missing data are not simply ignored but instead replaced by plausible values. Specific gap-filling procedures for meteorological time series are discussed, for example, in Körner et al. (2018), Pappas et al. (2014) and the references therein. These procedures include

linear interpolation from adjacent time steps (particularly for cases where only a few data points are missing);

autoregressive models (for longer periods of missing data and without adjacent sites as possible predictors);

different methods of spatial interpolation (in case adjacent sites are available);

data-driven methods like nearest-neighbour approaches, linear or multiple linear regression, look-up tables, or artificial neural networks (Körner et al., 2018).
For the wind resource assessment application, linear regression methods are of particular interest since they are often already used for the long-term extrapolation in the context of measure–correlate–predict (MCP) approaches (MEASNET, 2016). Often further dimensions are introduced by considering separate wind direction sectors or wind speed bins. Generally, MCP methods are not limited to linear regressions (see Carta et al., 2013, for a broader overview); however, in practice they are most often implemented in this way. This is why we concentrate on this type of procedure – both for data gap filling and long-term extrapolation – in this contribution.
The overall scope of the study is as follows: before discussing a selected specific data gap-filling approach, we investigate how data gaps impact the standard wind resource estimates by deriving and evaluating bias and uncertainty measures for wind time series with artificial gaps of varying length and seasonal period of occurrence. We repeat this analysis for the time series where the gaps are filled and, with this, study to what extent the impact of the gaps can be mitigated. The analysis is applied to the statistics of the short-term dataset, defined by the period of the measurements, as well as to the final long-term estimate, since both sets of results are relevant in the wind energy context. By deriving and comparing conclusions for three different offshore sites – in the German Bight, the Dutch North Sea and the Baltic Sea – we also address the impact of the site and possible dependencies.
The article is structured as follows: in Sect. 2 we describe the data basis and in Sect. 3 the methods for this study. Section 4 presents the results for the impact of ignored and filled gaps on the short-term and long-term wind statistics. In Sect. 5 we discuss our findings and their implications for future resource assessment studies. Finally, in Sect. 6 we summarise the main conclusions of our study.
The data basis consists of two parts: measurement data from met masts, covering a period with high availability, on the basis of which the influence of measurement gaps and their filling is investigated (Sect. 2.1), and numerical data used for filling the gaps in the measurement data and for the subsequent long-term extrapolation (Sect. 2.2).
2.1 Sites and measurement data
The analyses in this study are done independently for three different offshore met masts representing different typical sites for offshore wind energy utilisation in Europe. Two masts are located in the North Sea (FINO3 and IJmuiden), both about 80 km offshore from the nearest coastline and with large wind direction sectors where the nearest coastline is several hundreds of kilometres upstream. The third mast (FINO2) is located in the central southern Baltic Sea and surrounded by land within 50 km or less except for a small wind direction sector. The sites were chosen to represent typical European offshore wind exploration areas with different distances to the coasts and varying atmospheric stability (see e.g. Dörenkämper, 2015; Kalverla et al., 2019).
In Fig. 1 the positions of the three met masts are given. The frames mark the area of the innermost domains of the mesoscale data used for the gap filling (see Sect. 2.2). The data of all three met masts are freely available for scientific purposes. All of the masts are equipped with cup or sonic anemometers and wind vanes up to a height of 100 m above the sea surface. For our study we consider the 10 min averages of horizontal wind speed and direction data provided by these sensors. A measurement height close to 90 m was chosen at all three masts (see description below for more details), representing a typical hub height of offshore wind turbines. The 24-month period from 1 July 2012 to 30 June 2014 was selected based on the combined data availability for all three masts and other constraints such as limiting the disturbance by wakes of nearby wind farms that were erected in the vicinity of the masts some years after the commissioning of the respective masts.
As the aim of this study is to investigate the impact of gaps on offshore wind-energy-relevant wind statistics, a reference time series with a low amount of missing data was needed. Thus, besides the selection of the 2-year period with a low number of gaps, further gaps were filled with measurement data from lower altitudes. To account for the dependence of wind speed on height, a speed-up factor (sup) is defined according to $\text{sup}=\text{WS90}_{\text{mean}}/\text{WS}[X]_{\text{mean}}$, where WS90_{mean} is the average wind speed at the measurement altitude of the mast closest to 90 m and WS[X]_{mean} is the average wind speed of the measurement at a lower height; the factor is applied to the wind speed measurement at the lower altitude. In the case of gaps, the wind direction measurements were filled by measurements at lower heights as well but without using any scaling or offset correction. Data gaps in the mast measurements filled by applying this preprocessing are shown in Fig. 2. A short description of the three met masts with references to more detailed information is given in the following.
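The filling from a lower measurement height can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of `None` for missing samples and the choice to compute both means over concurrently valid samples are assumptions made here.

```python
from statistics import mean

def fill_from_lower_height(ws_top, ws_low):
    """Fill missing (None) top-level wind speeds with scaled lower-level values."""
    # Assumption: both means are taken over samples where both levels are valid.
    pairs = [(t, l) for t, l in zip(ws_top, ws_low)
             if t is not None and l is not None]
    sup = mean(t for t, _ in pairs) / mean(l for _, l in pairs)
    filled = []
    for t, l in zip(ws_top, ws_low):
        if t is not None:
            filled.append(t)            # keep the original measurement
        elif l is not None:
            filled.append(sup * l)      # scale the lower-level measurement
        else:
            filled.append(None)         # gap remains if both levels are missing
    return filled, sup

# Hypothetical 10 min wind speeds (m/s) at the top level and a lower level
ws_top = [10.0, None, 12.0, None, 8.0]
ws_low = [9.0, 9.5, 11.0, None, 7.0]
filled, sup = fill_from_lower_height(ws_top, ws_low)
```

Samples that are missing at both heights stay missing, which is consistent with the availabilities after filling remaining below 100 %.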

The IJmuiden met mast^{1} is located about 85 km west of Den Helder in the Dutch part of the North Sea. The met mast was in operation between November 2011 and March 2016 and was decommissioned afterwards. The mast was used in several wind energy research studies (e.g. Baas et al., 2016; Kalverla et al., 2019). It provides measurements at several heights and is described in more detail in Poveda et al. (2015). For the analysis in this study, the mast-corrected wind speed measurement (cup anemometers) located at 92 m height was used together with the wind direction measurement at a height of 87 m. The availability of the wind speed measurement data at these heights is 99.5 % prior to and 99.7 % after the filling of gaps by lower measurement heights (see procedure above).

The FINO2 met mast^{2} is located in the German part of the central southern Baltic Sea, close to the border triangle of Denmark, Germany and Sweden. In contrast to the North Sea sites, the FINO2 measurements are affected by the surrounding land, with distances of less than 50 km for the majority of wind direction sectors. Only a narrow northeasterly sector is dominated by a long marine fetch. FINO2 has been in operation since August 2007, and the data were used in several wind-energy-related studies (e.g. Gryning et al., 2014; Dörenkämper, 2015). FINO2 provides wind measurements at various heights between 32 and 102 m a.s.l., technically described in FINO2 (2007). In this study mainly the wind speed measurements from the cup anemometers at 92 m height were used in combination with the wind direction measurement (vane) at the same height on the boom on the opposite side of the mast. The availability of the wind speed time series was 86.4 % prior to and 95.5 % after the application of the gap filling from lower heights (see procedure above).

FINO3 is a met mast^{3} located in the northern part of the German Bight about 80 km northwest of the island of Sylt. Thus, the impact of upstream coastlines is very limited and a pure offshore climate is found in particular for the main wind direction sectors (south to northwest). FINO3 has been in operation since September 2009 and provides wind measurements every 10 m between 32 and 102 m a.s.l. as described in FINO3 (2012). The FINO3 wind measurements were part of several wind energy studies (Peña et al., 2015; Gryning et al., 2016). This study analyses the wind speed (cup) and wind direction (vane) data from the measurements at 92 and 101 m a.s.l., respectively. These wind speed measurement data have an availability of 98.4 % prior to and 98.9 % after the application of gap filling from measurements at lower heights (see procedure above). A detailed overview of the measurements of the three FINO masts, their device types, accuracy and boom orientations is given in the Appendix of Leiding et al. (2012). Less than a kilometre west of the FINO3 platform, the wind farm DanTysk was constructed between February 2013 and April 2015. The erection of turbines did not start before April 2014, and operation did not start before December 2014. The wind statistics of FINO3 considered for this study should therefore not be impacted by wakes of DanTysk.
Wind speed and wind direction distributions for the three datasets of measurements are shown in Fig. 3. Note that the measurement heights slightly differ for the three sites – as described above, wind speed measurements are recorded at 92 m on all three masts, while wind directions are recorded at 87 m (IJmuiden), 92 m (FINO2) and 101 m (FINO3). Derived wind statistics are summarised in Table 1. Here and for the following analysis we consider the parameters mean wind direction and mean wind speed and the parameters k (Weibull shape parameter) and A (Weibull scale parameter) that are obtained from fitting a Weibull distribution function to the wind speed distributions. The fitting procedure is implemented as a non-linear least-squares regression considering the complete wind speed range. All three masts represent typical mid-latitude offshore wind climates. A shift from southwesterly to more westerly winds is found while moving from west to east, being in line with the typical track of cyclones when moving across central Europe (van Bebber, 1891).
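For reference, the two-parameter Weibull distribution to which the parameters k and A above refer is usually written as follows, with v the wind speed. The symbols f, F and \bar{v} are introduced here for illustration; the text above does not state the formula explicitly.

$$
f(v) = \frac{k}{A}\left(\frac{v}{A}\right)^{k-1}\exp\left[-\left(\frac{v}{A}\right)^{k}\right],
\qquad
F(v) = 1-\exp\left[-\left(\frac{v}{A}\right)^{k}\right],
\qquad
\bar{v} = A\,\Gamma\!\left(1+\frac{1}{k}\right)
$$

The last relation shows why the scale parameter A follows the mean wind speed closely: for shape parameters around k = 2, the factor Γ(1 + 1/k) is close to 0.89 and changes only weakly with k.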
2.2 Numerical data for gap filling and long-term extrapolation
The procedures applied for this study make use of regional mesoscale modelling data that are used for the gap filling, as well as long-term reanalysis data that are applied for long-term referencing of the wind measurements. These data sources are described separately below.
2.2.1 Reanalysis data
For the long-term extrapolation, the data from the ERA5 reanalysis (Hersbach et al., 2020) were used. ERA5 is the most recent generation of reanalysis data issued by the European Centre for Medium-Range Weather Forecasts (ECMWF) since 2017. For wind energy applications it was shown to outperform other reanalyses (e.g. Olauson, 2018; Thøgersen et al., 2017). ERA5 provides reanalyses on all important atmospheric and oceanographic parameters at an hourly resolution in time and 0.25^{∘} (≈30 km for the atmospheric parameters; others differ) in zonal and meridional directions globally. Currently the period of 1979–ongoing is publicly available with a lag of a few days in time. For the long-term referencing in this study, the wind speed (zonal and meridional components, u and v at 100 m) from the so-called surface-level data of the ERA5 dataset were selected for the period 1983–2014 to cover a climatic period of 30 years. Most recently, mesoscale model datasets with a higher resolution on the order of a few kilometres were made available for longer periods (up to 30 years) and sometimes used for long-term referencing. However, as the industry at least partly still relies on classical lower-resolution reanalyses, we have applied this approach for our study. In addition, due to its comparatively high resolution ERA5 does not show major differences in the offshore wind speed climate statistics several tens of kilometres away from the coastal discontinuity (Dörenkämper et al., 2020).
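Since the reanalysis provides the zonal and meridional components u and v rather than wind speed and direction, a conversion is needed before comparing with mast data. A minimal sketch using the meteorological convention (direction the wind blows from, clockwise from north) is given below; the function name is an assumption of this example.

```python
import math

def wind_from_components(u, v):
    """Return (speed, direction) from zonal (u) and meridional (v) components.

    Direction follows the meteorological convention: the direction the wind
    is coming from, in degrees clockwise from north, in the range [0, 360).
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

# Hypothetical westerly flow: air moving towards the east (u > 0, v = 0)
speed, direction = wind_from_components(10.0, 0.0)
```

A wind with positive u and zero v moves towards the east and is therefore reported as coming from the west.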
2.2.2 Mesoscale modelling and data
The mesoscale model data in this study are used for filling the gaps that are artificially cut into the time series. In principle any mesoscale model data could be used, such as those from the publicly available New European Wind Atlas (NEWA) (Hahmann et al., 2020; Dörenkämper et al., 2020) or commercial products. However, these data are often not optimised for offshore wind energy applications or only available in lower resolution in time (e.g. 30 min instead of the desired 10 min data). Consequently, simulations were performed separately for this study, applying a setup that was optimised for offshore wind applications (Dörenkämper et al., 2015, 2017; Gottschall et al., 2018) and capable of resolving the most important flow features in offshore development regions.
The simulations were carried out using the Weather Research and Forecasting (WRF) model (Skamarock et al., 2019) in its version 4.0.1, which is generally well known and commonly used in the wind energy community (e.g. Dörenkämper et al., 2015; Hahmann et al., 2020). The mesoscale simulation setup was similar for all three sites, consisting of three domains with resolutions of 18, 6 and 2 km and a domain size of 150×150 GP (grid points) for each domain, centred around the site of interest (i.e. the respective offshore met mast). Figure 4 exemplarily shows the size of the three domains for the FINO3 site. The sizes of the innermost domains (D3) for all three sites are given in Fig. 1.
Boundary conditions for the model were prescribed by the ERA5 dataset for the atmospheric variables (Hersbach and Dick, 2016; Hersbach et al., 2020) and the OSTIA dataset for the sea surface variables (Donlon et al., 2012). An instantaneous output of the mesoscale model on 10 min intervals was chosen, being consistent with the 10 min means of the met masts.
Table 2 shows a summary of the most important model setup parameters and thus boundary conditions and model physics used in this study to drive the simulations. The output data on the WRF internal (Arakawa C, terrain-following sigma) grid were converted to earth-relative quantities using the post-processing script developed and verified in the framework of the NEWA project^{4} (Dörenkämper et al., 2020). The data were interpolated to the exact measurement heights from the WRF levels, and virtual met masts were extracted at the grid point closest to the location of the met masts investigated in this study.
The methods applied for our study are described in the following subsections and demonstrated on the basis of the FINO3 dataset.
3.1 Generation of artificial gaps
The first step of the analysis consists in generating artificial data gaps in the measured time series. This is demonstrated in Fig. 5 in the two upper plots. A data gap is defined by its length (in days or number of 10 min intervals) and its start. Figure 5 shows a gap of 30 d length starting on 30 September 2013, 00:00:00 UTC. For the results presented in Sect. 4 we have considered gap lengths between 6 and 90 d and start dates running through the 2-year measurement duration in equal increments. The four wind-energy-relevant statistical measures (mean wind speed and direction and Weibull shape and scale parameters) already presented in Sect. 2.1 are derived for the incomplete time series in the same way as for the original ones but ignoring the data in the gap. The deviations in these measures represent the impact of the gap on the wind statistics. Note that for this study we have only considered single gaps of varying lengths. Multiple gaps are briefly discussed in Sect. 5.
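Cutting a single gap of given start and length into a 10 min series, and evaluating a statistic while ignoring the gap, might look as follows. The function names, the half-open gap interval and the use of `None` for removed samples are assumptions of this sketch, and the short constant series is purely illustrative.

```python
from datetime import datetime, timedelta
from statistics import mean

def cut_gap(times, values, gap_start, gap_days):
    """Return a copy of values with samples in [gap_start, gap_start + gap_days) removed."""
    gap_end = gap_start + timedelta(days=gap_days)
    return [None if gap_start <= t < gap_end else v
            for t, v in zip(times, values)]

def gap_start_dates(first, last, gap_days, step_days):
    """Start dates in equal increments such that the whole gap lies inside the record."""
    starts, t = [], first
    while t + timedelta(days=gap_days) <= last:
        starts.append(t)
        t += timedelta(days=step_days)
    return starts

def mean_ignoring_gap(values):
    """Mean over the remaining samples, i.e. the gap is simply ignored."""
    return mean(v for v in values if v is not None)

# Illustrative 3-day record of 10 min samples (144 samples per day)
t0 = datetime(2013, 9, 29)
times = [t0 + timedelta(minutes=10 * i) for i in range(3 * 144)]
values = [8.0] * len(times)
gapped = cut_gap(times, values, t0 + timedelta(days=1), gap_days=1)
n_missing = sum(v is None for v in gapped)
starts = gap_start_dates(t0, t0 + timedelta(days=3), gap_days=1, step_days=1)
```

Repeating `cut_gap` for every entry of `starts` yields one incomplete series per start date, from which the statistics can be recomputed and compared with the reference.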
3.2 Gap-filling procedure
The artificial gaps are filled based on a measure–correlate–predict (MCP) procedure with the WRF data introduced in Sect. 2.2 as input. The measured time series consist of the wind speed and direction data, each including the generated gap. These data are correlated with the numerical (WRF) data for the same period. That is, for the period of the gap no data are considered for the correlation step. The correlation defines a correction that is implemented slightly differently for the wind speed and direction time series.

For the wind speed data, we first bin the wind speeds every 0.5 m s^{−1} based on the modelled data and calculate the average measured values in every bin. Second, we fit two linear functions for the wind speed ranges [0,5) and [5,20] m s^{−1}. The resulting coefficients of the linear fits are then applied to correct the respective modelled wind speed and in this way account for the systematic error between measured and modelled data.

For the wind direction data, again first the mean deviation between measured and simulated wind direction per 10^{∘} bin is derived and then used directly as offset for the correction.
Note that the choice of this approach is more or less arbitrary but motivated by current practice for similar studies and applications. No further procedural steps such as, for example, sector-wise corrections are considered.
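The wind speed correction described above (binning by modelled wind speed, then two linear fits) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the fits are made to the bin averages at the bin centres, ordinary least squares is used, modelled speeds above 20 m s^{−1} are corrected with the upper-range line, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

BIN_WIDTH = 0.5   # m/s, bin width of the modelled wind speed
SPLIT = 5.0       # m/s, boundary between the two linear fits
UPPER = 20.0      # m/s, upper limit of the second fit range

def bin_means(modelled, measured):
    """Average measured wind speed per 0.5 m/s bin of modelled wind speed."""
    bins = defaultdict(list)
    for m, o in zip(modelled, measured):
        if m is None or o is None:      # samples inside the gap are skipped
            continue
        bins[int(m // BIN_WIDTH)].append(o)
    # Assumption: each bin is represented by its centre
    return {(i + 0.5) * BIN_WIDTH: mean(v) for i, v in bins.items()}

def fit_line(points):
    """Ordinary least-squares line through (x, y) points: returns (slope, intercept)."""
    xs, ys = zip(*points)
    xm, ym = mean(xs), mean(ys)
    sxx = sum((x - xm) ** 2 for x in xs)
    slope = sum((x - xm) * (y - ym) for x, y in points) / sxx
    return slope, ym - slope * xm

def fit_correction(modelled, measured):
    """Fit one line for [0, 5) m/s and one for [5, 20] m/s to the bin averages."""
    centres = bin_means(modelled, measured)
    low = [(x, y) for x, y in centres.items() if x < SPLIT]
    high = [(x, y) for x, y in centres.items() if SPLIT <= x <= UPPER]
    return fit_line(low), fit_line(high)

def correct(ws_model, coeffs):
    """Apply the correction of the matching range to one modelled wind speed."""
    slope, intercept = coeffs[0] if ws_model < SPLIT else coeffs[1]
    return slope * ws_model + intercept

# Synthetic check data: the "measurement" follows two known lines exactly
modelled = [0.25 + 0.5 * i for i in range(40)]
measured = [1.1 * m + 0.2 if m < SPLIT else 0.9 * m + 1.2 for m in modelled]
coeffs = fit_correction(modelled, measured)
corrected = correct(10.0, coeffs)
```

Each fit needs at least two populated bins in its range; a production implementation would have to handle sparsely populated ranges explicitly.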
In addition to the correction factor – resulting from either the correction function or the bin-wise mean offset – a noise factor is derived as the standard deviation of the data per bin and combined with a white-noise process in the prediction step. The noise factor ensures that the generated time series does not lose its physical consistency. For the prediction of the data in the gap period, numerical (WRF) data for this period are combined with the derived corrections. The resulting time series are inserted into the incomplete measurement time series.
Figure 5 (bottom plot) shows the outcome of the gap-filling procedure for the FINO3 time series used for demonstration.
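One possible reading of the noise step is sketched below: the standard deviation of the measured values in each modelled wind speed bin scales a Gaussian white-noise term that is added to the corrected model value. The Gaussian form, the clipping at zero and all names are assumptions of this example; the text does not specify these details.

```python
import random
from collections import defaultdict
from statistics import pstdev

def bin_std(modelled, measured, width=0.5):
    """Standard deviation of measured wind speed per bin of modelled wind speed."""
    bins = defaultdict(list)
    for m, o in zip(modelled, measured):
        bins[int(m // width)].append(o)
    return {i: pstdev(v) for i, v in bins.items()}

def predict(ws_model, correct, sigma_by_bin, rng, width=0.5):
    """Corrected model value plus bin-dependent white noise.

    `correct` is any callable mapping a modelled to a corrected wind speed.
    Bins without a noise estimate get no noise; negative speeds are clipped
    to zero (both are assumptions of this sketch).
    """
    sigma = sigma_by_bin.get(int(ws_model // width), 0.0)
    return max(0.0, correct(ws_model) + sigma * rng.gauss(0.0, 1.0))

# Hypothetical samples that all fall into the 8.0-8.5 m/s model bin
modelled = [8.1, 8.2, 8.3, 8.4]
measured = [7.5, 8.5, 8.0, 9.0]
sigma_by_bin = bin_std(modelled, measured)

rng = random.Random(42)          # fixed seed for a reproducible illustration
identity = lambda ws: ws         # placeholder for the fitted correction
samples = [predict(8.2, identity, sigma_by_bin, rng) for _ in range(2000)]
```

Without the noise term, the filled period would be smoother than the surrounding measurements, which would distort the wind speed distribution and, with it, the fitted Weibull shape parameter.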
As already mentioned above, the procedure consisting of generating artificial gaps in the measured time series and the filling with the outlined MCP approach is repeated for varying start dates of the gap that has a predefined length (in the example 30 d). This is demonstrated in Fig. 6, where the four derived statistical measures are shown for an unchanged gap length but systematically varying start date for the incomplete and filled time series (in red and blue). The wind statistics for the original time series are shown as a reference (as black lines).
This example demonstrates three points: the impact of the generated gap varies quite drastically depending on when the gap starts, similar patterns are observed for all four considered statistical measures, and the gap-filling procedure is able to significantly reduce the impact in almost all cases. However, the performance of the gap-filling procedure also depends on the start date of the gap. This can be explained by the fact that the correlation between measured and model data found on the basis of the existing data describes the correlation for the data of the gap to varying degrees. The correlation depends to a certain extent on seasonal effects, for example. To quantify the observed variations in the wind statistics, the corresponding root-mean-square error (RMSE) values are derived. For all four statistical measures, these reduce when the gaps are filled: for mean wind direction from 3.1 to 0.3^{∘}, for mean wind speed from 0.07 to 0.01 m s^{−1}, for the Weibull scale parameter A from 0.09 to 0.01 m s^{−1} and for the Weibull shape parameter k from 0.017 to 0.007.
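The RMSE used here summarises, over all gap start dates, how far a statistic of the incomplete or filled series deviates from the value for the original series. A minimal sketch with made-up numbers (not results of this study) follows; the function name is an assumption.

```python
import math

def rmse(estimates, reference):
    """Root-mean-square deviation of a list of estimates from one reference value."""
    return math.sqrt(sum((e - reference) ** 2 for e in estimates) / len(estimates))

# Made-up mean wind speeds (m/s) for four gap start dates
reference = 9.80                       # statistic of the original series
ignored = [9.70, 9.85, 9.90, 9.75]     # gap ignored
filled = [9.79, 9.81, 9.80, 9.80]      # gap filled

rmse_ignored = rmse(ignored, reference)
rmse_filled = rmse(filled, reference)
reduction_factor = rmse_ignored / rmse_filled
```

The ratio of the two RMSE values is the reduction factor referred to when comparing ignored and filled gaps.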
Uncertainties or standard errors in the estimation of the parameters are not further considered here and in the following as they are small compared to the reduction of the gap impact, which is the focus of this study.
3.3 Long-term extrapolation
In a last step, the different time series are used as the basis for a long-term extrapolation of the wind time series and statistics. For this, the measured time series with or without data gaps are correlated with ERA5 reanalysis data that are available for a long-term period of 30 years in this case. The underlying MCP procedure is very similar to the one applied for the data gap filling in Sect. 3.2. But this time the correlation period corresponds to the total measurement period of 2 years in our case (for the incomplete time series shortened by the gap length), and the prediction horizon corresponds to the complete 30 years for which the reference data are available. The measured time series has to be resampled to 1 h data since the ERA5 data have no higher resolution. Otherwise, the same methods were used to derive and apply correction functions and offsets. As already pointed out above, we decided to use ERA5 data as long-term reference data (and not again WRF, which is also available for a period of 30 years in the New European Wind Atlas) because we believe this choice better corresponds to a typical case in a standard offshore wind resource assessment application, while the gap filling is still based on the mesoscale model data as before.
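The resampling of 10 min wind speeds to hourly values can be sketched as simple hourly averaging. Labelling each hour by its start, averaging whatever valid samples exist in an hour and treating only wind speed (wind direction would need circular averaging) are assumptions of this example, as are the names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def hourly_means(times, speeds):
    """Average 10 min wind speeds into hourly values keyed by the start of the hour."""
    buckets = defaultdict(list)
    for t, v in zip(times, speeds):
        if v is not None:                          # skip samples inside a gap
            hour = t.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(v)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}

# Two hours of hypothetical 10 min samples with values 1, 2, ..., 12 m/s
t0 = datetime(2013, 1, 1, 0, 0)
times = [t0 + timedelta(minutes=10 * i) for i in range(12)]
speeds = [float(i + 1) for i in range(12)]
hourly = hourly_means(times, speeds)
```

Hours that lie completely inside a gap produce no entry, so the correlation with the hourly reference data automatically uses only the overlapping hours.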
The overall workflow followed in the study is summarised in Fig. 7.
In this section we present in detail the results of the study, expanding on the impact of gaps on the wind resource estimate with varying start dates (Sect. 4.1) as well as varying lengths (Sect. 4.2) and the impact of the gaps on the long-term wind resource estimate (Sect. 4.3). Results are compared for the three considered sites IJmuiden (Dutch North Sea), FINO2 (Baltic Sea) and FINO3 (German Bight).
4.1 Impact of gaps with varying start dates
Figure 8 shows how a 30 d gap impacts the four considered wind statistics (mean wind direction, mean wind speed, and the Weibull parameters k and A) depending on the start date of the gap for all three studied sites. The results for FINO3, already presented in Fig. 6, are shown as solid lines and the results for IJmuiden and FINO2 as dashed and dotted lines, respectively. Again, the applied gap filling (results in blue) considerably reduces the deviations of the measures from the reference (in black) that are caused by the gap (results in red). These reductions, quantified in terms of an RMSE for the respective dataset of results, are summarised in Table 3.
For mean wind direction, mean wind speed and Weibull scale parameter A, the derived RMSE values, summarising the deviations in the wind statistics due to the gaps with varying start dates, reduce by up to a factor of 10. For the Weibull shape parameter k this reduction is smaller (up to a factor of 3), which is explained by the nature of this parameter. Overall, the reductions are similar for all three considered sites. Also, the patterns of deviations in the wind statistics over the 2-year course are very similar, except those for the Weibull k parameter. Beyond that, Fig. 8 clearly shows how the wind statistics for the two sites FINO3 and FINO2, although having very similar mean wind speeds for the considered 2-year period, differ with respect to their wind speed distributions and in particular the derived Weibull scale A and shape k parameters. The third site, IJmuiden, in comparison, is characterised by both the highest mean wind speed and Weibull A parameter and the lowest Weibull k parameter.
4.2 Impact of gaps of varying lengths
In the next step, we have repeated this analysis for different gap lengths between 6 and 90 d. Figure 9 shows the derived RMSE values for the four considered wind statistics (in four separate plots) and the three sites (as solid, dashed and dotted curves) plotted against the gap length, again for ignored and filled gaps (in red and blue).
In all considered cases, the impact of the gap (reflected by the derived RMSE) increases with gap length and is significantly reduced when applying the gap-filling procedure. For the mean wind speed, just like for the Weibull A parameter, the increase is more or less linear for the considered range up to 90 d, whereas it slightly flattens for the two other measures, mean wind direction and Weibull k.
Apart from these general agreements, the results for the three sites show some deviations. For instance, the impact of the data gaps on the mean wind speed is largest, when these are ignored, for the IJmuiden site but can also be compensated for best at this site. This is shown by the smallest RMSE values, compared to those for the two other sites, after gap filling. This observation may be explained by either the performance of the used numerical model for the respective site or a statistical effect that relates to the level of observed wind speeds. (Remember that IJmuiden showed the highest measured mean wind speeds in the considered 2-year period.)
The impact of (ignored) gaps in the wind direction time series is highest for the FINO3 dataset. This can be understood by looking again at the wind direction distributions in Fig. 3: the main wind directions are more pronounced for the FINO2 and IJmuiden sites, whereas the distribution for FINO3 is characterised by a kind of side maximum for northwesterly directions. A data gap may in this case remove data that correspond to a substantial part of one of the local maxima, having a larger impact on the overall distribution than in the case where the distribution has only one dominant maximum range. After gap filling, the RMSE values are still larger for the FINO3 dataset than for the IJmuiden data, but the deviations are now much smaller. RMSE values for FINO2 and FINO3 lie almost on top of each other.
In a further step, we have studied how the impact of data gaps varies with the season in which the data gap occurs. For this, the “season” is defined by the start date of a gap – a gap starting in the months of January to March is related to “season 1”, a gap starting between April and June to “season 2”, and so on. These seasons were selected with a shift of 1 month in comparison to the classical meteorological season definition of spring, summer, autumn and winter to consider the inertia of the heating–cooling of the sea surface that mainly drives the yearly cycle of the atmospheric stability, which in turn has an impact on the wind distribution.
Figure 10 shows the results for FINO3 and for the statistics mean wind speed and mean wind direction only, but they are broadly representative of the other cases as well. The plots show that the impact of ignored and filled gaps depends significantly on the assigned season; the performance of the gap filling also shows a certain dependency, though not always in the same direction. Deviations are observed not only in the levels of the derived RMSE values (i.e. how large the impact is) but also in the shape of the curves (i.e. how the impact changes with gap length). This can be explained by the relation between gap length and the length of a season as defined above: a longer gap is more likely to extend beyond the season it is assigned to. Through this partly incorrect assignment, the seasonal effects are more mixed for the longer gaps.
4.3 Impact of gaps on long-term estimate
In a final step, we derive the impact of ignored and filled gaps in the measurement data on a long-term extrapolated mean wind speed. For this, we followed the procedure outlined in Sect. 3.3 and summarised in Fig. 7. Figure 11 shows how the mean wind speed that is derived based on 30 years of ERA5 data and corrected according to the 2-year-long measurements at the three considered sites varies with the start date of a 30 d data gap that is cut into the measurement time series. Again the data gaps are either ignored (results in red) or filled by applying the introduced gap-filling procedure (results in blue).
The following conclusions can be drawn from Fig. 11.

The long-term corrected mean wind speed is significantly different from the mean values of the 2 years of measurements for the three considered sites. (Mean wind speed values of the used 30-year-long ERA5 time series are 9.43, 9.91 and 9.26 m s^{−1} for FINO3, IJmuiden and FINO2, respectively.)

The impact of data gaps in the short-term measurements is visible in the long-term estimates but is rather small with an RMSE (for ignored gaps, red curves) of 0.011 m s^{−1} (FINO3), 0.007 m s^{−1} (IJmuiden) and 0.014 m s^{−1} (FINO2).

This impact of the data gaps in the short-term measurements on the long-term estimates is not noticeably mitigated by the application of the gap-filling procedure; the corresponding RMSE values (for filled gaps, blue curves) are in the same range or even slightly larger with 0.015 m s^{−1} (FINO3), 0.008 m s^{−1} (IJmuiden) and 0.014 m s^{−1} (FINO2).

Also, the reference values (black curves) show some variability, which is due to the noise process that is part of the MCP procedure applied for the long-term extrapolation. The RMSE values reflecting these variations are equal for all three sites at 0.003 m s^{−1}, where the mean value is considered as the reference.
Our study proposes a methodology that allows us to quantify the impact of data gaps in (measured) time series on wind statistics. With the three studied sites, we have considered three possible reference datasets, which could be referred to for further sites where only incomplete time series are available but no suitable reference. The reference quantification can then be used to deduce an uncertainty associated with the inherent gaps, which could be related to the RMSE value derived for the variations in the wind statistics for different gap start dates for a fixed gap length. Alternatively, for a more conservative approach, the maximum deviations in the wind statistics observed in the reference study could be considered or, in case more details are available, the identified variations for a specific season could be considered.
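As a minimal sketch of how such an uncertainty could be derived, the following example cuts a single gap of fixed length into a complete reference series at successive start positions, recomputes the mean wind speed each time, and reports both the RMSE and the maximum deviation relative to the gap-free mean. The function name `gap_impact`, the hourly resolution and the synthetic wind speed series are assumptions for illustration only; they do not reproduce the datasets or results of this study.

```python
import numpy as np


def gap_impact(speed, gap_len, step):
    """RMSE and maximum deviation of the mean when one gap is ignored.

    speed   : complete reference time series (1-D array)
    gap_len : gap length in samples
    step    : shift between successive gap start positions in samples
    """
    reference = speed.mean()
    starts = np.arange(0, len(speed) - gap_len + 1, step)
    means = np.empty(len(starts))
    for i, s in enumerate(starts):
        # Drop the samples inside the gap and average what remains.
        kept = np.concatenate((speed[:s], speed[s + gap_len:]))
        means[i] = kept.mean()
    deviations = means - reference
    rmse = np.sqrt(np.mean(deviations ** 2))
    max_dev = np.max(np.abs(deviations))
    return rmse, max_dev


# Synthetic 2-year hourly series with an annual cycle (illustrative only).
rng = np.random.default_rng(0)
hours = np.arange(2 * 365 * 24)
speed = 9.0 + 1.5 * np.cos(2 * np.pi * hours / (365 * 24)) + rng.weibull(2.0, hours.size)

# 30 d gap, start date shifted day by day.
rmse, max_dev = gap_impact(speed, gap_len=30 * 24, step=24)
print(f"RMSE: {rmse:.3f} m/s, maximum deviation: {max_dev:.3f} m/s")
```

The RMSE corresponds to the first option mentioned above, while the maximum deviation corresponds to the more conservative alternative. Restricting `starts` to one season would give the season-specific variant.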
For our study, we have analysed four statistical measures, namely mean wind direction, mean wind speed, and the two Weibull parameters k and A, which are very common for siting applications – but in principle this selection is arbitrary and can be extended. Another common measure frequently considered in the wind energy context is the wind power density (WPD), as defined in Chang (2011), which combines the two Weibull parameters (A and k) according to
$\mathrm{WPD}=\frac{1}{2}\,\rho\,A^{3}\,\Gamma\left(1+\frac{3}{k}\right)$
with the air density ρ and the gamma function $\Gamma(x)=\int_{0}^{\infty}e^{-t}\,t^{x-1}\,\mathrm{d}t$.
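For illustration, the WPD can be evaluated numerically from the Weibull parameters. The sketch below assumes the standard Weibull-based expression WPD = ½ ρ A³ Γ(1 + 3/k); the function name `wind_power_density` and the default air density of 1.225 kg m^{−3} are assumptions, not values taken from this study.

```python
import math


def wind_power_density(A, k, rho=1.225):
    """Wind power density in W m^-2 from Weibull scale A (m/s) and shape k.

    rho is the air density in kg m^-3; 1.225 is an assumed standard value.
    """
    return 0.5 * rho * A ** 3 * math.gamma(1.0 + 3.0 / k)


# Example with illustrative (not measured) Weibull parameters.
print(f"{wind_power_density(A=10.0, k=2.0):.1f} W m^-2")
```

Because A enters with the third power, small deviations in the scale parameter translate into comparatively large deviations in WPD, whereas k acts only through the gamma function.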
In Fig. 12, we show the results for WPD in the same way as for the other measures in Fig. 6. Again, it is clear how the gap-filling procedure reduces the impact of the data gaps. The reduction is smaller than for the Weibull parameter A but larger than for k, with the RMSE value decreasing from 21.2 to 5.0 W m^{−2}.
In the presented investigations, we have only considered isolated single gaps in a measured time series. But the approach followed can be extended, in a straightforward way, to more complex scenarios including multiple gaps that may be more realistic or may correspond to a specific case of interest. We would then recommend the following procedure: the present scenario would first be generalised to such an extent that the available reference study case is sufficiently informative. If we want to evaluate the impact of a 20 d gap in February of a certain year, for instance, it may not be sufficient to study the impact of such a gap in the reference data from another period only for the month of February. Instead the scenario may be broadened to a 20 d gap in the winter season. For making this decision, some background knowledge of the general wind climate at the studied sites is required, which can be gained from (numerical) long-term datasets. A similar approach is recommended for the consideration of multiple gaps, for which not only the lengths of the individual gaps need to be taken into account but also their distance in time and possible correlation effects.
By carrying out the study for three different offshore sites and showing the systematic similarities and some deviations between the results, we provided a basis for the selection of suitable reference sites and datasets. Again, some knowledge of the general wind climate at a site is required to evaluate whether a certain study site is suitable or not for the estimation of an uncertainty that is then used for the evaluation of the measurements from another site. In general, however, we believe that this transfer of observations is possible and suggest using the available sites and datasets for this purpose. An extension of our study to further sites, moreover, may help to better understand how the impact of data gaps on wind statistics may vary from site to site and to take such findings into account for an even more refined estimation of the associated uncertainties.
When looking at the mitigation of the impact of data gaps in the measured time series – explicitly, with the applied gap-filling procedure and the use of an MCP procedure in connection with wind data from a numerical model – we have only applied one specific method and have not studied further how the results may change with the application of other approaches. In this context, it was important for us to have a procedure that is straightforward and easy to apply for all three sites in exactly the same way. But we definitely also believe that a refinement – e.g. by using more complex approaches and possibly also some fine-tuning for the individual sites – may show an optimised performance and with this less remaining impact of the data gaps on the wind statistics after gap filling. We believe that a specific gap-filling approach should be an integral part of the wind resource assessment process that is applied by a specific consultant for a specific site as it improves the wind statistics of the measured period and can potentially also reduce the uncertainty of the long-term assessment.
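To make the general idea concrete, a very simple gap-filling step can be sketched as follows. It assumes a plain least-squares linear relation between concurrent model and measurement data and omits any noise term; it is not the exact procedure applied in this study, and the function name `fill_gaps_linear` and the synthetic data are hypothetical.

```python
import numpy as np


def fill_gaps_linear(measured, model):
    """Fill NaN gaps in `measured` using a linear fit to concurrent `model` data."""
    measured = np.asarray(measured, dtype=float)
    model = np.asarray(model, dtype=float)
    valid = ~np.isnan(measured)
    # Correlate: least-squares line through all concurrent valid samples.
    slope, intercept = np.polyfit(model[valid], measured[valid], 1)
    # Predict: replace only the missing samples; measured values stay untouched.
    filled = measured.copy()
    filled[~valid] = slope * model[~valid] + intercept
    return filled


# Synthetic example: model data with a bias, measurement with a 30-sample gap.
rng = np.random.default_rng(1)
model = 8.0 + 2.0 * rng.standard_normal(500)
measured = 1.05 * model + 0.3 + 0.5 * rng.standard_normal(500)
measured[200:230] = np.nan

filled = fill_gaps_linear(measured, model)
print("remaining gaps:", int(np.isnan(filled).sum()))
```

A deterministic fit of this kind reproduces the mean well but tends to underestimate the variance of the filled values, which is one reason why more refined approaches may perform better for distribution parameters such as the Weibull shape.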
It should also be pointed out that it is possibly not the optimal approach to apply the same type of MCP procedure for both the gap-filling and the long-term extrapolation steps. Depending on whether the simulation of time series (i.e. for a point prediction or filling gaps) or the simulation of a wind distribution or wind statistics is of interest, so-called type I or type II MCP methods may be the better choice (Hanslian, 2017). In short, type I MCP methods are designed to simulate time series whereas type II methods generate wind distributions. Whichever method is selected for the specific MCP task, this method should also be applied in the reference study to quantify the gap impact and estimate the associated uncertainty that is of high relevance for the (here: wind energy) application in any case.
The fact that we have not optimised the MCP methods for our applications may also be the reason for the initially counterintuitive observation that the gap-filling procedure, applied to the short-term measurements, has no positive effect on the long-term extrapolated results (see Sect. 4.3). Another reason is the relatively short gap of only 1 month, which is still within the availability of >90 % accepted by MEASNET. Furthermore, if we compare the deviations in Figs. 8 and 11, we see that the fluctuations in the long-term average are on the order of magnitude of the gap-corrected values and not on that of the uncorrected values. This means that the long-term correction already averages out the effects of the data gaps to some extent, so that the gap filling is no longer significant.
Finally, it must also be kept in mind that the quantified uncertainty is – when looking at the complete wind resource assessment process – not the only uncertainty that is associated with the long-term extrapolation. Another substantial uncertainty component arises from the fact that the considered short-term period for which on-site measurements are available has only a limited representativeness for the long term. Some of this is compensated for by the long-term extrapolation based on a “long” dataset itself, but it needs to be considered that a derived correction function always has some dependency on the available correlation period. This dependency and related variations in the results of the estimated wind statistics constitute another part of the uncertainty associated with the long-term extrapolation, not yet taken into account, in a wind resource assessment.
As in any field experiment, wind measurements that are typically carried out for site assessment studies are subject to data gaps due to failures of measurement devices or data loggers. In the harsh offshore wind climate, wind and wave conditions can lead to considerable time windows of inaccessibility of the measurement platform, no matter whether it is a floating platform (e.g. a buoy) or a mast. In our study we investigated the impact of these data gaps on typical statistical measures for wind energy siting applications such as mean wind speed and direction and the Weibull shape and scale parameters. The study was performed for three offshore sites with meteorological mast measurements available between July 2012 and June 2014 in the southern North Sea (FINO3 and IJmuiden) and the southern Baltic Sea (FINO2). We proposed a gap-filling procedure that uses data from mesoscale meteorological modelling and studied the benefit of the gap filling in terms of the RMSE of the siting statistics. The study reports the following key results.

A gap of 30 d in the dataset leads to an RMSE of up to about 0.1 m s^{−1} in the mean wind speed and the Weibull scale parameter A, about 0.02 in the Weibull shape parameter k and 3^{∘} in the mean wind direction.

The gap filling with mesoscale data can considerably reduce this impact, by up to a factor of 3 on the Weibull shape parameter and a factor of 10 on the three other investigated siting parameters (mean wind speed, mean wind direction and Weibull scale parameter).

The impact of the data gaps increases monotonically and almost linearly with the length of the data gap when considering the full-year wind climate, and so does the reduction of this impact by the gap filling. However, when looking at different seasons, the skill of the gap filling differs.

The key conclusions are similar for the three investigated sites, although the impact of gaps differs, with the highest impact on the data from the FINO3 mast, which is the only mast with a prominent contribution of northwesterly winds and also the mast located furthest offshore.

The impact of the gaps on the long-term estimate, expressed here in terms of a 30-year wind climatology, is very small (around 0.01 m s^{−1} at all three sites) and cannot be substantially further reduced by the gap filling of the reference measurement dataset.
Our investigation focused on three European offshore sites in the North Sea and Baltic Sea and could in future studies be extended to other offshore exploration areas with more diverse wind distributions in speed and direction. We intentionally focused on three commonly used key wind energy siting statistics. With the trend towards a grid-load-based remuneration of wind power, an investigation of the impact of data gaps on daily cycles might be of interest for future investigations.
The mesoscale model data are available upon request, and the mesoscale model itself is open source and can be obtained from NCAR (2021, https://doi.org/10.5065/D6MK6B4K). The WRF model simulations were initialized using ERA5 and OSTIA data downloaded from the Copernicus Climate Change Service Climate Data Store (https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysisera5pressurelevels?tab=overview, Copernicus CDS, 2021), and Copernicus Marine Service (https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=SST_GLO_SST_L4_NRT_OBSERVATIONS_010_001, Copernicus CMS, 2021). The mast data are publicly available for scientific purposes via BSH and TNO.
JG performed the gap analysis and the implementation of filling and analysis procedures. MD prepared the measurement and reanalysis data and conducted the mesoscale model simulations for the gap filling. Both authors discussed the results and wrote and reviewed the manuscript.
The authors declare that they have no conflict of interest.
The simulations were performed at the HPC Cluster EDDY, located at the University of Oldenburg (Germany) and funded by BMWi (ref. no. 0324005). The study here was motivated by the results of two master's thesis projects: we acknowledge Bilke Engelbrecht and Christine Martens for their very valuable previous work. We thank BSH for providing access to the FINO2 and FINO3 data and TNO for the data of the IJmuiden met mast.
This research was partly carried out in the framework of the projects Digitale Windboje (ref. no. 03EE3024) and NEWA (ref. no. 0325832A) funded by the German Federal Ministry for Economic Affairs and Energy (BMWi) on the basis of a decision by the German Bundestag with further financial support from NEWA ERANET Plus, topic FP7ENERGY.2013.10.1.2, the latter only for NEWA.
This paper was edited by Jakob Mann and reviewed by two anonymous referees.
Baas, P., Bosveld, F. C., and Burgers, G.: The impact of atmospheric stability on the near-surface wind over sea in storm conditions, Wind Energy, 19, 187–198, https://doi.org/10.1002/we.1825, 2016.
Carta, J. A., Velázquez, S., and Cabrera, P.: A review of measure-correlate-predict (MCP) methods used to estimate long-term wind characteristics at a target site, Renew. Sust. Energ. Rev., 27, 362–400, https://doi.org/10.1016/j.rser.2013.07.004, 2013.
Chang, T. P.: Performance comparison of six numerical methods in estimating Weibull parameters for wind energy application, Appl. Energ., 88, 272–282, https://doi.org/10.1016/j.apenergy.2010.06.018, 2011.
Copernicus CDS: Copernicus Climate Data Store, available at: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysisera5pressurelevels?tab=overview, last access: 8 April 2021.
Copernicus CMS: Copernicus Marine Service, available at: https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=SST_GLO_SST_L4_NRT_OBSERVATIONS_010_001, last access: 8 April 2021.
Donlon, C. J., Martin, M., Stark, J., Roberts-Jones, J., Fiedler, E., and Wimmer, W.: The Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) system, Remote Sens. Environ., 116, 140–158, https://doi.org/10.1016/j.rse.2010.10.017, 2012.
Dörenkämper, M.: An investigation of the atmospheric influence on spatial and temporal power fluctuations in offshore wind farms, Dissertation, Carl von Ossietzky Universität, Oldenburg, 2015.
Dörenkämper, M., Optis, M., Monahan, A., and Steinfeld, G.: On the Offshore advection of Boundary-Layer Structures and the Influence on Offshore Wind Conditions, Bound.-Lay. Meteorol., 155, 459–482, https://doi.org/10.1007/s105460150008x, 2015.
Dörenkämper, M., Stoevesandt, B., and Heinemann, D.: Derivation of an offshore wind index for the German bight from high-resolution mesoscale simulation data, Proceedings of DEWEK – German Offshore Wind Energy Conference, 5, 17–18 October 2017, available at: http://publica.fraunhofer.de/documents/N484817.html (last access: 8 April 2021), 2017.
Dörenkämper, M., Olsen, B. T., Witha, B., Hahmann, A. N., Davis, N. N., Barcons, J., Ezber, Y., García-Bustamante, E., González-Rouco, J. F., Navarro, J., Sastre-Marugán, M., Sīle, T., Trei, W., Žagar, M., Badger, J., Gottschall, J., Sanz Rodrigo, J., and Mann, J.: The Making of the New European Wind Atlas – Part 2: Production and evaluation, Geosci. Model Dev., 13, 5079–5102, https://doi.org/10.5194/gmd1350792020, 2020.
FGW e.V.: Technical Guidelines for Wind Turbines – Part 6 (TG6) Determination of Wind Potential and Energy Yield, Richtlinie, Fördergesellschaft Windenergie und andere Dezentrale Energien, Berlin, Germany, 2017.
FINO2: FINO2 measurement platform – Installation Protocol, Tech. Rep., 152 pp., Wind Consult, Bargeshagen, Germany, 2007.
FINO3: FINO3 measurement platform – Technical Note, Tech. Rep., 57 pp., GL – Garrad Hassan, GLGH4257 12 08840 266T0001A, Hamburg, Germany, 2012.
Gottschall, J., Gribben, B., Stein, D., and Würth, I.: Floating lidar as an advanced offshore wind speed measurement technique: current technology status and gap analysis in regard to full maturity, WIRes. Energy Environ., 6, 5, https://doi.org/10.1002/wene.250, 2017.
Gottschall, J., Catalano, E., Dörenkämper, M., and Witha, B.: The NEWA Ferry Lidar Experiment: Measuring Mesoscale Winds in the Southern Baltic Sea, Remote Sens., 10, 1620, https://doi.org/10.3390/rs10101620, 2018.
Gryning, S.-E., Badger, J., Hahmann, A. N., and Batchvarova, E.: Current Status and Challenges in Wind Energy Assessment, in: Weather Matters for Energy, edited by Troccoli, A., Dubus, L., and Haupt, S. E., pp. 275–293, Springer, New York, NY, https://doi.org/10.1007/9781461492214_13, 2014.
Gryning, S.-E., Floors, R., Peña, A., Batchvarova, E., and Brümmer, B.: Weibull Wind-Speed Distribution Parameters Derived from a Combination of Wind-Lidar and Tall-Mast Measurements Over Land, Coastal and Marine Sites, Bound.-Lay. Meteorol., 159, 329–348, https://doi.org/10.1007/s105460150113x, 2016.
Hahmann, A. N., Sīle, T., Witha, B., Davis, N. N., Dörenkämper, M., Ezber, Y., García-Bustamante, E., González-Rouco, J. F., Navarro, J., Olsen, B. T., and Söderberg, S.: The making of the New European Wind Atlas – Part 1: Model sensitivity, Geosci. Model Dev., 13, 5053–5078, https://doi.org/10.5194/gmd1350532020, 2020.
Hanslian, D.: The matrix of measure-correlate-predict methods, Proceedings of ICEM 2017, 27–29 June 2017, Bari, Italy, available at: https://www.wemcouncil.org/wp/wpcontent/uploads/2017/10/icem_hanslian_20170628_1240_sala_2.pdf (last access: 8 April 2021), 2017.
Hersbach, H. and Dick, D.: ERA5 reanalysis is in production, http://www.ecmwf.int/en/newsletter/147/news/era5reanalysisproduction (last access: 13 July 2020), 2016.
Hersbach, H., Bell, B., Berrisford, P., et al.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, qj.3803, https://doi.org/10.1002/qj.3803, 2020.
Kalverla, P., Steeneveld, G.-J., Ronda, R., and Holtslag, A. A.: Evaluation of three mainstream numerical weather prediction models with observations from meteorological mast IJmuiden at the North Sea, Wind Energy, 22, 34–48, https://doi.org/10.1002/we.2267, 2019.
Körner, P., Kronenberg, R., Genzel, S., and Bernhofer, C.: Introducing Gradient Boosting as a universal gap filling tool for meteorological time series, Meteorol. Z., 27, 369–376, https://doi.org/10.1127/metz/2018/0908, 2018.
Leiding, T., Tinz, B., Gates, L., Rosenhagen, G., Herklotz, K., Senet, C., Outzen, O., Lindenthal, A., Neumann, T., Frühman, R., Wilts, F., Bégué, F., Schwenk, P., Stein, D., Bastigkeit, I., Lange, B., Hagemann, S., Müller, S., and Schwabe, J.: Standardisierung und vergleichende Analyse der meteorologischen FINOMessdaten (FINO123), Tech. Rep., Final Report – FINOWind Research Project, Hamburg, Germany, available at: https://www.dwd.de/DE/forschung/projekte/fino_wind/fino_wind_node.html (last access: 8 April 2021), 2012.
MEASNET: Evaluation of Site Specific Wind Conditions, Tech. Rep., Measurement Network of Wind Energy Institutes, Madrid, Spain, available at: http://www.measnet.com/wpcontent/uploads/2016/05/Measnet_SiteAssessment_V2.0.pdf (last access: 25 October 2019), 2016.
NCAR: WRF Model User’s Page, WRF Version 4.0.1, https://doi.org/10.5065/D6MK6B4K, 2021.
Olauson, J.: ERA5: The new champion of wind power modelling?, Renew. Energ., 126, 322–331, https://doi.org/10.1016/j.renene.2018.03.056, 2018.
Pappas, C., Papalexiou, S., and Koutsoyiannis, D.: A quick gap filling of missing hydrometeorological data, J. Geophys. Res.-Atmos., 119, 9290–9300, https://doi.org/10.1002/2014JD021633, 2014.
Peña, A., Gryning, S.-E., and Floors, R.: Lidar observations of marine boundary-layer winds and heights: a preliminary study, Meteorol. Z., 24, 581–589, https://doi.org/10.1127/metz/2015/0636, 2015.
Poveda, J. M., Wouters, D., and Nederland, S.: Wind measurements at meteorological mast IJmuiden, Tech. Rep., ECN – Energy Center of the Netherlands, Petten, the Netherlands, available at: https://publicaties.ecn.nl/PdfFetch.aspx?nr=ECNE14058 (last access: 25 October 2019), 2015.
Rohrig, K., Berkhout, V., Callies, D., Durstewitz, M., Faulstich, S., Hahn, B., Jung, M., Pauscher, L., Seibel, A., Shan, M., Siefert, M., Steffen, J., Collmann, M., Czichon, S., Dörenkämper, M., Gottschall, J., Lange, B., Ruhle, A., Sayer, F., Stoevesandt, B., and Wenske, J.: Powering the 21st century by wind energy–Options, facts, figures, Appl. Phys. Rev., 6, 031 303, https://doi.org/10.1063/1.5089877, 2019.
Skamarock, W., Klemp, J., Dudhia, J., Gill, D., Liu, Z., Berner, J., Wang, W., Powers, J., Duda, M. G., Barker, D., and Huang, X.-Y.: A description of the advanced research WRF version 3, Technical Report, 162 pages NCAR/TN556+ STR, NCAR – National Center for Atmospheric Research, Boulder, CO, USA, https://doi.org/10.5065/1dfh6p97, 2019.
Thøgersen, M., Svenningsen, L., and Sørensen, T.: ERA5 – The (Not So) Long Term Reference Wind Data – years 2010–2016, available at: http://www.emd.dk/files/windpro/20170829_ERA5_WindPRO_ReleaseNote.pdf (last access: 8 April 2021), 2017.
van Bebber, W. J.: Die Zugstrassen der barometrischen Minima, Meteorol. Z., 8, 361–366, 1891.
Wessel, P. and Smith, W. H. F.: A global, self-consistent, hierarchical, high-resolution shoreline database, J. Geophys. Res.-Sol. Ea., 101, 8741–8743, https://doi.org/10.1029/96JB00104, 1996.
WRF Users Page: WRF Model Physics Options and References, available at: https://www2.mmm.ucar.edu/wrf/users/physics/phys_references.html (last access: 8 April 2021), 2020.
https://www.windopzee.net/en/locations/meteomastijmuidenmmij/ (last access: 8 April 2021).
https://www.fino2.de/en/ (last access: 8 April 2021).
https://www.fino3.de/en/ (last access: 8 April 2021).
https://github.com/newawind/Mesoscale/tree/master/postproc (last access: 13 July 2020).