Like almost all measurement datasets, wind energy siting data are subject to data gaps, which can originate, for instance, from a failure of the measurement devices or data loggers. This is particularly true for offshore wind energy sites, where the harsh climate can restrict the accessibility of the measurement platform, which can also lead to much longer gaps than at onshore sites. In this study, we investigate the impact of data gaps, in terms of a bias in the estimation of siting parameters, and its mitigation by correlation with and filling from mesoscale model data. Investigations are performed for three offshore sites in Europe, considering 2 years of parallel measurement data at the sites, and are based on typical wind energy siting statistics. We find a mitigation of the data gaps' impact, i.e. a reduction of the observed biases, by a factor of 10 for mean wind speed, direction and Weibull scale parameter and by a factor of 3 for the Weibull shape parameter. With increasing gap length, the gaps' impact increases linearly for the overall measurement period, while this behaviour is more complex when investigated in terms of seasons. This considerable reduction of the impact of the gaps, found for the statistics of the measurement time series, almost vanishes when considering long-term corrected data, for which 30 years of reanalysis data are used as reference.

A wind resource assessment is performed at the beginning of every wind energy project. The wind resource is estimated for a pre-selected site and with respect to the expected lifetime of the project, i.e. for the 20–30 years in the future during which the wind turbines will be operated at the site.

A wind resource assessment is typically based on a short-term measurement on site, which is conducted several years prior to the installation of the wind farm and has a duration on the order of a year.

Almost all measured time series have data gaps due to failures of the sensors themselves, of a data logger or of the power supply, or due to adverse conditions such as a low aerosol concentration or unwanted fixed echoes for remote sensing devices.

Up to a certain threshold of frequency and length of data gaps, the long-term extrapolation, which in the standard procedures involves some correlation of measured and reference data for the overlapping period, is often applied to the not fully continuous time series.

Gap filling is a task that is not specific to the wind resource assessment application but can be of relevance for any measured time series or collected dataset where data gaps may significantly affect the outcome of the subsequent data analysis. In the most general context, procedures to compensate for missing values in a dataset are referred to as imputation. The various imputation procedures have in common that missing data are not simply ignored but replaced by plausible values.
Specific gap-filling procedures for meteorological time series are discussed in

linear interpolation from adjacent time steps (particularly for cases where only a few data points are missing);

autoregressive models (for longer periods of missing data and without adjacent sites as possible predictors);

different methods of spatial interpolation (in case adjacent sites are available);

data-driven methods like nearest-neighbour approaches, linear or multiple linear regression, look-up tables, or artificial neural networks.
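The first approach in this list can be illustrated with a minimal Python sketch: runs of missing values (NaN) of at most `max_gap` time steps are filled by linear interpolation between the adjacent valid values, while longer runs and runs at the start or end of the series are left untouched. The function name and the threshold `max_gap` are hypothetical choices for illustration and are not taken from the literature discussed above.

```python
import numpy as np


def interpolate_short_gaps(values, max_gap=3):
    """Linearly interpolate interior NaN runs of at most `max_gap` samples.

    Longer runs, and runs touching either end of the series, stay NaN, since
    linear interpolation is only suitable when a few data points are missing.
    """
    x = np.asarray(values, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(len(x))
    i = 0
    while i < len(x):
        if not missing[i]:
            i += 1
            continue
        # Find the end (exclusive) of the current run of missing values.
        j = i
        while j < len(x) and missing[j]:
            j += 1
        run_length = j - i
        # Fill only interior runs that are short enough.
        if i > 0 and j < len(x) and run_length <= max_gap:
            x[i:j] = np.interp(idx[i:j], [i - 1, j], [x[i - 1], x[j]])
        i = j
    return x
```

For example, `interpolate_short_gaps([1.0, np.nan, 3.0])` fills the single missing value with 2.0, whereas a run of four missing values would remain NaN with the default threshold.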

The overall scope of the study is as follows: before discussing a specific, selected gap-filling approach, we investigate how data gaps affect standard wind resource estimates by deriving and evaluating bias and uncertainty measures for wind time series with artificial gaps of varying length and seasonal period of occurrence. We repeat this analysis for the time series in which the gaps are filled and thereby study to what extent the impact of the gaps can be mitigated. The study is applied to the statistics of the short-term dataset, defined by the measurement period, as well as to the final long-term estimate, since both sets of results are relevant in the wind energy context. By deriving and comparing conclusions for three different offshore sites – in the German Bight, the Dutch North Sea and the Baltic Sea – we also address the impact of the site and possible site dependencies.

The article is structured as follows: in Sect.

The data basis consists of met mast measurement data from a measurement period characterised by high data availability, on the basis of which the influence of measurement gaps and their filling is investigated (Sect.

The analyses in this study are done independently for three different offshore met masts representing different typical sites for offshore wind energy utilisation in Europe. Two masts are located in the North Sea (FINO3 and IJmuiden), both about 50 km offshore from the nearest coastline and with large wind direction sectors for which the nearest coastline is several hundred kilometres upstream. The third mast (FINO2) is located in the central southern Baltic Sea and is surrounded by land within 50 km or less, except for a small wind direction sector. The sites were chosen to represent typical European offshore wind exploration areas with different distances to the coasts and varying atmospheric stability.

Position of the three sites (met masts) investigated in the framework of this study. The red boxes mark the sizes of the innermost domains used for the mesoscale modelling (see

In Fig.

As the aim of this study is to investigate the impact of gaps on offshore wind-energy-relevant wind statistics, a reference time series with only a small fraction of missing data was needed. Thus, besides selecting a 2-year period with few gaps, we filled the remaining gaps with measurement data from lower heights. To account for the dependence of wind speed on height, a speed-up factor (sup) is defined according to

The

The

Data availability at the three masts after filling in the pre-processing step. The light colours indicate values that were filled by measurements from lower heights as described above.
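A minimal sketch of this pre-processing step, assuming that the speed-up factor is the ratio of the mean wind speeds at the upper and the lower measurement height over all time steps at which both are available, and that a missing upper-level value is replaced by the concurrent lower-level value multiplied by this factor. Both the ratio-of-means definition and the function name are assumptions for illustration; the study's definition of sup may differ, for example by being derived per direction sector.

```python
import numpy as np


def fill_with_lower_level(u_top, u_low):
    """Fill gaps in the top-level wind speed from a lower measurement level.

    Assumed definition: sup = mean(u_top) / mean(u_low), evaluated over the
    time steps where both levels have valid (non-NaN) data.
    """
    u_top = np.asarray(u_top, dtype=float)
    u_low = np.asarray(u_low, dtype=float)
    both_valid = ~np.isnan(u_top) & ~np.isnan(u_low)
    sup = u_top[both_valid].mean() / u_low[both_valid].mean()

    filled = u_top.copy()
    # Replace only samples that are missing at the top but available below.
    use_low = np.isnan(u_top) & ~np.isnan(u_low)
    filled[use_low] = sup * u_low[use_low]
    return filled, sup
```

Samples that are missing at both heights remain NaN and would have to be handled separately.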

Wind speed and wind direction distributions for the three datasets of measurements are shown in Fig.

Wind speed

Derived statistics for wind direction and wind speed distributions for the three datasets of site measurements.
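Statistics of this kind can be computed from a time series with missing values as in the following minimal sketch, which returns the mean wind speed, a circular mean wind direction (the direction of the mean unit vector) and the Weibull scale A and shape k. The Weibull parameters are estimated with the widely used empirical moment-based estimator k ≈ (σ/ū)^−1.086 and A = ū/Γ(1 + 1/k). This estimator and the unit-vector mean direction are assumptions for illustration; the study may have used a different fitting procedure, for example maximum likelihood.

```python
import math

import numpy as np


def siting_statistics(speed, direction):
    """Mean speed, circular mean direction and Weibull A, k of a series with NaN gaps.

    Missing values (NaN) are simply ignored, i.e. the gaps are not filled.
    """
    speed = np.asarray(speed, dtype=float)
    direction = np.asarray(direction, dtype=float)
    u = speed[~np.isnan(speed)]
    d = np.radians(direction[~np.isnan(direction)])

    mean_speed = float(u.mean())
    # Circular mean: direction of the mean unit vector, mapped to [0, 360) degrees.
    mean_dir = math.degrees(math.atan2(np.sin(d).mean(), np.cos(d).mean())) % 360.0

    # Empirical moment-based Weibull estimator (an assumption, see lead-in).
    k = float((u.std(ddof=1) / mean_speed) ** -1.086)
    A = mean_speed / math.gamma(1.0 + 1.0 / k)
    return {"mean_speed": mean_speed, "mean_dir": mean_dir, "weibull_A": A, "weibull_k": k}
```

The circular mean avoids the artefact of an arithmetic mean, which would, for example, average 350° and 10° to 180° instead of 0°.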

The procedures applied for this study make use of regional mesoscale modelling data that are used for the gap filling, as well as long-term reanalysis data that are applied for long-term referencing of the wind measurements. These data sources are described separately below.

For the long-term extrapolation, the data from the ERA5 reanalysis

The mesoscale model data in this study are used for filling the gaps that are artificially cut into the time series. In principle any mesoscale model data could be used, such as those from the publicly available New European Wind Atlas (NEWA)

The simulations were carried out using the Weather Research and Forecasting (WRF) model

Mesoscale model domain distribution around the FINO3 site. The red boxes mark the extension of the computing domains. The coastline and border data originate from the GSHHS dataset

Boundary conditions for the model were prescribed by the ERA5 dataset for the atmospheric variables

Relevant parameters of the setup for the mesoscale simulations applied in this study. The references for the different schemes and models are summarised in

Table

The methods applied for our study are described in the following subsections and demonstrated on the basis of the FINO3 dataset.

The first step of the analysis consists of generating artificial data gaps in the measured time series. This is demonstrated in Fig.
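A minimal sketch of this step, assuming a regularly sampled time series stored in NumPy arrays with `datetime64` time stamps: all values inside the window [start, start + gap length) are set to NaN, while the original series is left unchanged. The function name and the NaN-masking convention are illustrative assumptions.

```python
import numpy as np


def cut_artificial_gap(times, values, start, gap_days):
    """Return a copy of `values` with NaN inside [start, start + gap_days)."""
    times = np.asarray(times, dtype="datetime64[m]")
    start = np.datetime64(start, "m")
    end = start + np.timedelta64(int(gap_days * 24 * 60), "m")
    gapped = np.asarray(values, dtype=float).copy()
    # Mask every time stamp that falls inside the gap window.
    gapped[(times >= start) & (times < end)] = np.nan
    return gapped
```

For a 10 min series, `cut_artificial_gap(times, speed, "2013-09-30", 30)` would remove 30 × 144 = 4320 samples, corresponding to the 30 d example gap.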

Generation and filling of artificial gaps – here demonstrated on the basis of the FINO3 wind speed time series and for a gap of 30 d length starting on 30 September 2013. Original time series (only an excerpt is shown) are in black, incomplete time series with a generated gap are in red and time series with a filled gap (see procedure described in

The artificial gaps are filled based on a measure–correlate–predict (MCP) procedure and with the WRF data introduced in

For the wind speed data, we – first – bin the wind speeds every 0.5

For the wind direction data, again first the mean deviation between measured and simulated wind direction per 10
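For wind direction, deviations must be computed on the circle so that, for instance, 355° and 5° differ by 10° rather than 350°. The sketch below is a simplified assumption rather than the study's exact implementation: it derives the mean deviation (measured minus modelled) per 10° sector of the modelled direction as a circular mean and adds it to the modelled direction. It expects concurrent pairs of valid (non-NaN) values, and the function names are hypothetical.

```python
import numpy as np


def wrap_deg(angle):
    """Map angles in degrees to the interval [-180, 180)."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0


def direction_offsets(dir_meas, dir_model, sector_width=10.0):
    """Circular mean deviation (measured - model) per sector of the model direction."""
    dir_model = np.asarray(dir_model, dtype=float)
    dev = wrap_deg(np.asarray(dir_meas, dtype=float) - dir_model)
    sector = (dir_model % 360.0 // sector_width).astype(int)
    n_sectors = int(round(360.0 / sector_width))
    offsets = np.zeros(n_sectors)  # sectors without data get no correction
    for s in range(n_sectors):
        in_sector = sector == s
        if in_sector.any():
            r = np.radians(dev[in_sector])
            offsets[s] = np.degrees(np.arctan2(np.sin(r).mean(), np.cos(r).mean()))
    return offsets


def correct_direction(dir_model, offsets, sector_width=10.0):
    """Add the sector-wise offset to the model direction, result in [0, 360)."""
    dir_model = np.asarray(dir_model, dtype=float)
    sector = (dir_model % 360.0 // sector_width).astype(int)
    return (dir_model + offsets[sector]) % 360.0
```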

In addition to the correction factor – resulting from either the correction function or the bin-wise mean offset – a noise factor is derived as the standard deviation of the data per bin and combined with a white-noise process in the prediction step. The noise factor ensures that the generated time series retains a realistic variability instead of being an overly smooth, corrected model series. For the prediction of the data in the gap period, numerical (WRF) data for this period are combined with the derived corrections. The resulting time series are inserted into the incomplete measurement time series.
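A minimal sketch of the wind speed part of this MCP gap filling, under stated assumptions: the bins are taken in the modelled (WRF) wind speed with a width of 0.5 m s⁻¹, the correction is the bin-wise mean offset (measured minus modelled) rather than a fitted correction function, and the noise term is the bin-wise standard deviation multiplied by standard-normal white noise. Bins with fewer than two samples receive no correction, and the model series is assumed to be complete. All function names are hypothetical.

```python
import numpy as np


def fit_speed_correction(u_meas, u_model, bin_width=0.5):
    """Bin-wise mean offset and standard deviation of (measured - model) per model-speed bin."""
    u_meas = np.asarray(u_meas, dtype=float)
    u_model = np.asarray(u_model, dtype=float)
    bins = np.floor(u_model / bin_width).astype(int)
    n_bins = bins.max() + 1
    offset = np.zeros(n_bins)
    noise = np.zeros(n_bins)
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.sum() >= 2:
            diff = u_meas[in_bin] - u_model[in_bin]
            offset[b] = diff.mean()
            noise[b] = diff.std(ddof=1)
    return offset, noise


def predict_speed(u_model_gap, offset, noise, bin_width=0.5, rng=None):
    """Corrected model speed plus bin-wise scaled white noise."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.asarray(u_model_gap, dtype=float)
    # Speeds outside the training range are assigned to the nearest available bin.
    bins = np.clip(np.floor(u / bin_width).astype(int), 0, len(offset) - 1)
    pred = u + offset[bins] + noise[bins] * rng.standard_normal(u.shape)
    return np.maximum(pred, 0.0)  # wind speed cannot be negative


def fill_speed_gap(u_gapped, u_model, bin_width=0.5, seed=0):
    """Fit the correction on the available data and insert predictions into the gap."""
    u_gapped = np.asarray(u_gapped, dtype=float)
    u_model = np.asarray(u_model, dtype=float)
    gap = np.isnan(u_gapped)
    offset, noise = fit_speed_correction(u_gapped[~gap], u_model[~gap], bin_width)
    filled = u_gapped.copy()
    filled[gap] = predict_speed(u_model[gap], offset, noise, bin_width,
                                np.random.default_rng(seed))
    return filled
```

Because of the random noise term, repeated fills differ slightly; fixing the seed makes a single run reproducible.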

Figure

As mentioned above, the procedure of generating an artificial gap in the measured time series and filling it with the outlined MCP approach is repeated for varying start dates of a gap with a pre-defined length (30 d in the example). This is demonstrated in Fig.

This example demonstrates that the impact of the generated gap varies considerably depending on when the gap starts, that similar patterns are observed for all four considered statistical measures and that the gap-filling procedure significantly reduces the impact in almost all cases. However, the performance of the gap-filling procedure also depends on the start date of the gap. This can be explained by the fact that the correlation between measured and model data, derived from the available data, represents the data within the gap to varying degrees; this correlation depends to a certain extent on seasonal effects, for example. To quantify the observed variations in the wind statistics, the corresponding root-mean-square error (RMSE) values are derived. For all four statistical measures, these decrease when the gaps are filled: for mean wind direction from 3.1 to 0.3
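The RMSE over gap start dates can be sketched as RMSE = sqrt((1/N) Σ_i (s_i − s_ref)²), where s_i is a statistic of the series with a gap starting at the i-th start position and s_ref is the same statistic of the complete series. Using the complete series as reference is an assumption here, as are the index-based gap positions and the step between start positions. The example below shows the case of ignored gaps with the mean wind speed as statistic.

```python
import numpy as np


def gap_rmse(values, gap_len, start_step, statistic=np.nanmean):
    """RMSE of a statistic over all gap start positions, relative to the complete series."""
    values = np.asarray(values, dtype=float)
    reference = statistic(values)
    deviations = []
    for start in range(0, len(values) - gap_len + 1, start_step):
        gapped = values.copy()
        # The gap is ignored: the NaN-aware statistic skips the missing samples.
        gapped[start:start + gap_len] = np.nan
        deviations.append(statistic(gapped) - reference)
    return float(np.sqrt(np.mean(np.square(deviations))))
```

For a gap-filled variant, the same loop would apply a filling step to `gapped` before evaluating the statistic.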

Variation in statistical measures –

Uncertainties or standard errors in the estimation of the parameters are not considered further, here or in the following, as they are small compared to the reduction of the gap impact, which is the focus of this study.

In a last step, the different time series are used as the basis for a long-term extrapolation of the wind time series and statistics. To this end, the measured time series with or without data gaps are correlated with ERA5 reanalysis data, which in this case are available for a long-term period of 30 years. The underlying MCP procedure is very similar to the one applied for the data gap filling in
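A minimal sketch of such a long-term extrapolation of the mean wind speed, assuming a bin-wise MCP of the same kind as sketched for the gap filling, here against a long-term reference series such as ERA5: offsets and noise levels are derived per 0.5 m s⁻¹ bin of the reference wind speed from the concurrent period and applied to the entire long-term reference series, whose mean is returned. Time steps with missing measurements are excluded from the fit, which corresponds to ignoring the gaps. The function name, the bin width and leaving bins without concurrent data uncorrected are assumptions.

```python
import numpy as np


def long_term_mean_speed(u_meas, u_ref_concurrent, u_ref_longterm, bin_width=0.5, seed=0):
    """Long-term mean speed via a bin-wise MCP against a long-term reference series."""
    u_meas = np.asarray(u_meas, dtype=float)
    u_ref_c = np.asarray(u_ref_concurrent, dtype=float)
    u_ref_lt = np.asarray(u_ref_longterm, dtype=float)

    # Gaps in the measurements are ignored in the correlation step.
    valid = ~np.isnan(u_meas) & ~np.isnan(u_ref_c)
    bins_c = np.floor(u_ref_c[valid] / bin_width).astype(int)
    bins_lt = np.floor(u_ref_lt / bin_width).astype(int)
    n_bins = max(bins_c.max(), bins_lt.max()) + 1

    diff = u_meas[valid] - u_ref_c[valid]
    offset = np.zeros(n_bins)  # bins without concurrent data stay uncorrected
    noise = np.zeros(n_bins)
    for b in np.unique(bins_c):
        d = diff[bins_c == b]
        offset[b] = d.mean()
        noise[b] = d.std(ddof=1) if d.size > 1 else 0.0

    rng = np.random.default_rng(seed)
    predicted = u_ref_lt + offset[bins_lt] + noise[bins_lt] * rng.standard_normal(u_ref_lt.shape)
    return float(np.maximum(predicted, 0.0).mean())
```

The random noise term means that repeated runs give slightly different long-term means, even for a complete measurement series.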

The overall workflow followed in the study is summarised in Fig.

Workflow followed in our study including the MCP approaches for the gap-filling and long-term extrapolation procedures.

In this section we present in detail the results of the study, expanding on the impact of gaps on the wind resource estimate with varying start dates (Sect.

Figure

Variation in statistical measures –

RMSE derived for the four statistics considered in this study for time series with gaps and gap-filled time series of wind speed and direction for the three investigated sites.

For mean wind direction, mean wind speed and Weibull scale parameter

In the next step, we repeated this analysis for different gap lengths between 6 and 90 d. Figure

Dependency of impact of data gaps, quantified as RMSE of the four statistical measures –

In all considered cases, the impact of the gap (reflected by the derived RMSE) increases with gap length and is significantly reduced when applying the gap-filling procedure. For the mean wind speed, just like for the Weibull

Apart from these general agreements, the results for the three sites show some deviations. For instance, when the gaps are ignored, their impact on the mean wind speed is largest for the IJmuiden site, but it can also be compensated for best there, as shown by the smallest RMSE values after gap filling compared to those for the two other sites. This observation may be explained either by the performance of the numerical model used for the respective site or by a statistical effect related to the level of observed wind speeds. (Recall that IJmuiden showed the highest measured mean wind speeds in the considered 2-year period.)

The impact of (ignored) gaps in the wind direction time series is highest for the FINO3 dataset. This can be understood by looking again at the wind direction distributions in Fig.

Dependency of impact of data gaps, quantified as RMSE, on the length of the data gaps and season (defined according to the gap start date) – as in Fig.

In a further step, we studied how the impact of data gaps varies with the season in which the data gap occurs. For this, the “season” is defined by the start date of a gap: a gap starting in the months of January to March is assigned to “season 1”, a gap starting between April and June to “season 2”, and so on. These seasons were selected with a shift of 1 month compared to the classical meteorological definition of spring, summer, autumn and winter to account for the inertia of the heating and cooling of the sea surface, which mainly drives the yearly cycle of atmospheric stability, which in turn affects the wind distribution.
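The season assignment described above reduces to a simple mapping of the start month, sketched below together with a helper that groups gap start dates by season. The daily step between start dates and the function names are hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def gap_season(start: date) -> int:
    """Season of a gap by its start date: Jan-Mar -> 1, Apr-Jun -> 2, Jul-Sep -> 3, Oct-Dec -> 4."""
    return (start.month - 1) // 3 + 1


def group_start_dates_by_season(first: date, last: date, step_days: int = 1):
    """Group all gap start dates between `first` and `last` (inclusive) by season."""
    groups = defaultdict(list)
    current = first
    while current <= last:
        groups[gap_season(current)].append(current)
        current += timedelta(days=step_days)
    return dict(groups)
```

For a non-leap year with daily start dates, the four seasons contain 90, 91, 92 and 92 start dates, respectively.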

Figure

In a final step, we derive the impact of ignored and filled gaps in the measurement data on a long-term extrapolated mean wind speed. For this, we followed the procedure outlined in

Variation in long-term corrected mean wind speed depending on start date of the artificial 30 d gap in short-term measurements. Results for FINO3 as solid lines, for IJmuiden as dashed lines and for FINO2 as dotted lines (again for incomplete short-term measurement time series in red and for filled time series in blue and for measurements without gap in black as reference).

The following conclusions can be drawn from Fig.

The long-term corrected mean wind speed is significantly different from the mean values of the 2 years of measurements for the three considered sites. (Mean wind speed values of the used 30-year-long ERA5 time series are 9.43, 9.91 and 9.26

The impact of data gaps in the short-term measurements is visible in the long-term estimates but is rather small with an RMSE (for ignored gaps, red curves) of 0.011

This impact of the data gaps in the short-term measurement on the long-term estimates is not really mitigated through the application of the gap-filling procedure; corresponding RMSE values (for filled gaps, blue curves) are in the same range or even slightly larger with 0.015

Also, the reference values (black curves) show some variability, which is due to the noise process as part of the MCP procedure applied for the long-term extrapolation. The RMSE values reflecting these variations are equal for all three sites with 0.003

Our study proposes a methodology that allows us to quantify the impact of data gaps in (measured) time series on wind statistics. With the three studied sites, we have considered three possible reference datasets, which could be referred to for further sites where only incomplete time series are available but no suitable reference. The reference quantification can then be used to deduce an uncertainty associated with the inherent gaps, which could be related to the RMSE value derived for the variations in the wind statistics for different gap start dates for a fixed gap length. Alternatively, for a more conservative approach, the maximum deviations in the wind statistics observed in the reference study could be considered or, in case more details are available, the identified variations for a specific season could be considered.
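The two options for deriving an uncertainty from such a reference study can be sketched as follows. The input `deviations` is assumed to hold, for each gap start date, the difference between a statistic from the gapped (or gap-filled) series and its reference value; this input format and the function name are hypothetical. The RMSE is returned by default, and the maximum absolute deviation is returned as the more conservative choice.

```python
import numpy as np


def gap_uncertainty(deviations, conservative=False):
    """Uncertainty from deviations over gap start dates: RMSE or maximum absolute deviation."""
    d = np.asarray(deviations, dtype=float)
    if conservative:
        return float(np.max(np.abs(d)))
    return float(np.sqrt(np.mean(d ** 2)))
```

Restricting `deviations` to the start dates of one season gives the season-specific variant mentioned above.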

For our study, we have analysed the four statistical measures

In Fig.

Variation in wind power density (WPD), as an alternative statistical measure to those shown already in Fig.
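The wind power density can be computed either directly from a time series as WPD = ½ ρ mean(u³) or from the Weibull parameters as WPD = ½ ρ A³ Γ(1 + 3/k). Because of the cubic dependence on wind speed, it weights high wind speeds strongly. The sketch below assumes a constant standard air density of ρ = 1.225 kg m⁻³; the study may instead have used a site-specific or time-varying density.

```python
import math

import numpy as np

RHO_AIR = 1.225  # kg m^-3, assumed constant standard air density


def wpd_from_series(speed, rho=RHO_AIR):
    """Wind power density (W m^-2) from a speed series; NaN gaps are ignored."""
    u = np.asarray(speed, dtype=float)
    return 0.5 * rho * np.nanmean(u ** 3)


def wpd_from_weibull(A, k, rho=RHO_AIR):
    """Wind power density (W m^-2) of a Weibull distribution with scale A and shape k."""
    return 0.5 * rho * A ** 3 * math.gamma(1.0 + 3.0 / k)
```

For A = 10 m s⁻¹ and k = 2, the Weibull-based value is about 814 W m⁻², whereas a constant speed of 10 m s⁻¹ gives 612.5 W m⁻².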

In the presented investigations, we have only considered isolated single gaps in a measured time series. However, the approach can be extended in a straightforward way to more complex scenarios, including multiple gaps, which may be more realistic or may correspond to a specific case of interest. We would then recommend the following procedure: the scenario at hand is first generalised to the extent that the available reference study case is sufficiently informative. If we want to evaluate the impact of a 20 d gap in February of a certain year, for instance, it may not be sufficient to study the impact of such a gap in the reference data from another period only for the month of February. Instead, the scenario may be broadened to a 20 d gap in the winter season. Making this decision requires some background knowledge of the general wind climate at the studied sites, which can be gained from (numerical) long-term datasets. A similar approach is recommended for the consideration of multiple gaps, for which not only the lengths of the individual gaps need to be taken into account but also their distance in time and possible correlation effects.
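As a starting point for such multi-gap scenarios, the sketch below cuts several gaps of given lengths, separated by given distances, into a series indexed by time steps. The gap lengths, the spacings and the function name are hypothetical inputs for illustration; possible correlation effects between the gaps are not modelled.

```python
import numpy as np


def cut_multiple_gaps(values, first_start, gap_lengths, spacings):
    """Set several gaps to NaN.

    gap_lengths[i] is the length of gap i in time steps; spacings[i] is the
    number of valid time steps between the end of gap i and the start of gap i + 1.
    """
    if len(spacings) != len(gap_lengths) - 1:
        raise ValueError("need exactly one spacing between each pair of consecutive gaps")
    gapped = np.asarray(values, dtype=float).copy()
    start = first_start
    for i, length in enumerate(gap_lengths):
        gapped[start:start + length] = np.nan
        if i < len(spacings):
            start += length + spacings[i]
    return gapped
```

Varying `first_start` while keeping the lengths and spacings fixed would then correspond to the start-date variation used for single gaps.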

By carrying out the study for three different offshore sites and showing the systematic similarities and some deviations between the results, we have provided a basis for the selection of suitable reference sites and datasets. Again, some knowledge of the general wind climate at a site is required to evaluate whether a certain study site is suitable for estimating an uncertainty that is then used to evaluate the measurements from another site. In general, however, we believe that such a transfer is possible and suggest using the available sites and datasets for this purpose. Moreover, an extension of our study to further sites may help to better understand how the impact of data gaps on wind statistics varies from site to site and to take such findings into account for an even more refined estimation of the associated uncertainties.

When looking at the mitigation of the impact of data gaps in the measured time series – explicitly, with the applied gap-filling procedure and the use of an MCP procedure in connection with wind data from a numerical model – we have applied only one specific method and have not further studied how the results may change with other approaches. In this context, it was important for us to have a procedure that is straightforward and can be applied to all three sites in exactly the same way. However, we also believe that a refinement, e.g. using more complex approaches and possibly some fine-tuning for the individual sites, may improve the performance and thus further reduce the remaining impact of the data gaps on the wind statistics after gap filling. We believe that a specific gap-filling approach should be an integral part of the wind resource assessment process applied by a consultant for a specific site, as it improves the wind statistics of the measured period and can potentially also reduce the uncertainty of the long-term assessment.

It should also be pointed out that it is possibly not the optimal approach to apply the same type of MCP procedure for both the gap-filling and the long-term extrapolation steps. Depending on whether the simulation of time series (i.e. for a point prediction or filling gaps) or the simulation of a wind distribution or wind statistics is of interest, so-called type I or type II MCP methods may be the better choice

The fact that we have not optimised the MCP methods for our applications may also be the reason for the initially counter-intuitive observation that the gap-filling procedure, applied to the short-term measurements, has no positive effect on the long-term extrapolated results (see

Finally, it must also be kept in mind that the quantified uncertainty is – when looking at the complete wind resource assessment process – not the only uncertainty that is associated with the long-term extrapolation. Another substantial uncertainty component arises from the fact that the considered short-term period for which on-site measurements are available has only a limited representativeness for the long term. Some of this is compensated for by the long-term extrapolation based on a “long” dataset itself, but it needs to be considered that a derived correction function always has some dependency on the available correlation period. This dependency and related variations in the results of the estimated wind statistics constitute another part of the uncertainty associated with the long-term extrapolation, not yet taken into account, in a wind resource assessment.

Like any field experiment, wind measurements that are typically carried out for site assessment studies are subject to data gaps due to failures of measurement devices or data loggers. In the harsh offshore climate, wind and wave conditions can lead to considerable time windows during which the measurement platform is inaccessible, regardless of whether it is floating, e.g. a buoy, or a mast. In our study, we investigated the impact of these data gaps on typical statistical measures for wind energy siting applications such as mean wind speed and direction and the Weibull shape and scale parameters. The study was performed for three offshore sites with meteorological mast measurements available between July 2012 and June 2014 in the southern North Sea (FINO3 and IJmuiden) and the southern Baltic Sea (FINO2). We proposed a gap-filling procedure that uses data from mesoscale meteorological modelling and studied the benefit of the gap filling in terms of the RMSE of the siting statistics. The study reports the following key results.

A gap of 30 d in the dataset leads to an RMSE on the mean wind speed of up to about 0.1

The gap filling with mesoscale data can considerably reduce this impact, by up to a factor of 3 for the Weibull shape parameter and a factor of 10 for the three other investigated siting parameters: mean wind speed, direction and Weibull scale parameter.

The impact of the data gaps increases monotonically and almost linearly with the length of the data gap when considering the full-year wind climate, and so does the reduction of this impact achieved by the gap filling. However, the skill of the gap filling differs between seasons.

The key conclusions are similar for the three investigated sites, although the impact of gaps differs with the highest impact on the data from the FINO3 mast that is the only mast with a prominent impact of northwesterly winds and also the mast that is located furthest offshore.

The impact of the gaps on the long-term estimate, expressed here in terms of a 30-year wind climatology, is very small (around 0.01

Our investigation focused on three European offshore sites in the North Sea and Baltic Sea and could in future studies be extended to other offshore exploration areas with more diverse wind distributions in speed and direction. We intentionally focused on three commonly used key wind energy siting statistics. With the trend towards a grid-load-based remuneration of wind power, an investigation of the impact of data gaps on daily cycles might be of interest for future studies.

The mesoscale model data are available upon request, and the mesoscale model itself is open source and can be obtained from

JG performed the gap analysis and the implementation of filling and analysis procedures. MD prepared the measurement and reanalysis data and conducted the mesoscale model simulations for the gap filling. Both authors discussed the results and wrote and reviewed the manuscript.

The authors declare that they have no conflict of interest.

The simulations were performed at the HPC Cluster EDDY, located at the University of Oldenburg (Germany) and funded by BMWi (ref. no. 0324005). The study was motivated by the results of two master's thesis projects; we acknowledge Bilke Engelbrecht and Christine Martens for their very valuable previous work. We thank BSH for providing access to the FINO2 and FINO3 data and TNO for the data of the IJmuiden met mast.

This research was partly carried out in the framework of the projects Digitale Windboje (ref. no. 03EE3024) and NEWA (ref. no. 0325832A) funded by the German Federal Ministry for Economic Affairs and Energy (BMWi) on the basis of a decision by the German Bundestag with further financial support from NEWA ERA-NET Plus, topic FP7-ENERGY.2013.10.1.2, the latter only for NEWA.

This paper was edited by Jakob Mann and reviewed by two anonymous referees.