The correct representation of wind speeds at hub height (e.g.,

In the second step, a statistical correction is performed using a neural network and a generalized linear model, both based on different variables of the reanalysis. Although only a few measurements by masts or lidars are available at hub height, the root-mean-squared error of the wind speed can be reduced by almost 30 %. A final comparison with radiosonde observations confirms the added value of combining the physical and statistical approaches in post-processing the wind speed.

The expansion of wind energy power production is expected to continue in the context of the ongoing transition towards renewable energies. In order to assess the potential of new sites for wind turbines, reliable estimates of past wind speeds and their variability, i.e., high-quality spatiotemporal climatologies, are needed at hub heights (around

Several studies show that some reanalysis data sets have a good fit to verifying mast or lidar observations at hub heights

Post-processing of wind speed is commonly applied to numerical weather prediction (NWP) but almost exclusively for the

Another method to improve the horizontal and vertical resolution of wind speed from existing data is to implement a diagnostic mass-consistent wind model

In this study, we combine a diagnostic wind model and statistical post-processing to improve the representation of wind speeds at 100 m above ground despite the low measurement density. Based on the COSMO-REA6 reanalysis

Does the introduction of the diagnostic wind model represent an added value?

Can we perform a profitable statistical post-processing despite the heterogeneity of the domain and the few measurement sites?

The remainder of the paper is structured as follows. In the following section, we first provide an overview of the observation sites used, as well as the COSMO-REA6 regional reanalysis. Then, we describe the wind model and the statistical post-processing utilizing artificial neural networks in Sect.

Our study is based on a data set of wind profile measurements of the lower boundary layer over Germany and the North and Baltic seas. Long-term observations of lower-boundary-layer wind speeds in Germany are only freely available at four measuring masts over land and three platforms on the sea. The land-based masts are located in Hamburg (HAM;

All measurements are well distributed over the domain (Fig.

Elevation in the study domain (colors) with observation sites (red dots) and radiosondes (blue dots). Dashed lines indicate subdomains of the diagnostic wind model.

Overview of mast and lidar observations.

Another source of observation data in the height range of wind turbines can be obtained from vertical soundings. The German Meteorological Service (DWD) operates 11 regular radiosondes as shown in Fig.

Overview of radiosonde observations. The last four columns show the number of observations between 2014 and 2018 at synoptic main times.

In addition to the observations, our wind speed post-processing relies on gridded estimates of the atmospheric state in the form of the regional reanalysis COSMO-REA6 developed in the context of the Hans Ertel Centre for Weather Research

COSMO-REA6 and wind model variables used in the statistical models with 2, 5, 18 and 21 predictors.

High-resolution terrain data are freely available through NASA's Shuttle Radar Topography Mission (SRTM). We use the gap-filled version of the SRTM data provided by

COSMO-REA6's horizontal resolution of approximately

Based on a variational approach

To solve Eq. (

As our focus is on Germany and adjacent regions, we first extract a subdomain of

Consequently, the matrix

To model different degrees of atmospheric stability, we choose

While the downscaled wind fields might be better in line with the orography, the data still have inherent uncertainties (e.g., fit of the COSMO-REA6 input to the orography, errors in COSMO-REA6, assumptions in the wind model) and thus may still deviate considerably from the truth, i.e., verifying observations. In order to correct the output of the diagnostic wind model, we apply a simple artificial neural network (ANN) to its output. The ANN consists of an input layer, two dense hidden layers with 50 nodes each and a linearly activated output layer. For the input and both hidden layers we use the rectified linear activation function. The number of nodes in the input layer varies with the number of input variables. The input variables are standardized to a mean of 0 and a standard deviation of 1. As the target variable we choose the deviation between the observed and COSMO-REA6 estimates of wind speed. The error of COSMO-REA6 should be more normally distributed than the wind speed itself, which allows us to use the mean squared error as the loss function. The optimizer is Adam with a learning rate of 0.001 and a batch size of 256. We also tried various other configurations for the ANN, e.g., with respect to the number of layers and nodes, as well as the batch size, but found the differences in results to be only marginal. Therefore, we focus here on the ANN settings described above, while results for the other configurations are provided in the appendix. For comparison with standard post-processing methods, we also run a generalized linear model (GLM).
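The network layout described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it only shows the standardization of the predictors, the forward pass and the mean-squared-error loss. Treating the input layer as a ReLU-activated dense layer with as many nodes as predictors is one reading of the description, and the function names (`standardize`, `init_ann`, `forward`, `mse`), the weight initialization and the synthetic data are assumptions made for illustration. Training with Adam (learning rate 0.001, batch size 256) is omitted.

```python
import numpy as np


def standardize(x):
    """Scale each predictor column to mean 0 and standard deviation 1."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / std, mean, std


def init_ann(n_inputs, n_hidden=50, seed=0):
    """Create weights for: input layer (n_inputs nodes), two hidden
    layers (n_hidden nodes each) and a single-node output layer.
    The He-style initialization is an assumption, not from the paper."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, n_inputs, n_hidden, n_hidden, 1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU on the input and hidden layers, linear activation on the output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return (h @ w + b).ravel()


def mse(pred, target):
    """Mean squared error, used as the loss function."""
    return float(np.mean((pred - target) ** 2))


# Synthetic example with 5 predictors and one batch of 256 samples.
rng = np.random.default_rng(1)
predictors = rng.normal(8.0, 3.0, size=(256, 5))
reanalysis_speed = predictors[:, 0]
observed_speed = reanalysis_speed + rng.normal(0.0, 1.0, size=256)

# Target: deviation between observed and reanalysis wind speed.
target = observed_speed - reanalysis_speed

scaled, mean, std = standardize(predictors)
params = init_ann(n_inputs=5)
predicted_deviation = forward(params, scaled)
loss = mse(predicted_deviation, target)

# The post-processed wind speed adds the predicted deviation back.
corrected_speed = reanalysis_speed + predicted_deviation
```

Because the network predicts the deviation rather than the wind speed itself, an untrained or poorly trained model that outputs values near zero falls back to the reanalysis estimate.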

We first look at the potential benefit of applying a diagnostic wind model to the reanalysis output. As an example, Fig.

Snapshot of COSMO-REA6 wind speed (colors) and direction (arrows) on 21 February 2015 at 12:00 UTC in western Germany

When we interpolate the COSMO-REA6 wind field onto the high-resolution grid and then run the diagnostic wind model, the differences in horizontal wind speed in this example are up to

The spatial pattern of the wind field is similar for all three configurations of the wind model. In the hilly terrain west and east of the Rhine Valley we see an increase in wind speeds compared to the reanalysis, while in the valley the wind speed is reduced. East of the Siebengebirge, i.e., downstream, the wind speed is also lower. In the lowlands, the adjustments are negligible.

Analyzing the wind direction, two interesting features are observed for the stable case (

Next, we evaluate the quality of the wind field from the diagnostic wind model with measurements. We employ the standard metric root-mean-squared error (RMSE), which is defined as the sum of the squared wind speed difference in the model, i.e., COSMO-REA6 (

Figure

Diurnal cycle of the improvement in RMSE (PI

Figure

The plot shows the change in RMSE compared to COSMO-REA6 for all 12 observation sites with positive values indicating an improvement over the reanalysis. Light green, green and dark green boxplots show the improvement against the COSMO-REA6 from the diagnostic wind model with

It can be seen that the RMSE for the three diagnostic wind models is close to that of COSMO-REA6. The GLMs and ANNs lead to a significant reduction in RMSE at all sites regardless of the number of input variables. For the offshore stations (FI1, FI2, FI3) the improvement is at least 5 %, while over land the values range from about 10 % for flat terrain (LIN) up to 30 % for hilly terrain (BW2). Further, the RMSE reduction becomes more pronounced for the GLMs and ANNs as the number of input variables increases, with the ANNs mostly outperforming the GLMs. It should be noted that the addition of the three wind speed estimates from the diagnostic wind model leads to a significant improvement especially for hilly terrain (e.g., at sites BW1 and BW3), while the effect is smaller at offshore or flat terrain locations (e.g., BW4). Overall, the post-processing, especially with ANNs, seems to be capable of achieving a better representation of wind speed compared to COSMO-REA6 regardless of the location.
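The relative RMSE changes quoted above can be reproduced with a short sketch. It uses the standard RMSE (square root of the mean squared difference between estimate and observation) and assumes that the improvement is expressed as the percentage reduction relative to the COSMO-REA6 RMSE, so that positive values indicate an improvement. The function names and the toy numbers are illustrative and not taken from the study.

```python
import numpy as np


def rmse(estimate, observation):
    """Root-mean-squared error between an estimate and observations."""
    estimate = np.asarray(estimate, dtype=float)
    observation = np.asarray(observation, dtype=float)
    return float(np.sqrt(np.mean((estimate - observation) ** 2)))


def percent_improvement(rmse_reference, rmse_model):
    """Relative RMSE reduction in percent (assumed definition).
    Positive values mean the model beats the reference."""
    return 100.0 * (rmse_reference - rmse_model) / rmse_reference


# Toy wind speeds in m/s (made up for illustration).
observed = [5.0, 7.0, 9.0]
reanalysis = [6.0, 8.0, 10.0]       # off by 1.0 m/s everywhere
post_processed = [5.5, 7.5, 9.5]    # off by 0.5 m/s everywhere

rmse_reanalysis = rmse(reanalysis, observed)
rmse_post = rmse(post_processed, observed)
improvement = percent_improvement(rmse_reanalysis, rmse_post)
print(rmse_reanalysis, rmse_post, improvement)  # 1.0 0.5 50.0
```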

While the previous post-processing approach is station-specific, it is desirable that such a procedure be applicable to any arbitrary location. Therefore, we now apply a cross-validation approach; i.e., we train the GLMs and ANNs on 11 of the 12 locations and split the measurements from the omitted site into a validation (50 %) and a test (50 %) data set. Thus, the estimated models are evaluated on data from a location not included in the training data.
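The leave-one-site-out procedure described above can be sketched as follows. The site labels (`S01` to `S12`), the sample format and the random 50/50 split of the omitted site are assumptions for illustration; the text does not state how the held-out measurements are divided between validation and test.

```python
import random


def leave_one_site_out(samples_by_site, held_out, seed=0):
    """Train on all sites except `held_out`; split the held-out site
    evenly into validation and test data (random split is an assumption)."""
    train = [
        sample
        for site, samples in samples_by_site.items()
        if site != held_out
        for sample in samples
    ]
    held = list(samples_by_site[held_out])
    random.Random(seed).shuffle(held)
    half = len(held) // 2
    return train, held[:half], held[half:]


# Hypothetical data: 12 sites with 10 (site, time step) samples each.
sites = [f"S{i:02d}" for i in range(1, 13)]
samples_by_site = {site: [(site, t) for t in range(10)] for site in sites}

# One fold per site: 12 models in total.
folds = {site: leave_one_site_out(samples_by_site, site) for site in sites}
train, validation, test = folds["S03"]
```

Each fold yields one model whose test score reflects performance at a location that was never seen during training.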

The effects on the RMSE performance compared to COSMO-REA6 are presented in Fig.

As Fig.

So far, we have estimated 12 different models by splitting the training and testing data set depending on the observation site. Our final model includes training data from all 12 sites. To prevent the model from being trained primarily on locations with the most data (due to the different lengths of the time series), the training data cover 2953 time steps for each location, i.e., 75 % of the shortest time series. These data are randomly sampled from the complete time series at each location. In total, we obtain a training (validation) data set with 35 436 (8844) time steps.
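The balanced sampling described above can be illustrated with a minimal sketch. The series lengths and site labels are synthetic, and the rounding used to obtain the per-site sample size is an assumption; only the 75 % fraction, the 12 sites and the totals come from the text.

```python
import random


def balanced_sample(series_by_site, n_per_site, seed=0):
    """Draw the same number of time steps (without replacement) from
    every site so that long time series do not dominate the training."""
    rng = random.Random(seed)
    return {
        site: rng.sample(steps, n_per_site)
        for site, steps in series_by_site.items()
    }


# Synthetic time series of different lengths for 12 hypothetical sites.
series_by_site = {
    f"S{i:02d}": list(range(4000 + 500 * i)) for i in range(12)
}

shortest = min(len(steps) for steps in series_by_site.values())
n_per_site = round(0.75 * shortest)  # 75 % of the shortest series

training = balanced_sample(series_by_site, n_per_site)
total_training = sum(len(steps) for steps in training.values())

# Consistency check of the numbers reported in the text:
# 12 sites x 2953 time steps = 35 436 training time steps.
reported_total = 12 * 2953
```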

To evaluate the results, we use observations from radiosondes at 11 sites in Germany. Please note that the radiosonde data have been assimilated into COSMO-REA6 and are only available at certain time steps during the day (see Table

Improvement of RMSE (in %) of the ANNs trained over all 12 stations compared to COSMO-REA6 with

Figure

Difference of mean wind speed in 2017 between ANN_021 and COSMO-REA6. Purple (green) colors indicate a decrease (increase) in post-processed wind speeds compared to COSMO-REA6.

The aim of this study is to enhance the representation of wind speed estimates from reanalysis data around common wind turbine hub heights. By applying a diagnostic wind model to the reanalysis data and using its output as additional predictors in a statistical post-processing approach, we are able to provide a better estimator for wind speed at 100 m above ground compared to the COSMO-REA6 regional reanalysis.

We find that the diagnostic wind model alone does not constitute a meaningful improvement over the reanalysis, since it does not take into account the actual stability of the atmosphere but rather corrects wind speeds using three constant vertical atmospheric stability configurations. The added value of the diagnostic wind model only becomes apparent in combination with a statistical post-processing approach which combines information from the diagnostic wind model with parameter estimates from the COSMO-REA6 reanalysis (the vertical temperature gradient being one of these parameters). We test a generalized linear model, as well as neural networks of varying complexity, as the statistical modeling framework. In almost all cases, the neural network outperforms the generalized linear model, presumably due to the neural network's ability to include more complex and nonlinear interactions between the input parameters.

Further, we have adopted two different types of statistical post-processing models for the wind speed. Specifically, (1) we estimate a separate model for each site, trained on data from the same location only, and (2) we train a model on the other 11 sites and then evaluate it at the remaining site (which is unknown to the model). Both approaches lead to a significant improvement in wind speed estimates. The former approach provides better results, as local characteristics can only be represented if training data from this location are used. However, approach (1) cannot provide estimates at arbitrary locations where no observations are present.

With the encouraging results of the statistical post-processing approach (2), we estimate our final model using data from all 12 observation sites. The estimates are evaluated against radiosonde ascents at 11 locations in Germany. This model yields considerable improvements at most locations (about 8 % reduction of RMSE on average), especially when considering that the radiosonde data are already included in the COSMO-REA6 reanalysis. Thus, the combined additional information from the diagnostic wind model and the statistical post-processing is able to further improve the reanalysis even at locations where COSMO-REA6 is expected to be close to the true state.

As these results are very promising, we now plan to explore the expansion of the current setup to also estimate wind speeds at height levels above 100 m. Further, we expect that more improvement might be gained by additional tuning of the statistical model, by adding more variables from the reanalysis as predictors and through more observational data including longer time series. Additional improvement could also be achieved by a more complex diagnostic wind model with more vertical levels and stability parameters.

Nevertheless, our study shows that by combining a physics-motivated approach (i.e., the diagnostic wind model) and a statistical post-processing method (e.g., using artificial intelligence), wind speed estimates can be improved at low cost compared to running expensive higher-resolution numerical models. Therefore, the method and derived data sets represent a valuable tool especially for the wind energy sector, e.g., for yield forecasting or site assessment.

RMSE improvements against COSMO-REA6 for all 12 models with five repetitions grouped by

Figure

Selected parameter of the regional reanalysis COSMO-REA6 (

SB prepared the data, designed the methodology and carried out the analysis under the supervision of JDK. SB prepared the manuscript. SB and JDK reviewed it throughout.

The contact author has declared that neither of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has been conducted in the framework of the mFund program FAIR funded by the German Federal Ministry for Transportation and Digital Infrastructure. The authors want to thank Nicole Ritzhaupt for the support regarding the diagnostic wind model and BayWa r.e. GmbH (

This research has been supported by the Bundesministerium für Verkehr und Digitale Infrastruktur (grant no. 19F2103C).

This paper was edited by Rebecca Barthelmie and reviewed by Michael Mifsud and one anonymous referee.