Articles | Volume 7, issue 5
Wind Energ. Sci., 7, 1905–1918, 2022
Research article
16 Sep 2022

Statistical post-processing of reanalysis wind speeds at hub heights using a diagnostic wind model and neural networks

Sebastian Brune and Jan D. Keller
  • Research and Development, Deutscher Wetterdienst, Offenbach, Germany

Correspondence: Sebastian Brune


The correct representation of wind speeds at hub height (e.g., 100 m above ground) is becoming increasingly important with the expansion of renewable energy. In this study, the wind speed of the regional reanalysis COSMO-REA6 in Central Europe is post-processed using a combined physical and statistical approach. The physical basis is provided by downscaling wind speeds with a diagnostic wind model, which reduces the horizontal grid point spacing by a factor of 8 compared to COSMO-REA6 and considers different vertical atmospheric stabilities.

In the second step, a statistical correction is performed using a neural network, as well as a generalized linear model, based on different variables of the reanalysis. Although only a few measurements by masts or lidars are available at hub height, the root-mean-squared error of the wind speed can be reduced by almost 30 %. A final comparison with radiosonde observations confirms the added value of combining the physical and statistical approaches in post-processing the wind speed.

1 Introduction

The expansion of wind energy power production is expected to continue in the context of the ongoing transition towards renewable energies. In order to assess the potential of new sites for wind turbines, reliable estimates of past wind speeds and their variability, i.e., high-quality spatiotemporal climatologies, are needed at hub heights (around 100 m above ground; Rohrig et al., 2019). However, deriving a locally meaningful climatology from observations is difficult, as (a) wind speeds have a strong spatial variability and depend on many local characteristics, (b) only a few long-term measurements exist in Europe around 100 m above ground, and (c) extrapolating hub height wind speeds from the more abundant 10 m wind measurements is prone to errors. In this respect, reanalyses provide physically consistent estimates of the atmospheric dynamics over long periods (i.e., decades). Thus, reanalyses represent a valuable option for assessing wind turbine sites. For this purpose, regional reanalyses might be better suited, as they usually use finer horizontal grids, which are essential in the description of local effects such as channeling or exposure. Nevertheless, even in such data sets with a horizontal grid spacing of 5–10 km, small-scale flows are not always well captured.

Several studies show that some reanalysis data sets have a good fit to verifying mast or lidar observations at hub heights (Frank et al., 2020b; Brune et al., 2021), although larger deviations may occur depending on the location. Further, the underlying physical models may have systematic errors; e.g., low-level jets are not well represented in the 6 km regional reanalysis COSMO-REA6 (Heppelmann et al., 2017). Therefore, reanalysis data can be improved through statistical post-processing.

Post-processing of wind speed is commonly applied to numerical weather prediction (NWP) but almost exclusively for the 10 m wind, which is generally well represented in reanalyses (Kaiser-Weiss et al., 2015). Due to the dense measurement network for 10 m wind speed, local effects, as well as synoptic characteristics, can be detected and corrected (Jung and Schindler, 2019). With regard to the wind speed at hub heights of wind turbines, atmospheric stability and turbulent mixing also play an important role. Brahimi (2019) shows that statistical post-processing of daily wind speeds at hub height using artificial intelligence can lead to better wind speed estimates.

Another method to improve the horizontal and vertical resolution of wind speed from existing data is to implement a diagnostic mass-consistent wind model (Dickerson, 1978; Sherman, 1978; Ratto et al., 1994; Homicz, 2002). The advantage of this physical approach is that it is able to better describe the effects of orography on the wind field for a given vertical stability compared to the coarser representation of an NWP model or a reanalysis.

In this study, we combine a diagnostic wind model and statistical post-processing to improve the representation of wind speeds at 100 m above ground despite the low measurement density. Based on the COSMO-REA6 reanalysis (Bollmeyer et al., 2015), we consider a Central European domain that includes different levels of terrain complexity, e.g., ocean, flatlands, mid-mountain ranges and alpine mountains. Specifically, we aim to answer the following questions.

  • Does the introduction of the diagnostic wind model represent an added value?

  • Can we perform a beneficial statistical post-processing despite the heterogeneity of the domain and the few measurement sites?

The remainder of the paper is structured as follows. In the following section, we first provide an overview of the observation sites used, as well as the COSMO-REA6 regional reanalysis. Then, we describe the wind model and the statistical post-processing utilizing artificial neural networks in Sect. 3. Our results section begins with an analysis of the effects of the wind model, followed by the results of the statistical post-processing. We conclude this study with a brief summary and outlook.

2 Data

2.1 Mast and lidar data

Our study is based on a data set of wind profile measurements of the lower boundary layer over Germany and the North and Baltic seas. Long-term observations of lower-boundary-layer wind speeds in Germany are freely available only at four measuring masts over land and three platforms at sea. The land-based masts are located in Hamburg (HAM; Brümmer et al., 2012), Lindenberg (LIN; Beyrich, 2009), Karlsruhe (KAR; Kohler et al., 2018) and Jülich (JUL; Löhnert et al., 2015; SAMD, 2021), providing data for several decades at heights of up to 280 m (Table 1). For the North and Baltic seas, we use the FINO observations (FI1, FI2, FI3) provided by the German Federal Maritime and Hydrographic Agency (Bundesamt für Seeschifffahrt und Hydrographie, 2021). All three offshore masts cover the complete observation period from 2014 to 2018. The third part of our data set consists of five shorter time series (6 to 12 months) recorded by lidars (BW1…BW4) and one meteorological mast (BW5), courtesy of the company BayWa r.e. GmbH. These data are exclusively shared with us within the FAIR project (Frank et al., 2020a).

All measurements are well distributed over the domain (Fig. 1) and represent conditions with offshore (FI1, FI2, FI3), flat terrain (HAM, BW4, LIN) and complex hilly (BW1, BW2, BW3, BW5, KAR, JUL) characteristics. The temporal resolution of all measurements is 10 min. Additional details on the measurements are provided in Table 1.

Figure 1. Elevation in the study domain (colors) with observation sites (red dots) and radiosondes (blue dots). Dashed lines indicate subdomains of the diagnostic wind model.

Table 1. Overview of mast and lidar observations.


2.2 Radiosondes

Another source of observation data in the height range of wind turbines is vertical soundings. The German Meteorological Service (DWD) operates 11 regular radiosonde stations as shown in Fig. 1 and Table 2. All observations cover the complete period between 2014 and 2018, albeit at a much coarser temporal resolution. The radiosondes in Bergen, Idar-Oberstein, Kuemmersbruck and Lindenberg are launched four times per day for the synoptic main times 00:00, 06:00, 12:00 and 18:00 UTC, while soundings at the other locations are made only twice per day. Note that most radiosondes are launched approximately 75 min before the synoptic main times and that the height of 100 m above the surface is already reached after approximately 30 s. Thus, we compare the sounding observations with the closest hourly time step of the reanalysis data.
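The time matching described above can be illustrated with a short sketch. It assumes the approximate values quoted in the text (launch 75 min before the synoptic main time, 30 s to reach 100 m); the function names and the rule that exactly 30 min past the hour rounds up are illustrative assumptions, not taken from the study.

```python
from datetime import datetime, timedelta


def time_at_100m(synoptic_time, lead=timedelta(minutes=75),
                 ascent=timedelta(seconds=30)):
    """Approximate time at which the sonde passes 100 m above the surface."""
    return synoptic_time - lead + ascent


def nearest_hour(t):
    """Round a timestamp to the closest full hour (assumed tie rule: round up)."""
    base = t.replace(minute=0, second=0, microsecond=0)
    if t - base >= timedelta(minutes=30):
        return base + timedelta(hours=1)
    return base


synoptic = datetime(2015, 2, 21, 12, 0)   # nominal 12:00 UTC sounding
passed = time_at_100m(synoptic)           # 10:45:30 UTC
matched = nearest_hour(passed)            # 11:00 UTC reanalysis time step
```

Under these assumptions, a nominal 12:00 UTC sounding would be compared with the 11:00 UTC reanalysis field rather than the 12:00 UTC one.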

Table 2. Overview of radiosonde observations. The last four columns show the number of observations between 2014 and 2018 at synoptic main times.



2.3 COSMO-REA6

In addition to the observations, our wind speed post-processing relies on gridded estimates of the atmospheric state in the form of the regional reanalysis COSMO-REA6, developed in the context of the Hans Ertel Centre for Weather Research (Simmer et al., 2016). COSMO-REA6 covers Europe at a horizontal grid spacing of 6.2 km. The vertical structure is described by a height-based terrain-following coordinate with a grid spacing of a few decameters in the lower atmosphere (Bollmeyer et al., 2015). The six lowest levels of 3D data such as temperature, humidity or wind components, as well as 2D data, are provided through DWD's open data portal (Deutscher Wetterdienst/Hans-Ertel Centre for Weather Research, 2021). The hourly output files are available between 1 January 1995 and 31 August 2019. Besides both horizontal wind components, we use a set of 16 output variables (Table 3), as well as the derived vertical temperature gradient within the lowest 100 m.

Table 3. COSMO-REA6 and wind model variables used in the statistical models with 2, 5, 18 and 21 predictors.


2.4 Digital elevation data

High-resolution terrain data are freely available through NASA's Shuttle Radar Topography Mission (SRTM). We use the gap-filled version of the SRTM data provided by Jarvis et al. (2008) with a resolution of approximately 90 m.

3 Methods

3.1 Downscaling of COSMO-REA6 wind speed

COSMO-REA6's horizontal resolution of approximately 6 km is too low to sufficiently represent orographic effects on the wind field. Therefore, we use a diagnostic mass-consistent wind model which is described in the following.

3.1.1 Theoretical background of diagnostic wind modeling

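Based on a variational approach (Sasaki, 1958, 1970a, b), the wind model minimizes the variance (kinetic energy) of the difference between the 3D initial wind field $\mathbf{v}_0 = u_0 \mathbf{i}_x + v_0 \mathbf{i}_y + w_0 \mathbf{i}_z$ and the adjusted wind field $\mathbf{v} = u \mathbf{i}_x + v \mathbf{i}_y + w \mathbf{i}_z$ over the volume $V$ as

(1) $\int_V \frac{1}{2} (\mathbf{v} - \mathbf{v}_0)^2 \rho \, \mathrm{d}V \overset{!}{=} \min .$

$u, v, w$ and $u_0, v_0, w_0$ are the components of the 3D adjusted and initial wind field in zonal direction $\mathbf{i}_x$, meridional direction $\mathbf{i}_y$ and vertical direction $\mathbf{i}_z$, respectively. The air density $\rho$ is treated as constant in the lower atmosphere, and the divergence of the adjusted wind field $\mathbf{v}$ should be zero:

(2) $\nabla \cdot \mathbf{v} = \left( \frac{\partial}{\partial x} \mathbf{i}_x + \frac{\partial}{\partial y} \mathbf{i}_y + \frac{\partial}{\partial z} \mathbf{i}_z \right) \cdot \mathbf{v} = 0 .$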

If we introduce a Lagrange multiplier λ=λ(x,y,z) in Eq. (1) under the strong constraint of mass conservation, the following cost function J has to be minimized:

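(3) $J(u, v, w; \lambda) = \frac{1}{2} \int_V \left[ \frac{(u - u_0)^2}{\sigma_u^2} + \frac{(v - v_0)^2}{\sigma_v^2} + \frac{\left( w - w_0 - h_x (u - u_0) - h_y (v - v_0) \right)^2}{\sigma_w^2} \right] \mathrm{d}V + \int_V \lambda \left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} \right) \mathrm{d}V \overset{!}{=} \min .$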

The terms $h_x (u - u_0)$ and $h_y (v - v_0)$ result from the coordinate transformation into a system with a terrain-following vertical coordinate. $h_x$ and $h_y$ are the first derivatives of the topography in the $x$ and $y$ direction, respectively. The weights $\sigma_u^{-2}$, $\sigma_v^{-2}$ and $\sigma_w^{-2}$ are known as Gaussian precision moduli and describe the ratio between the adjustments of the three wind velocity components for the whole domain. Since horizontal wind speeds are generally at least an order of magnitude higher than vertical wind speeds, it is assumed in the literature that $\sigma_u^{-2} = \sigma_v^{-2} \neq \sigma_w^{-2}$ (e.g., Dickerson, 1978; Sherman, 1978; Bhumralkar et al., 1980; Endlich et al., 1982; Guo and Palutikof, 1990; Wang et al., 2005). The ratio $\alpha = \sigma_w / \sigma_u$ determines whether the adjustments are predominantly in the vertical direction ($\alpha \gg 1$) or in the horizontal direction ($\alpha \ll 1$). In an unstable atmosphere, air motions tend to be vertical, while under stable conditions, adjustments occur predominantly in the horizontal wind field. There are many approaches to determine the exact value of $\alpha$, e.g., using the Froude number (Moussiopoulos et al., 1988; Ross et al., 1988) or the ratio of the $w$ and $u$ wind components (Sherman, 1978; Kitada et al., 1983; Davis et al., 1984; Mathur and Peters, 1990).

To solve Eq. (3), the first variation of J must be zero. This results in a set of three Euler–Lagrange equations, which can be written as

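(4) $\mathbf{v} - \mathbf{v}_0 = \mathbf{A}^{-1} \nabla \lambda ,$

with

(5) $\mathbf{A}^{-1} = \begin{pmatrix} \sigma_u^2 & 0 & h_x \sigma_u^2 \\ 0 & \sigma_v^2 & h_y \sigma_v^2 \\ h_x \sigma_u^2 & h_y \sigma_v^2 & h_x^2 \sigma_u^2 + \sigma_w^2 + h_y^2 \sigma_v^2 \end{pmatrix} .$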
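The matrix in Eq. (5) can be checked numerically. The sketch below assumes that A denotes the symmetric matrix of the quadratic form in the first integral of Eq. (3), written in terms of the adjustments (u - u0, v - v0, w - w0); this reading and the parameter values are assumptions made for illustration. Exact fractions are used so that the product of A and the matrix from Eq. (5) can be compared with the identity without rounding.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def weight_matrix(su2, sv2, sw2, hx, hy):
    """Matrix A of the quadratic form in Eq. (3); su2 stands for sigma_u^2 etc."""
    return [
        [1 / su2 + hx * hx / sw2, hx * hy / sw2, -hx / sw2],
        [hx * hy / sw2, 1 / sv2 + hy * hy / sw2, -hy / sw2],
        [-hx / sw2, -hy / sw2, 1 / sw2],
    ]


def inverse_from_eq5(su2, sv2, sw2, hx, hy):
    """Matrix A^-1 as written in Eq. (5)."""
    return [
        [su2, 0, hx * su2],
        [0, sv2, hy * sv2],
        [hx * su2, hy * sv2, hx * hx * su2 + sw2 + hy * hy * sv2],
    ]


# Arbitrary illustrative values: sigma_u^2, sigma_v^2, sigma_w^2, h_x, h_y
params = (F(1), F(1), F(1, 100), F(1, 20), F(-3, 50))
A = weight_matrix(*params)
A_inv = inverse_from_eq5(*params)
product = matmul(A, A_inv)
identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]
```

For these values, `product` equals `identity`, which supports the consistency of Eq. (5) with the cost function in Eq. (3) under the stated reading of A.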

Applying the divergence operator $\nabla \cdot$ to Eq. (4) and using the constraint in Eq. (2) leads to the following Poisson equation for $\lambda$:

(6) $- \nabla \cdot \mathbf{v}_0 = \nabla \cdot \mathbf{A}^{-1} \nabla \lambda = \mathbf{M} \lambda .$

Equation (6) is discretized by using centered differences with lateral-flow-through boundary conditions (Dirichlet) and no-flow-through boundary conditions at the surface (Neumann conditions). The discretized matrix $\mathbf{M} = \nabla \cdot \mathbf{A}^{-1} \nabla$ contains only entries on the main diagonal and some sub-diagonals, depending on the discrete number of horizontal and vertical grid points. A sparse solver can be used to calculate $\lambda$ and finally the adjusted wind field $\mathbf{v}$ using Eq. (4):

(7) $\mathbf{v} = \mathbf{v}_0 + \mathbf{A}^{-1} \nabla \lambda .$

Thus, the main task is to compute λ from the linear system with matrix M, whose dimension increases rapidly with the number of horizontal and vertical grid points. Because M depends only on the Gaussian precision moduli and the topography, the matrix is constant in time, and its sparse factorization has to be computed only once at the beginning. Afterwards, the factorized form is used to calculate the adjusted wind field for all time steps.
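The idea of factorizing a time-independent matrix once and reusing the factors for every time step can be sketched with a deliberately small stand-in problem. The example below is not the 3D system of the study: it uses a 1D Poisson matrix (tridiagonal, entries -1, 2, -1) and a plain LU factorization without pivoting, both chosen only to keep the sketch self-contained. All function names are hypothetical.

```python
def factorize_tridiagonal(lower, diag, upper):
    """LU-factorize a tridiagonal matrix once (no pivoting).

    lower[i] is entry (i, i-1), upper[i] is entry (i, i+1);
    lower[0] and upper[-1] are unused.
    """
    n = len(diag)
    mult = [0.0] * n    # sub-diagonal of the unit lower factor
    pivot = [0.0] * n   # diagonal of the upper factor
    pivot[0] = diag[0]
    for i in range(1, n):
        mult[i] = lower[i] / pivot[i - 1]
        pivot[i] = diag[i] - mult[i] * upper[i - 1]
    return mult, pivot, list(upper)


def solve_factorized(factors, rhs):
    """Solve M x = rhs using precomputed factors (forward and back substitution)."""
    mult, pivot, upper = factors
    n = len(rhs)
    y = [0.0] * n
    y[0] = rhs[0]
    for i in range(1, n):
        y[i] = rhs[i] - mult[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / pivot[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - upper[i] * x[i + 1]) / pivot[i]
    return x


def apply_tridiagonal(lower, diag, upper, x):
    """Compute M x, used here only to check the residual."""
    n = len(x)
    out = []
    for i in range(n):
        value = diag[i] * x[i]
        if i > 0:
            value += lower[i] * x[i - 1]
        if i < n - 1:
            value += upper[i] * x[i + 1]
        out.append(value)
    return out


n = 5
lower = [0.0] + [-1.0] * (n - 1)
diag = [2.0] * n
upper = [-1.0] * (n - 1) + [0.0]

factors = factorize_tridiagonal(lower, diag, upper)   # done once
# One right-hand side per "time step"; the factors are reused for each.
right_hand_sides = [[1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0, 0.0]]
solutions = [solve_factorized(factors, rhs) for rhs in right_hand_sides]
```

The cost of the factorization is paid once, while each additional time step only requires the comparatively cheap substitution steps. For the large sparse 3D system, a general sparse direct solver would take the place of the tridiagonal routine.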

3.1.2 Wind model configuration

As our focus is on Germany and adjacent regions, we first extract a subdomain of 130×170 grid points from the COSMO-REA6 data set. The wind model then uses the same domain, albeit at a resolution increased by a factor of 8, resulting in a target grid of 1041×1361 grid points. In the vertical, our wind model uses 11 terrain-following levels (70, 100, 130, 160, 190, 220, 250, 350, 500, 700 and 1000 m above the surface). Since the COSMO-REA6 boundary layer winds are strongly influenced by the model orography at the lowest two levels (about 10 and 35 m above the surface), we set the lowest layer in our diagnostic wind model at 70 m, which is slightly above the third lowest layer in COSMO-REA6. The COSMO-REA6 wind field is interpolated first vertically and then horizontally to obtain the initial wind field for the wind model.

Consequently, the matrix M would have a dimension of 15 584 811×15 584 811, which is too large for the available computing systems to handle. Therefore, we divide the domain into 12 subdomains, each with 401×401×11 grid points (see Fig. 1), which results in a matrix M of size 1 768 811×1 768 811 for each subdomain. The outer 81 points of the subdomains are considered to be the border area. In the transition area between two subdomains, blending of the u and v components is performed; i.e., the influence of the subdomain decreases linearly until the end of the border area. If a border area lies at the edge of the domain, it is truncated so that the final domain has a size of 879×1199×11 grid points.
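The grid sizes quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that refining n coarse points by a factor of 8 yields 8n + 1 fine points and that neighboring subdomains overlap by the 81-point border, i.e., are shifted by 320 points; both assumptions are inferred from the quoted numbers rather than stated in the text.

```python
COARSE_SHAPE = (130, 170)   # COSMO-REA6 subdomain in grid points
REFINEMENT = 8              # increase in horizontal resolution
LEVELS = 11                 # vertical levels of the wind model
SUBDOMAIN_POINTS = 401      # horizontal points per subdomain and direction
BORDER = 81                 # border area in grid points


def refined_points(n_coarse, factor=REFINEMENT):
    # Assumption: n * factor + 1 reproduces the sizes quoted in the text.
    return n_coarse * factor + 1


def subdomains_along_axis(n_fine, size=SUBDOMAIN_POINTS, border=BORDER):
    # Assumption: subdomains overlap by the border width (stride of 320 points).
    stride = size - border
    return (n_fine - size) // stride + 1


fine_shape = tuple(refined_points(n) for n in COARSE_SHAPE)
unknowns_full = fine_shape[0] * fine_shape[1] * LEVELS
unknowns_sub = SUBDOMAIN_POINTS * SUBDOMAIN_POINTS * LEVELS
n_subdomains = (subdomains_along_axis(fine_shape[0])
                * subdomains_along_axis(fine_shape[1]))
final_shape = tuple(n - 2 * BORDER for n in fine_shape)

print(fine_shape, unknowns_full, unknowns_sub, n_subdomains, final_shape)
```

Under these assumptions, the sketch gives 1041×1361 fine grid points, 15 584 811 unknowns for the full domain, 1 768 811 unknowns per subdomain, 3×4 = 12 subdomains and a final horizontal size of 879×1199, in agreement with the text.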

To model different degrees of atmospheric stability, we choose σu=σv=1 and let σw vary. After some testing, we settled on three settings, specifically σw=0.0001 (stable atmosphere, mainly horizontal flow), σw=0.1000 (relatively neutral atmosphere, similarly strong horizontal and vertical flow), and σw=5.0000 (unstable atmosphere, mainly vertical flow), which is in line with the configuration of Guo and Palutikof (1990).

3.2 Statistical modeling using machine learning

While the downscaled wind fields might be better in line with the orography, the data still have inherent uncertainties (e.g., fit of the COSMO-REA6 input to the orography, errors in COSMO-REA6, assumptions in the wind model) and thus may still deviate considerably from the truth, i.e., verifying observations. In order to correct the output of the diagnostic wind model, we apply a simple artificial neural network (ANN) to its output. The ANN consists of an input layer, two dense hidden layers with 50 nodes and a linearly activated output layer. For the input and both hidden layers we use the rectified linear activation function. The number of nodes in the input layer varies with the number of input variables. The input variables are scaled to a mean of 0 and a standard deviation of 1 for all parameters. As the target variable we choose the deviation between the observed wind speed and the COSMO-REA6 estimate. The error of COSMO-REA6 should be more normally distributed than the wind speed itself, which allows us to use the mean squared error as the loss function. The optimizer is Adam with a learning rate of 0.001 and a batch size of 256. While we also tried various other configurations for the ANN, e.g., with respect to the number of layers and nodes, as well as different batch sizes, we found the differences in results to be only marginal. Therefore, we here focus on the ANN settings described above, while results for the other configurations are provided in the appendix. For comparison to standard post-processing methods, we also run a generalized linear model (GLM).
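The structure of such a network can be sketched in plain Python. The example below only covers the scaling of the predictors, the error target, a forward pass through two hidden layers with 50 nodes and the mean squared error; training with Adam (learning rate 0.001, batch size 256) is omitted. The weight initialization, the sign convention of the target (observation minus reanalysis), the function names and all numbers are assumptions for illustration, and the sketch is not the implementation used in the study.

```python
import random
from statistics import mean, pstdev


def standardize(columns):
    """Scale each predictor column to mean 0 and standard deviation 1."""
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        scaled.append([(x - m) / s for x in col])
    return scaled


def init_layer(n_in, n_out, rng):
    """Random weights (assumed He-style scaling) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_ann(n_predictors, n_hidden=50, seed=0):
    """Two hidden layers with n_hidden nodes and a single output node."""
    rng = random.Random(seed)
    return [init_layer(n_predictors, n_hidden, rng),
            init_layer(n_hidden, n_hidden, rng),
            init_layer(n_hidden, 1, rng)]


def dense(inputs, weights, biases, relu):
    """One dense layer, with ReLU or linear activation."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def predict(ann, predictors):
    """Forward pass: ReLU in the hidden layers, linear output."""
    h = predictors
    for i, (weights, biases) in enumerate(ann):
        h = dense(h, weights, biases, relu=(i < len(ann) - 1))
    return h[0]


def mse(predictions, targets):
    """Mean squared error used as the loss function."""
    return mean((p - t) ** 2 for p, t in zip(predictions, targets))


# Synthetic values for illustration only (m/s and K per 100 m).
observed = [5.6, 7.9, 4.8, 8.3]
reanalysis = [6.1, 7.4, 5.2, 8.0]
temperature_gradient = [0.2, -0.5, 0.1, 0.4]

targets = [o - c for o, c in zip(observed, reanalysis)]    # assumed sign convention
columns = standardize([reanalysis, temperature_gradient])  # two-predictor variant
samples = [list(row) for row in zip(*columns)]
ann = build_ann(n_predictors=2)
loss = mse([predict(ann, s) for s in samples], targets)
```

Predicting the error of the reanalysis rather than the wind speed itself means that the corrected wind speed would be obtained by adding the network output back to the COSMO-REA6 value.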

4 Results

4.1 Diagnostic wind model

We first look at the potential benefit of applying a diagnostic wind model to the reanalysis output. As an example, Fig. 2 shows the wind representation around the city of Bonn in western Germany at noon on 21 February 2015 for COSMO-REA6 (a) and the corrections achieved by the wind model for the three different stability settings (d–f). The plots show a region of 3×3 COSMO-REA6 grid points (about 19 km ×19 km). Both COSMO-REA6 horizontal wind components are first linearly interpolated vertically to 100 m above ground and then interpolated from the edges of the grid box to the center. COSMO-REA6 shows uniform wind speeds around 6 m s−1 from west-northwest directions over the entire region. The underlying orography in the regional reanalysis (Fig. 2b) indicates a comparatively flat terrain, while the more complex actual terrain structure around Bonn is described by the high-resolution orography of the diagnostic wind model (Fig. 2c). In the northern parts and along the Middle Rhine Valley, which extends from southeast to northwest, the elevation is about 50 to 60 m above sea level. To the west and east of the valley lie the foothills of the Eifel and Siebengebirge mountains, respectively. The highest elevation in this region is the Ölberg at 460 m, which is represented in the wind model at about 410 m, while the corresponding pixel in COSMO-REA6 has only a height of about 200 m.

Figure 2. Snapshot of COSMO-REA6 wind speed (colors) and direction (arrows) on 21 February 2015 at 12:00 UTC in western Germany (a). Both wind components are vertically and horizontally interpolated from the native grid to 100 m above surface and the grid box centers. Representation of the topography in COSMO-REA6 (b), the diagnostic wind model (c) and in SRTM (d). (e–g) Difference of the wind field from the diagnostic wind model with σw values of 0.0001 (e), 0.1000 (f) and 5.0000 (g) to COSMO-REA6. Red (blue) colors indicate higher (lower) wind speeds in the wind model compared to COSMO-REA6. Arrows show the differences between the wind components in the wind model and COSMO-REA6. The reference vector (top right) represents a difference of 0.5 m s−1. The grey contour lines represent the topography in the diagnostic wind model.


When we interpolate the COSMO-REA6 wind field onto the high-resolution grid and then run the diagnostic wind model, the differences in horizontal wind speed in this example are up to ±0.5 m s−1 at 100 m height (Fig. 2d–f) depending on the stability setting. This is close to 10 % of the COSMO-REA6 wind speed input. The adjustments in the horizontal wind field are strongest for σw=0.0001 and decrease with increasing σw. This is consistent with the expectation, since the adjustments in the wind field for small σw are almost exclusively horizontal, while for large σw vertical exchange between model layers is possible.

The spatial pattern of the wind field is similar for all three configurations of the wind model. In the hilly terrain west and east of the Rhine Valley we see an increase in wind speeds compared to the reanalysis, while in the valley the wind speed is reduced. East of the Siebengebirge, i.e., downstream, the wind speed is also lower. In the lowlands, the adjustments are negligible.

Analyzing the wind direction, two interesting features are observed for the stable case (σw=0.0001). First, there is a flow around the north and south of the Ölberg, which may be superimposed by channeling effects in the southeastern part. Second, the adjustments of the wind field follow the small valley which runs from the lower left corner of the region into the Rhine valley. Both effects can also be found for the case of the relatively neutral boundary layer (σw=0.1000) but are absent in the unstable boundary layer (σw=5.0000). This indicates that the diagnostic wind model can provide added value for hilly terrain.

Next, we evaluate the quality of the wind field from the diagnostic wind model with measurements. We employ the standard metric root-mean-squared error (RMSE), which is defined as the square root of the mean squared difference between the wind speed in the model, i.e., COSMO-REA6 (c) or diagnostic wind model (w), and the observations (o):

(8) $\mathrm{RMSE}_c = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (c_i - o_i)^2} , \quad \mathrm{RMSE}_w = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (w_i - o_i)^2} .$

N indicates the number of all wind speed measurements. The percentage improvement PIw of each wind model w against COSMO-REA6 is then given by

(9) $\mathrm{PI}_w = 100 \left( 1 - \frac{\mathrm{RMSE}_w}{\mathrm{RMSE}_c} \right) .$

A smaller RMSE in the wind model compared to COSMO-REA6 leads to PIw>0, which indicates an improvement in the diagnostic wind model.
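Equations (8) and (9) translate directly into code. The following sketch uses made-up wind speeds chosen so that the result is easy to verify by hand; the function names are illustrative.

```python
from math import sqrt


def rmse(model, obs):
    """Root-mean-squared error between model values and observations, Eq. (8)."""
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and of equal length")
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def percentage_improvement(rmse_w, rmse_c):
    """Percentage improvement of the wind model over COSMO-REA6, Eq. (9)."""
    return 100.0 * (1.0 - rmse_w / rmse_c)


# Synthetic wind speeds in m/s for illustration only.
obs = [5.0, 6.0, 7.0, 8.0]
cosmo = [6.0, 7.0, 8.0, 9.0]         # every value 1.0 m/s too high
wind_model = [5.5, 6.5, 7.5, 8.5]    # every value 0.5 m/s too high

rmse_c = rmse(cosmo, obs)            # 1.0
rmse_w = rmse(wind_model, obs)       # 0.5
pi_w = percentage_improvement(rmse_w, rmse_c)   # 50.0
```

Halving the RMSE thus corresponds to PIw = 50, while a wind model with a larger RMSE than COSMO-REA6 would yield a negative value.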

Figure 3 shows the improvement by the wind model with the three configurations for a consistently stable (σw=0.0001), neutral (σw=0.1000) and unstable (σw=5.0000) atmosphere against COSMO-REA6. At the offshore observation sites (FI1, FI2, FI3) and in the lowlands (BW4, HAM, LIN), the wind speeds from the wind model mostly agree with COSMO-REA6, since the model makes only a few adjustments over the relatively flat terrain. Larger differences in RMSE between COSMO-REA6 and the wind model can be observed for hilly terrain (BW1, BW2, BW3, BW5, KAR, JUL). With higher instability in the wind model, i.e., increasing σw, the differences in the horizontal wind field are reduced, since the compensating motions are mainly made in the vertical. Thus, the largest differences between the wind model and COSMO-REA6 occur for σw=0.0001, where the response of the flow is mainly horizontal. An improvement in RMSE is achieved especially with the stable and neutral configurations between 21:00 and 06:00 UTC. This could be an indication that the wind model is able to at least partly correct for the well-known underestimation of nocturnal low-level jets in COSMO-REA6. During the day, COSMO-REA6 performs better than the diagnostic wind model, especially for the stable and neutral configurations. While COSMO-REA6 performs better than the wind model in about 60 % of the cases, the wind model still provides an improvement in the remaining 40 %. In order to make use of this additional information, a statistical post-processing is performed using COSMO-REA6 and the output of the diagnostic wind model configurations as input.

Figure 3. Diurnal cycle of the improvement in RMSE (PIw) of the diagnostic wind model for (a) stable configuration (σw=0.0001), (b) neutral configuration (σw=0.1000) and (c) unstable configuration (σw=5.0000) against COSMO-REA6. Positive (negative) values indicate better (worse) performance in terms of RMSE of the diagnostic wind model.


4.2 Statistical post-processing of wind speeds at individual locations

Figure 4 shows the effect of the post-processing on the RMSE for the diagnostic wind model with the three different stability settings, four GLMs and four ANNs with 2, 5, 18 and 21 input variables at all 12 observation sites. Here, the models are estimated separately for each site. For this purpose, the complete measurement series is randomly divided into 60 % training, 20 % validation and 20 % test data. Our results do not depend on the training–validation–test splitting, as we found in analogous experiments with 70 %–15 %–15 % (not shown). The splitting and estimation of the models is repeated five times to also quantify the uncertainty of the models (indicated by the box plots).
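The repeated random splitting can be sketched as follows. The series length of 1000 samples, the use of the repetition index as random seed and the function name are assumptions for illustration.

```python
import random


def split_indices(n_samples, fractions=(0.6, 0.2, 0.2), seed=None):
    """Randomly split sample indices into training, validation and test parts."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test part
    return train, validation, test


# Five repetitions with different random splits, as used for the box plots.
splits = [split_indices(1000, seed=repetition) for repetition in range(5)]
```

Fitting one model per repetition and collecting the five RMSE values would then give the spread shown by each box plot.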

Figure 4. Change in RMSE compared to COSMO-REA6 for all 12 observation sites, with positive values indicating an improvement over the reanalysis. Light green, green and dark green boxplots show the improvement over COSMO-REA6 from the diagnostic wind model with σw=0.0001, σw=0.1000 and σw=5.0000. Yellow and purple boxplots indicate the improvement for the GLMs and ANNs, respectively. Each boxplot represents five estimated models obtained by randomly splitting the data set into training, validation and testing. The numbers on the x axis (2, 5, 18, 21) show the number of input variables for each model. Numbers inside the panels show the sample sizes used for training, testing and validation at each observation site.


It can be seen that the RMSE for the three diagnostic wind models is close to that of COSMO-REA6. The GLMs and ANNs lead to a significant reduction in RMSE at all sites regardless of the number of input variables. For the offshore stations (FI1, FI2, FI3) the improvement is at least 5 %, while over land the values range from about 10 % for flat terrain (LIN) up to 30 % for hilly terrain (BW2). Further, the RMSE reduction becomes more pronounced for the GLMs and ANNs as the number of input variables increases, with the ANNs mostly outperforming the GLMs. It should be noted that the addition of the three wind speed estimates from the diagnostic wind model leads to a significant improvement especially for hilly terrain (e.g., at sites BW1 and BW3), while the effect is smaller at offshore or flat terrain locations (e.g., BW4). Overall, the post-processing, especially with ANNs, seems to be capable of achieving a better representation of wind speed compared to COSMO-REA6 regardless of the location.

4.3 Statistical post-processing of wind speeds over all locations

While the previous post-processing approach is station-specific, it is desirable that such a procedure be applicable to any arbitrary location. Therefore, we now apply a cross-validation approach; i.e., we train the GLMs and ANNs on 11 of the 12 locations and use the measurements from the omitted site as the validation (50 %) and test (50 %) data sets. Thus, the estimated models are evaluated on data from a location not included in the training data.
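This leave-one-site-out scheme can be sketched as a generator over the 12 sites. The random 50 %/50 % division of the omitted site and the toy data layout (a list of samples per site) are assumptions for illustration.

```python
import random

SITES = ["HAM", "LIN", "KAR", "JUL", "FI1", "FI2", "FI3",
         "BW1", "BW2", "BW3", "BW4", "BW5"]


def leave_one_site_out(samples_by_site, seed=0):
    """Yield (held_out, train, validation, test) for every site in turn."""
    rng = random.Random(seed)
    for held_out in samples_by_site:
        # Training data: all samples from the other sites.
        train = [sample
                 for site, samples in samples_by_site.items()
                 if site != held_out
                 for sample in samples]
        # Omitted site: split into validation and test halves (assumed random).
        rest = list(samples_by_site[held_out])
        rng.shuffle(rest)
        half = len(rest) // 2
        yield held_out, train, rest[:half], rest[half:]


# Toy data: ten placeholder samples per site.
toy = {site: [(site, i) for i in range(10)] for site in SITES}
folds = list(leave_one_site_out(toy))
```

Each of the 12 folds trains on 11 sites, so the test score of a fold measures how well the model transfers to a location it has never seen.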

The effects on the RMSE performance compared to COSMO-REA6 are presented in Fig. 5. Naturally, the improvements are smaller in comparison to the site-specific post-processing, as the local characteristics are not included in the cross-validation approach. In this setting, there are now more distinct differences between the performances of the GLMs and ANNs. For many stations, the GLMs achieve only a small improvement or even lead to a degradation of the quality of the estimates (e.g., FI3). In contrast, the ANNs consistently provide better representations of the wind speed compared to COSMO-REA6, as well as the GLMs. Especially the ANNs with 18 or 21 predictors achieve an improvement of at least 10 % (FI1, FI2, FI3) up to about 20 % (e.g., BW2, BW4, JUL). The ANNs with five predictors almost always perform better than those with two predictors, indicating the importance of the inclusion of the diagnostic wind model output. However, the 18-predictor version (without the diagnostic wind model data) outperforms the 21-predictor model at almost half of the observation sites. In conclusion, the diagnostic wind model can add valuable information to the post-processing when only the wind speed and the vertical temperature gradient are used as predictors. However, it seems that the lack of additional information from the diagnostic wind model can be compensated for by using a wider set of input variables from COSMO-REA6.

Figure 5. As Fig. 4 but now with the training data set of 11 observation sites and the test and validation data set of the site left out.


4.4 Verification with radiosondes

So far, we have estimated 12 different models by splitting the training and testing data set depending on the observation site. Our final model includes training data from all 12 sites. To prevent the model from being trained primarily on locations with the most data (due to the different lengths of the time series), the training data cover 2953 time steps for each location, i.e., 75 % of the shortest time series. These data are randomly sampled from the complete time series at each location. In total, we obtain a training (validation) data set with 35 436 (8844) time steps.
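The balanced sampling can be sketched as follows. The rounding rule, the toy series lengths and the function name are assumptions; only the figures of 2953 time steps per site and 12 sites are taken from the text.

```python
import random


def balanced_sample(series_by_site, fraction=0.75, seed=0):
    """Draw the same number of random time steps from every site.

    The number per site is a fraction of the shortest time series,
    so that long series do not dominate the training data.
    """
    rng = random.Random(seed)
    shortest = min(len(series) for series in series_by_site.values())
    n_per_site = round(fraction * shortest)
    return {site: rng.sample(list(series), n_per_site)
            for site, series in series_by_site.items()}


# Toy example with three sites of very different record lengths.
toy_series = {"A": range(40), "B": range(100), "C": range(1000)}
sampled = balanced_sample(toy_series)     # 30 time steps from each site

# Figures quoted in the text: 12 sites with 2953 training time steps each.
total_training_steps = 12 * 2953          # 35436
```

With 2953 time steps per site, the 12 sites contribute 35 436 training time steps in total, which matches the size of the training data set given above.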

To evaluate the results, we use observations from radiosondes at 11 sites in Germany. Please note that the radiosonde data have been assimilated into COSMO-REA6 and are only available at certain time steps during the day (see Table 2). Figure 6 shows that the post-processing leads to improvements in terms of RMSE at almost all locations and times, regardless of the number of input variables. While for flat terrain the improvements are smaller, for hilly terrain the skill of the post-processed estimates improves considerably with the number of variables, in part due to the added value of the diagnostic wind model. The model including 21 variables performs particularly well at Essen (01303, almost 20 % improvement at night) and Lindenberg (03015, 10 %–15 %, depending on the time of day). For the latter, it should be noted that one of the mast locations used to train the ANNs is in proximity to the radiosonde launch site. The ANNs seem to have slight difficulties during nighttime for the island of Norderney (03631, −8 %) and in Oberschleissheim (03715, −3 %). Both are possibly due to the location of the observation site directly on the North Sea coast and in the mountains, respectively. Apart from this, the most complex model reduces the RMSE by approximately 8 % over all locations and times compared to the COSMO-REA6 reanalysis. Considering that the radiosonde ascents are already assimilated in COSMO-REA6 and the reanalysis is therefore believed to perform best at these locations, the results of the post-processing are very encouraging, especially with respect to the performance at locations other than the measurement sites.

Figure 6. Improvement of RMSE (in %) of the ANNs trained over all 12 stations compared to COSMO-REA6 with (a) 2, (b) 5, (c) 18 and (d) 21 input variables. Green (red) colors indicate an improvement (degradation).


Figure 7 shows the difference in mean wind speed in 2017 between the best post-processing model, which includes 21 variables, and COSMO-REA6. The corrections by ANN_021 result in increased wind speeds over the Alps of more than 1.0 m s−1 on an annual average. The situation is similar for mid-range mountain peaks in Germany, where the corrections are also positive but somewhat smaller at 0.6 to 0.9 m s−1. This is related to the fact that the small-scale structures of the orography can be better represented by the considerably higher resolution of the wind model. In the northern German lowlands, the mean wind speed is only about 0.3 m s−1 below the reanalysis, while the deviations on the North Sea and Baltic Sea coasts are up to −1.0 m s−1. Since the measurement locations in this study are either offshore (FINO stations) or far inland (all other stations), specific phenomena such as the land–sea wind circulation cannot be learned by the neural network. Therefore, uncertainties might be quite large in this area, and it may not be possible for the neural network to correctly represent the flow directly along the coast.

Figure 7. Difference of mean wind speed in 2017 between ANN_021 and COSMO-REA6. Purple (green) colors indicate a decrease (increase) in post-processed wind speeds compared to COSMO-REA6.
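A difference map of this kind amounts to averaging both wind speed fields over time and subtracting them grid point by grid point. The sketch below assumes that both data sets have already been brought onto a common grid (the post-processed fields are finer than COSMO-REA6, so some regridding step would be needed first); the function names and the tiny example fields are hypothetical.

```python
def mean_field(fields):
    """Average a list of equally shaped 2-D fields (lists of rows) over time."""
    n = len(fields)
    rows, cols = len(fields[0]), len(fields[0][0])
    return [
        [sum(f[i][j] for f in fields) / n for j in range(cols)]
        for i in range(rows)
    ]


def mean_difference(post_fields, ref_fields):
    """Time-mean of the post-processed fields minus time-mean of the reference.

    Positive entries mean higher post-processed wind speeds.
    Both inputs are assumed to share the same grid.
    """
    post_mean = mean_field(post_fields)
    ref_mean = mean_field(ref_fields)
    return [
        [p - r for p, r in zip(post_row, ref_row)]
        for post_row, ref_row in zip(post_mean, ref_mean)
    ]


# Two time steps on a 1 x 2 grid, made-up values in m/s.
post = [[[6.0, 4.0]], [[8.0, 4.0]]]
ref = [[[6.0, 5.0]], [[6.0, 5.0]]]
print(mean_difference(post, ref))
```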

5 Conclusions

The aim of this study is to enhance the representation of wind speed estimates from reanalysis data around common wind turbine hub heights. By applying a diagnostic wind model to the reanalysis data and using its output as additional predictors in a statistical post-processing approach, we are able to provide a better estimator for wind speed at 100 m above ground compared to the COSMO-REA6 regional reanalysis.

We find that the diagnostic wind model alone does not constitute a meaningful improvement on the reanalysis, since it does not take into account the actual stability of the atmosphere but rather corrects wind speeds using three constant vertical atmospheric stability configurations. The added value of the diagnostic wind model only becomes apparent in combination with a statistical post-processing approach, which combines information from the diagnostic wind model with parameter estimates from the COSMO-REA6 reanalysis (the vertical temperature gradient being one of these parameters). We test a generalized linear model, as well as neural networks of different complexity, as the statistical modeling framework. In almost all cases, the neural network outperforms the generalized linear model, presumably due to its ability to include more complex and nonlinear interactions between the input parameters.

Further, we have adopted two different types of statistical post-processing models for the wind speed. Specifically, (1) we estimate a separate model for each site, trained on data from the same location only, and (2) we train a model on the other 11 sites and then evaluate it at the current site (which is unknown to the model). Both approaches lead to a significant improvement in wind speed estimates. The former approach provides better results, as local characteristics can only be represented if training data from this location are used. However, approach (1) is not applicable when estimates are needed at arbitrary locations where no observations are present.
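Approach (2) corresponds to a leave-one-site-out evaluation. The following minimal sketch illustrates the procedure only; it substitutes a one-predictor least-squares fit for the neural network and generalized linear model used in the study, and the site names and values are hypothetical.

```python
import math


def fit_linear(x, y):
    """Ordinary least squares for y ~ a * x + b (stand-in for the ANN/GLM)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(pred, obs):
    """Root-mean-squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def leave_one_site_out(sites):
    """Train on all sites but one, then score on the held-out site.

    sites maps a site name to (reanalysis_speeds, observed_speeds).
    Returns a dict mapping each site name to its held-out RMSE.
    """
    scores = {}
    for held_out in sites:
        train_x, train_y = [], []
        for name, (x, y) in sites.items():
            if name != held_out:  # the held-out site stays unknown to the model
                train_x.extend(x)
                train_y.extend(y)
        slope, intercept = fit_linear(train_x, train_y)
        test_x, test_y = sites[held_out]
        predictions = [slope * v + intercept for v in test_x]
        scores[held_out] = rmse(predictions, test_y)
    return scores


# Made-up sites where observed = 0.5 * reanalysis + 1 holds exactly.
sites = {
    "site_a": ([2.0, 4.0, 6.0], [2.0, 3.0, 4.0]),
    "site_b": ([3.0, 5.0], [2.5, 3.5]),
    "site_c": ([8.0, 10.0], [5.0, 6.0]),
}
print(leave_one_site_out(sites))
```

Approach (1) would instead call the fitting routine on data from a single site and evaluate on held-back data from that same site, which is why it cannot be transferred to locations without observations.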

With the encouraging results of the statistical post-processing approach (2), we estimate our final model using data from all 12 observation sites. The estimates are evaluated against radiosonde ascents at 11 locations in Germany. This model yields considerable improvements at most locations (about 8 % reduction of RMSE on average), especially when considering that the radiosonde data are already included in the COSMO-REA6 reanalysis. Thus, the combined additional information from the diagnostic wind model and the statistical post-processing is able to further improve the reanalysis even at locations where COSMO-REA6 is expected to be close to the true state.

As these results are very promising, we now plan to explore the expansion of the current setup to also estimate wind speeds at height levels above 100 m. Further, we expect that more improvement might be gained by additional tuning of the statistical model, by adding more variables from the reanalysis as predictors and through more observational data including longer time series. Additional improvement could also be achieved by a more complex diagnostic wind model with more vertical levels and stability parameters.

Nevertheless, our study shows that by combining a physics-motivated approach (i.e., the diagnostic wind model) and a statistical post-processing method (e.g., using artificial intelligence), the post-processing can be performed at low cost compared to running expensive higher-resolution numerical models. Therefore, the method and derived data sets represent a valuable tool especially for the wind energy sector, e.g., for yield forecasting or site assessment.

Appendix A: Comparison of ANN configurations

Figure A1. RMSE improvements against COSMO-REA6 for all 12 models with five repetitions grouped by (a) number of hidden layers, (b) units per layer, (c) input variables, (d) training epochs and (e) batch size.


Figure A1 shows the RMSE improvement compared to COSMO-REA6 for all tested configurations grouped by the number of hidden layers, units per hidden layer, number of input variables, training epochs and batch size for all stations. Increasing the number of hidden layers has no significant effect. The number of units per layer should be 25 or even 50, the batch size 500 or even lower, and the number of training epochs at least 50. However, the strongest improvement is achieved by adding more variables, so the exact structure of the neural network is not crucial in the end.
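A comparison of this kind can be organized as a grid over the five settings, followed by averaging the score per value of one setting. The sketch below is illustrative only: the grid values are assumptions except where the text mentions them (25 and 50 units, batch size 500, 50 epochs, and the 2, 5, 18 and 21 input variables of Figure 6), and the scores are placeholders rather than results.

```python
from collections import defaultdict
from itertools import product

# Hypothetical search grid; values not named in the text are assumptions.
GRID = {
    "hidden_layers": [1, 2, 3],          # assumed
    "units_per_layer": [10, 25, 50],     # 25 and 50 from the text, 10 assumed
    "input_variables": [2, 5, 18, 21],   # as in Figure 6
    "epochs": [10, 50, 100],             # 50 from the text, others assumed
    "batch_size": [100, 500, 1000],      # 500 from the text, others assumed
}
REPETITIONS = 5  # five repetitions per configuration, as in Figure A1


def configurations(grid):
    """Yield every combination of settings as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def mean_by(results, key):
    """Average the score over all runs sharing the same value of `key`.

    results is a list of (configuration, score) pairs.
    """
    buckets = defaultdict(list)
    for config, score in results:
        buckets[config[key]].append(score)
    return {value: sum(scores) / len(scores) for value, scores in buckets.items()}


configs = list(configurations(GRID))
print(len(configs), "configurations,", len(configs) * REPETITIONS, "runs per model")

# Placeholder scores (NOT results from the study): the score is simply set to
# the number of input variables so that the grouping can be demonstrated.
placeholder_results = [(c, float(c["input_variables"])) for c in configs]
print(mean_by(placeholder_results, "input_variables"))
```

In an actual experiment, the placeholder score would be replaced by the RMSE improvement obtained after training a network with the given configuration.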

A1 List of abbreviations

Data availability

Selected parameters of the regional reanalysis COSMO-REA6 (Deutscher Wetterdienst/Hans-Ertel Centre for Weather Research, 2021) and radiosonde data (Deutscher Wetterdienst, 2021) are freely available via DWD's Climate Data Center. Observations of the FINO masts are provided by the German Federal Maritime and Hydrographic Agency (Bundesamt für Seeschifffahrt und Hydrographie, 2021). Mast observations from Jülich are available within the SAMD archive (SAMD, 2021). Terrain data used in this study are available online (Jarvis et al., 2008).

Author contributions

SB prepared the data, designed the methodology and carried out the analysis under the supervision of JDK. SB prepared the manuscript. SB and JDK reviewed it throughout.

Competing interests

The contact author has declared that neither of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

This work has been conducted in the framework of the mFund program FAIR funded by the German Federal Ministry for Transportation and Digital Infrastructure. The authors want to thank Nicole Ritzhaupt for the support regarding the diagnostic wind model and BayWa r.e. GmbH (last access: 22 November 2021) for the generous provision of their data.

Financial support

This research has been supported by the Bundesministerium für Verkehr und Digitale Infrastruktur (grant no. 19F2103C).

Review statement

This paper was edited by Rebecca Barthelmie and reviewed by Michael Mifsud and one anonymous referee.


References

Beyrich, F.: The Lindenberg reference site data set metadata information, National Center for Atmospheric Research, Boulder, Colorado, USA (last access: 22 November 2021), 2009.

Bhumralkar, C. M., Mancuso, R. L., Ludwig, F. L., and Renné, D. S.: A practical and economic method for estimating wind characteristics at potential wind energy conversion sites, Sol. Energy, 25, 55–65, 1980.

Bollmeyer, C., Keller, J. D., Ohlwein, C., Wahl, S., Crewell, S., Friederichs, P., Hense, A., Keune, J., Kneifel, S., Pscheidt, I., Redl, S., and Steinke, S.: Towards a high-resolution regional reanalysis for the European CORDEX domain, Q. J. Roy. Meteor. Soc., 141, 1–15, 2015.

Brahimi, T.: Using Artificial Intelligence to Predict Wind Speed for Energy Application in Saudi Arabia, Energies, 12, 4669, 2019.

Brümmer, B., Lange, I., and Konow, H.: Atmospheric boundary layer measurements at the 280 m high Hamburg weather mast 1995–2011: mean annual and diurnal cycles, Meteorol. Z., 21, 319–335, 2012.

Brune, S., Keller, J. D., and Wahl, S.: Evaluation of wind speed estimates in reanalyses for wind energy applications, Adv. Sci. Res., 18, 115–126, 2021.

Bundesamt für Seeschifffahrt und Hydrographie: FINO-Datenbank, BSH [data set], last access: 8 October 2021.

Davis, C., Bunker, S., and Mutschlecner, J.: Atmospheric transport models for complex terrain, J. Clim. Appl. Meteorol., 23, 235–238, 1984.

Deutscher Wetterdienst: High resolution radiosonde data, DWD – Climate Data Center [data set], last access: 22 November 2021.

Deutscher Wetterdienst/Hans-Ertel Centre for Weather Research: COSMO-REA6 regional reanalysis, DWD/HErZ – Climate Data Center/Hans-Ertel Centre for Weather Research [data set], last access: 10 October 2021.

Dickerson, M. H.: MASCON – A mass consistent atmospheric flux model for regions with complex terrain, J. Appl. Meteorol. Clim., 17, 241–253, 1978.

Endlich, R., Ludwig, F., Bhumralkar, C., and Estoque, M.: A diagnostic model for estimating winds at potential sites for wind turbines, J. Appl. Meteorol. Clim., 21, 1441–1454, 1982.

Frank, C. W., Kaspar, F., Keller, J. D., Adams, T., Felkers, M., Fischer, B., Handte, M., Marrón, P. J., Paulsen, H., Neteler, M., Schiewe, J., Schuchert, M., Nickel, C., Wacker, R., and Figura, R.: FAIR: a project to realize a user-friendly exchange of open weather data, Adv. Sci. Res., 17, 183–190, 2020a.

Frank, C. W., Pospichal, B., Wahl, S., Keller, J. D., Hense, A., and Crewell, S.: The added value of high resolution regional reanalyses for wind power applications, Renew. Energ., 148, 1094–1109, 2020b.

Guo, X. and Palutikof, J.: A study of two mass-consistent models: problems and possible solutions, Bound.-Lay. Meteorol., 53, 303–332, 1990.

Heppelmann, T., Steiner, A., and Vogt, S.: Application of numerical weather prediction in wind power forecasting: Assessment of the diurnal cycle, Meteorol. Z., 26, 319–331, 2017.

Homicz, G. F.: Three-dimensional wind field modeling: a review, Sandia National Laboratories, SAND Report, 2597 (last access: 8 October 2021), 2002.

Jarvis, A., Reuter, H., Nelson, A., and Guevara, E.: Hole-filled seamless SRTM data V4, International Centre for Tropical Agriculture (CIAT) [data set] (last access: 8 December 2021), 2008.

Jung, C. and Schindler, D.: Wind speed distribution selection – A review of recent development and progress, Renew. Sust. Energ. Rev., 114, 109290, 2019.

Kaiser-Weiss, A. K., Kaspar, F., Heene, V., Borsche, M., Tan, D. G. H., Poli, P., Obregon, A., and Gregow, H.: Comparison of regional and global reanalysis near-surface winds with station observations over Germany, Adv. Sci. Res., 12, 187–198, 2015.

Kitada, T., Kaki, A., Ueda, H., and Peters, L. K.: Estimation of vertical air motion from limited horizontal wind data – a numerical experiment, Atmos. Environ., 17, 2181–2192, 1983.

Kohler, M., Metzger, J., and Kalthoff, N.: Trends in temperature and wind speed from 40 years of observations at a 200-m high meteorological tower in Southwest Germany, Int. J. Climatol., 38, 23–34, 2018.

Löhnert, U., Schween, J., Acquistapace, C., Ebell, K., Maahn, M., Barrera-Verdejo, M., Hirsikko, A., Bohn, B., Knaps, A., O’Connor, E., Simmer, C., Wahner, A., and Crewell, S.: JOYCE: Jülich observatory for cloud evolution, B. Am. Meteorol. Soc., 96, 1157–1174, 2015.

Mathur, R. and Peters, L. K.: Adjustment of wind fields for application in air pollution modeling, Atmos. Environ., 24, 1095–1106, 1990.

Moussiopoulos, N., Flassak, T., and Knittel, G.: A refined diagnostic wind model, Environ. Softw., 3, 85–94, 1988.

Ratto, C., Festa, R., Romeo, C., Frumento, O., and Galluzzi, M.: Mass-consistent models for wind fields over complex terrain: the state of the art, Environ. Softw., 9, 247–268, 1994.

Rohrig, K., Berkhout, V., Callies, D., Durstewitz, M., Faulstich, S., Hahn, B., Jung, M., Pauscher, L., Seibel, A., Shan, M., Siefert, M., Steffen, J., Collmann, M., Czichon, S., Dörenkämper, M., Gottschall, J., Lange, B., Ruhle, A., Sayer, F., Stoevesandt, B., and Wenske, J.: Powering the 21st century by wind energy – Options, facts, figures, Appl. Phys. Rev., 6, 031303, 2019.

Ross, D., Smith, I. N., Manins, P., and Fox, D.: Diagnostic wind field modeling for complex terrain: model development and testing, J. Appl. Meteorol., 27, 785–796, 1988.

SAMD: HD(CP)2 long term observations, data of Meteorological tower data (no. 00), by Supersite JOYCE, data version 00, Research Center Juelich, Institute for Energy and Climate research (IEK-8) [data set], last access: 22 November 2021.

Sasaki, Y.: An objective analysis based on the variational method, J. Meteorol. Soc. Jpn., 36, 77–88, 1958.

Sasaki, Y.: Some basic formalisms in numerical variational analysis, Mon. Weather Rev., 98, 875–883, 1970a.

Sasaki, Y.: Numerical variational analysis formulated under the constraints as determined by longwave equations and a low-pass filter, Mon. Weather Rev., 98, 884–898, 1970b.

Sherman, C. A.: A mass-consistent model for wind fields over complex terrain, J. Appl. Meteorol. Clim., 17, 312–319, 1978.

Simmer, C., Adrian, G., Jones, S., Wirth, V., Göber, M., Hohenegger, C., Janjić, T., Keller, J., Ohlwein, C., Seifert, A., Trömel, S., Ulbrich, T., Wapler, K., Weissmann, M., Keller, J., Masbou, M., Meilinger, S., Riß, N., Schomburg, A., Vormann, A., and Weingärtner, C.: HErZ: The German Hans-Ertel Centre for Weather Research, B. Am. Meteorol. Soc., 97, 1057–1068, 2016.

Wang, Y., Williamson, C., Garvey, D., Chang, S., and Cogan, J.: Application of a multigrid method to a mass-consistent diagnostic wind model, J. Appl. Meteorol. Clim., 44, 1078–1089, 2005.

Short summary
A post-processing of the wind speed of the regional reanalysis COSMO-REA6 in Central Europe is performed based on a combined physical and statistical approach. The physical basis is provided by downscaling wind speeds with the help of a diagnostic wind model, which reduces the horizontal grid point spacing by a factor of 8. The statistical correction using a neural network based on different variables of the reanalysis leads to an improvement of 30 % in RMSE compared to COSMO-REA6.