Uncertainty quantification of long-term modeled wind speed is essential to ensure that stakeholders can best leverage numerical wind resource data sets. Offshore, this need is even stronger given the limited availability of wind speed observations at heights relevant for wind energy purposes and the resulting greater reliance on numerical data sets for wind energy planning and operational projects. In this analysis, we consider the National Renewable Energy Laboratory's 21-year updated numerical offshore data set for the US East Coast and provide a methodological framework that leverages both floating lidar and near-surface buoy observations in the region to quantify uncertainty in the modeled hub-height wind resource. We first show that using a numerical ensemble to quantify the uncertainty in modeled wind speed is insufficient to fully capture the model deviation from real-world observations. Next, we train and validate a random forest to vertically extrapolate near-surface wind speed to hub height using the available short-term lidar data sets in the region. We then apply this model to vertically extrapolate the long-term near-surface buoy wind speed observations to hub height so that they can be directly compared to the long-term numerical data set. We find that the mean 21-year uncertainty in 140

This work was authored in part by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the US Department of Energy (DOE) under contract no. DE-AC36-08GO28308. Funding was provided by the US Department of Energy Office of Energy Efficiency and Renewable Energy Wind Energy Technologies Office. Support for the work was also provided by the National Offshore Wind Research and Development Consortium under agreement no. CRD-19-16351. The views expressed in the article do not necessarily represent the views of the DOE or the US Government. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for US Government purposes.

The offshore wind energy industry has been growing at an unprecedented pace worldwide

Given the high stakes connected to the planned future growth of offshore wind energy, it is essential that data sets such as NREL's quantify
and communicate the uncertainty that comes with the modeled wind resource. In fact, previous studies have shown that even a small uncertainty in the
modeled mean wind speed translates into nearly double that uncertainty in the long-term prediction of the annual energy production of a wind plant
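As a rough, self-contained illustration of this sensitivity (not taken from the paper), the sketch below perturbs the mean of an assumed Rayleigh wind speed distribution by 1 % and propagates it through an idealized power curve; the cut-in, rated, and cut-out speeds are hypothetical round numbers.

```python
import numpy as np

def aep_proxy(mean_ws, u):
    """Relative annual energy from an idealized power curve, for Rayleigh
    (Weibull k=2) winds with the given mean. Cut-in 3 m/s, rated 11 m/s,
    cut-out 25 m/s; cubic power ramp between cut-in and rated. Curve and
    distribution are illustrative assumptions, not from the paper."""
    scale = mean_ws / (np.sqrt(np.pi) / 2)        # Rayleigh scale for this mean
    v = scale * np.sqrt(-np.log(u))               # inverse-CDF sampling
    p = np.clip((v**3 - 3**3) / (11**3 - 3**3), 0.0, 1.0)
    p[v >= 25] = 0.0                              # shut down above cut-out
    return p.mean()

# Same uniform draws for both runs, so only the mean wind speed changes
u = np.random.default_rng(0).uniform(size=200_000)
base = aep_proxy(8.5, u)
bumped = aep_proxy(8.5 * 1.01, u)                 # +1 % mean wind speed
sensitivity = (bumped / base - 1) / 0.01          # % AEP change per % WS change
print(f"AEP responds roughly {sensitivity:.1f} times as strongly as mean wind speed")
```

The amplification factor depends on the power curve and wind climate; the point is only that a relative uncertainty in mean wind speed is magnified in energy terms.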

A conventional approach to quantifying uncertainty from NWP models is to consider the variability of the quantity of interest – in our case
wind speed – across a number of numerical ensemble members, which are different realizations of the numerical model obtained by varying the
model setup. Many different setup choices can affect the wind speed predicted by an NWP model: which planetary boundary layer (PBL) scheme
to adopt in the simulations

However, quantifying only the uncertainty connected to the possible choices in model setup has several limitations. The magnitude of the
uncertainty that can be quantified from the NWP ensemble variability is directly tied to the limited number of choices sampled within the
considered model setups. NWP model ensembles tend to exhibit underdispersive behavior

It is important to remember that, while observational data sets are essential for a proper quantification of numerical uncertainty, they also come
with their own uncertainty, which should therefore be considered in the overall numerical uncertainty quantification. Observational uncertainty stems
from a variety of factors

In our analysis, we present and validate a novel methodology for long-term (in our case, 21-year) uncertainty quantification of modeled wind
speed in the mid-Atlantic region of the United States by leveraging available observations of offshore wind. While our analysis focuses on the
US mid-Atlantic domain, the methodology could be applied in other offshore regions as well. In Sect.

Key attributes of the 21-year WRF simulations used in this study.

We use NREL's WRF-modeled long-term wind speed data in the mid-Atlantic region

An ideal uncertainty quantification over the 21-year extent of our offshore wind resource numerical data set would require concurrent 21-year time series of observed winds at a height relevant for wind energy purposes and at as many locations as possible within the modeled domain. In reality, such extensive observations do not exist. We therefore consider two sets of observations and apply a machine-learning-based approach to leverage the advantages of each. On one hand, we use lidar observations in the region, which provide measurements at hub height but only over a handful of months. On the other hand, we consider observations from National Data Buoy Center (NDBC) buoys, which are available over much longer time periods but only provide observations close to the sea surface.

Map of the observational data sets used in the analysis. Lidar locations are shown as diamonds, and NDBC buoys are shown as dots. Wind lease areas are shown in white and wind planning areas in gray. The distance between the two NYSERDA lidars is about 75

We consider four sets of lidar measurements taken from three lidars in the region (Fig.

Some of the considered floating lidar platforms were not operational for part of their overall deployment period. Figure

Data availability chart for the four lidar data sets. We kept only hourly time stamps for which we could calculate hourly average values for all the variables considered in this analysis, as detailed in the text.

Mean 140

List of NDBC buoys used in this analysis.

Finally, we consider long-term near-surface observations from eight buoys managed by the NDBC (locations in Fig.

To be able to leverage the long-term time series of the NDBC buoys for an uncertainty quantification that is relevant to offshore wind energy
purposes, the buoy observations need to be vertically extrapolated to a height of interest for commercial wind energy development. Several techniques
exist to vertically extrapolate wind speeds. Traditional approaches include using a power law relationship
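For context, a minimal power-law extrapolation looks as follows; the measurement height, hub height, and shear exponent used here are illustrative assumptions, not values from this study.

```python
def power_law_extrapolation(ws_low, z_low=4.0, z_high=140.0, alpha=0.11):
    """Extrapolate a near-surface wind speed (m/s) to hub height with a
    power law. The shear exponent alpha = 0.11 is a value often quoted
    for offshore conditions; the heights are hypothetical examples of a
    buoy anemometer height and a modern hub height."""
    return ws_low * (z_high / z_low) ** alpha

# e.g., a 7 m/s wind measured at 4 m above the sea surface
print(power_law_extrapolation(7.0))  # roughly 10.3 m/s
```

The well-known weakness of such closed-form profiles offshore is that a single exponent cannot capture the strong dependence of shear on atmospheric stability, which motivates the machine learning approach used here.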

We use a random forest machine learning model, a robust ensemble regression algorithm that has been successfully applied to similar problems. In
this work, we use the

near-surface wind speed;

near-surface wind direction;

To preserve the cyclical nature of this variable, we calculate and include as inputs its sine
and cosine. We note that both sine and cosine are needed to uniquely identify a value of the cyclical variable, because any single value of the sine (or cosine) alone corresponds to two different values of the cyclical variable. For example, the cosine of wind direction is 0 for both 90 and
270

air temperature;

sea surface temperature;

difference between air temperature and SST;

time of day;

month.
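The predictor list above can be assembled, for example, as follows; the input column names are hypothetical, since the paper does not prescribe a data layout.

```python
import numpy as np
import pandas as pd

def build_features(df):
    """Assemble the random forest inputs listed above from an hourly
    DataFrame with a DatetimeIndex. Column names ('wind_speed', etc.)
    are illustrative placeholders, not the paper's."""
    out = pd.DataFrame(index=df.index)
    out["ws"] = df["wind_speed"]                  # near-surface wind speed
    wd_rad = np.deg2rad(df["wind_dir"])
    out["wd_sin"] = np.sin(wd_rad)                # sine and cosine together make
    out["wd_cos"] = np.cos(wd_rad)                # the direction encoding unambiguous
    out["t_air"] = df["air_temp"]                 # air temperature
    out["sst"] = df["sst"]                        # sea surface temperature
    out["t_air_minus_sst"] = df["air_temp"] - df["sst"]  # stability proxy
    out["hour"] = df.index.hour                   # time of day
    out["month"] = df.index.month                 # month
    return out
```

The air-sea temperature difference serves as a simple proxy for atmospheric stability, which is why it can carry predictive weight beyond the two temperatures individually.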

Algorithm hyperparameters considered for the random forest, their sampled values in the cross validation, and chosen value in the final version of the model used in Sect.

We use a 5-fold cross validation, where we build the testing set using a consecutive 20 % of the observations from each calendar month in the
period of record to ensure that the learning algorithm can be trained and tested on data that capture the seasonal variability at each
site well. Also, we consider the hyperparameter ranges shown in Table

The chosen splitting approach in the cross validation ensures that short-term autocorrelation in the data does not artificially increase the measured skill of the algorithm (as would happen if training and testing data sets were randomly chosen without imposing a consecutive data requirement). However, potential lag correlation in the data could still play a role. Therefore, we tested whether using a single, consecutive 20 % of the data for testing leads to significantly different results in terms of model accuracy. We tested this on the two NYSERDA lidars (because they both span a period of record longer than 1 year and therefore can still capture a full seasonal cycle in their training phase even when a single 20 % of the data is kept aside for testing) and found no significant difference in the model performance.
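A sketch of this month-blocked splitting strategy, assuming the samples are time-ordered (the paper's exact slicing may differ in detail):

```python
import numpy as np

def month_blocked_folds(months, n_folds=5):
    """Yield (train, test) positional indices where each fold's test set
    is a consecutive 20 % slice of every calendar month's samples.
    `months` gives the calendar month (1-12) of each time-ordered sample."""
    months = np.asarray(months)
    positions = np.arange(len(months))
    for k in range(n_folds):
        test = []
        for m in np.unique(months):
            pos_m = positions[months == m]          # this month's samples
            n = len(pos_m)
            lo, hi = (k * n) // n_folds, ((k + 1) * n) // n_folds
            test.append(pos_m[lo:hi])               # consecutive slice -> no leakage
        test = np.concatenate(test)                 # from short-term autocorrelation
        yield np.setdiff1d(positions, test), test

# Toy year: 30 time-ordered samples per month
months = np.repeat(np.arange(1, 13), 30)
print(len(list(month_blocked_folds(months))))  # 5
```

Because every fold draws its test slice from every month, both training and testing sets see the full seasonal cycle, which is the property the paragraph above requires.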

As detailed in Sects.

If the standard deviation of the biases is smaller than the typical single-site uncertainty (i.e., the average of each site's standard deviation of the residuals), then the latter is a good measure of the model uncertainty.

If the standard deviation of the biases exceeds the typical single-site uncertainty, then the model uncertainty is dominated by the unpredictable bias and can be estimated from the standard deviation of the biases itself.
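The two cases above can be condensed into a small helper; this is a direct transcription of the stated rule, with the input values left to the caller.

```python
import numpy as np

def model_uncertainty(site_biases, site_residual_stds):
    """Apply the rule above: compare the across-site spread of the biases
    to the typical single-site uncertainty, and return whichever dominates."""
    sigma_bias = np.std(site_biases, ddof=1)       # site-to-site bias spread
    sigma_single = np.mean(site_residual_stds)     # typical single-site spread
    if sigma_bias <= sigma_single:
        return sigma_single                        # residual spread dominates
    return sigma_bias                              # unpredictable bias dominates
```

In other words, when the bias varies little from site to site it is predictable (and correctable), so the residual spread is the honest uncertainty estimate; when it varies strongly, the bias spread itself becomes the limiting factor.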

Finally, when estimating model uncertainty from measurements, it is important to remember that the measurements themselves have an uncertainty. In our
case, we need to consider both the actual instrumental uncertainty (

As detailed in the following sections, we use constant values across the whole considered region for both the instrumental
uncertainty
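One common way to account for these observational contributions, assuming they are independent of the model error, is to subtract them in quadrature from the total model-observation spread; the paper's exact treatment may differ from this sketch.

```python
import math

def isolate_model_uncertainty(sigma_total, sigma_instrument, sigma_extrapolation):
    """Remove (in quadrature) the instrumental and vertical-extrapolation
    uncertainties from the total model-observation spread, assuming the
    three contributions are independent. Clamped at zero so that sampling
    noise cannot produce an imaginary result."""
    var = sigma_total**2 - sigma_instrument**2 - sigma_extrapolation**2
    return math.sqrt(max(var, 0.0))
```

The clamping matters in practice: at well-behaved sites the observational terms can be comparable to the total spread, and a naive subtraction would go negative.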

Before diving deep into the uncertainty quantification using the approach outlined in the previous section, we are interested in confirming the
limitations of using an ensemble-derived uncertainty as a way to fully capture an NWP model uncertainty, as discussed in the introduction. To do so,
we run a 1-year (September 2019 to August 2020) WRF ensemble across the mid-Atlantic region and calculate the (temporal) mean of the modeled
140

For this exercise, we consider 16 ensemble members, obtained by considering all the possible combinations of setups resulting from the following four
variations.
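Enumerating all combinations of four binary setup choices indeed yields 16 members. In the sketch below only the PBL scheme is named in the surrounding text; the other option categories and all option values are purely illustrative placeholders, not the paper's actual variations.

```python
from itertools import product

# Four two-way setup choices (names and values are hypothetical examples
# of the kind of variations an NWP ensemble might sample).
setup_options = {
    "pbl_scheme": ["MYNN", "YSU"],
    "sea_surface_temperature": ["product_A", "product_B"],
    "forcing_reanalysis": ["reanalysis_A", "reanalysis_B"],
    "land_surface_model": ["model_A", "model_B"],
}

# Every combination of the four choices -> 2**4 = 16 ensemble members
members = [dict(zip(setup_options, combo))
           for combo in product(*setup_options.values())]
print(len(members))  # 16
```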

Table

Comparison between ensemble-derived uncertainty and total model uncertainty in 140

Given that a numerical ensemble provides an incomplete uncertainty quantification, we now turn to our machine learning vertical extrapolation approach, which enables our proposed pipeline for a broader uncertainty quantification. For the long-term uncertainty quantification, the random forest algorithm needs to be applied at each buoy location to derive a long-term time series of extrapolated winds, which is then compared to the WRF-modeled wind resource. However, before doing so, the regression model first needs to be trained at the floating lidar sites so that it can learn how to model hub-height wind speed in the region from a set of near-surface data. Also, the generalization skill of the model needs to be quantified, as a proper uncertainty quantification must also account for the uncertainty of the approach used to obtain the observation-based long-term time series of hub-height winds at each buoy location.

First, we verify that the learning algorithm we have chosen does not overfit the data. To do so, we compare the training and testing RMSE at the four
lidar data sets in Table

Comparison of training and testing RMSE (in modeling hourly average wind speed at 140

Testing root-mean-square error (in

We validate the machine learning extrapolation model using a “round-robin” approach. It is neither fair nor practically useful to assess
the skill of the regression algorithm when it is trained and tested at the same lidar location, as that is not our actual application of the
model. Instead, one should assess the performance of the extrapolation approach when the random forest is trained at one lidar and then used to
extrapolate wind speed at a different lidar, where the model has no prior knowledge (or, more precisely, limited prior knowledge, since the training site
is still in the vicinity) of the wind conditions at the site. Figure

Overall, we find that the random forest provides accurate results, with RMSE always equal to or lower than 1.51
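The round-robin evaluation amounts to a loop over ordered pairs of sites; in the sketch below the training and scoring callables are left abstract, and the site handling is illustrative.

```python
from itertools import permutations

def round_robin(datasets, train_fn, score_fn):
    """Train at one site and evaluate at every other site, for all ordered
    site pairs. `datasets` maps site name -> data; `train_fn` fits a model
    on one site's data; `score_fn(model, data)` returns an error metric
    (e.g., RMSE). A sketch of the round-robin validation described above."""
    return {
        (train_site, test_site): score_fn(train_fn(datasets[train_site]),
                                          datasets[test_site])
        for train_site, test_site in permutations(datasets, 2)
    }
```

With four lidar data sets this produces 12 ordered train/test pairs, each measuring how well the model generalizes to a site it has never seen.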

Scatter plot of observed and machine-learning-predicted 140

When interpreting these results, it is important to also consider the correlation between the considered data sets. In fact, the

The application of the random forest model also allows for a quantification of the relative importance of the various input variables used to feed the
model. Table
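In scikit-learn, a common implementation choice for random forests (the paper does not specify its exact software stack), impurity-based predictor importances are exposed directly on the fitted estimator; the toy data below merely demonstrates the mechanism with hypothetical feature names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Toy target in which only the first feature carries signal
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; rank features by them
feature_names = ["ws", "t_air_minus_sst", "month"]  # illustrative names
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
print(ranking[0][0])  # the dominant predictor
```

One caveat, consistent with the discussion that follows: impurity-based importances are diluted among correlated predictors, so a low ranking does not prove a variable is uninformative.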

Predictor importance for the random forest used to extrapolate winds at 140

The fact that only two of the variables used as inputs to the random forest have a relatively large importance raises the question of whether a
simpler algorithm could be used to vertically extrapolate wind speeds. We test this aspect by considering two additional algorithm setups (for
simplicity, we only consider the same-site approach) and summarize our results in Table

First, we test whether a comparable model accuracy can be achieved with a random forest that uses only the two most important
variables (near-surface wind speed and difference between air temperature and SST) as inputs. We use the same hyperparameter ranges and
cross-validation setup used in the main analysis. We find that while the model does not overfit the data in a significant way, it has lower skill,
with testing RMSE values 0.15–0.40

Next, we test whether using a simpler regression algorithm with the whole set of input variables considered in the original random forest can
lead to a comparable skill. We consider a multivariate linear regression and use a ridge algorithm
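A ridge regression of this kind can be written in closed form; the sketch below standardizes the inputs and solves the regularized normal equations, and is not intended as the paper's exact configuration.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on standardized inputs:
    w = (Xs^T Xs + alpha*I)^{-1} Xs^T (y - mean(y))."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd                       # standardize so alpha acts evenly
    A = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    w = np.linalg.solve(A, Xs.T @ (y - y.mean()))
    return w, mu, sd, y.mean()

def ridge_predict(model, X):
    w, mu, sd, y0 = model
    return ((X - mu) / sd) @ w + y0
```

Being purely linear in its inputs, such a model cannot represent the stability-dependent, nonlinear relationship between near-surface and hub-height winds, which is the intuition behind its weaker performance here.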

Therefore, we conclude that the random forest model considered in the main analysis is an appropriate choice given the complexity of the task of wind
speed vertical extrapolation, despite the limited number of variables showing large values of relative importance (likely due to some correlation
effects, as described above). Finally, we note that several constraints have been applied to the complexity of the random forest used in the main
analysis, in terms of the hyperparameters listed in Table

Comparison of training and testing RMSE (in modeling hourly average wind speed at 140

After properly validating and assessing the generalization skill of the machine-learning-based vertical extrapolation model by leveraging the
short-term lidar data, we can now apply it to extrapolate the long-term observations collected by the NDBC buoys. To do so, we train a random forest
using all the lidar data sets combined to maximize the amount of training data for the model (the hyperparameters selected for this final model are
listed in the leftmost column in Table

Scatter plot of 21-year WRF-modeled and machine-learning-predicted 140

We finally compute the modeled wind speed uncertainty, following the steps detailed in Sect.

Twenty-one-year model bias and model uncertainty in 140

We find that at all but one buoy, the total uncertainty in modeled 140

Finally, we focus on the variability of the quantified uncertainty and segregate results by time of day (09:00–16:00 LT (local time) vs. 21:00–04:00 LT), season (June, July, August vs. December, January, February), wind direction (180–270

Box plot showing how the modeled 140

The National Renewable Energy Laboratory has released a state-of-the-art 21-year wind resource assessment product for all the offshore regions in the
United States. Because of its numerical nature, this data set has inherent uncertainty, the quantification of which is of primary importance for
stakeholders aiming to use this data set to contribute to offshore wind energy growth. In our analysis, we have shown the limits of quantifying model
uncertainty in terms of the variability of a model ensemble, which in our case captured only roughly half of the total model uncertainty. Instead, we
recommend leveraging observations to fully capture NWP model uncertainty. In the absence of long-term observed wind speeds at hub height, we have
proposed a methodological pipeline to vertically extrapolate near-surface winds from long-term buoy observations using machine learning. We adopted a
random forest model with a number of atmospheric variables measured near the surface as inputs to the regression algorithm. Our approach was well
validated across the mid-Atlantic region, and we showed that using a significantly simpler model (either in terms of the regression algorithm itself
or the number of input variables used) would significantly reduce the accuracy of the extrapolated winds. The total model uncertainty we observed in
hub-height hourly wind speed was, on average, just below 3

This analysis is one of many examples of the synergy between NWP models and observations. A larger number of long-term observations is needed to both quantify and, in the long term, reduce the inherent uncertainty of numerical models. In fact, we observe that the uncertainty in the modeled data increases as we move away from the observational data sets used to train the machine learning algorithm. Having a larger number of sites with available hub-height observations covering a variety of atmospheric conditions would allow the machine learning model to more accurately represent hub-height conditions across a wider region. In this context, the sharing of additional proprietary observational data sets should be considered, and the long-term advantages resulting from better numerical modeling should be kept in mind when assessing the overall balance between costs and benefits of such data-sharing initiatives. In the future, the choice of the learning algorithm as well as of the input variables could be explored in more detail, for example by testing a larger number of regression models than considered here. Also, a similar analysis could be performed for other offshore regions where both a long-term numerical wind resource assessment product and enough observations to assess uncertainty are available.

NREL's long-term wind resource data sets can be found at

NB: conceptualization, methodology, formal analysis, writing (original draft), visualization, supervision, and project administration. SC: formal analysis and writing (review and editing). MO: conceptualization and funding acquisition.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to thank the members of the NOWRDC project advisory board, and in particular Nicolai Nygaard, for the constructive feedback that helped shape the analysis. A portion of this research was performed using computational resources sponsored by the US Department of Energy's Office of Energy Efficiency and Renewable Energy and located at the National Renewable Energy Laboratory.

This paper was edited by Andrea Hahmann and reviewed by two anonymous referees.