Long-term uncertainty quantification in WRF-modeled offshore wind resource off the US Atlantic coast

Uncertainty quantification of long-term modeled wind speed is essential to ensure stakeholders can best leverage wind resource numerical data sets. Offshore, this need is even stronger given the limited availability of observations of wind speed at heights relevant for wind energy purposes and the resulting heavier relative weight of numerical data sets for wind energy planning and operational projects. In this analysis, we consider the National Renewable Energy Laboratory's 21-year updated numerical offshore data set for the US East Coast and provide a methodological framework to leverage both floating lidar and near-surface buoy observations in the region to quantify uncertainty in the modeled hub-height wind resource. We first show how using a numerical ensemble to quantify the uncertainty in modeled wind speed is insufficient to fully capture the model deviation from real-world observations. Next, we train and validate a random forest to vertically extrapolate near-surface wind speed to hub height using the available short-term lidar data sets in the region. We then apply this model to vertically extrapolate the long-term near-surface buoy wind speed observations to hub height so that they can be directly compared to the long-term numerical data set. We find that the mean 21-year uncertainty in 140 m hourly average wind speed is slightly lower than 3 m s−1 (roughly 30 % of the mean observed wind speed) across the considered region. Atmospheric stability is found to have a marked impact on the quantified uncertainty, with stable conditions associated with the largest model uncertainty.


Introduction
The offshore wind energy industry has been growing at an unprecedented pace worldwide (Musial et al., 2022). While only a single 30 MW offshore wind power plant currently exists in the United States (Deepwater Wind, 2016), many more are planned to be built in the coming years, with a target of at least 30 GW of installed capacity by 2030 (Room, 2021). With a total offshore technical resource potential thought to be about twice the current national energy demand (Musial et al., 2016), offshore wind energy represents a valuable clean source of energy to meet future needs. Such growth requires the existence of accurate long-term wind resource data sets to help interested stakeholders in their preconstruction energy evaluations (Brower, 2012).

https://doi.org/10.5194/wes-2023-13 Preprint. Discussion started: 17 February 2023. © Author(s) 2023. CC BY 4.0 License.
Given the technical, logistical, and economic challenges in deploying instruments capable of characterizing the offshore wind resource at heights relevant for wind energy purposes, numerical weather prediction (NWP) models are often used to provide continuous (in space and time), high-resolution wind resource assessment. The National Renewable Energy Laboratory (NREL) recently released a state-of-the-art offshore wind resource assessment product based on 21-year-long simulations using the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008) for all U.S. offshore waters. This updated data set is intended to replace the offshore component of the WIND Toolkit (Draxl et al., 2015).
Given the high stakes connected to the planned future growth of offshore wind energy, it is essential that data sets such as NREL's quantify and communicate the uncertainty that comes with the modeled wind resource. In fact, previous studies showed how even a small uncertainty in the modeled mean wind speed roughly doubles when propagated to the long-term prediction of the annual energy production of a wind plant (Johnson et al., 2008; White, 2008; Holstag, 2013; Truepower, 2014), which is associated with significantly higher interest rates for new wind project financing.
A somewhat conventional approach to quantify uncertainty from NWP models is to consider the variability of the quantity of interest (in our case, wind speed) across a number of numerical ensemble members, which are different realizations of the numerical model obtained by tweaking the model setup. Many setup choices can affect the wind speed predicted by an NWP model: which planetary boundary layer (PBL) scheme to adopt in the simulations (Ruiz et al., 2010; Carvalho et al., 2014a; Hahmann et al., 2015; Olsen et al., 2017), which large-scale atmospheric product to use to force the model runs (Carvalho et al., 2014b; Siuta et al., 2017), the model horizontal resolution (Hahmann et al., 2015; Olsen et al., 2017), the model spin-up time (Hahmann et al., 2015), and data assimilation techniques (Ulazia et al., 2016) are some of the main contributing factors to wind speed variability across different model runs. Running a numerical ensemble can quantify what we call the boundary condition and parametric uncertainty, and Bodini et al. (2021) showed how machine learning approaches can reduce the temporal extent of the computationally expensive ensemble runs necessary to quantify this type of uncertainty over a long-term period.
However, quantifying only the uncertainty connected to the possible choices in model setup presents several limitations. The magnitude of the boundary condition and parametric uncertainty that can be quantified from the NWP ensemble variability is strictly tied to the limited number of choices sampled within the considered model setups. NWP model ensembles tend to be underdispersive (Buizza et al., 2008; Alessandrini et al., 2013), so that only a limited component of the actual wind speed error with respect to observations can be quantified from them. The full uncertainty in NWP-model-predicted wind speed can only be quantified by leveraging direct observations of the wind resource, concurrent with the modeled period. In this ideal scenario, the residuals between modeled and observed wind speed can be calculated, and the model error can be quantified both in terms of its bias (i.e., the mean of the residuals) and its uncertainty (in simple terms that will be refined later in the paper, the standard deviation of the residuals).
In our analysis, we present a 21-year uncertainty quantification for the mid-Atlantic region of the United States. In Sect. 2 we describe the numerical and observational data sets used, and in Sect. 3 we describe the approach used to complete our long-term uncertainty quantification. In Sect. 4 we dive deeper into the already mentioned topic of using numerical ensembles to quantify uncertainty and demonstrate the limits of such an approach. We validate our uncertainty quantification approach in Sect. 5, present the main results of our long-term uncertainty quantification in Sect. 6, and conclude our analysis in Sect. 7.

Numerical data
We use NREL's WRF-modeled long-term wind speed data in the mid-Atlantic region (Bodini et al., 2020). The model is run from January 2000 to December 2020 using the setup illustrated in Table 1. Multiple model setups (obtained by tweaking the reanalysis forcing, PBL scheme, sea surface temperature product, and land surface model) were considered, and the setup described here was chosen, as it could best be validated against available lidar observations in the region (Pronk et al., 2022). The WRF simulations are run separately for each month and then concatenated into a single, 21-year time series at each location. We use a 2-day spin-up period at the beginning of each simulated month (e.g., July simulations started on 29 June) to allow the model to develop sufficiently from the initial conditions and stabilize. We apply atmospheric nudging to the outer domain every 6 hours, and find that the accuracy of simulated winds is not impacted by the length of the 1-month simulation periods (i.e., the model errors at the beginning of each month are not lower than at the end of the month, on average).
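The monthly-run concatenation described above can be sketched as follows. This is a minimal illustration with synthetic data, not NREL's actual pipeline: `concat_monthly_runs` is a hypothetical helper that drops the 2-day spin-up at the start of each monthly run before joining the runs into one time series.

```python
import numpy as np
import pandas as pd

def concat_monthly_runs(monthly_series, spinup_days=2):
    """Drop the spin-up period at the start of each monthly run and
    concatenate the remainder into one continuous time series.
    Each element of `monthly_series` is a pandas Series indexed by
    timestamp, starting `spinup_days` before the first of the month."""
    trimmed = []
    for s in monthly_series:
        start = s.index[0] + pd.Timedelta(days=spinup_days)
        trimmed.append(s[s.index >= start])
    return pd.concat(trimmed).sort_index()

# Hypothetical example: two monthly runs, each with a 2-day spin-up
idx1 = pd.date_range("2000-06-29", "2000-07-31 23:00", freq="h")
idx2 = pd.date_range("2000-07-30", "2000-08-31 23:00", freq="h")
run_july = pd.Series(np.random.rand(len(idx1)), index=idx1)
run_august = pd.Series(np.random.rand(len(idx2)), index=idx2)
ws = concat_monthly_runs([run_july, run_august])
```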

Observations
An ideal uncertainty quantification over the 21-year extent of our offshore wind resource numerical data set would require concurrent 21-year time series of observed winds at a height relevant for wind energy purposes and at as many locations as possible within the modeled domain. In reality, such extensive observations do not exist. We therefore consider two sets of observations and apply a machine-learning-based approach to leverage the advantages of each. On one hand, we use lidar observations in the region, which provide measurements at hub height but only over a handful of months. On the other hand, we consider observations from National Data Buoy Center (NDBC) buoys, which are available over much longer time periods but only provide observations close to the sea surface.

Lidar observations
We consider four sets of lidar measurements taken from three lidars in the region (Fig. 1):

- The New York State Energy Research and Development Authority (NYSERDA) E05 North data set (OceanTech Services/DNV GL, 2020), collected by a ZephIR ZX300M unit, from 12 August 2019 to 19 September 2021. Most observations from the lidar and other instruments on the lidar buoy are provided as 5-minute averages, after proprietary quality checks are applied to the data. We use wind speed and wind direction, which are available at 3.1 m and then every 20 m from 20 m to 200 m above sea level, and air temperature. Sea surface temperature is provided as hourly average values. To be consistent, we calculate hourly averages for all the variables considered in the analysis.
- The NYSERDA E06 South data set, collected by a second ZephIR ZX300M unit, from 4 September 2019 to 27 March 2022. The same data considerations listed above for the E05 instrument apply to this unit as well. For this unit, data availability statistics, as defined by the proprietary quality controls applied to the instrument, were released and show that the lidar data availability decreases with height from 83 % to 76 %, while near-surface measurements have an availability greater than 96 %.
- The Atlantic Shores consortium 06 data set, collected by a third ZephIR ZX300M unit, from 26 February 2020 to 14 May 2021.

Some of the considered floating lidar platforms were not operational for part of their overall deployment period. Figure 2 shows the monthly coverage for each buoy. We kept only hourly time stamps where 140 m wind speed, near-surface wind speed, near-surface wind direction, air temperature, and sea surface temperature were all available.

NDBC buoy observations
Finally, we consider long-term near-surface observations from eight buoys managed by the NDBC (locations in Fig. 1). At each buoy, we consider observations of air and sea surface temperatures, and wind speed and direction. Table 2 shows the heights at which each variable is recorded. One buoy (ID 44009) provides observations at slightly different heights than all the other buoys, but we determined that this minor difference would have a minimal impact on our results. Whenever available, we take data from the full 21-year period that is modeled in our WRF runs. If the full 21-year period is not available, we consider observations from the start of each buoy's period of record to the end of 2020. Data are provided at 10-minute resolution for the most recent years, and 1-hour resolution for the first few years at the beginning of the century. To be consistent, we calculate 1-hour averages across the whole 21-year period.
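The averaging to a common 1-hour resolution can be done with a simple pandas resampling step. This is a sketch with synthetic 10-minute data standing in for a buoy record; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute buoy wind speed record (4 hours of data)
idx = pd.date_range("2019-01-01", periods=24, freq="10min")
ws_10min = pd.Series(np.arange(24.0), index=idx)

# Average to hourly resolution; each hour is labeled by its start time
ws_hourly = ws_10min.resample("1h").mean()
```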

To be able to leverage the long-term time series of the NDBC buoys for an uncertainty quantification that is relevant to offshore wind energy purposes, the buoy observations need to be vertically extrapolated to a height of interest for commercial wind energy development. Several techniques exist to vertically extrapolate wind speeds. Traditional approaches include using a power law relationship (Peterson and Hennessey Jr, 1978) or a logarithmic profile more firmly based on Monin-Obukhov similarity theory (Monin and Obukhov, 1954). However, recent research has shown how machine-learning-based techniques outperform these conventional extrapolation approaches, both onshore (Vassallo et al., 2020; Bodini and Optis, 2020a, b) and offshore (Optis et al., 2021).
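As a point of reference, the traditional power-law baseline mentioned above is a one-line calculation. This sketch uses a shear exponent of 0.11, a value often quoted for offshore conditions; it is an assumption for illustration, not a value taken from this study.

```python
import numpy as np

def power_law_extrapolate(ws_ref, z_ref, z_target, alpha=0.11):
    """Power-law vertical extrapolation: U(z) = U_ref * (z / z_ref)**alpha.
    alpha=0.11 is an assumed offshore shear exponent, for illustration only."""
    return ws_ref * (z_target / z_ref) ** alpha

# Extrapolate a hypothetical 4 m buoy wind speed of 8 m/s to 140 m
ws_140 = power_law_extrapolate(8.0, z_ref=4.0, z_target=140.0)
```

Unlike the machine learning approach described next, this baseline ignores stability and uses a single fixed exponent, which is one reason it is outperformed by data-driven methods.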

Machine learning algorithm for wind speed vertical extrapolation
We use a random forest machine learning model, a robust ensemble regression algorithm that has been successfully applied to similar applications. In this work, we use the RandomForestRegressor module in Python's Scikit-learn (Pedregosa et al., 2011). Additional details on random forests can be found in machine learning textbooks (e.g., Hastie et al. (2005)). We train the regression model to predict hourly average wind speed at 140 m. We use near-surface wind speed, near-surface wind direction, air temperature, and sea surface temperature as inputs to the model, all as hourly averages.

We use a 5-fold cross validation, where we build the testing set using a consecutive 20 % of the observations from each calendar month in the period of record to ensure that the learning algorithm can be trained and tested on sets of data that capture the seasonal variability at each site well. Also, we consider the same hyperparameter ranges shown in Bodini and Optis (2020b) and sample 20 randomly selected combinations of them during the cross-validation process. The combination of hyperparameters that leads to the lowest root-mean-square error (RMSE) between the observed and random-forest-predicted 140 m wind speed is selected and used in the final model.
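The training procedure can be sketched with scikit-learn as follows. The feature matrix, target, and hyperparameter ranges below are placeholders (the ranges actually used are those of Bodini and Optis (2020b)), and the synthetic data only stands in for the lidar record; the sketch shows the mechanics of sampling 20 random hyperparameter combinations under 5-fold cross validation and keeping the combination with the lowest RMSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV

# Hypothetical training data: columns stand in for near-surface wind
# speed, wind direction, air temperature, and sea surface temperature;
# the target stands in for lidar-observed 140 m hourly wind speed.
rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = 10.0 * X[:, 0] + rng.normal(0.0, 0.5, 500)

# Placeholder hyperparameter ranges, not the ones used in the study
param_dist = {
    "n_estimators": [20, 50, 100],
    "max_depth": [5, 10, None],
    "min_samples_leaf": [1, 2, 5],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_dist,
    n_iter=20,                      # 20 random combinations, as in the text
    cv=KFold(n_splits=5),           # 5-fold cross validation
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
best_rf = search.best_estimator_    # model with the lowest CV RMSE
```

Note that the study's fold construction (a consecutive 20 % of each calendar month) would require a custom splitter rather than the plain `KFold` used here.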

Uncertainty quantification
As detailed in Sects. 5 and 6, we apply the random forest algorithm to vertically extrapolate wind speed up to 140 m at the location of the eight NDBC buoys. Then, to assess the uncertainty in WRF-modeled long-term wind speed at each buoy location, we first calculate the time series of the residuals between 140 m modeled winds and 140 m extrapolated winds. Then, we calculate the average and the standard deviation of each residual time series, which represent the bias and uncertainty components of the model error at each location, respectively (Fig. 3). Next, we compare the biases across all the measurement locations (in our case, the eight buoys):

- If the standard deviation of the biases is smaller than the typical single-site uncertainty, then the latter is a good measure of the model uncertainty.
-If the standard deviation of the biases exceeds the typical single-site uncertainty, then the model uncertainty is dominated by the unpredictable bias and can be estimated from the standard deviation of the biases itself.
Finally, when estimating model uncertainty from measurements, it is important to remember that the measurements themselves have an uncertainty. In our case, we need to consider both the actual measurement uncertainty (σ_obs) and the uncertainty connected to the vertical extrapolation approach (σ_ML). Both these uncertainty components are passed on to the model and should be added in quadrature to the model uncertainty σ_WRF estimated using the steps above, to obtain a total uncertainty quantification (JCGM 100:2008, 2008):

σ_total = sqrt(σ_WRF^2 + σ_obs^2 + σ_ML^2)    (1)

Limits of using an ensemble-based approach for uncertainty quantification

Before diving deep into the uncertainty quantification using the approach outlined in the previous section, we are interested in confirming the limitations of using the boundary condition and parametric uncertainty as a way to fully capture an NWP model uncertainty, as discussed in the Introduction. To do so, we run a 1-year (September 2019 to August 2020) WRF ensemble across the mid-Atlantic region, and calculate the (temporal) mean of the modeled 140 m wind speed standard deviation calculated across the ensemble at each time stamp at the location of the two NYSERDA lidars. These values quantify the model boundary condition and parametric uncertainty (sampled within the considered numerical ensemble, at the two lidar locations). We then compare these values with the total model uncertainty, calculated using Eq. (1). We compute σ_WRF as the standard deviation of the 1-year time series of the residuals between 140 m wind speed from the main WRF run (i.e., the one with the setup used for the full 21-year period) and concurrent observations from the two NYSERDA lidars. We assume the uncertainty in the lidar observations σ_obs to be 3 % of the reported lidar 140 m wind speed across the considered period, following what was reported in the NYSERDA lidar documentation (OceanTech Services/DNV GL, 2020), and therefore equal to 0.31 m s−1. Finally, in this case, σ_ML = 0 because we are not applying any vertical extrapolation approach. We perform both calculations from hourly average time series of modeled and observed wind speed.
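The bias, single-site uncertainty, and total uncertainty of Eq. (1) amount to a few lines of arithmetic. This is a minimal sketch with synthetic residuals; the σ_obs value passed in is only an illustrative input.

```python
import numpy as np

def total_uncertainty(residuals, sigma_obs, sigma_ml=0.0):
    """Model bias (mean of residuals), model uncertainty (standard
    deviation of residuals), and total uncertainty with observational
    and extrapolation uncertainties added in quadrature, as in Eq. (1)."""
    bias = np.mean(residuals)
    sigma_wrf = np.std(residuals)
    sigma_tot = np.sqrt(sigma_wrf**2 + sigma_obs**2 + sigma_ml**2)
    return bias, sigma_wrf, sigma_tot

# Hypothetical residual series (modeled minus observed 140 m wind speed)
res = np.array([0.5, -1.0, 2.0, -0.5, 1.0])
bias, s_wrf, s_tot = total_uncertainty(res, sigma_obs=0.31)
```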
For this exercise, we consider 16 ensemble members, obtained by considering all the possible combinations of setups resulting from the following four variations:

- Reanalysis forcing: We consider the state-of-the-art ERA5 reanalysis product developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Hersbach et al., 2020) and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) (Gelaro et al., 2017), developed by the National Aeronautics and Space Administration (NASA). Both these reanalysis products have been widely used in applications related to wind energy and represent the most advanced reanalysis products available to date.
- Planetary boundary layer scheme: We consider the Mellor-Yamada-Nakanishi-Niino (MYNN) (Nakanishi and Niino, 2009) and the Yonsei University (YSU) (Hong et al., 2006) PBL schemes. These are widely considered the two most popular PBL schemes in WRF, especially for wind-related applications: YSU was used in the WIND Toolkit (Draxl et al., 2015), and MYNN was used in the New European Wind Atlas (Hahmann et al., 2020; Dörenkämper et al., 2020).
- Sea surface temperature product: We consider the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) data set produced by the UK Met Office (Donlon et al., 2012) and the National Center for Environmental Prediction (NCEP) Real-Time Global (RTG) SST product (Grumbine, 2020).

- Land surface model (LSM): We consider the Noah LSM and the updated Noah-Multiparameterization (Noah-MP) LSM (Niu et al., 2011).
Table 3 summarizes the result of this comparison. We find that, while the boundary condition and parametric uncertainty at either lidar is lower than 1 m s−1, the actual model uncertainty is instead closer to 2 m s−1. This comparison clearly confirms how an NWP model's boundary condition and parametric uncertainty, which can be quantified from the variability across a numerical ensemble, captures only a limited component of the full model uncertainty (in our specific case, for hub-height wind speed), with a relative difference of about 50 %.

Machine learning wind speed vertical extrapolation validation
Given the incomplete uncertainty quantification that results from a numerical ensemble, we now turn to our machine learning vertical extrapolation approach, which enables our proposed pipeline for a broader uncertainty quantification. For the long-term uncertainty quantification, the random forest algorithm needs to be applied at each buoy location to derive a long-term time series of extrapolated winds, which will be compared to the WRF-modeled wind resource.
However, before doing so, the regression model first needs to be trained at the floating lidar sites so that it can learn how to model hub-height wind speed in the region from a set of near-surface data. The generalization skill of the model also needs to be quantified, as a proper uncertainty quantification must account for the uncertainty of the approach used to obtain the observation-based long-term time series of hub-height winds at each buoy location.
We validate the machine learning extrapolation model using a round-robin approach. It is neither fair nor practically useful to assess the skill of the regression algorithm when it is trained and tested at the same lidar location, as that is not our actual application of the model. Instead, one should assess the performance of the extrapolation approach when the random forest is trained at one lidar and then used to extrapolate wind speed at a different lidar, where the model has no prior knowledge (or, more precisely, limited prior knowledge, since the training site is still in the vicinity) of the wind conditions at the site. Figure 4 shows the result of such a round-robin validation; we compare the RMSE of the random forest using all possible combinations of training and testing lidar data sets.
Overall, we find that the random forest provides accurate results, with RMSE always lower than 1.5 m s−1. Also, we see that the model generalizes well: on average, the round-robin RMSE is only 12.5 % higher than the RMSE found when using the same site for training and testing. Notably, for the two NYSERDA lidars, which have the longest period of record, we find little to no degradation in performance when the random forest is trained at one lidar and then tested at the other one, which is more than 80 km away.
To better visualize the good performance of the extrapolation model, Fig. 5 shows an example of a scatter plot of observed and machine-learning-predicted hub-height winds when the random forest is trained at Atlantic Shores 04 and applied at Atlantic Shores 06.
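The round-robin validation loop can be sketched as follows. The site names and data below are synthetic stand-ins for the lidar data sets; `round_robin_rmse` is a hypothetical helper, not code from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def round_robin_rmse(datasets):
    """Train on each site's (X, y) pair in turn and evaluate on every
    site, returning a nested dict rmse[train_site][test_site]."""
    rmse = {}
    for train_name, (X_tr, y_tr) in datasets.items():
        rf = RandomForestRegressor(n_estimators=50, random_state=0)
        rf.fit(X_tr, y_tr)
        rmse[train_name] = {}
        for test_name, (X_te, y_te) in datasets.items():
            pred = rf.predict(X_te)
            rmse[train_name][test_name] = float(
                np.sqrt(np.mean((pred - y_te) ** 2))
            )
    return rmse

# Hypothetical data standing in for two of the lidar data sets
rng = np.random.default_rng(1)
make_site = lambda n: (rng.random((n, 3)), rng.random(n) * 10.0)
sites = {"E05": make_site(200), "E06": make_site(200)}
scores = round_robin_rmse(sites)
```

The off-diagonal entries of the resulting matrix correspond to the round-robin scenario shown in Fig. 4, while the diagonal gives the same-site baseline.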
Finally, the application of the random forest model also allows for a quantification of the relative importance of the various input variables used to feed the model. Figure 6 shows a chart of the feature importance at the Atlantic Shores 06 lidar site. Unsurprisingly, we find that wind speed close to the surface is the most influential variable, followed by the difference between air temperature and sea surface temperature, which is a proxy for atmospheric stability. Similar results are observed at the other lidar sites (not shown).
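Feature importances like those in Fig. 6 are exposed directly by scikit-learn's fitted random forest. This sketch uses synthetic data in which the first column (standing in for near-surface wind speed) dominates the target by construction; the column roles are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical inputs: column 0 stands in for near-surface wind speed,
# column 1 for the air-sea temperature difference (a stability proxy),
# column 2 for an uninformative extra input.
rng = np.random.default_rng(2)
X = rng.random((300, 3))
y = 8.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.2, 300)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across all features
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]   # most important feature first
```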
Modeled long-term wind resource uncertainty quantification

After properly validating and assessing the generalization skills of the machine-learning-based vertical extrapolation model by leveraging the short-term lidar data, we can now apply it to extrapolate the long-term observations collected by the NDBC buoys. To do so, we train a random forest using all the lidar data sets combined, to optimize the amount of training data for the model, and then apply the trained model at each buoy location. We then compare the long-term extrapolated winds against the WRF-modeled data at 140 m above sea level (Fig. 7) at each NDBC buoy location.

We finally compute the modeled wind speed uncertainty, following the steps detailed in Sect. 3.2. Table 4 shows the bias and uncertainty in the observations (σ_obs) and of the machine learning model used to vertically extrapolate the buoy data (σ_ML). We find that at all but one buoy, the total uncertainty in modeled 140 m wind speed is slightly lower than 3 m s−1 (Table 4), with only small contributions from the uncertainty in the observations and in the machine learning model. Also, we note how the total uncertainty values obtained here are about 1 m s−1 higher than what was found from the short-term direct comparison between lidar observations and WRF-modeled data in Sect. 4. While the impact of different lengths of analysis cannot be ruled out, this comparison shows how having access to long-term lidar observations would be extremely beneficial in allowing a more direct quantification (leading to lower values) of the model uncertainty for long-term wind resource assessment purposes.
Finally, we focus on the variability of the quantified uncertainty and segregate results by time of day (9 a.m.-4 p.m. local time vs. 9 p.m.-4 a.m. local time), season (June, July, August vs. December, January, February), wind direction (180°-270° vs. 270°-360°, which are the two dominant wind direction regimes in the region (Pronk et al., 2022)), and atmospheric stability conditions (quantified in terms of the modeled inverse Obukhov length L−1 at 2 m above sea level, where we simply consider stable conditions for L−1 > 0 m−1 and unstable conditions for L−1 < 0 m−1). We summarize our results in the box plots in Fig. 8. The largest difference in modeled wind speed uncertainty is for stable conditions, which are generally more challenging to model numerically than unstable conditions. Pronk et al. (2022) showed that stable conditions in this region are dominant in the summer, and Bodini et al. (2019) showed that southwesterly winds are dominant in the summer months. Consistently, we find a larger wind speed uncertainty for southwesterly winds and in the summer (although winter shows significant scatter among the buoys). Finally, nighttime uncertainty is larger than daytime uncertainty, although the difference between the two is limited.
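The stability segregation described above reduces to splitting the residual series by the sign of the inverse Obukhov length. This sketch uses synthetic residuals constructed so that stable conditions carry more scatter; `segregate_by_stability` is a hypothetical helper for illustration.

```python
import numpy as np
import pandas as pd

def segregate_by_stability(residuals, inv_L):
    """Split a residual time series by the sign of the inverse Obukhov
    length (inv_L > 0: stable, inv_L < 0: unstable) and return the
    uncertainty (standard deviation of residuals) for each regime."""
    res = pd.Series(residuals)
    inv_L = pd.Series(inv_L)
    return {
        "stable": float(res[inv_L > 0].std()),
        "unstable": float(res[inv_L < 0].std()),
    }

# Hypothetical residuals with larger scatter under stable conditions
rng = np.random.default_rng(3)
inv_L = rng.normal(0.0, 0.01, 1000)
res = np.where(inv_L > 0,
               rng.normal(0.0, 2.0, 1000),   # stable: larger errors
               rng.normal(0.0, 1.0, 1000))   # unstable: smaller errors
regimes = segregate_by_stability(res, inv_L)
```

The same masking pattern extends to the time-of-day, season, and wind-direction splits shown in Fig. 8.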
Conclusions

The National Renewable Energy Laboratory has released a state-of-the-art 21-year wind resource assessment product for all the offshore regions in the United States. Because of its numerical nature, this data set has inherent uncertainty, the quantification of which is of primary importance for stakeholders aiming to use this data set to contribute to offshore wind energy growth.
In our analysis, we have shown the limits of quantifying model uncertainty in terms of the variability of a model ensemble, which in our case captured only roughly half of the total model uncertainty. Instead, we recommend leveraging observations to fully capture NWP model uncertainty. In the absence of long-term observed wind speeds at hub height, we have proposed a methodological pipeline to vertically extrapolate near-surface winds from long-term buoy observations using machine learning.
Our approach was thoroughly validated across the mid-Atlantic region. The total model uncertainty we observed in hub-height wind speed was, on average, just below 3 m s−1. This number is not negligible, especially considering that wind power is roughly proportional to the cube of wind speed, but several opportunities exist to reduce this uncertainty in the future. This analysis is one of many examples of the synergy between NWP models and observations, which points to the multiple interconnections between the two. A larger number of long-term observations is needed to both quantify and, in the long term, reduce the inherent uncertainty of numerical models. In this context, the sharing of proprietary observational data sets should be considered, and the long-term advantages resulting from better numerical modeling should be kept in mind when assessing the overall balance between costs and benefits of such data-sharing initiatives. In the future, a similar analysis can be performed for other offshore regions where both the 21-year numerical data set and enough observations to assess uncertainty are available.

Figure 1. Map of the observational data sets used in the analysis. Wind lease areas are shown in white; wind planning areas in gray.

Figure 2. Data availability chart for the four lidar data sets. Only hourly time stamps were kept for which all the variables considered in this analysis were valid.

Figure 3. Sketch showing how model bias and model uncertainty are defined in our analysis.

Figure 4. Testing root-mean-square error in predicting hourly average wind speed at 140 m above sea level for the different lidar data sets, as a function of the data set used to train the random forest.


Figure 5. Scatter plot of observed and machine-learning-predicted 140 m wind speed at the Atlantic Shores 06 lidar when the learning algorithm is trained at the Atlantic Shores 04 lidar.

Figure 6. Predictor importance for the random forest used to extrapolate winds at 140 m above sea level at lidar Atlantic Shores 06. "WS" stands for wind speed, "WD" for wind direction.

Figure 7. Scatter plot of 21-year WRF-modeled and machine-learning-predicted 140 m wind speed at the location of the 44025 NDBC buoy.

Figure 8. Box plot showing how the modeled 140 m wind speed uncertainty varies as a function of wind direction, time of day, season, and atmospheric stability conditions. For each buoy location, results are expressed as percent difference from the mean uncertainty values (rightmost column in Table 4).

Table 1. Key attributes of the 21-year WRF simulations used in this study.

Table 2. List of NDBC buoys used in this analysis.

Table 3. Comparison between boundary condition and parametric uncertainty and total model uncertainty in 140 m wind speed at the locations of the two NYSERDA lidars.

Table 4. Twenty-one-year model bias and model uncertainty in 140 m wind speed at the location of the NDBC buoys considered in this study.