Generating wind power scenarios for probabilistic ramp event prediction using multivariate statistical post-processing

Wind power forecasting is gaining international significance as more regions promote policies to increase the use of renewable energy. Wind ramps, large variations in wind power production during a period of minutes to hours, challenge utilities and electrical balancing authorities. A sudden decrease in wind-energy production must be balanced by other power generators to meet energy demands, while a sharp increase in unexpected production results in excess power that may not be used in the power grid, leading to a loss of potential profits. In this study, we compare different methods to generate probabilistic ramp forecasts from the High Resolution Rapid Refresh (HRRR) numerical weather prediction model with up to 12 h of lead time at two tall-tower locations in the United States. We validate model performance using 21 months of 80 m wind speed observations from towers in Boulder, Colorado, and near the Columbia River gorge in eastern Oregon. We employ four statistical post-processing methods, three of which are not currently used in the literature for wind forecasting. These procedures correct biases in the model and generate short-term wind speed scenarios which are then converted to power scenarios. This probabilistic enhancement of HRRR point forecasts provides valuable uncertainty information of ramp events and improves the skill of predicting ramp events over the raw forecasts. We compute Brier skill scores for each method with regard to predicting upand down-ramps to determine which method provides the best prediction. We find that the Standard Schaake shuffle method yields the highest skill at predicting ramp events for these datasets, especially for up-ramp events at the Oregon site. Increased skill for ramp prediction is limited at the Boulder, CO, site using any of the multivariate methods because of the poor initial forecasts in this area of complex terrain. These statistical methods can be implemented by wind farm operators to generate a range of possible wind speed and power scenarios to aid and optimize decisions before ramp events occur. Published by Copernicus Publications on behalf of the European Academy of Wind Energy e.V. 372 R. P. Worsnop et al.: Generating wind power scenarios for probabilistic ramp event prediction


Introduction
Global wind-energy installation reached 486 GW in 2016; the total installed generation capacity in the US alone reached > 82 GW by the end of 2016 and has experienced a rapid rise since then (GWEC, 2017). Increased interest in 35 alternatives to fossil-fuel-based energy to mitigate greenhouse gas emissions as outlined in the international Paris Agreement (UNFCCC, 2015) has propelled the global wind-energy sector even further. Because of increased interest and deployment of wind energy in the US and worldwide, accurate wind speed and power forecasts are becoming increasingly important for successful power-grid operation. In particular, the prediction of specific wind situations such as power ramps is key to effective operation and control of wind farms (Kuik et al., 2016). 40 Power ramp events are challenging to forecast because these abrupt and large increases or decreases in wind speedand thus power -happen on time scales of minutes to hours making it difficult for wind farm operators and the power grid to respond. Up-ramps, or sharp increases in wind-farm power can lead to an overload of electricity generation. Sometimes the additional electricity is sold to nearby utility companies, but frequently wind farms must curtail, or stop power production if there is not enough time to make the sale. Conversely, down-ramps, or sharp decreases in power production over short time 45 periods, also have serious implications for the power grid. If power generation from the wind farm does not meet contractual expectations, then power must be generated by another source to "balance the load" and avoid brownouts and blackouts.
Additionally, the wind farm owners may have to pay costly fees for not meeting their quota.
Improving the accuracy of ramp forecasts can help avoid the situations described above. The overall effects of ramps on the grid can be reduced in several ways. The development of a geographically aggregated power grid which 50 connects many wind farms and diverse renewable sources such as solar, hydro, and nuclear power (Budischak et al., 2013) can help minimize the effects of sudden gusts and lulls of wind speed on the power grid. Additionally, optimized wind farm locations and layouts  could reduce fluctuation on the grid caused by individual wind farms. Directly improving ramp forecasts is also a viable option to reduce stress on the power grid and make wind energy even more reliable. Increased reliability may be realized in the form of decision-making. A wind farm operator may make conservative 55 estimates of how much power their wind turbines can generate during times with an elevated probability of a down-ramp event. In practice, a persistence forecast of wind speed and power generation over a 1-h or 30-min time interval is commonly used (Milligan et al., 2003). Persistence forecasts are generally reasonable on these time scales, because local weather conditions usually do not change drastically during these lengths of times except during certain weather events, such as fronts, convective outflow, etc. that often cause ramps. However, persistence forecasts are poor at predicting ramps; a ramp 60 identified in the previous 30 min to an hour can change magnitude or even sign (i.e., up-or down-ramp) in a short period and therefore lead to large forecast errors. In recent years, there has been a growing interest in information regarding the uncertainty of wind power forecasts to make energy decisions (Nielsen et al., 2006b). Typical single (i.e., point) forecasts cannot provide this necessary uncertainty information, but probabilistic forecasts can.
Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2018-13 Manuscript under review for journal Wind Energ. Sci. Discussion started: 14 February 2018 c Author(s) 2018. CC BY 4.0 License. However, the ensembles themselves are under-dispersive and lack small-scale variability in time and space so that not all possible scenarios are captured (Nielsen et al., 2006a;Bossavy et al., 2013). Others have used analogs of past forecasts based 75 on weighted atmospheric predictors to quantify forecast uncertainty (Delle Monache et al., 2013;Junk et al., 2015).
In the research to be discussed here, we will correct biases in wind speed point forecasts produced by the HRRR NWP model using univariate post-processing techniques and parametric distributions. We will then test four multivariate statistical post-processing methods to generate forecast scenarios of wind speed, representing the prediction uncertainty for a 12-h forecast horizon. We then compare the skills of the methods at predicting up-and down-ramp events. Three of the four 80 methods, (the standard Schaake Shuffle (StSS), Minimum Divergence Schaake Shuffle (MDSS), and the enhanced version of MDSS (MDSS+)) are not currently discussed within the wind-forecasting literature and are offered as new forecasting tools for short-term ramp events. The fourth method, the Gaussian copula, has been assessed for short-term wind and power forecasting, so we use this method as a benchmark of performance for the new methods. For all of our analyses, we physically compute wind power production via a turbine power curve, which relates the power that would be generated by a 85 turbine to wind speed through the turbine rotor layer as well as turbine-specific characteristics.
The wind speed observations from tall meteorological towers and forecasts from the HRRR model used in this study are discussed in Sect. 2. Up-and down-ramp events are formally defined in Sect. 3.1. Univariate post-processing of the raw HRRR forecasts is described in Sect. 3.2. The multivariate methods for generating probabilistic forecast scenarios are discussed in Sect. 3.3. In Sect. 4, we evaluate the performance of each probabilistic forecast method and the raw HRRR 90 forecasts focusing on the prediction of up-and down-ramp events. Specifically, we compare the relative frequency of upand down-ramp events produced from each forecast. We also provide Brier skill scores to compare each multivariate method and to show the performance relative to climatology. In Sect. 5, we offer concluding remarks, uses for the probabilistic methods in the wind-energy sector, and advice for operational implementation.

Wind measurements from tall meteorological towers
We use wind speed and direction measurements from two meteorological towers. The first tower is the 135-m M5 tower located south of Boulder, Colorado and ≈ 5 km east of the Colorado Front Range at the US Department of Energy's (DOE) National Wind Technology Centre (NWTC) (Clifton et al., 2013). Wind speed and direction measurements from the M5 100 tower were collected at 80 m and 87 m above ground level (a.g.l), respectively, from a cup anemometer and wind vane. The instruments were mounted on tower booms aligned at 278°, the prevailing wind direction at the NWTC based on a 15-yr climatology (Clifton et al., 2013). We remove wind speed measurements that are associated with wind directions between 75°−135° to ensure that the measurements are not contaminated from the flow passing through and around the tower or waked by a nearby wind turbine before reaching the instrument sensors. We also remove data flagged by quality-control 105 methods such as testing for constant values during a measurement interval (which indicates icing events during cold months), and checking for standard deviation values < 0.01% of the mean (which indicates instrument malfunction) among other measures described by Clifton et al. (2013) and St. Martin et al. (2016). After filtering, 81% of the data were retained.
The M5 tower data that we use are measured at a 20-Hz rate and averaged over ten minutes for the period from 31 August 2012 to 28 February 2017. 110 The second tower is an 80-m tall proprietary tower located near the Columbia River Gorge, which divides the southern boundary of Washington and the northern boundary of Oregon. Herein, we refer to this tower as the Pacific Northwest (PNW) tower. The wind speed and direction measurements are collected from a heated cup anemometer and wind vane at 79 m and 76 m a.g.l, respectively and averaged to 1 min. We perform quality-control measures on the data to remove suspect data using similar quality-control processes as for the M5 tower. We also remove unrealistic wind speed values, such 115 as negative numbers, and remove data associated with waked flow from the PNW tower or nearby turbines. After filtering, 73% of the data were retained. Data from the PNW tower were made available as part of the DOE-funded second Wind Forecast Improvement Project (WFIPII) that took place from fall 2015 to spring 2017 (A2E, 2017). We use data from this tower for all available dates between 18 March 2015 − 06 March 2017. 120

Wind forecasts from HRRR system
Deterministic forecast data are obtained from the second experimental version of NOAA's real-time, High-Resolution Rapid Refresh assimilation and model forecast system (HRRRv2). The HRRRv2 domain covers the contiguous US at 3-km horizontal resolution. HRRRv2 is updated hourly with initial conditions from the 13-km Rapid Refresh model and observations via data assimilation. Detailed model physics for HRRRv2 is discussed by Benjamin et al. (2015) performed on this period of interest which overlaps with the observation availability. For comparison of the 80-m wind speed forecasts to the tower observations, the HRRR forecast values at each tower location are from the nearest model grid cell to the tower latitude and longitude. In addition, since the HRRRv2 forecasts are output hourly, we apply our analyses to the observations that occur at the top of the hour to match the forecast availability. For the observations and model output, 130 we only analyse dates that have a continuous 12-h segment of data from the 00Z and 12Z forecast initialization times to encompass an entire day. These criteria yield ≈ 80-150 continuous 12-h forecast segments that overlap with available observations for each initialization time and tower location. The criteria also yield ≈ 300-400 continuous 12-h segments of observations for each initialization time and tower location that will be used for forecast verification and the multivariate methods discussed in Sect. 3.3. 135

Ramp definition
Wind power ramps are large changes in power production over short time periods. Despite the significant influence of ramp events on the electric grid and a clear need for accurate forecasts of these events, there is not a commonly accepted method to define and identify them. Ramp definitions vary in the literature (Kamath, 2010(Kamath, , 2011Pinson and Girard, 2012;Bossavy 140 et al., 2013;Bianco et al., 2016) regarding the threshold of power change and the duration over which that change occurs.
Variations also exist regarding which data points in a given window of time should be used when calculating the change in power, and lastly whether to use power time series directly when defining ramps or instead use a filtered time series (Bossavy et al. 2013). Commonalities in the literature include the need to define ramp magnitude, duration, and sign (i.e., upor down-ramp). 145 This lack of a standard definition is primarily because what is considered an important ramp event depends on the needs of the wind farm operator or grid-system manager at any given time or location. Here, we employ a combination of the Minimum-maximum method used by Pinson and Girard (their eqn. 8, 2012) and that employed in the Ramp Tool & Metric created by Bianco et al. (2016) to generate separate ramp time series for up-ramps and down-ramps. Up-ramps and downramps are considered separately, because they have different impacts on the power grid and lead to different decisions. Up-150 ramps may result in a swap of conventional energy sources for cleaner wind power while a down-ramp may result in the opposite and can have more detrimental effects on the grid during periods of high electricity demand.
Before identifying power ramps, wind speed observations and forecasts must be converted to power. A conversion from wind speed to power in this study is achieved via the International Electrotechnical Commission (IEC) turbine power curve for Class 2 turbines (IEC, 2007). This power curve is for wind turbines with a cut-in wind speed > 3 m s -1 , rated power 155 ≥ 16 m s -1 , and a cut-out wind speed > 25 m s -1 . Using the resulting power time series, we create binary time series of up-Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2018-13 Manuscript under review for journal Wind Energ. Sci. Discussion started: 14 February 2018 c Author(s) 2018. CC BY 4.0 License. and down-ramp events into ones (ramp occurred) and zeros (no ramp occurred). We do this by first dividing the power time series into )*+ sliding time windows of length h and then finding the largest positive and negative power differences that exists within each window, ∆ ./0 and ∆ .*+ , respectively. If the largest positive power difference equals or exceeds the defined power change threshold , then the up-ramp time series is given a value of 1 for that time window. Conversely, if the 160 largest negative power difference is less than or equal to -, then a 1 is assigned to the down-ramp time series for that time window. If the above respective criterion is not met, then a 0 is assigned for that time window. The window then slides one hour forward in time and the process is repeated until there are )*+ binary values for both the up-ramp and down-ramp time series. We allow up-ramps and down-ramps to happen within the same time window, so that there could be a value of 1 assigned for the same time window in both the up-and down-ramp time series. This allowance is reasonable, because for 165 some longer window-lengths, up-ramps and down-ramps could both occur and are equally important to forecast. An example of the identification of up-and down-ramps according to this method appears in Figure 1. While more complex ramp definitions are available, the chosen criteria for up-and down-ramps reflect the common intuition about ramps including threshold and window length h to customize the definition to specific needs. As determined later, this ramp definition can be employed in a probabilistic framework and will be used to compare the different approaches to scenario generation.

Deterministic to probabilistic forecasts: univariate post-processing
To improve the skill of the raw HRRR forecasts at predicting ramp events, we employ statistical post-processing techniques 180 to enhance the HRRR forecasts through the addition of uncertainty information. These methods convert the deterministic (single value) raw HRRR forecast into probabilistic forecasts by creating a set of forecast scenarios of wind speed that represent the forecast uncertainty. Wind speed scenarios are converted to power scenarios and then probabilities of ramp events are derived. The first step to generating scenarios is to perform univariate post-processing on the HRRR forecasts at each individual lead time. 185 We first determine a predictive distribution model for each tower and forecast initialization time which accurately predicts future observations for each forecast lead time. We employ ordinary-least-squares regression on the observed wind speed data during which the HRRR forecasts are also available (01 January 2015 -28 September 2016). To make use of the ≈ 21 months during which both the HRRR forecasts and observations are available, we cross-validate these data. We leave one month out for verification and fit the statistical models used to determine the parameters of the predictive distributions 190 with the remaining 20 months of data (training period). We repeat this process so that 21 months of forecasts and independent verifying observations are obtained for each month, forecast initialization, lead time, and tower location. We find the mean and standard deviation of the predictive distributions by inserting verifying forecasts into the fitted regression model. Before performing the regression, we apply a power transform (not to be confused with wind speed-to-power conversion) with power exponent to the forecasts = < and observations = < to address the increase of forecast 195 uncertainty with wind speed (i.e., heteroscedasticity in the dataset). Heteroscedasticity in the data is visible as more spread in the data points at higher wind speeds than at lower wind speeds in Figure 2a The transformation and removal of the seasonal cycle makes the relationship between the transformed forecasts and transformed observations more homoscedastic (i.e., more consistent forecast variability for all wind speeds in Figure 2b).
This characteristic enables us to use more traditional statistical techniques and avoid the nonlinear regression that would be required between the untransformed forecasts and untransformed observations because of the heteroscedasticity. An inverse transformation of the observations, forecasts, and regression lines reveal the complexity of the regression line we would have 215 had to use if we had not transformed the data before applying regression analysis ( Figure 2c). The slight curvature in the standard deviation lines in Figure 2c shows the dependence of error variance on wind speed magnitude (i.e., heteroscedasticity); the black dots are closer to the red regression line at lower wind speeds than at higher wind speeds. The scatter in the red regression dots in Figure 2c illustrates how the annual cycle influences the regression; depending on the time of year, the transformation value can be different because of the annual cycle. 220 We test three candidate predictive distribution models for the transformed wind speed observations using the predictive mean and standard deviation produced from the above linear regression: truncated normal, truncated logistic, and gamma distributions where the truncated distributions exclude negative values. These distributions, given the same mean and standard deviation, vary in the shape of their peaks and size of their tails. Probability integral transforms (PITs) of each predictive cumulative distribution function (CDF, * ) and its verifying observation * are calculated for each candidate 225 distribution as * : = * * , and provide an assessment of which distribution yields the best calibration (Dawid, 1984;. Histograms of the PITs which include all verification days and lead times show that the gamma and truncated logistic distributions are well-calibrated to the observed transformed wind speeds at the NREL M5 and PNW towers, respectively for 00Z ( Figure 3) and 12Z (not shown) initialization times. The good calibration is qualitatively demonstrated by the mostly uniform histograms in Figure 3. For a more quantitative assessment of calibration, we compute 230 the continuous ranked probability score (CRPS). The CRPS is a proper scoring rule that is often used to evaluate the quality of a probabilistic forecast by summarizing the sharpness and calibration of the forecast distribution (Gneiting et al., 2005;. A proper score is one that produces the highest reward (i.e., lowest CRPS score) by using the true probability distribution . For a given pair of predictive CDF and verifying observation , the CRPS is defined as 235 where is the probability that the forecast will not exceed threshold and is a Heaviside step function which attains the value 1 when its argument is ≥ 0, and 0 otherwise. A low CRPS value suggests a predictive distribution model can 240 accurately represent future observations. We calculate the CRPS for each candidate predictive distribution using the closedform expressions for the CRPS of a truncated normal (Gneiting et al., 2006)  distributions (Scheuerer and Möller, 2015). Based on the CRPSs, averaged over all lead times for each tower and initialization (Table 1), and the PIT histograms, we choose to proceed with the gamma (truncated logistic) distribution model for the NREL M5 (PNW) transformed observations. 245

least-squares linear regression trends (solid red line in (b) and red dots in (c)) and lines representing one standard deviation (solid black line in (b) and black dots in (c)) from the regression lines are displayed for the transformed and back-transformed data.
Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2018-13 Manuscript under review for journal Wind Energ. Sci.

Generation of forecast scenarios: multivariate post-processing
We obtain probabilistic forecasts of univariate statistically post-processed wind speeds for each verification day, forecast initialization, and lead time for both towers by using the truncated logistic or gamma distribution models as discussed in 265 Sect. 3.2. These marginal distributions provide prediction uncertainty information for each lead time on a given day and initialization time, but they do not provide information about the serial dependence of the distributions across multiple lead times. Ramp events are changes in power over a short period of time; to identify ramps and the uncertainty associated with them, we need to generate scenarios of wind speed which represent that serial dependence and that can then be converted to scenarios of wind power. We model serial dependence of the individual lead-time predictive distributions to construct 270 forecast scenarios of wind speed which are then converted to power. We utilize four methods to define the interdependence structure and generate the scenarios. The Gaussian copula, standard Schaake Shuffle (StSS), MDSS, and MDSS+ methods are discussed below.

Gaussian copula
We first generate scenarios of wind speed following the Gaussian copula method (Pinson et al., 2009;Pinson and Girard, 275 2012). The Gaussian copula approach first converts the transformed wind speeds (Sect. 3.2) from the chosen forecast distribution (here, we use truncated logistic or gamma) into a uniform marginal probability distribution and then converts the uniform values into standard Gaussian-space using a combination of CDFs and inverse CDFs WE , where is either a gamma , , truncated logistic ℒ C ( , ) , or Gaussian (0,1) distribution. A flow diagram of the Gaussian copula procedure starting with a marginal gamma distribution is shown in Figure 4 and described below. An empirical covariance 280 matrix of the Gaussian values is constructed to estimate the correlation between the Gaussian values from all pairs of lead time. This covariance matrix provides information necessary to transition from marginal distributions for each lead time to multivariate distributions, which inform how the Gaussian values link across multiple lead times. Given the limited amount of training data and gaps in the range of dates for which observations are available, we do not attempt to estimate a timevarying covariance model. Instead, we follow Pinson and Girard (2012) and use a fixed exponential covariance model 285 where iE and iJ are the Gaussian random variables at lead time E and J , respectively, and is the range parameter 290 which controls the extent of correlation of transformed wind speed across lead times. An appropriate value for is selected empirically so that the resultant ECM for a given value of most resembles the decay of the empirical correlation values

Standard Schaake Shuffle 310
We also generate forecast scenarios of transformed wind speed using the Schaake Shuffle method, which uses historical wind speed scenarios to determine serial dependence of the wind speed forecasts across forecast lead times. This method for generating multivariate forecasts is used widely for precipitation and temperature forecasts (Clark et al., 2004), but has not yet been applied for wind speed and power forecasts. This method generates wind speed forecast scenarios which can be converted to power. Alternatively, the method could be used to generate power scenarios directly if given predictive 315 distributions and observations of power. Forecast scenarios are easier to visualize in wind-speed-space (transformed wind speed for our data) because of the strong non-linearity of the power curve, so we discuss the method starting with predictive distributions and observations of transformed wind speed. For a given date, we construct 50 forecasts for each forecast lead time by breaking the predictive distributions in Sect 3.2 into 50 quantiles so that the forecasts are simply the quantiles of The next step in the Schaake Shuffle method is to select an identical number of observed historical scenarios of transformed wind speed. The historical scenarios are selected from the 50 available dates preceding the forecast initialization date, so that the historical scenarios of transformed wind speed are from a similar season. Alternatively, dates could be pulled at random throughout the observed historical record. The method then ranks the 50 historical observations separately 325 for each lead time and assigns the same ranking to the 50 sorted forecast quantiles (an illustration of this process for three historical scenarios and three forecast quantiles is shown in Figure 5b, c). The final step of the Schaake Shuffle method is to connect the ranked quantile forecasts across lead times to yield multivariate forecast scenarios (Figure 5d). For instance, a forecast quantile that is associated with historical scenario '3' at lead time 0 will connect to all forecast quantiles that are also associated with historical scenarios '3' at their lead time ( Figure 5d). This shuffling of forecast quantiles to match the rank of 330 historical scenarios yields forecast scenarios that maintain a realistic temporal interdependence and shape across lead time while matching the predictive marginal distribution as described in Sect. 3.2.
Like in the Gaussian copula method, the generated scenarios of transformed wind speed forecasts from the Schaake Shuffle method can then be converted to power, if desired, to identify ramp events. Here, the selection of the historical scenarios used in the Schaake Shuffle was ad hoc; the method does not make a preferential selection of dates. We next 335 discuss two methods which preferentially choose historical scenarios that are most similar to the 1) quantiles of the forecast marginals and 2) the quantiles of the forecast marginals and also the quantiles of the wind speed difference between lead times. We distinguish between these three methods by referring to the standard method above as the Standard Schaake Shuffle (StSS), the first preferential method which will be discussed below as the Minimum Divergence Schaake Shuffle (MDSS), and the enhanced preferential method also discussed below as the MDSS+.

Minimum Divergence Schaake Shuffle (MDSS)
The first of two methods that we use to preferentially select and generate probabilistic forecast scenarios of transformed wind speed is the Minimum Divergence Schaake Shuffle (MDSS) method (Scheuerer et al., 2017) . The MDSS follows the same procedures as the StSS method that impose the ranking of historical scenarios on sorted quantiles of the forecast distributions and that connect forecasted quantiles associated with one particular historical scenario across all lead times. 355 Like for the StSS, the MDSS can also utilize historical observations from dates when no forecasts are available, an advantage over another variant of the Schaake Shuffle method introduced in Schefzik (2016). The identical processes of the StSS and MDSS methods are shown in Figure 5. The MDSS deviates from the StSS in its selection of historical scenarios; the MDSS preferentially chooses dates such that the marginal distributions of the sampled historical scenarios are most similar to the quantiles of the post-processed forecast marginal distributions across all forecast lead times rather than a 360 random or user-assigned selection of dates used for the StSS method. In the hydrological context discussed by Scheuerer et al. (2017), this preferential selection helped preserve features in the historical scenarios during the shuffling procedure shown in Figure 5c and d, and lead to improved multivariate probabilistic forecasts compared to StSS.
Because historical scenarios selected for the MDSS method are not limited to the most recent scenarios from the forecast initialization date as with the StSS method, the number of scenarios must be narrowed down to scenarios starting 365 from the total number of candidate scenarios C in the historical record, which for our dataset is ≈ 300 − 400 scenarios for each initialization time and tower location. This selection seeks the historical scenarios that yield the least divergence 1 ∆ i ℋ between the CDF of the forecast marginal distribution i w at each lead time , and the empirical CDF i ℋ calculated from a set ℋ of historical observation scenarios, Each scenario within the set ℋ is evaluated for final selection based on whether the scenario results in a larger or smaller total divergence ∆ yzy ℋ = ∆ i ℋ i when it is removed from the calculation. If the scenario results in a smaller divergence when it is left out of the computation, then it is not an optimal choice. Conversely, if leaving out the scenario results in a larger 375 divergence, then we know that the scenario is important for minimizing the divergence and should be kept as one of the final scenarios. Ideally, a set ℋ that includes all possible candidate scenarios would be reduced to size one-by-one, but this is computationally expensive. Therefore, we use a sequence of ℋ that reduces the starting number of candidate scenarios to test and eliminates more than one scenario with each iteration until scenarios are reached. For example, for the M5 tower 1 Divergence in this study means the integral of the squared difference between two CDFs and is different from the divergence term ∇ • commonly used in meteorology, where ∇ is the del operator and F is a meteorological field.

Enhanced version of the Minimum Divergence Schaake Shuffle (MDSS+)
Constraining the marginal distributions does not necessarily improve the representation of temporal gradients of the quantity 385 of interest. If the HRRR forecasts of temporal wind speed changes have some skill, then using a predictive distribution of these differences explicitly in the MDSS algorithm might result in a better selection of historical dates that have similar temporal gradients. This formulation is the idea behind the final method we use to generate transformed wind speed scenarios. The final method is much like MDSS, but includes an additional term to explicitly capture the variation in wind speed between neighbouring forecast lead times. For this enhanced MDSS method, historical scenarios are chosen that 390 yield the least divergence from both the forecast marginal distributions and the forecast distribution of the lag1-h lead time differences of transformed wind speed. Forecast distributions of lag 1-h lead time differences are attained in the same way as forecast marginal distributions (Sect. 3.2), except that now we perform a regression on lag 1-h difference of transformed wind speed. Based on PIT histograms (not shown), the best predictive distribution that represents these differences for both tower locations is the (non-truncated) logistic distribution. For this method, the historical scenarios that yield the smallest 395 divergence when considering both the forecast marginal distributions and the forecast distributions of wind-speed differences are selected. To emphasize the temporal gradient between two neighbouring lead times, we assign more weight to the divergence term associated with wind speed differences. In this study, we weight the wind speed difference term as five times greater than the marginal distribution term. This method requires that the lag 1-h difference between lead times in the historical scenarios best-match the lag 1-h differences of the forecast and is therefore an enhanced method to the MDSS. 400

Differences between historical observations selected by StSS, MDSS, and MDSS+
Marginal distributions of transformed wind speed of the historical scenarios used for each of the three Schaake shuffle methods ( Figure 6a) and the distributions of the lag 1-h differences of those scenarios (Figure 6b) reveal that the MDSS and MDSS+ produce historical scenarios closer to the forecasted distributions than does the StSS method. Of course, the MDSS+ is the only multivariate method that utilizes the lag 1-h differences when selecting historical scenarios and for that reason, we 405 see that the MDSS+ distributions for the lag 1-h differences (green boxes in Figure 6b) are often a slightly better match to the forecasted distribution (grey boxes in Figure 6b) than the regular MDSS or StSS methods (pink and blue boxes in Figure   6b, respectively). The MDSS+ method sometimes makes compromises in the selection of optimal scenarios for one of its two terms, because it seeks to find the historical scenarios that are an overall best match when considering both the quantiles Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2018-13 Manuscript under review for journal Wind Energ. Sci. Discussion started: 14 February 2018 c Author(s) 2018. CC BY 4.0 License. of the forecasted transformed wind speed distribution and the distribution of lag 1-hr differences of those wind speeds. Also, 410 the MDSS and MDSS+ methods only have a limited set of historical dates from which they can choose scenarios, so we cannot expect a perfect match. Boxplots of the distributions of transformed wind speeds and the lag-1h differences of those wind speeds are not shown in Figure 6 for the GC method, because we wanted to point out the differences among the historical scenarios selected by each Schaake Shuffle method; the GC method does not use historical scenarios. Once the historical scenarios are chosen, the quantiles of the forecast marginal distributions are reordered to have the same ranking of 415 the corresponding historical scenarios. Like for the Gaussian copula and StSS methods, both the MDSS and MDSS+ scenarios are then transformed back into wind-speed-space and converted to power before identifying ramp events.

Verification of deterministic HRRR forecasts with observations
To provide a reference for the performance of predicting up-and down-ramp events, we first illustrate how ramps identified from the raw HRRR forecasts compare to those identified from the observations at the M5 and PNW tower locations. The correlation between ramps identified with the HRRR forecasts and observations are low at both tower locations (Figure 7) 430 ranging between 0.23 and 0.37. The ramp definition used for Figure 7 is different from the ramp definition discussed in Sect. Figure 7 to show the magnitude of the change in wind speed that is observed and forecasted at each tower location during a period of three hours. Utilizing the magnitude directly -rather than a particular exceedance event -eliminates the need to set any particular threshold for a change in wind speed, which would be difficult to define anyway because of the nonlinear 435 relationship between wind speed and the power curve. The purpose of Figure 7 is to reveal biases in the HRRR forecasts and differences between the two tower sites, while the analysis of power ramp events in the subsequent sections is more applicable for decision-making of power grid operations.

3.1, because it shows ramps identified with wind speed instead of power. This ramp definition is only used in
At the M5 tower site, the HRRR predicts stronger wind speed ramps compared to observations; forecasted wind speed ramps ≥ 5 m s -1 make up 40% of the total number of up-and down-ramps while the observed ramps of the same 440 magnitude only make up 33% of the total number of ramps. The HRRR generally under predicts the magnitude of wind speed ramps at the PNW site; observed wind speed ramps ≥ 5 m s -1 make up 18% of the total number of up-and downramps while the forecasted ramps ≥ 5 m s -1 only contribute to 9% of the total number of ramps. These percentages also highlight that the M5 location has a greater percentage of observed ramps of the same magnitude than at the PNW location (33% vs 18%), suggesting that the wind speeds at the M5 site are more variable than at the PNW site. The M5 tower is 445 located in a region of very complex terrain about 5 km east of the Colorado Front Range, which because of the atmosphere's interaction with the mountainous terrain can cause large changes in wind speed over short periods of time. The PNW tower is also located in a region of complex terrain near the Columbia River Gorge, but the terrain is not as complex as the M5 site.
The low correlation coefficients between the HRRR forecasts and observed wind speed ramps suggest that there is some skill in the HRRR forecasts at predicting ramps, but the skill is limited and differs between up-and down-ramps. Low 450 correlation limits the extent to which statistical post-processing can improve the forecast. On the other hand, we have shown that systematic over-and under-forecasting biases can be reduced with statistical post-processing. Moreover, the multivariate methods discussed in this paper can provide information about the uncertainty of the forecast via generation of many possible wind speed scenarios.

Verification of multivariate methods compared with HRRR forecasts
We now compare the various multivariate methods used to generate scenarios of transformed wind speed. To also compare the different methods to the deterministic raw HRRR forecasts, we first employ an event-based metric to assess systematic 465 biases with regard to the frequency of ramp events. This metric counts the number of power ramps defined as in Sect. 3.1 identified from the scenarios generated by each method described in Sect. 3.3. The relative frequency of power ramps that exceed = 60% change in power capacity during 6 h for all days when forecasts and observations are available (Figure 8 plotted as a single line in Figure 8. We again see a general over-forecasting bias of the number of ramp events (this time power ramps) produced by the raw HRRR forecasts compared to observations at the M5 tower (Figure 8c, d) and the opposite behaviour of the HRRR forecasts at the PNW tower (Figure 8a, b). The HRRR forecasts especially struggled with the diurnal cycle and magnitude of the relative frequency of up-and down-ramps at the PNW location. The HRRR predicted the most up-ramps in the first four ramp windows (between 00Z and 9Z) and then levelled out for the remainder of the early 475 morning while the observations show a minimum in up-ramps during the first four ramp windows and a maximum during the remaining windows, which suggests that the HRRR incorrectly captured the diurnal cycle. For the down-ramps, the HRRR forecasted a gradual increase in ramp events across all ramp windows, while the observations show a peak in downramps around the fourth ramp window (≈ 9Z) followed by a gradual decrease in down-ramp events during the remainder of the morning. 480 The method that most-closely follows the ramp climatology of the observations (black line in Figure 8) is the StSS method followed by the MDSS+, MDSS, and lastly the Gaussian copula method. The StSS method has an overall better prediction of up-and down-ramp climatology than the raw HRRR forecasts when compared to a climatology of observed ramp events. As discussed in Sect. 3.3, the MDSS and MDSS+ methods make a preferential selection of historical scenarios that minimize the divergence between the post-processed forecast and past scenarios and should yield scenarios more similar 485 to the current forecast than the random or assigned scenarios used in the StSS method. Despite this preferential selection, the MDSS and MDSS+ methods do not outperform the StSS method in predicting the climatology of relative frequency of ramp events for this dataset. The reasons for this result are presented in the discussion for Figure 12. Before discussing reasons for why the more complex methods do not outperform the standard Schaake Shuffle method at predicting a climatology of ramp events, we next examine a metric used to compare the skill of various probabilistic forecast methods to determine the 490 differences in performance between the StSS and two MDSS methods.

500
The StSS, MDSS, MDSS+, and the Gaussian copula methods produce probabilistic forecasts of ramp events. To verify the skill of and compare among the different probabilistic methods, we compute Brier Skill scores (BSS). The Brier skill score quantifies the extent to which a forecast method improves the prediction of a two-categorical event compared to a reference forecast, where BS fcst is the Brier score of the forecast and BS ref is the Brier score of the reference forecast. The Brier score is a strictly proper score that summarizes the accuracy of a probabilistic forecast; it is defined as the squared error of the probability forecast of an event and the observed binary outcome (1 if the event happened, 0 if not). The events here are 510 characterized by the exceedance of a particular ramp threshold during a ramp window size h. Climatological probabilities of occurrence of up-and down-ramp events with a particular and h are used as the reference forecast. Before calculating the BS fcst , we took the average of the binary event forecasts from all 50 scenarios (1000 scenarios for Gaussian copula method) for each method to create a probabilistic forecast with a value between 0 (no ramps occurred in any of the scenarios) and 1 (ramps occurred in all of the scenarios). Brier scores were calculated for each type of ramp (i.e., up-and down-ramps 515 with = 0.20, 0.40, 0.60, and 0.80 and h = 3 h and 6 h). To quantify the sampling variability of the BSS induced by the limited data sample size, we first generated 100 bootstrap samples with replacement of the daily BS fcst and BS ref separately for each forecast initialization time (i.e., 00Z and 12Z). Then, we summed the 100 BS from each initialization time together before calculating the BSS to reduce sampling variability.
Box plots of the BSS for both tower locations and different types of power ramps reveal dependencies of the 520 forecast skill on , h, and tower location (Figure 9 and Figure 10). The most noticeable difference among the BSS is that the skill is generally higher for forecasts made at the PNW tower location compared to those at the M5 tower location for all types of ramps. Recall from Sect. 4.1 that the observed up-and down-ramps and those predicted by the HRRR had low correlation coefficients, which is why it is difficult to get positive skill with any of the methods at the M5 site; statistical post-processing can correct for systematic forecasting biases, but it cannot improve random errors which lead to low 525 correlation. Conversely, at the PNW site, there are overall higher correlation values (Figure 7) compared to those at the M5 site meaning that statistical post-processing will be more consequential. This behaviour results in generally positive and higher BSS for the PNW site than for the M5 site (Figure 9 and Figure 10). Greater positive skill is gained when we identify ramps in a window size of 6 h ( Figure 10) instead of 3 h (Figure 9) for both the PNW and M5 sites, because timing errors are less consequential when the time window is larger. 530 The multivariate methods do not present as much skill in forecasting down-ramps as they do in forecasting upramps at the PNW site, except for events with small (20%) power changes during 3 h. In Figure 7, the correlation between observed ramps and ramps forecasted by the HRRR is greater for up-ramps (0.37) than down-ramps (0.27) at the PNW site.
At the M5 site, the correlation between observed and HRRR-forecasted down-ramps (0.31) is larger than for up-ramps (0.23). We also note greater skill for the multivariate methods at predicting down-ramps opposed to up-ramps at the M5 site, 535 which combined with the relative skill of up-and down-ramps at the PNW site suggests that the quality of the initial raw forecast skill impacts the amount of skill that can be gained from the probabilistic approaches.
How does skill vary among the different multivariate methods? The scenarios produced with the Gaussian copula method result in significantly less skill than all of the other methods for the M5 tower location and marginally less skill than Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2018-13 Manuscript under review for journal Wind Energ. Sci. Discussion started: 14 February 2018 c Author(s) 2018. CC BY 4.0 License. the other methods at the PNW site. The Gaussian copula utilizes an exponential covariance model that defines the temporal 540 dependence of scenarios through the range parameter , which was set to 2.5 and 1.5 for the PNW site and M5 site, respectively. Those parameters were selected based on empirical correlations, but did not yield the highest BSS. Instead, we achieved the highest BSS using = 3.5 and 2.5 (not shown) for the PNW and M5 site, respectively, but these values are not apparent choices based on what the data suggest. The fact that a larger parameter produces a better result implies that a generic exponential model may not be a good fit for these complex sites; the inclusion of local forecasts to create 545 probabilistic scenarios is necessary to understand the correlation among lead times.
The most surprising result from the analyses is that the StSS method is more competitive than the MDSS method (and slightly more competitive than the MDSS+ method) despite the preferential selection of historical scenarios made by the MDSS method. The MDSS method selected historical scenarios of transformed wind speed that were most compatible with the marginal distributions of the forecast day and theoretically should provide higher BSS than the StSS method, which 550 only could use scenarios from the 50 available historical dates prior to the forecast day. The original MDSS method used by Scheuerer et al. (2017) worked well for precipitation events, but does not focus on the selection of historical scenarios based on their compatibility with forecasted (temporal) gradients, which are crucial for the prediction of ramps. This understanding led us to include an additional term in the MDSS method that is based on lag 1-h differences of transformed wind speed. The modified MDSS method MDSS+, matches historical scenarios to not only the forecast marginal distributions but also to the 555 forecast distributions of lag 1-h differences. The term of lag 1-h differences ensures a better selection of historical scenarios with ramps of similar slope or magnitude to the forecast.  location (a, c, and e) and the M5 tower location (b, d, f) and for up-ramps (filled boxplots) and down-ramps (non-filled boxplots). Ramp events with a power threshold of = 20% (a and b), 40% (c and d), and 60% (e and f)   We see that the median BSS using the MDSS+ forecast scenarios are often higher than those of the MDSS method and more competitive with the StSS method for all ramp types (Figure 9 and Figure 10). However, minute difference 580 between the three Schaake shuffle methods are unperceivable, because of the limited sample size which resulted in considerable overlap between the BSS boxplots. To highlight the differences between the three Schaake Shuffle methods, we generated 25 years of synthetic wind speed observations and forecasts (Appendix B). These synthetic data underwent the same univariate post-processing steps (Sect. 3.2) as the real data before applying the different Schaake shuffle methods. Box plots of BSS using the synthetic data present positive skill, show considerably less variability than the real data, and 585 highlight the differences between the MDSS and MDSS+ methods ( Figure 11). The inclusion of the lag 1-h differences in the MDSS+ is essential to achieve optimal and competitive skill from the method when compared to StSS. Recall that the lag 1-h differences are weighted five times more than the forecast marginal distributions in the MDSS+, meaning that the transformed-wind-speed gradient between lead times is even more important to match than the forecast marginal distributions. With this additional term, the MDSS+ is as competitive as the StSS method at choosing scenarios that lead to 590 skilful ramp forecasts.
Why is the rather simplistic StSS method as good (even better in regards to ramp frequency biases, see Figure 8) as the more sophisticated MDSS+ and significantly better than the MDSS method? Some insight is gained by analysing the lag 1-h differences of wind speed forecasts generated by the three different Schaake Shuffle techniques. We compute absolute lag 1-h differences of observed wind speeds and those of the historical observed wind speed scenarios selected by the StSS, 595 MDSS, and MDSS+ methods before shuffling. For each method, the absolute lag 1-h differences are calculated for each date and for each 12 pairs of lead times. For each date and paired lead time, the lag 1-h differences are then stratified according to the corresponding HRRR wind speed forecast. Lag 1-h differences from all dates and paired lead times associated with a certain range of HRRR forecasted wind speeds are then averaged together before plotting (Figure 12a). A dependency between the magnitudes of lag 1-h differences and HRRR wind speed forecasts emerge. The magnitude of the observed lag 600 1-h differences increases as the HRRR forecast wind speed increases, which suggests that higher wind speeds correspond to larger fluctuations in wind speed. Because the StSS method does not depend on the HRRR forecast to select historical scenarios, the lag 1-h differences of the StSS historical scenarios are independent of the magnitude of the HRRR forecast wind speed. This result is demonstrated by the relatively flat StSS (blue) curve in Figure 12a. Conversely, the MDSS and Wind Energ. Sci. Discuss., https://doi.org/10.5194/wes-2018-13 Manuscript under review for journal Wind Energ. Sci. Discussion started: 14 February 2018 c Author(s) 2018. CC BY 4.0 License. MDSS+ methods make a preferential selection of past observations based on the current HRRR forecast wind speed. The 605 result is that the MDSS and MDSS+ methods produce curves (pink and green lines, respectively in Figure 12a) of lag 1-h differences qualitatively similar to the observed curve (black line in Figure 12a).
This better initial selection of scenarios, however, is offset by the effects of the shuffling procedure. Panel b) in Figure 12 shows the mean absolute lag 1-h differences after shuffling. For the StSS, the shuffling makes the lag 1-h differences more similar to the observed lag 1-h differences; lag 1-h differences decrease for low HRRR forecast wind 610 speeds and increase for high HRRR forecast wind speeds during the shuffling procedure. For the MDSS and MDSS+ methods, shuffling always results in a slight increase of the magnitudes of lag 1-h differences (pink and green lines in Figure   12b). This increase after shuffling the scenarios explains why the MDSS and to a lesser extent, the MDSS+ have a tendency to over-forecast the magnitude and frequency of wind speed ramps (see Figure 8).
Lastly, we investigate why the shuffling procedure affects the historical StSS scenarios differently than the MDSS 615 and MDSS+ scenarios. We conjecture that one of the reasons for this effect is the difference in spread of the scenarios used by each method before shuffling. We quantify the spread as the mean absolute difference between the historical scenarios.
Since the historical StSS scenarios are chosen unconditionally, the spread of its marginal distribution (see blue line in Figure   12c) approximates the climatological spread of actual observed wind speeds. Preferential selection performed by MDSS and MDSS+ significantly decreases the spread of the historical marginal distributions with the exception for high HRRR wind 620 speed forecasts where the prediction uncertainty can exceed the climatological spread (i.e., exceed blue line). This initial reduction in spread, however, reduces a side effect entailed by StSS: the shuffling procedure squeezes together scenarios as the unconditional spread is transformed into a forecast-informed spread. By doing this, the shuffling procedure typically reduces the fluctuations present in the historical scenarios. Because all of the Schaake Shuffle methods discussed herein use the same quantiles of a particular forecast distribution, all methods have the same spread after shuffling (gray line in Figure  625 12c). Since the MDSS and MDSS+ historical scenarios already have low spread, shuffling does not change their characteristics as much as it does for the StSS historical scenarios; the level of fluctuations is similar before and after shuffling. In other setups, the shuffling side effect can be unwanted, but in the present setup, it seems to benefit the StSS method and results in the overall most accurate level of wind speed fluctuations.  Wind power ramps present challenges to wind power forecasters and the electrical grid, because they cause sharp changes in power production in time periods of minutes to hours. Better forecasts of ramp events can lead to more reliable wind power generation and less strain on the power grid. Generally, wind farm operators rely on a single forecast of persistence to 650 determine power fluctuations over the next 30 mins to an hour, which is not suitable during ramp events. Numerical weather prediction and statistical post-processing techniques can improve ramp forecasts by predicting rapid future fluctuations in wind speed and power and by providing uncertainty information to those forecasts.
In this paper, we used observed 80-m wind speeds from tall meteorological towers located in Boulder, Colorado (M5 site) and in eastern Oregon (PNW site). We also used forecasts of 80-m wind speeds from the HRRR model to create 655 probabilistic forecasts of up-and down-ramp events. With these data, we presented how to obtain probabilistic wind speed forecasts by first correcting biases in the forecasts and then applying one of the four multivariate methods discussed to generate scenarios of wind speed. We used the IEC power curve to convert scenarios of wind speed to scenarios of power before identifying ramps. Alternatively, a relationship between measured wind speed and power output from a training dataset could be used to bypass the use of a power curve for future wind speeds (Lange and Focken, 2006). Employing 660 stochastic power curves (Jeon and Taylor, 2012) would also take the conversion uncertainty into account. Because our study was focused on the evaluation and comparison of multivariate statistical post-processing methods and wind speed to power conversion affects all methods in the same way, using a fixed power curve warrants a fair comparison. If observed power production rather than observed wind speed was used as the 'ground truth', an inverse (power-to-wind speed) transformation could be employed to reconstruct the associated wind speeds (Messner et al., 2014), and the conversion uncertainty would be 665 accounted for implicitly.
Before generating the scenarios, we removed the seasonal cycle and corrected for heteroscedasticity within the observations and raw HRRR forecasts by applying a power transformation. We then regressed the transformed observations on the transformed forecasts to obtain regression coefficients. The mean and standard deviation of marginal predictive distributions for each forecast initialization and lead time were determined by inserting future forecasts into the fitted 670 regression model with these coefficients. We tested three candidate predictive distributions and found that the gamma distribution and the truncated logistic distributions were the best fits for the M5 and PNW tower locations in regards to wind speed, respectively. We determined that these predictive distribution models were suitable to represent observations based on uniform PIT histograms and low CPRS values. This approach to obtaining marginal predictive distributions is rather simple, but given the limited amount of data that remained after filtering, we thought that a stable parameter estimation for a more 675 complex model was not warranted. A larger training dataset would allow one to account for forecast biases that vary with wind direction (Eide et al., 2017), or to use an analog-based regression approach similar to the method proposed by Junk et al., (2015), and include analog predictor variables related to atmospheric stability. The marginal predictive distributions provided uncertainty information for each lead time, but did not inform us about the interdependence structure across all lead times. To construct this interdependence, we first used the Gaussian 680 copula technique following Pinson and Girard (2012), which relates the predictive distributions across all lead times by utilizing an exponential covariance model of Gaussian random variables. We used a random number generator to generate 1000 scenarios of wind speed using this method. The Gaussian copula method is based on parametric assumptions that may not be an adequate representation of the interdependence between observed wind speeds at different lead times, so we tested three new methods of generating scenarios of transformed wind speeds. The standard Schaake Shuffle (StSS), the minimum 685 divergence Schaake Shuffle (MDSS), and the enhanced version of the MDSS (MDSS+) methods all use historical observed scenarios to inform how marginal predictive distributions should be connected across all lead times, which results in more realistic forecast scenarios.
The StSS method only used an ad hoc selection of historical scenarios while the MDSS and MDSS+ made preferential selections of historical scenarios that best matched the forecast marginal distributions (MDSS) or matched both 690 the forecast marginal distributions and the forecast distributions of the lag 1-h differences of transformed wind speed (MDSS+). Even with these modified version of the Schaake Shuffle, we found that the StSS method provided the highest Brier skill scores overall using real data. However, all of these methods provided improvements over the raw HRRR forecasts, which struggled to capture the diurnal cycle and magnitude of the relative frequency of up-and down-ramp events.
These methods also reduced the over-and under-forecasting biases of the raw forecasts at the M5 and PNW tower locations, 695 respectively. We compared the three Schaake Shuffle methods at forecasting ramp events using a dataset of 25 years of synthetic forecasts and observations to emphasize the differences among the multivariate methods without constraints from the limited real dataset. We found that the MDSS+ method had significantly higher skill compared to the MDSS and was competitive with the StSS method suggesting that inclusion of the lag 1-h wind speed differences is a key component to accurate forecasting of ramp events when preferentially selecting historical scenarios. 700 We were limited with how much improvement statistical post-processing could provide with the real data, because the correlation between the observations and HRRR forecasts of up-and down-ramps were low. However, we still achieved some positive skill by reducing over-and under-forecasting biases and by employing the multivariate methods to generate probabilistic forecasts for the PNW tower which had overall higher correlation coefficients than that of the M5 tower location. Generally, the greatest skill was achieved for the prediction of up-ramps at the PNW site for which also happened 705 to be the ramp type associated with the highest correlation. This dependence on initial forecast skill is encouraging, because it suggests that for sites with fewer random errors and better skill (e.g., sites over flat terrain), we may be able to achieve significant improvement in forecast skill using these multivariate methods. A longer record of historical scenarios would also be advantageous, because it would increase the likelihood that forecasts would have a good match with past events for selection by the MDSS and MDSS+ methods. We demonstrated how statistical post-processing can correct forecast biases of up-and down-ramp events and how multivariate statistical methods can be used to generate probabilistic forecasts of wind speed and power scenarios. These methods can be implemented for real-time wind-farm operations using historical observations at a particular wind farm to gain uncertainty information regarding ramp forecasts. We used the generic IEC power curve to convert wind speed scenarios to power scenarios, but wind power forecasters should use their own turbine-specific power curves to further 715 reduce uncertainty. Additionally, these methods are applicable with other numerical weather prediction models besides the HRRR model. Therefore, wind power forecasters can use forecasts from their proprietary models as long as observations are available during the same time period for verification. Enhancements to the forecasts provided by gaining uncertainty information should help with decision-making in the energy-sector not only for direct power generation, but also for scheduling the availability of transmission lines, energy reserves, and energy trading. Future research that could improve 720 these methods includes improvement to raw forecasts via various methods (e.g., increased grid resolution and improved physics parameterizations), using additional predictors in the regression analysis of the univariate data (e.g., temperature and wind direction), and performing these methods for sites that generally yield higher quality forecasts. Overall these methods may find utility in assessing risks of other wind-speed dependent phenomena like wildfire propagation or pollution dispersion. 725

Data Availability
Data from the M5 meteorological tower are available at https://wind.nrel.gov/MetData/135mData/M5Twr/. Data from the  To generate a synthetic wind speed dataset (deterministic forecasts and observations) we use again a Gaussian copula approach, now applied to unconditional (climatological) marginal distributions. For simplicity, we assume the same climatology at each day of the year and each time of the day. Serial dependence in the Gaussian space is modelled via AR (1) processes, i.e. autoregressive processes of order one, that are used to generate two dependent time series (z t (x) ) t=1,..,T and (z t (y) ) t=1,..,T with time index ranging from 1 to T. We proceed in two steps: 840 1. Simulate a bivariate Gaussian time series with zero mean and marginal variances equal to 1 • let ρ = 0.8 be the correlation between the forecast and observation time series