Assessing Variability of Wind Speed: Comparison and Validation of 27 Methodologies

Because wind resources vary from year to year, the inter-monthly and inter-annual variability (IAV) of wind speed is a key component of the overall uncertainty in the wind resource assessment process, thereby causing challenges to wind-farm operators and owners. We present a critical assessment of several common approaches for calculating variability by applying each of the methods to the same 37-year monthly wind-speed and energy-production time series to highlight the differences between these methods. We then assess the accuracy of the variability calculations by correlating the wind-speed variability estimates to the variabilities of actual wind-farm energy production. We recommend the Robust Coefficient of Variation (RCoV) for systematically estimating variability, and we underscore its advantages as well as the importance of using a statistically robust and resistant method. Using normalized spread metrics, including RCoV, high variability of monthly mean wind speeds at a location effectively denotes strong fluctuations of monthly total energy generation, and vice versa. Meanwhile, the wind-speed IAVs computed with annual-mean data fail to adequately represent energy-production IAVs of wind farms. Finally, we find that estimates of energy-generation variability require 10 ± 3 years of monthly mean wind-speed records to achieve 90 % statistical confidence. This paper also provides guidance on the spatial distribution of wind-speed RCoV.


Introduction
The P50, a widely used parameter in the wind-energy industry, is an estimate of the threshold of annual energy production of a wind farm that the facility is expected to exceed 50 % of the time (Clifton et al., 2016). The P50 is usually estimated to apply over the lifetime of a wind farm, typically 20 years. To estimate P50 in the wind resource assessment process, a single percentage value is usually assigned to represent the uncertainty for the desired time period at a wind site (Brower, 2012). The interannual variability (IAV) of wind resources, along with site measurements and wind-power-plant performance, is an important component of the overall uncertainty in power production (Clifton et al., 2016; Klink, 2002; Lackner et al., 2008; Pryor et al., 2006). The IAV is also incorporated in the measure-correlate-predict process (Lackner et al., 2008), which usually considers wind measurements spanning less than 2 years.
Analysts and researchers use numerous metrics to quantify wind-speed variability, and the most common method is standard deviation (σ). For instance, the variability in historical or future wind resources is often represented as the σ from the annual-mean wind speed of a certain location (Brower, 2012). As wind turbine power generation is a function of wind speed, the variability of wind resources has important implications for the resultant long-term energy production. Financially, when the wind resource is projected to fluctuate more from year to year (Hdidouan and Staffell, 2017), the levelized cost of wind energy increases as well.
Because the profitability of wind farms depends on wind variability, past research has explored the implications of interannual and long-term variability in wind energy. Pryor et al. (2009) analyze trends of annual wind speed and IAV, without explicitly quantifying IAV values. Archer and Jacobson (2013) evaluate the seasonal variability of wind-energy capacity factor. Lee et al. (2018) assess the spatial discrepancies between wind-speed variabilities of different temporal scales, from hourly mean to annual-mean data. Bett et al. (2013) use σ and Weibull parameters to assess the wind variability in Europe. Extreme event analysis also offers another perspective to assess variability. For example, Cannon et al. (2015) examine extreme wind-energy generation events via reanalysis data and discuss the associated seasonal variability and IAV qualitatively. Leahy and McKeogh (2013) also quantify the return periods of multiweek wind droughts.
To quantify variability, the normalized σ or the coefficient of variation (CoV), the σ divided by the mean of a time series, is a commonly used tool. Justus et al. (1979) calculate and compare the CoVs of monthly and annual wind speeds at different sites across the United States. Baker et al. (1990) quantify interannual and interseasonal variations of both wind speed and energy production at three locations in the Pacific Northwest. They find that the annual CoVs range from 4 % to 10 %, matching the conclusions from Justus et al. (1979). Recently, Li et al. (2010) calculate hub-height wind-speed variance and σ over 30 years to spatially evaluate seasonal variability and IAV in the Great Lakes region. Bodini et al. (2016) estimate the IAV of wind resources with a modified version of CoV, using observed meteorological data in Canada. As the sample period increases, the IAVs of most sites gradually increase, averaging 5 % to 6 % among the chosen sites (Bodini et al., 2016). Krakauer and Cohan (2017) correlate the CoVs of monthly mean wind speeds with different climate oscillation indices and find the global mean CoV at 8 %. In addition to characterizing wind speed, the metric is also used to evaluate the benefits of grid integration. For example, Rose and Apt (2015) conclude that the interannual CoV of aggregate wind-energy generation in the central United States is 3 ± 0.1 %, much smaller than that of individual wind plants, which varies between 5.4 % and 12 %, ±4.2 %.
Aside from CoV, other metrics representing the spread of data have also been chosen to estimate variability in the literature. For example, the robust coefficient of variation (RCoV) normalizes the median absolute deviation (MAD) with the median. Gunturu and Schlosser (2012) quantify the spatial RCoV of wind-power density in the United States and demonstrate that the regions east of the Rockies, especially the Plains, generally have weaker variability and higher availability of wind resources. The seasonality index, originally used in Walsh and Lawler (1981) for precipitation purposes, is another measure to express variability. The seasonality index is defined as the sum of the absolute deviations of monthly averages from the annual mean, normalized with the annual mean. Chen et al. (2013) use the seasonality index to assess the interannual trend and the variability of wind speed in China, and they relate wind-speed IAVs to climate oscillations.
Alternative variability metrics emphasize the long-term trends via contrasting wind speeds of different periods. The "wind index", used in Pryor et al. (2006) and Pryor and Barthelmie (2010), is a ratio of wind speeds of a reference period and an analysis period. An entirely different wind index evaluated in Watson et al. (2015) is a ratio of spatially averaged wind speeds during two different periods.
Despite the importance of long-term variability, the wind-energy industry lacks a systematic method to quantify this uncertainty. As various metrics to assess variability exist, a comprehensive comparison of measures is necessary. Therefore, the goal of this study is to evaluate various methods of estimating intermonthly variability and IAV in a reliable way using a long-term, consistent database. Specifically, our objective is to determine an optimal metric or metrics for relating wind-speed variability to energy-production variability. We describe the wind-speed and energy-generation data, the methodology, and the chosen variability metrics in Sect. 2. We evaluate different variability measures via two case studies in Sect. 3. We also contrast the results computed from monthly mean and annual-mean data, and we illustrate the spatial distribution of wind-speed variability in Sect. 3. We then recommend the best practice in using the ideal method in Sect. 4. We focus on the applicability of imposing such metrics to quantify the variabilities of wind speeds and wind-energy production.

Wind and energy data
In this study, we use a 37-year time series of monthly mean wind speed and monthly total wind-energy production in the contiguous United States (CONUS). For wind speed, we use hourly horizontal wind components in the National Aeronautics and Space Administration's Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), reanalysis data set (Gelaro et al., 2017; GMAO, 2015) from 1980 to 2016. We use these components to derive the monthly mean wind speed at 80 m above the surface, which represents hub height in this study, via the power law (Eq. 1) and the hypsometric equation (Eq. 2):

$$u(z_2) = u(z_1) \left( \frac{z_2}{z_1} \right)^{\alpha} \quad (1)$$

$$z_2 - z_1 = \frac{R_d \overline{T}}{g} \ln\left( \frac{p_1}{p_2} \right) \quad (2)$$

In Eq. (1), $u(z_1)$ and $u(z_2)$ are the horizontal wind speeds at heights $z_1$ and $z_2$, in which wind speeds are the square root of the sum of squared horizontal wind components, and $\alpha$ is the shear exponent. In Eq. (2), $R_d$ is the dry air gas constant, $\overline{T}$ is the average temperature between levels $z_1$ and $z_2$, $g$ is the gravitational acceleration, and $p_1$ and $p_2$ are the atmospheric pressures at $z_1$ and $z_2$. In most grid cells, we use the MERRA-2 meteorological output at 10 and 50 m above the surface to calculate $\alpha$, so as to extrapolate the wind speed at 80 m. In mountainous regions, the heights at 850 or 500 hPa may be closer to 80 m than to 10 m above the surface; in that case, we use data at the next available level of 850 or 500 hPa to derive the heights of that level and thus to extrapolate the wind speed at 80 m. The horizontal resolution of the MERRA-2 is 0.5° in latitude (about 56 km) and 0.625° in longitude (about 53 km). The MERRA-2 reanalysis interpolates the data and the metadata at the exact output latitude and longitude; hence the wind speed, air density, and elevation refer to the grid points with the particular sets of latitude and longitude (Bosilovich et al., 2016). Thus, the longest distance between a wind farm and the closest MERRA-2 grid-cell center is about 39 km.
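The power-law step can be illustrated with a minimal Python sketch. It assumes a single time step with hypothetical wind components at 10 and 50 m; the function names are illustrative and not part of the study's code, and the monthly averaging and the pressure-level handling in mountainous regions are left out.

```python
import math


def wind_speed(u_component, v_component):
    """Horizontal wind speed from its two horizontal components."""
    return math.hypot(u_component, v_component)


def shear_exponent(speed_1, speed_2, z_1, z_2):
    """Shear exponent alpha implied by the power law between two heights."""
    return math.log(speed_2 / speed_1) / math.log(z_2 / z_1)


def extrapolate(speed_ref, z_ref, z_target, alpha):
    """Power-law extrapolation of wind speed from z_ref to z_target."""
    return speed_ref * (z_target / z_ref) ** alpha


# Hypothetical wind components (m/s) at 10 m and 50 m for one time step.
speed_10m = wind_speed(3.0, 4.0)
speed_50m = wind_speed(3.9, 5.2)

# Derive alpha from the two available levels, then extrapolate to 80 m.
alpha = shear_exponent(speed_10m, speed_50m, 10.0, 50.0)
speed_80m = extrapolate(speed_50m, 50.0, 80.0, alpha)
print(f"alpha = {alpha:.3f}, 80 m wind speed = {speed_80m:.2f} m/s")
```

Because alpha is derived from the same two levels, extrapolating from either the 10 m or the 50 m wind speed gives the same 80 m value.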
For energy-production data, we use the net monthly energy production of wind farms in megawatt hours (MWh) from the US Energy Information Administration (EIA) between 2003 and 2016. Each of the wind farms has a unique EIA identification number. After we leave out about 300 wind sites with incomplete or substantially zero production data, a total of 607 wind farms in the CONUS are selected for this analysis. For simplicity, the CONUS in this analysis is defined as the area bounded by 127° W, 65° W, 24° N, and 50° N, and geographically includes the 48 states in CONUS and Washington, D.C. (Fig. 1).

Linear regression and data post-processing
We focus on the direct relationship between wind speed and energy production to investigate approaches for calculating long-term variability. Therefore, we must minimize the influence from other determinants of energy production, such as curtailment and maintenance. First, we eliminate data with zero values for monthly energy production, which is typical in the first months of a new wind farm. Next, we linearly regress the monthly total energy production on the monthly mean MERRA-2 80 m wind speed at the closest grid point to each wind farm from 2003 to 2016. In other words, each wind site is assigned its own regression equation. We then remove any production data below the 90 % prediction interval to exclude underproduction for reasons other than low wind speeds, and omit the data above the 99 % prediction interval, or potentially erroneous overproduction. Prediction intervals are calculated via the t values and the standard error of prediction (Montgomery and Runger, 2014). In other words, we define the outliers of energy production as data more than 1.64 standard errors below or more than 2.58 standard errors above the site-specific regression. We also apply a third-order polynomial fit (Archer and Jacobson, 2013), and it leads to very similar results to the linear model. Hence, we focus on presenting the results from the linear fit in this study.
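The filtering steps can be sketched as follows. This is a simplified illustration and not the study's implementation: it uses the residual standard error of the fit with the fixed multipliers 1.64 and 2.58 instead of the full t-based standard error of prediction, it applies the filter in a single pass, and all data values and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def filter_outliers(speeds, energy, k_low=1.64, k_high=2.58):
    """Drop zero-production months, then months outside the asymmetric band."""
    pairs = [(s, e) for s, e in zip(speeds, energy) if e > 0]
    xs = [s for s, _ in pairs]
    ys = [e for _, e in pairs]
    slope, intercept = fit_line(xs, ys)
    residuals = [e - (intercept + slope * s) for s, e in pairs]
    # Residual standard error with n - 2 degrees of freedom.
    std_error = math.sqrt(sum(r * r for r in residuals) / (len(pairs) - 2))
    return [pair for pair, r in zip(pairs, residuals)
            if -k_low * std_error <= r <= k_high * std_error]


# Hypothetical monthly data: one zero-production month (start-up), ten
# months on a straight line, and one strongly underproducing month.
speeds = [4.0] + [float(s) for s in range(5, 15)] + [9.5]
energy = [0.0] + [100.0 * s for s in range(5, 15)] + [350.0]

kept = filter_outliers(speeds, energy)
print(len(kept), "of", len(speeds), "months kept")
```

The asymmetric band reflects the reasoning in the text: underproduction for non-wind reasons is more common than erroneous overproduction, so the lower bound is tighter than the upper one.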
After regressing the outlier-free energy data on wind speed, we then filter the wind farms based on the coefficient of determination (R²), which indicates the confidence of the linear regression. We select the R² threshold of 0.75: 349 of the original 607 wind farms pass this filter. Through this filter, we ensure that wind speed is the primary driver of energy production in the wind farms with high R² values. Lunacek et al. (2018) also use a similar R²-filtering method with a threshold of 0.7. Considering some farms lack years of complete generation data, we extend the monthly energy production to 37 years using the same site-specific linear models with the monthly MERRA-2 wind speed. In other words, we compute any missing energy-production data from 1980 to 2016 based on the linear fit from the years that do exist in the data set. Herein, we refer to this long-term extension of data as the predicted energy production. Of the 349 wind farms, 7.5 years is the median of the energy data that are derived via the linear fit, given the available EIA records between 2003 and 2016.
[Figure 1 caption, partially recovered: The grey box illustrates the boundary of the CONUS used in this study.]
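A minimal sketch of the R² check and the long-term extension follows. The slope, intercept, and monthly values are hypothetical, and missing months are marked with `None` purely for illustration; the study's actual data handling is not described at this level of detail.

```python
def r_squared(x, y, slope, intercept):
    """Coefficient of determination of a fitted line on the data (x, y)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def extend_production(wind_speeds, observed, slope, intercept):
    """Keep observed months; fill missing months (None) from the linear fit."""
    return [obs if obs is not None else intercept + slope * ws
            for ws, obs in zip(wind_speeds, observed)]


# Hypothetical site-specific fit and a short record with two missing months.
slope, intercept = 100.0, -50.0
wind_speeds = [6.0, 7.0, 8.0, 9.0]
observed = [None, 660.0, None, 840.0]

r2 = r_squared([7.0, 9.0], [660.0, 840.0], slope, intercept)
passes_filter = r2 >= 0.75  # R-squared threshold used in the text
extended = extend_production(wind_speeds, observed, slope, intercept)
print(passes_filter, extended)
```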
We then apply a second filter using the Pearson's correlation coefficient (r) between the predicted and actual monthly energy production, and we only choose the 195 wind farms with r larger than 0.8. As a result, for the r-filtered wind sites, we ensure wind speed is the primary driver of wind-power production, and we confirm the energy predictions match well with those observed.
The nonfiltered, R²-filtered, and r-filtered wind farms cover most of the popular wind farm regions across the CONUS (Fig. 1), even with the high r threshold of 0.8. Thus, the r-filtered samples provide a sufficient representation of the wind farms across the United States. To illustrate our analysis with examples, we select one site in Oregon (OR) and another site in Texas (TX) that demonstrate distinct wind-speed distributions. We choose the two sites to contrast the results of different variability metrics throughout the paper; both sites pass the r filter (Fig. 1).
Recognizing that the horizontal resolution of the MERRA-2 data could be perceived as undermining the linear regressions, we explore any possible role of the distance between the closest MERRA-2 grid point and the actual wind farm, but we find no statistical relationship. In particular, horizontal and vertical discrepancies between the model and the observations do not affect the resultant R² in the linear regressions. More than half of the 607 wind farms pass the R² filter, and more than half of those pass the r filter (Fig. 2a). Additionally, the correlation between R² and the horizontal distance between the closest MERRA-2 grid point and the actual wind farm is close to zero (Fig. 2b); the correlation between R² and the vertical difference between the modeled grid point and the actual wind site is also weak (Fig. 2c). In other words, the horizontal and vertical distances between the MERRA-2 grid points and the wind farms have no apparent impact on the representativeness of the wind farms in the linear regression.
Additionally, we analyze the uncertainty of the linear-regression method. We first test the influence of the error term in the regression, to account for the uncertainty associated with the input data. After a wind farm passes the R² threshold of 0.75, we add a random value within 1 standard error to the predicted energy production of each month. This random error term introduces uncertainty to the regression process but does not affect the R² of the site-specific regression. Furthermore, we also test the sensitivity of the R² and r thresholds by analyzing the results after modifying those limits. Specifically, we loosen the R² and r thresholds to 0.6 and 0.7, and we tighten the R² and r thresholds to 0.85 and 0.9. Loosening these thresholds increases the sample sizes of the wind farms that pass the filters, and tightening the thresholds results in the opposite.
We also test other factors that could undermine these regressions. We consider the hub-height air density extrapolated from MERRA-2 as another regressor in the regressions, but air density is a statistically insignificant predictor and thus is not discussed in the rest of this study. When we replace the prediction interval with the confidence interval, the sample sizes increase from 349 and 195 sites to 555 and 209 wind farms. However, at least 7 years of energy data are derived from the regression for 99 % of the samples, because confidence intervals are smaller than prediction intervals by definition. We also consider removing the long-term means and the impacts of annual cycles, yet the sample sizes decrease to 121 and 69 locations, and the regression fills at least some of the energy data for more than 99 % of the sites. Finally, to ensure these results are not specific to the MERRA-2 data set, we perform the same analysis on the ERA-Interim reanalysis data set (Dee et al., 2011). The results of the key variability parameters such as σ, CoV, and RCoV resemble the findings using MERRA-2; hence we focus on the MERRA-2 findings in this study.
Our analysis, although comprehensive, is constrained by the quality of our data. On the one hand, reanalysis data sets have errors and biases in wind-speed predictions from complexities in elevation and surface roughness (Rose and Apt, 2016). Reanalysis data sets also demonstrate long-term trends of surface wind speeds (Torralba et al., 2017). The MERRA-2 data set can also depict different meteorological environments than those at the wind farm locations, especially in complex terrain. The MERRA-2 data of coarse temporal and spatial resolutions may also represent a lower intermonthly variability or IAV than the wind sites actually experience. Thus, regressing actual energy production on reanalysis wind speed adds uncertainty to our analysis. On the other hand, constrained by the monthly total energy-production data from the EIA, our analysis ignores the signals finer than monthly cycles. The quality of the EIA data also varies across wind sites; therefore the filtering process via linear regression is necessary.

Variability metrics relating wind speeds and energy production
To evaluate the variabilities of both the wind speeds and the predicted energy generation from the filtered wind farms, we investigate a total of 27 combinations and variations of existing methods describing the spread of data. We categorize different variability metrics according to statistical robustness (insensitivity to assumptions about the data; for example, Gaussian distribution) and statistical resistance (insensitivity to outliers) (Wilks, 2011). Of the 27 variability methods tested, we select four representative measures to perform a comparison and discuss in detail, according to their robustness, resistance, and the nature of normalization by an average metric:
1. RCoV, defined as the MAD divided by the median (Gunturu and Schlosser, 2012; Watson, 2014), is a spread metric divided by an average metric and is both statistically robust and resistant.
2. Range (maximum minus minimum) divided by trimean (weighted average among quartiles) is a spread metric normalized by an average metric, and the numerator is not resistant.
3. CoV (Baker et al., 1990; Bodini et al., 2016; Hdidouan and Staffell, 2017; Krakauer and Cohan, 2017; Rose and Apt, 2015; Wan, 2004), defined as the σ divided by the mean, is a spread metric normalized by an average metric, and neither the denominator nor the numerator is robust or resistant.
4. σ is simply a spread metric that is not robust or resistant.
Among the four measures, only RCoV is completely statistically robust and resistant, and the first three methods are all normalized spread metrics. We further describe all the tested variability methods comprehensively in Table B1 in Appendix B. Each of these metrics is easy to implement via basic Python packages such as NumPy and SciPy with no more than a few lines of code. In addition, based on the exponential scaling relationship between power and wind speed developed by Bandi and Apt (2016), we also analyze the results from the exponential CoV and the exponential RCoV in this paper (Table B1).
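As an illustration of how compact the four representative metrics are, the sketch below implements them with Python's standard `statistics` module instead of NumPy and SciPy. Several choices are assumptions rather than details given in the text: the MAD is left unscaled, the trimean is taken as (Q1 + 2·Q2 + Q3)/4 with inclusive quartiles, and σ is the sample standard deviation. The wind-speed values are hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def mad(values):
    """Median absolute deviation from the median (no scaling constant)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def rcov(values):
    """Robust coefficient of variation: MAD divided by the median."""
    return mad(values) / median(values)


def range_over_trimean(values):
    """Range divided by the trimean, (Q1 + 2*Q2 + Q3) / 4."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    trimean = (q1 + 2.0 * q2 + q3) / 4.0
    return (max(values) - min(values)) / trimean


def cov(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)


# Hypothetical monthly mean wind speeds (m/s), without and with one outlier.
clean = [6.0, 7.0, 8.0, 9.0, 10.0]
with_outlier = [6.0, 7.0, 8.0, 9.0, 30.0]

for name, metric in [("RCoV", rcov), ("range/trimean", range_over_trimean),
                     ("CoV", cov), ("sigma", stdev)]:
    print(f"{name}: {metric(clean):.3f} -> {metric(with_outlier):.3f}")
```

In this toy example the single outlier leaves RCoV unchanged while the other three metrics increase, which is the resistance property described above.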
In addition to calculating variabilities with the spread measures, we evaluate other diagnostics that describe distribution characteristics. These diagnostics include averaging metrics, such as the arithmetic mean (not resistant) and median (the 50th percentile, which is resistant); symmetry metrics, such as skewness (involving the third moment, not robust or resistant) and the Yule-Kendall Index (YKI, robust and resistant); a tailedness metric, namely kurtosis (involving the fourth moment, not robust or resistant); the Weibull scale and shape parameters (not robust); and the autocorrelation with a 1-year lag to dissect the interannual cycles. We summarize the diagnostics evaluated in this analysis in Table B2. Along with the regression results, results from the four representative variability metrics and other distribution diagnostics demonstrate differences between the two selected sites (Table 2).
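Two of these diagnostics can be sketched briefly. The YKI below follows the usual quartile-based definition, (Q1 − 2·Q2 + Q3)/(Q3 − Q1), and the autocorrelation uses a common single-mean estimator; both are assumptions about the exact formulas, which the text defers to Table B2. The monthly series is synthetic.

```python
import math
from statistics import mean, quantiles


def yule_kendall_index(values):
    """Quartile-based skewness: (Q1 - 2*Q2 + Q3) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 - 2.0 * q2 + q3) / (q3 - q1)


def autocorrelation(values, lag):
    """Lagged autocorrelation using one overall mean for both segments."""
    center = mean(values)
    deviations = [v - center for v in values]
    # zip() stops at the shorter list, pairing x[t] with x[t + lag].
    numerator = sum(a * b for a, b in zip(deviations, deviations[lag:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Synthetic 3-year monthly wind-speed series with a pure annual cycle (m/s).
monthly = [7.0 + math.sin(2.0 * math.pi * month / 12.0) for month in range(36)]

print("YKI:", yule_kendall_index(monthly))
print("12-month-lag autocorrelation:", autocorrelation(monthly, 12))
```

With this estimator, even a perfectly periodic 36-month series yields a 12-month-lag autocorrelation below 1, because only 24 of the 36 months contribute to the numerator.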
Herein, we quantify the variabilities of the 37-year extended time series of wind speed and energy production via different methods, using a range of time frames: 1 year, 2 years, and up to 37 years for each wind farm. A metric is considered useful when the resultant wind-speed variability correlates well with the resultant energy-production variability across wind farms, even when random errors are implemented and the thresholds R² and r are changed. In this analysis, we compare results with three correlation metrics: Pearson's r, Spearman's rank correlation coefficient (r_s), and Kendall's rank correlation coefficient (τ) (Table 1).
To assess the applicable time frames of various variability metrics, we evaluate the asymptote period of correlations for each method. In most cases, the correlation coefficients approach the 37-year value after a certain analysis time frame. Using RCoV as an example, the Pearson's r's of shorter analysis periods (1-year, 2-year, etc.) gradually converge to the 37-year value at 0.856 as the RCoV-calculation time frame expands (Fig. 5a). Hence, for each metric, assuming the 37-year correlation coefficient represents the long-term correlation, we calculate the normalized differences between the correlation coefficients and the 37-year value in each time frame, starting from 1 year. When the absolute mean of the normalized differences drops below 0.05 in a particular year, we determine that year as the length of data required for reliable results via that variability method. In other words, the asymptote year of a certain metric illustrates that the error of the resultant correlation between wind-speed and energy-production variability via that data length is less than 5 % from the long-term value. For example, the asymptote period of RCoV correlations is 3 years according to Pearson's r (Table 3).
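One possible reading of this procedure is sketched below. It assumes that each time-frame length has several correlation coefficients (for example, from different start years), that the normalized difference is (r − r_long) / r_long, and that the asymptote period is the first length at which the absolute value of the mean normalized difference falls below the tolerance. The correlation values are hypothetical, apart from the 37-year value of 0.856 quoted in the text.

```python
def asymptote_period(correlations_by_length, long_term, tolerance=0.05):
    """First time-frame length whose mean normalized difference is small.

    correlations_by_length maps a length in years to the list of
    correlation coefficients obtained from windows of that length.
    Returns None if no length meets the tolerance.
    """
    for length in sorted(correlations_by_length):
        coefficients = correlations_by_length[length]
        normalized = [(r - long_term) / long_term for r in coefficients]
        if abs(sum(normalized) / len(normalized)) < tolerance:
            return length
    return None


# Hypothetical Pearson's r values for windows of 1 to 4 years.
correlations = {
    1: [0.60, 0.70],
    2: [0.75, 0.80],
    3: [0.83, 0.85],
    4: [0.85, 0.86],
}

print(asymptote_period(correlations, long_term=0.856))
```

A stricter variant could additionally require that all longer time frames also stay within the tolerance; the text does not state which convention is used.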
Table 1. Details of the three correlation metrics applied, adapted from Wilks (2011). All three metrics yield values between −1 and 1.

Correlation metric | Robust and resistant | Description
Pearson's correlation coefficient (r) | No | Calculate the covariance of x and y, divided by the product of the σ's of x and y.
Spearman's rho, or Spearman's rank correlation coefficient (r_s) | Yes | Transform x and y values into ranks within x and y themselves, then calculate the covariance of ranks in x and y, divided by the product of the σ's of ranks in x and y.
Kendall's tau, or Kendall's rank correlation coefficient (τ) | Yes | Match all data pairs between x and y, with n(n − 1)/2 matches possible with a sample size of n. Define a concordant pair as both x1 larger than x2 and y1 larger than y2, or both x1 smaller than x2 and y1 smaller than y2. Define a discordant pair as either x1 larger than x2 and y1 smaller than y2, or x1 smaller than x2 and y1 larger than y2.
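The three correlation metrics in Table 1 can be written out in plain Python as a minimal sketch. Kendall's τ is computed here as concordant minus discordant pairs divided by n(n − 1)/2 (the tau-a form, an assumption because the table does not give the final ratio), and tied values receive average ranks in Spearman's r_s. The data are hypothetical.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Pearson's r applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """(concordant - discordant) / (n (n - 1) / 2), ignoring tied pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical variabilities: monotonic but nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(x, y), spearman(x, y), kendall_tau(x, y))
```

For a monotonic but nonlinear relationship such as this one, the two rank-based metrics equal 1 while Pearson's r is slightly below 1.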
To relate the IAVs between wind speed and energy production, we also perform the same analysis for annual-mean data. Strictly speaking, calculating the variabilities using monthly mean data yields intermonthly variabilities, because the results account for monthly, seasonal, and annual signals. To isolate the signals from interannual variations, we also examine the metrics and their correlations between the annual means of hub-height wind speeds and energy production, after linearly regressing and filtering via monthly data. However, the samples from each site are then limited to 37 data points of annual wind speed and energy production. Besides, selecting de-trended data from long-term means to calculate variabilities and their correlations leads to trivial results because of the small sample sizes and hence is omitted in this study.

Investigation of wind-speed RCoV
After we demonstrate that RCoV is the most systematic approach in linking wind-speed and energy-generation variabilities in Sect. 3.2, we further examine the details of using RCoV, specifically determining the minimum length of wind-speed data necessary to quantify variability effectively. We use 37 years of wind speed in every MERRA-2 grid cell in the CONUS (a total of 5049 grid points), and we calculate the RCoVs with 1 to 37 years of data for each grid cell. Because the RCoVs calculated using data between 1980 and 2016 are only samples of the true long-term wind-speed variability and hence the results involve uncertainty, we select a confidence interval approach.
We assume that the distribution of RCoV is Gaussian with infinite years of wind speed. Hence, we use a chi-square (χ²) distribution to set bounds for the σ's from samples of RCoV. In other words, because the derived RCoVs differ with the years of wind speeds sampled, we use the χ² distribution to quantify the confidence intervals of RCoV for each sample size. To determine the minimum data required for RCoV calculation, we use the following criterion (Montgomery and Runger, 2014):

$$\sigma_{37} > \sqrt{\frac{(n_i - 1)\,\sigma_i^2}{\chi^2_{\alpha/2,\,n_i - 1}}} \quad (3)$$

where $\sigma_{37}$ is the predetermined 37-year σ of RCoV; $n_i$ is the sample size of n years in year i, which is between 1 and 36 years; $\sigma_i^2$ is the variance of the sample of RCoVs in year i; and $\chi^2_{\alpha/2,\,n_i - 1}$ is the percentage point of the χ² distribution given the confidence level of α and the degrees of freedom of $n_i - 1$. We select a pair of α levels, 90 % and 95 %; hence we use four percentage points of the χ² distribution at 0.025, 0.05, 0.95, and 0.975 to construct the respective confidence intervals. Because the 37-year RCoV is an estimate of the truth, which is the wind-speed RCoV of infinite years, its singular value does not yield any variance or possess any distribution shape. Thus, to construct the confidence interval of the σ of the truth, we set the predetermined $\sigma_{37}$ as a fraction of the 37-year RCoV. Particularly, the $\sigma_{37}$'s are 10 % and 5 % of the 37-year RCoV for the 90 % and 95 % confidence levels, respectively.
In summary, for each grid point, we first determine an uncertainty bound based on the 37-year wind-speed RCoV of the location: we assign a 37-year σ, which is either 5 % or 10 % of the 37-year RCoV for the 95 % or 90 % confidence level, respectively. For each year i, from 1 to 37 years, we calculate the pairs of χ²-derived σ's of year i, which represent the lower and upper bounds of the confidence interval. When both of the χ²-derived σ's become smaller than the predetermined 37-year σ, year i becomes the minimum length of data required to calculate RCoV effectively at the specific confidence level. We analyze the wind-speed RCoV via both monthly mean and annual-mean wind speeds. We label the resultant minimum length of wind-speed data based on the χ² method as the convergence year, in contrast to the asymptote period which determines the asymptote year of correlation coefficients.
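For reference, the lower and upper bounds mentioned above correspond to the standard two-sided confidence interval on the standard deviation of a Gaussian sample. The block below states that textbook result in generic notation (sample size $n$, sample variance $s^2$, significance level $\alpha$, and upper-tail percentage points of the χ² distribution); how these symbols map onto $n_i$, $\sigma_i^2$, and the 90 % and 95 % levels of this study is an assumption.

```latex
% Two-sided 100(1 - \alpha) % confidence interval on \sigma for a Gaussian
% sample of size n with sample variance s^2. Here \chi^2_{p,\,n-1} denotes
% the upper-tail percentage point with n - 1 degrees of freedom, so
% \chi^2_{\alpha/2,\,n-1} > \chi^2_{1-\alpha/2,\,n-1}.
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}}
```

Under this reading, the convergence year is the first year in which both ends of the interval fall below the predetermined 37-year σ; since the upper bound is the larger of the two, it is the binding condition.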

Case studies: Oregon and Texas sites
We select two sites from two different geographical regions with considerable wind-energy deployment, the southern Plains and the Pacific Northwest in the United States, to contrast the results of various variability metrics. Based on the site-specific regressions, we extend the monthly energy-production time series to 37 years (Fig. 3a and b) for the two sites. Both sites pass the R² filter at 0.75 and the r filter at 0.8. Although the OR site is farther from the closest MERRA-2 grid point in a region with more complex terrain, the resultant R² (0.87) and predicted-actual-energy Pearson's r (0.91) are larger than those of the TX site (0.79 and 0.81, respectively) (Table 2). The 37-year-average wind speed of about 7.6 m s⁻¹ at the TX site is larger than that of the OR site at about 6.8 m s⁻¹ (Table 2). Additionally, the 12-month-lag autocorrelations demonstrate that the annual cycle of monthly wind speeds of the TX site is stronger than that of the OR site, yet the autocorrelations of the sites, 0.53 and 0.32, are still lower than the CONUS median of 0.58 (Table 2).
None of the monthly and annual wind-speed distributions of the sites are perfectly Gaussian. According to the kurtosis, skewness, and YKI values of the monthly mean wind speeds (Table 2), the monthly wind-speed distribution at the OR site skews towards lower wind speeds with more and stronger extremes (Fig. 3c). The skewed distribution at the OR site leads to 71.2 % of the monthly wind speeds located within 1σ from the mean, compared to the classic Gaussian of 68.3 %. Nevertheless, although the TX site monthly wind-speed distribution is very close to symmetric with fewer outliers (Fig. 3d), which is supported by near-zero skewness and YKI (Table 2), only 64.6 % of monthly data fall within 1σ from its mean. For annual-mean wind speeds, the averaging with a 12-month time span at both sites reduces the ranges and thus leads to kurtosis close to −1 (Table 2). Although the skewness and YKI are close to 0 (Table 2), only 59.5 % and 56.8 % of the annual-mean wind speeds fall within 1σ from the means of the OR and TX sites, respectively.
www.wind-energ-sci.net/3/845/2018/ Wind Energ. Sci., 3, 845-868, 2018
The four selected variability methods yield similar resultant monthly variabilities that are close to the respective CONUS medians based on the 37-year monthly data. For variabilities of monthly wind speeds, the differences between the two sites are slight because the comparison among the results of the four metrics is inconclusive (Table 2): the monthly variabilities are not far from the national medians (Table 2). However, results from the normalized spread metrics (RCoV, range divided by trimean, and CoV) using the 37-year and the observed energy production illustrate that the OR site generates more variable wind power than the TX site (Table 2). The magnitudes of the variabilities between the 37-year and the actual monthly energy production are also comparable, and the discrepancies between them are larger at the TX site than the OR site. Nonetheless, the predicted and the observed monthly energy production of the two sites demonstrate similar variability characteristics overall.
Moreover, when we apply the four selected methods to the annual-mean data, the metrics describe IAV exactly. For both variables, wind speed and energy generation, nearly all metrics illustrate that the OR site has stronger IAV than the TX site, except for using σ to quantify energy-production IAV (Table 2). Echoing the results of the monthly data mentioned previously, the use of normalized metrics suggests the energy production at the OR site varies more than that at the TX site, intermonthly and interannually. Note that all the IAVs are smaller than the variabilities calculated using monthly data (Table 2), because the annual averaging collapses variations in the data.
Additionally, the magnitudes of energy variabilities and IAVs are also nearly or more than twice as large as those of wind speed (Table 2). The reason is the nature of the power curve: wind-power generation is a function of wind speed cubed at wind speeds below rated. Therefore, small wind-speed variations propagate into large energy-production fluctuations that are discernible in monthly and yearly data.
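The amplification can be made concrete with a small worked example. It assumes an idealized cubic relationship at a single instantaneous wind speed below rated, ignoring cut-in, rated, and cut-out behavior and any averaging over a month, so it illustrates the mechanism rather than reproducing the ratios in Table 2.

```python
def power_ratio(speed_ratio, exponent=3.0):
    """Relative change in power for a relative change in wind speed,
    assuming power is proportional to wind speed ** exponent."""
    return speed_ratio ** exponent


# A +/- 5 % change in wind speed under the idealized cubic relationship.
for change in (-0.05, 0.05):
    ratio = power_ratio(1.0 + change)
    print(f"wind speed {change:+.0%} -> power {ratio - 1.0:+.1%}")
```

For small perturbations the relative change in power is roughly three times the relative change in wind speed, which is the upper end of the amplification one would expect before the flat part of the power curve and temporal averaging reduce it.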

Variability metrics comparisons
Matching the wind-speed and energy variabilities over 37 years at each r-filtered site, RCoV, as a statistically robust and resistant metric, yields the highest Pearson's r (0.86) among the four highlighted methods as well as all the variability metrics evaluated (Fig. 4 and Table B1). A perfect variability measure would link wind-speed and wind-power variations closely together with a correlation of unity, and so RCoV, with the highest Pearson's r, is the best of all. On the one hand, a strong correlation between the wind-speed RCoV and the energy-production RCoV implies that the high wind-speed variability at a wind farm translates to high energy-generation variability, and vice versa (Fig. 4a). For instance, the moderate 37-year wind-speed RCoVs of the OR and TX sites indicate modest fluctuations in energy production between months (Fig. 4a). On the other hand, a nonresistant method, range divided by trimean, leads to a lower r (0.64) and suggests the OR site has variable wind speed and energy production (Fig. 4b). For the other two nonrobust and nonresistant methods, the CoV results in a modest r (0.70) with a similar scatter as the RCoV (Fig. 4c); the σ, not normalized by an average metric, does not relate wind-speed and energy variabilities effectively (Fig. 4d). The positions of the two wind farms relative to the rest of the sites in Fig. 4 illustrate that the TX site experiences average variabilities in wind resource and energy production, whereas the OR site has above-average energy-generation variability. Overall, the four methods lead to different representations of energy variability at the OR site.
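To make the contrast between resistant and nonresistant metrics concrete, the following minimal sketch computes three of the normalized spread metrics on a small hypothetical sample, with and without one extreme value. The function names and the data are illustrative assumptions; the use of the sample standard deviation in CoV and of the "inclusive" quartile definition in the trimean are also assumptions, because several conventions exist.

```python
from statistics import mean, median, quantiles, stdev


def mad(values):
    """Median of the absolute deviations from the median (no scaling constant)."""
    med = median(values)
    return median(abs(v - med) for v in values)


def rcov(values):
    """Robust coefficient of variation: MAD divided by the median."""
    return mad(values) / median(values)


def cov(values):
    """Coefficient of variation: standard deviation divided by the mean.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return stdev(values) / mean(values)


def range_over_trimean(values):
    """Range divided by trimean, with trimean = (Q1 + 2 * median + Q3) / 4.

    Assumption: quartiles follow the 'inclusive' definition.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    trimean = (q1 + 2 * q2 + q3) / 4
    return (max(values) - min(values)) / trimean


# Hypothetical samples: identical except for one extreme value.
clean = [4.0, 5.0, 6.0, 7.0, 8.0]
with_outlier = [4.0, 5.0, 6.0, 7.0, 80.0]

if __name__ == "__main__":
    for name, data in (("clean", clean), ("with outlier", with_outlier)):
        print(name, rcov(data), cov(data), range_over_trimean(data))
```

In this toy case the RCoV is unchanged by the extreme value, whereas CoV and range divided by trimean both increase sharply, which is the sense in which RCoV is resistant.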
By increasing the years included in the variability calculations using monthly data, the resultant correlations of most metrics vary less, the correlations gradually converge to their 37-year values, and their asymptote periods vary. The 37-year Pearson's r values from the four selected metrics between wind-speed and energy-production variabilities in Fig. 4 transform into the 37-year marks in Fig. 5, and we use a 5 % threshold of normalized deviation to determine the asymptote periods. Particularly, the r's from RCoV and CoV (Fig. 5a and c) reach their respective asymptotes steadily with longer length of data, whereas the r's from range divided by trimean do not (Fig. 5b). The 37-year correlation using σ is weak and thus the method is not actually useful: while the r's approach the 37-year benchmark (Fig. 5d), this correlation value is so low (0.2) as to be ineffective. Paired with a high long-term r, the asymptote period of a metric indicates the appropriate time span of wind-speed data required to represent the variability of wind-energy production. For example, the resultant r's using RCoV approach a high value after just 3 years, meaning one needs 3 years of wind-speed data to estimate the wind-speed variability so as to adequately infer the energy-production variability of a certain or potential wind farm via RCoV.
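One possible reading of the 5 % normalized-deviation criterion is sketched below. It is an assumption rather than the exact procedure used here: the sketch takes a single representative r per data length (for example, the median of each box in Fig. 5), treats the last entry as the long-term benchmark, and returns the shortest data length from which every later r stays within the threshold. The function name and the input values are hypothetical.

```python
def asymptote_period(r_by_years, threshold=0.05):
    """Shortest data length (in years) after which r stays near the benchmark.

    r_by_years[k] is a representative correlation using k + 1 years of data;
    the final entry is taken as the long-term benchmark (assumed nonzero).
    A value is 'near' when |r - benchmark| / |benchmark| <= threshold.
    """
    benchmark = r_by_years[-1]
    period = len(r_by_years)
    # Walk backward from the longest record while values stay within threshold.
    for n in range(len(r_by_years), 0, -1):
        deviation = abs(r_by_years[n - 1] - benchmark) / abs(benchmark)
        if deviation <= threshold:
            period = n
        else:
            break
    return period


# Hypothetical correlations for 1 to 6 years of data.
example_r = [0.60, 0.75, 0.83, 0.85, 0.84, 0.86]

if __name__ == "__main__":
    print(asymptote_period(example_r))
```

Requiring that all later values remain within the threshold, rather than only the first one, prevents a short record that happens to match the benchmark by chance from being reported as converged.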
The three correlation coefficients (Pearson's r, Spearman's r_s, and Kendall's τ) yield consistent results among all variability metrics tested; hence we primarily present the results using Pearson's r here. Table 3 summarizes the 37-year correlations (r, r_s, and τ) between the wind-speed variabilities and the energy-production variabilities using the r-filtered data, and the respective asymptote periods of the methods. The r and τ of RCoV are the largest (0.86 and 0.67, respectively) among all variability metrics, and the associated asymptote periods are also relatively short (2 to 3 years) (Table 3). Another normalized, robust, and resistant spread metric, interquartile range (IQR) divided by median, results in the highest r_s, and the r_s of RCoV is the second largest (Table 3). More importantly, the asymptote periods of RCoV are the smallest of all, regardless of the choice of correlation coefficient. In other words, fewer years of data are necessary to calculate RCoV to effectively relate wind-speed and energy variabilities than any other metric. Overall, when a spread metric yields strong correlations between variabilities of wind speed and energy generation, the correlation metrics agree with each other (Table 3). Therefore, the results in this paper focus on Pearson's r, which is a commonly used correlation coefficient.
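For readers who wish to compare the three coefficients on their own data, a minimal standard-library sketch follows. The function names and the five paired values are hypothetical. Kendall's coefficient is implemented in its simplest tau-a form, which is an assumption: it ignores tie corrections, so a tie-aware variant may be preferable for real data.

```python
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


# Hypothetical wind-speed and energy variabilities at five sites.
speed_var = [0.05, 0.07, 0.08, 0.10, 0.12]
energy_var = [0.11, 0.16, 0.15, 0.22, 0.30]

if __name__ == "__main__":
    print(pearson_r(speed_var, energy_var))
    print(spearman_rs(speed_var, energy_var))
    print(kendall_tau_a(speed_var, energy_var))
```

The two rank-based coefficients depend only on the ordering of the sites, so they are less sensitive than Pearson's r to a few sites with extreme variability.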
In addition to the spread metrics, other distribution diagnostics also yield strong correlations between the 37-year monthly wind speed and energy production. For example, kurtosis and skewness result in r and r_s above 0.9. Because we determine the asymptote periods based on normalized deviations, when the 37-year correlation benchmark of a metric is high, the respective asymptote period tends to be shorter. Therefore, only 1 year of monthly data is required to compute kurtosis and skewness adequately, except for using r_s in kurtosis, where the r_s values of the smaller number of years are low (Table 3). Moreover, the symmetry and the shape of the energy-production distribution can be characterized using wind-speed data, given the moderately strong correlations of YKI and the Weibull shape parameter (Table 3).
Additionally, we perform the same correlation and asymptote analyses on the data from changing the R² and r filter thresholds as well as the data with random error, and RCoV again yields the strongest correlations and the shortest asymptote periods among all methods. We adjust the R² and r requirements in the linear-regression process, thus changing the filtered sample sizes. On the one hand, reducing the R² threshold to 0.6 and the r threshold to 0.7 increases the respective sample sizes to 461 and 306 wind farms, but weakens the correlations between wind-speed and energy variabilities for all methods (Table B3). On the other hand, increasing the R² threshold to 0.85 and the r threshold to 0.9 strengthens the wind-speed-energy correlations of all the metrics and shrinks the sample sizes to 212 and 83 wind farms, respectively (Table B3). Modifying the filtering thresholds leads to different r's yet similar asymptote periods among all metrics. Moreover, we also test the robustness of our findings by introducing an error term, randomized based on the standard error, in predicting the 37-year energy production. The error term adds uncertainty to resemble the reality of noisy wind-speed and power-production data. We introduce the error term to the predicted energy production for each of the 349 wind farms that pass the original R² threshold of 0.75. This approach weakens the correlations and lengthens the asymptote periods for most metrics (Table B3). Overall, according to the results from the R² and r threshold tests and the random error tests, RCoV yields the highest r's among all methods, and its asymptote periods remain reasonably short.

Further, normalized and simple spread metrics yield different relative wind-speed variabilities between wind sites. On the one hand, the correlation coefficients between 37-year monthly mean wind-speed RCoV and CoV, two spread metrics that are normalized by average metrics, are nearly unity (Fig. 6a). The comparison between two simple spread metrics, MAD and σ, results in correlation coefficients close to 1 also (Fig. 6d). The relative positions of the OR site highlight the differences between Fig. 6a and d: compared to other wind farms, the OR site has moderate wind-speed RCoV and CoV, but small MAD and σ. Compared to Fig. 6a, the lower r_s and τ in Fig. 6d illustrate that MAD and σ can misrepresent the relative wind-speed variabilities of a wind site. On the other hand, the results between a normalized spread metric (RCoV and CoV) and the respective simple spread metric (MAD and σ), which is also the numerator of the normalized spread metric, lead to weaker correlations (Fig. 6b and c). The r, r_s, and τ between 37-year monthly wind-speed RCoV and σ are 0.684, 0.738, and 0.579, respectively (not shown). The wind sites with slower average wind speeds and thus disproportionately larger normalized spread results cause the deviations from perfect correlations in Fig. 6b and c. Therefore, normalized spread metrics, which account for the differences in wind-speed magnitude, become advantageous over simple spread metrics in comparing variabilities of wind sites. Note that we demonstrate similar comparisons between wind-speed spread metrics via annual-mean data in Fig. A2 (Appendix A).
Meanwhile, using annual-mean data to compute IAVs can lead to misleading interpretations. Scatterplots of the 37-year wind-speed and energy IAVs similar to Fig. 4 are illustrated in Fig. A1, via the same 195 r-filtered sites. The correlations via yearly averages are generally weaker except for a few metrics, including range divided by mean, which yields the largest r of all (Table B4). However, the 37-year correlations do not adequately represent the long-term values (Table B4), so even though the resultant asymptote periods are longer than those using monthly data, the asymptote analysis method is unsuitable for annual data. Moreover, using annual averages greatly limits the sample size at each site even with 37 years of hourly wind-speed data. Statistically, a smaller sample leads to a smaller spread of that distribution. Accordingly, with few years of data, small spreads in annual-mean wind speeds result in a tight cluster of IAVs among all the wind farms. Therefore, the compact collection of wind-speed and energy-production IAVs causes strong correlations, solely because of the small number of annual averages used in the IAV calculation. Thus, the correlations via annual means demonstrate a downward trend with increasing length of data, regardless of the variability metrics chosen (Fig. 7).
Although the correlations approach the 37-year values, the weakening correlations with more years included in the IAV calculations imply that using less data is preferred in connecting the two IAVs.Note that the spread cannot be computed with one data point and hence the correlations between wind-speed IAVs and energy IAVs do not exist with a single year of data (Fig. 7).Overall, the asymptote analysis causes deceptive results, and, given the nature of the annual data, we cannot determine the sufficient length of data to effectively link the IAVs of wind speed and energy production.In other words, relating wind-speed IAV and energy-generation IAV with annual-mean data is flawed.

Wind-speed RCoV calculation and spatial distribution
Now that we have established that RCoV is a powerful and accurate way to relate wind-speed and energy-generation variations, we assess the required amount of data to calculate the RCoV of wind speed. We compute the site-specific RCoVs using different spans of monthly mean wind speeds, including the OR and the TX sites (Fig. 8). The variations of RCoVs decrease as more years are included in the calculations, and for each location we use the 37-year wind-speed RCoV as the long-term benchmark. For example, the 37-year wind-speed RCoV of 0.082 at the OR site means that the median among the absolute deviations from the median is 8.2 % of the median monthly mean wind speed (Fig. 8a and Table 2). We determine the 37-year σ's as 10 % and 5 % of the 37-year RCoV, and we apply the χ² approach at 90 % and 95 % confidence levels, respectively, to derive the convergence years, or the minimum length of wind-speed data required to calculate RCoV effectively. The convergence years of the OR and TX sites are 12 and 25 years with a 90 % confidence, and 20 and 31 years with a 95 % confidence, respectively (Table B5). In other words, for the OR site, one needs 12 years of monthly mean wind speeds to compute RCoV with a 90 % confidence that the resultant RCoV is within a 10 % deviation from the 37-year RCoV.
To quantify the intermonthly variability of wind speed at a wind farm, RCoV requires 10 years of monthly wind-speed records with a 90 % confidence. In general, the σ's of wind-speed RCoVs across the CONUS decrease with more years included in the RCoV calculation (Fig. 9a). For each grid point, the sample size of RCoV also becomes smaller, from 37 RCoVs of 1 year of data to 1 RCoV of 37 years of data, and hence the σ of RCoV decreases as the length of the analysis period of wind speed increases (Fig. 9a). With the σ's of RCoVs across 37 years, we determine the convergence years via the χ² method. For a certain confidence level, the cumulative fraction of the CONUS grid cells that exceed the associated threshold of χ²-derived confidence intervals increases with the length of data (Fig. 9b). Among all of the MERRA-2 grid cells in the CONUS, the median convergence year is 10 years and the associated MAD is 3 years at a 90 % confidence level (Fig. 9b and Table B5). In other words, to assess the wind-speed variability via RCoV with a maximum of 10 % error from the long-term value and a 90 % confidence, one needs 10 ± 3 years of monthly mean wind-speed records.
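The first step of this procedure, computing the RCoV over every block of consecutive years and summarizing the spread of those RCoVs for each data length, can be sketched as follows. The sketch is a simplified illustration under stated assumptions: the synthetic monthly series, the function names, and the use of the population standard deviation are all hypothetical choices, and the χ²-based confidence step is not reproduced here.

```python
import math
import random
from statistics import median, pstdev


def rcov(values):
    """Robust coefficient of variation: MAD divided by the median."""
    med = median(values)
    return median(abs(v - med) for v in values) / med


def window_rcovs(monthly, n_years):
    """RCoV of every block of n_years consecutive years of monthly means."""
    total_years = len(monthly) // 12
    return [
        rcov(monthly[12 * start: 12 * (start + n_years)])
        for start in range(total_years - n_years + 1)
    ]


def rcov_spread_by_length(monthly):
    """Spread of the block RCoVs for each data length (1 year to full record).

    Assumption: population standard deviation, so that a single block
    (the full record) has a spread of exactly zero.
    """
    total_years = len(monthly) // 12
    return {n: pstdev(window_rcovs(monthly, n)) for n in range(1, total_years + 1)}


def synthetic_monthly_speeds(n_years=37, seed=0):
    """Hypothetical monthly means: seasonal cycle plus Gaussian noise (m/s)."""
    rng = random.Random(seed)
    return [
        7.0 + math.cos(2 * math.pi * month / 12) + rng.gauss(0.0, 0.5)
        for month in range(12 * n_years)
    ]


if __name__ == "__main__":
    spread = rcov_spread_by_length(synthetic_monthly_speeds())
    for n in (1, 5, 10, 20, 37):
        print(n, round(spread[n], 4))
```

With a 37-year record, a 1-year window yields 37 RCoVs and a 37-year window yields a single RCoV, which mirrors the shrinking sample size described above.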
Moreover, raising the confidence level extends the minimum length of wind-speed data to compute RCoV. At the 95 % confidence level, the median convergence year is 20 years, and 2.5 % of grid points in the CONUS require more than 37 years of monthly mean data to calculate RCoV (Fig. 9b and Table B5). Additionally, using yearly mean wind speeds instead of monthly data to calculate RCoV requires much longer time to reach convergence. At a 95 % confidence, 33 years of annual-mean data is the average required length, and half of the CONUS grid points have convergence years of more than 37 years (Fig. 9b and Table B5). We also perform the same analysis on CoV and σ of wind speeds (Table B5). Although CoV and σ need fewer years to attain convergence, these nonrobust and nonresistant methods yield worse correlations between wind-speed and energy-production variabilities than RCoV, and hence we focus on demonstrating the RCoV results.

Figure 9. [...] For each year, each box summarizes the σ from each MERRA-2 grid cell in the CONUS; (b) the time series of the cumulative fraction of grid cells in the CONUS that satisfies the threshold: when the pair of the χ²-derived σ's from the grid cell, calculated using the particular amount of data, become smaller than the 37-year σ. The solid black, dashed black, solid orange, and dashed orange lines, respectively, indicate the minimum length of data: when the wind-speed RCoV using monthly mean data yields a 10 % deviation at maximum from the 37-year value at a 90 % confidence level, when the wind-speed RCoV using monthly mean data yields a 5 % deviation at maximum from the 37-year value at a 95 % confidence level, when the wind-speed RCoV using yearly mean data yields a 10 % deviation at maximum from the 37-year value at a 90 % confidence level, and when the wind-speed RCoV using yearly mean data yields a 5 % deviation at maximum from the 37-year value at a 95 % confidence level.

Spatial distributions of wind-speed RCoVs across the CONUS identify locations with reliable wind resources. Based on the site-specific convergence years at a 90 % confidence level (Fig. 10a), we calculate the RCoVs with monthly mean wind speeds of the particular time spans at each grid point and normalize with the CONUS median (Fig. 10b). Regions requiring long wind-speed records are irregularly scattered across the continent, such as the Northeast, the Dakotas, and Texas. The mountainous states generally illustrate high RCoVs, including the Appalachians and the Rockies. Given the strong correlations between the wind-speed RCoV and energy-production RCoV, Fig. 10b offers a realistic estimation of the general spatial pattern of the variability in wind-energy production as well. Note that, qualitatively, Fig. 10b is similar to the maps of wind-speed variability in Fig. 13a of Gunturu and Schlosser (2012) and in Fig. 3 in Hamlington et al. (2015), which also illustrate the variability of wind resources in the CONUS. In addition, using a 10-year fixed length of wind-speed data for all CONUS grid points to compute RCoV results in a nearly identical spatial distribution to the pattern in Fig. 10b.
Further, an ideal location for wind farms should exhibit ample wind speeds with low variability. We combine the spatial variations of the normalized RCoV and the long-term wind resource (Fig. 10b and c), and we differentiate regions according to the CONUS median RCoV and wind speed (Fig. 10d). Favorable candidates for wind farm developments have above-average wind speeds and below-average variabilities, such as the Plains, parts of the upper Midwest, spots in the Columbia River region, and pockets near the coasts of the Carolinas; poor places for wind power with weak winds and strong variabilities include the Appalachians and most of the Northeast.
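The four-way categorization can be expressed as a short sketch that splits locations at the median wind speed and the median RCoV of the whole set. The site names, values, category labels, and the choice to assign values equal to the median to the "weak" or "low" side are all hypothetical assumptions for illustration.

```python
from statistics import median


def classify_sites(sites):
    """Categorize sites relative to the medians of wind speed and RCoV.

    sites maps a name to (long-term mean wind speed, wind-speed RCoV).
    Assumption: values equal to a median fall on the 'weak'/'low' side.
    """
    median_speed = median(speed for speed, _ in sites.values())
    median_rcov = median(value for _, value in sites.values())
    categories = {}
    for name, (speed, value) in sites.items():
        wind = "strong wind" if speed > median_speed else "weak wind"
        spread = "high variability" if value > median_rcov else "low variability"
        categories[name] = f"{wind}, {spread}"
    return categories


# Hypothetical grid cells: (mean wind speed in m/s, RCoV).
example_sites = {
    "A": (8.0, 0.06),
    "B": (7.5, 0.10),
    "C": (5.0, 0.07),
    "D": (4.5, 0.12),
    "E": (6.0, 0.08),
}

if __name__ == "__main__":
    for name, label in classify_sites(example_sites).items():
        print(name, label)
```

In this scheme the favorable category is "strong wind, low variability" and the least favorable is "weak wind, high variability".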
The convergence years in some CONUS grid points are beyond 37 years when we increase the confidence level from 90 % to 95 % (Fig. 9b and Table B5), and those grid points do not demonstrate any geographical pattern as in Fig. 10a. Additionally, when using RCoV to represent IAV, the spatial patterns of required data lengths and the resultant normalized RCoVs for annual data are notably different from the monthly mean results, and geographical features seem to be irrelevant (Fig. A3). Furthermore, the categorical features of CoV resemble those of RCoV for onshore wind resources in the CONUS, whereas using σ results in notably distinct classifications of CONUS wind resources (Figs. 10d and A4).

Discussion
When using statistically robust and resistant variability metrics, higher correlations between variabilities of wind speed and energy production emerge. Statistically robust methods do not assume or require any underlying wind-speed distributions, and statistically resistant methods are insensitive to wind-speed extremes. Of all methods, three robust and resistant metrics, RCoV, MAD divided by trimean, and IQR divided by median, result in the largest three r's in Tables 3 and B1, which suggests that they are the most useful metrics to quantify long-term variability. Depending on the meteorological data availability, wind-speed characteristics, and terrain complexity, different methods are appropriate in different conditions. Nevertheless, robust and resistant methods are best able to relate wind-speed variability and energy-generation variability, and RCoV is the most effective of all the metrics.
Overall, of all the methods we considered, RCoV consistently yields the strongest correlations between wind-speed and energy variabilities and exhibits reasonable asymptote periods (Tables 3 and B1), even after accounting for random standard errors and modifying the R² and r thresholds (Table B3). In addition, assessing wind-speed RCoV with a 90 % confidence requires 10 ± 3 years of wind-speed data (Fig. 9 and Table B5), which exceeds the asymptote periods of 2 to 6 years to yield strong wind-speed and energy-production correlations (Table 3). Even though different locations require various spans of data (Fig. 10a), the average of the resultant RCoVs using 10 years of wind speeds leads to nearly identical spatial distributions (Fig. 10b). Therefore, to effectively quantify wind-speed variability and thus adequately derive energy-generation variability, we recommend using the RCoV with 10 years of monthly mean wind-speed data.
Annual-mean data are inadequate to relate wind-speed and energy-production IAVs or to represent wind-speed IAVs.We cannot determine the minimum years of data to relate annual wind-speed and energy IAVs because their correlations decline with the length of data (Fig. 7).Moreover, the coarse time resolution of annual averages smooths out the fluctuations of smaller timescales.Yearly mean wind speeds also possess different distribution characteristics, such as skewness and kurtosis, compared to those of finer temporal resolutions (Lee et al., 2018).The nonzero kurtosis and skewness in Table 2 and in Lee et al. (2018) illustrate that most of the distributions of annual-mean wind speeds in the CONUS are non-Gaussian.Hence, using nonrobust metrics, such as σ , to evaluate IAV with samples of annual means from non-Gaussian distributions can lead to incorrect representations of variability.
Additionally, extended years of wind-speed data are also necessary to compute RCoV and represent IAV (Fig. A3a), and the resultant IAVs (Fig. A3b) differ from the variabilities calculated via monthly wind speeds (Fig. 10b).For instance, the low IAVs in the Appalachians (Fig. A3b) calculated with yearly mean wind speeds contradict the pattern of high monthly mean wind-speed RCoVs in mountainous areas (Fig. 10b) as well as the findings in past research (Gunturu and Schlosser, 2012;Hamlington et al., 2015).Furthermore, some of the grid points require more than 37 years of yearly mean data to calculate wind-speed RCoV with statistical confidence (Fig. 9 and Table B5).Although RCoV does not yield the strongest 37-year r in relating wind-speed and energy IAVs, readers should be cautious when using a limited number of annual-mean data to derive IAVs.In short, to effectively assess the long-term variability of wind farm productivity, one should use wind speeds finer than yearly mean data.
Regions with ample wind resources and low variability favor wind-energy developments, coinciding with the locations of many existing wind farms in the CONUS (Fig. 10d). Wind farms in the Plains and parts of the upper Midwest benefit from the above-average wind speeds and the below-average wind-speed RCoVs. Other regions, such as parts of the Columbia River region and the Carolinas, also experience strong, consistent winds. The Northeast and the Appalachians are relatively unfavorable for producing a stable, onshore wind-energy supply, whereas the area east of Cape Cod in Massachusetts and the sections along the West Coast exhibit a promising offshore wind resource. Wind farm developers should account for wind resource as well as its long-term variability in repowering existing turbines and building new wind farms.
Furthermore, mathematically, a normalized spread metric, namely a spread statistic divided by an average metric, is more useful than solely a spread metric in assessing variability, and a normalized spread metric should always be presented with the corresponding averaging metric. For example, RCoV and CoV between wind speed and energy yield larger r's than MAD and σ (Table 3 and Fig. A1), and the r's between wind-speed RCoV and CoV are also higher than those comparisons involving MAD and σ (Fig. 6). For σ, the root mean square of the deviation from the mean is not statistically robust or resistant, and 1σ means the uncertainty is 34.1 % from the mean. Hence, CoV, or the σ divided by the mean, is the respective normalized uncertainty metric to σ. For instance, the wind-speed CoVs of both the OR and TX sites are about 0.13 (Table 2), implying the σ is 13 % from the mean. In contrast, RCoV, or the MAD divided by the median, is a robust and outlier-resistant metric of normalized uncertainty. For example, the wind-speed RCoVs of the OR and TX sites are 0.08 and 0.09, respectively (Table 2), indicating the MADs are 8 % and 9 % from their median wind speeds. Even though RCoV is not as commonly used and not as intuitive as σ or CoV, RCoV is unrestricted by any underlying distribution assumptions. Overall, to correctly and effectively use the normalized spread metrics, both the normalized spread metric and the average value need to be stated clearly in pairs. In other words, the interpretation of "the variability is 2 %" oversimplifies the statistics of uncertainty quantification. Therefore, we recommend presenting both the RCoV and the median of a time series together in estimating variability.
Distribution diagnostics, other than the variability metrics, are also effective in identifying the characteristics of wind-energy production. We examine distribution parameters resulting in strong wind-speed-energy correlations, including kurtosis and YKI (Tables 3 and B2), which assess the degree of deviations from a Gaussian distribution. For example, we confirm that the monthly and annual wind-speed distributions for our case studies in OR and TX are not perfectly Gaussian because of their nonzero kurtosis and skewness values (Table 2), as well as their portions of data within 1σ. Moreover, a multimodal or an asymmetric wind-speed distribution (Fig. 3c and d) also implies a non-Gaussian energy-production distribution. Gaussian distribution is invalid for wind speeds across averaging timescales in general (Lee et al., 2018). Hence, understanding the underlying distribution of wind resources can validate the applications and the legitimacy of Gaussian statistics, especially in quantifying P50 and the associated losses and uncertainties.

Conclusions
Wind-speed variability is a crucial component in assessing the overall uncertainty of P50, which is the estimated average energy production of a wind farm.This study highlights the importance of using rigorous methods to estimate intermonthly and interannual variability.To search for suitable ways to quantify this uncertainty under different conditions, we investigate 27 combinations of spread metrics over 607 wind farms in the United States, with closer examination of two geographically distinct sites.We evaluate the methods for robustness to non-Gaussian distributions and resistance to extreme values, in contrast to the common practice of using only standard deviation (σ ).We calculate variabilities using monthly and annual mean wind speeds from the MERRA-2 reanalysis data set and wind farm monthly net energy production from the EIA.We find that within the contiguous United States (CONUS), statistically robust and resistant methods predict variabilities more accurately, particularly in that wind-speed variabilities strongly correlate with observed energy-production variabilities.
We recommend using the robust coefficient of variation (RCoV) to quantify variabilities of wind resource and energy production. RCoV, defined as the median of absolute deviation from the median wind speed divided by the median of the wind speed, is a robust and resistant spread metric, in contrast to σ. RCoV yields strong correlations consistently (a Pearson's correlation coefficient, or a Pearson's r, of 0.856 with 37 years of monthly means) in various sensitivity tests via different correlation coefficients, whereas σ does not. In other words, using RCoV, a wind farm with high wind-speed fluctuations also possesses high variations in wind-energy generation and vice versa, whereas other metrics do not reflect that relationship as effectively. RCoV, as a normalized spread metric, also leads to a more accurate depiction of wind-speed variabilities than σ, a simple spread metric. Contrary to the custom of displaying uncertainty in one percentage value, we advise users to assess both the RCoV and the median in estimating intermonthly variability. Moreover, depending on the location, on average 10 ± 3 years of monthly wind-speed data are necessary to compute wind-speed RCoV with a 90 % statistical confidence, such that the resultant RCoV deviates within 10 % of the long-term RCoV.
RCoV characterizes the spreads of the distributions of wind resources and wind-energy production. The relatively low monthly mean wind-speed RCoVs in the central United States indicate stable long-term wind resources, and the RCoV overall spatial distribution in the CONUS agrees with the findings from past research. Other distribution diagnostics, such as kurtosis and skewness, also result in strong correlations between monthly mean wind speed and energy generation, and thus they adequately represent energy-production characteristics.
Because the long-term correlations between the wind-speed and energy-production interannual variabilities (IAVs) are weak (a Pearson's r of 0.668 for RCoV with 37 years of data) and decrease with the length of data, we cannot determine the minimum length of annual mean data required for skillful assessment of IAV. Hence, we do not recommend calculating IAVs with annual-mean data. Although the concept of IAV has been essential in determining the annual energy production in the wind resource assessment process, annual-mean wind speeds mask signals of finer temporal scales and thus lead to unreliable representations of long-term variability. Overall, uncertainty arises in the process of calculating IAVs based on limited samples, whereas RCoV yields credible intermonthly variabilities considering the adequate amount of monthly mean data.

Now that we have highlighted the preferred structure of using RCoV, we can assess finer-scale variations using high-resolution wind-speed and energy-production data. With data of different temporal scales, the autocorrelation of wind resources and its relationship with long-term energy-production variations can also be quantified. The influence of climatic cycles on energy production can be explored. Furthermore, applying the concept of RCoV to reduce the uncertainty of P50 and assist financial decisions can be beneficial to the industry.

Figure A3. As in Fig. 10a and b, but the data plotted are annual-mean wind speeds: (a) map of the convergence years, or years of wind-speed data required to derive a maximum of 10 % deviation from the 37-year RCoV at each grid point at a 90 % confidence level. Because 12.6 % of the CONUS grid points yield convergence years beyond 37 years using annual data (solid orange line in Fig. 9 and first column in Table B5), we assign 37 years as the convergence years for those grid points. After excluding the non-numeric values, the CONUS median is 27 years and the MAD is 4 years; (b) map of RCoV of annual-mean wind speed using the grid-cell-specific convergence years in (a), normalized using the CONUS RCoV median at 0.020. The RCoVs illustrated are averaged over (37 − convergence year + 1) available year blocks. The MAD of the normalized RCoV in the CONUS is 0.205.

Table B2. Description of the distribution diagnostics tested, adapted from Wilks (2011) and the 37-year r's from the r-filtered monthly data.

Reason I: the metric is not robust because the metric possesses distribution constraints, for example, assuming a Gaussian distribution, and the metric is not resistant because outliers influence it; Reason II: the metric is not robust because it assumes a Weibull distribution.

Table B3. As in Table 3, but with the calculated metrics, the associated correlations, and asymptote periods using different R² and r filters and adding the randomized standard error to predicted monthly total energy production. The sample sizes of the 0.7-r threshold test, the 0.9-r threshold test, and the random error test are 306, 83, and 195

Figure 1. Wind farm locations in the CONUS: nonfiltered 607 sites in dark red, R²-filtered 349 sites in orange, and r-filtered 195 sites in yellow. The yellow square represents the Oregon site and the yellow star indicates the Texas site (Table 2). The grey box illustrates the boundary of the CONUS used in this study.

Figure 2. (a) Histogram of R² of all nonfiltered sites (dark red), R²-filtered sites (orange), and r-filtered sites (yellow); (b) scatterplot of the R² and the horizontal distance between the closest MERRA-2 grid cell and the actual locations of the sites using the same color scheme in (a); (c) scatterplot of the R² and the elevation difference between the closest MERRA-2 grid cell and the actual locations of the wind sites using the same color scheme in (a). The r in (b) and (c) represents the Pearson's r using all nonfiltered sites.

Figure 3. (a) Time series of MERRA-2 monthly mean 80 m wind speed (black), actual monthly net EIA energy production (lime), and extended monthly energy production from 1980 to 2016 based on linear regression (green) at the OR site; (b) time series at the TX site with the same annotations as in (a); (c) histograms of MERRA-2 monthly mean wind-speed distribution (black) and yearly mean wind-speed distribution (grey) at the OR site from 1980 to 2016. The blue curve indicates the Gaussian fit of the monthly mean wind speeds via the mean and the σ, and the cyan curve represents the Gaussian fit of the annual-mean data; (d) histograms and curves of the Gaussian fit of wind-speed distributions at the TX site with the same annotations as in (c).

Figure 4. Scatterplots of 37-year wind-speed variability and energy variability via four metrics: (a) RCoV, (b) range trimean, (c) CoV, and (d) σ, based on monthly data from the 195 r-filtered wind sites. Each black dot represents one filtered site, and the r value at the corner of each panel indicates the Pearson's r between each pair of wind-speed and energy-production spread metrics. The yellow square and the yellow star denote the OR and the TX sites, respectively.
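The spread metrics compared in these panels can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes RCoV is the median absolute deviation (MAD) divided by the median, and CoV is the sample standard deviation divided by the mean. The function names and the sample data are hypothetical.

```python
import statistics


def mad(values):
    """Median absolute deviation about the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def rcov(values):
    """Robust coefficient of variation, assumed here to be MAD / median."""
    return mad(values) / statistics.median(values)


def cov(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.fmean(values)


# Hypothetical monthly mean wind speeds (m/s) with one extreme outlier.
monthly = [6.0, 7.0, 8.0, 9.0, 30.0]
print(rcov(monthly))  # the outlier barely moves the median-based metric
print(cov(monthly))   # the outlier inflates the mean-based metric
```

The outlier illustrates why a resistant metric may be preferable: the median and MAD change little, whereas the mean and σ are pulled toward the extreme value.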

Figure 5. Box plots of Pearson's r between wind-speed variability and energy variability for different analysis time frames, from 1 to 37 years: (a) RCoV, (b) range trimean, (c) CoV, and (d) σ, based on the monthly data from the 195 r-filtered wind sites. Each r represents the correlation using all the filtered sites for a particular time frame. The 37-year correlations are equal to the r values listed in Fig. 4. The box and whiskers represent the third quartile plus 1.5 times the interquartile range (IQR), the third quartile, the median, the first quartile, and the first quartile minus 1.5 times the IQR.
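The five box-and-whisker positions named in this caption can be computed as in the sketch below. It follows the caption literally (whiskers at the quartiles plus or minus 1.5 times the IQR); the quartile interpolation method and the function name `box_stats` are assumptions made for illustration.

```python
import statistics


def box_stats(values):
    """Return the five positions described in the caption."""
    # "inclusive" treats the data as the full population of interest;
    # the interpolation rule is an assumption, not taken from the paper.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "upper_whisker": q3 + 1.5 * iqr,
        "q3": q3,
        "median": med,
        "q1": q1,
        "lower_whisker": q1 - 1.5 * iqr,
    }


# Hypothetical correlations from several time-frame blocks.
print(box_stats([1, 2, 3, 4, 5]))
```

Plotting libraries often clip whiskers to the most extreme data point inside these limits, so a plotted whisker can be shorter than the values returned here.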

Figure 6. Similar to Fig. 4, but for scatterplots to compare 37-year wind-speed variability metrics: (a) RCoV and CoV, (b) RCoV and MAD, (c) σ and CoV, and (d) σ and MAD, based on monthly data from the 195 r-filtered wind sites. Each black dot represents one filtered site, and the r, r_s, and τ at the corner of each panel indicate the Pearson's r, the Spearman's rank correlation coefficient, and the Kendall's rank correlation coefficient between each pair of wind-speed spread metrics. The yellow square and the yellow star denote the OR and the TX sites, respectively.

Figure 9. (a) Box plots of σ's of wind-speed RCoVs, where the RCoVs are calculated using monthly mean MERRA-2 data of 1 to 37 years. For each year, each box summarizes the σ from each MERRA-2 grid cell in the CONUS; (b) the time series of the cumulative fraction of grid cells in the CONUS that satisfy the threshold, which is met when the pair of χ²-derived σ's from the grid cell, calculated using the given amount of data, becomes smaller than the 37-year σ. The solid black, dashed black, solid orange, and dashed orange lines, respectively, indicate the minimum length of data required for the wind-speed RCoV using monthly mean data to deviate by at most 10 % from the 37-year value at a 90 % confidence level; for the wind-speed RCoV using monthly mean data to deviate by at most 5 % from the 37-year value at a 95 % confidence level; for the wind-speed RCoV using yearly mean data to deviate by at most 10 % from the 37-year value at a 90 % confidence level; and for the wind-speed RCoV using yearly mean data to deviate by at most 5 % from the 37-year value at a 95 % confidence level.

Figure 10. (a) Map of the convergence years, or years of monthly mean wind-speed data required for the RCoV to deviate by at most 10 % from the 37-year RCoV at each grid point, at a 90 % confidence level. The CONUS median is 10 years with a MAD of 3 years; (b) map of RCoV of monthly mean wind speed using the grid-cell-specific convergence years in (a), normalized using the CONUS RCoV median at 0.100. The RCoVs illustrated are averaged over (37 − convergence year + 1) available year blocks. The MAD of the normalized RCoV in the CONUS is 0.224; (c) map of the mean monthly wind speed at 80 m over the 37 years from 1980 to 2016. The CONUS median is 6.45 m s⁻¹ with a MAD of 1.03 m s⁻¹; (d) map of wind resource and its variability, summarizing (b) and (c) into four categories: regions with below-median wind speed and above-median RCoV (grey), regions with below-median wind speed and below-median RCoV (orange), regions with above-median wind speed and above-median RCoV (orange red), and regions with above-median wind speed and below-median RCoV (dark red), based on the CONUS median wind speed and RCoV.
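The four-category summary in panel (d) amounts to comparing each grid cell with two domain-wide medians. The sketch below is one possible reading, not the authors' implementation; the function name `classify_cells`, the sample values, and the choice to count a value equal to the median as "below" are all assumptions.

```python
import statistics


def classify_cells(wind_speeds, rcovs):
    """Label each grid cell relative to the domain medians."""
    ws_median = statistics.median(wind_speeds)
    rcov_median = statistics.median(rcovs)
    labels = []
    for ws, rc in zip(wind_speeds, rcovs):
        # Assumption: values exactly at the median count as "below".
        speed = "above" if ws > ws_median else "below"
        spread = "above" if rc > rcov_median else "below"
        labels.append(f"{speed}-median wind speed, {spread}-median RCoV")
    return labels


# Hypothetical values for four grid cells (wind speed in m/s, RCoV unitless).
for label in classify_cells([5.0, 6.0, 7.0, 8.0], [0.12, 0.08, 0.11, 0.09]):
    print(label)
```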

Figure A3. As in Fig. 10a and b, but the data plotted are annual-mean wind speeds: (a) map of the convergence years, or years of wind-speed data required for the RCoV to deviate by at most 10 % from the 37-year RCoV at each grid point at a 90 % confidence level. Because 12.6 % of the CONUS grid points yield convergence years beyond 37 years using annual data (solid orange line in Fig. 9 and first column in Table B5), we assign 37 years as the convergence years for those grid points. After excluding the non-numeric values, the CONUS median is 27 years and the MAD is 4 years; (b) map of RCoV of annual-mean wind speed using the grid-cell-specific convergence years in (a), normalized using the CONUS RCoV median at 0.020. The RCoVs illustrated are averaged over (37 − convergence year + 1) available year blocks. The MAD of the normalized RCoV in the CONUS is 0.205.
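The averaging over (37 − convergence year + 1) year blocks can be illustrated as below. This sketch assumes the blocks are consecutive windows, each one convergence period long and shifted by one year, and that RCoV is the MAD divided by the median; neither detail is confirmed by the caption. The function names and the synthetic series are hypothetical.

```python
import math
import statistics


def rcov(values):
    """Robust coefficient of variation, assumed here to be MAD / median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values]) / med


def block_averaged_rcov(series, convergence_years, samples_per_year=12):
    """Average RCoV over all windows of `convergence_years` length."""
    n_years = len(series) // samples_per_year
    n_blocks = n_years - convergence_years + 1
    if n_blocks < 1:
        raise ValueError("record is shorter than the convergence period")
    width = convergence_years * samples_per_year
    block_rcovs = []
    for first_year in range(n_blocks):
        start = first_year * samples_per_year  # shift the window by one year
        block_rcovs.append(rcov(series[start:start + width]))
    return statistics.fmean(block_rcovs), n_blocks


# Synthetic 37-year monthly series with a purely seasonal cycle (m/s).
series = [7.0 + 1.5 * math.sin(2 * math.pi * m / 12) for m in range(37 * 12)]
mean_rcov, n_blocks = block_averaged_rcov(series, convergence_years=10)
print(n_blocks, mean_rcov)
```

For annual-mean data, `samples_per_year=1` would apply the same windowing to one value per year.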

Figure A4. As in Fig. 10d, but the spread metrics are (a) σ and (b) CoV, calculated using monthly mean wind speeds of 37 years.

Table 2. Site details, monthly means, and annual means of various metrics at the two selected sites based on 37 years of monthly and annual wind speeds, and 37 years of predicted and actual energy production; and the CONUS medians of wind-speed metrics using 37 years of monthly and annual-mean data.

Table 3. Correlations and the associated asymptote periods of wind-speed variability and energy variability using various spread methods and distribution diagnostics with different correlation metrics, based on the monthly data of the 195 r-filtered wind sites.

Table B4. As in Table 3, but with the calculated metrics, the associated correlations, and asymptote periods using annual-mean wind speed and energy production at the 195 r-filtered sites.

Table B5. Convergence years based on the χ² approach of wind-speed RCoV (as in Figs. 8 and 9), wind-speed CoV, and wind-speed σ, using monthly and yearly wind speeds. The calculations of median and MAD exclude the data with convergence years beyond 37 years in the CONUS.