Norwegian hindcast archive (NORA3): A validation of offshore wind resources in the North Sea and Norwegian Sea

A new high-resolution (3 km) numerical mesoscale weather simulation spanning the period 2004-2018 is validated for offshore wind power purposes for the North Sea and Norwegian Sea. The NORwegian hindcast Archive (NORA3) was created by dynamical downscaling, forced with state-of-the-art hourly atmospheric reanalysis as boundary conditions. A validation of the simulated wind climatology has been carried out to determine the ability of NORA3 to act as a tool for planning future offshore wind power installations. Special emphasis is placed on evaluating offshore wind power-related metrics and the impact of simulated wind speed deviations on the estimated wind power and the related variability. The general conclusion of the validation is that the NORA3 data is rather well suited for wind power estimates, but gives slightly conservative estimates of the offshore wind metrics. Wind speeds are typically 5 % (0.5 m s−1) lower than observed wind speeds, giving an underestimation of offshore wind power of 10 %-20 % (equivalent to an underestimation of 3 percentage points in the capacity factor) for a selected turbine type and hub height. The model is biased towards lower wind power estimates because it overestimates the frequency of low-speed wind events (<10 m s−1) and underestimates the frequency of high-speed wind events (>10 m s−1). The hourly wind speed and wind power variability are slightly underestimated in NORA3. However, the number of hours with zero power production (around 12 % of the time) is fairly well captured, while the duration of each of these events is slightly overestimated, leading to 25-year return values for zero-power duration that are too high for four of the six sites. The model is relatively good at capturing spatial co-variability in hourly wind power production among the sites. However, the observed decorrelation length was estimated to be 432 km, whereas the model-based length was 19 % longer.

So far, the NORA3 data set covers the period 2004-2018; it will later be extended back to 1979 and updated continuously. We therefore focus on the period 2004-2018 in this study. For further details on the model set-up, the NORA3 generation process, and a comparison between NORA3 and ERA-5, see Haakenstad et al. (2021).
2.1.1 Offshore wind data sets
Table 1 compares five data sets covering the offshore areas surrounding Norway. NWP models are continuously updated and improved. The first three data sets ("BYR2009", "BYR2010", and "BER2009") are approximately 10 years old, so the newer data sets NEWA and NORA3 benefit from the improvements in NWP models made over the last 8-10 years and provide output of higher quality.
In contrast to the older data sets, NEWA and NORA3 are created using improved reanalysis products as boundary information. At 31 km resolution, ERA-5 is able to resolve smaller-scale atmospheric features than the data set from the National Centers for Environmental Prediction (NCEP) (1° resolution), and the downscaling performed by WRF and HARMONIE-AROME benefits from the increased resolution of the reanalysis. In addition to the improved boundary conditions, the longer time periods covered by NEWA and NORA3 capture more long-term variability and give more robust statistics, which is beneficial when dealing with fluctuating geophysical variables and making decisions about wind power installations.

Not all of the model data sets listed in Table 1 have undergone peer review (see column "Val"). NEWA uses met masts on land to validate the onshore model data, while the offshore data is validated at 10 meters above sea level (m.a.s.l.) using satellite data. In addition, a validation at 100 m.a.s.l. is conducted by extrapolating the 10-m data. Complementing the validation by Haakenstad et al. (2021), the offshore NORA3 data is validated here using in situ observations near a typical hub height (100 m.a.s.l.). Validating near hub height minimizes the uncertainties introduced by power-law extrapolation of the wind from near the surface to hub height.

The observational data
The observations used in the verification of NORA3 are hourly wind observations from five oil and gas platforms (Ekofisk, Sleipner, Gullfaks C, Draugen, and Heidrun) and one met mast (Fino1) (see Fig. 1 for the locations of the sites and Table 2 for further site information), retrieved from the Norwegian Meteorological Institute. The observational data was quality checked prior to the validation of NORA3; for a detailed description of this quality-check process, see Solbrekke et al. (2020). In addition to the routine described by Solbrekke et al. (2020), we exclude all records of zero-wind conditions (u = 0) that are likely to be erroneous according to the following criterion:

$$ u_{obs}(i,j) = 0 \quad \text{and} \quad u_{n3}(i,j) > 5\,\mathrm{MAD}, \qquad \mathrm{MAD} = \frac{1}{nm}\sum_{j=1}^{m}\sum_{i=1}^{n}\left|u_{obs}(i,j) - u_{n3}(i,j)\right|, $$

where u_obs(i, j) and u_n3(i, j) are the observed and modeled wind speeds, respectively, at hour i for site j; n is the total number of hours; m is the total number of sites; and MAD is the mean absolute deviation between the observed and modeled wind speeds averaged over all sites. In other words, whenever the observed wind speed at hour i and site j is zero and the corresponding modeled wind speed exceeds 5 MAD = 7.2 m s−1, the observed value at hour i is excluded from the time series for site j. This additional quality control leads to the exclusion of up to 5 h of observations per site, except at Heidrun, where 58 h of observations are excluded. For Heidrun, the removal of these erroneous records of zero-wind conditions corresponds to an exclusion of approximately 0.035 % of the total data.

Table 1. Information for five offshore wind data sets covering the Norwegian continental shelf and the surrounding offshore areas. "BYR2009" is the data set created by Byrkjedal and Åkervik (2009); "BYR2010" was produced by Byrkjedal et al. (2010); "BER2009" comes from Berge et al. (2009); NEWA is the New European Wind Atlas (NEWA, 2020); and NORA3 was created by Haakenstad et al. (2021). "Model" indicates the model used to create each data set, with the model version given in parentheses. The name of the reanalysis product (model), its horizontal resolution in kilometers (res), and the update interval of the lateral boundary forcing (int) are found under "Boundary". "Period" is the time period covered by the data set. "Temp res" is the number of outputs stored per hour per variable, "Horiz res" is the horizontal resolution, and "Vert res" is the number of vertical layers in the model.
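The zero-wind screening above can be sketched in Python as follows. This is a minimal illustration, assuming the observed and modeled wind speeds are already collocated in arrays of shape (hours, sites) with NaN marking missing observations; the function name `flag_spurious_calms` is hypothetical, and ignoring NaN pairs when computing MAD is an assumption rather than a documented detail of the study.

```python
import numpy as np


def flag_spurious_calms(u_obs, u_n3, factor=5.0):
    """Flag observed calms (u_obs == 0) that are likely erroneous.

    u_obs, u_n3: arrays of shape (n_hours, n_sites) with observed and NORA3
    wind speeds in m/s; NaN marks a missing observation.
    Returns (mask, mad): mask is True where the observation should be removed.
    """
    u_obs = np.asarray(u_obs, dtype=float)
    u_n3 = np.asarray(u_n3, dtype=float)
    # Mean absolute deviation over all hours and sites (NaN pairs ignored)
    mad = np.nanmean(np.abs(u_obs - u_n3))
    # Exclude a calm when the model wind exceeds factor * MAD at the same hour/site
    mask = (u_obs == 0.0) & (u_n3 > factor * mad)
    return mask, mad
```

The flagged records can then be removed with, for example, `u_obs[mask] = np.nan`, which keeps the hourly time axis intact for later ramp-rate and correlation calculations.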

Wind speed interpolation
To avoid introducing additional uncertainties into the observational data set, we verify the wind variables from NORA3 at the wind sensor heights, which range from 68 to 140 m.a.s.l. (see "WSH" in Table 2 for the sensor height at each site). By contrast, the wind power verification is performed at a typical hub height of 100 m.a.s.l. to ensure that the production estimates are comparable between sites.
The interpolation of wind speed data to sensor height or hub height is done using the power-law relation (Emeis, 2018),

$$ u(z) = u(z_1)\left(\frac{z}{z_1}\right)^{\alpha}. $$

The interpolated wind speed is sensitive to the choice of the power-law exponent α. Usually, α is assigned based on assumptions about atmospheric stability and surface roughness, both of which can introduce errors. However, the NORA3 data allows us to calculate α for each time step (i). Rearranging the power-law relation, we get the following expression for the power-law exponent:

$$ \alpha(i) = \frac{\ln\left[u_2(i)/u_1(i)\right]}{\ln\left(z_2/z_1\right)}, $$

where the subscripts 1 and 2 correspond to the two heights between which the wind shear is calculated. The heights used to calculate α depend on the wind sensor height (WSH) at the site in question: if WSH < 100 m.a.s.l., α is calculated from the NORA3 wind shear between the model layers z_1 = 50 m.a.s.l. and z_2 = 100 m.a.s.l.; if WSH > 100 m.a.s.l., α is calculated from the wind shear between z_1 = 100 m.a.s.l. and z_2 = 250 m.a.s.l. The mean α over the whole time period for the six stations ranges from 0.05 to 0.08 between 50 and 100 m.a.s.l., and from 0.03 to 0.06 between 100 and 250 m.a.s.l.
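The per-time-step shear exponent and the power-law interpolation can be sketched as below. This is a minimal sketch, assuming hourly NORA3 speeds at the two model layers are available as arrays; the helper names are hypothetical, and assigning a sensor exactly at 100 m to the upper layer pair is an assumption, since the text only specifies WSH < 100 m and WSH > 100 m.

```python
import numpy as np


def shear_exponent(u1, u2, z1, z2):
    """alpha(i) = ln(u2/u1) / ln(z2/z1) for speeds u1, u2 at heights z1 < z2.

    Hours with zero speed at either level give inf/NaN and should be masked.
    """
    u1 = np.asarray(u1, dtype=float)
    u2 = np.asarray(u2, dtype=float)
    return np.log(u2 / u1) / np.log(z2 / z1)


def layer_for_height(wsh):
    """Model layers (z1, z2) in m used for alpha, given the sensor height.

    Assumption: WSH == 100 m is treated like WSH > 100 m.
    """
    return (50.0, 100.0) if wsh < 100.0 else (100.0, 250.0)


def power_law(u_ref, z_ref, z_target, alpha):
    """u(z_target) = u_ref * (z_target / z_ref) ** alpha, element-wise."""
    u_ref = np.asarray(u_ref, dtype=float)
    return u_ref * (z_target / z_ref) ** np.asarray(alpha, dtype=float)
```

For a 68 m sensor, for example, one would compute `alpha = shear_exponent(u50, u100, 50.0, 100.0)` and then `power_law(u50, 50.0, 68.0, alpha)`, where `u50` and `u100` are hypothetical arrays of model wind speed at 50 and 100 m.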

Normalized wind power
To keep our validation results as general as possible, and since the turbine park at each site is only hypothetical and of unknown capacity, we use normalized power, P_w(i) = P_w^T(i)/P_w^max, to validate the wind power potential at each site (Solbrekke et al., 2020). Here, P_w^T(i) is the wind power produced at time step i at a given site, and P_w^max is the nameplate capacity. Hence, the normalized wind power P_w(i) is defined as follows:

$$ P_w(i) = \begin{cases} 0, & u(i) < u_{ci} \\ \dfrac{u(i)^3 - u_{ci}^3}{u_r^3 - u_{ci}^3}, & u_{ci} \le u(i) < u_r \\ 1, & u_r \le u(i) < u_{co} \\ 0, & u(i) \ge u_{co} \end{cases} $$

where u(i) is the wind speed at hour i, u_ci = 4 m s−1 is the cut-in wind speed, u_r = 13 m s−1 is the rated wind speed, and u_co = 25 m s−1 is the cut-out wind speed. These values were taken from the SWT-6.0-154 turbines used in Hywind Scotland, the first floating wind park in the world (AG, 2011).
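A vectorized version of the piecewise power curve is sketched below. It assumes the cubic segment takes the form (u³ − u_ci³)/(u_r³ − u_ci³) shown above, so that power rises continuously from 0 at cut-in to 1 at rated speed; the function name `normalized_power` is hypothetical, and missing wind speeds are passed through as NaN rather than being treated as zero production.

```python
import numpy as np

# Cut-in, rated and cut-out speeds (m/s) for the SWT-6.0-154 quoted in the text
U_CI, U_R, U_CO = 4.0, 13.0, 25.0


def normalized_power(u, u_ci=U_CI, u_r=U_R, u_co=U_CO):
    """Normalized power P_w in [0, 1] for an array of hub-height speeds u (m/s)."""
    u = np.asarray(u, dtype=float)
    cubic = (u**3 - u_ci**3) / (u_r**3 - u_ci**3)
    # np.select picks the first condition that is True for each element
    p = np.select(
        [u < u_ci, u < u_r, u < u_co],
        [0.0, cubic, 1.0],
        default=0.0,  # u >= u_co: turbine is shut down (cut-out)
    )
    # NaN comparisons are False and would fall through to 0; keep them missing
    return np.where(np.isnan(u), np.nan, p)
```

Because the conditions are evaluated in order, only upper bounds are needed: any speed that is not below cut-in but below rated must lie in the cubic segment.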

Ramp rates
To validate the ability of NORA3 to capture the wind speed and wind power variability, we calculate the ramp rate (R), defined as how much the wind speed (u) or wind power (P_w) changes during a time increment τ (Milan et al., 2014):

$$ R(i) = x(i+\tau) - x(i), \qquad x \in \{u, P_w\}. $$

Table 2. Relevant information for the sites used in the validation of NORA3. "Abb" lists the site-name abbreviations. "Lat" and "Lon" are the latitude and longitude of the site locations, respectively. "WSH" (in meters above sea level) is the wind sensor height at each site. The sensor type is listed under "Sensor", and the period of available observations for each site is listed under "Data period". The percentage of valid observations is shown under "Valid obs (%)".

Setting τ = 1 h, we validate the model performance on hourly ramp rates. To obtain a general picture of how much the wind speed or wind power changes from one hour to the next, we calculate the mean absolute ramp rate (MAR) for each site, for both the observational data and the modeled data. MAR is defined as follows:

$$ \mathrm{MAR} = \frac{1}{n}\sum_{i=1}^{n}\left|R(i)\right|, $$

where R(i) is the ramp rate at hour i, and n is the total number of hours.
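The ramp rate and MAR definitions translate directly into array differences. The sketch below assumes a gap-free hourly time axis in which missing hours are stored as NaN; averaging only over valid ramps (instead of dividing by the total number of hours n) is an assumption made so that gaps do not bias MAR low. The function names are hypothetical.

```python
import numpy as np


def ramp_rates(x, tau=1):
    """R(i) = x(i + tau) - x(i) for an hourly series x (wind speed or P_w)."""
    if tau < 1:
        raise ValueError("tau must be a positive number of time steps")
    x = np.asarray(x, dtype=float)
    return x[tau:] - x[:-tau]


def mean_absolute_ramp_rate(x, tau=1):
    """MAR = mean |R(i)|, skipping ramps that involve a missing (NaN) hour."""
    return np.nanmean(np.abs(ramp_rates(x, tau)))
```

Applying `mean_absolute_ramp_rate` to the observed and modeled series of the same site, and taking the relative difference, gives the kind of variability comparison reported for wind speed and wind power.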

Zero-event duration using extreme value theory
A wind turbine has an expected lifetime of approximately 20 years. If the right steps are taken, the lifetime can be extended by 15 %-25 %, depending on whether the structure is bottom-fixed or floating (Wiser et al., 2016), that is, to about 23-25 years. Therefore, determining the duration of long-lasting shutdowns expected to happen during the lifetime of a turbine is important for estimating the levelized cost of energy (LCOE). The 25-year return value of the duration of a zero-event (a period of zero wind power production), the corresponding confidence interval, and the p-values are calculated from the observations and the model data using two statistical methods: "block maxima" (BM), in which yearly maxima of zero-event duration are fitted to a generalized extreme value (GEV) distribution, and "peak over threshold" (POT), in which zero-event durations above a threshold are fitted to a generalized Pareto distribution (for more information see Smith (2002)), using the 99th percentile of zero-event duration (i.e. the longest 1 % of zero-events) as the threshold. We calculate the Kolmogorov-Smirnov p-value (KS_p) to test the null hypothesis that the empirical data is drawn from the fitted theoretical distribution. The Kolmogorov-Smirnov statistic quantifies the distance between the empirical and the theoretical cumulative distribution functions; hence, the cumulative distribution function of the BM data (POT data) is compared with that of the fitted GEV (Pareto) distribution. Given a significance level of p = 0.025, a small KS_p value (KS_p < p) means that the distance between the cumulative distributions is too large, so we reject the null hypothesis and conclude, at the 1 − p confidence level, that the empirical data (BM or POT) was not sampled from the theoretical GEV or Pareto distribution.
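A POT estimate of the 25-year return value, together with the KS check, might look like the sketch below. It uses SciPy's `genpareto` and `kstest` and assumes the zero-event durations have already been extracted and pooled over the record; the function name `pot_return_value`, the exceedance-rate conversion, and the absence of confidence intervals are simplifications for illustration, not the study's exact procedure.

```python
import numpy as np
from scipy import stats


def pot_return_value(durations, n_years, return_period=25.0, q=0.99):
    """Peak-over-threshold return value of zero-event duration (hours).

    durations: all zero-event durations in the record (hours).
    n_years:   record length in years, used to turn counts into a yearly rate.
    Returns (return_value, ks_p, threshold).
    """
    d = np.asarray(durations, dtype=float)
    threshold = np.quantile(d, q)            # 99th percentile by default
    excess = d[d > threshold] - threshold    # exceedances above the threshold
    # Generalized Pareto fit to the excesses, with the location fixed at zero
    shape, _, scale = stats.genpareto.fit(excess, floc=0)
    # KS test of H0: excesses come from the fitted GPD (approximate p-value,
    # because the parameters were estimated from the same data)
    ks_p = stats.kstest(excess, "genpareto", args=(shape, 0, scale)).pvalue
    rate = excess.size / n_years             # exceedances per year
    if rate * return_period <= 1.0:
        raise ValueError("too few exceedances for this return period")
    # Duration exceeded on average once per return_period years
    prob = 1.0 - 1.0 / (rate * return_period)
    value = threshold + stats.genpareto.ppf(prob, shape, loc=0, scale=scale)
    return value, ks_p, threshold
```

The BM variant can be built the same way by fitting yearly maxima with `scipy.stats.genextreme`; note that SciPy's GEV shape parameter `c` has the opposite sign of the ξ convention common in the extreme-value literature.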

Validation of wind speed
The NORA3 wind estimates are extensively validated against observations and compared with the ERA-5 reanalysis in Haakenstad et al. (2021). For offshore observations (six oil platforms, many of them the same as those used in this study), the NORA3 wind estimates were shown to be better than those from ERA-5 for all months and for all investigated percentiles of wind speed. Monthly wind speed biases were typically reduced from 6 %-8 % to 3 %-5 %. The improvement was particularly pronounced for strong winds, where the bias was reduced from 10 %-20 % to 2 %-4 %, while for median winds the bias was typically reduced from 7 %-8 % to 3 %-4 %. In addition, the improvements for coastal winds influenced by topography were shown to be significantly larger than for the offshore stations.

Before NORA3 can be exploited as a planning tool for future offshore wind power installations, the model performance in terms of wind power-related variables also has to be validated and verified. We start with the validation of mean quantities and distributions of wind speed. The most relevant wind speed measures are given in Table 3. The arithmetic mean (µ) is used as a measure of the average wind speed. The corresponding wind speed variability is given by the Weibull standard deviation (σ). The Weibull standard deviation is used instead of the Gaussian standard deviation because of the skewed shape of the wind speed distribution (see Fig. 2a for an example wind speed distribution). Mean wind speeds (µ) for the six sites lie within the interval 10-12 m s−1. At all sites except Fino1, the observed mean wind speeds are higher than the wind speeds from NORA3, indicating that the model underestimates the mean wind speed. The largest difference is found at Sleipner, where the observed mean wind speed is 8.9 % higher than the simulated wind speed. The wind speed at each site is highly variable, with the Weibull standard deviation (σ) of the observations ranging from 4.7 to 6.5 m s−1, while the modeled wind speed is slightly less variable (0 %-20 % lower σ). Hence, the observed wind speed is somewhat more intermittent and variable than the modeled wind speed, indicating that HARMONIE-AROME is missing some of the high-frequency variability embedded in the wind field.
The Weibull scale parameter ("a" in Table 3) controls the height and width of the distribution: a larger scale parameter indicates a wider and lower probability distribution. At all but one site, the observed scale parameter is slightly higher than the modeled one; on average, the modeled scale parameters are 3.74 % lower than the observed. In other words, the observations contain more wind speed events in the tails of the Weibull distribution, resulting in a larger scale parameter.
As all observed and modeled Weibull shape parameters ("b" in Table 3) are less than 2.6 (well below b ≈ 3.6, where the skewness of the Weibull distribution changes sign), the distributions are positively skewed, with a long tail to the right of the mean. The observed shape parameter is lower than the modeled one (on average 7.6 % lower), indicating that the observed data is more positively skewed with a longer right tail; that is, the observed data contains more high wind speed events than the NORA3 wind speed data.
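The Weibull standard deviation follows from the scale and shape parameters through the gamma function. The helper below is a minimal sketch with a hypothetical name, assuming a two-parameter Weibull distribution (location fixed at zero); in practice a and b could be estimated from a wind speed series with `scipy.stats.weibull_min.fit(u, floc=0)`, which returns the tuple (b, loc, a).

```python
import math


def weibull_mean_std(a, b):
    """Mean and standard deviation of a two-parameter Weibull distribution.

    a: scale parameter (m/s); b: shape parameter (dimensionless).
    mean = a * G(1 + 1/b), std = a * sqrt(G(1 + 2/b) - G(1 + 1/b)**2)
    """
    g1 = math.gamma(1.0 + 1.0 / b)
    g2 = math.gamma(1.0 + 2.0 / b)
    return a * g1, a * math.sqrt(g2 - g1**2)
```

The formula makes the parameter dependence explicit: σ scales linearly with a, and for fixed a a smaller shape parameter b (a longer right tail) gives a larger σ.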

According to Table 3, the model underestimates the wind speed at five of the six sites. Since the wind power production is a function of the wind speed cubed, the wind power is highly sensitive to systematic deviations between the observed and simulated wind speeds. However, the sensitivity varies with wind speed and is especially strong within the interval between the cut-in and rated wind speeds. Fig. 2b-h shows the differences between the observed and modeled wind speed probability density functions (∆pdf = pdf_mod − pdf_obs) for the six sites, in addition to the wind speed distribution for Ekofisk (Fig. 2a). The main finding is that the model underestimates the number of events with high wind speed and overestimates the number of events with low wind speed at all sites except Fino1. The model produces too few high-wind events and too many low-wind events compared with the observations, and the transition occurs near the typical rated wind speed (11-13 m s−1) of state-of-the-art wind turbines (the widest gray area in Fig. 2, panels b-h). This model bias will have a large impact on the difference between the observed and modeled wind power.

Figure 2. Differences between NORA3 and observational wind speed probability density functions (∆pdf = pdf_mod − pdf_obs) for the six sites. When ∆pdf = 0.01, the probability that the given wind speed will occur is 1 % higher in the model output. The large gray area corresponds to the range within which the rated wind speed usually falls. The gray vertical lines at the left and right mark the cut-in and cut-out wind speed limits used in this study, respectively.

https://doi.org/10.5194/wes-2021-22 Preprint. Discussion started: 1 April 2021 © Author(s) 2021. CC BY 4.0 License.

Table 3. Statistical measures of the wind speed for the observations (Obs) and the model (n3). µ is the arithmetic mean, while σ is the Weibull standard deviation. "a" and "b" are the Weibull scale and shape parameters, respectively. The wind speed validation is performed at the sensor height to avoid uncertainties related to power-law extrapolation (see Table 2 for information on heights).
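The ∆pdf curves can be reproduced by histogramming both series on common bins. This sketch assumes 1 m s−1 bins up to 35 m s−1 and normalizes each histogram to probability per bin, which matches the caption's reading of ∆pdf = 0.01 as a 1 % higher probability; the bin choice and the function name `delta_pdf` are assumptions, and speeds above the last edge are dropped.

```python
import numpy as np


def _probability_per_bin(u, edges):
    """Histogram of the finite values of u, normalized to sum to one."""
    u = np.asarray(u, dtype=float)
    counts, _ = np.histogram(u[np.isfinite(u)], bins=edges)
    return counts / counts.sum()


def delta_pdf(u_mod, u_obs, bin_width=1.0, u_max=35.0):
    """Bin centres and pdf_mod - pdf_obs on common wind speed bins (m/s)."""
    edges = np.arange(0.0, u_max + bin_width, bin_width)
    centres = 0.5 * (edges[:-1] + edges[1:])
    diff = _probability_per_bin(u_mod, edges) - _probability_per_bin(u_obs, edges)
    return centres, diff
```

Positive values mark speeds that the model produces too often; with the biases described above, these would cluster below the rated speed and turn negative above it.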

Wind speed ramp rates
The wind speed ramp rate (in m s−1) is a measure of the hourly variability in the data set; that is, it quantifies how much the wind speed changes during 1 h. Figure 3 shows the distributions of observed and modeled hourly wind speed ramp rates for Ekofisk (the other sites have similar distributions). The distribution is wider for the observations than for the modeled data, illustrating that the observed wind speed changes more from one hour to the next than the modeled wind speed does.
The mean absolute ramp rate (MAR) of the observed wind speed (u) is shown in Table 4. The observed MAR is typically around 1 m s−1, and the difference between the modeled and observed ramp rates (Table 4) indicates that the model underestimates the variability in hourly wind speed by 30 %-36 %.
3.2 Far offshore to coastal wind speed gradient
An important feature of a model wind data set is the ability to properly estimate the horizontal wind speed gradient from far offshore to coastal areas. The available observational data offers limited possibilities to investigate this. However, we made use of data from a met mast on the coastal island of Frøya (see Fig. 1) to present some indicative results. Using wind speed data at sensor height for the three sites Heidrun (far offshore), Draugen (near coastal), and Frøya (coastal), as depicted in Fig. 1, we find no clear bias in the model (see Table 5): NORA3 underestimates the local far-offshore to near-coastal wind speed gradient, but slightly overestimates the near-coastal to coastal gradient.

Wind direction
Another important factor when planning a wind park using modeled data is the quality of the modeled wind direction. State-of-the-art wind turbines can yaw to face the incoming wind. Mapping the wind direction climatology is important for the wind park configuration and the internal positioning of the turbines. Wind-rose plots (see Appendix A, Fig. A1) demonstrate that the modeled and observed data generally show the same wind direction distributions, with only small differences, except at Fino1. Fino1 is excluded from the verification of wind direction because its wind rose shows a clear directional disturbance, as the wind is affected by the observation mast (see Fig. A1, lowermost row). Fig. 4 shows the differences (%) between the modeled and observed numbers of wind direction events (30° intervals) for four wind speed categories (u < u_ci, u_ci ≤ u < u_r, u_r ≤ u < u_co, and u_co ≤ u). No systematic bias in wind direction can be seen across the sites, and the frequency biases are less than 5 % for all directional intervals and all sites. The wind speed category with the greatest difference between the model and the observations is u ≥ u_co, whereas the category with the smallest difference is the "too low" wind events (u < u_ci). Hence, the model is better at capturing the wind direction when the wind speed is low.
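The frequency differences per direction sector and wind speed category can be tabulated as sketched below. Several details are assumptions: sectors start at 0° (0-30°, 30-60°, ...), rather than being centred on north, frequencies are expressed as a percentage of all valid hours at the site, and the function name is hypothetical.

```python
import numpy as np


def direction_frequency_difference(dir_mod, u_mod, dir_obs, u_obs,
                                   speed_edges=(4.0, 13.0, 25.0), sector=30.0):
    """Model minus observed frequency (%) per speed category and direction sector.

    Categories: u < u_ci, u_ci <= u < u_r, u_r <= u < u_co, u >= u_co.
    Returns an array of shape (4, 360 / sector).
    """
    n_sec = int(round(360.0 / sector))

    def frequency(d, u):
        d = np.asarray(d, dtype=float)
        u = np.asarray(u, dtype=float)
        ok = np.isfinite(d) & np.isfinite(u)
        sec = (np.mod(d[ok], 360.0) // sector).astype(int)  # sector index
        cat = np.digitize(u[ok], speed_edges)                # category 0..3
        counts = np.zeros((len(speed_edges) + 1, n_sec))
        np.add.at(counts, (cat, sec), 1)                     # 2-D tally
        return 100.0 * counts / ok.sum()

    return frequency(dir_mod, u_mod) - frequency(dir_obs, u_obs)
```

`np.digitize` with increasing edges returns the index i such that edges[i-1] ≤ u < edges[i], which reproduces the half-open categories used in the text.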
Sleipner is the site with the greatest difference between model and observations for almost all wind direction intervals. The mismatch between the observed and modeled wind direction events at Sleipner is probably tied to the model performance. However, we cannot rule out that the platform design at Sleipner affects the flow field more than the design of the other platforms.

Uncertainties in observed wind speed
Working with observational data and numerical weather prediction models involves dealing with data that contains uncertainties and errors of known or unknown character. The majority of the observational sites used in this study (five of six sites) are oil and gas platforms. These platforms are large structures that may influence the incoming flow. On the other hand, an observational mast may also influence the flow when the incoming wind passes through the mast before reaching the sensor.
To what extent these large offshore structures influence the ambient flow field is unclear (Berge et al., 2009; Vasilyev et al., 2015; Furevik and Haakenstad, 2012). Nevertheless, using observations from oil and gas platforms enables us to validate […] Table 6.

The median (q50) and interquartile range (IQR) do not rely on an assumed data distribution and are therefore good representations of the average wind power production and the intermittency, respectively. All wind power estimates are calculated at a hub height of 100 m.a.s.l. using the wind extrapolation given in Section 2.3 and the power curve given in Section 2.4.
Both the observation-based and model-based median wind power production estimates reveal very good wind power potential at the six sites (Table 6). Nevertheless, because the model underestimates the frequency of wind speeds exceeding the rated wind speed and overestimates the frequency of low wind speeds, the modeled total power production is slightly underestimated. Accordingly, the observation-based estimates of the median hourly power production q50 span from 0.3 to 0.5 (i.e. the median power production in a given hour would typically be 30 %-50 % of installed capacity), compared with 0.3-0.4 for the model-based estimates. IQR, a measure of the variability, is the range between the first and third quartiles (q75 − q25). Since the normalized wind power P_w ranges from 0 to 1, IQR values close to 1 correspond to high variability, since almost the entire data […]

Figure 5 shows the distributions of observation-based and model-based hourly normalized wind power ramp rates for Ekofisk (the other sites have similar distributions). As for the hourly wind speed ramp rates, the distributions of hourly wind power ramp rates are wider for the observation-based estimates than for the model-based estimates, illustrating that the hourly wind power variability estimated from observations is greater than that estimated from NORA3 data. Based on observations, the MARs indicate a typical hour-to-hour variability of 7 %-9 % of the installed capacity (Table 7). In contrast, the variability in the model-based estimates is 5 %-6 %, i.e. it is underestimated at all sites.

Table 6. Statistical measures of the observation-based (Obs) and model-based (n3) wind power production. q50 is the hourly median production, IQR is the interquartile range of the hourly production, and CF is the wind power capacity factor. The wind power measures and estimates are computed for all sites at a typical hub height of 100 m.a.s.l., using the interpolation of observed wind speeds outlined in Section 2.3 and the power curve given in Section 2.4.

In addition to capturing short-term variations in wind speed and estimated power production, a model data set must also contain the correct long-term variations. In this section we evaluate NORA3's ability to capture the longer-term climatic variability of the wind power potential at a given site. The inter-annual and seasonal variations in the capacity factor (CF) provide a good indication of how NORA3 performs in terms of long-term wind power fluctuations. Fig. 6a and b illustrate the inter-annual and seasonal CF, respectively, from the observation-based estimates. In addition, the CF deviations (∆CF) between the model-based and observation-based estimates are illustrated in Fig. 6c and d.
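The production statistics in Table 6 and the grouped capacity factors in Fig. 6 can be computed from an hourly P_w series as sketched here. The sketch uses the standard definition of CF as the mean normalized power over valid hours; the function names and the grouping by an arbitrary label array (calendar year or month) are illustrative assumptions.

```python
import numpy as np


def power_statistics(pw):
    """Median (q50), interquartile range (IQR) and capacity factor (CF) of P_w."""
    pw = np.asarray(pw, dtype=float)
    pw = pw[np.isfinite(pw)]
    q25, q50, q75 = np.percentile(pw, [25, 50, 75])
    # CF = produced energy / (capacity * hours) = mean of normalized power
    return {"q50": q50, "IQR": q75 - q25, "CF": pw.mean()}


def capacity_factor_by_group(pw, group):
    """CF for each group label, e.g. calendar year or month of each hour."""
    pw = np.asarray(pw, dtype=float)
    group = np.asarray(group)
    ok = np.isfinite(pw)
    return {g: pw[ok & (group == g)].mean() for g in np.unique(group[ok])}
```

∆CF for a given year or month is then the model-based value minus the observation-based value for the same group label.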

Wind power ramp rates
The observed year-to-year variation in CF is substantial, varying by up to 0.12 (12 % of installed capacity) from one year to the […] The model's underestimation of CF can also be seen in the seasonal CF values. Fig. 6d shows that ∆CF < 0 at all sites except Fino1. The underestimation of the seasonal CF values is largest during the summer months (May-September), meaning that the relative importance of the summer months for wind power production will be slightly underestimated in the model-based estimates.

Spatial wind power co-variability
Many studies have shown that interconnecting wind power production sites mitigates wind power intermittency (Kempton et al., 2010; Reichenberg et al., 2014; St. Martin et al., 2015; Reichenberg et al., 2017; Solbrekke et al., 2020). Therefore, modeling tools used in decisions about future wind power installations should be able to represent the spatial and temporal co-variability between wind power sites.

A specific year was excluded from the plot if more than one-half of the data for that year was missing.

Figure 7 illustrates the ability of NORA3 to capture the spatial co-variability in estimated hourly wind power production between the six sites. The figure shows how the correlation between two sites changes as a function of their separation distance, for both the observation-based estimates (blue) and the model-based estimates (red). For almost all separation distances, the model overestimates the correlation between two connected sites. The overestimation is generally small, but is greatest for small separation distances. This result indicates that NORA3 is better at capturing large-scale spatial variability than variability on smaller scales.
A general description of the dependency between correlation and separation distance provides information on the de-correlation length for the sites used in this study. Using the station-pair correlations, we identify a best-fitting exponential curve and a de-correlation length L (in kilometers). Connecting sites separated by a distance greater than the de-correlation length ensures that the collective wind power intermittency of the two sites is substantially reduced compared with the intermittency of either site alone. We use the e-folding distance as a measure of the offshore de-correlation length L. The exponential curves and the corresponding de-correlation lengths for both the observations and NORA3 are presented in Fig. 7.

[…] and NORA3 data (n3, red). An exponential fit (a e^{bx}) is also shown for both data sets, with the corresponding de-correlation lengths, L.

[…] independent hourly power production, a greater interconnection distance is needed than that indicated by the observation-based estimates.
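The station-pair analysis can be sketched as follows: compute pairwise correlations of hourly P_w, great-circle separation distances, and an exponential fit r(x) = a e^{bx}, whose e-folding distance is L = −1/b. The log-linear least-squares fit used here is an assumption (the fitting method is not stated), it requires positive correlations, and it weights points differently from a nonlinear fit; all function names are hypothetical.

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance (km) between points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    h = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2.0 * radius_km * np.arcsin(np.sqrt(h))


def pair_correlation(p1, p2):
    """Pearson correlation of two hourly P_w series over hours valid in both."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    ok = np.isfinite(p1) & np.isfinite(p2)
    return np.corrcoef(p1[ok], p2[ok])[0, 1]


def decorrelation_length(distance_km, correlation):
    """Fit r = a * exp(b * x) via ln r = ln a + b x; return (a, b, L = -1/b)."""
    x = np.asarray(distance_km, dtype=float)
    r = np.asarray(correlation, dtype=float)
    ok = r > 0  # the log-linear fit is only defined for positive correlations
    b, ln_a = np.polyfit(x[ok], np.log(r[ok]), 1)  # slope, intercept
    return np.exp(ln_a), b, -1.0 / b
```

At x = L the fitted correlation has dropped to a/e, which is exactly the e-folding criterion used for the de-correlation length.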

Zero wind power events
Knowing the risk, duration, and frequency of zero-events (periods of zero wind power production) is important for decision-making and for turbine maintenance planning, as these measures influence the levelized cost of energy and hence the decision-making process (Cory and Schwabe, 2009). A zero-event is caused by a wind speed that is too low (u < u_ci) or too high (u ≥ u_co); these events depend to some extent on the technical specifications of the wind turbine, but also, and more significantly, on the ambient wind climate in the area of interest. Table 8 shows the percentages of all hourly wind speed values that fall into each wind power category (u < u_ci, u_ci ≤ u < u_r, u_r ≤ u < u_co, and u_co ≤ u) for each site. In addition, the table lists the total risk of zero wind power production (P_w = 0). The percentage of hours when the wind is too weak to produce wind energy (u < u_ci) ranges from 8 % to 14 % in the observation-based estimates and is overestimated by the model […] produce wind power and its underestimation of the number of hours with winds that are too strong results in a well-captured total number of hours of zero wind power production, which differs from the observed value by 0.8 percentage points.
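The category percentages in Table 8 amount to binning hourly wind speeds at the three power-curve thresholds. The sketch below follows the text's definition of zero production (u < u_ci or u ≥ u_co), uses the turbine values quoted in Section 2.4 as defaults, and ignores missing hours; the function name and dictionary labels are hypothetical.

```python
import numpy as np


def wind_power_categories(u, u_ci=4.0, u_r=13.0, u_co=25.0):
    """Percentage of valid hours in each power-curve category, plus P_w = 0 hours."""
    u = np.asarray(u, dtype=float)
    u = u[np.isfinite(u)]
    # 0: u < u_ci, 1: u_ci <= u < u_r, 2: u_r <= u < u_co, 3: u >= u_co
    cat = np.digitize(u, [u_ci, u_r, u_co])
    pct = 100.0 * np.bincount(cat, minlength=4) / u.size
    labels = ["u<u_ci", "u_ci<=u<u_r", "u_r<=u<u_co", "u>=u_co"]
    out = dict(zip(labels, pct))
    out["P_w=0"] = pct[0] + pct[3]  # too weak plus too strong
    return out
```

Computing this for the observed and modeled series of a site shows directly how an overestimate of weak-wind hours and an underestimate of strong-wind hours can largely cancel in the P_w = 0 total.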

The atmospheric conditions causing winds that are too weak for wind power production are very different from those causing winds that are too strong. Therefore, we split the zero-events accordingly. Figs. 8 and 9 illustrate the ability of NORA3 to capture the observation-based estimates of zero-events of different durations. Fig. 8a shows the observation-based numbers of zero-events of varying duration caused by too weak winds. As expected, the number of zero-events decreases as the duration of the events increases, ranging from around 90-150 events per year lasting less than 3 h at most sites to close to zero events per year lasting longer than 2 days. Figure 8b shows the relative differences (in percent) between the NORA3-based and observation-based numbers of zero-events by duration. The model-based estimates typically have 40 %-50 % too few zero-events of short duration (1-3 h) compared with the observations. For longer zero-events, the model is biased towards too many events.
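Zero-event durations split by cause can be extracted as run lengths of consecutive hours below cut-in or at or above cut-out. This sketch assumes a gap-free hourly series with NaN for missing hours and treats a missing hour as ending an event, which is an assumption rather than a documented choice; the function names are hypothetical. The resulting durations could then be counted per duration class with `np.histogram` and divided by the record length in years.

```python
import numpy as np


def run_lengths(mask):
    """Lengths (in hours) of consecutive True runs in a boolean hourly series."""
    # Pad with False so every run has a rising and a falling edge
    m = np.concatenate(([False], np.asarray(mask, dtype=bool), [False])).astype(int)
    edges = np.diff(m)
    starts = np.flatnonzero(edges == 1)   # first hour of each run
    ends = np.flatnonzero(edges == -1)    # one past the last hour of each run
    return ends - starts


def zero_events_by_cause(u, u_ci=4.0, u_co=25.0):
    """Durations of zero-events caused by too weak and too strong winds.

    NaN hours compare as False, so a missing hour terminates an event.
    """
    u = np.asarray(u, dtype=float)
    weak = run_lengths(u < u_ci)
    strong = run_lengths(u >= u_co)
    return weak, strong
```

Separating the two causes before counting keeps the rare strong-wind shutdowns from being swamped by the far more frequent weak-wind events.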
The model's underestimation of short zero-events caused by too low wind speeds and its overestimation of longer zero-events result from the model having lower variability than the observations, as seen in the ramp-rate analysis (see Section 4.1). Because of this lower variability, zero-events in the model tend to be of longer duration when they occur, but they occur less frequently.
From Fig. 9a it is evident that the yearly average number of zero-events caused by too strong winds is a factor of ten lower than the number of zero-events caused by too weak winds. Hence, roughly one zero-event caused by too strong winds occurs for every 10 zero-events caused by too weak winds. The model underestimates the number of zero-events caused by too strong winds at all sites (Fig. 9b); depending on the zero-event duration, NORA3 typically has 40 %-70 % too few such zero-events.

Table 8. The percentages of observed wind speeds (Obs) and modeled wind speeds (n3) that fall into the following four categories: 1) the wind speed is less than the cut-in limit (u < u_ci), 2) the wind speed interval in which the wind power is a function of the cube of the wind speed (u_ci ≤ u < u_r), 3) wind power production is rated (u_r ≤ u < u_co), and 4) the wind speed exceeds the cut-out limit (u_co ≤ u). In addition, the total number of hours of zero wind power production (P_w = 0) divided by the total number of observations is shown as a percentage.

Expected maximum zero-event duration over the turbine lifetime
In this section we attempt to validate the model's ability to provide reliable estimates of extremely long zero-events. This is done by estimating the 25-year return value of zero-event duration (the duration expected to be reached or exceeded on average once in a 25-year period) using the method outlined in Section 2.6. Based on the Kolmogorov-Smirnov test, we cannot reject the hypothesis that the BM data and POT data are drawn from a GEV distribution and a Pareto distribution, respectively. It is therefore reasonable to fit the observation-based and model-based extreme zero-event durations to these distributions and derive the 25-year return value of zero-event duration. Figure 10 displays the results from fitting the Pareto distribution to the POT data (the results of fitting the BM data to the GEV distribution are similar). Based on the observed data, the longest zero-event expected to occur at least once during the lifetime of a turbine typically lasts on the order of 40-60 h, but a zero-event of more than five days cannot be ruled out. The uncertainty in the estimates makes it difficult to judge which sites have the shortest and longest maximum zero-event duration.
Using the model data, the estimates are typically longer than the observation-based estimates (although the difference is not significant at the 2.5 % significance level for five of the six sites), which is in line with the lower variability in the modeled hourly wind speed and wind power seen in the ramp-rate analysis (see Sections 3.1 and 4.1). In conclusion, using NORA3 to estimate extreme zero-event duration would lead to conservative return values, as the duration may be overestimated due to the lower variability in the model.
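For readers unfamiliar with peaks-over-threshold return values, the sketch below evaluates the standard T-year return level of a generalized Pareto model once its parameters are known. It assumes the threshold, scale, shape, and mean number of exceedances per year have already been estimated (for example by maximum likelihood); the exact fitting procedure of Section 2.6 is not reproduced here, and the parameter values in the example are made up for illustration, not fitted to NORA3 or the observations.

```python
import math


def pot_return_level(threshold: float, scale: float, shape: float,
                     events_per_year: float, return_period_years: float) -> float:
    """T-year return level of a generalized Pareto (POT) model.

    Solves events_per_year * T * P(X > x | X > threshold) = 1 for x, i.e. the
    duration exceeded on average once per return period.
    """
    m = events_per_year * return_period_years  # expected number of exceedances in T years
    if abs(shape) < 1e-12:
        # Exponential limit of the generalized Pareto distribution (shape -> 0).
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)


# Illustrative, made-up parameters (hours); not fitted to NORA3 or the observations.
x25 = pot_return_level(threshold=20.0, scale=8.0, shape=0.1,
                       events_per_year=3.0, return_period_years=25.0)
print(f"25-year zero-event duration: {x25:.1f} h")
```

A positive shape parameter gives a heavier tail and hence a longer 25-year duration, which is one way a single extreme event can push the return value beyond five days.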

Summary
NORA3, a high-resolution (3-km) numerical mesoscale weather simulation data set from the Norwegian Meteorological Institute for the period 2004-2018, has been validated for offshore wind power purposes using observations of wind speed and direction from six offshore sites along the Norwegian continental shelf. Validation of mean quantities, hourly distributions, and variability has been carried out for wind speed and wind power. In addition, wind power estimates based on a selected hub height and power curve have been made using both the observed and the modeled wind speeds. Comparisons between modeled and observed data have been made for estimated wind power capacity factors; temporal variability; spatial co-variability between production sites; frequency, duration and total number of hours of wind power zero-events; and the maximum length of a zero-event expected to happen at least once during the turbine lifetime. The general picture is that the NORA3 data is rather well suited for wind power estimates in the absence of in situ data. There is a tendency for the modeled data to give slightly conservative estimates of the wind resources.
In five of the six offshore sites, NORA3 seems to be biased towards lower mean wind speeds (u_obs = 10.59 ms−1, u_n3 = 10.04 ms−1). The differences in wind speed distribution between the observations and the model output reveal that the model underestimates the number of events with wind speeds exceeding the rated wind speed and overestimates the number of events with wind speeds below the rated wind speed (see Fig. 2). The transition between over- and underestimation by the model occurs near a typical rated wind speed (11-13 ms−1). As the model underestimates the wind episodes above the rated wind speed, this partly counteracts the model's overestimation of low wind speeds, making the total modeled power production slightly underestimated.
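To see how a modest wind speed bias translates into a larger power bias, the sketch below applies an idealized normalized power curve (zero below cut-in and above cut-out, cubic between cut-in and rated, and 1 at rated) and computes the capacity factor as the mean normalized power. The thresholds U_CI, U_R, U_CO and the exact cubic form are hypothetical assumptions; the paper's selected turbine and hub height are not reproduced here.

```python
import statistics

# Hypothetical turbine thresholds in ms-1; not the turbine selected in the paper.
U_CI, U_R, U_CO = 3.0, 12.0, 25.0


def normalized_power(u: float) -> float:
    """Idealized power curve as a fraction of installed capacity (0 to 1)."""
    if u < U_CI or u >= U_CO:
        return 0.0                                   # below cut-in or above cut-out
    if u >= U_R:
        return 1.0                                   # rated production
    return (u**3 - U_CI**3) / (U_R**3 - U_CI**3)     # cubic regime between cut-in and rated


def capacity_factor(speeds: list[float]) -> float:
    """Mean normalized power over all hours."""
    return statistics.fmean([normalized_power(u) for u in speeds])


def median_power(speeds: list[float]) -> float:
    """Median normalized power over all hours."""
    return statistics.median([normalized_power(u) for u in speeds])


obs = [4.0, 6.0, 8.0, 10.0, 14.0]
mod = [0.95 * u for u in obs]   # a uniform 5 % wind speed deficit
print(capacity_factor(obs), capacity_factor(mod))
```

Because power grows roughly with the cube of the wind speed in the sub-rated regime, the relative drop in capacity factor for the scaled series is larger than the 5 % drop in wind speed, while hours above rated speed are unaffected.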
NORA3 is also biased towards less variable wind speeds on hourly timescales. Analysis of hourly wind speed ramp rates shows that the observed hour-to-hour variability is typically slightly above 1 ms−1, while the model-based ramp rates are slightly below 1 ms−1, resulting in an underestimation of wind speed ramp rates on the order of 30 % (see Table 4).
Generally, estimates of wind power from NORA3 are biased towards too low median values (P_w,obs = 0.43, P_w,n3 = 0.37) and capacity factors (CF_obs = 0.50, CF_n3 = 0.47). The negative bias is a consistent feature seen in all years and all months for five of the six sites.
The wind power ramp-rate analysis shows that the hourly wind power variability of the model-based estimates is too low. The observation-based wind speed variability leads to a corresponding wind power ramp rate that is typically 0.08 (8 % of installed capacity), while the model-based ramp rate is typically 0.05.
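A ramp rate of this kind can be sketched as the mean absolute change between consecutive hours, applied either to wind speed (ms−1) or to normalized wind power (fraction of installed capacity). This is an assumed definition for illustration; the statistic actually used in Sections 3.1 and 4.1 (for example a median or a percentile instead of the mean) may differ. Missing hours are marked with None and are skipped rather than bridged.

```python
from typing import Optional, Sequence


def mean_ramp_rate(series: Sequence[Optional[float]]) -> float:
    """Mean absolute hour-to-hour change.

    Pairs of consecutive hours are skipped when either value is missing (None),
    so gaps in the record do not create artificial ramps.
    """
    diffs = [abs(b - a) for a, b in zip(series, series[1:])
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("need at least two consecutive valid hours")
    return sum(diffs) / len(diffs)


def relative_bias(model: float, obs: float) -> float:
    """Relative model deviation, e.g. -0.3 means 30 % too low."""
    return (model - obs) / obs


print(mean_ramp_rate([10.0, 11.0, 9.0, None, 12.0, 12.0]))
```

With the typical wind power ramp rates quoted above, relative_bias(0.05, 0.08) evaluates to about -0.375.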
By interconnecting pairs of sites, we demonstrate that the spatial co-variability in estimated hourly wind power production between sites is slightly higher for the modeled data than for the observational data. Consequently, the decorrelation length is estimated to be 19 % longer in the model-based estimates.
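One common way to turn pairwise correlations into a decorrelation length is to assume the correlation decays exponentially with separation, r(d) = exp(-d/L), and fit L. The sketch below does this with a least-squares fit of -ln r against d through the origin. The exponential model, the fitting choice, and the function names are assumptions for illustration; the functional form used in the paper is not stated in this section.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long hourly series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def decorrelation_length(distances_km: Sequence[float],
                         correlations: Sequence[float]) -> float:
    """Fit r(d) = exp(-d / L) by least squares on -ln r = d / L (line through the origin).

    Non-positive correlations are dropped because their logarithm is undefined.
    """
    pairs = [(d, -math.log(r)) for d, r in zip(distances_km, correlations) if r > 0]
    k = sum(d * y for d, y in pairs) / sum(d * d for d, _ in pairs)  # k = 1 / L
    return 1.0 / k
```

In practice, pearson would be applied to the hourly power series of every site pair, and the resulting correlations paired with the site separations before calling decorrelation_length.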
The estimation of the occurrence and duration of zero-events shows that the total risk of hourly zero-events is fairly well captured (n3 = 12.22 %, obs = 11.39 % of the time). We split the zero-events into episodes of no wind power production caused by either too low (u < uci) or too high (u ≥ uco) wind speeds. For zero-events caused by winds that are too strong, NORA3 underestimates the occurrence of zero-events for all durations. For winds that are too weak, NORA3 underestimates the number of short zero-events (1-3 h) but is biased towards an excess of zero-events with longer duration. As a result, when a zero-event occurs in the model it tends to be of longer duration, but the frequency of zero-events is too low. This deviation from the observation-based zero-events is in line with the lower variability in hourly wind speeds seen in the ramp-rate analysis (Sections 3.1 and 4.1).
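Counting zero-events by cause and duration amounts to a run-length encoding of the hourly series. The sketch below treats a zero-event as a run of consecutive hours sharing the same zero-power cause, so a too-weak run directly followed by a too-strong run counts as two events. The cut-in and cut-out speeds are hypothetical, and the paper's treatment of data gaps is not reproduced.

```python
from collections import Counter
from itertools import groupby
from typing import Optional

# Hypothetical cut-in and cut-out speeds in ms-1.
U_CI, U_CO = 3.0, 25.0


def zero_cause(u: float) -> Optional[str]:
    """Why an hour has zero power, or None if the turbine produces."""
    if u < U_CI:
        return "too_weak"
    if u >= U_CO:
        return "too_strong"
    return None


def zero_event_counts(speeds: list[float]) -> Counter:
    """Count zero-events keyed by (cause, duration in hours).

    A zero-event is a run of consecutive hours with the same zero-power cause.
    """
    counts: Counter = Counter()
    for cause, run in groupby(speeds, key=zero_cause):
        if cause is not None:
            counts[(cause, sum(1 for _ in run))] += 1
    return counts


print(zero_event_counts([5.0, 2.0, 1.0, 5.0, 26.0, 27.0, 28.0, 5.0, 2.0]))
```

Summing duration times count over all keys gives the total number of zero-power hours, so the same output yields both the share of zero-power hours and the duration distribution.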
In the extreme-value analysis we found that at least once during the lifetime of a turbine (25 years) a zero-power event is expected to last for 1 to 3 days, depending on the site in question (see Fig. 10). However, a zero-event lasting longer than 5 days cannot be ruled out for some sites. Overall, the model is somewhat conservative, with a tendency towards longer maximum zero-event duration than seen in the observation-based return values.

To a large degree the NORA3 wind speeds and directions resemble the characteristics seen in observations. This results in rather good estimates of wind power and wind power variability. However, the model slightly underestimates the wind power potential, and the hourly variability in the model output is lower than in the observations. These characteristics should be kept in mind when using the NORA3 data set as an offshore wind power planning tool.
Author contributions. IMS and AS conceptualized the overarching research goals of the study, in addition to conducting formal analysis regarding statistical and mathematical techniques and methods. HH is the creator of the NORA3 data set and is responsible for the data resources.
IMS was responsible for the data curation and creation of software code, validation, visualization of data, and preparation of the original draft, with contributions from AS. In addition, AS was responsible for the supervision. All the authors contributed to the review and editing process of the paper.
Competing interests. No competing interests to declare.

Figure B1. The wind speed difference (ms−1) between the model and the observations (mod-obs) for the six sites. Dark green bars correspond to platform sites (Ekofisk, Sleipner, Gullfaks C, Draugen and Heidrun) and light green to the met masts (Fino1 and Frøya).
To investigate a potential flow distortion effect in the observations caused by offshore platforms, we sort the modeled wind speed data into categories according to the four wind power regimes: simulated wind speeds lower than the cut-in wind speed (u < uci), simulated wind speeds between the cut-in and rated wind speed (uci ≤ u < ur), simulated wind speeds corresponding to nameplate capacity (ur ≤ u < uco), and simulated wind speeds exceeding the cut-out limit (uco ≤ u). We then extract the observations for each hour that falls into each of the four categories and determine the mean wind speeds for each category and each site. The results are shown in Fig. B1. For the platform sites, the observed wind speeds are greater than the modeled wind speeds in all four categories, with the largest difference seen in the rated wind regime (ur ≤ u < uco).
The finding that the observed wind speeds are greater than the modeled wind speeds in all four categories suggests that a speed-up effect may be present for both weak and strong winds. For the two met masts, at Fino1 and Frøya, the modeled wind speed is weaker than the observed wind speed for the first three categories, but for the "too strong" wind regime (uco ≤ u) the modeled wind speed is greater than the observed wind speed.
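The conditional averaging behind Fig. B1 can be sketched as follows: each hour is assigned to a regime based on the modeled wind speed, and the mean model-minus-observation difference is computed within each regime. The thresholds and function names are hypothetical placeholders; only the grouping logic and the mod-obs sign convention follow the description above.

```python
from collections import defaultdict
from typing import Sequence

# Hypothetical power-curve thresholds in ms-1.
U_CI, U_R, U_CO = 3.0, 12.0, 25.0


def regime(u: float) -> str:
    """Wind power regime of a (modeled) wind speed."""
    if u < U_CI:
        return "below_cut_in"
    if u < U_R:
        return "cubic"
    if u < U_CO:
        return "rated"
    return "above_cut_out"


def mean_difference_by_model_regime(model: Sequence[float],
                                    obs: Sequence[float]) -> dict[str, float]:
    """Mean (model - obs) wind speed, grouping hours by the regime of the MODELED speed."""
    sums = defaultdict(lambda: [0.0, 0])
    for m, o in zip(model, obs):
        acc = sums[regime(m)]
        acc[0] += m - o   # accumulated mod-obs difference
        acc[1] += 1       # number of hours in this regime
    return {k: total / n for k, (total, n) in sums.items()}


print(mean_difference_by_model_regime([2.0, 5.0, 13.0, 26.0], [2.5, 6.0, 14.0, 25.0]))
```

Grouping by the modeled rather than the observed speed keeps the category assignment independent of any flow distortion in the platform measurements, so a systematic negative difference in every regime points towards a measurement speed-up.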
The five platforms have different layouts and therefore influence the oncoming flow field differently. We therefore calculate the difference in mean wind speed (n3-obs) for each 30° wind direction interval (see Fig. B2) to investigate a potential flow distortion caused by the platforms. For almost all wind direction intervals the observed mean wind speeds are higher than