Interactive comment on “Minute-scale power forecast of offshore wind turbines using single-Doppler long-range lidar measurements”

This is a very well written and thorough study on the use of a single Doppler lidar for minute-scale wind and power forecasts. The authors took a very systematic approach, leveraging previous work. They did an especially good job of discussing and considering the possible causes of their results, particularly where the results were not favorable (stable conditions). I only have a few minor questions and comments.


Introduction
With the increasing penetration of renewable energies in the power system, the demand for very short-term power forecasts is continuously rising. Transmission System Operators (TSOs) need to ensure grid stability by balancing supply and demand of power at all times. In this regard, very short-term forecasts are an important tool to support power system management and reduce curtailment costs (Liang et al., 2016). Further, minute-scale forecasts hold significant value for energy market applications (Cali, 2011), especially with gate closure times nowadays being as short as only 5 minutes, for example in Germany, Belgium and France (EPEXSPOT, 2020). Also, the provision of ancillary services, e. g. the supply of reserve power by wind farms (50Hertz et al., 2016), would benefit from improved very short-term forecasts. Probabilistic forecasts additionally provide uncertainty information and are thus especially useful to support decision-making processes (Dowell and Pinson, 2016).
While for forecast horizons of several hours or days, typically physical models such as numerical weather prediction (NWP) models are used, on shorter time scales, i. e. lead times ranging from minutes to several hours, statistical models are applied (Giebel et al., 2011). For lead times of few to several hours, this includes mainly time series models, Kalman filters and Model Output Statistics (MOS) (Sweeney et al., 2019). The simplest statistical model for even shorter lead times is persistence, which by distinguishing between different atmospheric conditions. To address their performance we compare lidar-based forecasts against the benchmark persistence.

Planar long-range lidar measurements
For the purpose of forecasting, typically horizontal Plan Position Indicator (PPI) lidar scans, i. e. with an elevation angle of ϕ = 0°, are used. Hereby, the lidar device can be placed either on the nacelle or transition piece (TP) of a wind turbine or a nearby platform. The aim is to cover an area upstream of the wind farm, preferably in the main wind direction. Scan parameters, i. e. averaging time and azimuthal resolution, are chosen to maximise the measurement distance while keeping the scanning time as short as possible. Scan orientations need to be adjusted according to the wind direction. For each measurement, typically the line-of-sight (LOS) velocity, carrier-to-noise-ratio (CNR) as well as azimuth angle, range gate, and time information are available.
For the case study presented in Section 4 of this paper, such a typical set-up was used. Without loss of generality of the methodology introduced in Section 3, we are describing the main parameters of this lidar campaign to provide a realistic example. Lidar scans were performed at the offshore wind farm Global Tech I (GT I) located in the German North Sea from August 2018 until February 2020 with a Leosphere Windcube 200S (Serial no. WLS200S-024) lidar system positioned on the transition piece (TP) of the westerly located turbine T2 as depicted in Figure 1. The lidar was placed at a height of about 24.6 m above mean sea level. Scans were performed with an azimuthal resolution of 2°, an averaging time of 2 s per measurement, a pulse length of 400 ns and range gates ranging from 500 m to 8000 m with 35 m spacing. The lidar scan spanned a sector of 150°, thus the duration of one scan was T_tot = 156 s, i. e. measuring time T_ϑ = 150 s plus a measurement reset time of approximately T_r = 6 s. One of four different scan orientations (Figure 1 (b)) was chosen manually according to the wind direction. A more detailed analysis of the lidar data will follow in Section 4.1.
Besides horizontal PPI lidar scans, high elevation scans with ϕ = 13.57°, measuring the inflow of turbine T2, were performed. These scans used an azimuthal resolution of 1° and an averaging time of 0.2 s per measurement. At hub height, measurements were performed with a distance to the rotor larger than 2.4D and therefore outside of the induction zone, as recommended by the International Electrotechnical Commission's (IEC) standard for power curve measurements (IEC, 2017). A total azimuth range of 180°, varying from 134° to 313°, was spanned, which means it took about 36 s to perform one scan and approximately 8 s to reset the measurement. Mean wind speeds and wind directions with an averaging period of 44 s at hub height were determined by applying a Velocity Azimuth Display (VAD, see Section 3.1) algorithm to each scan. Only situations with wind directions ranging from 180° to 270° were considered for further analysis. The 44 s-mean wind speeds were used to construct a probabilistic power curve in Section 3.5.
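The scan durations quoted above follow directly from sector width, azimuthal resolution, averaging time and reset time. A minimal sketch of that arithmetic, where the function name `scan_duration` is a hypothetical helper introduced only for illustration:

```python
def scan_duration(sector_deg, resolution_deg, averaging_s, reset_s):
    """Return (measuring time, total scan time) in seconds for one scan.

    Assumes one measurement per azimuth step and a fixed reset time.
    """
    n_measurements = sector_deg / resolution_deg
    measuring_time = n_measurements * averaging_s
    return measuring_time, measuring_time + reset_s


# Horizontal PPI scan: 150 deg sector, 2 deg resolution, 2 s averaging, 6 s reset
print(scan_duration(150, 2, 2.0, 6.0))
# High elevation scan: 180 deg sector, 1 deg resolution, 0.2 s averaging, 8 s reset
print(scan_duration(180, 1, 0.2, 8.0))
```

With the values from the text this gives T_ϑ = 150 s and T_tot = 156 s for the horizontal scan, and roughly 36 s plus 8 s for the high elevation scan, matching the 44 s averaging period.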

Methodology
Figure 2 gives an overview of the proposed lidar-based forecast methodology. First, a wind field reconstruction algorithm was applied to retrieve horizontal wind field information from line-of-sight measurements of the angular scans (Section 3.1). To keep as much data from far ranges as possible, a dynamic data filtering approach was used. The low scanning speed required a time synchronisation within each lidar scan, which was realised by means of a propagation algorithm (Section 3.2). Subsequently, an advection technique was applied to determine a wind speed forecast (Section 3.3). The wind speed forecast was defined by selecting wind vectors arriving within a predefined area of influence (AoI). This set of wind vectors formed the basis of the probabilistic wind speed and power forecast. In the next step, wind vectors were extrapolated from measuring height to hub height (Section 3.4). Finally, hub height wind speeds were translated into a probabilistic power forecast utilising a probabilistic power curve (Section 3.5).
Figure 2. Schematics of the lidar-based forecast methodology. Line-of-sight wind speed measurements u_LOS,n measured within a time interval [t_n, t_n+1 − T_r] were filtered and a wind field reconstruction was performed. Using two consecutive lidar scans, the horizontal wind speeds u_h,n were then synchronised at time t_syn. A propagation technique was applied to propagate wind vectors to t_n+1 − T_r + k with end time of the scan t_n+1, measurement reset time T_r and lead time k. Wind speed forecasts u_meas were further extrapolated to hub height and transferred to power forecasts by means of a probabilistic power curve.

When performing lidar measurements, several causes such as meteorological conditions, hard targets and device limitations can lead to invalid measurements. Typically, the carrier-to-noise-ratio (CNR) is used as an indicator for the backscattered signal's quality. Low CNR values hereby indicate low data quality and are commonly neglected by means of threshold filters (Aitken et al., 2012). However, when applying a CNR-threshold filtering approach, a significant amount of valid data, especially from far distances, is being excluded (Valldecabres et al., 2018b). As long measurement distances are most important for this work, we combined a CNR-threshold filter and a dynamic filtering approach. All measurements with CNR > 0 dB or CNR < −30 dB were neglected, measurements with −26.5 dB < CNR < −5 dB were always considered valid and remaining values were filtered using the dynamic density filter developed by Beck and Kühn (2017). Here, CNR as well as line-of-sight (LOS) wind speed measurements were first normalised and sorted in a 2D plane before a 2D Gaussian function with standard deviations σ_CNR and σ_LOS and mean values µ_CNR and µ_LOS was fitted to the normalised values. Finally, those values positioned outside of an ellipse defined by the semi-axes 2.75 σ_CNR, 2.75 σ_LOS and the centre position µ_CNR and µ_LOS were discarded.
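The three-tier filtering logic described above can be sketched as follows. This is a simplified illustration, not the implementation of Beck and Kühn (2017): instead of fitting a 2D Gaussian to normalised values, it assumes that plain sample means and standard deviations of the non-rejected measurements define the ellipse. All function names are hypothetical.

```python
import statistics


def classify_cnr(cnr_db):
    """Return 'reject', 'accept' or 'check' for one CNR value in dB."""
    if cnr_db > 0.0 or cnr_db < -30.0:
        return "reject"
    if -26.5 < cnr_db < -5.0:
        return "accept"
    return "check"


def inside_ellipse(cnr, u_los, mu_cnr, sigma_cnr, mu_los, sigma_los, k=2.75):
    """True if the point lies inside the ellipse with semi-axes k * sigma."""
    d = ((cnr - mu_cnr) / (k * sigma_cnr)) ** 2
    d += ((u_los - mu_los) / (k * sigma_los)) ** 2
    return d <= 1.0


def filter_scan(cnr_values, los_values):
    """Return one boolean per measurement, True meaning 'keep'."""
    labels = [classify_cnr(c) for c in cnr_values]
    kept = [(c, u) for c, u, lab in zip(cnr_values, los_values, labels)
            if lab != "reject"]
    if len(kept) < 2:
        return [lab == "accept" for lab in labels]
    # Assumption: sample statistics stand in for the fitted 2D Gaussian.
    mu_c = statistics.fmean(c for c, _ in kept)
    mu_u = statistics.fmean(u for _, u in kept)
    sd_c = statistics.pstdev([c for c, _ in kept])
    sd_u = statistics.pstdev([u for _, u in kept])
    valid = []
    for c, u, lab in zip(cnr_values, los_values, labels):
        if lab == "accept":
            valid.append(True)
        elif lab == "reject" or sd_c == 0.0 or sd_u == 0.0:
            valid.append(False)
        else:
            valid.append(inside_ellipse(c, u, mu_c, sd_c, mu_u, sd_u))
    return valid
```

The point of the middle tier is that measurements with a CNR between −30 dB and −26.5 dB, typical of far range gates, are not discarded outright but judged by how well they fit the bulk of the scan.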
After filtering, the global wind direction was determined by performing a VAD fit individually for each range gate in a certain scan. To do so, homogeneity across range gates was assumed and the vertical wind speed component neglected (Werner, 2005). Range gates with less than 15 valid lidar measurements were discarded. A one-dimensional wind speed projection on the prevailing wind direction of the range gate r was performed using .
Values with where ϑ denotes the azimuth angle of the lidar's scanner and χ the wind direction, were neglected as they show large error values due to the almost perpendicular orientation of wind direction and azimuth angle. We will refer to those as critical angles or critical region in the following. Apart from that, remaining outliers with values deviating more than 2.75 standard deviations σ from the mean wind speed of the scan were neglected (Felder et al., 2018). Only scans with an overall data availability of at least 80 % were considered for the forecast. For further analysis, the results were interpolated onto a Cartesian grid with 25 m spacing. Figure 3 shows an example of a reconstructed wind field.
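A VAD fit for one range gate and the subsequent projection can be sketched as below. The sketch assumes a flat scan with negligible vertical wind, so that u_LOS = u_h cos(ϑ − χ), and a sign convention in which χ is the azimuth of maximum line-of-sight speed. The critical-angle threshold `min_abs_cos` is a placeholder, not the value used in the study, and both function names are hypothetical.

```python
import math


def vad_fit(azimuths_deg, u_los, min_valid=15):
    """Least-squares fit of u_LOS = a*cos(az) + b*sin(az) for one range gate.

    Returns (u_h, chi_deg), or None if fewer than min_valid samples exist.
    """
    if len(u_los) < min_valid:
        return None
    scc = scs = sss = scu = ssu = 0.0
    for az, u in zip(azimuths_deg, u_los):
        c = math.cos(math.radians(az))
        s = math.sin(math.radians(az))
        scc += c * c
        scs += c * s
        sss += s * s
        scu += c * u
        ssu += s * u
    det = scc * sss - scs * scs
    a = (scu * sss - scs * ssu) / det  # a = u_h * cos(chi)
    b = (scc * ssu - scs * scu) / det  # b = u_h * sin(chi)
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0


def project(u_los, azimuth_deg, chi_deg, min_abs_cos=0.3):
    """Project one LOS value onto the fitted direction.

    Returns None inside the critical region, where the beam is almost
    perpendicular to the wind. min_abs_cos is an assumed placeholder.
    """
    cos_term = math.cos(math.radians(azimuth_deg - chi_deg))
    if abs(cos_term) < min_abs_cos:
        return None
    return u_los / cos_term
```

Dividing by a cosine close to zero amplifies measurement noise, which is why values near the perpendicular orientation have to be discarded.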

Time synchronisation of lidar scans
When the time shift within a lidar scan is larger than the averaging time (1 min) of the forecasted values, one cannot assume the scan to be quasi-instantaneous, which is commonly done when considering wind speed averages from lidar scans. Several approaches to account for the time shift within the scan have been tested, all aiming to synchronise the scan in time before applying the propagation methodology (Section 3.3). We found the most accurate results applying a time synchronisation developed by Beck and Kühn (2019), which is visualised in Figure 4. Here, lidar scans were propagated by means of a semi-Lagrangian advection technique. Propagated scans were generated with a temporal resolution of ∆T. Each propagation was a combination of a forward- as well as a backward-propagated scan, weighted according to a trigonometric function following the suggestion of Beck and Kühn (2019). The weighting was dependent on the time passed since the initialisation of the original scan. Hereby, backward propagations were only taken into account after one-fifth of the total scanning time T_tot. The total scanning time consists of the measuring time T_ϑ and the measurement reset time T_r. A 3D natural-neighbour interpolation (Sibson, 1981) was applied to the sequence of propagated scans, determining the horizontal wind speed u_h across the scanned domain and at time t_syn. Figure 4 shows the current lidar scan initialised at time t_n and the previous one by two consecutive lidar scans only, avoiding the need for a future scan. We chose the maximal t_syn = t_n + a ∆T to minimise the wind vector advection period as indicated by the black arrow in Figure 4.

Wind speed forecast
To generate a wind speed forecast the methodology developed by Valldecabres et al. (2018a) was utilised. A Lagrangian advection technique, based on the assumption that wind vectors propagate with their local horizontal wind speed and wind direction (Germann and Zawadzki, 2002), was applied. It was thus assumed that the wind field vectors do not change their trajectory with time. As a consequence of the wind field reconstruction explained in Section 3.1, the direction of wind vectors was the same for all azimuth angles and varied only with range gate. Apart from that, we neglected vorticity, mass conservation and diffusion (Germann and Zawadzki, 2002; Valldecabres et al., 2018a). To develop a wind speed forecast with lead time k, wind field vectors were propagated in time and space from their original position at the synchronised time step t_syn to the last time step of the scan t_n+1 − T_r and further to t_n+1 − T_r + k.
Vectors arriving within a previously defined area of influence around the turbine of interest and within a time interval of t_n+1 − T_r + k ± 30 s were selected and used for the wind speed forecast. An example of such a point cloud is shown in Figure 3 (d). The AoI was defined as a circle centred around the turbine's position and its radius was optimised by minimising the average continuous ranked probability score (crps, see Section 4.4.1) (Gneiting et al., 2007) of a 1-minute-ahead wind speed forecast at a reference free-flow turbine as suggested by Valldecabres et al. (2018a). That means the forecast was optimised with respect to its probabilistic rather than its deterministic scores. Further, the minimum required number of wind vectors reaching the turbine was determined by applying the same methodology. Forecasts that were based on fewer vectors were considered invalid.
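The selection of wind vectors can be illustrated with straight-line advection. The sketch below assumes a coordinate system with x pointing east and y pointing north, wind directions in the meteorological convention (direction the wind comes from), and time measured from the moment at which the vectors are defined. Using the point of closest approach to decide whether a vector reaches the AoI is one simple choice made here for illustration, not necessarily how the original implementation works. All names are hypothetical.

```python
import math


def velocity(speed, direction_deg):
    """East and north components for wind coming FROM direction_deg."""
    d = math.radians(direction_deg)
    return -speed * math.sin(d), -speed * math.cos(d)


def closest_approach(x, y, speed, direction_deg, target=(0.0, 0.0)):
    """Time (s) and distance (m) of closest approach on a straight path."""
    vx, vy = velocity(speed, direction_deg)
    rx, ry = target[0] - x, target[1] - y
    t = (rx * vx + ry * vy) / (vx * vx + vy * vy)
    return t, math.hypot(rx - vx * t, ry - vy * t)


def select_vectors(vectors, lead_time, radius, half_window=30.0, min_count=20):
    """Return speeds of vectors reaching the AoI around lead_time, else None.

    vectors: iterable of (x, y, speed, direction_deg) tuples.
    """
    speeds = []
    for x, y, speed, direction in vectors:
        if speed <= 0.0:
            continue  # a vector at rest never arrives
        t, dist = closest_approach(x, y, speed, direction)
        if dist <= radius and abs(t - lead_time) <= half_window:
            speeds.append(speed)
    return speeds if len(speeds) >= min_count else None
```

The returned list of speeds is the empirical wind speed distribution on which the probabilistic forecast is built; returning `None` mirrors the rule that forecasts based on too few vectors are invalid.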
At this point, two orders of the methodological steps are possible, i. e. propagating wind vectors at varying heights that are different from the height of interest to the target turbines before extrapolating to hub height or performing the extrapolation prior to the wind vector propagation. Each of the two possibilities is associated with specific errors. In this case study, we chose to propagate wind vectors before the wind speed extrapolation as this yielded more accurate results. The consequences of this approach will be discussed in Section 5.1.

Wind speed extrapolation to hub height
As the lidar was positioned at TP height, an extrapolation to the hub height is needed. A logarithmic wind profile including a stability correction Ψ(z/L) (Peña et al., 2008) was used to do so: With the horizontal wind speed u_h, roughness length z_0, height z, the gravitational acceleration g and the Obukhov length L.
The friction velocity u_* is expressed in terms of the Charnock parameter α_c, which describes the relation between wind speed and roughness of the sea surface and was set to α_c = 0.011 as suggested by Smith (1980) for far offshore conditions. The Von Kármán constant is defined as κ = 0.4.
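For reference, a standard form of the stability-corrected logarithmic profile and of Charnock's relation, written with the symbols defined above, is given below. This is an assumed reconstruction from those definitions and may differ in detail from the exact form used by the authors:

```latex
u_h(z) = \frac{u_*}{\kappa}\left[\ln\!\left(\frac{z}{z_0}\right) - \Psi\!\left(\frac{z}{L}\right)\right],
\qquad
z_0 = \alpha_c\,\frac{u_*^{2}}{g}
```

Combining both expressions links the friction velocity to the roughness length, u_* = (g z_0 / α_c)^{1/2}, which is how the Charnock parameter enters the profile.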
The atmospheric stability for each lidar scan was determined using the methodology described by Sanz Rodrigo et al. (2019).
Air and sea surface temperature, pressure and relative humidity values were used to determine the virtual potential temperature difference ∆Θ = Θ_TP − Θ_0 between TP height Θ_TP and the sea surface Θ_0 as well as the virtual temperature at sea level T_v.
The wind speed u_TP was defined as lidar measurements at the closest range gate of 500 m. The stability estimation was performed using 30-minute moving averages of all variables. Here, first the Bulk Richardson number Ri_b was calculated, which was then transferred into the stability parameter ζ as defined by Grachev and Fairall (1997) and finally the Obukhov length L according to Equations (5), (6) and (7).
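The chain from bulk Richardson number to Obukhov length can be sketched as below. The bulk formulation and the coefficients `c = 10` and `alpha = 5` are commonly quoted values for this type of Ri_b to ζ mapping and are assumptions here, since the corresponding equations are not reproduced in this text. Function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m s^-2


def bulk_richardson(z, delta_theta_v, t_v, u):
    """Bulk Richardson number from the virtual potential temperature
    difference (K) over height z (m), virtual temperature t_v (K) and
    wind speed u (m/s). Assumed formulation."""
    return G * z * delta_theta_v / (t_v * u * u)


def obukhov_length(z, delta_theta_v, t_v, u, c=10.0, alpha=5.0):
    """Obukhov length L = z / zeta, with zeta derived from Ri_b.

    c and alpha are assumed coefficients, not taken from the source.
    """
    ri_b = bulk_richardson(z, delta_theta_v, t_v, u)
    if ri_b == 0.0:
        return math.inf  # neutral limit
    if ri_b > 0.0:
        if alpha * ri_b >= 1.0:
            raise ValueError("Ri_b outside the range of the stable mapping")
        zeta = c * ri_b / (1.0 - alpha * ri_b)
    else:
        zeta = c * ri_b
    return z / zeta
```

A warmer air layer over a colder sea (positive ∆Θ) yields a positive L, i.e. stable stratification, while the opposite sign yields a negative L.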
For the calculation of the stability correction term Ψ the definition of Dyer (1974) shown below was used.
With β = 6 and γ = 19.3 as suggested by Högström (1988). The roughness length z_0 was determined by fitting the wind speed profile to the wind speed measurements u_TP, using the calculated Obukhov length L.
With the height of the measurement z_meas the wind speed at hub height u_hh can then be expressed as: In the following, we will refer to c_h as the height extrapolation factor.
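The height extrapolation factor can be illustrated with the following sketch. It assumes the widely used Businger-Dyer form of the stability correction (linear for stable, the logarithm and arctangent expression for unstable conditions) with the coefficients β = 6 and γ = 19.3 quoted above, and takes c_h as the ratio of the stability-corrected logarithmic profile at hub height and at measuring height. Both are assumptions consistent with the surrounding text, not a verified copy of the original equations.

```python
import math

BETA = 6.0    # stable coefficient quoted in the text
GAMMA = 19.3  # unstable coefficient quoted in the text


def psi(zeta):
    """Stability correction for zeta = z / L (assumed Businger-Dyer form)."""
    if zeta > 0.0:
        return -BETA * zeta
    if zeta < 0.0:
        x = (1.0 - GAMMA * zeta) ** 0.25
        return (2.0 * math.log((1.0 + x) / 2.0)
                + math.log((1.0 + x * x) / 2.0)
                - 2.0 * math.atan(x) + math.pi / 2.0)
    return 0.0


def height_extrapolation_factor(z_meas, z_hh, z0, obukhov_length):
    """c_h such that u_hh = c_h * u_meas under the assumed profile."""
    num = math.log(z_hh / z0) - psi(z_hh / obukhov_length)
    den = math.log(z_meas / z0) - psi(z_meas / obukhov_length)
    return num / den


# Example: lidar height 24.6 m, hub height 92 m, z0 = 0.0002 m
for L in (-100.0, math.inf, 100.0):
    print(L, height_extrapolation_factor(24.6, 92.0, 0.0002, L))
```

Under these assumptions the factor is smallest for unstable, intermediate for neutral and largest for stable stratification, which reflects the stronger wind shear in stable conditions.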

Probabilistic wind power forecast
The forecasted wind speed distribution was finally transformed into a wind power distribution. To do so, a probabilistic power curve constructed using high elevation lidar scans (Section 2) and high-frequency SCADA power data of turbine T2 (Section 4.1) was applied. Usually, 10-minute wind speed and power averages are used to construct power curves; however, we used 44-second-mean values, in accordance with the measurement time per scan, to capture the power curve's associated uncertainties more accurately (Gonzalez et al., 2017). Wind speed values were air density corrected as described by Ulazia et al. (2019) and according to IEC 61400-12-1 (IEC, 2017). Air pressure and temperature values were hereby corrected to hub height applying temperature gradients of the ISO standard atmosphere as suggested by ISO2533 (ISO, 1987). The mean value and standard deviation of power within wind speed intervals of 0.5 m s⁻¹ width were determined (Gonzalez et al., 2017). These values were further used to define a normal cumulative distribution function (cdf) of power for each wind speed interval. Figure 5 shows the normalised probabilistic power curve with standard deviations of power indicated by error bars. For each value of the forecasted wind speed distribution, i. e. for each wind vector reaching the area of influence, one power value was randomly selected using the normal cdf of its corresponding wind speed interval. A resampling technique with replacement (Efron, 1979) was applied to the resulting power distribution, randomly selecting 10 000 power values, as suggested by Valldecabres et al. (2018a).
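The transformation from a wind speed distribution to a power distribution can be sketched as follows. The sketch bins wind speed in 0.5 m/s intervals, draws one normally distributed power value per forecasted wind speed and then resamples with replacement. The function names and the handling of wind speeds without a matching bin (they are skipped) are assumptions made for illustration.

```python
import random
import statistics


def build_power_curve(wind_speeds, powers, bin_width=0.5):
    """Return {bin index: (mean power, std of power)} per wind speed bin."""
    bins = {}
    for u, p in zip(wind_speeds, powers):
        bins.setdefault(int(u // bin_width), []).append(p)
    return {b: (statistics.fmean(v), statistics.pstdev(v))
            for b, v in bins.items()}


def sample_power(forecast_speeds, curve, n_resample=10_000,
                 bin_width=0.5, rng=None):
    """Draw one power value per wind vector, then bootstrap n_resample values."""
    rng = rng or random.Random()
    draws = []
    for u in forecast_speeds:
        b = int(u // bin_width)
        if b in curve:  # assumption: speeds outside the curve are skipped
            mu, sigma = curve[b]
            draws.append(rng.gauss(mu, sigma))
    if not draws:
        raise ValueError("no forecast wind speed falls into a known bin")
    return rng.choices(draws, k=n_resample)  # resampling with replacement
```

Because the normal distribution is unbounded, a practical implementation might additionally limit the drawn values to the physically possible power range; the text does not state whether this was done.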
In the following, we will first introduce the case study at the offshore wind farm Global Tech I, then analyse the method's advantages and limitations, and afterwards assess the quality of a 5-minute-ahead lidar-based deterministic as well as probabilistic wind power forecast of the free flow turbines T1-T7, based on the mentioned case study.

Case study at the offshore wind farm GT I
Power forecasts at Global Tech I were analysed as a case study. The wind farm consists of 80 wind turbines of type Adwen 5-116 with a rotor diameter of D = 116 m, a hub height of z_hh = 92 m and a rated power of P_r = 5 MW. The total capacity of the wind farm is P_total = 400 MW. The 1 Hz SCADA data, including power and wind direction values of all wind turbines, as well as information regarding the turbines' operational status, was available for the period of the measurement campaign. Wind speed values were not measured but estimated by the SCADA system based on power, pitch angle and the turbine power curve.
Further, information regarding the SCADA data quality was available and used to remove low-quality data. In the following analysis, we used 1-minute-mean values of wind speed and power within the interval t ± 30 s to validate wind speed as well as power forecasts for seven wind turbines in the first south-westerly row marked in Figure 1. We refer to those turbines as T1-T7 in the following.
A forecast was generated for each lidar scan, thus with a temporal resolution of approximately 2.5 minutes. Forecasts within the interval 08.03.2019 to 31.05.2019 were evaluated. Here, we only considered situations in power production mode below rated wind speed. For further analysis, only scans with a total spatial availability of at least 80 % after applying the filtering algorithms (Section 3.1) were considered. The total availability is considered 100 % if data at all measured range gates and azimuth angles between 140° and 300° are valid. Missing data beyond these azimuth limits was considered not to impact the quality of the forecast gravely and thus neglected when determining the total spatial availability. In total, 17 024 lidar scans with a mean availability of 89.7 % were used for the analysis. The wind speed and direction distributions of the situations considered are visualised in Figure 6. North-westerly winds from 250° to 320° were identified as the prevailing wind direction.
Wind speeds mainly lay between 6 m s⁻¹ and 12 m s⁻¹. As a consequence of the wind farm's layout, we only used scans with wind directions 130° < χ ≤ 350°, indicated as grey shaded area in Figure 6.
To perform the time synchronisation an interpolation time step ∆T = 6 s was chosen. With a scanning time of T_tot = 156 s, we chose the synchronisation time as t_syn = t_n + 5∆T = t_n + 30 s in order to avoid the need for a backwards propagation as explained in Section 3.2. Time synchronised wind vectors were propagated with a lead time of k = 300 s to generate a wind speed forecast. For a forecast to be valid, at least a number of Z = 20 wind vectors needed to be available. The radius of the area of influence was set to R_AoI = 0.2D = 23.2 m, following the methodology described in Section 3.3, with T2 as reference turbine.
L was determined using meteorological measurements: air pressure, humidity and air temperature measurements were performed using two sensors (Vaisala PTB330 and Vaisala HMP155, respectively) from July 2018 until February 2020, both positioned at the height of the lidar at about 24.6 m. Additionally, sea surface temperature (SST) data, which showed a good agreement with on-site buoy measurements performed at an earlier time (Schneemann et al., 2020), was available from the OSTIA data set (Good et al., 2020). SST data is available at noon every day and was linearly interpolated to match the timestamps of the lidar scans. L was then used to extrapolate wind vectors from measuring height to hub height following Section 3.4.
During the measurement campaign, a slight elevation misalignment of the lidar was detected. Using a so-called sea surface levelling method, the magnitude of pitch and roll of the lidar, i. e. the tilt of the geographical coordinate system, was determined as proposed by Rott et al. (2017). The inclinations were hereby found to be related to the mean wind speed and wind direction, i. e. the thrust and the yaw orientation of the turbine, respectively. Pitch and roll, defined as clockwise rotations around the x- and y-axis, respectively, were 0.02° and −0.11° for the turbine in idling mode and 0.02° ± 0.15° and −0.11° ± 0.11°, respectively, during power production, depending on the mean wind speed. As even small errors in the elevation will lead to large differences in the measurement height, especially for far measurement distances, we accounted for the misalignment by means of a correction function. The correction function used the power production of the turbine and the mean wind direction to determine pitch and roll. These values were then used to estimate the corrected measuring height across the scanned area. Height differences due to the curvature of the Earth were considered as well. An additional uncertainty was introduced by the tide, which varied approximately ±0.6 m. For simplicity, we neglected this influence.
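The sensitivity of the measuring height to small tilts can be made concrete with a short calculation. The sketch assumes the worst case in which the tilt acts fully as an elevation error along the beam, a spherical Earth with a mean radius of 6371 km, and no atmospheric refraction. The function names are hypothetical, and this is not the correction function used in the study.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def height_offset_from_tilt(range_m, tilt_deg):
    """Height change of the beam at a given range for an elevation error."""
    return range_m * math.sin(math.radians(tilt_deg))


def height_offset_from_curvature(range_m):
    """Height gained by a horizontal beam above a spherical surface."""
    return range_m ** 2 / (2.0 * EARTH_RADIUS_M)


for r in (500.0, 4000.0, 8000.0):
    print(r, height_offset_from_tilt(r, 0.11), height_offset_from_curvature(r))
```

At the maximum range of 8000 m, a tilt of 0.11° shifts the beam by roughly 15 m and the curvature of the Earth adds about 5 m, both considerable compared with the lidar height of 24.6 m.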

The measuring height z_meas in Equation (9) therefore varied with range gate and azimuth for each scan. Heights of wind vectors contributing to wind speed forecasts in this analysis spanned between a height of 12 m and 65 m, with a mean height of 36 m.
Wind vectors extrapolated to hub height were in a final step transformed to wind power values using the methodology and power curve introduced in Section 3.5. For the evaluation of the probabilistic wind power forecasts, we distinguished between stable or neutral and unstable atmospheric stratification. Situations with values of −1 000 m < L < 0 m were classified as unstable, while those with 0 m < L < 1 000 m were defined as stable (Van Wijk et al., 1990). All other cases were defined as neutral.
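The classification by Obukhov length stated above translates directly into code. The function name is hypothetical:

```python
def classify_stability(obukhov_length_m):
    """Classify stratification from the Obukhov length L in metres."""
    if -1000.0 < obukhov_length_m < 0.0:
        return "unstable"
    if 0.0 < obukhov_length_m < 1000.0:
        return "stable"
    return "neutral"  # |L| >= 1000 m and all remaining cases


print([classify_stability(L) for L in (-150.0, 300.0, 2500.0, -5000.0)])
```

Large magnitudes of L of either sign fall into the neutral class, because the stability correction becomes negligible there.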

Evaluation of Methodology
Here, we aim to present the results of the individual methodical steps introduced previously. We assessed how the use of single-Doppler measurements and the low scanning speed affected the lead time, availability and skill of the forecast. Further, the impact of the extrapolation to hub height was analysed (Theuer et al., 2020). Finally, we examined how the single-Doppler data may have influenced the prediction intervals of the forecast.
The data availability of all valid lidar scans dependent on range gate, applying different filtering methodologies, is compared in Figure 7. Clearly, the availability of data was increased for far ranges when applying the density filter (red line) as compared to a CNR-threshold filter (blue line) with −26.5 dB < CNR < −5 dB. While the data availability at a range gate of 7 km has already decreased to 42 % for the threshold filter, it still lies at 73 % when using the density filter. Also at very close range gates from 500 m to 1450 m, the availability was increased from about 95 % to almost 100 %. The green line depicts the data availability after applying the density filter and additionally neglecting all other invalid data. That included the removal of wind speed outliers; however, the dominant effect was the omission of values within the critical region as described in Section 3.1. For the given measurement set-up and range gates up to 6 km, the availability was reduced to approximately 85 % of the density filtered data. At 7 km it has decreased to 61 %. As the data availability was already reduced for far distances, the impact of further filtering was smaller compared to near ranges with higher data availability.
The number of observations at each measurement point in the polar coordinate system of the lidar before filtering is shown in Figure 8 (a). Clear differences in the number of observations are visible as a consequence of the four scanning trajectories of the lidar (Figure 1 (b)). In accordance with the wind direction distribution (Figure 6), north-westerly sectors were covered more frequently than southerly sectors. Figure 8 (b) visualises the number of observations after filtering not only dependent on range gate, but also on the azimuth angle. The single-Doppler set-up caused the need to apply a VAD-fit and as a consequence to filter certain regions of the scan, earlier referred to as critical regions. As an effect, data availability is significantly reduced. for turbines T5, T6 and T7. The forecast's quality is also worse for those wind directions, especially for T6 and T7 (Figure 9 (b)). Here, the lidar scans mainly covered the north-westerly inflow direction of the wind farm. Consequently and due to the
https://doi.org/10.5194/wes-2020-78 Preprint. Discussion started: 11 May 2020 © Author(s) 2020. CC BY 4.0 License.
Figure 7. Data availability dependent on range gate when applying the threshold filter (blue) and the density filter (red). In green the availability after applying the density filter and neglecting the critical angles and wind speed outliers is depicted.
possible to generate forecasts with lead times larger than 5 minutes using the available lidar scans.
The wind speed extrapolation to hub height is, following the method introduced in Section 3.4, mainly dependent on stability. Figure 10 shows the dependence of the height extrapolation factor c_h, calculated with Equation (9), on Obukhov lengths L assuming an extrapolation from a height of 24.6 m to 92 m and a roughness length of z_0 = 0.0002 m. While the slope of the curve becomes very small when approaching Obukhov lengths L with large magnitudes, thus neutral situations, especially for very stable cases L → 0 the change of the correction factor with L is very large. This consequently means misestimations of Obukhov length L have a larger impact on the wind speed extrapolation in stable situations. In order to determine this effect, we distinguished between stability cases in the following analysis. While during 55.5 % of the valid scans unstable atmospheric stratification was observed, in 18.2 % the atmosphere was defined as neutral. Stable situations were observed in 26.3 % of the cases. To be able to evaluate unstable cases, during which we expect the highest errors in persistence as compared to stable and neutral ones, separately and to keep the number of analysed cases similar we chose to combine stable and neutral situations for the analysis.
Wind speed point forecasts were calculated as the mean of the predicted wind speed distributions. Forecasts (fc) were verified with 1-minute-mean SCADA data (obs) and using the root mean squared error (rmse), mean absolute error (mae) and bias. N denotes the total number of forecasts considered.
As a reference, the benchmark persistence was used, which assumes the future value at t + k equals the current value at time t, i. e. fc(t + k) = obs(t).
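The deterministic scores and the persistence benchmark can be sketched as follows. The sketch assumes the usual definitions of the three scores and a bias defined as forecast minus observation, so that positive values indicate overestimation; the sign convention is an assumption. Function names are hypothetical.

```python
import math


def rmse(fc, obs):
    """Root mean squared error over N forecast-observation pairs."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fc, obs)) / len(fc))


def mae(fc, obs):
    """Mean absolute error."""
    return sum(abs(f - o) for f, o in zip(fc, obs)) / len(fc)


def bias(fc, obs):
    """Mean error, forecast minus observation (assumed sign convention)."""
    return sum(f - o for f, o in zip(fc, obs)) / len(fc)


def persistence_forecast(obs, lead_steps):
    """Return (forecasts, verifying observations) with fc(t + k) = obs(t).

    lead_steps must be a positive number of time steps.
    """
    return obs[:-lead_steps], obs[lead_steps:]


# Example with a steadily rising series and a lead of two time steps
fc, verif = persistence_forecast([1.0, 2.0, 3.0, 4.0, 5.0], 2)
print(rmse(fc, verif), mae(fc, verif), bias(fc, verif))
```

During a ramp, persistence lags the observation by the full lead time, which is why its bias in the example is negative for a rising series.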

Unstable stratification
Figure 11 compares 1-minute-mean SCADA wind power values of turbine T3 with persistence and the lidar-based forecasts (LF) in unstable atmospheric conditions. Both methods show an overall good agreement between forecast and observation with R² = 0.80 and R² = 0.86, respectively, with the LF's scatter being slightly smaller than that of persistence. The LF outperforms persistence in terms of rmse and mae. The lidar forecast's bias of 0.52 % is slightly larger than that of persistence with 0.31 %.

The magnitude of the error is increasing with increasing power for both persistence as well as the lidar-based forecast. As the wind speed forecasting error was not found to increase with wind speed, the increase of error with power is attributed solely to the cubic nature of the power curve.
outperforms persistence for turbines T1-T4 in terms of rmse and mae, with the lowest rmse observed for T1 and the largest improvement as compared to persistence for T3 with 20.1 %. The bias of those turbines is slightly larger than that of persistence but rather small and not suggesting a systematic over- or underestimation of power caused by the model. T5 shows lower forecast skill and outperforms persistence only in terms of rmse. The quality of the LF at T6 and T7 is below that of T1-T4 with a strongly reduced number of valid forecasts N. We attribute this to the turbines' position in an area not covered well by the lidar scans, which means fewer wind vectors can be propagated to the target turbines (Section 4.2). In Figure 8 (b) it can be observed that especially regions very close to these turbines have low data availability. Low wind speeds, possibly originating from those areas, are thus not represented well in the wind speed distributions causing an overestimation of wind speed and power. Also for only simultaneously available forecasts, the LF outperforms persistence for all turbines except T6 and T7. The difference in quality is less distinct in that case, with the rmse increasing by a factor of 1.8 instead of 2.4 from T3 to T7. While some of the quality differences observed for all available cases can therefore be explained by the varying time intervals considered, this also confirms that forecast accuracy depends on the availability of wind vectors.
Figure 12 shows the comparison of SCADA data and LF as well as persistence forecasts for stable and neutral atmospheric conditions at turbine T3.
While the overall agreement between observation and forecast is very good for persistence, larger scatter, a higher rmse and mae are observed for the LF. Generally, persistence clearly outperforms the LF during stable and neutral conditions in terms of rmse and mae as summarised in Table 2. While the lidar forecast's bias for T3-T5 is lower than that of persistence, it shows a large overestimation of power, especially for T6 and T7. Similar to unstable cases, the quality and number of valid lidar-based forecasts decreases for turbines positioned in areas not well covered by the lidar scan. As especially areas close to the turbines are not represented well (Figure 8 (b)), wind speed and power are being overestimated.

The quality of persistence is much better compared to unstable situations, due to lower wind speed fluctuations characteristic in stable situations (Stull, 2017). The lidar forecast's skill, however, is considerably lower compared to unstable situations. We attribute this to the extrapolation of wind speed to hub height. Variations in Obukhov length L and measuring height z_meas have a larger impact on the height extrapolation factor for stable situations compared to unstable situations, leading to larger errors in case of misestimations. We will discuss this in more detail in Section 5.1.
Probabilistic forecasts are generally evaluated by means of their sharpness and calibration. Sharpness describes the broadness of the forecast distribution, while calibration estimates the consistency between the statistics of forecasts and observations (Gneiting et al., 2007). Both calibration and sharpness are estimated with the average crps: Here, F denotes the cdf of the forecasted wind power, x_0 the observed wind power and H the Heaviside step function with H(x − x_0) = 0 for x < x_0 and H(x − x_0) = 1 otherwise.
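For a forecast given as a finite set of samples, the crps in its standard definition, the integral of (F(x) − H(x − x_0))² over x, can be computed without numerical integration by using the known identity crps = E|X − x_0| − ½ E|X − X′|, where X and X′ are independent draws from the forecast distribution. A minimal sketch under that assumption, with hypothetical function names:

```python
def crps_from_samples(samples, observation):
    """crps of an empirical forecast distribution for one observation."""
    n = len(samples)
    xs = sorted(samples)
    term1 = sum(abs(x - observation) for x in xs) / n
    # E|X - X'| over all n*n pairs, evaluated on the sorted samples
    pair_sum = sum((2 * i - n + 1) * x for i, x in enumerate(xs))
    term2 = 2.0 * pair_sum / (n * n)
    return term1 - 0.5 * term2


def average_crps(sample_sets, observations):
    """Mean crps over N forecast-observation pairs."""
    scores = [crps_from_samples(s, o) for s, o in zip(sample_sets, observations)]
    return sum(scores) / len(scores)


print(crps_from_samples([0.0, 1.0], 0.5))
```

The score is reported in the unit of the forecasted variable, and lower values are better. It reduces to the absolute error when the forecast collapses to a single value.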
To assess the forecast's calibration, quantile-quantile reliability diagrams (Hamill, 1997) were used. A reliability diagram shows what percentage of the observations lies below a certain quantile of the forecasted distribution. Ideally, j % of the observations should lie below the jth percentile of the forecasts. Additionally, confidence intervals were estimated to account for the varying number of values per bin (Wilks, 2011). Again, forecasts were verified with 1-minute-mean SCADA data.
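The quantity plotted in such a reliability diagram can be sketched as follows, assuming each forecast is represented by a set of sampled power values. The function name is hypothetical, and the confidence intervals mentioned above are deliberately left out of this sketch.

```python
import numpy as np

def quantile_reliability(sample_sets, observations, levels):
    """Fraction of observations lying below each forecast quantile.

    sample_sets  : sequence of 1-D arrays, one set of forecast samples per time step
    observations : observed values, one per time step
    levels       : nominal quantile levels in [0, 1], e.g. [0.05, 0.5, 0.95]
    """
    observations = np.asarray(observations, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Forecast quantiles, shape (n_forecasts, n_levels)
    quantiles = np.array([np.quantile(s, levels) for s in sample_sets])
    # Observed frequency per level; a calibrated forecast returns ~levels
    return np.mean(observations[:, None] < quantiles, axis=0)
```

Plotting the returned frequencies against `levels` gives the reliability curve; points below the diagonal at high quantile levels correspond to the situation described in the text where too few observations fall below the upper quantiles.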
Also for the evaluation of probabilistic forecasts, persistence was used as a reference. Here, we generated a probabilistic persistence forecast by adding the errors of the 19 previous time steps to the forecast, as suggested by Gneiting et al. (2007).
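One possible reading of this construction is sketched below: the persistence value at issue time is dressed with the most recent persistence errors that can already be verified at that time. The indexing convention, the horizon expressed in time steps, the equally spaced series and the absence of clipping to the physical power range are all assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np

def probabilistic_persistence(power, t, horizon, n_errors=19):
    """Ensemble persistence forecast for index t + horizon, issued at index t.

    power    : 1-D array of observed power, equally spaced in time
    horizon  : lead time in number of time steps
    n_errors : number of past persistence errors added to the forecast
    """
    power = np.asarray(power, dtype=float)
    # Verification indices of the n_errors most recent persistence forecasts
    j = np.arange(t - n_errors + 1, t + 1)
    if j[0] - horizon < 0:
        raise ValueError("not enough history for the requested errors")
    # Past errors: observation minus the persistence forecast issued `horizon` earlier
    errors = power[j] - power[j - horizon]
    # Deterministic persistence value plus each historical error
    return power[t] + errors
```

The resulting 19 values can be scored with the same crps and reliability measures as the lidar-based forecast.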

Unstable stratification
In Table 3 the average crps of persistence and the LF is compared for turbines T1-T7 for unstable situations, for all available forecasts as well as for all simultaneously available ones. Here, forecasts of turbines T1-T4 are sharper and better calibrated than persistence, while for T6 and T7 persistence outperforms the LF. When considering only simultaneously available forecasts, persistence outperforms the LF only for T7. These results are in good agreement with the deterministic scores, indicating that the LF achieves better quality in unstable conditions as long as sufficient wind field data is available.

Figure 13. An example 1.5-hour time series of the 5-minute-ahead lidar power forecast for unstable stratification at turbine T3, shown in red. Confidence intervals are visualised as shaded grey areas from 5 % to 95 % in 10 % intervals. The blue curve shows 1-minute-mean SCADA data of T3.

An exemplary time series of the lidar forecast for unstable stratification is shown in Figure 13. The turbine's 1-minute-mean SCADA power, the LF's mean values and persistence are plotted in blue, red and green, respectively. Each marker represents one forecast, generated with a temporal resolution of about 2.5 minutes. Shaded grey areas around the lidar forecast's mean indicate 5 % to 95 % prediction intervals in 10 % steps. Generally, the LF is able to follow the observed power more accurately than persistence does. Starting at 15:09 UTC a ramp event occurs, with a power drop from 92 % to 42 % within a time interval of 13.5 minutes. The LF predicts the ramp event quite accurately. Another extreme power drop of 40 % within 5 minutes can be observed at 16:21 UTC, also well captured by the lidar forecast. For both cases, persistence strongly overestimates the power.
The width of the prediction intervals ranges from 18 % to 48 %. Broader intervals might be an indicator for higher uncertainties associated with the forecast. At all times except for two time steps, the intervals are able to capture the true power fluctuations.
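To make the interval construction concrete, the sketch below derives nested prediction intervals at the 5 %, 15 %, ..., 95 % quantile levels from a set of forecast samples. It assumes a sample-based forecast distribution; the function name and return layout are illustrative only.

```python
import numpy as np

def nested_intervals(samples):
    """Nested central prediction intervals from forecast samples.

    Returns a list of (lower_level, upper_level, lower_value, upper_value),
    ordered from the widest interval (5 % to 95 %) to the narrowest (45 % to 55 %).
    """
    levels = np.arange(5, 100, 10) / 100.0  # 0.05, 0.15, ..., 0.95
    values = np.quantile(np.asarray(samples, dtype=float), levels)
    n_pairs = len(levels) // 2
    return [
        (float(levels[i]), float(levels[-1 - i]), float(values[i]), float(values[-1 - i]))
        for i in range(n_pairs)
    ]

# Width of the widest interval for a hypothetical set of forecast samples (in % of rated power)
intervals = nested_intervals(np.linspace(30.0, 70.0, 41))
print(intervals[0][3] - intervals[0][2])
```

The difference between the upper and lower value of the widest pair is one way to express an interval width of the kind quoted above.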
In Figure 14 the reliability diagram of turbines T1-T7 is depicted for persistence as well as the lidar-based forecast for the unstable cases. Persistence is not well calibrated for any of the seven turbines and shows large discrepancies from the diagonal black line, which would indicate perfect calibration. For T3 about 27 % lie below the 5 % quantile, while only 73 % lie below the 95 % quantile. All turbines have very similar reliability. The calibration of the LF is in general better than that of persistence, especially for turbines T2 to T4. For low quantiles, 7 %−14 % of the LF lie below the 5 % quantile for all turbines. Considering the confidence intervals assigned to those values, the forecasts are well calibrated here. For high quantiles, the turbines show large differences in reliability. While T3 is comparatively well calibrated with 86 % below the 95 % quantile, T7 is poorly calibrated with a value of 77 %. The generally too low values for large quantiles suggest that a higher probability needs to be assigned to higher power values (Hamill, 1997).

Stable and neutral stratification
We compare the average crps for stable and neutral conditions of persistence and the LF in Table 4. Here, persistence is generally more accurate than the LF. Again, persistence's quality is considerably better compared to unstable situations, while that of the LF is strongly reduced. The reliability diagrams depicted in Figure 15 demonstrate that persistence is also better calibrated in stable and neutral cases; however, still about 20 % of the forecast lie below the 5 % quantile while only about 80 % lie below the 95 % quantile. Again, all turbines show very similar results. Using the LF, especially at high quantiles, fewer observations than expected lie below the respective quantiles, indicating that higher probabilities need to be assigned to larger values (Hamill, 1997). T2, T3 and T4 are best calibrated with 79 %-80 % below the 95 % quantile. For all other turbines, in particular those positioned in areas with low lidar scan coverage, results are worse than for persistence.

We introduced a minute-scale forecasting methodology for long-range single-Doppler lidar measurements and used it to predict the power of seven free-flow turbines of the offshore wind farm Global Tech I. The proposed model was developed as an extension of and an alternative to existing methods and is applicable to far offshore sites. Emphasis was hereby put onto the use of single-Doppler measurements as compared to dual-Doppler set-ups. In the following, we discuss the model's ability to skillfully predict power under different atmospheric conditions. Moreover, limitations, possibilities and necessary adjustments concerning the forecast horizon are assessed. Finally, we qualitatively analyse the forecast uncertainty.

Forecasting skill for different atmospheric conditions
While the LF was able to predict wind power more reliably than persistence for unstable situations, the methodology failed when applied during stable stratification. Generally, we would expect the assumptions of a homogeneous wind field and a negligible vertical wind speed component, which are the basis of the wind field reconstruction, to be less applicable during unstable situations, when high amounts of thermal buoyancy cause strong vertical mixing (Stull, 2017). Also, the Lagrangian advection technique is expected to be more accurate for stable cases, as during unstable situations vertical mixing considerably impacts the flow. Valldecabres et al. (2018b), for instance, found that for far ranges wind speed fluctuations are not well captured by the applied wind field reconstruction methods. This implies that especially unstable situations cannot be predicted well. We, therefore, suppose that the low forecast skill observed during stable and neutral stratification is not related [...]

Further, the height difference causes errors in wind vector advection. We chose to propagate wind vectors to the target turbines prior to the wind speed extrapolation. Vectors were thus propagated with the lower wind speed at measuring height compared to that at hub height. This suggests wind vectors arrive at the turbines slightly delayed, with the extent of the delay related to the measuring height. We assume this reduces the forecast skill. As the increase of wind speed with height is larger during stable stratification, we expect the effect to be more distinct for those cases. In this case study, the alternative, i. e. extrapolating wind speed before propagation, caused even larger errors than the ones presented in Section 4. That means that here the propagation of wind vectors associated with large errors due to wind speed extrapolation has a stronger impact on the forecast accuracy than the advection at lower heights.
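The different sensitivity of the height extrapolation in stable and unstable conditions can be illustrated with a stability-corrected logarithmic profile. The sketch below assumes the commonly used Businger-Dyer stability functions (coefficients 5 and 16) and a hypothetical roughness length and set of heights; the exact formulation and parameters used in the study are not given in this section, so all numbers here are placeholders.

```python
import math

def psi_m(zeta):
    """Businger-Dyer stability correction for momentum (assumed form)."""
    if zeta >= 0.0:  # stable or neutral
        return -5.0 * zeta
    x = (1.0 - 16.0 * zeta) ** 0.25  # unstable
    return (2.0 * math.log((1.0 + x) / 2.0)
            + math.log((1.0 + x * x) / 2.0)
            - 2.0 * math.atan(x) + math.pi / 2.0)

def extrapolation_factor(z_meas, z_hub, obukhov_length, z0=2e-4):
    """Ratio u(z_hub) / u(z_meas) for a stability-corrected log profile.

    z0 is a hypothetical offshore roughness length in metres.
    """
    num = math.log(z_hub / z0) - psi_m(z_hub / obukhov_length)
    den = math.log(z_meas / z0) - psi_m(z_meas / obukhov_length)
    return num / den

# Effect of halving |L| for hypothetical heights of 40 m (measurement) and 90 m (hub)
for L in (200.0, 100.0, -200.0, -100.0):
    print(L, round(extrapolation_factor(40.0, 90.0, L), 3))
```

With these placeholder values, the factor changes far more between L = 200 m and L = 100 m than between L = −200 m and L = −100 m, which is consistent with the argument that a misestimated Obukhov length is penalised more heavily in stable conditions.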
For future applications, a more accurate description of the wind profile, especially in stable situations, is required to further improve the forecast skill. That also includes a more accurate estimation of stability and therefore demands reliable meteorological measurements. Additional profile information could, for example, be collected using range height indicator (RHI) lidar scans or data from a nearby met mast.
While the benchmark persistence yields good forecasts for stable and neutral situations, it has obvious shortcomings for strongly fluctuating situations and ramp events. Its comparison to the LF model has shown the latter's ability to predict such situations better (Figure 13). Especially the probabilistic forecast has proven to be more skilful compared to persistence as it provides better-calibrated estimations of prediction intervals. We thus consider the lidar-based forecast a valuable addition to the benchmark persistence during unstable situations.

Forecast horizon and scanning trajectory
In this work, we developed a 5-minute-ahead power forecast. In order for remote sensing-based forecasts to be useful for power grid balancing and electricity trading, the forecast horizon needs to be extended further (Würth et al., 2019). The accuracy of the lidar-based forecasts is expected to decrease with increasing lead times; however, Würth et al. (2018) found the accuracy of the state-of-the-art persistence to decrease faster. Lidar-based forecasts thus have the potential to bridge the gap between persistence and hour-ahead forecasts.
Small lidar systems suitable for offshore campaigns typically reach measurement distances from 8 km to a maximum of [...]

A large disadvantage of single-Doppler lidar data is the need to exclude a critical region with 75° < |ϑ − χ| < 105° as a consequence of the VAD fit (Section 3.1), which enhances this effect. The extent to which specific turbines are affected also depends on wind direction. Additionally, an inaccurate adjustment of the scanning trajectory to the wind direction can reduce data availability.
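The exclusion of the critical region can be sketched as a simple mask over the beam azimuth angles. The sketch assumes that ϑ is the beam azimuth and χ the wind direction, both in degrees, and that the difference is wrapped so that beams on either side of the wind direction are treated symmetrically; the function name is hypothetical.

```python
import numpy as np

def critical_region_mask(azimuth_deg, wind_dir_deg, lower=75.0, upper=105.0):
    """True where lower < |azimuth - wind direction| < upper (degrees).

    The difference is wrapped to [-180, 180) so that beams roughly
    perpendicular to the wind on both sides are flagged.
    """
    diff = (np.asarray(azimuth_deg, dtype=float) - wind_dir_deg + 180.0) % 360.0 - 180.0
    abs_diff = np.abs(diff)
    return (abs_diff > lower) & (abs_diff < upper)

# Westerly wind (270 deg): beams pointing north and south fall in the excluded sector
azimuths = np.array([270.0, 0.0, 180.0, 90.0, 350.0])
print(critical_region_mask(azimuths, 270.0))
```

Measurements flagged by the mask would be discarded before the wind field reconstruction, which is why turbines lying in such sectors for a given wind direction receive fewer wind vectors.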
Furthermore, the long duration of the scans in this analysis caused the need for a time synchronisation and reduced the achievable forecast horizon after the end time of the scan significantly (Section 3.3). Possibilities to reduce the scan time are i) an increased scanning speed, which reduces the maximum measurement distance, ii) a lower azimuthal resolution, which introduces errors to the wind field interpolation especially for far range gates, and iii) a reduced total azimuth spanned, which further reduces forecast availability and quality for some of the target turbines. To make reliable statements regarding the optimal lidar position and scanning trajectory, a more detailed analysis of forecast quality for different wind directions and scan geometries is necessary. This should also include a study on the effect of reduced scanning time on forecast skill.

Uncertainty estimation and data availability
We already mentioned the errors attributed to the extrapolation of wind speeds to hub height, namely uncertainties in stability and wind profile estimation as well as an inaccurate determination of measuring height. While measuring at hub height would reduce the need for a wind speed extrapolation, it would introduce new challenges, such as the correction for significant stationary and dynamic inclination of the scan plane due to the flexibility of the wind turbine tower and its dynamic excitation. In our case study, we had to correct for wind-speed-dependent platform inclination despite the fact that the lidar was positioned on a comparably stiff platform on a tripod foundation of the offshore turbine at GT I.
We further have to consider errors during the wind field reconstruction, including the estimation of global wind direction by means of a VAD fit, assuming a homogeneous flow and neglecting the vertical wind component. Those wind direction errors, uncertainties in azimuth, elevation and range gate of the lidar system, as well as errors of the measured line-of-sight velocities, all contribute to the uncertainties in the estimation of the horizontal wind field. The use of dual-Doppler instead of single-Doppler data would allow for a more accurate estimation of horizontal wind speed components and would likely decrease the associated errors significantly. We further expect the propagation of wind vectors by means of their local wind speed and direction, both assumed constant along the entire trajectory, to introduce some uncertainties, enhanced by the errors assigned to its input parameters. Another large contribution to the overall forecast error is the transformation from wind speed to power values, as uncertainties in wind speed are magnified due to the cubic nature of the power curve. As discussed earlier, we found the above-mentioned uncertainties to depend not only on the lidar set-up but also on atmospheric conditions. Detailed knowledge of the forecast uncertainty is important to be able to further assess the possibilities and limitations of the proposed method and to reduce sources of error.
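The magnification of wind speed uncertainties by the power curve can be made explicit with a first-order error propagation. The following is a sketch under the assumption of an idealised power curve P = c u^3 below rated power, where c is a hypothetical constant lumping air density, rotor area and power coefficient; a real power curve deviates from this form, in particular near cut-in and rated wind speed.

```latex
P(u) = c\,u^{3}
\quad\Rightarrow\quad
\frac{\mathrm{d}P}{\mathrm{d}u} = 3\,c\,u^{2}
\quad\Rightarrow\quad
\frac{\delta P}{P} \approx 3\,\frac{\delta u}{u}
```

Under this assumption, a relative wind speed error of 5 % translates into a relative power error of roughly 15 % (exactly 1.05^3 − 1 ≈ 15.8 %), which illustrates why the power error grows with power even when the wind speed error does not grow with wind speed.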
The variety of uncertainties associated with the model emphasises the importance of the probabilistic approach as it allows us to account for some of them. The area of influence (AoI) hereby plays a crucial part in determining the probabilistic forecast. The AoI estimated in this case study is five times smaller than the one Valldecabres et al. (2018a) defined in their work, despite applying the same methodology. We explain this by the many factors influencing the crps and consequently the AoI, i. e. the lidar wind field, the SCADA time series and the number of wind vectors available to be propagated. The difference in AoI suggests that it needs to be determined individually for each data set.

As already mentioned, the VAD fit caused the estimated wind direction to be constant across azimuth angles and to vary only with range gate. This likely had an impact on the individual wind vectors reaching the area of influence. The uniform wind direction within each range gate restricted the area from which vectors could be propagated to the target turbines. We assume this led to a misestimation, most likely an underestimation, of the spread of the observed wind speeds. Consequently, it is anticipated that the spread of the forecasted wind power distribution is too small. When using dual-Doppler measurements, wind directions could be determined individually for each measurement point and the forecast's distribution represented more accurately.
Another limitation of the LF is its need for high data availability. Lidars send out laser pulses and use the backscattered signal to estimate wind speed. If not enough or too many aerosols are in the air, the signal becomes noisy (Newsom, 2012). That means that, e. g. during rain and fog, no accurate lidar measurements will be available and no forecast can be generated. One solution might be the development of a hybrid method that does not solely depend on the availability of lidar measurements.
We developed a methodology to forecast wind power of individual wind turbines on very short time horizons based on single-Doppler long-range lidar scans, as a feasible alternative to existing remote-sensing based forecasts that is applicable to far offshore sites. The work is based on a probabilistic forecasting model developed for dual-Doppler radar measurements. It was extended to include a dynamic filtering approach, a time synchronisation of the lidar scans and an extrapolation of wind speeds to hub height. The model was tested in a case study at the offshore wind farm Global Tech I. Here, we predicted wind power of seven free-flow wind turbines with a 5-minute horizon. The lidar-based forecast was able to predict wind turbine power skillfully compared to the benchmark persistence during unstable atmospheric conditions, as long as sufficient wind field information was available in the region from which the wind vectors were propagated to the turbine of interest. During stable and neutral conditions the forecast quality was reduced. We mainly attribute this to higher uncertainties in the wind speed extrapolation to hub height during stable conditions, as a consequence of the nature of the stability-corrected logarithmic wind profile. To outperform persistence for stable situations, a more accurate description of the wind profile, e. g. using reliable meteorological information, is required.
Future work aims to include the modelling of wake effects in the forecast, so that power can be forecast not only for free-flow turbines.
Data availability. Lidar data could be made available on request. GT I SCADA data is confidential and therefore not available to the public.
Author contributions. Frauke Theuer performed the main research and wrote the paper. Marijn Floris van Dooren contributed to the scientific discussion, the outline and review of the manuscript. Lueder von Bremen and Martin Kühn supervised the research, contributed to the scientific discussion, the research concept and the outline and thorough review of the manuscript.
the German Federal Environmental Foundation (DBU) as this project receives funding within the scope of their PhD scholarship program.
We thank Jörge Schneemann and Stephan Voß for conducting the measurement campaign and supporting our lidar data analysis and Andreas Rott for his help characterising the lidar misalignment.