Parameterization of Wind Evolution using Lidar

Wind evolution refers to the change of the turbulence structure of the eddies over time while the eddies are advected by the main flow over space. With the development of the lidar-assisted wind turbine control, modelling of the wind evolution becomes an interesting topic, because the control system should only react to the changes in the wind field which can be predicted accurately over the distance to avoid harmful and unnecessary control action. This paper aims to achieve a parameterization model for the wind evolution model to predict the wind evolution model 5 parameters according to the wind field conditions. For this purpose, a two-parameter wind evolution model suggested in literature was applied to model the wind evolution and the wind evolution was estimated using lidar data. A statistical analysis was done to reveal the characteristics of wind evolution model parameters. Gaussian process regression was applied to achieve the parameterization model. The results have proven the applicability of Gaussian process regression model to predict the wind evolution model parameters with sufficient accuracy. 10


Introduction
Wind evolution refers to the physical phenomenon that the turbulence structure of the eddies changes over time while the eddies are advected by the main flow over space. The mathematical measure of the "change" of turbulence structure is usually defined as magnitude-squared coherence (hereinafter for brevity, also referred to as coherence) between two time series data sets of the turbulent velocity with certain time shift. Coherence is a dimensionless statistic in the frequency domain that can describe 15 the correlation between two signals or data sets, taking value between 0, for no correlation, to 1.0, for perfect correlation.
Whereas, Taylor's hypothesis is a special case that assumes the pattern of turbulent motion remains unchanged while the eddies are carried by the main flow (Taylor, 1938). In other words, Taylor's hypothesis assumes a perfect correlation, which means the coherence is 1.0 over the whole frequency range.
The research on wind evolution dates back to 1970s. Pielke and Panofsky (1970) attempted to generalize some of the 20 mathematical description of the horizontal variation of turbulence characteristics. The final goal at that time was to figure out an empirical model of the four dimensional space-time structure of turbulence. In Pielke and Panofsky's work (1970), the coherence model suggested by Davenport (1961) to describe the correlation between horizontal wind components at different heights, also known as Davenport Geometric Similarity, was extended into other wind components and separation directions. points simultaneously, which is a great advantage for studying the dependency of wind evolution on separation compared to meteorological tower. Some lidar data from two measurement campaigns undertaken in different terrain types are available.
If any data of the meteorological tower is also available in the corresponding measurement campaigns, it is involved in the analysis to provide a comparison.
This study is carried out as follows: After the preliminary data processing, wind evolution is estimated with lidar data 70 and fitted to a wind evolution model to obtain the model parameters. The distributions of the obtained wind evolution model parameters are analyzed to figure out some common characteristics. Then, the parameterization model is achieved by training a Gaussian Process Regression (GPR) model. The Automatic Relevance Determination Squared Exponential kernel (ARD-SE kernel) is applied to estimate the relative importance of the different wind field condition variables and to select the suitable input variables for the GPR model. Finally, the performance of the parameterization model is evaluated with a 5-fold cross-75 validation.
The present paper is organized as follows: Section 2 briefly explains the theoretical basis of wind evolution and the concept of the parameterization model as well as the principles of the applied methods in this work; Section 3 introduces the involved measurements and the procedure of the data processing; Section 4 presents the results of the statistical analysis of the wind evolution model parameters; Section 5 illustrates the process of the model training and the evaluation of the obtained 80 parameterization models; Section 6 summarizes the presented results again and gives the conclusions and outlook of this study.

Methodology
This section first explains the mathematical definition of wind evolution in Sect. 2.1. Then, the model concept for the prediction of the wind evolution and the corresponding workflow of this study are presented in Sect. 2.2. After that, the wind evolution model applied in this work is introduced in Sect. 2.3. Finally, the details of the workflow are introduced and discussed in Sect. 85 2.3-2.7.

Wind Evolution
As mentioned in the introduction, wind evolution is mathematically defined as the magnitude-squared coherence between two wind speed signals i and j measured at two points separated in longitudinal direction, with i for the signal measured at upstream where S ii (f ) and S jj (f ) represent the power-spectral densities (PSDs) of signals i and j, respectively, and S ij (f ) represents the cross-spectral density between i and j. It must be emphasized that the coherence corresponds to a lagged correlation, which means the signal j should be shifted by the travel time ∆t that the signal i is expected to arrive at the downstream point for calculation of the coherence.

Concept and Workflow
It is aimed to predict the wind evolution, i.e. the coherence, but it is not possible to predict every point of the coherence curve. Therefore, a model is needed to describe the coherence with limited parameters, which then can be predicted by a parameterization model, i.e. a surrogate model. That is the wind evolution model. Figure 1 shows the prediction concept of this work. The idea is first to predict the model parameters of the wind evolution model according to the wind field conditions by 100 a parameterization model. Then, the coherence can be reconstructed with the wind evolution model using the predicted model parameters.
Based on this concept, the study is carried out as follows (see Fig. 2): 1) estimation of the coherence using lidar data; 2) determination of the wind evolution model parameters by fitting the estimated coherence to the wind evolution model; 3) calculation of the potential predictors from the measured data; 4) training the Gaussian process regression model. More details 105 are introduced in Sect. 2.3-2.7.

Wind Evolution Model
As explained above, it is very important to choose a wind evolution model that can match the trend of the coherence with only several simple parameters. A two-parameter wind evolution model, similar to the Simley's model (2015), is used in this study. Following the theoretical considerations by Ropelewski et al. (1973), the coherence decreases exponentially with increasing 110 travel time ∆t of the signal with respect to "eddy turnover time" τ : where the term C represents the decay behaviour of the coherence depending on the time ratio. C is here virtual which can be a constant, a linear function, and even more complicated term.
τ is a time scale associated with the characteristic eddy size λ and characteristic velocity of turbulence which is approximated 115 by the standard deviation of wind speed σ as following: This expression implies that eddies are supposed to decay faster under a high turbulence. Given the same degree of turbulence, large eddies are supposed to take longer time to decay.
The eddy size λ is linked to the frequency f and the flow mean wind speed U with this relation: Combining Eq.
(2)-(4) and introducing the second parameter in the model, as inspired by Simley's model (2015), the wind evolution model is finally defined as: where the decay parameter a, which summarizes the term C and the other deduced terms, represents the decay effect of the 125 coherence, and the offset parameter b is used to adjust the intercept (coherence for 0 frequency) of the modeled coherence curve.
The intercept equals exp(−|b|). The both wind evolution model parameters are dimensionless. The benefit of introducing the offset parameter b is shown in Sect. 3.2. The travel time ∆t is determined by the time lag of the peak of the cross-correlation between two signals, indicated as ∆t maxcorr . In some previous studies, ∆t is approximated by d U using Taylor's hypothesis (d is separation), indicated as ∆t Taylor . Considering the effect of induction zone of the wind turbine, this approximation is not 130 applied in this study. The term f · ∆t is dimensionless, and thus is defined as dimensionless frequency f dless .
To give an overview of the wind evolution model, Fig. 3 shows the theoretical curves calculated with different values of a and b as example.

Estimation of Coherence using Lidar Data
In this work, the coherence is estimated with lidar data because lidar can provide more data sets with respect to different 135 measuring separations for the estimation of the coherence, which is not easy to obtain from data of meteorological towers. And the prediction of the coherence is mainly expected to be applied coupled with the deployment of a lidar, e.g. in lidar-assisted wind turbine control.
A Doppler wind lidar is a remote sensing device measuring wind speed based on the optical Doppler effect. Lidar emits laser pulses and detects the Doppler shift in backscattered light from the aerosol particles in the atmosphere that are entrained with 140 the wind. The Doppler shift is proportional to the line-of-sight wind speed, i.e. the wind speed projected on the laser beam, and thus can be used to estimate the line-of-sight wind speed. The measurement principle of Doppler wind lidar is explained in many publications (see e.g. Weitkamp, 2005;Peña et al., 2013;Liu et al., 2019) and thus is not introduced here in detail.
However, it must be emphasized that the coherence estimated with lidar data is deviated from that estimated with data of point measurements, e.g. from two ultrasonic anemometers. There are several reasons for that: 1) The sampling rate of a lidar 145 is generally much lower than that of an ultrasonic anemometer; 2) Lidar does not measure the wind speed at a point but the averaged wind speed over the measurement volume along the laser beam because of its measurement principle, which is the so-called spatial averaging effect of lidar; 3) Lidar can only measure the wind speed projected on its emitted laser beams, i.e. the light-of-sight wind speed. The influence of these three aspects is discussed as follows, specifically considering the case of starring mode of lidar:

150
Low sampling rate of lidar. According to the sampling theorem (oppenheim et al., 1997), the upper frequency limit of a signal transformed from the time domain into the frequency domain is the half of the sampling frequency. As long as the sampling rate of lidar is sufficiently high to acquire a complete coherence curve, it will not have a noticeable effect on the study of the coherence. To obtain as high sampling rate as possible, it is decided to select the data of the staring mode to calculate the coherence. The staring mode of lidar generally means the lidar measures the wind speed with a single laser beam pointing 155 towards a fix direction. Specifically in this work, the laser beam points to the upstream of the wind turbine horizontally.
Spatial averaging effect of lidar. The spatial averaging effect can be modelled with a moving window averaging weighted by a Gaussian-like shape function centered at the measurement point.
Consider a pulsed lidar (only pulsed lidars are involved in this work), the weighting function w(x) is an even function centered at every measurement point along the laser beam. The lidar measured wind speed at the measurement point x 0 for any 160 instant can be modelled with: where u p (x) is a function of the point measurement of the wind speed with respect to the x-axis aligned with the laser beam of the lidar.
According to the convolution theorem (oppenheim et al., 1997), the following relationship is valid for the Fourier transfor-165 mation between space and wavenumber domain: where F{ } is the Fourier transform operator.
Follow Eq. (1), the coherence estimated with lidar data, indicated with the subscript "l", is:

170
where S ii,l (f ) and S jj,l (f ) are the auto-spectrum at the point i and j, respectively, S ij,l (f ) is the cross-spectrum between i and j, and f is the frequency in Hz. They are all estimated with lidar data. The auto-spectrum is: where u i,l (t) is the time series of the wind speed at i, and the symbol * means conjugate. And the cross-spectrum is:

175
Assume the laser beam is aligned with the wind direction and the Taylor's hypothesis is valid, Eq. (7) is also valid for the Fourier transformation between time and frequency domain by applying t = x U , U is mean wind speed. Thus, Eq. (8) can be written as (t and f are omitted for clarity): Because the function w(x) is real and even, according to the conjugate symmetry of the Fourier transformation (oppenheim This means that the spatial averaging effect does not influence the coherence if the above-mentioned assumptions are fulfilled. Misalignment of wind direction and lidar measurement. The above derivation is based on an important assumption that 185 the laser beam is aligned with the wind direction. This will not always be fulfilled in the reality even for a nacelle based lidar conducting the starring mode. Figure 4 shows a sketch for a case of misalignment of wind direction and lidar measurement. The coherence estimated with lidar data is γ 2 12 . The coherence of the two longitudinal wind components at the point 1 and 2 and that of the corresponding line-of-sight wind speeds are the same when the both angles are the same, because the angles only introduce a same constant 190 in the respective wind speed time series. However, γ 2 12 is no longer the pure longitudinal coherence only with respect to the longitudinal separation but the horizontal coherence (Panofsky and Mizuno, 1975), combined the effects of the longitudinal coherence γ 2 23 and the lateral coherence γ 2 13 . To retrieve the longitudinal coherence in this case, the above discussed spatial averaging effect must be coupled to a specific turbulence model (Schlipf, 2015;Mann et al., 2009), and thus the wind evolution model is included in the final model implicitly.

195
In this study, it is decided to develop the parameterization model based on the horizontal coherence for the following reasons: First, a standalone parameterization model, independent from any turbulence model, is desired for more flexibility in application. Second, the determination of the model parameters of an implicit wind evolution model is complicated and it is a must to acquire the angle between the wind direction and the laser beam, which is not always possible in application, especially when considering lidar as the only data source. Since the prediction concept is expected to be applicable for different 200 data availability. It is not desired to make the estimation of coherence depend on a variable which is not always guaranteed to be available. Third, consider the case for a nacelle based lidar, the presence of the misalignment of the lidar measurement means the wind turbine misalignment exists as well. In this case, it makes sense to predict the corresponding horizontal coherence.  Whereas, the variation in the horizontal coherence caused by the direction misalignment is attributed to the misalignment angle, which is considered as one of the predictors (see Sect. 2.5).

205
Certainly, if the measurement of the direction misalignment is available in a given application scenario, the prediction concept can be easily adjusted by changing the model used in the second step in the workflow (see Sect. 2.2) to obtain the wind evolution model parameters.

Potential Predictors
Three groups of wind field condition variables are supposed to be relevant to wind evolution, and thus will be the potential 210 predictors of the surrogate model. A discussion about these variables is presented as follows: Wind statistics. As explained in Sect. 2.3, wind evolution is supposed to be correlated with the degree of turbulence or rather turbulent kinetic energy. Therefore, standard deviation σ, turbulence intensity I T , integral length scale L, and integral time scale T are considered as representative variables. The integral time scale T is defined as (Pope, 2000):

215
where ρ(s) is the autocorrelation function. In this study the integral time scale is computed by integrating the autocorrelation up to the first zero crossing. Since it is not possible to calculate the integral length scale L based on its definition, it is approximated with the integral time scale T : Moreover, it is interesting to study the influence of other statistical characteristics on wind evolution, such as the third 220 standardized central moment skewnessμ 3 , a measure of the asymmetry of the distribution of the wind speeds in a data block, and the fourth standardized central moment kurtosisμ 4 , a measure of the "tailedness".
Atmospheric stability. The atmospheric stability represents a global effect of the boundary layer on the wind field, and thus it is considered as an influence factor on the wind evolution. The atmospheric stability is usually characterized with Monin-used as stability indicator. Because it is found a functional relationship between the dimensionless height ζ and the gradient Richardson number for flat terrain (Businger et al., 1971), these two variables are supposed to be equivalent for representing the influence of stability in the surrogate model. The dimensionless height ζ is defined as (Businger et al., 1971): where κ is von Kármán constant, g is gravitational acceleration, z is the measurement height, θ is average of potential temper-230 ature, u * is friction velocity, and w θ v is covariance of perturbations of vertical velocity and virtual potential temperature.
Relative position of the measurement points. The distance between the two measurement points d is likely to play an important role in the wind evolution because it determines how far the eddies travel, and thus how likely or to what extent the local terrain changes. So as the travel time ∆t. For prediction, it is not possible to obtain ∆t maxcorr . Therefore, the travel time approximated using Taylor's Hypothesis ∆t Taylor is considered as predictor.

235
As discussed in Sect. 2.4, the angle between the lidar beam and the main wind direction α is associated to the influence of the lateral coherence on the horizontal coherence, and thus is considered as a relevant variable. And it is important to analyze to what extent the horizontal coherence would depend on this angle.
It is emphasized that the variables discussed above are "potential" predictors of the parameterization model according to physical consideration. Further analysis needs to be done to determine which of them should be selected for model training, 240 which is called feature selection. The principle of feature selection is to figure out variables which provide best predictive power (accounting to most of the variation of the model responses) and ideally, these variables should be independent from each other to prevent over-fitting in model training. Since one of the objectives is to figure out the necessary predictors under different data availability, different combinations of predictors are discussed in Sect. 5.
The Table 1 lists the notations of the potential predictors used in this work. These variables are acquired from data of lidar 245 and/or ultrasonic anemometers according to the data availability of the measurement campaign. This is distinguished with subscripts: "l" for lidar and "s" for ultrasonic anemometer. For example, U l means mean wind speed calculated with lidar data while U s means that calculated with data from ultrasonic anemometer (hereinafter for brevity, also referred to as sonic data).

Gaussian Process Regression
In the preliminary study it is found out that the Gaussian Process Regression (GPR) overall performs the best for the prediction 250 of the wind evolution model parameters (Chen, 2019). Therefore, in the present work, attempt is made to further analyze the applicability of the GPR model for this purpose taking into account different measurement scenarios. This section briefly introduces the principle of the Gaussian process regression (GPR) and the most important model hyperparameters which modify the behaviour of a GPR model. The model training is done using Matlab Statistics and Machine Learning Toolbox.
The principle of GPR. Think of making a regression model from some data. A very intuitional approach is fitting the data 255 to certain types of function, e.g. linear function or polynominal function. However, this requires an initial guess about the underlying functions of the data, which is very difficult in this case, because the wind evolution model parameters do not µ3 skewness of wind speed [-] µ4 kurtosis of wind speed [-] IT turbulence intensity [-] α angle between lidar beam and main wind direction [°] indicate any clear dependency on the potential predictors. The reasons could be multiple: first, the data could be noisy; second, the dependency could exist in multidimensional space which is not possible to be observed in a single dimension, etc. Under this circumstance, GPR turns out to be a good choice because it is non-parametric probabilistic model, which means the model 260 is not a "fix" function but a probability distribution over functions. The principle of the GPR is based on the Bayesian inference.
The prior distribution over functions, which can be understood as a guess about what kind of functions could be present without knowing the data, is specified by a particular Gaussian process (GP) which favours smooth functions. In the training process, as adding the data, the distribution over functions is adjusted in the way that the probability of the functions which do not agree with the observations will be decreased, which gives the posterior distribution over the functions (Rasmussen and Williams,265 2006).
Hyperparameters of GPR. The behaviour of a GPR model is defined by its hyperparameters. To introduce the hyperparamters, a basic explanation in mathematical aspect is given following Rasmussen and Williams (2006).
The GPR is based on the Bayesian inference. The Bayesian linear regression model with Gaussian noise is defined as: 270 where x is the input vector of different parameters, φ(x) is the function which maps the input vector into a higher dimensional space where the Bayesian linear model is applicable, w is the weight vector of the linear model, f (x) is the function value, y is the observed value, and ε is independent identically distributed Gaussian noise with zero mean and variance σ 2 n : The Bayesian linear model is a GP given the prior of w is Gaussian distributed with zero mean. Since a GP is fully specified 275 by its mean and covariance, it is written as: where X is the aggregation of all input vectors. This is the prior distribution over functions. The presence of ε shows another advantage of the GPR that it is able to inherently assume noisy observations and take into account this effect in the model. σ n is one of the hyperparameters.

280
It is common but not necessary to assume GPs with zero mean function. The use of basis functions makes it possible to specify a non-zero mean over functions. So, the model can also be assumed as: where h(x) are a set of basis functions and β are the corresponding coefficients. Basis function is one of the hyperparameters.
The coefficients β are estimated with training data.

285
The covariance of the function values is usually acquired through a kernel function: There are two types of kernel functions: one is kernel functions with same characteristic length scale for each predictor; the other is that with separate characteristic length scale. The latter is called Automatic Relevance Determination kernel functions which can be used to select predictors. Kernel function and its characteristic length scale are hyperparameters of GPR model.

290
In this work, the Automatic Relevance Determination Squared Exponential kernel function (ARD-SE kernel) is applied.
ARD-SE kernel function is basically the squared exponential kernel function with a separate characteristic length scale σ m for each predictor m (m is the index of predictors). For any pairs of observations i, j, the ARD-SE kernel function is defined as: where σ 2 f denotes the signal variance, which determines variation of function values from their mean. The length scale σ m im-295 plies the sensitivity of the function being modeled to the predictor m. A relatively large length scale indicates a relatively small variation along the corresponding dimensions in the function, which means these predictors are less relevant in comparison to the others (Duvenaud, 2014).
In the end, the key predictive equation for GPR can be derived by conditioning the joint Gaussian prior distribution on the observations: where the subscription * denotes test data. f * represents f (X * ) for convenience.
To summarize, the hyperparameters defining the GPR model are the basis function h(x), the noise standard deviation of the Gaussian process model σ n , the kernel function K(x i , x j ), the standard deviation of the function values σ f , and the length scale in the kernel function σ m . These hyperparameters can be tunned in the training process to achieve a better model.

Model Validation
The trained model is evaluated with a k-fold cross-validation, in which the data is divided into k disjoint, equally sized subsets.
The model validation is done with one subset (also called in-fold observations) and the training is done with the rest k-1 subsets (also called out-of-fold observations). This procedure is repeated k times, each time with a different subset for validation. The predicted target values and the goodness-of-fit measures of the regression models are computed for in-fold observations using 310 a model trained on out-of-fold observations. In this study, 5-fold cross-validation was applied.
The model performance is evaluated with two goodness-of-fit measures: the Root Mean Square Error (RMSE) and the coefficient of determination (R 2 ), which are defined with Eq. (23) and Eq. (24), respectively.
In Eq. (23) and Eq. (24) y and y pred denote the observed target values and the predicted one, y denotes the average of the observed target values, N denotes the number of observations.
It is worth mentioning that according to this definition R 2 can be understood as taking prediction with mean value of the observations as reference to evaluate the model performance. In this case R 2 ranges from −∞ to one, for perfect prediction.

320
R 2 yields zero if the prediction is simply made with the mean value of the observations. The higher the R 2 is, the better the model performs. A negative R 2 indicates that the prediction with the selected model is even worse than the prediction just using the mean value of the observations.

Data Processing
This section first introduces the data sources in Sect. 3.1 and then explains the procedure for the determination of the wind 325 evolution model parameters in Sect. 3.2.

Data Source
In this study, measured data from two research projects are involved. The reasons for using two different data sources are on the one hand to find commonality in two different measurements and avoid accidental conclusions, and on the other hand, to study whether there are differences or what kind of differences in the wind evolution can be observed. The relevant research 330 projects as well as the measurement campaigns are briefly introduced as follows: LidarComplex. The research project LidarComplex was funded by the German Federal Ministry for Economic Affairs and Energy (BMWi). In this project, a lidar measurement campaign was carried out in Grevesmühlen, Germany. The measurement site is basically flat, mainly farmland with hedges and few large trees. The SWE Scanner 1.0, which has five measurement range gates (can be understood as measurement distances), was installed on a wind turbine with a roter diameter of 109 m 335 in the measurement campaign. A meteorological mast is located 295 m southwest of the wind turbine on which the lidar was installed. The SCADA data of the wind turbine is also available. More details about the measurement campaign can be found in Schlipf et al. (2015).
The data of the ultrasonic anemometer installed on 93 m height is used in this study. Instead of the meteorological coordinate system, the wind coordinate system with x-axis aligning to the main wind direction is more useful for the analysis of wind 340 evolution. Thus, the longitudinal (indicated with the subscript "x") and the lateral (indicated with the subscript "y") wind speeds are obtained by projecting the high-resolution time series wind speed on this coordinate system. The main wind direction is determined with the mean wind direction for each data block.
ParkCast. The research project ParkCast 1 is an ongoing project funded by the German Federal Ministry for Economic Affairs and Energy (BMWi). Currently, a lidar measurement campaign is being conducted on the offshore wind farm alpha 345 ventus. Two long-range lidars StreamlineXR have been deployed in the measurement campaign. The data used in this study was gathered by the one installed on the nacelle of the wind turbine AV4 measuring the inflow stream. Unfortunately, the data of the FINO1 and the SCADA data of the AV4 for the observed period was not yet available when the analysis was done.
Therefore, it has to be assumed that the wind turbine had no direction misalignment during the observed period and the lidar was always measuring along the main wind direction.

350
In comparison to ultrasonic anemometers, lidar systems have much lower sampling rate because of its measurement principle. To obtain as high sampling rate as possible, it is decided to select the measurement periods where the staring mode was conducted for the both campaigns.
Essential information about the measurements are summarized in the Table 2. Figure A1 gives an overview of the wind statistics of these two selected measurement periods by illustrating the relative frequency distribution of lidar measured wind 355 speed and lidar measured turbulence intensity. For brevity, "LidarComplex" and "ParkCast" are used to refer the selected measurements throughout the paper.

Determination of Wind Evolution Model Parameters
To obtain the wind evolution model parameters a and b, the wind evolution is estimated with lidar data and then fitted to the wind evolution model (Eq. 5). The processing procedure is described as follows:

360
Step 1: Filtering of the lidar data.
The lidar data from LidarComplex is filtered using a CNR filter with the valid range from -24 dB to -5 dB.
The lidar data from ParkCast is filtered with a range filter which is originally applied in image processing but adapted for lidar data filtering for the long-range lidar by Würth et al. (2018). In comparison to the CNR filter, the range filter removes less valid data points while ensuring the filter quality. Its approach is to detect the extreme values which exceed the threshold 365 within certain number of adjacent data points defined by the window size. A range filter is applied to check the line-of-sight wind speed of lidar and its standard deviation. The thresholds for both are 6 m/s and 3 m/s, respectively. The window size is set as three data points.
The lidar data is divided into 30-min blocks. This is also consistent with the common used period for calculation of the 370 Obukhov length. Only the data blocks with more than 80% valid data points are used to estimate the coherence. The missing values are interpolated with the shape-preserving piecewise cubic interpolation. And the missing end values are replaced with their nearest value.
The data measured at different range gates (i.e. measurement distance) is paired in the way shown in Fig. 5 to obtain as many samples (i.e. data blocks) as possible. The pairing has C 2 N possibilities (N is the number of the lidar range gates).

375
The travel time of the wind field is approximated with the time lag at the maximum of the cross-correlation ∆t maxcorr between these two wind speed signals. The upstream point is always regarded as reference point. The data measured at the downstream is shifted by ∆t maxcorr .
The magnitude-squared coherence is estimated using Welch's overlapped averaged periodogram method using Hamming window, 24 segments, and 50% overlap.

380
The data of the reference point is used to calculate the lidar measured wind statistics.
Step 3: Fitting to the wind evolution model.
Because both lidars are installed on the top of the wind turbine nacelle and the nacelle is actually moving with certain frequencies, the focus points of lidar are thus moving as well. This movement causes excitation at certain frequencies in the estimated coherence.  Figure A2 shows a comparison between an example coherence curve and the power spectral density (PSD) of the fore-aft and in-plane tower top acceleration of LidarComplex. The excitation in the coherence conforms to that in the both PSDs and is mainly located in the frequency range higher than 0.2 Hz. To avoid the negative effect on the fitting quality caused by this excitation, the cut-off frequency is set to be 0.2 Hz and the coherence is only fitted up to this cut-off frequency.
The fitting is done through nonlinear least squares fitting using Levenberg-Marquardt algorithm. Only the data blocks with 390 R 2 of the fitting higher than 0.8 are considered as valid samples.
The final filtering was done by checking the value distribution of every parameter to omit outliers. It is emphasized that outlier is not necessarily wrong data. For some cases, outlier is just sample collected from the value range where not enough samples have been collected. Though, it is very important to filter the outliers properly because it is difficult for the regression 395 model to capture the relationship for those value ranges with few samples. The outliers are determined as the data with value higher than the 99 th percentile of the data. Figure 6 is an example plot of the data block 07 Dec. 2013 12:00-12:30 from LidarComplex. This data block is selected as a representative case study example for LidarComplex and is referred througout the paper. The figure illustrates the estimated coherence between different range gates and the corresponding fitted curves. The shaded areas show the cut-off frequency of 400 0.2 Hz is reasonable for this case. A similar plot from ParkCast is found in Fig. A3. Because the sampling rate of ParkCast is lower, the excitation by the nacelle top movement is not observed in the coherence, and thus no cut-off frequency was set for ParkCast data.
In Fig. 6 (c) and (d) the intercept of the coherence is much lower than one even though the separation is not very large.
This confirms the necessity of choosing a wind evolution model which is able to define different offset values depending on 405 the conditions. Indeed, in comparison to the fitting quality of Pielke and Panofsky's model which merely contains a single parameter -the decay parameter a, the fitting quality of the wind evolution model (Eq. (5)) is overall better (see Fig. A4).
The R 2 of the fitting to Eq. (5) is almost always higher than that of the fitting to the Pielke and Panofsky's model. The wind evolution model used in this work (Eq. (5)) is thus proven to be able to model the coherence better.

Distribution of the Wind Evolution Model Parameters
To study the overall characteristics of the wind evolution, the value distributions of the wind evolution model parameters for the both measurements are displayed in Fig. 7.

415
As listed in Table 2, there are two main differences between the lidar settings in the both measurements: sampling rate and measurement range, which might affect the distributions of the wind evolution parameters. To enhance the comparability of the both distributions, two special post-processings are executed correspondingly. Firstly, because the lidar sampling rate of LidarComplex is approximately as three times as that of ParkCast, an artificial data set is made for LidarComplex by averaging every three data points of the original lidar data to simulate the lidar data as if it were measured with a similar sampling rate as 420 that of ParkCast, so that the distributions of the both measurements can be compared. The fitted probability density function (PDF) of the wind evolution model parameters determined with this data set are plotted as yellow dashed lines in Fig. 7 (a) and (b). The comparison between the fitted PDF of the original data and that of the data with reduced sampling rate indicates that the lidar sampling rate only very slightly affects the wind evolution model parameters, or perhaps more accurately, the estimated coherence. Hence, the different sampling rates of the both measurements do not account for the differences between the both 425 cases observed in Fig. 7. Secondly, because of the limited measurement range of LidarComplex the maximum separation between two range gates only reaches 109 m, while that of ParkCast reaches more than 700 m. To make them comparable, Fig.   7 (c) and (d) merely show the wind evolution parameters calculated from the coherence with separation smaller than 120 m of ParkCast.
Apart from that, the measurements were carried out in different environments (onshore and offshore), at different times of 430 the year (which would impact atmospheric stability), and have different wind speed and turbulence intensity distributions (see The corresponding fitted parameters of the PDFs (orange curves) are displayed in Table 3. It is interesting to observe that the peak of the probability density is located around a = 1.8 for the onshore case LidarComplex, while around a = 0.8 for the offshore case ParkCast. Moreover, the medians of a are approx. 2.0 and 1.5 for LidarComplex and ParkCast, respectively. The 440 mean (see µ in Table 3) and median of a as well as its value of the peak location of the PDF of LidarComplex are all higher than that of ParkCast. This indicates that the coherence under similar separation generally decays faster in an onshore location than an offshore location. In terms of b, most of the values is near 0 and values higher than 0.1 are not often observed. Therefore, the y-axes in Fig. 7 (b) and (d) are shown in logarithmic manner to make the part of higher value of b visible. However, there is no significant difference of b between both cases observed in the figure.

445
It is not yet possible to explain the physical relationship between the wind evolution model parameters and the abovementioned PDFs and the physical meaning of the corresponding PDF parameters. To verify whether the above discussed phenomena commonly exist in wind evolution, further research involving more different measurement campaigns is necessary.
At this point, a hypothesis is made that the values of a and b might follow an inverse Gaussian distribution and a Gamma distribution, respectively. The corresponding PDF parameters probably depend on the terrain types on the one hand. It is 450 not clear if the roughness length would be the suitable parameter to quantify the influence of the terrain type on the value distribution of wind evolution model parameters. To figure out a concrete relationship between the PDF parameters and the terrain types, again, it is necessary to involve more measured data gathered from different terrain types. On the other hand, The notations µ, λ, k, θ are independent from the other notations out of the table.  ParkCast (see Fig. A5), which proves these conclusions are not accidental.
To further study the dependency of the wind evolution model parameters on the measuring separation, the box plots of the wind evolution parameters grouped by the measuring separations are given in Fig. 9. Although the ranges of the measuring separation of the both measurements are very different, similar trends can still be observed from the box plots. The decay 470 parameter a shows a decreasing trend with increasing measuring separation. This decreasing trend of a gradually stops when the separation approaches 300 m as observed in Fig. 9 (c). The offset parameter b shows an increasing trend with increasing measuring separation. An increasing b implies a decrease of the offset of the coherence curve. This is consistent with the phenomena observed from Fig. 8 and Fig. A5.
The decay of the coherence is supposed to result from the evolution of the turbulence eddies with respect to the travel time.

475
The dependency of the decay parameter a on the measuring separation, or rather the travel distances, actually reveals the dependency of a on the travel time. Figure 10 shows the correlation between a and the travel time approximated by ∆t maxcorr of ParkCast. The fitted curve represents a negative correlation trend between them. This implies that the decay rate of the coherence decreases with an increasing travel time. The fitting is done through nonlinear least squares fitting with Levenberg-Marquardt algorithm.

Parameterization Model
This section first presents the training procedure of the GPR model with application of the ARD-SE kernel to select the suitable predictors in Sect. 5.1. Followed is a discussion about the predictor selection in Sect. 5.2 and an evaluation of the model performance of the GPR models for the prediction of the wind evolution model parameters in Sect. 5.3.

485
The model training is a two-step process. In the first step, all the potential predictors are included in a preliminary model training to determine the characteristic length scale σ m for each predictor (see Eq. 21). As explained in Sect. 2.6, a relatively large log(σ m ) indicates that the corresponding predictor is less relevant to the model, and thus is not a suitable or necessary predictor for the model, and vice versa. Figure 11 illustrates a comparison among the log(σ m ). In the second step, the predictors are selected according to different preset limits of the log(σ m ) considering different cases of application or data availability, 490 e.g. if only lidar data is available or if sonic data is also available. The models are trained again using the corresponding selected predictors with application of a 5-fold cross validation. The corresponding RMSE (see Eq. 23) and R 2 (see Eq. 24) are calculated and used as criteria for the evaluation and comparison of the model performance.
The initial settings of the GPR model training is listed in Table 4. The setting of "exact GPR" means a standard GPR is applied in the fitting and prediction process. In contrast, the GPR can be approximated using different methods to reduce the 495 computational time for large amount of training data. The initial values of σ n , σ f , and σ m listed in the table are used to initiate the training. The final values are determined by the training data. The standardization is done by centering and scaling the data of each predictor by its mean and standard deviation, respectively, which gives the standard scores (also called z-scores) of the predictor data. When the standardization is activated, the training is done using the standardized predictor data.

500
In the predictor selection, two different situations of data availability are considered: only using variables calculated with lidar data as predictors (in both of LidarComplex and ParkCast available) and only using variables calculated with sonic data (only in LidarComplex available). The main focus of this study is to evaluate the possibility to predict the wind evolution only using lidar data. Including the situation of only using sonic data aims to provide a comparison. Table 5 shows the different predictor combinations and the corresponding limit values of the log(σ m ) for the both wind The overall performance of the GPR model is satisfactory -in all situations, the R 2 of the recommended cases are over 0.65 and for the best case can even reach 0.8 (see case 14). These results are much better than that of the preliminary study 510 (Chen, 2019), especially the prediction accuracy of the offset parameter b has been significantly improved. This is attributed to the application of the ARD-SE kernel in this study, whereas in the preliminary study, kernel functions with a common length scale for predictors were tried. Moreover, for the cases of using lidar data, the same predictors are selected for the both measurements despite the very different measurement conditions, which proves the corresponding predictor combinations are not accidental. Therefore, it is believed that the ARD-SE kernel is able to select reasonable predictors for the GPR model. As 515 shown in Table 5, the predictors with log(σ m ) lower than 1.0 are generally essential for the model. The limit of log(σ m ) can be set to 0 if the number of predictors needs to be reduced while the model performance should be kept acceptable.
The following points about predictor selection are worth discussing: Selection between two related variables. In the preliminary training, two pairs of related variables are intentionally involved at the same time: standard deviation σ and turbulence intensity I T , integral time scale T and integral length scale L. Only 520 ParkCast lidar data a 1 < 1.0 U l , σ l ,μ 3,l ,μ 4,l , L l , T l , ∆t Taylor   one of the two related variables needs to be selected as predictor if necessary, because they can be converted to each other. In terms of σ and I T , it is surprising to notice that the GPR model shows a preference for σ rather than I T , although I T is more commonly used in data analysis and simulation in wind energy. However, the situation becomes complicated in terms of T and L. The GPR model shows a slight preference for T in most of the cases of using lidar data except in the case 6-8 where L is clearly more preferred. Considering that L is actually approximated by the multiplication of T and the mean wind speed (see 525 Eq. (14)) and the relationship between σ and I T is also similar, it is possible that the GPR model generally tends to select the variable calculated from the data but not from the other variables.
Introducing higher-order wind statistics as predictors. In literature, the higher-order statistics skewness and kurtosis of the wind speed are not considered in the study of the wind evolution. However, it is interesting to observe that they are selected as predictors for all the cases of using lidar data besides the mean and the standard deviation. The comparison between the case 530 6 and 7 proves that introducing skewness and kurtosis as predictors can indeed improve the prediction accuracy of the wind evolution. This also implies that the skewness and the kurtosis probably contain additional information which can describe the state of turbulence more concretely. For example, given the same mean wind speed and standard deviation, a positive or a negative skew could indicate two different states of the turbulence, which could further be associated with other conditions. Introducing one of the targets as a predictor for the other. Since the two wind evolution parameters jointly determine the 535 shape of the wind evolution model, there is a certain correlation between them. Introducing one of them as a predictor for the other may improve its prediction accuracy. The comparison of case 9 and 10 confirms that introducing a as a predictor for b can indeed increase the R 2 of the prediction of b. This means it could be a good idea to predict the wind evolution model parameters successively rather than separately. This concept is not yet fully studied in this work, and thus case 9 is not considered as recommendation. To prove its applicability it is necessary to investigate which wind evolution model parameter 540 should be first predicted and how the prediction uncertainty of the first one would be propagated to the second one.
Prediction using sonic data. As mentioned above, the research on the cases of using sonic data as predictors is involved to provide a comparison. Figure 12 illustrates a comparison between the model performance of the recommended cases of using lidar data and sonic data. It must be emphasized that the ultrasonic anemometer is installed on a met mast located at 295 m away from the lidar. There must be a deviation between the sonic data and the data of the wind field where the coherence is 545 estimated. Despite this, the best cases of using sonic data as predictors (case 11 and 14) show a higher prediction accuracy, R 2 of about 0.8, in comparison to that of using lidar data (case 6 and 9). This implies that the upper limit of the prediction accuracy will be higher if sonic data is used. Moreover, case 13 shows a slightly better result than that of case 6, with only three predictors: the mean and standard deviation of the longitudinal component U x,s and σ x,s , and the standard deviation of the vertical component σ z,s . Think of the physical consideration of Kristensen (1979). The Kristensen's model assumes that 550 the wind evolution is determined by two probabilities: one is the probability that the eddy decays within the travel time; the other is the probability that the eddy diffuses in the transversal direction. The former could be associated with U x,s and σ x,s , while the latter with σ z,s . This could be an explanation for this predictor combination.

Model Evaluation
As mentioned above, the prediction accuracy of the parameterization model using lidar data as predictors is satisfactory, 555 showing an R 2 at least over 0.65. Section 5.2 already covers some contents of model evaluation, from the perspective of the prediction of the wind evolution model parameters. In this section, model evaluation is further discussed from other perspectives.
Firstly, it is interesting to visualize the predicted coherence in the frequency domain, combining the prediction of the both wind evolution model parameters. Figure 13 shows the predicted coherence and the corresponding 95% confidence interval by 560 the GPR models using lidar data (case 6 and 10) and sonic data (case 13 and 14) for the case study example of LidarComplex, respectively. The predicted coherence and the 95% confidence interval are reconstructed through putting the predicted value of a and b and their lower and upper bounds of the 95% confidence interval into the wind evolution model (Eq. (5)). From the figure, it can be observed that the prediction is very good because the predicted coherence is almost overlapped with the one estimated from the measured data and the 95% confidence interval is quite narrow. the predictors is indicated in the both figures. For some cases, the range of the whiskers is relatively large because of a small sample size.

575
As a result, it is proven that the Gaussian process regression is capable of achieving an accurate parameterization model for the wind evolution. However, it is worth emphasizing that the performance of any regression model is only possible to be as good as the quality of the training data. No matter what kind of regression model cannot eliminate the noisy term in the training data. And the noisy the training data is, the more uncertainty the prediction of the regression model will contain. A good data source is always very essential for training a good regression model.

Conclusions and Outlook
This paper aims to achieve a parameterization model for the wind evolution model to predict the wind evolution according to the wind field conditions. Because the concept of the lidar-assisted wind turbine control needs the aid of an accurate model for the prediction of the wind evolution to avoid harmful and unnecessary control action.
For this purpose, a two-parameter wind evolution model suggested in literature was applied to model the wind evolution.

585
Some data of two nacelled based lidars and an ultrasonic anemometer of two measurements carried out in an onshore and an offshore location were involved in the study. The wind evolution, i.e. the coherence between two measuring points, was estimated using the lidar data and the wind evolution model parameters a and b were determined by fitting the estimated coherence to the wind evolution model. The relevant wind field condition variables were derived from the data of the lidars and the ultrasonic anemometer.

590
Two main outcomes are presented: a statistical analysis of the wind evolution model parameters to reveal some characteristics of them and an investigation of the applicability of the Gaussian process regression for prediction of the wind evolution model parameters.
In the statistical analysis, the distributions of the wind evolution model parameters of the both measurements show some common characteristics, despite different wind field conditions and settings of the measurements. The value ranges of the 595 both wind evolution parameters a and b are very similar in the both measurements. The distributions of a and b seem to follow an inverse Gaussian distribution and a Gamma distribution, respectively. The fitted parameters of the probability density functions are different in the both measurements. It is thus hypothesized that the parameters of the probability density functions should depend on the terrain type. But it was not possible to analyze the influence of the atmospheric stability due to the lack of necessary data. Moreover, it was observed a strong dependency of wind evolution model parameters on the measuring 600 separation. The decay parameter a shows a decreasing trend with increasing measuring separation, while the offset parameter b shows an increasing trend with increasing measuring separation.
The applicability of Gaussian process regression to achieve the parameterization model was investigated. The trained models show satisfactory prediction accuracy under a 5-fold cross validation. The automatic relevance determination squared exponential kernel was applied to evaluate the relative importance of the predictors for the model and select the essential predictors.

605
It is confirmed that this kernel is capable of selecting the reasonable predictors under different data availability cases. Some interesting insights are given by the predictor selection, e.g. introducing higher-order wind statistics or one of the predicted targets as predictors can improve the prediction accuracy of the model. The model performance is also evaluated by a visualization of the predicted coherence and its 95% confidence interval in the frequency domain. The predicted coherence matches the coherence estimated with the measured data very well and the 95% confidence interval is relatively narrow. In addition, it 610 was analyzed whether the prediction errors is relevant to the values of the predictors. No obvious relevance can be observed.
As a result, it is proven that the Gaussian process regression is capable of achieving a parameterization model with sufficient accuracy for the prediction of the wind evolution.
For the model performance of the parameterization model, there is still space for improvement. Since the performance of any regression model is only possible to be as good as the quality of the training data, reducing the uncertainty in the training 615 data or increasing the data amount could improve the model performance. For example, methods to improve the estimation of the coherence and the wind statistics from lidar data are desired. Moreover, the predictors discussed above do not cover all the possibilities. Introducing new proper predictors could also improve the model performance. In fact, the model concept is very flexible. Any improvement of any part of the workflow can be easily integrated.
In the future, besides the ideas above-mentioned, it is interesting to involve more measurement data, especially from different 620 terrain types, to further study if the characteristics of the wind evolution found out in this work commonly exist and what are the physical principles behind them. Another question needs to be answered is if it is possible to achieve a generally the prediction accuracy of the model. Furthermore, consider the application of the parameterization model using real-time measurement data as predictors, an additional model will be needed to determine whether the current data meets the quality requirements to be input into the parameterization model. This will be one of the research focuses in the near future.
Last but not least, as mentioned above, the model concept is very flexible and the methodology can be applied in different situations. For example, for other lidar trajectories or even other measurement devices, the model concept can be modified 630 by replacing the method of the estimation of coherence. The wind evolution model and the regression model can also be changed. Basically, one can achieve a parameterization model to meet own specific requirements following the concept and the methodology presented in this paper.  and Energy (BMWi). This publication is supported by the Open Access Publications funding of University of Stuttgart. Figure A4. Comparison of R 2 of the fitting to the two-parameter wind evolution model (subscription "2par") and that of the fitting to the one-parameter one (subscription "1par").