Improving wind farm flow models by learning from operational data

This paper describes a method to improve and correct an engineering wind farm flow model by using operational data. Wind farm models represent an approximation of reality and therefore often lack accuracy and suffer from unmodeled physical effects. It is shown here that, by surgically inserting error terms in the model equations and learning the associated parameters from operational data, the performance of a baseline model can be improved significantly. Compared to a purely data-driven approach, the resulting model encapsulates prior knowledge beyond that contained in the training data set, which has a number of advantages. To assure a wide applicability of the method – also including existing assets – learning here is purely driven by standard operational (SCADA) data. The proposed method is demonstrated first using a cluster of three scaled wind turbines operated in a boundary layer wind tunnel. Given that inflow, wakes, and operational conditions can be precisely measured in the repeatable and controllable environment of the wind tunnel, this first application serves the purpose of showing that the correct error terms can indeed be identified. Next, the method is applied to a real wind farm situated in a complex terrain environment. Here again learning from operational data is shown to improve the prediction capabilities of the baseline model.


Introduction
Knowledge of the flow at the rotor disk of each wind turbine in a wind power plant enables several applications, including wind farm control, the provision of grid services, predictive maintenance, the estimation of life consumption, the feed-in to digital twins, and power forecasting, among others. This paper describes a new method to improve a wind farm flow model directly from standard operational data. The main idea pursued here is to use an existing wind farm flow model to provide a baseline predictive capability; however, as all models contain approximations and may lack the description of some physical phenomena, the baseline model is improved (or "augmented", which is the term used in this work) by adding parametric correction terms. In turn, these extra elements of the model are learned by using operational data. The correction terms capture effects that are typically not present in standard flow models (such as, for example, secondary steering, Fleming et al., 2018; or wind farm blockage, Bleeg et al., 2018) or that are highly dependent on a specific site or difficult to model upfront (such as, for example, nonuniform inflow caused by local orography and vegetation).
Various wind farm flow models have been developed and are described in the literature. Whereas direct numerical simulation (DNS) is still out of reach for practical applications due to its overwhelming computational cost, large-eddy simulation (LES) methods are now routinely used for the modeling of wind farm flows (Breton et al., 2017). Although invaluable for understanding the behavior of the atmospheric boundary layer and of wakes, LES remains very expensive, so that its use outside of some specialized applications is limited. To reduce cost, one can resort to lower-fidelity computational fluid dynamics (CFD) models (Boersma et al., 2019), or to the extraction of reduced-order models (ROMs) from higher-fidelity ones (Bastine et al., 2014). Instead of deriving models from first principles, another widely adopted approach is to use engineering models, which are expressed in the form of parametric analytical formulas with a limited number of degrees of freedom and hence a much reduced numerical complexity (Frandsen et al., 2006; Gebraad et al., 2014; Bastankhah and Porté-Agel, 2016). The present paper uses this last family of methods, although ideas similar to the ones developed here could also be applicable to higher-fidelity models.
Even though engineering models are constantly improved and refined (Fleming et al., 2018), they will most likely always exhibit only a limited accuracy in many practical applications, for example whenever an important role is played by effects such as orography, (seasonal) vegetation, spatial variability of the wind, sea state roughness, the erection of other neighboring wind turbines, the presence of obstacles, and others. In addition, low-fidelity models often lack some physics, e.g. the flow acceleration caused by wake and rotor blockage, secondary steering, or others. The idea pursued in this paper is then to take a rather pragmatic approach: based on the realization that it will always be difficult, if not altogether impossible, to include all effects and all physics in a model of limited numerical complexity, a given model is corrected by unknown parametric terms, which are then learned by using operational data.
The idea of improving an existing model based on measurements is hardly new, and it is actually an important topic in the areas of controls and system identification. For example, in the field of wind farm flows, a Kalman filtering approach has been proposed by Doekemeijer et al. (2017) to update model predictions based on lidar measurements. Here again the present paper takes a more pragmatic approach, and model updating is based exclusively on data provided by the standard supervisory control and data acquisition (SCADA) systems that are typically available on contemporary wind turbines. On the one hand this has the advantage that the proposed method is applicable to existing assets, as it does not necessitate extra sensors. On the other hand, given that stored SCADA data typically represent 10 min averages, this also implies that the models obtained by this technique are of a steady-state nature. Although unsteady effects in wind farms are clearly important, steady-state models are still very valuable and can support many of the applications listed above. In addition, nothing prevents the generalization of the proposed approach to unsteady flow models, assuming that the relevant higher-frequency data sets are available, which is already the subject of ongoing work from these authors.
The contemporary literature, and not only in the field of wind energy, indicates an increasing interest in data-driven approaches. Just to give one single example related to wake modeling, a purely data-driven approach has been recently described by Göçmen and Giebel (2018). However, the current enthusiasm for data should not make one forget that physics-based and analytical models are also extremely valuable because they often encapsulate significant knowledge on a given problem, often corroborated by long experience. In fact, purely data-driven approaches suffer from a number of limitations that descend directly from a very simple and inevitable fact: a model that is exclusively based on data can only know what is contained in the data set that was used to build it. Typically, this means that a very significant amount of data is necessary to obtain a model that is sufficiently general and accurate. Furthermore, the data have to cover the entire spectrum of operation of the system. This also means that the model might have very poor knowledge (and hence poor performance) for rare situations or conditions that take place at the boundaries of the operating envelope, where few if any data points might be available.
An alternative to the purely data-driven approach is presented in this work, where a reference baseline model is augmented with parametric error terms, which are then identified using data. The baseline model already includes prior knowledge based on physics, empirical observations, and experience. Therefore, even prior to the use of data, a minimum performance can be guaranteed. The model is augmented with parametric error terms, whose choice is driven by physics and the knowledge of the limitations of the baseline model. Once the errors are identified using operational data, their inspection can clarify the causes of discrepancy between model and measurements. Eventually, this can be used to improve the underlying baseline model. Furthermore, by looking at the magnitude of the identified errors, significant deviations from the baseline model can be flagged to highlight issues with the model itself, the data, or the training process.
Finally, it should be noted that the identification of the error terms can be combined with the tuning of the parameters of the baseline model. This addresses yet another problem: tuning the parameters of a model that lacks some physics may lead to unreasonable values for the parameters, as the model is "stretched" to represent phenomena that it does not contain. By the proposed hybrid approach, the simultaneous identification of the parameters of the baseline model together with the ones of the error terms eases this problem, as unmodeled phenomena can be captured by the model-augmenting terms, thereby reducing the chances of nonphysical tuning of the baseline parameters.
The baseline model parameters and the extra correction terms have a different functional form in the augmented governing equations. Hence, they should be distinguishable from each other, as they imply different effects on the model. However, as for many identification problems, it is in general not possible to guarantee that all unknown parameters are observable and noncollinear given a set of measurements and, hence, given a certain informational content. To address this problem, the method proposed by Bottasso et al. (2014a) is used here, where the original unknown parameters are recast into a new set of statistically uncorrelated variables by using the singular value decomposition (SVD) of the inverse Fisher information matrix. Once the problem has been solved in the space of the orthogonal uncorrelated parameters, the solution is mapped back onto the original physical space. This approach not only avoids the ill-posedness of the original problem, but also allows one to clarify which physical parameters are visible given a certain data set.
The paper is organized as follows. First, the baseline model is introduced in Sect. 2.1, together with a detailed description of the proposed parametric corrections in Sect. 2.2. Next, the SVD-based parameter identification method is presented in Sect. 2.3. The approach is then applied in Sect. 3.1 to a cluster of scaled wind turbines operating in the atmospheric test section of the wind tunnel of the Politecnico di Milano (Bottasso et al., 2014b). The goal of this first application is to show that a correct identification of the error terms can be achieved. This is indeed possible in the controllable and repeatable conditions of a wind tunnel, where inflow and wake characteristics can be precisely measured, something that is hardly possible today in the field. Specifically, it is shown that the method can correctly learn the lack of uniformity of the wind tunnel inflow, which is akin to what happens in a real wind farm because of orographic effects. Similarly, it is shown that secondary steering, which is completely absent from the baseline model used here, can be learned by using turbine power measurements only. A more extended view on the wind tunnel results is reported in Appendix A. After having demonstrated the method in the known and controlled wind tunnel environment, a second application is developed in Sect. 3.2 that targets a real 43-turbine wind farm. Here results indicate that the augmented model has a markedly improved prediction capability when compared to the baseline one, thanks primarily to the identification of orographic effects on the inflow and the tuning of other model parameters. Finally, conclusions are drawn in Sect. 4.

Baseline wind farm flow model
The proposed method is applied here to the baseline wake model of Bastankhah and Porté-Agel (2016), implemented within the FLORIS framework (Doekemeijer and Storm, 2018). Given ambient wind conditions, steady-state velocities within a wind farm can be computed by this model, together with the corresponding operating states and power outputs of all its turbines. First, ambient conditions are estimated from un-waked machines operating in free stream, which are identified by the turbine yaw orientations and the wake model (Schreiber et al., 2018). Then, power and thrust of the upstream turbines are computed based on the turbine aerodynamic characteristics, regulation strategy, and alignment with the local wind direction. Next, the wakes shed by these turbines are calculated in terms of their trajectory and speed deficit. In turn, this yields the velocity at the rotor disks of the turbines immediately downstream. In the case of multiple wake impingements on a rotor, a combination model is used to superimpose multiple wake deficits. Similarly, an added turbulence model is used to estimate the turbulence intensity at a downstream turbine rotor disk, as this local ambient parameter affects the expansion of the wake. This process is repeated marching downstream throughout the wind farm until the last downstream turbine is reached.
In this work, the implementation uses the selfSimilar FLORIS velocity deficit model, the rans deflection model, the quadraticRotorVelocity wake combination model, and the crespoHernandez added turbulence model. The interested reader is referred to Bastankhah and Porté-Agel (2016), Crespo and Hernández (1996), and Doekemeijer et al. (2019) and references therein for detailed descriptions and derivations of these models.
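The downstream-marching solution described above can be illustrated with a minimal sketch. It is not the FLORIS implementation: the function name `march_downstream`, the top-hat (Jensen-type) deficit, and the default values of `diameter`, `ct`, and `k` are assumptions chosen only to keep the example short, and the root-sum-square superposition is a generic stand-in for the wake combination model.

```python
import numpy as np

def march_downstream(x, y, v_inf, diameter=100.0, ct=0.8, k=0.05):
    """Toy steady-state farm solve returning the rotor speed at each turbine.

    x, y  : turbine coordinates in a wind-aligned frame (x points downstream)
    v_inf : ambient wind speed
    The top-hat deficit used here is a placeholder (assumption), not the
    Gaussian wake model of the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)                 # visit turbines from upstream to downstream
    v = np.full(x.size, float(v_inf))
    for pos, j in enumerate(order):
        sq_sum = 0.0
        for i in order[:pos]:             # turbines already solved
            dx = x[j] - x[i]
            if dx <= 0.0:
                continue                  # same streamwise position: no wake interaction
            r_wake = 0.5 * diameter + k * dx
            if abs(y[j] - y[i]) > r_wake:
                continue                  # turbine j lies outside the wake of turbine i
            rel_deficit = (1.0 - np.sqrt(1.0 - ct)) / (1.0 + 2.0 * k * dx / diameter) ** 2
            sq_sum += (v[i] * rel_deficit) ** 2
        v[j] = v_inf - np.sqrt(sq_sum)    # root-sum-square combination of deficits
    return v
```

For a row of three aligned turbines, the first one sees the ambient speed and the other two see reduced speeds; a turbine placed far to the side is unaffected.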
Engineering wake models depend on a number of parameters, which should be tuned in order to obtain accurate predictions. For the specific model used in this work, these tunable factors are the wake parameters $\alpha$, $\beta$, $k_a$, $k_b$, $a_d$, and $b_d$ and the turbulence model parameters $TI_a$, $TI_b$, $TI_c$, and $TI_d$ (Bastankhah and Porté-Agel, 2016).
In this work, the parameters are first set to an initial value, either taken from the literature or identified with ad hoc measurements; these initial values are held fixed throughout the analysis and not changed further. Corrections to the initial values are then expressed as $k = k^{*} + p_k$, where $k$ is a model parameter, $k^{*}$ its initial value, and $p_k$ the correction. Although this is not strictly necessary, this redundant notation helps highlight the changes to the nominal model parameters obtained by the proposed procedure.

Model augmentation
The engineering model described earlier is a rather simple approximation of a flow through a wind power plant and it is therefore bound to have only a limited fidelity to reality, with a consequently only limited predictive accuracy. Even for more sophisticated future models, it is difficult to imagine that all relevant physics will ever be precisely accounted for. But even if such a model existed, in practice one might simply not have all necessary detailed information on the relevant boundary and operating conditions that would be required. For example, one might not know with precision the conditions of the vegetation around and within a wind farm, with its effects on roughness and, hence, on the flow characteristics. In other words, it is safe to assume that all models are in error to some extent and probably always will be. To address this problem, the model can be pragmatically augmented with correction terms. Here one could take two alternative approaches: either a generic all-encompassing error term is added to the model or "surgical" errors are introduced at ad hoc locations in the model to target specific presumed deficiencies. The first approach could be treated with a brute-force parametric modeling approach, for example by using a neural network. Here, the second approach was used, as it allows for more insight into the nature of the identified corrections. The specific parametric corrections used in the present paper are reviewed next. It is clear that these are only some of the many corrections that could be applied to the present baseline model, so that the following does not pretend to be a comprehensive treatment of the topic. Nonetheless, results indicate that some of these corrections are indeed significant and provide for a marked improvement of the baseline model.
-Nonuniform inflow. The inflow to a wind farm can exhibit spatial variability, mostly because of orographic and local effects, especially in complex terrain conditions. For example, commercial wind resource assessment tools include topographic speedup ratios customarily computed by CFD models (Jacobsen, 2019). In contrast to this established practice, no direct or equivalent modeling of orographic effects is at present available in engineering wake models. Another reason for inflow variability may be due to wind farm blockage effects (Bleeg et al., 2018). Indeed, current wake models such as the one used here assume that upstream turbines affect downstream ones through their wakes but do not model the effects of downstream machines on the upstream ones. In a wind farm, depending on the wind direction and cross-wind location considered, the number and operating state of downstream turbines vary, which may induce a cross-wind speed variability in the inflow.
To capture some of these effects, the model ambient flow speed V ∞ is expressed here as a function of height above ground Z, cross-wind lateral position Y , and ambient wind direction as where V ∞,0 is the reference (baseline uncorrected) ambient flow speed and z h the reference height of the vertically sheared flow with exponent α vs . Function f augm,speed (Y, , c speed , p speed ) is the speed correction term. This function is defined in the 2D space For each value of the ambient wind direction , Y is a lateral coordinate orthogonal to it that spans the width of the farm; hence, by selecting min and max a lateral inflow nonuniformity can be modeled for a given sector or the whole wind rose of directions. The (Y, ) space is discretized into rectangular cells with corner nodes c speed = [. . .; (Y i , i ); . . .] (for an example, see Fig. 16). The corresponding unknown error nodal values are stored in vector p speed , and bilinear shape functions interpolate the error in each cell based on the nodal values at its corners. Equation (2) could be extended to also include a longitudinal wind-aligned coordinate, similarly to the localized speedup ratios of Jacobsen (2019), to model wind farm blockage effects.
Local orographic effects and blockage may also induce variability in the wind direction . Similarly, the vertical shear exponent α vs and turbulence intensity I may vary, for example on account of nonuniform roughness induced by vegetation or other obstacles. To include these effects in the farm flow model, the baseline quantities are augmented as In all these expressions, (·) ref indicates a baseline reference quantity, while function f augm,(·) is a correction term. This function is defined on the 1D space ∈ [ min , max ], discretized with nodes c (·) = [. . .; i ; . . .] (·) , using linear shape functions to interpolate the corresponding nodal values p (·) . Here again, by selecting min and max , corrections can be applied to the whole wind rose or just to a sector.
-Secondary steering. By misaligning a wind turbine rotor with respect to the incoming flow direction, the rotor thrust force is tilted, thereby generating a cross-flow force that laterally deflects the wake. As shown with the help of numerical simulations by Fleming et al. (2018), this cross-flow force induces two counter-rotating vortices that, combining with the wake swirl induced by the rotor torque, lead to a curled wake shape. As observed experimentally by Wang et al. (2018), the effects of these vortices result in additional lateral flow speed components, which are not limited to the wake itself but also extend outside of it. By this phenomenon, the flow direction within and around a deflected wake is tilted with respect to the upstream undisturbed direction. Therefore, when a turbine is operating within or close to a deflected wake, its own wake undergoes a change of trajectory, termed secondary steering, induced by the locally modified wind direction. Although models of this phenomenon are being developed (Martínez-Tossas et al., 2019), they significantly increase the computational cost and are not yet available in standard implementations of engineering wake models such as the one used here.
The change of wind direction at a downstream turbine induced by secondary steering (indicated by the subscript ss) is modeled here as where f augm,ss is the correction term andỹ = Y − y wc is the lateral distance to the wake centerline (see Fig. 1), defined in the baseline wind farm model as the locus of the points of minimum flow speed. According to the notation used in Eq. (6.12) of Bastankhah and Porté-Agel (2016), init indicates the initial wake direction of the closest upstream turbine. The correction term is expressed as the difference of two Gaussian functions and more precisely f augm,ss ỹ, init , p ss = init p ss,1 exp −0.5 ỹ + sgn( init )p ss,3 p ss,2 2 −p ss,4 exp −0.5 ỹ + sgn( init )p ss,6 p ss,5 where p ss = (p ss,1 , p ss,2 , p ss,3 , p ss,4 , p ss,5 , p ss,6 ) is the vector of free parameters, where parameters 1 and 4 are related to the amplitude, 3 and 6 to the standard deviation, and 2 and 5 to the location of the correction functions. Since the Gaussian functions are not centered at the wake centerline and the effect of secondary steering is assumed to be symmetric with respect to the misalignment angle, the correction term also depends on the direction of wake deflection sgn( init ).
This particular choice of the shape functions is motivated by the results shown in Fig. 8b of Wang et al. (2018). Indeed, LES simulations and measurements reveal the presence of a stronger lateral velocity component directed towards the wake on the leeward side of the wake itself, and of an opposite and weaker lateral component on the windward side. Such a distribution can be approximated by two Gaussian functions using Eq. (5).
Note that the change in local wind direction also leads to a slight lateral deflection of the nonuniform wind farm inflow introduced previously. More precisely, for a turbine that is located X behind an upstream turbine, the nonuniform inflow expressed by Eq. (2) is evaluated at Y + X sin( ) instead of Y.
Figure 1a shows the hub height flow speed for two wind turbines modeled in FLORIS, with the turbine rotor disks being indicated with thick black lines. The wake centerlines and the undisturbed free-stream wind direction are indicated by black dotted and dashed lines, respectively. The upstream turbine is misaligned with respect to the incoming flow, and therefore its wake is deflected laterally. Using the baseline wake model, the downstream turbine wake develops along the free-stream wind direction. Panel (b) of the same figure shows the effects of the secondary steering correction term given by Eq. (5). The plot clearly shows that the downstream turbine wake path is affected by the locally changed wind direction.
-Non-Gaussian wake and flow acceleration. Engineering wake models are based, among other hypotheses, on assumed shapes of the speed deficit. For example, the present baseline model assumes a Gaussian distribution of the speed deficit within the wake. Another assumption is that the flow outside the wake is undisturbed and equal to the free stream. However, these assumptions can, at times, not be exactly satisfied, as already observed by Xie and Archer (2017) and Martínez-Tossas et al. (2019), among others. For example, aisle jets are local accelerations of the flow outside of the wake, produced by local blocking in the neighborhood of an operating turbine. It has been reported that aisle jets can induce local flow speedups in excess of 10 % of the undisturbed inflow (Dörenkämper et al., 2015).
To account for such effects, the wake velocity $V_\mathrm{wake}$ of the baseline model is corrected as where $V_\mathrm{wake,FLORIS}$ is the baseline Gaussian wake speed profile, $d_\mathrm{wc}$ is the absolute distance to the wake center (which, at hub height, is equivalent to $|\tilde{y}|$), and $f_\mathrm{augm,acc}$ represents the correction term, which, similarly to the previous corrections, is modeled with linear shape functions characterized by node locations $c_\mathrm{acc}$ (in terms of $d_\mathrm{wc}$) and nodal values $p_\mathrm{acc}$.
-Reduced power extraction due to nonuniform wind turbine inflow. Numerical simulations conducted in FAST (Jonkman and Jonkman, 2018) using its blade element momentum (BEM) implementation yielded a slight reduction in the rotor power coefficient for horizontally sheared flow, when compared to unsheared conditions with the same hub wind speed. Even though BEM can only give a rough indication for such an effect, a correction of the power coefficient of the baseline model is introduced here in the form where C P,κ=0 is the nominal power coefficient, κ the equivalent horizontal linear shear coefficient on the rotor disk, and p κ the free correction parameter. The linear shear κ is either due to a lack of lateral uniformity of the inflow or due to the impingement of a wake, and it is evaluated accordingly within the farm model.
-Wind-speed-dependent power loss in yaw misalignment. The baseline formulation models the power extraction of a misaligned wind turbine using the cosine law $C_P(\gamma) = C_P \cos(\gamma)^{p_P}$, where $C_P$ is the power coefficient of the wind-aligned turbine, $\gamma$ the misalignment angle with respect to the local flow direction, and $p_P$ the power loss exponent. Different power loss exponents have been reported in the literature, ranging from the value of 1.4 found by Fleming et al. (2017) to 1.8 according to Schreiber et al. (2017), 1.9 for Gebraad et al. (2015), and all the way to the ideal value of 3 that is expected if only the rotor-orthogonal ambient flow component contributes to power extraction (Boersma et al., 2019). In addition, $p_P$ might also depend on the regulation strategy used by the turbine controller. Here, the power coefficient in misaligned operation is augmented as where $C_P$ is the power coefficient of the flow-aligned turbine (possibly reduced by shear effects, as argued above), $p_{P0}$ is the misalignment angle at which the turbine produces maximum power, and $V$ and $V_\mathrm{rated}$ are, respectively, the rotor effective and rated wind speeds. Finally, $p_P$ is the baseline exponent, while $p_{P,a}$ and $p_{P,b}$ are free parameters that model a linear wind speed dependency of the cosine law.
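Two of the building blocks used by the corrections listed above can be sketched in a few lines: the bilinear shape-function interpolation of nodal error values on a rectangular grid (as used for the nonuniform inflow), and a difference of two laterally shifted Gaussians (as used for secondary steering). This is only an illustrative sketch. The function and argument names are hypothetical, and the Gaussian form (a shift added to the lateral coordinate and a width dividing it) is one reading of the printed expression, taken here as an assumption.

```python
import numpy as np

def bilinear_correction(y, direction, y_nodes, dir_nodes, p):
    """Interpolate nodal values p[i, j], given at (y_nodes[i], dir_nodes[j]),
    with bilinear shape functions; queries outside the grid are clamped."""
    y_nodes = np.asarray(y_nodes, dtype=float)
    dir_nodes = np.asarray(dir_nodes, dtype=float)
    p = np.asarray(p, dtype=float)
    # Index of the lower-left corner of the cell containing the query point.
    i = int(np.clip(np.searchsorted(y_nodes, y) - 1, 0, y_nodes.size - 2))
    j = int(np.clip(np.searchsorted(dir_nodes, direction) - 1, 0, dir_nodes.size - 2))
    # Local cell coordinates in [0, 1].
    s = np.clip((y - y_nodes[i]) / (y_nodes[i + 1] - y_nodes[i]), 0.0, 1.0)
    t = np.clip((direction - dir_nodes[j]) / (dir_nodes[j + 1] - dir_nodes[j]), 0.0, 1.0)
    return ((1 - s) * (1 - t) * p[i, j] + s * (1 - t) * p[i + 1, j]
            + (1 - s) * t * p[i, j + 1] + s * t * p[i + 1, j + 1])

def double_gaussian(y_tilde, sign, amp1, width1, shift1, amp2, width2, shift2):
    """Difference of two Gaussians in the lateral distance to the wake centerline.
    `sign` is +1 or -1 and mirrors the shape with the wake deflection direction."""
    g1 = amp1 * np.exp(-0.5 * ((y_tilde + sign * shift1) / width1) ** 2)
    g2 = amp2 * np.exp(-0.5 * ((y_tilde + sign * shift2) / width2) ** 2)
    return g1 - g2
```

The 1D corrections with linear shape functions reduce to ordinary piecewise-linear interpolation of the nodal values, for which `np.interp(x, nodes, values)` is sufficient.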

Parameter identification method
The parameters of the baseline model and of its correction terms are identified with the method developed by Bottasso et al. (2014a). The formulation of the parameter estimation problem is independent of whether the parameters belong to the baseline model or to its correction factors. In this sense, one can use the same method to just tune the baseline parameters without considering the correction terms, just identify the correction terms at the frozen baseline model, or concurrently identify both sets. The formulation is based on the classical likelihood function, which describes the probability that a given set of noisy observations can be explained by a specific set of model parameters. By numerically maximizing this function, a set of parameters is identified that most probably explains the measurements. Bound constraints are used to guide the process and ensure convergence to meaningful results.
The accuracy with which the parameters can be estimated depends on how flat the likelihood function is with respect to changes in the parameters. For example, a flat maximum of the function implies that different nearby values of the model parameters are associated with similar values of the likelihood. These characteristics of the solution space are captured by the Fisher information matrix, which can be interpreted as a measure of the curvature of the likelihood function. Furthermore, it can be shown that the variance of the estimates is bounded from below (Cramér-Rao bound) by the inverse of the Fisher matrix (Jategaonkar, 2015). Although the analysis of the Fisher information is useful for understanding the well-posedness of an estimation problem and the quality of the identified model, it does not offer a constructive way of reformulating a given ill-posed problem. Indeed, a flat solution space and collinear parameters are to be expected in the present case, given the complex couplings and dependencies that may exist among the various parameters of a wind farm flow model and its correction terms.
To overcome this limitation of the classical maximum likelihood formulation, following Bottasso et al. (2014a), the original physical parameters of the model are transformed into an orthogonal parameter space, by diagonalizing the Fisher matrix using the SVD. This way, as the parameters are now statistically decoupled, one can set a lower observability threshold and in the analysis retain only the ones that are in fact observable given the available set of measurements. Once the problem is solved, the uncorrelated parameters are mapped back onto the original physical space.
As shown later on, this approach achieves multiple goals: it allows one to successfully solve a maximization problem with many free parameters, some of which might be interdependent on one another or not observable in a given data set; it reduces the problem size, retaining only the orthogonal parameters that are indeed observable; it highlights, through the singular vectors, the interdependencies that may exist among some parameters of the model, which provides for a useful interpretation tool that may guide the reformulation of parts of the model and its correction terms.

Maximum likelihood estimation of model parameters
A steady-state wind farm model can be mathematically expressed as where $f(\cdot, \cdot, \cdot)$ is the nonlinear static function describing the wind farm model, which depends on free parameters $p \in \mathbb{R}^{n}$. These parameters can include wake model parameters, model augmentation parameters, or both. The model inputs $u \in \mathbb{R}^{n_u}$ include ambient wind conditions (i.e. ambient wind speed, direction, air density, turbulence intensity) and control inputs (i.e. yaw misalignment, partialization factor, blade pitch, rotor speed of each turbine). The model outputs $y \in \mathbb{R}^{m}$ represent quantities of interest for which measurements are available, in the present work these being the power outputs of each wind turbine in the farm. Experimental observations $z$ of the simulated outputs $y$ will in general result in a residual $r \in \mathbb{R}^{m}$, caused by measurement and process noise (e.g. plant-model mismatch), so that
$$z = y + r.$$
Given a set $S = \{z_1, z_2, \ldots, z_N\}$ of $N$ independent observations, the likelihood function (Jategaonkar, 2015) can be defined as
$$L(S \mid p) = p(S \mid p),$$
where $p(\cdot)$ is the probability of $S$ given $p$. Assuming the residuals $r$ with covariance $R$ to be statistically independent within the set of measurements (i.e. $E[r_i r_j^T] = R\,\delta_{i,j}$, where $\delta_{i,j}$ is the Kronecker delta), the likelihood function can be written, following Jategaonkar (2015), as
$$L(S \mid p) = \left((2\pi)^{m} \det R\right)^{-N/2} \exp\left(-\frac{1}{2} \sum_{i=1}^{N} r_i^T R^{-1} r_i\right).$$
Maximizing $L$ (or minimizing its negative logarithm), a maximum likelihood estimate of the parameters can be obtained as
$$\hat{p} = \arg\min_{p} J(p), \qquad (13)$$
where $J(p) = -\ln L(S \mid p)$. The measurement noise covariance matrix $R$ can be estimated under mild hypotheses as
$$R = \frac{1}{N} \sum_{i=1}^{N} r_i r_i^T,$$
leading to an iteration between a solution at given covariance and a covariance update step (Jategaonkar, 2015). However, in this paper the measurement noise covariance matrix is estimated a priori and therefore assumed to be known.
The cost function therefore becomes, up to an additive constant that does not depend on the parameters,
$$J(p) = \frac{1}{2} \sum_{i=1}^{N} r_i^T R^{-1} r_i.$$
To ensure reasonable and physically viable solutions, parameters can be forced to stay within predefined upper (subscript ub) and lower (subscript lb) bounds, by adding the corresponding inequality constraints $p_\mathrm{lb} \le p \le p_\mathrm{ub}$ to problem (13). As the parameter values and constraints can differ in magnitude, it is good practice to scale all parameters such that a value of 1 corresponds to the upper bound $p_\mathrm{ub}$ and a value of −1 to the lower one $p_\mathrm{lb}$. The optimization problem can finally be solved numerically by a suitable algorithm, such as sequential quadratic programming (SQP) (Nocedal and Wright, 2006).
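The weighted least-squares cost with known noise covariance and the scaling of the parameters to the interval [−1, 1] can be sketched as follows. The two-output `toy_model` is purely hypothetical (it stands in for the wind farm model, and its call signature `model(p, u)` is an assumption), and all function names are chosen here for illustration only.

```python
import numpy as np

def scale_params(p, p_lb, p_ub):
    """Map physical parameters to [-1, 1]: lower bound -> -1, upper bound -> +1."""
    return 2.0 * (p - p_lb) / (p_ub - p_lb) - 1.0

def unscale_params(q, p_lb, p_ub):
    """Inverse of scale_params: map scaled parameters back to physical values."""
    return p_lb + 0.5 * (q + 1.0) * (p_ub - p_lb)

def cost(p, model, inputs, observations, R):
    """J(p) = 1/2 * sum_i r_i^T R^-1 r_i, with residuals r_i = z_i - model(p, u_i)."""
    R_inv = np.linalg.inv(R)
    J = 0.0
    for u, z in zip(inputs, observations):
        r = z - model(p, u)
        J += 0.5 * r @ R_inv @ r
    return J

def toy_model(p, u):
    """Hypothetical two-turbine 'farm': power of a free-stream and a waked turbine."""
    return np.array([p[0] * u ** 3, p[0] * p[1] * u ** 3])
```

In practice the scaled parameters would be handed to a bound-constrained optimizer; an SQP-type option is `scipy.optimize.minimize` with `method="SLSQP"` and `bounds` set to (−1, 1) for every scaled parameter.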

Identifiability of parameters
The Fisher information matrix $F \in \mathbb{R}^{n \times n}$ is defined as

$$ F = \sum_{i=1}^{N} \left( \frac{\partial y_i}{\partial p} \right)^T R^{-1} \left( \frac{\partial y_i}{\partial p} \right), \tag{15} $$

where $y_i$ is the model output for the $i$th observation, and describes the curvature of the likelihood function. It can be shown (Jategaonkar, 2015) that a lower bound (termed Cramér-Rao bound) of the covariance of the estimated parameters is given by

$$ P = F^{-1} \le \mathrm{Var}(p_\mathrm{MLE} - p_\mathrm{true}), \tag{16} $$

where $p_\mathrm{true}$ represents the true but unknown parameters. The $k$th diagonal element of $P$ is a lower bound on the variance of the $k$th estimated parameter, while the correlation between different parameters is captured by the off-diagonal terms of that same matrix. The correlation coefficient between two parameters $i$ and $j$ is defined as

$$ \Psi_{i,j} = \frac{P_{i,j}}{\sqrt{P_{i,i}\, P_{j,j}}}, \tag{17} $$

where $P_{i,j}$ denotes the $i,j$th element (row, column) of $P$. By analyzing the estimated parameter variance, as well as the correlation between parameters, valuable insight into the well-posedness of the parameter identification problem can be readily obtained.
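The following sketch illustrates how the Fisher matrix, the Cramér-Rao bound, and the correlation coefficients can be evaluated. It assumes NumPy and uses a hypothetical two-parameter toy model $y = a\,u + b$ in place of the wind farm model; the input values and the noise level are illustrative only.

```python
import numpy as np

# Toy model y = a*u + b observed at N inputs u; Jacobian dy/dp = [u, 1].
u = np.array([1.0, 1.1, 1.2, 1.3])     # narrow input range: a and b hard to separate
R_inv = np.array([[1.0 / 0.025**2]])   # inverse measurement noise covariance (m = 1)

F = np.zeros((2, 2))
for u_i in u:
    dy_dp = np.array([[u_i, 1.0]])     # sensitivity of output i w.r.t. p = (a, b)
    F += dy_dp.T @ R_inv @ dy_dp       # accumulate the Fisher information matrix

P = np.linalg.inv(F)                   # Cramér-Rao lower bound on the covariance
sigma = np.sqrt(np.diag(P))            # lower bound on the standard deviations
corr = P / np.outer(sigma, sigma)      # correlation coefficients

print("sigma =", sigma)
print("corr(a, b) =", corr[0, 1])
```

Because the inputs span only a narrow range, slope and offset have nearly the same effect on the outputs, and their correlation coefficient comes out close to $-1$. This is the kind of coupling that signals an ill-posed identification problem.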

Problem transformation and untangling using the SVD
When some parameters are highly correlated or have large variance, the problem is ill-posed: it might exhibit sluggish convergence, or no convergence at all, and small changes in the inputs may lead to large changes in the estimates. Such situations are difficult to solve in physical space, because parameters are typically coupled together to some degree through the model. To untangle the parameters, one may resort to the SVD (Golub and van Loan, 2013). By this approach (Hansen, 1987; Waiboer, 2007; Bottasso et al., 2014a), the original parameters are mapped into a new set of uncorrelated (orthogonal) parameters. Since the new unknowns are uncorrelated, one can set a threshold on their variance by using the Cramér-Rao bound and retain in the optimization only those that are observable within the given data set.
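The Fisher matrix $F$ is first factorized as $F = M^T M$, where $M \in \mathbb{R}^{Nm \times n}$ is defined as

$$ M = \begin{bmatrix} R^{-1/2}\, \dfrac{\partial y_1}{\partial p} \\ \vdots \\ R^{-1/2}\, \dfrac{\partial y_N}{\partial p} \end{bmatrix}. \tag{18} $$

Assuming a larger number of measurements than parameters ($Nm > n$), matrix $M$ can be decomposed into

$$ M = U \Sigma V^T, \tag{19} $$

where $U \in \mathbb{R}^{Nm \times Nm}$ and $V \in \mathbb{R}^{n \times n}$ are the matrices of left and right singular vectors, respectively, while

$$ \Sigma = \begin{bmatrix} S \\ 0 \end{bmatrix}, \tag{20} $$

where $S \in \mathbb{R}^{n \times n}$ is a diagonal matrix, whose entries $s_i$ are the singular values sorted in descending order. By using Eq. (19) and the factorization of $F$, the inverse of the Fisher information matrix can be written as

$$ P = F^{-1} = V S^{-2} V^T. \tag{21} $$

Note that the columns of the orthogonal matrix $V$ are also the eigenvectors of $P$ and $s_i^{-2}$ the corresponding eigenvalues. Furthermore, $P$ and $F$ are symmetric and, based on the spectral theorem, diagonalizable.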
The physical parameters $p$ can now be transformed into a new set of orthogonal parameters $\theta$ by a rotation performed with the right singular vectors:

$$ \theta = V^T p. \tag{22} $$

For the transformed set of parameters, the Cramér-Rao bound on the variance of the estimates is the diagonal matrix $S^{-2} \le \mathrm{Var}(\theta_\mathrm{MLE} - \theta_\mathrm{true})$. Therefore, a small singular value $s_i$ corresponds to a large uncertainty in the estimate of the corresponding orthogonal parameter.
To remove parameters that cannot be estimated with sufficient accuracy, matrix $S$ can be partitioned as

$$ S = \begin{bmatrix} S_\mathrm{ID} & 0 \\ 0 & S_\mathrm{NID} \end{bmatrix}, \tag{23} $$

where $S_\mathrm{ID}$ contains the identifiable singular values, i.e. those such that $s_i^{-2} < \sigma_t^2$, $\sigma_t$ being a threshold on the highest acceptable standard deviation in the estimate. On the other hand, matrix $S_\mathrm{NID}$ contains singular values associated with parameters that cannot be identified with sufficient accuracy and are therefore discarded. Accordingly, $V$ is also partitioned as $V = [V_\mathrm{ID}, V_\mathrm{NID}]$, while the orthogonal parameters are partitioned as $\theta = [\theta_\mathrm{ID}^T, \theta_\mathrm{NID}^T]^T$. Finally, the physical parameters are expressed in terms of the sole identifiable orthogonal parameters:

$$ p \approx V_\mathrm{ID}\, \theta_\mathrm{ID}. \tag{24} $$

Given that the Fisher matrix depends on the values of the parameters $p$, an iterative procedure should be followed, where the diagonalization of the problem is repeated at each update of the parameter vector.
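The truncation step can be illustrated with a few lines of NumPy. The sketch below is not the authors' implementation: the helper name `identifiable_subspace` and the $4 \times 3$ sensitivity matrix are hypothetical, and the matrix is constructed so that its singular values are known in advance.

```python
import numpy as np

def identifiable_subspace(M, sigma_t2):
    """Return V_ID and the Cramér-Rao variances of all orthogonal parameters."""
    _, s, Vt = np.linalg.svd(M, full_matrices=False)  # s sorted in descending order
    var_theta = s**-2.0                               # variance bound of each theta_i
    keep = var_theta < sigma_t2                       # identifiable directions
    return Vt[keep].T, var_theta

# Hypothetical 4x3 sensitivity matrix with singular values 10, 5 and 0.5.
c = np.sqrt(0.5)
V_true = np.array([[c, -c, 0.0], [c, c, 0.0], [0.0, 0.0, 1.0]])
M = np.vstack([np.diag([10.0, 5.0, 0.5]), np.zeros((1, 3))]) @ V_true.T

V_id, var_theta = identifiable_subspace(M, sigma_t2=0.1)

p = np.array([1.0, 2.0, 3.0])   # scaled physical parameters
theta_id = V_id.T @ p           # identifiable orthogonal parameters
p_rec = V_id @ theta_id         # physical parameters spanned by theta_ID only
```

In this example the third orthogonal parameter has a variance bound of 4, well above the threshold, so only two directions are retained and the component of `p` along the discarded direction is removed. In an actual identification, the optimizer would work directly on `theta_id`, and the decomposition would be repeated whenever the parameter vector is updated.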

Identification method with variable measurement weights
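In some cases, it may be useful to increase the importance of some measurements in the parameter estimation problem. This can be readily obtained by treating an observation with weight $w$ as if it appeared $w$ times in the observation data set (Karampatziakis and Langford, 2011). Cost function (14) then becomes

$$ J(p) = \frac{1}{2} \sum_{i=1}^{N} w_i\, r_i^T R^{-1} r_i, $$

where $w_i$ is the relative weight of observation $i$. Similarly, the Fisher matrix becomes

$$ F = \sum_{i=1}^{N} w_i \left( \frac{\partial y_i}{\partial p} \right)^T R^{-1} \left( \frac{\partial y_i}{\partial p} \right), $$

and its factorization $F = M^T M$ is obtained with

$$ M = \begin{bmatrix} \sqrt{w_1}\, R^{-1/2}\, \dfrac{\partial y_1}{\partial p} \\ \vdots \\ \sqrt{w_N}\, R^{-1/2}\, \dfrac{\partial y_N}{\partial p} \end{bmatrix}. $$

The remainder of the formulation is not affected by the introduction of weights.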
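The equivalence between weighting an observation and repeating it in the data set can be checked numerically. The sketch below assumes a diagonal covariance $R$; the function name and the residual values are hypothetical.

```python
def weighted_cost(residuals, weights, r_diag):
    """J = 1/2 * sum_i w_i * r_i^T R^-1 r_i for a diagonal covariance R."""
    total = 0.0
    for r, w in zip(residuals, weights):
        total += w * sum(r_k * r_k / var_k for r_k, var_k in zip(r, r_diag))
    return 0.5 * total


r_diag = [0.025**2] * 3            # same noise variance on all three outputs
r1 = [0.01, -0.02, 0.0]            # hypothetical residual of observation 1
r2 = [0.03, 0.0, 0.01]             # hypothetical residual of observation 2

# Observation 1 with weight 2 ...
J_weighted = weighted_cost([r1, r2], [2.0, 1.0], r_diag)
# ... gives the same cost as listing observation 1 twice with unit weights.
J_repeated = weighted_cost([r1, r1, r2], [1.0, 1.0, 1.0], r_diag)
```

The weighted form also accepts non-integer weights, which is convenient when weights are set proportional to the number of samples in a bin.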

Results
The proposed method is first applied in Sect. 3.1 to a wind tunnel experiment with a small cluster of three wind turbines and then in Sect. 3.2 to a real wind farm consisting of 43 wind turbines. The former example aims at a verification of the correctness of the identified augmentations, given the known and controllable conditions of the scaled experiments, whereas the latter is meant to offer a first glimpse of the practical applicability of the new method in the field.

Wind tunnel verification
Whether identified model corrections are indeed physical or only an artifact of the model-measurement mismatch is difficult to prove in general. From this point of view, wind tunnel experiments provide a unique opportunity to verify the concept proposed in this paper. Indeed, the overall flow within a cluster of turbines can be measured with good accuracy, and the experiments can be repeated in multiple desired operating conditions. The aim of this section is then to show that, even in the presence of multiple possibly overlapping model terms, the correct improvements to a baseline model can be learned from operational data only.

Experimental setup
The experimental setup is composed of a scaled cluster of three G1 wind turbines, each of them equipped with active yaw, pitch, and torque control. The turbines were operated in the boundary layer test section of the wind tunnel of the Politecnico di Milano. Details on the models and the wind tunnel are reported, among other publications, in Campagnolo et al. (2016a, b, c). The turbines are labeled WT1, WT2, and WT3, starting from the most upstream one and moving downstream. The machines are mounted on a turntable, whose rotation is used to change the wind direction with respect to the wind farm layout. In the nominal configuration, i.e. for a turntable rotation $\gamma_\mathrm{TT} = 0^\circ$, the three turbines are aligned with the wind tunnel main axis and hence with the flow velocity vector. The turbines are installed with a longitudinal spacing of 5 diameters (D), as shown in Fig. 2 with a view looking down towards the wind tunnel floor. As indicated in the figure, positive turntable rotations are clockwise. For $\gamma_\mathrm{TT} \neq 0^\circ$, the longitudinal distance between the turbines decreases slightly. However, considering that in this work the largest investigated turntable angle was $\pm 11.5^\circ$, the longitudinal distance varied only between 4.9D and 5D.
A pitot probe was placed at hub height, 3D upstream of the first G1 in the nominal configuration. The probe was therefore not placed on the turntable, and its position remained fixed with respect to the wind tunnel test section. A wind-tunnel-fixed reference frame, used in the following to discuss the results, is also depicted in Fig. 2. Its origin is placed at the turntable center, while the frame X axis is aligned with the wind direction; the Y axis points left, looking downstream; and hence Z points vertically up from the floor to complete a right-handed triad.
The yaw angle γ WTi of the ith wind turbine is positive for a counterclockwise rotation looking down onto the floor, as shown for WT1 in Fig. 2, and null when the rotor disk is orthogonal to X and, therefore, to the nominal wind direction. Figure 3 shows a photo of the cluster of turbines, looking downstream with WT1 in the foreground. The wind tunnel floor is blue, whereas the turntable is black.
The ambient wind speed V ∞,0 measured by the pitot tube was, for all conducted experiments, between 5.20 and 5.75 m s −1 , which corresponds to slightly below-rated conditions. The ambient turbulence intensity was equal to 6.12 %, while the vertical shear was α vs = 0.144.

Model setup
The FLORIS model implementation used in this work is the one available online (Doekemeijer and Storm, 2018). All baseline model parameters are reported in Table 1 and taken from Campagnolo et al. (2019), where they were identified based on wake measurements of a single isolated G1 turbine. Figure 4 shows the G1 power C P and thrust C T coefficients as functions of wind speed V . The curves were obtained from dynamic simulations conducted in turbulent inflow, using the same controllers implemented on the scaled models. The C P and C T vs. tip speed ratio (TSR) and blade pitch setting curves were obtained with a BEM formulation using experimentally tuned airfoil polars (Bottasso et al., 2014a). As the turbine controller does not consider variations in air density ρ, the coefficients shown in the figure exhibit a slight dependency on this ambient parameter. Within FLORIS, this effect is taken into account by interpolating within the coefficients based on the actual density measured in the wind tunnel during each experiment. For all reported test conditions, air density varied in the range ρ ∈ [1.159, 1.185] kg m −3 . The power loss exponent in misaligned conditions was evaluated experimentally to be p P = 2.1741, while for thrust the coefficient was found to be p T = 1.4248.
The ambient wind speed was determined from the pitot tube. It was observed that, by using this value, the power of a free-stream turbine predicted by the FLORIS model was slightly underestimated, most probably due to the sheared flow. To correct for this effect, measurements provided by the pitot tube were scaled by the factor 1.0176, which was computed in order to match simulated and measured power. Furthermore, in the original FLORIS implementation the power of a turbine is computed as $P = \frac{1}{2} \rho A V_\mathrm{avg}^3 C_P$, where $V_\mathrm{avg}$ is the average wind speed at the rotor disk and $A$ the rotor disk area. Here, power was computed by integrating over the rotor disk area, i.e. $P = \frac{1}{2} \rho \int_A V^3 C_P \, \mathrm{d}A$, which is probably slightly more accurate even though it involves a minor increase in computational effort.
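The difference between the two power expressions can be illustrated with a simple numerical integration over the rotor disk. The sketch below is only indicative: it assumes a power-law sheared inflow with the exponent 0.144 quoted above, a constant $C_P$ over the disk, and hypothetical values for hub height and rotor radius; the function name is also hypothetical.

```python
import math

def rotor_speed_moments(v_hub, z_hub, radius, alpha, n=2000):
    """Return (V_avg, cube root of the disk-averaged V^3) for a power-law shear."""
    area = sum_v = sum_v3 = 0.0
    dz = 2.0 * radius / n
    for k in range(n):
        z = z_hub - radius + (k + 0.5) * dz             # midpoint of a horizontal strip
        width = 2.0 * math.sqrt(radius**2 - (z - z_hub) ** 2)
        v = v_hub * (z / z_hub) ** alpha                # power-law vertical shear
        area += width * dz
        sum_v += v * width * dz
        sum_v3 += v**3 * width * dz
    return sum_v / area, (sum_v3 / area) ** (1.0 / 3.0)

# alpha = 0.144 is taken from the text; hub height and radius are assumed values.
v_avg, v_cubic = rotor_speed_moments(v_hub=5.5, z_hub=0.825, radius=0.55, alpha=0.144)
ratio = (v_cubic / v_avg) ** 3   # P_integrated / P_average for a constant C_P
```

Since the cube is a convex function, the disk average of $V^3$ is never smaller than the cube of the disk-averaged speed, so `ratio` is at least 1. For moderate shear the difference is small, which is consistent with the integration being described as only slightly more accurate.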

Ranking of correction terms
To initially assess the role of the various parameters, a ranking analysis was conducted. The parameters were clustered in sets, depending on their role in the model. A first identification was performed using all parameter sets, yielding the presumed best value, denoted J ref , of the cost function expressed by Eq. (14). The analysis was then repeated multiple times, each time removing one parameter set from the optimization. By looking at the resulting change in the value of the cost function, one may then rank the various parameter sets in order of importance. The analysis is based on a total of 190 experimental observations, as described in greater detail in the following.
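The ranking logic can be sketched on a toy problem. The example below assumes NumPy and replaces the nonlinear, constrained identification with an ordinary linear least-squares fit; the regressor matrix, the parameter sets, and their names are hypothetical and serve only to show how the relative cost increase is computed.

```python
import numpy as np

def fitted_cost(X, z):
    """Least-squares fit of z with the columns of X; returns J = 1/2 * ||r||^2."""
    p, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ p
    return 0.5 * float(r @ r)

# Toy linear stand-in for the farm model: 8 observations, 4 parameters in 3 sets.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H8 = np.kron(np.kron(H2, H2), H2)                 # mutually orthogonal columns
X = H8[:, 1:5]
z = X @ np.array([3.0, 2.0, 1.0, 0.01]) + 0.1 * H8[:, 7]   # synthetic observations
sets = {"inflow": [0, 1], "steering": [2], "other": [3]}

J_ref = fitted_cost(X, z)                         # identification with all sets
increase = {}
for name, cols in sets.items():
    kept = [k for k in range(X.shape[1]) if k not in cols]
    increase[name] = (fitted_cost(X[:, kept], z) - J_ref) / J_ref

ranking = sorted(increase, key=increase.get, reverse=True)
print(ranking)
```

Sets whose removal causes the largest relative increase in the cost are ranked first; sets with a negligible increase are candidates for removal from the identification.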
All augmentation terms described in Sect. 2.2 were considered, except for the lateral variation in wind direction and the wind-direction-dependent vertical shear, as they are not applicable to the wind tunnel experiments. The nonuniform flow speed was modeled using five nodes located at $c_\mathrm{speed}(Y) = [-3, -2, -1, 0, 1]$ m (which correspond to approximately $[-2.7, -1.8, -0.9, 0, 0.9]$D) and also indicated in Fig. 2 using × symbols. As only the turbine positions with respect to the flow are modified by rotating the turntable, a wind direction dependency was not included in this correction term. Table 2 reports the initial values and the lower and upper bounds, chosen based on an educated guess, for the nonuniform inflow and secondary steering correction terms. Figure 5 shows the relative increase in the cost function when eliminating one parameter set at a time. The figure clearly indicates that the most important parameters are the ones modeling laterally nonuniform speed and secondary steering. Indeed, this particular wind tunnel, due to its internal configuration and large width, does present a significant nonuniform flow speed, as already discussed by Campagnolo et al. (2019). Likewise, the effect of secondary steering is particularly important and should not be neglected for accurate predictions in misaligned conditions, as already reported in various publications. Based on these results, in the following only nonuniform inflow and secondary steering corrections are considered.

Results
A total of 451 observations were available, including 11 different turntable positions and thus wind farm layouts, with turbine yaw misalignments ranging from $-40^\circ$ to $+40^\circ$. A total of 190 observations were used to identify the five parameters associated with nonuniform inflow speed and the six associated with secondary steering, whereas the remaining data points were used for model validation. Among all the available measurements gathered at each operating condition, only the steady-state power of the wind turbines was utilized, mimicking what could be done at full scale in the field using SCADA data. The model outputs $y$ (see Eq. 9) are defined as

$$ y = \frac{1}{P_\mathrm{ref}} \left[ P_\mathrm{WT1}, P_\mathrm{WT2}, P_\mathrm{WT3} \right]^T, $$

where $P_{\mathrm{WT}i}$ is the power of the $i$th wind turbine and $P_\mathrm{ref} = 37.6$ W is a reference value used as the scaling factor. Based on experience, a diagonal measurement noise covariance matrix $R$ with all three terms equal to $\sigma^2 = 0.025^2$ was specified.
The threshold on the highest acceptable variance $\sigma_t^2$ of the orthogonal parameters was set to 0.01. As the parameters are scaled within a range of $[-1, 1]$, the threshold corresponds to a relative variance of 2 %. Wind-aligned operating conditions (i.e. $\gamma_\mathrm{WT1} = \gamma_\mathrm{WT2} = \gamma_\mathrm{WT3} = 0^\circ$) were weighted with a factor of 2, to increase their importance in the parameter estimation process.
The constrained optimization problem (13) was solved in MATLAB using the fmincon function with the interior-point algorithm (Mathworks, 2019). As the baseline model with its initial nominal values ($p = p_\mathrm{init}$) is far away from the optimal solution, a first optimization was performed including only the inflow correction. Afterwards, three iterations were conducted including all 11 parameters. At each iteration, a total of eight orthogonal parameters could be identified within the specified variance threshold. The method converged very quickly, as the identified parameters and the residual did not change significantly after the first iteration. Figure 6a shows the initial variance of all 11 orthogonal parameters, and panel (b) shows the variance computed after the first iteration. The horizontal black line indicates the threshold $\sigma_t^2$. Interestingly, the 11th orthogonal parameter seems to have a very low observability. Table 3 shows the transformation matrix $V^T$ that links the physical parameters to the orthogonal ones ($\theta = V^T p$; see Eq. 22). The 11th orthogonal parameter is almost entirely associated with $p_{\mathrm{speed},5}$, which corresponds to the inflow speed augmentation node at position $Y = 1$ m. Indeed, the location of this node is such that it has only a very marginal effect on the turbine outputs and, hence, a very low observability, as shown later in Fig. 7. The transformation matrix reported in Table 3 also shows that the other two orthogonal parameters with low observability (9 and 10) represent secondary steering modes, mainly associated with the second Gaussian function of the correction term. Table 4 presents the correlation matrix (see Eq. 17) and shows a clear and expected dependency among neighboring inflow parameters. Among the secondary steering parameters, strong but less obvious correlations are present, which suggest that a simplification of the assumed correction term might be possible.

Figure 6. Variance of the orthogonal parameters before (a) and after (b) the first iteration. The identifiable orthogonal parameters are shown in red, whereas all others are shown in blue.

Table 3. Transformation matrix $V^T$ after the first iteration. Each row corresponds to a different orthogonal parameter.

Figure 7 shows the identified inflow augmentation function. In the picture, whiskers indicate the parameter uncertainty $\sigma_i$, computed based on the Cramér-Rao lower error bound as $\sigma = \sqrt{\mathrm{diag}(P)}$ (see Eq. 16). The same figure also reports measurements obtained with hot-wire probes in the empty wind tunnel at three different heights above the floor. These measurements, and especially the ones at hub height, are in good agreement with the estimates provided by the proposed method. The figure also reports (with × symbols) the lateral position of the upstream turbine for the investigated turntable rotations. Noting that all points are shifted to the left helps explain why the parameter associated with the inflow node at $Y = 1$ m has a very low, but still finite, observability.
The identified secondary steering augmentation term is visualized in Fig. 8. The plot shows the wind direction change as a function of the distance $\tilde{y}$ to the wake centerline for a turbine misalignment of $20^\circ$. The gray shaded area shows the uncertainty band $p_{\mathrm{opt},i} \pm \sigma_i$. Consistently with the findings of Wang et al. (2018), the maximum change in wind direction is found at approximately 0.3D on the leeward side of a deflected wake. The maximum magnitude of secondary steering in this operating condition is $1.9^\circ$, which is again comparable to the results of Wang et al. (2018).
The validity of the augmentation terms, identified as explained, was assessed by comparing the results of the simulation model with experimental wake measurements from a different test campaign. The setup was identical to the one considered here, except for the fact that only the first two upstream wind turbines were installed in the wind tunnel. At the downstream distance where the third wind turbine should have been installed, flow velocity measurements were obtained at turbine hub height using hot-wire probes. Figure 9 shows wake profiles for the turntable position $\gamma_\mathrm{TT} = 0^\circ$ for various combinations of turbine yaw misalignments, as indicated by the subplot titles. Each subplot is accompanied by two flow visualizations, one based on the baseline FLORIS model and the other on its augmented version. The figures also include the points at which the flow was measured with the probes.

Figure 8. Identified wind direction change due to secondary steering as a function of distance $\tilde{y}$ to the wake centerline for a turbine misalignment of $20^\circ$. The gray shaded area shows the uncertainty band.
In the left subplots, the improvements of the augmented model with respect to the baseline FLORIS are exclusively due to the inflow correction, as the upstream turbine is aligned with the flow and therefore there are no secondary steering effects. In the right subplots, the upstream turbine is misaligned (γ WT1 = 30 • ) and secondary steering effects are present. Taking into account that model augmentation was obtained exclusively by turbine power measurements, the improved matching of the wake profiles is remarkable. Still, even with the extra correction terms some small model mismatches are present; these might be caused by the wake combination model, which was not augmented in this study.
The turbine power coefficients are computed as

$$ C_{P,\mathrm{WT}i} = \frac{P_{\mathrm{WT}i}}{\frac{1}{2} \rho A V_\infty^3(Y_{\mathrm{WT}i}, z_h)}, $$

where $V_\infty$ is the augmented inflow function given by Eq. (2), evaluated at the respective turbine position $Y_{\mathrm{WT}i}$ and hub height $z_h$. A detailed overview of the results is offered by the figures of Appendix A, which report the power outputs and the model errors for all wind farm configurations. For readability, a more synthetic overview of the results is presented here, by condensing the information contained in Figs. A1, A2, and A3 in the probability density plots of Fig. 10. This figure shows the results for the baseline FLORIS model using a black dashed line, for the 11-parameter augmented model (i.e. including only nonuniform inflow speed and secondary steering corrections) using a red solid line, and for the 27-parameter augmented model (i.e. including all additional augmentation terms presented earlier) using a red dotted line. The root-mean-squared (RMS) errors are shown in the respective legends. Note that the FLORIS error distribution shows two peaks for WT1 and WT3, indicating the presence of two uncorrelated errors. The 11-parameter model removes these peaks, even though a smaller pair of peaks remains for WT2 and WT3, indicating additional errors that only the 27-parameter augmented model is able to capture.
Here again the trend is clear: the addition of nonuniform speed and secondary steering substantially increases the accuracy of the baseline model, with small but not insignificant further gains offered by the additional correction terms. Finally, there is still room for improvement, possibly through extra correction terms not yet explored.

Field application
In this section the model augmentation and identification method is applied to a full-scale wind farm, to test its applicability and usability in a realistic scenario. In such conditions, it is often difficult to assess whether the identified model corrections are indeed physical or not, due to a lack of knowledge of the actual ground truth. To deal with this problem, the classical approach of splitting the data set was used here: first, a relatively small subset of measurements is used for model and error identification; then, the rest of the data set is used for a verification of the generality of the identified model and of its improved performance with respect to the baseline one.

Wind farm and data preprocessing
The onshore wind farm is situated close to Sedini, on the Italian island of Sardinia, and it consists of 43 GE1.5s and GE1.5sle wind turbines, as specified in Table 5. The wind farm is located at a rather complex site, as shown in Fig. 11. Blue turbines are of the type GE1.5sle and black and red turbines are of the type GE1.5s, the latter being used as sensing turbines as explained later. Figure 12 shows a top view of the wind farm, including the turbine identifiers.
Historical 10 min SCADA data were made available for this research for a period of 24 months, throughout the years 2015 and 2016. The recorded turbine yaw orientations exhibit sudden jumps and long-term drifts. An ad hoc algorithm was developed for detecting and correcting these data issues. On average, for each turbine 45 % of the data points were missing, and 23 % were discarded because of low power output (< 5 kW) or rotor speed (< 1 rpm). As a result, about 33 700 data points were available for each turbine. Regarding the missing data points, it is unknown whether the turbines were operating or just not reporting. To avoid eliminating a large fraction of the data set, it was assumed that the turbines were indeed operational and thus shedding wakes. This way, even if recordings of one or more turbines were missing at a specific time instance, the data points of the other turbines could still be used.
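The discarding criteria described above can be expressed as a simple filter. The sketch below is illustrative only: the record layout, field names, and sample values are hypothetical, while the two thresholds (5 kW and 1 rpm) are the ones stated in the text. The ad hoc correction of yaw jumps and drifts is not reproduced here.

```python
def is_valid(record, min_power_kw=5.0, min_rotor_rpm=1.0):
    """Keep a 10 min record only if it is present and the turbine is producing."""
    if record is None:                        # missing data point
        return False
    return record["power_kw"] >= min_power_kw and record["rotor_rpm"] >= min_rotor_rpm


# Hypothetical records of one turbine; None marks a missing time stamp.
records = [
    {"power_kw": 820.0, "rotor_rpm": 17.5},
    None,
    {"power_kw": 2.0, "rotor_rpm": 9.0},      # low power output
    {"power_kw": 40.0, "rotor_rpm": 0.4},     # rotor almost at standstill
    {"power_kw": 1250.0, "rotor_rpm": 19.8},
]
valid = [r for r in records if is_valid(r)]
```

As a plausibility check on the quoted figures, the years 2015 and 2016 contain 731 days, i.e. 105 264 intervals of 10 min; retaining the remaining 32 % gives roughly 33 700 data points per turbine.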
As no direct measurements of ambient conditions were available, the method described by Schreiber et al. (2018) was used to identify ambient wind speed and direction. The procedure works as follows. First, the ambient wind direction is estimated from turbine yaw orientations. Second, the ambient wind speed is estimated from the rotor effective wind speed of the free-stream turbines, computed from the turbine power curve below rated wind speed. For this purpose, the three sensing turbines A5-24, A5-25, and A5-26 indicated in red in Fig. 12 were used, checking that they were unwaked by using the flow model; the average of these speeds was attributed to the location of turbine A5-25. This way, 5667 ambient wind conditions could be processed for wind directions in the range $[184^\circ, 320^\circ]$. Based on the ambient wind conditions, the data of all turbines were aggregated in two-dimensional bins: ambient wind speed (bin width of 2 m s$^{-1}$) and ambient wind direction (bin width of $5^\circ$). Figure 13 shows the scaled number of measurements in each bin between 6 and 12 m s$^{-1}$.
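The two-dimensional aggregation can be sketched in a few lines. The bin widths (2 m s$^{-1}$ and 5°) are those given in the text; the alignment of the bin edges with multiples of the bin width, the function name, and the sample values are assumptions made for illustration.

```python
import math
from collections import defaultdict

def bin_index(speed, direction, speed_width=2.0, dir_width=5.0):
    """Return the (wind speed, wind direction) bin, identified by its lower edges."""
    return (
        math.floor(speed / speed_width) * speed_width,
        math.floor(direction / dir_width) * dir_width,
    )

# Hypothetical samples: (ambient wind speed in m/s, wind direction in deg, power in kW).
samples = [
    (8.3, 271.0, 610.0),
    (9.6, 274.9, 650.0),
    (8.1, 276.0, 580.0),
    (6.5, 271.0, 300.0),
]

binned = defaultdict(list)
for speed, direction, power in samples:
    binned[bin_index(speed, direction)].append(power)

# Mean power and number of samples per bin; the count can later serve as a weight.
aggregated = {key: (sum(v) / len(v), len(v)) for key, v in binned.items()}
```

Storing the sample count alongside the bin mean makes it straightforward to weight each aggregated observation by the amount of data behind it.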

Model setup
Here again the FLORIS implementation was based on the version available online (Doekemeijer and Storm, 2019).
The required turbine power and thrust versus wind speed curves were provided by the turbine manufacturer. The vertical shear exponent of the inflow was set to α vs = 0.143 and the turbulence intensity to 14 %, which represent annual average values measured at 65 m of height by an on-site met mast. Air density was set to the constant value ρ = 1.177 kg m −3 .
The different turbine foundation heights were accounted for by accordingly increasing the tower heights, using the lowest foundation height as reference (turbine A1-02). Indeed, power measurements of the upstream turbines show a correlation with the actual turbine hub height with respect to sea level (SL), as shown in Fig. 14. As indicated by the only approximate correlation shown by the figure, such a simple correction might not provide satisfactory results for all wind directions and all turbines, because complex orographic flow effects might also play a role. Nonetheless, this approximate correction seems to be a step in the right direction. In addition, some of these effects may be corrected by the lateral nonuniformity terms added to the augmented model. The reference height of the sheared inflow $z_h$ (see Eq. 2) was set to the hub height of the sensing turbine A5-25.

Ranking of correction terms
As for the wind tunnel experiments, here again a first analysis was aimed at ranking the various correction terms. However, since the turbines were operated with a conventional wind-aligned strategy, secondary steering corrections were neglected. The ranking is based on data points in the range V ∈ [8, 10] m s −1 , as described in greater detail in the following. Figure 15 shows the relative increase in the cost function after optimization eliminating one set of parameters at a time. The results clearly indicate that the nonuniform wind farm inflow speed p speed is the most important correction. In fact, this was to be expected, given that the Sedini wind farm is located at a rather complex site. Results also indicate a nonnegligible effect of the wake deflection parameters for nonmisaligned operation (a d , b d ).
On the other hand, the additional model augmentation parameters (p TI , p winddir , p acc , p shear ) do not seem to contribute to a significant extent. Note also the slight retuning of parameters (α, β, k a , k b ) and (TI a , TI b , TI c , TI d ), which can be explained with the fact that their initial values were taken from the literature and therefore apply to different turbine types and sites.
Given these results, the rest of the analysis is based only on the subset of parameters p inflow , (p a d , p b d ), (p α , p β ), (p k a , p k b ), and (p TI a , p TI b , p TI c , p TI d ). The augmentation term for nonuniform inflow speed is modeled using five nodes along the lateral position Y located at [−2000; −1000; 0; 1000; 2000] m (which is approximatively   nodes. The Y -coordinate axis is orthogonal to the wind direction and its origin Y = 0 m is located at the position of wind turbine A5-25, as shown in Fig. 12. The definitions of the correction parameter, together with their bounds and converged values, are reported in Table 7. Note that all parameters were set to zero at the beginning of the identification process.

Results
To identify the 40 parameters of Table 7, only aggregated mean power measurements for wind speeds $V \in [8, 10]$ m s$^{-1}$ were used. In addition, only one-third of all wind direction bins were employed. The model outputs $y$ (see Eq. 9) were defined as

$$ y = \frac{1}{P_\mathrm{ref}} \left[ \ldots, P_{\mathrm{WT}i}, \ldots \right]^T, $$

where $P_{\mathrm{WT}i}$ is the power of wind turbine $i$ and $P_\mathrm{ref} = 1.11$ MW is a reference wind turbine value used as a scaling factor. A diagonal measurement noise covariance matrix $R$ was used, with all diagonal terms equal to $\sigma^2 = 0.01^2$. The threshold on the highest acceptable variance in the orthogonal parameter estimate was set to $\sigma_t^2 = 0.01$, which corresponds to a relative variance of 2 %. The relative weight of each observation was set proportional to the number of measurement points within the respective bin. In a first iteration, 29 orthogonal parameters could be identified. In the second and third iterations only 23 and 25 orthogonal parameters fell below the threshold, although results changed only marginally after the first iteration.
The identified optimal parameter values $p_{\mathrm{opt},i}$ are included in Table 7 and, for the inflow augmentation, are also reported in Fig. 16. Panel (a) shows, according to the color map, the values of the inflow augmentation function $f_\mathrm{augm,speed}$, which depends on the lateral position $Y$, the wind direction, the node locations $c_\mathrm{speed}$, and the parameters $p_\mathrm{speed}$. Each nodal point is indicated by a circle marker. The figure shows that significant variations in the inflow speed have been detected: for example, for a wind direction of $270^\circ$, the inflow speed at $Y = +1000$ m (approximately at the location of wind turbines A3-19, A3-20, and A3-21) is 3.5 % smaller than the one measured at the reference turbines A5-24, A5-25, and A5-26. For the same wind direction, the speed at $Y = -1000$ m (approximately located at the wind turbines A4-36, A4-37, and A4-38) is 4.8 % larger. These variations are expected to be mainly caused by terrain effects. Panel (b) of Fig. 16 shows the parameter uncertainty (Cramér-Rao bounds). The parameter at the nodal point with $Y = -2000$ m and a wind direction of $330^\circ$ is completely unobservable, because it lies far outside of the wind farm perimeter (see Fig. 12). Some of the outer nodal points at $Y = \pm 2000$ m do show significantly increased uncertainties. However, the corresponding augmentation parameters (panel a) are approximately zero. Figure 17 shows the power coefficient of each individual wind turbine, as indicated by the subplot title, as a function of wind direction. The power coefficient is computed as $C_P = P / (0.5 \rho A V^3)$, where $\rho = 1.177$ kg m$^{-3}$ is the constant air density, $A = \pi (70.5/2)^2$ m$^2$ a reference rotor area, and $V$ the corresponding estimated ambient wind speed. Blue crosses indicate SCADA data points, with the ones used for identification circled. The gray shaded area indicates the standard deviation within the binned measurements. The FLORIS (non-augmented) power estimates are shown by black dashed lines, whereas the augmented model results are shown using red solid lines.
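The power coefficient used in Fig. 17 follows directly from the stated definition. In the sketch below, the air density and the reference rotor area are the values given in the text, whereas the function name and the sample power and wind speed are hypothetical.

```python
import math

RHO = 1.177                          # constant air density, kg m^-3 (from the text)
AREA = math.pi * (70.5 / 2.0) ** 2   # reference rotor area, m^2 (from the text)

def power_coefficient(power_w, wind_speed):
    """C_P = P / (0.5 * rho * A * V^3), with P in W and V in m/s."""
    return power_w / (0.5 * RHO * AREA * wind_speed**3)

# Hypothetical bin: 0.8 MW mean power at an estimated ambient wind speed of 9 m/s.
cp = power_coefficient(0.8e6, 9.0)
```

Because the ambient wind speed enters with the third power, small errors in the estimated inflow speed translate into noticeably larger errors in $C_P$, which is one reason why the inflow augmentation has such a visible effect on the free-stream turbines.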
Even though the baseline FLORIS power estimates already exhibit a reasonable correlation with the measurements for many turbines and wind directions, a significant improvement is achieved by the augmented model. Note that for wind directions below $210^\circ$ and above $300^\circ$ the number of measurement points within each bin is reduced (see Fig. 13), limiting the measurement quality and trustworthiness. More specifically, the augmented model shows improvements in the modeling of the free-stream turbine power, due to the effects of the wind farm inflow augmentation terms. Furthermore, the predictions of the wake-induced power deficits are corrected, improving in many cases the deficit depth as well as the deficit location in terms of wind direction. The same results of Fig. 17 are also presented in a more synthetic form in terms of error probability densities in Fig. 18, where the error is defined as $\epsilon = C_{P,\mathrm{Meas.}} - C_{P,\mathrm{FLORIS/Augm.}}$. Each subplot shows the results for a different wind speed range. Note that the modeling error is also reduced for wind speed ranges that have not been used for model identification (i.e. $V \in [6, 8]$ m s$^{-1}$ and $V \in [10, 12]$ m s$^{-1}$). The overall root-mean-squared error is reported within the legend, showing error reductions of 14 %, 22 %, and 19 %, highlighting the generality of the identified model and augmentation parameters.

Figure 18. Error probability density functions for different wind speed ranges.
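The error metric and the relative error reduction can be computed as sketched below. The function names and the error values are hypothetical and do not reproduce the results of Fig. 18; they only show how a relative reduction of the RMS error is obtained.

```python
import math

def rms(errors):
    """Root-mean-squared value of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def relative_reduction(err_baseline, err_augmented):
    """Relative RMS error reduction of the augmented model w.r.t. the baseline."""
    return 1.0 - rms(err_augmented) / rms(err_baseline)

# Hypothetical C_P errors (measured minus modeled) for the same set of bins.
err_floris = [0.04, -0.03, 0.05, -0.02]
err_augm = [0.02, -0.015, 0.025, -0.01]   # half of the baseline errors
reduction = relative_reduction(err_floris, err_augm)
```

Evaluating the same metric separately on bins that were and were not used for identification gives a direct measure of how well the identified corrections generalize.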

Conclusions
This paper has presented a new method to calibrate and augment parametric wind farm models. The proposed approach builds on the vast body of knowledge and experience embedded in available reduced wind farm flow models. However, recognizing that any such model will always have only a limited prediction accuracy, the present approach augments a baseline model with extra ad hoc terms designed to correct some of its presumed specific deficiencies. These additional elements of the model are then learned from operational data. Optionally, the baseline model parameters can also be tuned within a single integrated process. By design, the method has been exclusively based here on SCADA power measurements; therefore, it is readily applicable to most operational wind farms, whenever such data are available. However, the concept of model augmentation is very general and could clearly also be used with other measurements.
To limit the number of free parameters and to overcome the fact that the identification problem can be overparameterized and hence ill-posed, a parameter transformation into an orthogonal space has been used. Thereby, only parameters that are sufficiently visible within a given data set enter into the identification process.
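The idea of retaining only sufficiently visible parameter combinations can be sketched as follows. This is a simplified illustration, not the formulation used in this work: it assumes a linearized problem described by a sensitivity matrix of model outputs with respect to parameters, uses a singular value decomposition to obtain orthogonal parameter directions, and discards directions whose singular values fall below a relative threshold. The names `visible_subspace`, `identify_in_subspace`, and `rel_tol` are hypothetical.

```python
import numpy as np


def visible_subspace(sensitivity, rel_tol=1e-2):
    """Orthogonal parameter directions that the data can resolve.

    sensitivity: (n_obs, n_params) matrix of d(output)/d(parameter).
    Returns the retained directions (as columns) and their singular values.
    """
    _, s, vt = np.linalg.svd(sensitivity, full_matrices=False)
    keep = s > rel_tol * s[0]  # singular values are sorted in descending order
    return vt[keep].T, s[keep]


def identify_in_subspace(sensitivity, residual, rel_tol=1e-2):
    """Least-squares parameter update restricted to the visible directions."""
    basis, _ = visible_subspace(sensitivity, rel_tol)
    reduced = sensitivity @ basis  # sensitivities w.r.t. the orthogonal parameters
    theta, *_ = np.linalg.lstsq(reduced, residual, rcond=None)
    return basis @ theta  # map back to the original parameters


# Toy problem: the third parameter has no effect on the outputs (invisible).
J = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0],
              [2.0, -1.0, 0.0]])
p_true = np.array([0.5, -0.2, 3.0])
residual = J @ p_true

basis, _ = visible_subspace(J)
update = identify_in_subspace(J, residual)
```

In this toy case two orthogonal directions are retained, and the invisible third parameter is left at zero instead of being assigned an arbitrary value by the ill-posed full problem.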
The method was first applied to a large data set obtained with scaled wind turbines operating in a boundary layer wind tunnel. Thereby, it was shown that a correct learning of the extra modeling terms is achieved. These conclusions are made possible by the fact that, in this case, the flow and wake characteristics are known with good accuracy. Next, the method was tested on a real wind farm, in a realistic and highly complex situation.
Based on the results shown here, the following conclusions can be drawn.
- Within the wind tunnel environment, a correct learning of nonuniform wind farm inflow speed and of secondary steering effects has been achieved. In particular, the latter shows a good match with detailed wake measurements in wind-misaligned conditions. It is remarkable, and very promising, that such detailed features of the solution could be inferred purely from operational power data, even when starting from a baseline model that does not consider secondary steering at all.
- The application to field data has shown that, as expected for the complex-terrain site analyzed here, orographic effects play a driving role. A marked model improvement could be observed, even in conditions where the model was used for extrapolating outside of the training conditions. It is worth noting that orographic effects will be present in many practical onshore applications, and the fact that they can be learned from simple and readily available operational data is very encouraging. Again, it should be explicitly pointed out that the baseline model did not include any orographic corrections.
- It has been shown that model tuning and the learning of extra correction terms can be achieved simultaneously. This reduces the risk that unmodeled physics drives the baseline parameters beyond their reasonable limits.
- Although the augmented models show a much improved accuracy with respect to the baseline, some model mismatch still remains. These remaining errors may often be caused by issues in the data rather than in the model; nevertheless, additional improvements are thought to be possible.
Future work will apply the proposed method to other wind farms, to increase confidence in the obtained results. From longer and richer data sets, possibly in conjunction with meteorological reanalyses, it is presumed that yearly and seasonal variations could be observed. CFD analyses could be integrated to support and confirm the identification of orographic effects. Attention should also be paid to improved and additional forms of model corrections, including wake overlap models. Finally, it is worth pointing out again that an improved knowledge of the flow within a wind farm finds applicability in a potentially large range of digitally driven applications, including wind farm control, lifetime estimation, power forecasting, predictive maintenance, and others. Therefore, it is expected that methods for high-accuracy flow predictions in wind farms will be the subject of significant future research efforts.