A physically interpretable data-driven surrogate model for wake steering

. Wake steering models for control purposes are typically based on analytical wake descriptions tuned to match experimental or numerical data. This study explores whether a data-driven surrogate model with a high degree of physical interpretation can accurately describe the redirected wake. A linear model trained with large eddy simulation data estimates wake parameters such as deficit, center location and curliness from measurable inflow and turbine variables. These wake parameters are then used to generate vertical cross sections of the wake at desired downstream locations. In a validation 5 considering eight boundary layers ranging from neutral to stable conditions, the far wake’s trajectory, curl and available power are accurately estimated. A significant improvement in accuracy is shown in a benchmark study against two analytical wake models, especially under derated operating conditions and stable atmospheric stratifications. Even though the results are not directly generalizable to all atmospheric conditions, locations or turbine types, the outcome of this study is encouraging.

potential of this surrogate model is systematically assessed by evaluating its performance with large eddy simulations (LES) for a reference wind speed under a range of atmospheric conditions and subsequently benchmarking it against two analytical wake models (GAUS and GCH). Lastly, the surrogate model's generalizability to all atmospheric conditions, locations and 60 turbine types is discussed.

Large eddy simulations
In this study data are generated with revision 3475 of the PArallelized Large eddy simulation Model (PALM, Maronga et al. (2020)), which uses a non-hydrostatic incompressible Boussinesq approximation of the Navier-Stokes equations and the Monin-Obukhov Similarity Theory to describe surface fluxes. In the boundary layer, the grid on a right-handed Cartesian coordinate system is regularly spaced with ∆ = 5 m, while above the boundary layer height the vertical grid size increases with 6 % per cell to save computational costs. The Coriolis parameter corresponds to that at 55 • N. Default numerical schemes are used, the main ones being a third-order Runge-Kutta scheme for time integration, a fifth-order Wicker-Skamarock advection scheme for the momentum equations, Deardorff's 1.5-order turbulence closure parameterization for subgrid-scale turbulence and an iterative multigrid scheme for the horizontal boundary conditions. The simulation chain consists of a precursor simula-70 tion to generate realistic inflow conditions and a subsequent main simulation that contains one turbine.

Precursor simulations
Inflow conditions with realistic turbulent features are generated from an initially laminar flow by adding random perturbations in a precursor simulation with cyclic horizontal boundary conditions. To study the potential of DART under different inflow conditions, eight boundary layers (BLs) ranging from a neutral to a strongly stable BL are used as reference inflow conditions, 75 all having approximately the same wind direction and wind speed at hub height. As reported by Vollmer et al. (2016), wake steering is ineffective in a convective boundary layer, which is therefore not considered in this study. Due to the large computational expense it was not possible to increase the number of simulations. Although these eight BLs do not capture the great variability of the free field, it is considered sufficient to provide a proof of concept for data-driven models.
The total domain size and simulation time vary between the BLs and are determined empirically until convergence to a sta-80 tionary state occurs and are dependent on the size of the largest eddies that explicitly need to be resolved. The details of the precursor simulations are summarized in Table 1. BL1 and BL2 portray neutral conditions with roughness lengths representing low crops (z 0 = 0.1 m) and parkland (z 0 = 0.5 m), which are typical landscapes found in Northern Germany. Following Basu et al. (2008), constant cooling rates at the surface are prescribed to generate stable BLs, where BL3 and BL4 represent weakly stable (∂θ ∂t −1 = −0.125 K h −1 ), and BL5 and BL6 Table 1. Summary of simulation parameters and classification into neutral (NBL), near neutral (NNBL), weakly stable (WSBL) and stable (SBL) boundary layers. The size (Lx,p, Ly, Lz) of the domains is normalized by the rotor diameter (D = 126 m). All parameters are identical in precursor and main simulations, except for the domain size which is extended in the streamwise direction (Lx,m). tp is the simulated time of the precursor run, ug and vg the geostrophic wind, ∂θ ∂t −1 the heating rate, H the sensible heat flux and z0 the surface roughness length. tions are assumed to be undisturbed, hence far enough from the turbine that induction does not play a role. Typical inflow parameters are displayed in Fig. 1, indicating that the wind speed is comparable for all simulations, but that the atmospheric conditions vary. A more stable boundary layer, indicated by a larger Obukhov stability parameter (z L −1 ), typically has a higher shear (α) and veer (∂α) and lower turbulence intensity (T I). The spread of the parameters between the main simulations (see Sect. 2.2) in the same boundary layer, indicated by the standard deviation as whiskers in Fig. 1, is small enough to be neglected.

Main simulations
After generating stationary inflow conditions with a precursor, simulations with one turbine are performed. Information on turbulence characteristics from the precursor simulation is fed to the main simulation by adding a turbulent signal to a fixed mean inflow (turbulence recycling method) far upstream of the turbine. A radiation boundary condition ensures undisturbed 100 outflow downstream of the simulated turbine. The size of the recycling area is equal to the domain size of the precursor simulation and the domain size of the main simulation is only extended in streamwise direction by placing a turbine at x = 6 D downstream of the recycling area. Wake data until x = 10 D are used for analysis, but the domain is extended to x = 13 D to eliminate blockage effects. The total simulation time is 80 minutes: the first 20 minutes are considered spin-up time, the last 60 minutes are used for analysis.

105
The simulated turbine is an Actuator Disc Model with Rotation (ADMR) representing a 5MW NREL turbine, having a hub height of 90 m and a rotor diameter D of 126 m (Jonkman et al., 2009), as implemented in Dörenkämper et al. (2015). Turbine Figure 1. Summary of inflow parameters (60 min averages), given as mean (dots) and standard deviation (whiskers) over the 15 main simulations performed in each BL (5 yaw angles times 3 pitch angles). Considered are wind speed (u h ) and turbulence intensity at hub height (T I), wind shear (α) and veer (δα) over the rotor area and the Obukhov stability parameter (z L −1 ) at z = 2.5 m. Equations for these variables can be found in Table 3. yaw angles (ϕ) of -30 • , -15 • , 0 • , 15 • , and 30 • are simulated, where a positive yaw angle is here defined as a clockwise rotation of the turbine when looking from above. Pitch angles (β) of 0 • , 2.5 • , and 5 • are simulated to study the effect of the thrust force on downstream wake characteristics. This adds up to a total of 120 main simulations with one turbine, i.e. 15 turbine settings 110 (5 yaw angles times 3 pitch angles) for each of the 8 inflow conditions. The effect of ϕ and β on the thrust coefficient C T is illustrated in Fig. 2, illustrating that the effect of the turbine yaw angle of the thrust coefficient is approximately symmetric around zero.
It should be noted that smaller step sizes for yaw and pitch angles would be preferred, as these step sizes could be too coarse away from these set points (e.g. for ϕ = 7.5 • ). Increasing the step size would, however, lead to more simulations, which was computationally not feasible and not deemed necessary to show proof of concept.
The wake is described using the normalized wake deficit, defined as u nd = u wake −u∞ , where u wake represents the observed wind speed in the wake, u ∞ the undisturbed inflow 2.5 D upstream at the same height and u ∞,h the undisturbed inflow at hub height. It is assumed that the advection velocity is constant in streamwise direction (assumption of frozen turbulence) and that 120 the wake behaves as a passive tracer in the ambient wind (Larsen et al., 2008).

Development of the Data-driven wAke steeRing surrogaTe model
This section describes the development of the Data-driven wAke steeRing surrogaTe model (DART). It should be noted that many different kinds of data-driven models exist. For the purpose of this exploratory study, the focus was to develop a simple regression model that performs well on small data sets without the risk of overfitting. This matrix can be used in the execution (testing) of the model to estimate the key wake steering parameters for new inflow conditions. The same operations as in the training procedure are done on the input variables to obtain the input parameters,

Defining key wake steering parameters
A data-driven model will not be able to produce a full multidimensional flow field, but rather estimate parameters describing 140 the wake at desired downstream positions. Since curled wakes are considered, key wake steering parameters are in this study retrieved with the Multiple 1D Gaussian method (Sengers et al., 2020). In the example below, the wake of a turbine with a +30 • yaw angle in BL1 at x = 5 D is considered (Fig. 4a). This method fits a simple 1D Gaussian at every vertical level (k = 1...K) where information is available to obtain a set of local normalized wake center deficits (A = A 1 ...A K ), wake center positions (µ = µ 1 ...µ K ) and wake widths (σ = σ 1 ...σ K ). Subsequently, another Gaussian can be fitted through the local wake center 145 deficits in the vertical (Fig. 4b) to find the overall normalized wake center deficit (A z ) and vertical position with respect to hub height (µ z ), as well as the vertical extension of the wake (σ z ). The local wake center position and width at vertical level k that corresponds to µ z are subsequently considered as lateral wake center position (µ y ) relative to the turbine location and wake width (σ y ). Next, by fitting a second order polynomial through the local wake center positions between upper and lower tip height ( Fig. 4c), one obtains a measure for the curl (coefficient of quadratic term) and tilt (coefficient of linear term) of the 150 wake. An expression for the wake width as a function of height is found by repeating this step for the local wake widths ( Fig.   4d) to obtain coefficients s a and s b . After this procedure, the wake can be described by the set of dimensionless parameters displayed in Table 2.
Note that this method cannot accurately capture the splitting of the wake in two separate cells, which might occur under strong Table 2. Defined dimensionless key wake steering parameters. The normalized wake deficit is computed as described in Sect. 2.2. All length parameters are nondimensionalized by the rotor diameter D.

Scalar Parameter Symbol
Amplitude normalized wake deficit Az

Lateral wake center displacement µy
Vertical wake center displacement µz Width wake center height σy

Vertical extend σz
Curl curl

Tilt tilt
Quadratic wake width parameter sa Linear wake width parameter s b veer as discussed in Vollmer et al. (2016). Such cases will result in inaccurate values for the key wake steering parameters and 155 should be filtered out before applying the regression model described in Sect. 3.3.

Input parameters
A regression model (Sect. 3.3) is used to estimate the key wake steering parameters in Table 2. A set of measurable inflow and turbine variables is used as input parameters, which are made dimensionless to make the model more universally applicable, at least within the variability found between the simulations in this study. This set of parameters is presented in Table 3. with, see Sect. 3.5. Although a high correlation between input variables is usually undesirable in regression problems due to multicollinearity, Sect. 3.3 explains that this is not an issue due to the regression model used in this study.
This regression model is linear, so to include nonlinear relations the input variables can be transformed using reciprocal

Variable Symbol Calculated
Turbine yaw angle ϕ ϕ polynomial and interaction terms are added, as well as an intercept (unity), extending the set of input parameters.

Regression model
Since the LES data set has a relatively small sample size, a linear model is chosen as they perform well on small sample sizes, reduce the risk of overfitting compared to more complex Machine Learning models and are highly interpretable (Hastie et al., The regression is formulated as a linear model in matrix form . (1) which estimates the output variable Y based on the design matrix X and coefficient matrix B. Matrix dimensions indicated in Eq.
(1) represent the sample size n, number of downstream distances d and number of input parameters p. Note that p contains the transformed variables, their second-order and interaction terms, as well as intercepts. Since these parameters 185 are highly correlated and not all relevant, the coefficients are determined based on a Lasso regression method as introduced by Tibshirani (1996). This guarantees a shrinkage of the number of variables through a regularization parameter found by cross-validation. Relevant input parameters are isolated from irrelevant parameters by multiplying the latter with a coefficient of zero, effectively eliminating them from Eq. 1. Multicollinearity is therefore not an issue in Lasso, contrary to Ordinary Least Squares, as typically only one parameter is chosen from a set of highly correlated input parameters. This reduces the 190 number of input parameters, increasing the interpretability of the model. Additionally, it is desired that the same set of input parameters is used to estimate the output variable at all downstream distances. This is guaranteed in the multi-task Lasso method introduced by Obozinski et al. (2006), which is implemented in the multi-task Lasso algorithm from the scikit-learn Python library (Pedregosa et al., 2011). See Appendix A for further explanation.
Whereas fitting the regression coefficients is more complex than Ordinary Least Squares fitting, the estimations of the key 195 wake steering variables in the testing or execution phase are generated through simple matrix multiplication as shown in Eq.
(1). The algorithm is therefore highly interpretable, easy to implement and computational inexpensive.

Wake composition: reversed Multiple 1D Gaussian
The coefficient matrix B can be used to estimate the key wake steering parameters in Table 2 from inflow variables. This information is used to compose a vertical cross section of the wake deficit using the reverse of the Multiple 1D Gaussian 200 method described in Sect. 3.1. The amplitude of the normalized wake deficitÂ at each height (k = 1..K) can be computed by simply filling out the Gaussian function using A z , µ z , σ z . Similarly, local wake center positionsμ and local wake widthsσ can be found by filling out a second order polynomial. Additional assumptions outside of the rotor area are that the curl continues (red dashed line in Fig. 4c) and the wake width can be described by an ellipse between lower tip and surface and between upper tip and wake top (red dashed line in Fig. 4d). Finally, a simple 1D Gaussian can be filled out at every vertical level using 205 the information fromÂ,μ,σ, resulting in a two-dimensional grid filled with u nd values (Fig. 4e). Comparing this composed wake to the original LES in Fig. 4a, one can see that this simple description still contains much of the original information. The shape of the wake is conserved, as well as the displacement of the wake center. The maximum deficit of the composed wake center appears to be slightly larger than in LES. Additionally, in the composition the maximum wake deficit is always in the center (definition of a Gaussian), which is not necessarily true in LES or reality.
210 Figure 6. Accuracy of the wake composition procedure expressed as a percentage error of available power of a virtual downstream turbine.
At each downstream distance, data from all 120 simulations are considered. The subplot in the top right zooms in to 4 D ≤ x ≤ 10 D. Axes labels correspond to those of the main plot.

Wake composition validation
The procedure described in Sect. 3.4 is repeated for all 120 simulations and 1 D ≤ x ≤ 10 D at every D. The metric used here to evaluate the accuracy of this method is the percentage error of available power in the rotor area of the composed wake relative to when computed with the original LES wind field (P E [%] = (P comp − P LES )/P LES * 100). A few things can be noted by studying the results shown in Fig. 6. The composition shows a large systematic positive bias in the near wake (x ≤ 3 215 D). This is due to the so-called double bell shape of the near wake, with a speed up region around hub height. When attempting to fit this with a simple 1D Gaussian, the deficit in the rotor area is underestimated, resulting in a positive percentage error.
For this reason, the near wake will be excluded from analysis in the remainder of this work. Further downstream (x ≥ 8 D) a small negative systematic bias can be identified, which is due to the 'top-hat' shape of the wake deficit as a result of temporal averaging. This is not captured by a Gaussian function and will on average result in an overestimation of the wake deficit 220 amplitude. The large (negative) outliers typically indicate cases where the wake does not have a Gaussian shape, such as the separation in two cells under strong veer. The median error in the region 4 D ≤ x ≤ 10 D is, however, smaller than 1 %.

Feature selection
Numerous combinations of input parameters are possible. This includes choosing from the variables presented in Table 3, as well as which of the five transformations proposed in Sect. 3.2 to use. In order to find the most accurate solution, all combina-225 tions are tested. The combination that provides the minimum absolute percentage error of available power over all training data, i.e. all considered simulations and downstream distances (4 D ≤ x ≤ 10 D), is sought. When using all eight variables presented in Table 3,  occur. This results in a total of 3 1 * 5 7 = 234375 possible combinations. Not only is using all variables computationally expensive, as will be discussed in Sect. 4.1, operationally it is also unlikely that all variables are routinely obtained due to high costs.
As hypothesized in Sect. 3.2, using one variable from each input cluster is already expected to produce accurate results. To test this, a version of DART with only three variables, denoted DART-3, is considered. Allowing each variable to be chosen and transformed, the total number of possible combinations is a multiplication of the possible combinations of each input cluster.

235
In total, (1 * 3) * (4 * 5) * (3 * 5) = 900 possible combinations are tested to find the optimal set of input parameters. It should be noted that other feature selection procedures could be considered to reduce the computational expense needed for training, but enhancing the training procedure was considered outside of the scope of the current work.

Benchmark models
DART is benchmarked against the Gaussian (GAUS) and the Gaussian-Curl Hybrid (GCH) models present in version 2.2.2 of 240 the FLORIS framework (NREL, 2020). Although secondary steering is not studied here, the GCH is still included because of its incorporation of initial wake deflection and the added wake recovery term. Both models share the same tuning parameters for the far wake onset (α floris , β floris ) and wake recovery rate (k a,floris , k b,floris ). Analogous to the training of DART discussed in Sect. 3.5, the values of the tuning parameters is determined by minimizing the APE of available power over all considered simulations and downstream distances (4 D ≤ x ≤ 10 D). Information on inflow (e.g. u h , T I) is taken from the LES data. The 245 models are trained independently of each other and will therefore have different values for the tuning parameters.
The data used for the tuning include simulations with yaw and pitch angles. FLORIS adjusts the thrust coefficient numerically for yaw angles, but not for pitch angles. For this reason, the thrust coefficient lookup table was adjusted by the ratio C T,pitch /C T,nopitch found in LES (Fig. 2).

Performance on training data
This section displays the performance of the Data-driven wAke steeRing surrogaTe model (DART) and the benchmark models when using all 120 simulations for training or tuning. In Sect. 4.2 and 4.3 a validation of the model with testing data will be shown.
Following the feature selection procedure as described in Sect. 3.5, the optimal combinations of input parameters of DART-255 8 was found to be the set {ϕ, δα, α −1 , ln(z L −1 ), T I −1 , C −1 T , C Q , T SR −1 }, and for DART-3 {ϕ, δα −1 , ln(C T )}. Figure 7 compares the performance of these versions to that of the benchmark models as a function of downstream distance.
The shaded areas indicate a significant improvement (green), insignificant difference (yellow) or significant decline (red) of the DART accuracy compared to the best performing benchmark model. Statistical significance is determined using an independent Welch's t-test on the absolute percentage error with a p-value < 0.05. This test assumes a normal distribution, but can deal with unequal variances between data sets. From Fig. 7 it is clear that both DART-8 and DART-3 consistently provide significantly more accurate results than GAUS and GCH. Most striking is the variability of the benchmark models that is an order of magnitude larger than that of DART. The reason for this will be systematically evaluated in Sect. 4.2 and 4.3. The systematic error, indicated by the median, is however very similar for all models. Comparing the two benchmark models, it is clear that GCH consistently gives a higher power 265 estimate than GAUS due to the added wake recovery term. The accuracy of DART-8 is higher than that of DART-3, especially closer to the turbine. This is attributed to the stronger wake deficit closer to the turbine, as the wake center deficit A z exhibits a larger range of possible values closer to the turbine. For instance in the training data at x = 4 D, the range is -0.6 ≤ A z ≤ -0.21, whereas at x = 10 D the range is -0.27 ≤ A z ≤ -0.09. Estimations with the same relative error therefore bear a larger absolute error closer the turbine. Having access to more information, DART-8 consistently has a smaller relative error estimating A z 270 than DART-3, which has a larger effect on the available power estimates closer to the turbine.
The order of magnitude of computational costs needed to train the models on a single node is displayed in Table 4 Table 4. Model training (DART) or tuning (GAUS and GCH) time using all 120 simulations and 7 (4 D ≤ x ≤ 10 D) downstream distances.
Iteration times are expressed as the mean over the first 100 iterations. DART's Total is a simple multiplication of Iteration and Combinations.

Performance on testing data
A simple leave-one-out cross-validation technique is used to discuss the performance of DART compared to the benchmark models. The models are trained or tuned with seven out of the eight BLs (Fig. 1) and tested on the remaining one representing a new inflow condition. Eight evaluations can therefore be performed, i.e. each BL being tested once. Note that for each evaluation a set of optimal parameters and transformations are determined, which can differ from DART-3 in Fig. 7. Similarly,

290
GAUS and GCH are tuned again, resulting in new values for their tuning parameters. Since the models show similar behavior in relation to the downstream distance as discussed in Sect. 4.1, here only the collective result over 4 D ≤ x ≤ 10 D is discussed. Figure 8 presents the results of this validation procedure. For all BLs, DART-3 shows a significant improvement over GAUS and GCH. The systematic biases (indicated by the medians) are similar for all models in the order of a few percent, but the variability is greatly reduced in DART-3. The main reason for this is that the benchmark models do not include a pitch angle 295 parameter β. Although the C T tables in the models are corrected in this study, the tunable parameters do not account for this. To clarify, LES finds a decreasing wake size (in both horizontal and vertical extent) with increasing β. This is accurately captured by DART-3, but GAUS and GCH produce a wake of similar size independent of β or C T . The inclusion of this effect is a notable improvement of DART that is important for control strategies such as axial induction control (e.g. Corten and Schaak, 2003;van der Hoek et al., 2019).

300
Furthermore, BL5 contains the worst results for all models. Figure 1 indicates that this is an extreme case as it has the highest Obukhov stability parameter and veer along with the lowest turbulence intensity. This is problematic for the models, since it is an inflow condition unlike anything it was trained for. This indicates a limited generalizability of all models and caution is needed when applying them in conditions that differ greatly from those used for training. This will be further discussed in Sect.

Operation without derating
For a fair comparison between DART-3 and the benchmark models, this section only considers simulations representing operation without derating the turbine (β = 0 • ). The training (selection of parameters for DART-3) and tuning (tuning parameters of GAUS and GCH) has been repeated and the results of the leave-one-out cross-validation technique are displayed in Fig. 9.
The variability of the benchmark models in (near) neutral conditions (BL 1, 2, 7 and 8) decreases considerably, but DART-3 310 still produces significantly more accurate results. In (weakly) stable boundary layers (BLs 3 to 6) GAUS and GCH still show a large variability and occasionally a large systematic bias, which is not true for DART-3. These results suggest that DART-3 outperforms the benchmark models especially under stable stratifications, those conditions where wake steering is deemed most effective. Furthermore, the model performance is assessed for partial wake operation. Figure 10 compares the models when the down-315 stream turbine is moved 0.5 D to the left (from the upstream observer's point of view). Generally, the variability is greatly reduced since the deficit is smaller. The benchmark models display a systematic negative bias in all BLs, which is not true for DART-3. Only in BL8 DART-3 does not show a significant improvement over the benchmark models, but no satisfying explanation has been found why exactly this BL displays this behavior.
A case study is displayed in Fig. 11a that presents the LES wake in a weakly stable boundary layer (BL3) for a turbine with 320 Figure 9. Same as Fig. 8, but only for cases with β = 0 • , i.e. without derating the turbine. Figure 10. Same as Fig. 9, but for partial wake operation, i.e. with a virtual downstream turbine moved 0.5 D to the left. ϕ = +30 • . The wake has a clearly defined curl and a wake center left of the hub. The DART-3 wind field in Figure 11b shows that the wake shape and center position are well presented. The GAUS model (Fig. 11c), however, produces a circular wake shape and a larger wake deflection to the left. The percentage errors indicated in the top of the figure show that DART-3 has a high accuracy for both virtual turbines, but GAUS has large biases due to the misplacement of the wake center. Under stable conditions the wind veer is relatively high, adding a crosswise force pointing towards to right above hub height. This force has already been pointed out in Fleming et al. (2015); Vollmer et al. (2016); Sengers et al. (2020). This effect is implicitly included in DART-3, but not in the benchmark models. Figure 11d illustrates that these models show an ever further deflecting wake, whereas DART-3 settles at a smaller lateral displacement close to LES. Not only does this explain the negative bias of 330 the benchmark models in Fig. 10, but also their larger spread observed in Fig. 9. This result strengthens the previous indication that DART-3 is superior under stable stratifications.

Generalizability
Although the results presented in Sect. 4 are encouraging and are believed to show proof of concept, they are not directly 335 generalizable. A data-driven surrogate model is sensitive to the data used for training, and encountering situations that vary greatly from those used for training can result in large errors. This includes very dissimilar atmospheric conditions, as already illustrated by the strongly stable BL5 in Fig. 8, but extends to other locations (e.g. topography, wind farm layout) and turbine type. Generating a numerical database with more atmospheric conditions, tailored to each location and turbine type is not possible due to the high computational expense of these high fidelity models. This limits a large-scale implementation of data-340 driven surrogate models trained with numerical data. Potentially, field measurements could be used, either in isolation or in combination with numerical data. Wake data could possibly be obtained from long-range lidars (Brugger et al., 2020) or strain measurements from the turbine's blades (Bottasso et al., 2018). Exploration of these possibilities is deemed an important task for future research.
In this exploratory study, the development of DART was limited to the far wake and a two-turbine setup. If desired, further 345 development of the model is needed to include the near wake, which can for instance be done by including the super-Gaussian description (e.g. Shapiro et al., 2019;Blondel and Cathelain, 2020). An extension to wind farm level could be achieved by for instance applying the superposition principle as done in GAUS and GCH, although the accuracy of DART under disturbed inflow needs attention. Figure 12. Regression coefficients of DART-3 estimating µy at x = 6 D using scaled input parameters. Since all input parameters are dimensionless, the corresponding coefficients are also dimensionless. Variable y0 indicates the intercept or systematic offset.

350
As mentioned in Sect. 1, analytical models such as GAUS and GCH are presumed to be more robust than purely data-driven models. However, when properly trained, the accuracy of DART is expected to be significantly higher than that of analytical models, as it is specifically tailored to certain scenarios. This can easily be understood by looking at the number of fitted or tuned parameters. Since DART includes second-order polynomial and interaction terms, adding more input variables exponentially increases the size of coefficient matrix B (Eq. 1). This means that for DART-3, having only 3 input variables, B contains 10 355 coefficients, but with 8 input variables DART-8's B already contains 45 coefficients. When comparing this to the 4 tuning parameters of the benchmark models, one can understand why the latter are more robust, but also are expected to have a lower maximum achievable accuracy.
To demonstrate DART's interpretability, Figure 12 illustrates DART-3's fitted regression coefficients for all 10 input parameters for µ y at x = 6 D. Since the order of magnitude of the input parameters can vary greatly, for this example the input parameters 360 were scaled between -1 and 1 before regression fitting. Consequently, the fitted coefficients indicate how important each input parameter is in estimating the output variable. For the lateral wake center displacement it can easily be seen that ϕ is the dominant parameter, which intuitively makes sense. Other import parameters are the interaction term ϕ * ln(C T ) (turbine variable cluster), y 0 (intercept), δα −1 and δα −2 (atmospheric inflow cluster), while other parameters only slightly affect the wake center displacement.

365
Alternative to the interpretable Lasso model, more complex black-box models (e.g. neural networks) could be considered as they are expected to have a higher accuracy when abundant data is available. Simpler models are, however, always preferred because they are less prone to overfitting, which is especially true for small sample sizes as used in this study. In addition, a model's inpretability typically diminishes with increasing complexity.  Table 5 shows that when producing results for the whole region considered in this study (4 D ≤ x ≤ 10 D), the run time of DART is comparable to GCH and slightly higher than GAUS. When simulating only one downstream distance, for instance exactly where a turbine is located, DART performs similarly to GAUS. These results suggest that DART is quick enough to be used for controlling purposes.

Conclusions
This study explores the potential of a Data-driven wAke steeRing surrogaTe model (DART) that retains a high degree of physical interpretation. After training with large eddy simulation data, a model consisting of only linear equations is able to accurately describe the far wake in terms of trajectory, curl and available power. As input parameters, it uses measurable inflow and turbine variables that are commonly studied in literature. The highest accuracy is obtained when including all available 385 input variables, but the model's training time becomes very large. When using only three measurable input variables, the surrogate model displays a slight accuracy loss, but the training time is greatly reduced. In a benchmark against the Gaussian and Gaussian-Curl Hybrid models, the data-driven model with three input variables typically shows a significantly higher accuracy. In particular it performs better under derated operating conditions and stable atmospheric stratifications, since it implicitly includes the effect of turbine derating on wake size, as well as the effect of veer on the wake center position. These 390 results are not directly generalizable to all atmospheric conditions, other locations or new turbine types, which presents a challenge for a large-scale implementation of data-driven surrogate models. The results shown in this study are, however, believed to show proof of concept for physically interpretable data-driven surrogate models for wake steering purposes.
Appendix A: Multi-task Lasso algorithm The original Lasso implementation from Tibshirani (1996) seeks to find the coefficients B based on in which n is the sample size and p the input parameter. It uses the regularization parameter λ, leading to sparse coefficients for the coefficient vector B. The multi-task setting from Obozinski et al. (2006) extends the Lasso regression to estimate 400 d (distance downstream) outputs simultaneously, penalizing the blocks of coefficients over the tasks. The loss function is therefore extended and finds the coefficient matrix B based on In contrast to Eq. (A1), the multi-task Lasso implementation does not only penalize the single coefficients, but also the blocks of coefficients over all tasks represented by the Euclidean norm. Note that if d = 1, Eq. (A2) reduces to the standard Lasso 405 estimate of Eq. (A1).
An exemplary result is illustrated in Fig. A1. Whereas the original Lasso model selects a new set of variables for each distance, the multi-task Lasso always takes the same set. This makes physically more sense and leads to fewer variables in total, therefore reducing the risk of overfitting. The model is optimized using the cyclical descent algorithm implemented in Pedregosa et al. (2011).