Automatic controller tuning using a zeroth-order optimization algorithm

We develop an automated controller tuning procedure for wind turbines that uses the results of nonlinear, aeroelastic simulations to arrive at an optimal solution. Using a zeroth-order optimization algorithm, simulations using controllers with randomly generated parameters are used to estimate the gradient and converge to an optimal set of those parameters. We use kriging to visualize the design space and estimate the uncertainty, providing a level of confidence in the result. The procedure is applied to three problems in wind turbine control. First, the below-rated torque control is optimized for 5 power capture. Next, the parameters of a proportional-integral blade pitch controller are optimized to minimize structural loads with a constraint on the maximum generator speed; the procedure is tested on rotors from 40 to 400 m in diameter and compared with the results of a grid search optimization. Finally, we present an algorithm that uses a series of parameter optimizations to tune the lookup table for the minimum pitch setting of the above-rated pitch controller, considering peak loads and power capture. Using experience gained from the applications, we present a generalized design procedure and guidelines 10 for implementing similar automated controller tuning tasks.


Introduction
In this article, we present a data-driven, simulation-based optimization procedure for tuning wind turbine controllers using measures that are directly related to component design. Controller tuning influences the power capture and structural loading on wind turbines, which are directly related to the cost of the wind energy generated. At the same time, different turbine 15 models require different control parameters. As rotor designs are iterated upon and also customized, e.g., with larger towers, tip extensions, or for site-specific turbulence, an updated (and ideally optimized) controller is required for component design and cost specification. Given the aeroelastic turbine model, the algorithm presented in this article automatically finds the optimized parameters of the predefined control architecture, reducing the effort required of the control designer.
The wind turbine control tuning procedure is normally a manual process and often requires expert knowledge of the con- 20 troller and turbine operation. An automated procedure could reduce the design cycle time of a manufacturer's research and development process or aid researchers in other disciplines of wind engineering that require a well-tuned controller without needing to worry about its finer details. Several control parameters are directly related to the performance of the turbine and must be tuned for each design iteration or model update. The simplest method to tune a controller using simulation information is to exhaustively search the design space and then make an educated design choice of the parameter.
A systematic, simulation-based parameter search of the pitch control gains for generator speed control was first published in Hand and Balas (2000). On a single turbine, turbulent simulations were used to sample and visualize the design space against competing design measures: generator speed regulation versus blade pitch actuation. A similar data processing flow was used in Hansen et al. (2005) while the problem was formulated in a numerical optimization framework for structural load reduction; the authors concluded that a good initial guess was only marginally worse than the optimized result and that the effort required 5 to set up the optimization procedure was not profitable for the benefit in structural loading. Shortly thereafter, an adaptive control framework was found to be beneficial for reconciling plant-model mismatches in field testing, especially for control parameters that affect power production (Johnson et al., 2006), where even small benefits are profitable to the operator.
As the wind industry has matured and computational cost has decreased, wind turbine design increasingly relies on simulation of power capture and structural loads for design analysis. As a result, system engineering tools for wind turbine design 10 have been developed and refined, leading to updated efforts in automated controller development, with the aim of deploying tuning methods for many different turbines. One approach is to use a model-based control scheme in order to limit the control tuning effort (Bottasso et al., 2012); these authors also proposed the use of a multi-objective optimization because the value of power capture versus load reduction can vary over the turbine's lifetime. A scalar cost function was presented in Tibaldi et al. (2014) using measures that are directly related to wind turbine component design, like peak and fatigue loads. The cost 15 function included terms for each turbine part, with factors to capture its relative cost to the turbine.
Using measures directly related to the component design, like fatigue and extreme loads in turbulent simulations, is ideal because it most accurately reflects the eventual component design, but these simulations require more detailed and computationally expensive methods to generate the measures. Simulation-based optimization has been used to solve these problems, where solving for the value of the cost function is expensive compared to the optimization procedure. One approach to solv- 20 ing these types of problems is using Response Surface Methodology (RSM) (Fu, 2014). RSM was originally developed for experimental design (Box and Wilson, 1951), but has increasingly been used with simulation information; it works by fitting a cost function to the simulation results, finding the local gradient of the fit, and optimizing the fitted cost function. The question of how to sample the parameter space remains an open question. Samples can be generated using a grid search or random sampling. An example in the wind energy community by Moustakis et al. (2019) samples the parameter space based on a cost 25 function that considers both where the cost is expected to be optimal and also where it is unknown. Our approach to sampling the parameter space is based on stochastic approximation or "zeroth-order optimization," which uses the sampled cost function to estimate the local gradient and then optimizes the function with proven convergence results (Ghadimi and Lan, 2013). In one of the original stochastic approximations methods, Kiefer and Wolfowitz (1952), each dimension of the decision variable is perturbed and a finite difference method is used to approximate the gradient. Multi-point methods were developed for higher 30 dimensional cases, where the decision variable can be perturbed in a "direction" containing multiple dimensions and the directional derivative is used to estimate the gradient (Duchi et al., 2015). If multiple directions are randomly sampled using a normal Gaussian distribution and then averaged to find the directional derivative, it is known as Gaussian smoothing, which has been shown to improve convergence rates (Hajinezhad et al., 2017).
We use a Gaussian smoothing approach to generate samples, estimate the gradient, and identify a (possibly local) minimum point. Then we use the samples to visualize the design space and provide a level of confidence in the result. Previous work in controller optimization usually only provides the cost function and goals of the optimization, whereas this work explicitly details the method for determining the sample simulations and how their results are used to iterate on control designs.
Instead of using a single cost function that attempts to account for all aspects of wind turbine design, our work solves specific 5 wind turbine control problems that are directly related to the cost of energy. First, the optimization procedure is demonstrated on below-rated torque control to increase power capture. Next, the pitch control parameters for above-rated pitch control are optimized to reduce fatigue or extreme loads on the tower or blades with a maximum generator speed constraint. Finally, the minimum pitch setting of the pitch controller is optimized in a series of parameter optimizations aimed at reducing peak blade loads.

10
This article is organized as follows. The optimization algorithm and visualization method are presented in Section 2. Applications of the algorithm for wind turbine control tuning are presented in Section 3, followed by a generalization of the design procedure and guidelines for parameter selection in Section 4. Conclusions are presented in Section 5 and the generalized wind turbine controller that is tuned in Section 3 is described in Appendix A.

15
Superscript notation will be used to index the stage r of the zeroth-order optimization algorithm: e.g., z r . Additionally, F T is the transpose of F . If the power of any value is computed, the base will appear in parentheses: e.g., (γ) r .

Method: Zeroth-Order Optimization
The zeroth-order optimization algorithm uses J random samples near the current iteration to estimate a gradient. Then, a typical first-order method ensues: using the estimated gradient, the descent direction and step size are chosen to produce the 20 next iterate. The process is repeated for a number of stages N stage until convergence is observed. Using the cost function samples, an estimation of the design space is generated to visually verify the results. In each control tuning example presented in this article, the following unconstrained optimization problem is solved: where z ∈ X ⊂ R M is the M -dimensional parameter or vector of control parameters, constrained to be in a set X that is convex 25 and compact; C : R M → R is the cost function, which we assume is differentiable, bounded from below, and its gradient is Lipschitz (Ghadimi and Lan, 2013). However, we only have access to the cost function via samples of potentially noisy simulation results.

Generating Samples
The algorithm begins with an initial guess z 1 . During each stage r = 1, 2, . . . , N stage , the gradient is estimated by randomly sampling the design space. During each stage, sample directions φ r j ∈ R M , j = 1, 2, · · · , J are parameters drawn from a random distribution. In the most general case, a standard normal distribution is used for each dimension and the vectors are normalized to have a unit magnitude; this results in a uniformly random distribution of directions in M -dimensions (Hajinezhad et al., 5 2017). In this article, we focus on 1-and 2-dimensional parameter optimizations and will make changes to the generation of the sample directions φ r j to ensure an even distribution for a small number of samples. A search sample z r j in stage r is generated according to where µ is a smoothing parameter that determines the amount of space over which the parameter space is searched. A large 10 value for µ helps to estimate the value of the cost function over a larger area (Section 2.6), but smaller values of µ tend to result in more accurate convergence.
The number of stages and samples-per-stage must also be chosen by the designer. A large number of samples-per-stage gives the best estimate for the gradient, but requires more simulations. During the development of this work, it was found that a smaller number of samples-per-stage and more stages resulted in better convergence using the same total number of 15 simulations (e.g., in Section 3.2.2).

Gradient Estimation
At each stage r, the cost C(z) is computed via simulation at each search sample and used to estimate the gradient Note that (3) differs from a finite difference method of estimating the gradient, where the factor φ r j would be in the denominator. 20 Because there is uncertainty expected in the computed cost, small perturbations (µφ r j ) and a non-smooth cost function C(z) could result in noisy gradients. The gradient estimator in (3) is referred to as the random directions gradient estimator (Fu, 2014, p. 110), and maintains the convergence criterion when used in a first-order algorithm (Hajinezhad et al., 2017).

Determine Descent Direction
From the estimated gradient, the possible descent direction is computed: where a diagonal matrix D of positive scalars is used to relatively increase d r in the directions where the sensitivities of the cost function to parameter changes are smallest, providing a diagonal approximation to Newton's method and improving convergence rates (Bertsekas, 1999), which leads to the following stage gain: where α is the step size and Proj X {y} := arg min x∈X x − y 2 2 finds the closest point within the parameter bounds X , offset with ρ = µ so that search samples in the next stage can be generated within the parameter bounds. The algorithm described in (2)-(5) is proven to converge to a ball centered around an optimal solution (Hajinezhad et al., 2017). Next, we describe two 5 adjustments to the original algorithm that improve performance when used in the control tuning applications presented in this article.

Adjustment 1: Decreasing Step Size and Line Search
A decreasing step size rule ensures convergence and a line search is used to so that the cost function does not increase in successive iterations. After choosing a base step size α 0 , the cost of test samples are evaluated (through simulation) along the descent direction, where the step size decreases for a number of iterations k ≤ k max . An upper limit k max on the number of step size samples is chosen to cap the number of simulations that may be performed along directions that could increase the cost function. To ensure that the cost 15 function is non-increasing during each iteration, the Armijo rule for step size is used (Boyd and Vandenberghe, 2004): where, for all examples in the following section, σ = 0.05 is chosen, a conservative value that only requires a small decrease in the cost function.

20
Once an adequate step size is found, the parameter z is updated using which is the z r a,k in (6) with the first k that satisfies (8); since this value has already been computed, the simulation for determining C(z r ) for r > 1 does not need to be performed.
If the maximum number of step size simulations (k = k max ) are performed and the step size rule in (8) is not satisfied, the 25 next iteration of the parameter z is chosen as where C(r) is the enumeration of the cost function at all stage samples z r , all search samples z r j , and all step size samples z r a,k within the parameter bounds defined by X , up until the current stage r: where C(r) ∈ R nsamp has n samp ≤ N stage × J × k max elements, r = {1, . . . , r}, j = {1, . . . , J}, and k = {1, . . . , k max }. As before, with the step size sample, since the value of the cost at this point has already been computed, it is unnecessary to compute 5 it again for r > 1. Since the resetting parameter update in (10) results in a sequence of C(z r ) that is non-increasing, the convergence properties of the original algorithm are maintained; the same argument applies to Adjustment 1 in Section 2.4.
The solution z soln of the zeroth-order optimization is determined by the updated parameter of the final stage:

10
To provide confidence in the result of the zeroth-order optimization, we visualize the cost, and the measures associated with it, over the parameter space. If the minimum of the zeroth-order parameter optimization matches that of the visualization, the user can be confident in the result. The visualization method also provides a quantitative measure of the uncertainty of the estimated cost over the parameter space.
To estimate the cost and its variance over the parameter space, we use ordinary kriging. Kriging was originally developed 15 for mining applications, where sparsely sampled information over a geographical space was used to estimate the quantity over the whole area. More recent applications of kriging include engineering design and computer experiments.
Kriging, or Gaussian process regression, is a method of interpolation that incorporates uncertainty in the area between samples. Using all the observed data from the zeroth-order parameter search at stage r, C(r) from (11), ensuring there are no repeated values, the estimated cost at z is where the first term in (13) is the generalized least squares estimatê Since we are using ordinary kriging, which assumes a constant mean across the parameter space, the regression basis function

25
where Z is the enumeration of all sample points like in (11). The correlation matrix Ψ represents the influence that nearby samples have on each other; it has the form and is made up of scalar Gaussian correlation functions where ν i is the distance at which the influence is e −1 or 37% in the i th dimension (Martin and Simpson, 2008). The second term in (13) interpolates or "pulls" the estimate towards the observed values using the correlation vector

5
The mean squared error, or variance, of the cost at z is determined by where is the process variance. As the unobserved point z moves away from the observed samples, the second term in (19) approaches zero and the variance approaches σ 2 proc . The correlation function parameters ν i are estimated using a maximum likelihood estimator to be consistent with the observed data. To perform this optimization, we use the ooDACE toolbox to fit the correlation function and kriging model (Couckuyt et al., 2013). Problems can arise when using kriging for simulation-based optimization because of ill-conditioned correlation matrices (Booker et al., 1999). When samples cluster near the optimal solution, closely spaced samples with different 15 values can result in very small values of ν i and ill-conditioned correlation matrices. One solution is to add a constant to the diagonal of Ψ (Sasena, 2002). We implement this using "stochastic kriging", where the samples are assumed to have uncertainty and is equivalent adding their variance to the diagonal of Ψ (Couckuyt et al., 2013). Additionally, the lower and upper bounds on the values of ν i depend on the minimum and maximum spacing of the distance between samples, respectively (Martin and Simpson, 2008). 20

Settling Function
To measure the number of stages the optimization procedure requires to find the minimum of the cost function, we define the settling function which is a linear transformation that represents the fraction of change in cost at each stage C(z r ), compared to the overall 25 change in cost function. The initial cost C(z 1 ) is mapped to s(1) = 1 and the cost of the solution C(z soln ) is mapped to s(N stages + 1) = 0. Often, we perform more stages than is necessary and use this settling function to determine how many stages are required to achieve some percentage of the change in cost function.

Applications in Wind Turbine Control Tuning
In this section, we present three examples of using zeroth-order parameter optimization to tune the parameters of wind turbine controllers. As an initial demonstration, we optimize a one-dimensional parameter to maximize power capture through torque control in below-rated operation. Next, we present the motivating example for this work, a two-dimensional parameter optimization for a standard pitch controller, with the goal of regulating generator speed so that loads are minimized, subject to a 5 constraint on the maximum generator speed. Finally, we demonstrate how a series of one-dimensional parameter optimizations can be used to determine the minimum pitch setting of the pitch controller for controlling peak blade loads.

Optimal Torque Control Gain
In below-rated (Region II) operation, the generator torque is typically controlled using τ g = k opt ω 2 g , which controls the rotor speed to its optimal tip speed ratio, where ω g is the generator speed. The optimal gain k opt depends on a number of aerodynamic 10 properties (Johnson et al., 2006): where ρ air is the air density, R is the rotor radius, C P,max is the maximum power coefficient, λ opt is the optimal tip speed ratio, and G is the gearbox ratio. We add a multiplicative factor to account for uncertainties in the aerodynamic properties and to allow the gain to be increased or decreased, resulting in the control law In practice, a value other than k fact = 1 is found to be optimal for a realistic turbulent wind input.
The goal of this optimization procedure is to find the gain k fact that results in the greatest energy capture. To maintain the form of a minimization problem, we solve 20 where the cost function C(z) = −P (z) is the negative of the mean generator power, and the optimization parameter z = k fact .
At each stage, J = 2 samples are simulated to compute the cost function and estimate the gradient. In this one-dimensional problem, no dimensional scaling is required, thus D = 1. With J = 2, we set the search direction to The optimal value of k fact is expected to be between 0.3 and 1.7, so these values are set as hard bounds. A search range of µ = 0.05 is set to adequately search the space and estimate meaningful gradients. For tuning controllers of turbines with different power ratings, the base step size is scaled with the inverse of the initial simulation's average powerP ( -1780 Figure 1. The first iteration of the one-dimensional parameter tuning for the optimal torque control gain. Starting with the initial parameter z 1 , random samples z 1 1 and z 1 2 are generated to estimated the gradient. Note that µ = 0.1 in this figure for clarity. Test samples z 1 a,1 and z 1 a,2 are evaluated in the gradient direction until the cost decreases and the next stage parameter z 2 is determined. The estimated cost and uncertainty are found using (13) and (19) size should be reduced to maintain the same rate of descent. Note that a positive step size is required, even though the cost is negative, and a maximum of three step size sample simulations are performed (k max = 3). A summary of the parameters used in the torque control parameter optimization are shown in Table 1 and an illustration of the first iteration is shown in Fig. 1.
The parameter optimization was performed on the NREL-5MW reference turbine with the standard lookup-table-based torque controller in Jonkman et al. (2009). The algorithm finds the optimal k fact and converges in 7 stages (Fig. 2b). The 5 full procedure, with 7 stages, performs 31 simulations in total, which includes step size samples and the initial guess; the average power is increased by 0.67%, compared to k fact = 1. The use of turbulent simulations contributes noise to the signal that determines the cost function, which is apparent by the non-smooth behavior of the cost samples with respect to the gain factor parameter in Fig. 2a. However, the algorithm appears to be robust to these uncertainties.

10
In this section, we optimize the parameters of an above-rated blade pitch controller for load reduction and generator speed regulation. Each time a new rotor is designed, the pitch controller should be tuned so that the structural loads can be computed to design the various hardware components of the wind turbine. As will be seen, the pitch controller affects the loads that drive turbine design. The procedure for tuning the gain-scheduled proportional-integral (PI) controller is detailed in Appendix A.
First, steady-state simulations at above-rated wind speeds are used to determine the turbine operating points and aerodynamic parameters at various pitch angles, which parameterizes the gain scheduling. The final, and most involved, step is to tune the natural frequency (ω reg ) and damping ratio (ζ reg ) of the "regulator mode," which represents the generator speed response to a 5 disturbance (wind) input. The following optimization procedure aims to find an optimal set of parameters (ω reg , ζ reg ) so that structural loads are minimized and adequate generator regulation is maintained.
In general, reducing the bandwidth of the pitch controller by choosing a lower natural frequency ω reg reduces the structural loading, which we denote with M in the following. However, controllers with lower natural frequencies allow greater generator speed transients, which is acceptable up to some maximum constraint. If the generator speed exceeds some threshold ω g,hard , 10 most turbines enter into a shutdown procedure to avoid further damage, which reduces the availability of the turbine and the net annual energy production; this must be avoided.
First, we reformulate the constrained optimization 15 as an unconstrained problem to use the algorithm described in Section 2. The cost function is augmented so that the optimization problem has the form in (1), namely Figure 2. One-dimensional parameter tuning for the optimal torque control gain of the 5MW reference turbine in Class A turbulence using the negative mean generator power (−P ) as the cost function. In below-rated control, the generator speed (ωg) is controlled using the generator torque (τg). The estimated cost and uncertainty are found using (13) and (19), respectively, where σ(z) = MSE[Ĉ(z)]. The settling function s(r) is defined in (21). where B(z) is a boundary function that penalizes samples that have a maximum generator speed that exceeds some "soft" generator speed constraint ω g,soft , A quadratic boundary function is used so that the cost is differentiable, even when an non-feasible solution is sampled. The factor k B is chosen to provide a sufficient penalty on high generator speeds, but not so high that exceedingly large gradients 5 are determined from the gradient estimation in (3), which can be problematic for the algorithm.
ensures that the barrier function B(z) = c max when the maximum generator speed equals the hard generator speed constraint ω g,hard .
To adapt the cost function to different rotors and load measures, where M (z 1 ) is the load measure of the initial stage sample and the factor 1 12 based on experience gained using the algorithm with simulation results. A smaller factor does not penalize maximum generator speeds enough, leading to possibly infeasible solutions that violate (27). Factors greater than 1 12 were found to create large gradients that lead the iterates away from the 5 constraint boundary; typically, the optimal solution is found close to that boundary.
In most cases, the initial parameter set z 1 is chosen to be near values that were tuned manually, but offset (usually with a higher natural frequency) to allow the algorithm to converge properly. If the parameters were not previously tuned, the values suggested in the NREL-5MW reference manual (Jonkman et al., 2009) are chosen as the initial parameter set.
where Step Size Sample Figure 3. First iteration of the zeroth-order parameter optimization algorithm for the two-dimensional pitch control tuning. Random samples z 1 j , j = 1...4, are generated near the initial guess z 1 to estimate the gradient. Note that µ = 0.15 in this figure for clarity. The sample z 1 a,k , k = 1 is tested along the descent direction until a sample with a decreasing cost function is found, which becomes the next guess z 2 for the pitch control parameter. is used to evenly space the samples in the two dimensions (ω reg , ζ reg ), and ψ 0 is randomly generated according to ψ 0 ∼ U (0, 2π), resulting in the generated samples z 1 j in Fig. 3. The cost function is more sensitive to changes in ω reg than it is to changes in ζ reg , so D = diag([0.25, 1]) was chosen to relatively increase the search direction in the ζ reg dimension. Hard bounds on (ω reg , ζ reg ) are chosen to avoid unstable parameter sets. The base step size α 0 scales with the inverse of the initial load M (z 1 ) so that the algorithm works for turbine models of 5 different sizes, with initial loads specified in Table 3.
The algorithm was tested on a range of rotor models with different wind classes and load measures. First, the pitch control parameters of the NREL-5MW reference model are optimized, starting from the parameters specified by the NREL-5MW reference manual (Jonkman et al., 2009), and using the tower base moment (fore-aft) damage equivalent load (DEL) as the load measure. For three different wind classes (1A, 1B, and 1C), with different turbulence levels (A-highest and C-lowest), the 10 parameters were optimized and an example is shown in Fig. 4. In Fig. 5, the estimated cost (a), load (c), and maximum generator speed (e) across the parameter space is shown, along with the estimated uncertainty in (b), (d), and (f). The lowest turbulence level (Class 1C) has the lowest optimal natural frequency ω reg , since the reduced turbulence results in lower generator speed Table 3. Summary of test cases and results from a single zeroth-order parameter tuning for speed regulation control using the parameters in Table 2 with Nstage = 7 and J = 3.

Test Case
Load transients. In each case of the NREL-5MW reference model, the optimized parameters have a lower natural frequency and higher damping ratio than the original setting.
The optimization procedure was also performed for each rotor design in the Segmented Ultralight Morphing Rotor (SUMR) project (Loth et al., 2016); for these rotors, the design driving load case for blade design was the maximum blade root bending moment. In practice, the combined edgewise and flapwise load is used for design, but since the edgewise load is deterministic, 5 we used the maximum flapwise load as the load measure for optimization, which is a good indicator of maximum combined loads. The SUMR rotor radii range in size from 22 m to 240 m (Table B1), and the same optimization parameters (Table 2) were used for each optimization procedure, albeit with different initial conditions and loads, which adapt the cost function and step size accordingly. The optimization procedure generally settles on a lower natural frequency and higher damping ratio than the initial guesses (Table 3), which has the effect of producing the lowest control bandwidth (for reducing loads) but only to 10 the point so that the generator speed constraint is not exceeded.

Discussion of Low Natural Frequency, High Damping Ratio Regulator Mode
For most of the rotors in this section, the baseline rotor speed proportional-integral (PI) control parameters are optimized to have a regulator mode with a lower natural frequency and higher damping ratio than the initial guess (Table 3). To understand why this is the case, we must consider the cost function of the optimization. In this tuning procedure, our goal is to minimize 15 structural loading with a constraint on the maximum generator speed. From Figs. 4(c) and (d), we see that the collective blade pitch angle θ c has a large effect on the thrust-based structural loading; this includes tower fatigue and blade peak loads. Fig. 4 shows that, in most cases, pitch and loads mirror each other: when pitch increases, loads decrease, and vice versa. A good . Results of using the zeroth-order optimization for tuning the pitch control regulator mode (natural frequency ωreg and damping ratio ζreg) of the 5MW reference turbine in Class B turbulence using the tower base fore-aft (mty) DEL as the load measure, which is indirectly controlled via θc, the collective blade pitch control; θc is primarily responsible for regulating the generator speed ωg. The settling function s(r) is defined in (21).
example occurs between 230-250 seconds in the timeseries of Fig. 4. The direct effect of the blade pitch signal on the load signal is the primary reason for the optimal PI gains (or regulator mode parameters) found in this section.
PI gains derived from a regulator mode with a low natural frequency result in less pitch actuation, thus less change in the load. Higher natural frequencies result in faster and more frequent pitch control variations, which translate to the structural load signals and increase fatigue loading. A controller with a high natural frequency can also be problematic when the wind 5 speed decreases. Because the underlying controller is trying to regulate the generator speed, the pitch will decrease during a wind lull to maintain the generator speed at its rated value, which can also lead to large peak loads, especially when an increase in wind speed follows.
High damping ratios are also found to be optimal when using the described cost function. A generator speed response and pitch control response with a high damping ratio lacks any overshoot and secondary transients when the system is subjected the integral gain is reduced because it causes transients in the blade pitch and structural loads. Since our primary goal is not to regulate the generator speed to some fixed set point, but instead constrain its maximum value, integral control is less important.
If the cost function included a term related to regulating the generator speed to its rated set point, using e.g., mean squared error compared to the rated generator speed, the optimal integral gains might be larger. However, we believe that a controller that constrains extreme events and maximizes power capture better reflects the overall wind turbine design goals.

Results: Comparison with Grid Searching
To quantify the performance of the zeroth-order optimization (ZOO), we compare it, in terms of the number of simulations and optimal cost, with a grid search optimization for tuning the pitch controller of the NREL-5MW reference turbine in Class 1A turbulence. The same area spanned by the hard bounds of the zeroth-order method (Table 2)  The ZOO procedure outlined in Section 2 is performed three times for each of the following cases. Each procedure uses randomly generated samples that should result in different optimal parameters for each instance. We use four different initial conditions, distributed so that one is in each of the four quadrants spanning the bounded parameter space. The starting 25 location in each quadrant was generated randomly, except for the bottom, right quadrant in Fig. 6(d), which is the suggested parameter set, z sug = (ω reg , ζ reg ) = (0.6, 0.7), defined in the NREL-5MW reference manual (Jonkman et al., 2009). The ZOO was performed with N stage = 7 stages using J = 3, 4, and 10 samples-per-stage and also with N stage = 12 using J = 3 samplesper-stage. Theoretical results suggest that better gradient estimates (from a larger number of samples-per-stage J) result in convergence within a ball with a smaller radius centered around an optimal solution (Hajinezhad et al., 2017). The optimal 30 parameter set z opt found in each instance of the ZOO is shown in Figs. 6(a)-(d). We compare the cost of the ZOO method with the grid search optimization in terms of the defined cost function in (28)-(31) (Fig. 6). The results are normalized to the cost function found using the suggested parameter set z sug .  Table 2 and normalized to the cost when z 1 = z sug . Fig. (e) shows the optimal solutions of three different grid search resolutions. Fig. (f) compares the optimal costs, normalized to the initial cost when z 1 = z sug , compared with the number of simulations used to find the result.
Compared to z sug , all of the methods result in a 20% to 26% reduction in the cost function, with about a 1% standard deviation in the results. For fewer than 80 simulations, ZOO performs better than the grid search benchmark in almost every case. The optimal cost decreases with increasing J and total number of simulations on average, but not necessarily always.
However, in terms of efficiency on a per-simulation basis, J = 3 (blue in Fig. 6) achieves similar results to those found using J = 10 (yellow in Fig. 6). Additional stages (N stage = 12) with J = 3 (purple in Fig. 6) decrease the cost function further; the 5 optimal cost of this case is the best we tested in terms of the number of simulations and cost reduction. By a small margin (1-2%), using the initial parameter z 1 = z sug in the bottom, right quadrant in Fig. 6(d) performed better than the other z 1 locations shown in Figs. 6(a)-(c). In Figs. 6(a)-(d), we see that if we use a z 1 that is closer to the area where z opt is found, there is less variation in z opt ; there is also less variation the minimum cost C(z opt ).

10
In this final example, a series of 1-dimensional parameter optimizations will be used to tune the minimum pitch setting of the above-rated pitch controller described in Appendix A and shown in Fig. A1. Increasing the minimum pitch setting can reduce the peak blade and tower loads. However, it also slightly reduces power capture. To represent this trade-off, the cost function will be minimized, quantifying the relative importance κ between reducing peak loads and reducing power capture, where 15 z = θ min (u) is the minimum pitch setting at wind speed u, M is the peak blade load, andP is the mean generator power of a turbulent simulation with mean wind speed u. A value of κ = 0.01 is used, which represents a 10% reduction in peak load being roughly equal to a 0.1% decrease in power capture; this parameter can be tuned by the control designer based on the goals of the design, however, the feasibility of the optimization problem should be verified.
During the load analysis of a control design, a number (N seeds ) of randomly generated turbulent seeds are used to simulate 20 the turbine across wind speeds to identify peak loads on the various components. Often, peak loads on the blade and tower occur in situations where there is first a lull in the wind speed, which causes the pitch angle to decrease, followed by an increase in wind speed. If the pitch controller does not react in time, the combination of high wind speeds and low pitch angles causes a large thrust on the rotor. However, if the minimum allowable pitch is increased, the peak loads resulting from wind speed lulls can be reduced. An example is shown in Fig. 7(c) and (d) at 200 seconds. While these events are fairly common in simulation, 25 not all produce equal peak loads; the minimum pitch setting is optimized for the worst case simulation.
A wind speed estimate, which can be found using, e.g., one of the methods in Soltani et al. (2013), is used to determine the minimum pitch setting of the above-rated pitch controller. A smooth lookup table is generated using a cubic spline interpolation and a table of three minimum blade pitch settings with breakpoints at above-rated wind speeds, in addition to a breakpoint in below-rated wind speeds and one above the cut-out wind speed. The minimum pitch setting is non-decreasing with respect to 30 wind speed. An example is shown in Fig. 8. The minimum pitch at each above-rated breakpoint will be optimized using the zeroth-order optimization procedure previously described.    Table   1: Initialize: Start with an initial guess for the pitch lookup table: θmin(ui), e.g. in Table 4. Set starting maximum load Mu i (z 0 ) = −∞, for all actively optimized breakpoints i = {1, . . . , Nbp}, where Nbp is the number of wind speed breakpoints.
2: for Each breakpoint ui ∈ U , i = {1, . . . , Nbp} do 3: Step 1: Simulate Nseeds random turbulent seeds with a mean wind speed of ui and the initial lookup table. Find the worst-case seed nmax with the maximum load Mu i (z 0 ) over all seeds at breakpoint ui; the starting powerPu i (z 0 ) is the mean generator power of this simulation.

11:
The initial condition to the optimization procedure z 1 is set to enable an adequate search of the parameter space, i.e. The zeroth-order optimization procedure described in Section 2 is used to find where z = θmin(ui) is the optimization parameter and the cost function C(z) is defined in (34).

21:
Step 3: (Optional) Re-check the random turbulent seeds n = {1, . . . , Nseeds} using the new optimal pitch table and compute maximum load of each Mr[n], finding the new worst-case seed nr.

22:
if There is a new worst case: nr = nmax then 23:

Return to
Step 2 with z 0 = θmin(ui) from (35). The algorithm (presented in Algorithm 1) is initialized by choosing an initial lookup table for the minimum blade pitch, Table 4. In Step 1, N seeds random seeds are simulated to find the worst-case seed n max with the maximum load at that breakpoint M ui (z 0 ). The optimization procedure in Step 2 is only performed if the current breakpoint has a problematic peak load: one that is greater than loads seen at the other mean wind speeds (Line 4 of Algorithm 1). The starting loads are initialized to M ui (z 0 ) = −∞ so that at least the first active break point is optimized. At the low wind speed breakpoint 5 u 0 = 5 ms −1 , the minimum pitch angle is set to the aerodynamically optimal angle θ fine ; and at the high wind speed breakpoint u Nbp+1 = 50 ms −1 , the optimal minimum pitch angle is set to the feather pitch angle. Neither u 0 nor u Nbp+1 is an actively optimized breakpoint; they are, however, used as lower and upper bounds for the first u 1 and last u Nbp active breakpoints. In Step 2, the initial guess that is used by the optimization procedure (z 1 ) is offset from the lower and upper bounds by the sample search area µ (lines 11 -19 of Algorithm 1).
Step 3 is optionally performed to re-check the other random turbulent 10 seeds using the new, optimized minimum pitch lookup table. In some cases, a different random seed will have a peak load that exceeds that of the wind input that was originally optimized; if this is the case, Step 2 is repeated up to three times, using the previously optimized pitch angle as a lower bound. Algorithm 1 is used to optimize the minimum pitch table in Fig. 8 using the parameters in Table 4 (19), associated with minimum pitch setting optimization: cost relative to initial guess C/C(z 0 ), load relative to initial guess M/M (z 0 ), and power relative to initial guess P/P (z 0 ). minimum pitch setting results in a 10.9% decrease in the tower load, a 8.51% decrease in the maximum blade load, and only a 0.03% decrease in the average generator power. These results hold for the full set of DLC 1.2 simulations (International Electrotechnical Commission, 2005).

Generalized Design Procedure
In this section, we present guidelines for performing similar optimization procedures. Experience gained in problem formula-5 tion, the usefulness of performing a preliminary "offline" analysis, and determining the parameters of the solver is shared.

Determine Problem and Goals
Using the zeroth-order optimization procedure described in this article for determining control parameters through simulation requires effort in setting up the problem and developing software. In order to justify the up-front effort, the task would ideally be one that is repeated for many different rotor models, like the examples in Section 3. A task that is repeatedly performed also 10 allows the designer to gain a deeper understanding for how control inputs (gains, parameters) affect simulation outputs of the wind turbine.
It is important to determine how the turbine should be simulated in order to generate the measures that are used for the optimization; they should highlight some problematic or indicative case that the control solution is trying to solve. For example, when optimizing the torque control gain for below-rated operation in Section 3.1, a below-rated wind field should be used and the power should be used as the cost function. The cost function should reflect the goals of wind turbine design (e.g. increasing power capture or decreasing loads), have a basis in the reality of wind turbine operation (e.g. using gains that provide a stable 5 control input), and also have a feasible solution. The optimization procedure presented in this article is only useful if the cost function represents the design goal, is represented well by the simulation information, and is simulated in a realistic environment.

Perform Preliminary Analysis
It is often helpful to perform a preliminary "offline" analysis to fine tune the cost function and optimization parameters. In 10 an offline analysis, a grid search of the optimization parameter is used to estimate the output space of the simulations (e.g., maximum generator speed and blade loads), using a linear or quadratic estimate of the cost function. To clarify, the results in Section 3 are of "online" optimizations, where actual simulation data is used to compute the cost function and perform the optimization procedure. While one of the goals in developing this optimization procedure is to eliminate the large number of simulations associated with grid searching, a grid search does help fine-tune the parameters of future, similar control tuning 15 procedures that use zeroth-order optimization. If multiple measures are used in the cost function (e.g., in the pitch control tuning of Section 3.2), it is important to determine whether the cost function has a minimum within the parameter bounds.
Otherwise, the cost function must be further refined. A preliminary offline analysis can be used to more quickly determine the optimization parameters (e.g., step size or smoothing parameter) that converge in the fewest number of simulations to some "ground truth" determined from the estimated cost function determined using the initial grid search.

Set Simulation Parameters
As the examples of Section 3 illustrate, each optimization procedure requires slightly different parameters. While the parameters presented in Tables 1, 2, and 4 may not necessarily be the best ones, they have been fine-tuned through extensive offline testing and evaluating "online" tests that use actual simulation data as the measures used in the cost function. The goal of this section is to provide general guidelines and rules-of-thumb, where possible, for choosing the parameters of the optimization 25 procedure.

Sample search range and Newton's approximation
The smoothing parameter µ should be based on the optimization parameter z. The sample z + φµ should result in an adequate change to the cost function so that "good" gradients can be used for the descent algorithm; note that the magnitude of the and z(i) is the ith element of the parameter set z. The elements D i of D can be determined from offline simulation analysis, where (36) can be estimated by finding a quadratic regression of the cost space. Alternatively, D i can be manually tuned, i.e., 10 if the dimension i is not being adequately searched, D i should be increased.
For example, in the pitch controller tuning (Section 3.2), the cost function (shown in Fig. 3) is less sensitive to the damping ratio z(2) = ζ reg than it is to the natural frequency z(1) = ω reg , so we use D = diag ([0.25, 1]). If only the first-order (estimated) information were used and the direction of the maximum gradient were exactly followed, the solution would zigzag in the ω reg direction and take longer to converge to the optimal solution in both the ω reg and ζ reg directions.

Step Size
The initial step size α 0 is an important parameter to test offline and also fine tune when using online simulations to compute the gradient. It was found that for all optimization examples in this article, the product of the initial step size and the norm of the gradient should be on the order of a magnitude of 1, namely ∂C ∂z · α 0 ≈ 0.5 to 1.5. 20 The parameters used in the Armijo step size rule were the same for all examples. Conservative values were used, which essentially only ensures a non-increasing cost function without a requirement on the rate of descent of the cost function.

Stages and Samples-per-Stage
Enough stages should be evaluated so that the cost function converges to some value; this is typically learned through offline analysis or by trial-and-error in online tests. For example, when analyzing the pitch control tuning results of Fig. 4, the results 25 suggest that the procedure could be performed with fewer stages, whereas it seems more stages could be used in the minimum pitch control tuning of Fig. 7. In general, it is found that fewer samples-per-stage (along with more stages) result in the fastest convergence with respect to the total number of simulations.

Parameter Bounds and Initial Guess
Hard constraints on the parameter should reflect the set of feasible parameters for the control task being optimized. However, the bounds should not be so small as to restrict the space and possibly miss non-obvious control solutions. The initial guess provided to the algorithm should also allow for the space to be adequately searched.

5
After performing initial, offline analysis and running the zeroth-order optimization algorithm using online simulation data, the whole procedure should be evaluated with the following questions: 1. Does the algorithm converge to a feasible solution?
2. Does the optimized parameter appear to be near the minimum of the visualized cost over the parameter space?
An affirmative answer to both of these questions should provide confidence in the optimized result.

Conclusions
In this article, we developed a data-driven approach for optimizing controller parameters using simulation results. By using a zeroth-order optimization algorithm, random samples are generated near an initial guess, which are used to compute the local gradient. A standard gradient descent method ensues, where a step size rule is used to ensure convergence and attempt to decrease in the cost function before the next guess is chosen and the process is repeated. We also use ordinary kriging to 15 visualize the design space and its uncertainty to provide a level of confidence in the optimized result.
The zeroth-order algorithm was applied to three different applications in wind turbine control. To demonstrate the process on a one-dimensional parameter optimization, the torque control gain was tuned to optimize power capture in below-rated operation. The baseline pitch controller parameters were tuned in a two-dimensional optimization problem with the goal of minimizing structural loads and includes a constraint on the maximum generator speed. Using an adaptable cost function and 20 step size, the algorithm was able to tune the baseline rotor speed control for rotors ranging from 40 to 400 meters in diameter.
We compare the results, in terms of accuracy, convergence, and number of function evaluations (simulations) for different optimization parameters and against the standard grid search method. In a series of one-dimensional parameter optimizations, we also determined the settings of a lookup table for the minimum pitch limit of the pitch controller, reflecting the overall blade design process and system-level goals.

25
Since each optimization procedure depends on the specific control problem, we have provided a set of guidelines based on the experience gained during this study for developing future, similar optimization procedures. The methods presented in this article automate a usually manual process, reduce designer effort, and require fewer simulations compared with grid searching methods. These methods can be used for repeatable control tuning processes that are required for continually updating designs that must be evaluated in simulation using a well-functioning controller.  Figure A1. Proportional-integral control with anti-windup scheme used for above-rated control. The difference between the pitch control setpoint ωrat and the generator speed ωg is multiplied with the gain-correction factor GK(θavg), which is a function of current collective blade pitch θc. The proportional-integral gains at zero pitch, kP,0 and kI,0, are derived in (A15) and (A16). The pitch command θcmd is saturated to some minimum pitch setting θmin and the output θsat is the input to the blade pitch actuator.

Appendix A: Generalized Baseline Rotor Speed Pitch Controller
The pitch controller described in Sections 3.2 and 3.3 is based on the controller presented in the NREL-5MW reference manual (Jonkman et al., 2009); this standard control scheme is widely used as a reference for comparing control schemes and evaluating different aspects of turbine design. As shown in Fig. A1, the controller is a gain-scheduled proportional-integral (PI) controller. The PI control architecture allows the generator speed dynamics to be represented as a 2 nd -order system. Since the 5 sensitivity of aerodynamic torque to blade pitch changes with the blade pitch, the PI gains are scheduled on the blade pitch.
The pitch command is saturated to some minimum setting to control power or reduce blade loads; thus, an anti-windup scheme is necessary.

A1 Regulator Mode and PI Gains
To derive the PI gains for a generic rotor model, a rigid model of the drivetrain is used: where ω g is the generator speed, J tot is the total drivetrain inertia, including the rotor and generator components, G is the gearbox ratio between the low-speed rotor shaft and the high-speed generator shaft, τ a is the aerodynamic rotor torque caused by the wind and controlled via blade pitch, and τ g is the generator torque, which is a control input. The rotor torque is nonlinearly dependent on the blade pitch θ. The linearization with respect to a perturbation in blade pitch δθ is 15 δω g = G J tot ∂τ a ∂θ δθ, where the differential torque δτ g = 0 because the torque is constant in above-rated operation. The sensitivity of the aerodynamic torque to rotor speed ( ∂τa ∂ω ) is omitted since it has a much smaller magnitude than ∂τa ∂θ . In terms of power P : where ω rat is the rated generator speed, which is a constant operating point since it is the desired set-point of the controller. The proportional-integral control is 5 δθ = k P δω g + k I δω g dt, where k P and k I are the proportional and integral control gains, respectively, and δω g represents a generator speed perturbation.
By defining a new state,φ = δω g , and combining equations (A2), (A3), and (A4), the generator speed dynamics are which can be represented by a second-order dynamic system in the form of 10 M regφ + D regφ + K reg φ = 0, where M reg , D reg , and K reg are the mass, damping, and stiffness of the "regulator mode," respectively. Alternatively, the regulator mode can be represented by its natural frequency ω reg and damping ratio ζ reg , defined by ω reg = K reg M reg and ζ reg = D reg 2ω reg M reg .
By defining the desired properties of the generator speed dynamics, ω reg and ζ reg , the proportional and integral gains are 15 defined as follows: k P = 2J tot ω rat ω reg ζ reg which we will define as S(θ) because it is a function of the blade pitch. Simulations in FAST are used to determine the pitch operating points at various above-rated wind speeds. The operating points are used in FAST linearizations, with all the degrees-of-freedom disabled, providing the input-output sensitivity from pitch to power. The results of performing this sensitivity analysis for several rotors is shown in Fig. A2.
Because of the near-linear relationship with blade pitch θ, the sensitivity can be parameterized by where S(0) is the sensitivity at θ = 0 degrees and θ k is the pitch angle at which the sensitivity doubles: 5 S(θ k ) = 2S(0).
From simulation results like those in Fig. A2, the parameters in (A11) can be estimated; they are used to define the gaincorrection factor GK(θ) = 1 1 + θ θ k (A13) and the final, gain-scheduled PI gains: