CFD-based curved tip shape design for wind turbine blades

. This work presents a high-ﬁdelity shape optimization framework based on computational ﬂuid dynamics (CFD). The presented work is the ﬁrst comprehensive curved tip shape study of a wind turbine rotor to date using a direct CFD-based approach. Preceding the study is a thorough literature survey particularly focused on wind turbine blade tips in order to place the present work in its context. Then follows a comprehensive analysis to quantify mesh dependency and to present needed mesh modiﬁcations ensuring a deep convergence of the ﬂow ﬁeld at each design iteration. The presented modiﬁcations allow the framework to produce up to six-digit-accurate ﬁnite difference gradients which are veriﬁed using the machine-accurate Complex-Step method. The accurate gradients result in a tightly converged design optimization problem in which the studied problem is to maximize power using 12 design variables while satisfying constraints on geometry, as well as on the bending moment at 90 % blade length. The optimized shape has about 1% r/R blade extension, 2% r/R ﬂapwise displacement, and slightly below 2% r/R edgewise displacement resulting in a 1.12 % increase in power. Importantly, the inboard part of the tip is de-loaded using twist and chord design variables as the blade is extended, ensuring that the baseline steady-state loads are not exceeded. For both analysis and optimization an industrial-scale mesh resolution of above 14 × 10 6 cells is used, which underlines the maturity of the framework.


Introduction
The wind energy industry has for decades focused on minimizing the levelized cost of energy (LCoE) (Ning et al., 2014;Kalken and Ceyhan, 2017;Matheswaran et al., 2019). Innovative design has helped to keep the mass increase low as rotor size has increased to generate more annual energy production (AEP) which in turn lowers the LCoE. While manufacturing new wind turbines with diameters now exceeding 200 m helps the industry meet the present day energy demand, one should also look to already installed wind turbines which may hold promise for an increase in AEP as well. A first approach could be to completely refurbish these older wind turbines with new rotors, although the associated costs are considerable. A less invasive operation would be to only modify the very tip of the blades with a sleeve-like solution. Indeed, using this approach design engineers have successfully met the challenge of avoiding to compromise the integrity of the original rotor and at the same time found an increase in power of up to several percent; thus, a further investigation is warranted Matheswaran et al., 2019;Barlas et al., 2021b).
In light of the above, this study investigates the promise of re-designing existing wind turbine blades by optimizing the shape of the blade tip. To maintain structural feasibility the performance gain will be brought about without exceed-ing the load envelope. The present work should be placed in the context of the SmartTip project which in several efforts (e.g., Zahle et al., 2018;Li et al., 2018;Barlas et al., 2021b) studied innovative blade tip modeling and design, in particular how blade tips could yield a load-neutral performance gain. Zahle et al. (2018) find a 2.6 % increase in power using 12 design variables with an aerodynamic surrogate model, whereas Barlas et al. (2021b) report an up to 6 % increase in AEP using 11 design variables for an aeroelastic surrogate approach. Given that the above-mentioned works vary in included disciplines and model fidelities one should take care when comparing the final results. However, since the present study uses the exact same overall design problem which was addressed in the surrogate-based design study by Zahle et al. (2018), a comparison across flow model fidelities is given later on. The present study should in this context be seen as a contribution considering aerodynamics only but with a direct high-fidelity modeling approach using a computational fluid dynamics (CFD) solver. Thus, the aim for the present study is to test the developed CFD-based design framework using the finite difference method in an industrial-scale setting for the first time to quantify how viable this approach is and if its limitations are outweighed by the ease of implementation associated with the method. Reasons for choosing a CFD-based approach with a gradient-based optimization algorithm are given below, after which the content of the remaining paper is outlined.
Lower-fidelity methods based on the blade element momentum (BEM) theory rely on engineering assumptions and can as a result handle time-dependent simulations efficiently, which explains why they are relied on heavily in the wind energy community. High-fidelity CFD-based approaches, on the other hand, allow for investigations that do not depend on underlying engineering assumptions by solving the full Reynolds-averaged Navier-Stokes (RANS) equations on a geometrically resolved rotor configuration. Directly resolving the geometrical features at the blade tip ensures a correct modeling of the highly complex 3-D flow phenomena at the blade tip. However, also disadvantages such as an increase in computation time, as well as in implementation effort, are to be expected. Furthermore, as high-fidelity models typically are used in steady state, it is currently difficult to arrive at realistic design-driving load cases with this approach. For these reasons, one should favor a complementary use of the two approaches and only use high-fidelity methods if needed. Given that the area in question in the present study, i.e., the blade tip, is indeed an area of difficulty for lower-fidelity approaches, the use of a CFD-based model is warranted despite the increase in computation time.
To accurately describe the blade tip it was in the present study necessary to use 12 design variables. This is a considerable amount of design variables compared to many of the other tip studies mentioned below which typically use less than five. Due to the increased size of the parameter space a gradient-based optimization algorithm was preferred. More-over, a proper step size was chosen through a gradient verification study using the Complex-Step method (Lyness, 1967;Lyness and Moler, 1967) as a machine-accurate reference gradient. The accurate gradients led to a tight optimization problem convergence.
While the presented CFD-based approach using the finite difference method is functioning well, it is not feasible to undertake a full simultaneous design of airfoils and blade planform. For this, the associated cost of computing the gradient is too high since 3-D shape optimizations of full rotor configurations involve hundreds of design variables (Nielsen and Diskin, 2012;Madsen et al., 2019). To carry out a design optimization study of a full rotor configuration using finitedifference-based gradients, one could apply a more conventional approach in which a sequential procedure in two steps should work (Barrett and Ning, 2018): first, the airfoils could be optimized with a method of choice (e.g., a panel method). Then, the planform could be optimized using the present CFD-based methodology. In practice, however, one would likely need to iterate between the two steps to arrive at a final design. See Barrett and Ning (2018) for further details and alternative approaches.
The remaining paper is structured as follows: a literature review of shape optimizations focusing on wind turbine blade tips is given in Sect. 2, whereafter the methodology used in the present study is presented in Sect. 3. Then follows a comprehensive analysis of the baseline rotor (Sect. 4) followed by a presentation of the design optimization problem (Sect. 5) before all optimization results are presented in Sect. 6. Finally, a conclusion is given (Sect. 7) in which overall findings are summarized.

Literature review
This literature review focuses exclusively on shape optimization of the blade tip from purely aerodynamic works. A literature review on general high-fidelity shape optimization of full rotor configurations within wind energy research is offered elsewhere (Madsen et al., 2019, their Sect. 2) in which an updated overview table can be found in a more recent work (Madsen, 2020, their Table 3.1). With respect to literature surveys focused on tips and winglets there are already a few present in the wind energy community (e.g., Gertz et al., 2012). However, some are several years old, and none are as comprehensive as the below-given literature survey. While some of the works mentioned below are indeed experimental, the focus is on numerical studies given that the present work is purely numerical. A literature survey including further experimental studies is presented by Mühle et al. (2020).
In order to structure the covered works the survey is split into the following sections: For an up front overview of all central works for the present study across the above-mentioned subsections one can consult Table 1. From this table it is possible to obtain rough estimates of, for example, what a reasonable performance increase for curved tip shapes is. It is also evident from Table 1 that only a few works on industrial-scale wind turbines exist, in which in particular actual optimization works seem to be few in number.
Care must be taken when comparing all the works listed in Table 1. Many works (Hansen and Mühle, 2018;Khalafallah et al., 2019;Khaled et al., 2019;Mourad et al., 2020;Papadopoulos et al., 2020;Aju et al., 2020) are focusing on smaller turbines, whereas only three other works (Kalken and Ceyhan, 2017;Matheswaran et al., 2019;Zahle et al., 2018) focus on larger megawatt turbines, as is the case in the present study. Furthermore, as will be evident in Sect. 2.2-2.3 there are few actual optimizations available, whereas the majority of works are parametric studies, and thus betterperforming tips could very likely be found in the vicinity of the selected parameter settings for these rotor configurations. Still, Table 1 does provide the reader with an overall survey of the related available literature and should help temper the expectations with respect to the possible performance increase when optimizing the tip. Importantly, there are three works (bold rows) that manage to provide a performanceenhancing tip design that does not violate the initial load envelope. These studies can therefore not be compared directly to the studies without load constraints. Finally, it should be noted that a clear progression in level of model fidelity can be seen over time. Thus, many of the later works rely exclusively on CFD allowing researchers to analyze the finer details of the 3-D complex flow phenomena present at the blade tip regions.

Early works on wind turbine blade tips
The concept of a winglet can be dated more than a century back to the English engineer Frederick W. Lanchester's 1897 patent application for fixed-wing aircraft (Lanchester, 1897). However, more recent history picks up in the 1970s when Richard Travis Whitcomb (Whitcomb, 1976) further refined the idea (https://appel.nasa.gov/2014/07/22/this-month-innasa-history-winglets-helped-save-an-industry/, last access: 30 September 2021). It has been known since the pioneering days of Whitcomb that even a small winglet can limit the spanwise velocity component and reduce downwash by displacing the tip vortex which ideally diffuses or is smeared out (Whitcomb, 1976). Furthermore, with respect to the winglet orientation Whitcomb says that lower (i.e., pressure side) winglets should be as effective as upper (i.e., suction side) winglets (Whitcomb, 1976, p. 8). In aerospace, the former are typically the smaller ones due to concerns of ground clear-ance. If anything, this is quite the opposite for wind turbines for which it is the winglet on the suction side, i.e., an "upper" winglet in aerospace terminology, which for wind turbines can cause a tower strike and therefore should be kept small. Another considerable difference is that the system of interest in wind energy is rotating, which adds some complexity to the induced velocity seen by the blade from the tip vortex. There are also other differences to study when transitioning from one sister science (aerospace) to another (wind energy). For a concise recap of essential works on winglets for fixed wings one can consult the introduction in Gaunaa et al. (2011).
About a decade after Whitcomb's 1976 study, Lissaman andGyatt (1985) present the perhaps first comprehensive study for wing tip devices dedicated to wind turbines. As seen from their reference list a considerable amount of research on these tip devices had already been carried out in the aerospace community at this point. However, as Lissaman and Gyatt (1985) point out, no studies focus directly on wind turbines. Using both field testing and numerical analysis codes (vortex methods) they analyze three tip shapes. However, they find that none of the shapes resulted in actual performance improvement. Interestingly, they also focus on noise and report that both the standard winglet and the split winglet produce significantly more noise compared to the baseline configuration, which they attribute to added turbulent flow. For future improvements, Lissaman and Gyatt (1985) see a need for increased model fidelity in the computer code used, and, as a result, they believe the most costeffective approach is further field or wind tunnel tests. Furthermore, means of visualizing the flow seem of interest. Not surprisingly, such techniques have more recently been heavily used, such as smoke (Mühle et al., 2020, their Fig. 6), particle image velocimetry (Aju et al., 2020, their Fig. 9), and oil (Andersen et al., 2001, their Fig. 7).
More than a decade after the earliest cited wind energy work (Lissaman and Gyatt, 1985) one will find the first significant increase in numerical model fidelity in which Madsen and Fuglsang (1997) present actual CFD-based results of novel tip designs. This is to the best of the authors' knowledge the earliest 3-D CFD-based study. Madsen and Fuglsang (1997) motivate their use of a high-fidelity CFD model by stating that the BEM-based approaches are more uncertain in the tip region due to complex flow phenomena. Using the presented methodology they are able to present a design with a non-separating tip vortex, thus lowering the tip noise. Thanks to the higher model fidelity, they can use streamlines to visualize the finer details of the flow. This allows them to align the twist angles of the winglet to match the streamlines thereby avoiding separation (Madsen and Fuglsang, 1997, their Figs. 6-11). Impressively, the final design was also tested in an experimental setup (Andersen et al., 2001) a few years later, thus lending further credibility to the results.  Imamura et al. (1998) use a vortex lattice method with free wake modeling to compare the performance of rotors with and without winglets. Five different downstream winglets were tested, as well as a baseline blade and a pure tip extension. The numerical results (Imamura et al., 1998, their Fig. 10) show that all winglets indeed have a beneficial effect on the power coefficient, C p , where no quantification is given. However, the pure blade extension is not far removed from the baseline performance. It should be noted that increases in bending moment are also reported (Imamura et al., 1998, Fig. 11). Again, there is no quantification. This effect is worst for straight extensions, and the bending moment increase is seen to be somewhat mitigated as the winglet increases its bending towards the downstream direction.
Already at this point, many of the key topics evident throughout the literature have been presented: it is not straightforward to gain a performance increase unless leveraging a meticulous design. Even in the cases when this is achieved, there are many other characteristics such as noise production which are important to consider. Perhaps for this very reason, many works start to focus on a select few parameters in order to narrow down the related effect for each parameter change. This trend is particularly evident in the next section.

Parametric studies
In the beginning of the 2000s, more works of medium-and high-fidelity models start to emerge. The vast majority of the works covered in this literature review are so-called parametric studies which also are known as design of experiments. For these works, the parameters are methodically varied in order to explore the underlying design space. Several works use the terms "optimizer" and "optimization" rather loosely, and thus, some works below (Farhan et al., 2019;Papadopoulos et al., 2020, etc.) will describe their work as optimizations. However, if the design parameter variations are chosen beforehand and there is no individual optimizer component that traverses the design space, they will in the present literature survey be categorized as parametric studies. Johansen and Sørensen (2006) study five wind turbine configurations with different camber and twist distributions in a parameter study using CFD. All five configurations had the exact same chord distribution. Both upstream and downstream winglets were analyzed, and the latter type is found to outperform the former. They use cant angle and sweep angle to describe their winglet geometries, in which cant angle refers to the angle of incidence the flapwise displaced part (i.e., the out-of-plane part) of the tip makes with a straight blade reference axis (i.e., a 0 • cant angle is a straight blade, and 90 • is a true winglet). Similarly, the sweep is the incident angle for the part of the tip that lies in the rotor plane. In this study cant angle (90 • ) and sweep (0 • ) are not altered but merely used to describe the designed winglets. The various tested twist and camber settings result in power increases of 0.6 %-1.4 %, whereas trust increases 1.0 %-1.6 % for winglets of about 1.5 % height.
Not long after, Johansen and Sørensen (2007) again carry out a CFD-based parameter study, this time showing that the tested winglets bring about a 1.0 %-2.8 % increase in power at the cost of a 1.2 %-3.6 % increase in thrust. The varied winglet parameters are height, curvature radius, sweep, and twist, having the following definitions. Height refers to the distance that the winglet protrudes in the out-of-plane region (i.e., the distance to the rotor plane). Curvature radius is a related measure given in percentage of the winglet height which describes how smooth the transition from straight blade to winglet occurs (0 % means a 90 • kink and 100 % means a very smoothly transitioning blade that first at the very tip of the winglet reaches the maximum projected blade length). In light of their previous results (Johansen and Sørensen, 2006) they only test downstream winglets. They test a total of 10 different rotor configurations. In relation to the present study it is relevant to mention that they find no effect on power for sweep and that only limited effect from changing the twist can be observed. Ferrer and Munduate (2007) also use CFD and analyze three tip configurations for the NREL Phase VI wind turbine rotor. The three configurations are a square tip, a highly tapered tip where the tip ends at the pitch axis, and a highly tapered tip where the trailing edges were aligned resulting in a swept-back tip. Here, the pitch axis should be taken as the quarter-chord location. They find that both tapered tips outperform the rectangular tip which has a large tip vortex. Of the tapered tips it is the configuration with its tip on the pitch axis that has the best torque to thrust ratio. Furthermore, they specifically point to a complementary use of CFD in lowerfidelity design approaches. Gaunaa and Johansen (2007) show that the increase in C p resulting from a winglet is owed to a reduction in tip effects (i.e., tip loss) and not as previously thought due to a downstream shift in the wake vorticity. However, in the same work a comparison between the developed free wake lifting line model and a CFD reference was not entirely successful leading to a follow-up study  in which said comparison was improved. In both works they advocate for downstream winglets which they find to be more efficient than their upstream counterparts.
A few years later, Gaunaa et al. (2011) used computationally lighter models based on lifting line theory to analyze blades with winglets. The tested models are a free wake model and a much faster prescribed wake model. The study relates highly to previous work by the authors (Gaunaa and  in which the free wake model was used. Moreover, the validation of the developed prescribed wake model is carried out against CFD-based results from Johansen and Sørensen (2006). By comparing results from both the free wake model and the prescribed wake model to CFD results, they conclude that these faster model types successfully can predict the effect of adding winglets to wind turbine rotors (errors: 4 %-16 % and 17 %-28 %, respectively; Gaunaa et al., 2011, their Table 2). Considering how well the wake models approximate the results from the CFD solver it would be very interesting to see these codes applied in an optimization context in future works.
Using high-fidelity models Sørensen et al. (2011) sought a deeper understanding of the underlying flow phenomena related to several tip shapes. This is, for example, a necessity when a thorough understanding of the generated noise level is needed. Furthermore, they used access to both highand lower-fidelity models to carry out a detailed comparison across fidelities. This allowed Sørensen et al. (2011) to implement an improved tip correction for lifting line models.
Aravindkumar (2014) use experimental field tests to compare with a CFD model in order to investigate performance increase and noise reductions. They find that adding an upstream winglet increases the generator power by 2.0 %, whereas the noise is reduced by 25 %. Tobin et al. (2015) use wind tunnel experiments to compare one rotor configuration without a winglet with a rotor configuration including a winglet and observe a C p increase of 8.2 %. A related thrust (i.e., C t ) increase of 15.0 % was observed. Having studied the literature they see a need for further insight in wake performance enhancement resulting from winglet design. They find that winglets do not significantly alter tip vortex strengths. Ariffudin et al. (2016) use a CFD model to investigate four different configurations; a straight extension, a swept extension that lies in the rotor plane, and configurations with winglets directed either upstream or downstream. Between the two first configurations it is the swept configuration that outperforms the straight configuration with a 9.1 % and a 7.3 % C p increase, respectively. As for the winglets it is the downstream configuration that results in the highest performance increase at 3.2 % compared to just a 1.8 % increase for the upstream winglet. No quantification of similar changes in the thrust coefficient, C t , is offered. Zhu et al. (2017) carry out a study aiming at determining the best direction for a winglet. Upstream, downstream, and even split winglets are tested. The split winglet pointing both upstream and downstream is found to be the best performing device, and a C p increase of up to 4.0 % is observed. Comparing just the pressure side and suction side winglets the former outperforms the latter since the pressure side winglet results in an up to 3.8 % increase, whereas the suction side winglet only reaches a 3.4 % increase (Zhu et al., 2017, their Table 5). They also conclude that the angle of the winglet should match the incoming flow angle as much as possible (something focused on already in the very early works referenced above; Madsen and Fuglsang, 1997). Finally, they mention that an actual optimization would further increase the performance. Kalken and Ceyhan (2017) study three different tip concepts (turbulators, winglet, and conventional tip) and submit the final designs to experimental testing on a 2.5 MW wind turbine for a final validation. The early design phase leverages BEM and lifting line models combined with free wake models, whereas CFD is used as an analysis tool on the finalized tip design. Kalken and Ceyhan (2017) report based on measurements that a more than 4 % power increase is owed to a simple blade extension, whereas the benefit of using different blade tip shapes result in a 2 %-9 % power increase. The CFD analysis shows that a conventional extension results in a higher power to thrust ratio gain than the studied winglet. However, they point to other beneficial side effects for choosing a winglet (noise reduction, height restriction, etc.).
The work by Matheswaran et al. (2019) is particularly interesting as it is one of the rare studies with a reported load-neutral tip design. This recent work has even resulted in a patent (https://patentscope.wipo.int/search/ en/detail.jsf?docId=US242151286&recNum=7&docAn= 16186876&queryString=(IC/F03D)&maxRec=92796, last access: 30 September 2021, USPTO. Application number 16186876), stressing the industrial relevance of novel tip designs that do not compromise the load envelope for the baseline wind turbine. Matheswaran et al. (2019) motivate their study by stressing the point that most tip designs for wind turbines do not focus on, for example, the added flapwise bending moment. As a result, the initially minimally invasive operation of retrofitting the very tip of the blade may become an intractable proposition altogether. By balancing the centrifugal force with the aerodynamic forces generated by the winglet itself, they present a lightweight winglet which focuses on minimizing the bending moments. The presented design methodology is based on the vortex lattice method which, unlike traditional BEM approaches, manages to model the flow at the very tip of the blade. The model has a prescribed wake, although Matheswaran et al. (2019) do point out that a "free wake" approach would be ideal, albeit also computationally more demanding. After having validated their vortex code against both experimental and numerical results they introduce an LCoE cost model and carry out a parametric study using four parameters (height, taper ratio, and twist and cant angles). Only suction side winglet designs are investigated. Having studied various configurations they decide on a design with the best compromise between a respectable increase in C p (2.5 %), while the force ratio between aerodynamic loads and centrifugal forces is kept manageable (1.4-1.8). They carry out a structural analysis to prove the designs can withstand the required loads. For future work they point to the need for higher model fidelity for which they specifically mention using a CFD model. Khalafallah et al. (2019) present a parametric study of winglets' possible effect on wind turbine power production. Both swept and straight blades are tested using a CFD model with up to 1.8 million cell meshes with periodic boundary conditions modeling only one of three blades for the rotor. Upstream and downstream winglet configurations with various cant angles and twist angles are tested, and the best result is achieved for the upstream configuration with a 4.4 % power coefficient increase. Using results from a previous study they select a few well-performing swept baseline blades and investigate the effect of adding winglets on the resulting power coefficient. A total of 18 straight blade configurations and 33 swept blade configurations were simulated. In general, they find that downstream winglets outperform upstream winglets for straight blades (Khalafallah et al., 2019, their Table 3). However, for swept blades (Khalafallah et al., 2019, their Tables 4-5) the conclusions are more ambiguous. Indeed, the best swept winglet design is an upstream-directed winglet yielding a 4.4 % increase in C p . Overall, they conclude winglets can indeed be used to enhance rotor performance and that they may as well lead to a reduced thrust coefficient. It should be noted that a reduction in thrust coefficient is only observed in the comparison to a swept baseline blade (Khalafallah et al., 2019, Fig. 13), whereas it is quite clear that the coefficient of thrust for the winglet coefficients in general increase compared to the straight blade baseline. Farhan et al. (2019) investigate winglet planform and airfoils' importance for winglet design using CFD. They vary height, cant angle, and planform using two different airfoils and find that the best performance increase is owed to about a distance equivalent to 3 % of the blade length, which from now on will be written as 3 % d/R for brevity. The configuration had a rectangular winglet with a 45 • cant angle. To choose the best turbulence model they start out by comparing two RANS models, i.e., the Spalart-Allmaras model and the shear stress transport (SST) model by Menter (1992), and find that the latter outperforms the former in emerging stall regimes. Finally, they test two airfoils and vary cant angle and winglet length (24 combinations) and report a −13.5 % to +9.8 % change in power depending on the wind speed and configuration (Farhan et al., 2019, their Table 3). The tested winglets were pointing in the downstream direction. Having chosen winglet height and cant angle, they then fix the airfoil shape to S809 and study planform effects (rectangular versus elliptical winglets). They find that both winglet length and cant angle are amongst the most important design variables for improving performance. Mourad et al. (2020) use an initial literature survey covering 10 references to conclude that (i) there is no agreement on optimum winglet configuration, (ii) height is the most effective winglet parameter, and (iii) an upstream-directed winglet should be preferred over downstream configurations. Since they found no study covering toe angle, i.e., the angle of attack between tangential velocity component and winglet profile, they carry out a parameter study using height and toe angle as their two parameters. The analysis showed that of the winglet heights ranging from 0.8 % to 8.0 % d/R it was the smaller winglet height of 0.8 % d/R that gave the largest power increase. Furthermore, Mourad et al. (2020) report that a downstream-directed winglet is not useful for wind turbine rotors. The poor performance of the winglet with 8 % height is not easily aligned with previously reported results by Gertz and Johnson (2011) and Gertz et al. (2012) stating a 5 % increase in power for that winglet height -albeit for different overall parametrization. Finally, it is found that from toe angles ranging from −30 to +30 • it is the (upstream-directed) +20 • toe-angled winglets that have the most beneficial effect, in which a 2 %-6 % increase in power coefficient is observed depending on the tip speed ratio. However, a related increase in thrust coefficient of 4.6 %-9.8 % is reported, meaning that the initial load envelope is compromised. Kulak et al. (2021) present both experimental and numerical results in a study of a small wind turbine rotor (20 cm radius) in which the aim is to raise overall power output. Wind tunnel tests of configurations with and without winglets were carried out to quantify the differences. In total, one baseline configuration, one suction side winglet (4 % height), and three pressure side winglets (3 %, 4 %, and 5 % height) were tested. The numerical results for the comparison were only generated on the pressure side winglet with 4 % height, as well as for the baseline rotor. A nice detail in this study is that they add a transitional model to increase model fidelity. In agreement with Gupta and Amano (2012) an increase in C p for particularly pressure side winglets is observed in which an up to 6 % increase is reported. No efficiency increase is seen for suction side winglets. Instead, a decrease in C p is observed compared to the baseline performance. Relating the experimental results to the numerical findings shows a misalignment since the numerically investigated pressure side winglet performs worse than the baseline rotor. Thus, experimental results do not agree with numerical results. They conclude that the precise winglet geometrical features must be carefully defined in order to gain the desired performance increase. Sy et al. (2020) use CFD to study a split winglet design and its ability to lower the induced drag. In total, four different designs are analyzed (baseline, straight, suction side winglet, and split winglet). The three modified meshes all had a 1.5 % r/R extension, and winglets had a 45 • angle offset. They observe a 1.23 % and a 2.53 % increase in power for ordinary winglet and split winglet, respectively. However, also thrust increases by 0.83 % and 2.05 % for the designs in question. Mühle et al. (2020) investigate the promise of using winglets to enhance wake recovery using an experimental setup. These types of studies view a wind turbine not only as an isolated system but also as part of a whole, for example, a wind farm. Thus, improved wake recovery in one turbine will lead to a performance increase in the next one. The investigated rotor is a two-bladed model-scale rotor in which the wing tip can be exchanged with a downstream winglet tip to compare performance characteristics. This is the same rotor which was investigated by Hansen and Mühle (2018). The wind tunnel measurements show that a winglet not only can be used to increase power production but also to provoke earlier tip vortex interaction resulting in a faster wake recovery. Papadopoulos et al. (2020) perform a numerical analysis of six rotor configurations of a small rotor. They set twist and toe angle to 0 • on the basis that available literature point to a minimal effect. The winglet height was set to 5 % blade radius. They find that the cant angles of ±45 • are better than the ±90 • angles and that sweep only has a limited effect. However, as they also document (Papadopoulos et al., 2020, Fig. 2) the corresponding projected in-plane area of the rotor is slightly increased for ±45 • angles, which should give these configurations an advantage. A nice detail in this study is the use of a four-equation transition model allowing for finer flow modeling than assuming a fully turbulent flow. The maximum observed effect was an 11 % increase in C p . For the configurations that did not increase the projected area the increase in C p (3.7 %) was more modest. Unfortunately, the impact on thrust is not reported in this study. With respect to the discussion on suction side versus pressure side winglets it is worth mentioning that the above-mentioned two configurations are pressure side and suction side winglets, respectively (+45 and −90 • ). They point to future studies on additional parameters in order to further the understanding of winglet parametrization. Aju et al. (2020) use an experimental setup to investigate the promise of using downstream winglet pitching as a means to lower turbine rotation and reduce thrust coefficient. This has relevance when, for example, wind turbines should be protected during extreme weather conditions. Given that the winglet mass only makes up 1.8 % of the blade the investigated method should provide a much faster response time compared to pitching the entire blade. Also, winglet pitching is shown to accelerate flow recovery in wake regions. Aju et al. (2020) agree with Mühle et al. (2020) in that there is great promise for winglet use in wake recovery.
Of the numerous parameter studies mentioned above many salient winglet features such as power enhancement and load mitigation have been investigated. However, several contradicting works were also found. Much can be owed to a misalignment in the studies, and one efficient way to unify the efforts would be to agree on design optimization problems to solve for. Not surprisingly, the need for actual optimizations is also brought up in some of the works (e.g., Zhu et al., 2017). In the next section these few but important works in the literature are discussed.

Design optimization studies
Design optimization studies often demand meticulous implementation with a great attention to detail, making them few in number in the wind energy literature. Still, some can be found. In fact, the very first work including CFD (Madsen and Fuglsang, 1997) also includes both single-point and multipoint shape optimizations. However, most of the optimization studies are by far more recent works carried out within the last 5 years as detailed below.
From 2011 to 2015 a series of studies emerge in which CFD is used to investigate possible winglet shapes on the NREL Phase VI rotor which has a 10 m diameter (Elfarra, 2011;Elfarra et al., 2014Elfarra et al., , 2015. Given that one of the studies (Elfarra et al., 2015) does not contain an actual optimization the focus will be placed on the other two works. In both studies containing optimizations (Elfarra, 2011;Elfarra et al., 2014) a direct CFD-based approach is deemed too computationally expensive, and surrogate models in the form of artificial neural networks are favored along with a gradient-free genetic optimization algorithm. Using 24 CFD evaluations to train the artificial neural network they are able to carry out winglet optimizations using cant and twist angle as design variables and obtain about a 9 % increase in power. A multipoint objective is targeted which gives a more robust design. However, as seen in Table 1 it has a rather coarse mesh resolution. While this may be due to the fact that it is indeed an early work it is fair to contemplate whether the used meshes are of adequate resolution as training material for the surrogate. After the optimization, a comparison of the flow characteristics for the baseline winglet design and the final winglet design reveal that the new design manages to attach the flow farther outboard, resulting in the reported improvement. The ensuing parameter study (Elfarra et al., 2015) focuses on power enhancement using 16 winglet configurations to study how winglet direction, sweep, cant angle, and twist influence the generated power. After having validated the numerical setup against experimental data, they test the 16 configurations, whereafter they conclude that suction side winglets generate more power than pressure side winglets. Depending on configuration and wind speed they observe an up to 10.43 % power increase (Elfarra et al., 2015, their Table 6). Interestingly, this particular configuration did not have a 90 • winglet but a 45 • winglet. The winglet was twisted +2 • (towards lower angle of attack). In general, this study sequence exemplifies how high-fidelity models such as CFD solvers can be used to visually inspect the flow and analyze the underlying flow phenomena at play.
Another sequence of relevant works is that by Hansen (2017), who applies evolutionary optimization algorithms in winglet design. The two works, Hansen (2018) and Hansen and Mühle (2018), focus on wind turbine airfoil and winglet design, respectively. The resulting optimization framework from the airfoil study (Hansen, 2018) is used in the final winglet study (Hansen and Mühle, 2018) to prepare the new airfoils needed for the ensuing wind tunnel tests. In the winglet optimization study (Hansen and Mühle, 2018), a Kriging surrogate model is trained on CFD evaluations of a model-scale wind turbine with a winglet. The optimization involves six design variables, and a total of 100 shapes are traversed. Finally, the wind turbine performance with and without winglet is validated using a 3-D printed experimental model in the NTNU wind tunnel (https://www.ntnu.edu/ept/ laboratories/aerodynamic, last access: 30 September 2021). Power is numerically predicted to increase 7.8 %, and a subsequent wind tunnel experiment of the winglet reports a 8.9 %-10.3 % power increase depending on the inflow turbulence. The 7.8 % power increase came at the cost of a related 6.3 % increase in thrust. Hansen and Mühle (2018) point out that a better shape could be obtained by including more design variables to further refine the design parameter space. Zahle et al. (2018) carry out a surrogate-based tip study in which the aim is to increase AEP without increasing the load envelope for the baseline blade considering only steadystate normal operating conditions. It is important to clarify that they optimize the tips for improved AEP, but the final results are reported as improvements in power. Using a gradient-based approach to efficiently manage the 12 design variables, they achieve a 2.6 % and a 0.76 % power increase for winglet-like and straight tip extensions, respectively. These numbers are not far off compared to the report made several decades before by Whitcomb (1976) stating that a winglet can result in a lift-to-drag ratio improvement more than twice that of a straight blade extension (Whitcomb, 1976, p. 13). The surrogate-based approach allows for a very efficient optimization process, although an added difficulty is that the final design may prove to (slightly) violate the bending moment for the underlying CFD model. This study has the same design optimization problem as is used in the present CFD-based design study which is further explained in Sect. 5. Zahle et al. (2018) also study the effect that each of the three tip-dedicated design variables have on the mechanical power and find that the achievable mechanical power improvement should increase as curvature, flapwise displacement, and edgewise displacement (i.e., sweep) of the tip are increased. Noticeably, the underlying CFD model predicts a maximum for the sweep design variable around 2 % d/R (Zahle et al., 2018, their Fig. 4). Interestingly, the efficient surrogate procedure allows them to explore the Pareto curve between bending moment increase and increase in mechanical power. Their mesh resolution of 5.97 million cells combined with the 300 sampling points allow for a detailed analysis of the parameter space, and the surrogate exhibited below 2 % error for both torque and bending moment. Their overall workflow includes tools for surface and volume mesh generation besides the exact same flow solver, EllipSys3D (Michelsen et al., 1992(Michelsen et al., , 1994Sørensen, 1995), as is used in the present study. All investigated tip shapes pointed upstream towards the pressure side of the blade to avoid a tower strike. Overall, Zahle et al. (2018) find that the CFD evaluations agree fairly well with the surrogate-based model. Furthermore, they conclude that a winglet-like extension should be favored over a straight extension, although the advantage is attenuated for higher wind speeds (Zahle et al., 2018, Fig. 6). Consequently, the predicted 2.6 % increase in mechanical power is valid for the lower wind speeds (6 and 8 m s −1 ). Finally, they explain how the tip shape in itself does not drastically increase power production but that its role instead is to diffuse and move the tip vortex further away in order to lower the induced drag. Reddy et al. (2019) use high-fidelity CFD evaluations to train a surrogate response surface and carry out a shape optimization study resulting in a C p increase of 4.5 % while only introducing a minor thrust force penalty. Before undertaking the actual optimization they validate their numerical setup by comparing it to experimental results. The defined objective is a compound function including coefficient of power, C p , coefficient of thrust, C t , and the twisting moment around the blade axis. They use five design variables: span, twist, dihedral, and sweep angle, as well as taper ratio. As stated by the authors, it seems to be the first published multi-objective optimization study for wind turbine winglets. Indeed, as seen in Table 1 only very few actual optimizations can be found in the related wind energy literature. Reddy et al. (2019) state that they use the much faster surrogate methods since an actual CFD evaluation of each configuration is too expensive. To train their surrogate 50 initial blade designs are used. In comparison, Zahle et al. (2018) use 300 CFD evaluations to train their surrogate. However, Zahle et al. (2018) also use 12 design variables and as a result would have to use more evaluations to train their surrogate since the underlying parameter space increases in size from 5 to 12 design variables. Based on their study, Reddy et al. (2019) conclude that winglets can increase power production and that their optimization framework is capable of handling the multidimensional design. Khaled et al. (2019) use a parameter study to investigate the performance effect of winglet length and cant angle for a small wind turbine and find that both power and thrust coefficients increase when a winglet is present. Using a low-speed wind tunnel they fabricate and test six rotor configurations and are able to quantify the error (∼ 5 %) between their computational setup and the experimental data. Subsequently, an artificial neural network is used to predict the best winglet shape which has a 6 % winglet length and a cant angle of 48 • resulting in a 9 % increase in both power and thrust coefficient.
In summary, very few actual optimizations of wind turbine winglets have been found. While it in principle is fine to compare (i) across model fidelity and (ii) between gradient-based and gradient-free optimization procedures, one can already identify one major issue in doing so: the cited works rarely quantify how well the design optimization problem is solved, and in not doing so one can basically not know if the problem is solved at all. Table 1 should mainly be seen as an overview of numerical studies within wind energy of rotors fitted with winglets, and correspondingly there are only a few studies (e.g., Tobin et al., 2015) with experimental results, whereas the remaining works include numerical investigations. Turning to the nature of the cited studies one will find that 12 are shape analyses, whereas only 6 include actual shape optimizations. It is fair to state that a chronological trend of transitioning from parameter studies to optimization studies can be observed since by far most optimization studies are very recent. For the parameter studies it is likely that better shapes could be found, making it difficult to compare these with the optimization works. Still, it is possible to give several general statements which can be found below.
Starting with discussing the role of the winglet it is quite clear that there is a consensus in the literature. The underlying physical principle of a winglet is to mitigate the induced drag which is introduced on any loaded wing close to the tip where a spanwise velocity component flows from the pressure side to the suction side. This spanwise flow results in a tip vortex which reduces the lift force. By tailoring the winglet one can manipulate the spanwise flow and change how and where the tip vortex occurs. The winglets' role is therefore not necessarily in itself to increase power production locally but merely to transport the tip vortex further away to lower the induced drag which in turn raises the power production further inboard Hansen and Mühle, 2018;Sy et al., 2020;Mourad et al., 2020).
Also the direction of wind turbine winglets has been discussed in numerous studies. Several works (Zhu et al., 2017;Khalafallah et al., 2019;Kulak et al., 2021) have found the best performance increase for upstream-directed winglets. However, it has also been reported that downstream winglets are most efficient (Johansen and Sørensen, 2006;Bak et al., 2007;Gaunaa and Johansen, 2008;Ariffudin et al., 2016;Khalafallah et al., 2019) and that these should be relatively short (Bak et al., 2007). In the latter case it should be noted that even short winglets may raise concern on tower clearance (Johansen and Sørensen, 2006). Indeed, tower strike is an important aspect once transitioning from academic exercises to actual industrial applications. The most straightforward conclusion to the above apparent contradiction on winglet direction is that it can be attributed to differences in parametrization and flow model fidelity. Furthermore, it is likely that the optimal downstream winglet and the optimal upstream winglet look different and therefore should have different parametrizations.
Indeed, taking in the whole body of work mentioned above it can be seen that the parametrization of winglets and novel tip shapes vary greatly. Parametrizations of 2 to 12 design variables have been reported. In general, the most typical considered design variables in Table 1 are winglet height, twist, sweep, cant angle, toe angle, extension, and curvature. For the higher numbers reported, for example, the 12 design variables used by Zahle et al. (2018), it is typical that the majority (9) of the design variables are used as planformtype design variables controlling, for example, the twist and chord distribution of the very tip of the blade. However, only three works (Hansen and Mühle, 2018;Zahle et al., 2018;Reddy et al., 2019) manage to take more than a few design variables into account simultaneously, and it is fair to speculate whether a mere two to four design variables indeed are sufficient to accurately model the full complexity of a winglet. That being said one can deduce several rules of thumbs for the use of design variables: one should, for example, expect the optimizer to leverage flapwise displacement to decrease bending moment (Imamura et al., 1998) or manipulation of the thrust coefficient (Khalafallah et al., 2019). Flapwise displacement can also be found to be favored slightly over sweep   Fig. 4), but both are used heavily to displace the tip vortex in numerous works (Hansen and Mühle, 2018;Sy et al., 2020;Mourad et al., 2020). Few works also include twist and chord design variables for the winglet planform where the expected trends of a reduced chord distribution towards the tip and an initial increase in twist distribution followed by a sharp decrease at the very end of the blade can be observed . However, these trends for twist and chord probably are difficult to generalize and are likely to be specific to pressure side winglets.
Turning to the ability of the winglet to increase overall performance the results in Table 1 vary from an increase of 1.1 % to 20.1 % depending on objective function and parametrization. However, not all investigated shapes were reported to be load-neutral compared to the baseline, which could compromise already installed rotors. One should therefore only compare load-neutral works (bold rows in Table 1) or clearly state by how much the initial load envelope is allowed to be exceeded in order to arrive at meaningful comparisons. Based on the load-neutral works in Table 1 it is fair to state that only a few percent improvement can be expected depending on the choice of objective function.
Despite the above-mentioned salient features recent publications (Reddy et al., 2019;Papadopoulos et al., 2020) agree that still only a limited use in the wind energy industry has been found for winglet applications. Moving forward, at least two trends can be pointed out to help spread the role of the winglet and to counter remaining contradictions found in the literature.
-Firstly, action should be a general increased model fidelity when studying a feature such as the winglet. Numerous works (Matheswaran et al., 2019;Døssing, 2007, etc.) point to the need of an increase in model fidelity when studying the tip area where conventional BEM models struggle. Using CFD and other highfidelity references also allows for improving said lowerfidelity models, which has already been reported in several studies (e.g., Sørensen et al., 2011). The increase in fidelity would also help capture the complex flow phenomena to better understand the role of the winglet.
-Secondly, one should aim for optimizations with unifying design optimization problems. By agreeing on unifying design optimization problems solved by the various frameworks one can eliminate some of the differences separating these works, and more consensus should be possible.
The novelty and contribution of the present study is to develop a CFD-based design optimization framework and to optimize a wind turbine blade tip, thereby addressing the needs identified in the presented literature review, i.e., (i) to study the tip using a high-fidelity model (CFD) and (ii) to carry out an optimization on a unifying design optimization problem. To this end, the remaining paper has sections for "Methodology" (Sect. 3), "Baseline analysis" (Sect. 4), "Design optimization problem" (Sect. 5), "Results" (Sect. 6), and "Conclusion" (Sect. 7).

Methodology
The overall aim of this study is to optimize the geometry of an academic wind turbine considering rotor aerodynamics only, and all aeroelastic effects will be disregarded. Constraints on load and geometry are used as a surrogate to maintain structural feasibility. This section describes the overall design framework used in the study, and the remaining sections will describe the optimization problem in further detail, as well as present the final results. Below, a general description of the framework is given (Sect. 3.1). Then, a description of the components -deformation library (Sect. 3.2), flow solver (Sect. 3.3), and optimizer (Sect. 3.4) -is given.

FlowOpt: a high-fidelity shape optimization framework
The high-fidelity shape optimization framework called "FlowOpt" at DTU Wind Energy is built around the inhouse flow solver, EllipSys3D. The framework is focused on gradient-based shape optimization, and in this study two different step-based approaches will be used to compute the gradient, namely the finite difference method and the Complex-Step method. The Complex-Step method can provide machine-accurate gradients, and up to 16 digits of gradient accuracy have been reported for the implementation in EllipSys3D (Madsen, 2020, their Fig. 7.9). This method will therefore be used in a gradient verification of the finite difference method gradients. However, it was observed that a few meshes (of the several hundred encountered throughout an optimization) did not lead to a deep convergence, resulting in impaired Complex-Step gradient accuracy for these few meshes. For this reason it was decided to use the finite difference method in the present study during the actual optimizations.
All components in the FlowOpt framework are written in compiled low-level programming languages for maximum efficiency. However, these components also have userfriendly interfaces allowing for a lenient interaction using the interpreted high-level Python programming language. It is through these interfaces that the Python-based FlowOpt optimization framework is built. A visualization of the FlowOpt framework can be seen in Fig. 1.
The design framework visualization in Fig. 1 starts in the upper-left corner where an optimizer (0, blue) sends a set of design variables, x DV , to a mesh deformation library (1, red) which in turn sends a resulting deformed CFD mesh surface, x surf , to a flow solver (2, green) which finally can compute a flow field, w, functions of interest, f, and constraints, c. All functions of interest and constraints are then together with the related gradients returned to the optimizer (3, blue), and the overall cycle is repeated until a final design, x * DV , is identified by the optimizer.
The choice of gradient computation method in the design framework has great impact on which optimization problems can be solved in a timely manner, but the actual course an optimizer takes during an optimization should not change. Indeed, it has previously been shown (Madsen, 2020, their Fig. 11.11) that the FlowOpt framework will carry out essentially identical optimizations when using either the Complex- Step method or the adjoint method as long as the flow field is well-converged to ensure that the gradients are machineaccurate.
Returning to Fig. 1 it can be noted that there are several layers to the mesh deformation component (1, red) and flow solver component (2, green). This signifies that the optimization may involve several simultaneous flow computations. As a result, the design framework supports a nested parallelism (through OpenMDAO; Gray et al., 2019) both with respect to running multipoint optimizations but also with respect to gradient evaluation. Thus, if sufficient computational resources are available, one can dedicate a separate group of CPUs for each design variable and maintain a fixed gradient computation time. This allows for an execution time comparable to those seen for adjoint-based optimizations. However, unlike the adjoint method the cost in computational resources will in this case increase linearly with the number of design variables being evaluated simultaneously. The presented study will therefore be rather costly in terms of CPU usage to ensure a state-of-the-art computation time.

FFDlib: a free-form deformation library
The in-house free-form deformation library, FFDlib (Madsen, 2020, their chap. 5), has been developed as an integral part of the FlowOpt design optimization framework and has previously been used in trailing edge flapping device studies (Horcas et al., 2018), as well as in high-fidelity shape optimization studies with various gradient computation techniques (adjoint, Complex-Step, etc.) as described elsewhere (Madsen, 2020). As the name suggests the parametrization library leverages a free-form deformation (FFD) formulation to propagate changes from a few chosen design variables out to every single embedded mesh point in the computational mesh.
The FFD methodology has numerous salient features and was chosen particularly due to the following three aspects: exact numerical mesh representation (if the inverse search is converged to machine precision), mesh topology agnostic, and analytical gradients.
As explained in the original FFD paper (Sederberg and Parry, 1986) the basic principle in free-form deformation is to embed an object in a rubber-like material, meaning that for the present study the tip of the blade will be embedded in deformation boxes. Mathematically speaking, an inverse search is used to map the discrete mesh points to the normalized parameter space spanned by the tuple coordinates, (s, t, u), according to the following equation: where CP ij k are the l × m × n control points of the FFD box, and X FFD (s, t, u) is the reconstructed 3-D point in the mesh computed from the normalized coordinates, (s, t, u), and the control points, CP ij k . The inverse search can be carried out by solving the following equation (Casale and Stanton, 1985, their Eq . 7): where P is the point which is to be embedded. This equation is solved in FFDlib with the iterative Newton search: The actual implementation of the above-described FFD method is split in two parts to maximize efficiency: it consists of an underlying code base written in Fortran containing the computationally heavy operations and a user-oriented interface written in Python which is integrated in the FlowOpt shape optimization framework.
Furthermore, the basic FFD methodology can be extended in many ways such as ensuring C i -continuity control between deforming and non-deforming interfaces or volume preservation capabilities (see Hahmann et al., 2012). While FFDlib has indeed been extended with both mentioned features, it is only the C i -continuity control that will be used in the present study to ensure a high mesh quality is maintained as described further in Sect. 5.1.
FFDlib will in the present study only be used to deform the CFD surface mesh. The deformed surface mesh will then be propagated down to the flow solver which in turn updates the volume mesh.

EllipSys3D: a general purpose flow solver
For the present study the general purpose flow solver, Ellip-Sys3D, based on the finite volume method is used to solve the steady-state incompressible RANS equations as the underlying flow model along with the κ − ω shear stress transport (SST) turbulence model by Menter (1992). Furthermore, velocity and pressure variables are coupled using the semi-implicit method for pressure-linked equations (SIM-PLE) (Patankar, 1980), in which the Rhie-Chow interpolation (Rhie and Chow, 1983) is used to avoid checkerboard patterns. Flow solutions are obtained with a third-order accurate discretization scheme. Finally, the internal volume mesh deformation routines will be used as explained further in Sect. 3.3.1.
At present, the EllipSys3D has been used extensively in numerous application areas ranging from blinded comparisons (Simms et al., 2001) to rotor analysis , DES simulations (Johansen et al., 2002), large eddy simulations (LESs) (Berg et al., 2018), and studies in vortexinduced vibrations (Horcas et al., 2018) to name but a few. EllipSys3D was also recently tested on the MareNostrum (https://www.bsc.es/marenostrum/marenostrum, last access: 30 September 2021) supercomputer and achieved above a 50 % scaling efficiency when using more than 16 000 CPUs. This aspect is particularly important for the present study for which a high number of CPUs must be leveraged per rotor computation in order to arrive at competitive computation timings for the entire optimization.
As seen, EllipSys3D has during the last three decades of development been thoroughly extended, including an overset grid method (Zahle, 2006), transition modeling (Sørensen, 2009), and an adjoint solver (Madsen, 2020). It is outside the scope of the present work to account for all these applications. To be clear; the transition modeling was not used in the present study, and the flow was assumed to be fully turbulent. Depending on the condition of the surface and type of airfoil, this assumption is not correct since the laminarto-turbulent transition will not take place at the leading edge but at some point along the chord. However, one can consider it a conservative modeling choice to ensure robustness of the design under conditions in which the boundary layers are turbulent (e.g., due to surface soiling).
Alone within studies focused on tip shapes and winglets the EllipSys3D flow solver has been used in about a dozen works Sørensen, 2006, 2007;Johansen et al., 2008;Johansen, 2007, 2008;Gaunaa et al., 2011;Sørensen et al., 2011;Kalken and Ceyhan, 2017;Zahle et al., 2018). Therefore, only a paragraph focusing on the shape optimization studies leveraging the EllipSys3D flow solver is given in the following.
The EllipSys flow solver has been used in slat design using an overset grid method (Gaunaa et al., 2013) on a 10 MW rotor configuration, in which a series of 2-D crosssections were optimized with multi-element airfoils. Ellip-Sys was also used to design airfoils using a gradient-based setup with finite-differenced gradients, where the optimized airfoils were subsequently validated with wind-tunnel testing (Zahle et al., 2014). More recently, EllipSys3D figured in a surrogate-based design study  in which an optimization problem similar to the one seen in the present study was solved. They showed there was great promise in coupling the CFD solver to lower-fidelity methods in the interest of saving computation time while maintaining the essential flow physics. Finally, the EllipSys3D flow solver has been integrated in the FlowOpt design optimization framework and applied in high-fidelity shape optimization. The chosen design optimization problem (Madsen, 2020, their Eq. 11.3) was a drag minimization of a 3-D wing subject to a lift constraint using five twist design variables. The problem was chosen to evaluate the newly developed framework on one of the few aerodynamic shape optimization problems that actually has an analytical optimal solution: an elliptic lift distribution. Mesh sizes up to 664 × 10 3 and 83 × 10 3 cells were used in the optimization using the Complex-Step and adjoint methods, respectively. It was found that the analytical elliptic lift distribution was well-approximated (Madsen, 2020, their Figs. 11.6 and 11.9), and most optimizations were tightly converged below a 10 −4 threshold. Finally, it was demonstrated that essentially identical optimizations (Madsen, 2020, their Fig. 11.11) occur when computing gradients either with the Complex-Step method or with the adjoint method for well-converged flows.

Mesh deformation
In this study the EllipSys3D flow solver will receive modified surface meshes as the optimization progresses. EllipSys3D will then subsequently propagate the deformation change between the original surface mesh and the deformed surface mesh out through the volume mesh using internal mesh deformation propagation routines. This deformation method in EllipSys is based on an analytical approach in which both the translatoric and overall re-orientations (i.e., rotations) of the surface are propagated and attenuated in the volume mesh using a hyperbolic tangent function, blended into the original volume mesh based on the distance to the blade surface along the given grid line. The consideration of the rotation of the surface greatly improves the quality of the deformed meshes for certain configurations compared to attenuating only the displacement resulting from the deformation. For instance this update has been instrumental for the present work as considerable local rotations were identified for some of the explored tip shapes. Without the consideration of rotations in the mesh deformation routines, this led to mesh folding in the boundary layer region.

Optimizer
In this work the Sparse Nonlinear OPTimizer (SNOPT) version 7.2-10 is used (Gill et al., 2018(Gill et al., , 2005 (https:// web.stanford.edu/group/SOL/guides/sndoc7.pdf, last access: 30 September 2021). SNOPT is based on a sequential quadratic programming (SQP) algorithm which allows for infeasible steps to be taken during the optimization. At convergence all optimizations were feasible within the requested tolerance.
The SNOPT optimizer is accessed in the FlowOpt framework through the open-source Python wrapper called pyOptSparse (Wu et al., 2020) by choosing the pyOptSparseDriver in OpenMDAO. pyOptSparse is leveraged extensively (https://mdolab-pyoptsparse. readthedocs-hosted.com/en/latest/publishedWorks.html, last access: 30 September 2021) throughout the broader numerical optimization community and is as the name suggests dedicated to constrained nonlinear optimization of large sparse problems.

Baseline analysis
The present section contains a description of baseline planform, surface mesh, and volume mesh. Mesh modifications were made to the original geometry to enable a deep convergence. These changes will also be presented, and the effects of the geometry changes will be assessed. Finally, a scaling study is also included to discuss the necessary computational resources needed to carry out direct CFD-based optimizations.
The rotor surface mesh has been generated from the planform data and the FFA-W3 airfoil family used for the IEA 10 MW with the in-house Parametric Geometry Library (PGL) tool. The surface mesh on each blade has 256 cells in the chordwise direction and 128 cells in the spanwise direction, partitioned into blocks of 32 × 32 cells. Four blocks of 32 × 32 cells form the tip cap, resulting in a total of 3 · 36 · 32 · 32 = 110 592 mesh cells for the surface mesh.
The baseline volume mesh is prepared with the in-house hyperbolic mesh generator, HypGrid3D (Sørensen, 1998). The volume mesh has an O-O topology in which 128 layers are grown from the surface mesh resulting in 14.16 million  cells. By setting the first boundary layer cell below 10 −6 m, a y + below 1.0 is ensured given the operational conditions seen in Table 2.
The above description is for the baseline mesh at the very start of the optimization. All subsequent volume meshes throughout the optimization are computed using the internal mesh deformation routines in EllipSys3D. As a result, they are all likely to exhibit a slight reduction in mesh quality due to impaired orthogonality of the mesh as the tip is created from a straight blade planform.
Finally, with respect to boundary conditions the rotor surface is a no-slip boundary, whereas the far-field zone is split into two sections: an approximately circular area behind the rotor is an outflow-scaling zone, whereas the rest of the farfield region is an (uniform) inflow zone.

Geometrical modifications
Initially, the described surface mesh of the IEA 10 MW reference wind turbine was used in the optimizations without any additional modifications. However, as finer grid levels were taken into use a seriously impaired gradient quality was observed. The explanation proved to be a lack of flow convergence on the finer grid levels due to complex swirling 3-D flow phenomena occurring in the root area of the rotor. Such a massive blunt object will inherently cause complex flow phenomena resulting in a lack of convergence for steady- There are several remedies suggested in wind energy research on shape optimization for the described convergence issues. Nielsen and Diskin (2012) were able to carry out shape optimization not just of the rotor itself 1 but also including nacelle and tower (Nielsen and Diskin, 2012, their Fig. 4) by using an unsteady RANS formulation. It is very likely that the unsteady RANS formulation in EllipSys3D would somewhat mitigate the observed impaired convergence. However, the computation time would drastically increase, which is why this option was ruled out.
Yet another option found in the literature is to exclude some mesh regions: Dhert et al. (2017) cut out the root of the rotor configuration to improve convergence, whereas Vorspel et al. (2018) exclude both root and tip regions from being deformed due to a local inferior gradient quality.
Of the above-mentioned alternatives the mesh (root) modification option is favored since the present shape optimization study is focused on tip shapes. However, instead of removing the root section altogether it was decided to simply re-shape it aerodynamically to reduce separation and in turn improve convergence. The modified baseline mesh can be inspected in Fig. 4.
To assess the quality of the modified mesh a full grid sequence is run on both the original baseline mesh and the modified baseline mesh using the operational conditions seen in Table 2.
The resulting convergence behavior, as well as spanwise forces, can be inspected in Fig. 5. As is evident from the figure it is only the innermost part of the spanwise forces where one can discern a minor change. The design optimization problem of optimizing the tip is in other words practically unaltered, and the convergence issues have been addressed without changing the task at hand.
1 Interested readers will notice that the actual root of the NREL VI rotor was excluded in this study. Still, the inclusion of nacelle and tower introduce a massive blunt object in the flow.

Mesh convergence study
A mesh convergence study is carried out using the modified rotor geometry to investigate mesh dependence of the flow solution. To generate the coarse mesh levels to be used in the mesh convergence study a grid coarsening sequence inside the flow solver is used: the above-described computational mesh is the finest mesh called L1. The next mesh level, L2, is generated by removing every second mesh point in all directions. Similarly, the L3 mesh level, which is the coarsest mesh level used in the present study, is generated by removing every second grid mesh point from L2. All three mesh levels, L1, L2, and L3, are listed in Table 3.
It has previously been shown (Madsen et al., 2019, their Table 4) that a flow solver with a noticeable grid dependency may even suggest incorrect design trends (Madsen et al., 2019, their Sect. 6.2) on the coarser grid levels. Therefore, one should ensure that all mesh levels exhibit low error percentages compared to the Richardson extrapolation. As listed in Table 3 the error percentage on both mesh levels L1 and L2 is well below 10 %, and it should be reasonable to expect relevant design optimization results at least for these two mesh levels.

Computational resources
As a final section in the baseline analysis the computational resources needed to carry out shape optimizations on all three grid levels are assessed. A visualization of flow solver scaling on the high performance computing (HPC) cluster 2 at DTU Wind Energy can be inspected in Fig. 6.
The most important aspect to consider when inspecting scaling results as shown in Fig. 6 is the finest mesh level, L1, since it by far is the most time consuming and since it will determine the final results. As seen, the scaling is indeed close to ideal on L1 for EllipSys3D, and if computational resources are available, one can advantageously use as many CPUs as possible on the given rotor mesh. In the present case that number is 432 -one CPU for each block in the mesh.

Design optimization problem
The single-point design optimization problem used in the present study is to optimize the power production using 12 design variables while satisfying constraints on smoothness of the geometry and on the flapwise bending moment computed at 90 % span, which from now on will be written as 0.9 r/R for brevity. Mathematically, the design optimization 2 The Sophia HPC cluster at DTU Wind Energy comprises 516 computational nodes where each of these is a x86-64 computer with 32 cores. More information is available at https://windenergy.dtu. dk/nyheder/2019/12/ny-computer-cluster-paa-risoe-campus?id= a495d2e5-a7f1-4133-9fb2-488d150f7c01, last access: 30 September 2021.   Table 3. Also the normal force (b) and driving force (c) are visualized. As seen, the spanwise forces only differ noticeably within the 30 m farthest inboard.
problem can be formulated as follows: with respect to twist: θ 1 , θ 2 , θ 3 , θ 4 chord: c 1 , c 2 , c 3 , c 4 tip shape: c wl , h wl , s wl , R ext , subject to M bending | 0.9 r/R initial Above, the objective function is mechanical power, P = ω · Q, found from rotational rate, ω, and torque, Q, where the operational conditions used in this study can be found in Table 2.
As seen, the constraint on bending moment ensures that the bending moment at 0.9 r/R span does not increase compared to the baseline value. The geometric chord design variable constraint ensures that the optimization does not increase the chord towards the very tip of the blade. There are no constraints on the twist design variables. In the following section (Sect. 5.1) the 12 design variables are further described.

Parameterization
In the design optimization problem there are 12 design variables: four twist variables (θ 1,2,3,4 ), four chord variables (c 1,2,3,4 ), one extension variable (R ext ), one flapwise tip displacement variable (h wl ), one edgewise tip displacement variable (s wl ), and one tip curvature variable (c wl ). We have adopted all design variable nomenclature from a related surrogate-based optimization study  to allow for an easy comparison. The twist, chord, and extension design variables change the planform at the blade tip region, whereas c wl , h wl , and s wl are used to shape the tip towards winglet-like shapes.
All 12 design variables are imposed by manipulating the three FFDlib tip boxes (blue) seen in Fig. 7. As seen, the deforming tip region (red mesh lines) embedded in the FFD boxes only makes up 10 % of the total blade length of the baseline rotor (gray). The twist and chord variables are imposed section-wise on FFD box sections S 3 , S 4 , S 5 , and S 9 in Fig. 7. Sections S 6-8 are given from interpolation once sections S 5 and S 9 are set. Also seen are two darker blue areas on the FFD boxes signifying that the related FFD spanwise sections have a particular functionality: the darker blue part further inboard visualizes the two FFD box sections, S 1-2 , which are locked to ensure a C 1 continuity with the remaining surface mesh. The darker blue part towards the very tip of the blade visualizes three FFD sections, S 6-8 , that have been inserted as a tip-cap protection since it is crucial that this part of the mesh retains a high mesh quality. Notice that the tip can indeed still be deformed by moving the outermost FFD box section, in which case the three tip-cap protection sections are interpolated to their correct position.

Tip design variables
The three tip design variables -flapwise tip displacement (h wl ), edgewise tip displacement (s wl ), and curvature of the tip (c wl ) -can be inspected in Fig. 8 where the FFD box visualization has been omitted in order to more easily inspect the  6. Measured scaling on the Sophia cluster for the EllipSys3D flow solver on the modified baseline mesh using 27, 54, 108, 216, and 432 CPUs. For the L2 mesh it seems up to 108 CPUs will result in very efficient CPU usage. For L1, which is by far the most interesting and time-consuming mesh level, the scaling is ideal, and up to 432 CPUs can advantageously be used. On L3 the mesh is so coarse that the computational task is modest, making scaling on this mesh level less important. deformed geometry. Notice that the curvature is an interpolation variable from 0.0 (maximum curvature) to 1.0 (straight tip). It is also relevant to point out that the role of the three locked FFD sections (dark blue in Fig. 7) next to the outermost tip section is to protect the volume cells at the tip, which at the same time results in a reduced maximal curvature for the parametrization.

Gradient verification
Given that the present study uses the finite difference method an initial step size study is carried out to identify a suitable step size. Table 4 shows how the finite difference gradient accuracy correlates to the chosen finite difference step size. The machine-accurate reference gradient is computed using the Complex-Step method. Gradient accuracy for all constraints has also been verified (not shown) and exhibits similar accuracy. By inspecting Table 4 one learns that for most design variables (e.g., twist, edge, extension) the best finite difference step size is 10 −4 . However, for chord design variables a step size of h = 10 −5 seems better suited. Up to six significant digits can be seen, which is more than plenty to carry out gradient-based design optimization. However, it is also evident that the step size does not have to be much off before the gradient precision is worsened. This may factor in for longer optimizations for which many new rotor shapes are intro-duced, meaning that also the optimal step size might change slightly throughout the course of the optimization.

Results
The main results of this study fall in two parts. The first part (Sect. 6.1) is carried out solely on mesh level L3 and is a study in finding the best settings that balance the most accurate gradient computation on the one hand with more robust settings on the other hand that result in properly converged optimization problems. The second part (Sect. 6.2) is a shape optimization study using three grid levels in which focus is laid on analyzing the shape of the resulting blade tip.

Step size study for shape optimizations based on finite difference
To identify the best functioning step size a series of six shape optimizations were run using step sizes h = 10 −1 , h = 10 −2 , h = 10 −3 , h = 10 −4 , h = 10 −5 , and h = 10 −6 . All optimizations were allowed to run until either the upper limit of 100 major iterations (i.e., design steps) was reached or the optimizer exited since it could not converge the design problem further 3 . The lower and upper design variable bounds for this Above, the units for the design variables are the following. Twist variables, θ , are in degrees. Chord variables, c, use a unitless scaling factor. Curvature is likewise a unitless interpolation factor from 0.0 to 1.0, where 0.0 represents the maximally allowed curvature for the parametrization, and 1.0 represents a straight tip (see Fig. 8c). Flapwise displacement, h wl , and edgewise displacement, s wl , are in meters, and the extension scaling variable, R ext , is a unitless scaling variable which stretches the entire FFD box.
The shape optimization results for the six different step sizes are listed in Table 5, and their optimization histories are visualized in Fig. 9. Before discussing the results in Table 5 it should be noted that the step size is but one of several important settings that one must fine-tune to arrive at a properly functioning optimization framework. Another important consideration to mention is that these step size studies ideally should be done on each grid level. However, that is extremely expensive on the finest grid level for 12 design variables. Based on the present study it is the authors' experience that for flow solvers as consistent across grid levels as seen in Table 3 it will suffice to carry out the step size study on mesh level L3. For flow solvers less consistent across grid levels one might have to redo the step size study on each grid level. The first thing to note when inspecting Table 5 is that one should take care not to use finite difference step sizes that are too small. Indeed, the most accurate gradient step sizes identified in Sect. 5.2 (i.e., h = 10 −4 and h = 10 −5 ) do not result in very successful optimizations. Although h = 10 −4 and h = 10 −5 indeed where the most accurate step sizes on the baseline mesh, they do not (in the present study) seem to be a robust choice over the course of an entire optimization. One potential explanation could be that cancellation errors due to step sizes that are too small are likely to occur for at least a few of the several hundred rotor shapes generated by the optimizer during an optimization. Even if this happens for just a few shapes, it might result in the optimizer having to reset the Hessian which could impair the optimization problem convergence. One could implement individual step sizes for each design variable in the attempt to gain better gradient accuracy and robustness, but in general it seems advisable to use a step size 1-2 orders of magnitude larger than h = 10 −4 to deeply converge optimization problems.
Judging from Table 5 the main benefit from finding a good compromise between gradient accuracy and overall robustness is that a sound procedure for ending the optimization is available: optimizations with a functioning step size (SPL3e1c, SPL3e2c, and SPL3e3c) all have around 50 major iterations when terminating and result in approximately the same improvement, whereas the other optimizations (SPL3e4c, SPL3e5c, SPL3e6c) differ both in number of major iterations and in the final improvement. Noticeably, these optimizations (e.g., SPL3e4c) may still result in approximately as much improvement as the better-performing Table 5. Overview of six CFD-based shape optimizations of wind turbine blade tips using different finite difference step sizes. Operational conditions are found in Table 2. The naming convention explanation using "SPL3e2c" as an example is as follows. SP: a single-point optimization; L3: on mesh level L3; e2: using a finite difference step size of 10 −2 ; and c: (cold-)started from a straight baseline configuration. All optimizations were carried out using 648 CPUs split into 12 groups -one group of 54 CPUs for each design variable.

ID
Mesh level Wall clock * Maj. iter.
optimizations, but the optimization problem is only converged by 1-2 orders of magnitude, and one cannot be sure whether the optimization problem is actually solved.
Optimizations SPL3e5c and SPL3e6 in Table 5 resemble two typical end-case scenarios for optimizations with inefficient settings. Either the optimization finishes prematurely (e.g., SPL3e5c) or the optimizer keeps trying to converge the optimization problem using excessive iterations (e.g., SPL3e6c). In both cases, the optimizer exits due to numerical difficulties resulting from the inaccurate gradient. While the SPL3e5c optimization finishes much sooner than the SPL3e6c optimization, thus saving considerable computation time, it actually results in a shape which performs slightly worse (not visible in Table 5). Thus, SPL3e5c represents the least successful optimization in Table 5.
Turning to Fig. 9 it is easy to see how the optimizations progress. All optimization problems start with an optimality around 10 −3 (see right y axis in Fig. 9 shown in red). The 10 −7 threshold has been chosen to ensure that all design optimization problems are converged by about 4 orders of magnitude. To relate this to relevant literature it is noted that a wind turbine design problem with a single design variable (pitch) has been well-converged with an optimality reduction of only 3 orders of magnitude on similar mesh sizes (325 × 10 3 cells in Dhert et al., 2017, their Fig. 5, and 221 × 10 3 cells in Madsen et al., 2019, their Fig. 8). Indeed, Madsen et al. (2019) showed that wind energy design optimization problems with 1, 14, and 154 design variables were all well-converged with an optimality reduction of about 2-4 orders of magnitude (Madsen et al., 2019, their Figs. 8, 9, 11, 12, 14, and 17). Therefore, a chosen threshold of about 4 orders of magnitude reduction in optimality for the present study seems reasonable. As seen in Fig. 9 the chosen threshold is only reached by the optimizations which converge properly, and the threshold could even have been 3-4 orders of magnitude stricter at the limited expense of about 10 major iterations. However, it is clear from the merit function shown in black that it would lead to no added value since the objective function for all practical purposes is fully converged. In order to accommodate reproducibility, Fig. 9 shows a "merit function" instead of the actual design optimization problem objective since it is the former metric that SNOPT exports. However, these two metrics are highly related as detailed in SNOPT's manual, and the merit function will converge to the objective function value as the solution is approached. The optimization history for SPL3e6c clearly visualizes what may happen if said threshold is not met: the optimizer uses a large amount of excessive iterations in the attempt to further converge the optimization problem. Evidently, it is crucial to have a well-defined way to end optimizations.
An important point when discussing the cost of optimizations is to discern an optimizer's major iterations from wall clock computation time: although SPL3e5c is the optimization with by far the fewest major iterations, it is the second most time-consuming wall-clock optimization, as also indicated in Table 5. The optimizer simply takes very few actual steps in this optimization due to impaired gradient precision.
In summary, the best functioning step sizes seem to be h = 10 −1 , h = 10 −2 , and h = 10 −3 . Given that h = 10 −3 of the three resulted in the most accurate gradient computation when comparing to machine-accurate reference gradients (Table 4), it is the h = 10 −3 step size that will be used in the ensuing section.

Aerodynamic shape optimization of wind turbine blade tips
With h = 10 −3 , identified as a promising finite difference step size in Sect. 6.1, a shape optimization study across three grid levels has been carried out. All optimizations were again allowed to run until either the upper limit of 100 major iterations (i.e., design steps) were reached or the optimizer exited Figure 9. Merit function (black, left y axis) and optimality (red, right y axis) for shape optimizations on mesh level L3 using step sizes h = 10 −1 , 10 −2 , 10 −3 , 10 −4 , 10 −5 , and 10 −6 . The simulations are further described in Table 3 under the names SPL3e1c, SPL3e2c, SPL3e3c, SPL3e4c, SPL3e5c, and SPL3e6c, respectively. As seen, the three optimizations with the largest finite difference gradient step sizes converge deeply, which is not true for the three optimizations with the smallest step sizes.
since it could not converge the design problem further 4 . For large flapwise and edgewise displacements negative cell volumes were encountered on the finest mesh level as a result of the deformation of the mesh during the optimization. In order to make sure that the exact same optimization could be carried out on the various grid levels, it was therefore necessary to limit the upper design variable bounds for h wl and s wl to 2.0 m. All other settings from Sect. 6.1 were kept the same. The results from the final shape optimization study are listed in Table 6, and their optimization histories are visualized in Fig. 10. As seen from Table 6 a total of five optimizations where run: three optimizations (SPL3e3cb, SPL2e3cb, and SPL1e3cb) were (cold-)started from a straight baseline blade, whereas two optimizations (SPL2e3hb, SPL1e3hb) used the optimization result from a coarser mesh as a starting point.
Starting from the left in Table 6 one can, after "ID" and "Mesh level", find two columns describing the computational resources used in this study (i.e., "CPUs" and "Wall clock"). When discussing the amount of CPUs used in the study it is good to remember that the CPUs are split into 12 groups: one group for each design variable to reduce the gradient computation time. The reason for using only 12 · 54 = 648 CPUs on mesh level L3 is that the efficiency study (see Fig. 6) clearly showed that very little speed-up could be gained by increasing the number of CPUs on this mesh level. Similarly, for L2 there is a drop in efficiency in Fig. 6 after 108 CPUs, which is why 12 × 108 = 1296 CPUs were used for SPL2e3cb. However, for L1 the efficiency in Fig. 6 for the EllipSys3D flow solver is at the upper possible limit, and more CPUs could advantageously have been used. The reason for limiting SPL1e3cb and SPL1e3hb to 1296 CPUs has to do with the computational resources available for the present work. 4 The related SNOPT message is as follows: SNOPTC EXIT 40 -terminated after numerical difficulties. SNOPTC INFO 41 -current point cannot be improved.
This also means that the computation time could be much improved: simply by raising the number of CPUs from 108 to 432 one could gain a factor of 4 in speed-up, meaning that SPL1e3hb could be carried out in only 204.3/24/4 ≈ 2 d. For a well-converged high-fidelity optimization 2 d is certainly an acceptable computation time. One may find more optimistic computation time consumptions reported in the literature but typically with a correspondingly poor convergence of the design optimization problem. As also indicated in Fig. 10 the computation time is very dependent on the chosen threshold. Therefore, one should not discuss one without considering the other.
While on the topic of computation time it may be relevant to mention the τ time unit which is a non-dimensional work unit to compare across HPC systems (https://cfd.ku. edu/hiocfd/, last access: 12 July 2022). The τ code was downloaded, and 10 runs were carried out on the Sophia cluster resulting in an average execution time of 3.89 s. Using this result one can now compute a unitless normalized version of the reported wall clock times in Table 6 It is difficult to relate the reported timings to relevant literature from the wind energy community since very few highfidelity CFD-based shape optimization studies of that magnitude exist (Madsen, 2020, Table 3.1). Elfarra et al. (2014) do mention an approximate wall clock timing of 240 h for their gradient-free optimization, but since the timings are not in normalized τ units, it is difficult to compare across HPC platforms. Furthermore, Elfarra et al. (2014) do not mention how well their optimization problem is converged in terms of optimality reduction, which makes a comparison very difficult to carry out. Turning to the study by Dhert et al. (2017) one can, however, find both wall clock and optimality reduction reported: they spend 8.25 h on a wind turbine design problem with a 2.6×10 6 cell mesh resolution. The reported mesh resolution is most easily compared with the SPL2e3cb results Table 6. Overview of optimized shapes obtained from CFD-based shape optimizations of wind turbine blade tips. Operational conditions are found in Table 2. The naming convention explanation using "SPL1e3hb" as an example is as follows. SP: a single-point optimization; L1: on mesh level L1; e3: using a finite difference step size of 10 −3 ; h: (hot-)started from the optimized shape on L2; and b: with limited design variable bounds that work on all grid levels. All optimizations were carried out using either 648 (L3) or 1296 (L2, L1) CPUs split into 12 groups -one group of either 54 (L3) or 108 (L2, L1) CPUs for each design variable. Figure 10. Merit function (black, left y axis) and optimality (red, right y axis) for shape optimizations (Table 6) on mesh levels L3, L2, and L1 using finite difference step size h = 10 −3 . The SPL1e3hb is a hot-started optimization on L1 where the result from L2 is used as the starting point. The SPL2e3hb correspondingly uses the L3 result as the starting point. All other optimizations start from a straight blade.
in Table 6. However, it is difficult from the reported final optimality (Dhert et al., 2017, Table 2) to learn how many orders of magnitude it has been converged, making it difficult to compare with the presented timings in this study. Madsen et al. (2019) report CPU timings (Madsen et al., 2019, Table 6) for adjoint-based high-fidelity shape optimizations using a single design variable of close to 60 h for a mesh with a resolution equal to L1 in Table 3. Again, these computations were carried out on a different HPC platform than the present study, so these timings are difficult to compare, but as optimizations tend to become more difficult to converge as more design variables are included, the 60 h should certainly be seen as a lower possible bound in that study. In summary, a realistic lower bound for a high-fidelity optimization that is well converged seems to be around 2-3 d for very efficient frameworks.
Returning to Table 6 one can in the fifth column find the final design variables for each optimization. For readability all design variables ending at the upper/lower limit have been colored accordingly. Overall, it can be said that the final shape trends favored by the optimizer are an increase in sweep and an even greater increase in flapwise displacement. Both design variables are used to mitigate an increase in bending moment as the blade is extended. Of particular interest is the change in final sweep design variable from L2 to L1, where the L1 result, s wl = 1.9, is not on the limit any-more, and the optimizer seems to have found a maximum for this design variable. This aligns very well with findings in the related surrogate-based study, in which a maximum for the sweep design variable is found at 2 % d/R (Zahle et al., 2018, their Fig. 4). Turning to the twist and chord design variables the optimizer favors the twist design variable over the chord design variable to shed loads. For all optimizations the optimizer favors curvature, and for most optimizations it is at the very limit of what is allowed for the parametrization. All design trends agree with the surrogate-based design study by Zahle et al. (2018).
The resulting shape is visualized in Fig. 11, and an analysis of planform and spanwise forces is given in the following subsection (Sect. 6.2.1). As seen, the produced novel curved tip shape can extend the blade in an efficient manner, and it effectively mitigates the downwash by displacing the tip vortex. The vortex is somewhat diffused and smeared out along the extended tip structure protruding into the out-of-plane region as one would expect to see for a functioning optimization.
With respect to the final two columns in Table 6 showing bending moment constraint and gained improvement, they agree well with expectations: the bending moment constraint is indeed a design-driving load constraint and will to a great extent dictate how far the blade can be extended. In practice, the load level constraint would depend on the structural capacity in the blade, which may well allow for increases in the flapwise moment. A relevant investigation would therefore be to explore the effect of varying this constraint, which however, is beyond the scope of this work. To give the optimizer more freedom to operate future studies will involve relaxing this constraint to explore the resulting shapes produced by the optimizer.
With respect to the gained improvement in mechanical power it can be said that the optimization results from the finer grid levels (SPL2e3cb, SPL2e3hb, SPL1e3cb, and SPL1e3hb) agree very well at 0.42 %-0.44 % improvement, in which the result from L3 is lower (0.33 %). This observation agrees with earlier findings (Madsen et al., 2019) stating that very coarse mesh levels should be used with care in shape optimizations. Furthermore, it should be pointed out that the SPL3e3c improvement percentage from Table 5 is higher than the SPL3e3cb improvement percentage from Table 6 because the design variable bounds for h wl and s wl were changed to 3.5 from 2.0. Thus, one cannot directly compare SPL3e3c and SPL3e3cb.
Finally, while on the topic of improvement it is well to note the importance of regenerating the mesh: as seen, the final shape designs from SPL1e3cb and SPL1e3hb are for completeness evaluated with a regenerated volume mesh to ensure as accurate a result as possible. As a result, the final improvement in mechanical power changes from 0.44 % to 1.12 %. Importantly, the bending moment fraction constraint only changes from 1.000 to 0.9992, signifying that the final shape is still feasible and at the upper limit of the constraint as expected.
The 1.12 % improvement in mechanical power aligns very well with previous findings from the literature (see, e.g., Table 1): Matheswaran et al. (2019) report a 2.5 % power increase for a load-neutral optimization, and Zahle et al. (2018) report an improvement in power of 0.76 % for straight blade extensions and up to 2.6 % improvement for optimized winglet shapes. The surrogate-based approach by Zahle et al. (2018) used the exact same mesh generator and flow solver, meaning that the results should align fairly well. Given that the developed FFD-based parametrization in the present study does not produce true 90 • winglet shapes (see Fig. 15, left, for a comparison) it is reasonable that the expected improvement from the present study should lie somewhere between 0.76 % and 2.6 %, which is also the case.
The reason that mechanical power changes much more than the bending moment when the mesh is regenerated is that the bending moment computation is driven by pressure and friction, whereas the mechanical power computation is based on drag which is much more mesh-quality-dependent. The same phenomenon is seen in standard 2-D CFD airfoil computations in which lift mainly is a projection of pressure forces, whereas the viscous forces are very important for the drag. A small change in force vector will therefore mean much more for the drag than for the final lift.
It should be clearly stated that the change in final mechanical power in Table 6 from 0.44 % to 0.12 % deserves further investigation in future studies. Below, some of the planned investigations are described in further detail.
One could opt to restart the optimization on a regenerated mesh around the final L1 shape to obtain a result that is more independent of mesh quality. Alternatively, one could look into other popular methods such as radial basis functions or the inverse distance method to investigate whether these methods produce meshes that are closer in quality to an actual regenerated mesh. Yet another option is to regenerate the mesh after every optimization step. However, given that the aim is to arrive at a high-fidelity optimization framework that also utilizes adjoint solvers, that is not a desirable avenue to pursue since it would further complicate the gradient computation. A final option worth mentioning is to make a dedicated L3 mesh that is not a result of the grid coarsening described in Sect. 4.3. A better way to generate the L3 mesh would be to grow the mesh directly on L3 using the hyperbolic mesh generator. This approach has proved to be very efficient in the past. The dedicated meshes could be combined with grid sequencing to save further time. Table 6 gives two examples (SPL2e3hb, SPL1e3hb) of how optimizations may save time by starting from a coarser mesh level's result: as an example one could start an optimization on L2 (SPL2e3cb) and then proceed on L1 (SPL1e3hb) using the L2 result as seen in Fig. 12. Importantly, the final shape on a given grid level is the same when starting from a straight blade (SPL2e3cb, SPL1e3cb) or when starting from a result Figure 11. Tip vortex visualization using the shape optimization result from grid level L1. Wind direction is along the z axis. The full rotor image (a) is a superimposed image showing both the baseline rotor (gray) with straight blades and the final rotor shape (red) with the optimized tip shapes. The tip vortex is visualized both for the straight baseline (b) and for the final optimized tip shape (c) to show that it is smeared out and moved away from the rotor plane. from a coarser level (SPL2e3hb, SPL1e3hb) as one would expect for a well-functioning setup. Given that a speed-up of about a factor of 2 is observed this grid sequencing approach in shape optimization seems very advantageous and should be further investigated.

Analysis of the optimized shape
To inspect the optimized blade shape the planform is visualized in Fig. 13, whereafter the resulting spanwise forces are visualized in Fig. 14.
Inspecting the final twist distribution (Fig. 13, upper) a slightly jagged curve is seen where the optimizer tries to continue the upward rising twist curve at the start of the tip, as well as tries to introduce more negative twist towards the very tip of the blade. Reassuringly, these exact same trends were observed in a recent surrogate-based study (Zahle et al., 2018, their Fig. 8c) albeit with a smoother curve. The present study's parametrization has mainly been designed with robustness and mesh quality in mind, focusing on avoiding negative cells on the finest mesh level. Future work will entail experimenting with the parametrization in the attempt to try to arrive at a smoother twist curve. That being said, the resulting spanwise forces are very smooth even for the present setup as can be seen in Fig. 14.
The chord distribution only changes at the very tip of the blade (Fig. 13, lower) where the outermost chord design variable is used to slim the blade. One could attempt to move the second outermost FFD box section towards the tip to activate the remaining chord design variables. This would indeed also give the optimizer more freedom to shape the blade tip. How-ever, that FFD section has been purposefully placed at a distance to the actual blade tip to protect the cell volumes at the very tip, meaning one should use caution not to compromise the mesh quality. Overall, the trend of slimming the blade using the chord design variable as the blade is extended is to be expected for these optimizations.
Finally, the resulting spanwise forces are shown in Fig. 14 where the driving force is placed above the normal force. The entire span is shown to the left, and a zoom-in of the outermost part of the blade is shown to the right.
The driving force is exactly as expected for this type of tip optimization: as visible in the zoomed-in plot (right) the optimizer sacrifices a small portion of power at the start of the tip (90-95 m) but generates more power towards the very tip of the extended blade, resulting in a net increase in torque.
Turning to the normal force for the optimized shape the tip is effectively de-loaded mid-tip (∼ 97 m span) after an initial slight increase at the beginning of the tip. The de-loading allows the optimizer to extend the blade, thus incurring new loads at the very tip.
In sum, a novel curved tip shape has been designed. The results show that with approximately 1 % blade extension, 2 % flapwise displacement, and slightly below 2 % edgewise displacement, one can obtain a 1.12 % increase in power. The design favors as much curvature as is possible with the present parameterization, and it is likely that parameterizations allowing for a curvature closer to a 90 • winglet-like shape would find an even greater power increase. Indeed, the novel shape is able to extend the blade efficiently in the following manner: using a combination of the twist and chord design variables, it effectively sheds loads at the beginning  Table 6 and in Fig. 10. of the tip (Fig. 14) where also a reduced driving force is observed in the process, while extending the blade with a slender chord distribution, minimizing the load impact from the extension. Flow visualizations (Fig. 11) showed that compared to the original tip, the curved tip shape results in a more smeared out tip vortex, thus reducing tip loss. However, further investigations are needed to fully understand how, for example, the addition of sweep at the tip is favorable to a non-swept tip from an aerodynamic point of view.

A comparison across fidelities
Given that the study by Zahle et al. (2018) is closely related to the present work it is rewarding to compare the parametrizations and in particular the difference in maximally allowed curvature by superimposing a tip shape result from the present work (red) on to the final design (gray) from the surrogate-based study by Zahle et al. (2018) as seen in Fig. 15. Inspecting the upper-left plot in Fig. 15 it is evident that the present study cannot produce true 90 • winglet shapes due to concerns of the mesh quality, which is why one should expect to find final improvements slightly lower than in the study by Zahle et al. (2018). Reassuringly, a 1.12 % increase in power is reported in the present study, whereas a 2.6 % increase in power is reported in Zahle et al. (2018).

Future work
As a first step one should further increase the robustness and efficiency of the presented FlowOpt framework. This includes adding enhanced convergence methods and an adjoint method.
An enhanced convergence method would, apart from a general increase in optimization robustness, also ensure that  Resulting tip shape from the present study (red) compared to the surrogate-based design by Zahle et al. (2018) (gray). Both results exhibit maximally allowed curvature by their respective parametrization. As seen, the curvature from the present work (red) is not a true 90 • winglet as in the study by Zahle et al. (2018), signifying that one should expect a difference in final possible improvement (final results are 1.12 % and 2.6 %, respectively). The upper three figure panels show different views in the rotor plane, whereas the lower panel is in the upstream direction. aerodynamically challenging shapes due to, for example, stall, could be handled (see, e.g., Fig. 16), meaning that a significant increase in the design space could be gained as it would only be due to negative cells that one would have to limit the design variables.
The adjoint method allows for a gradient computation time that is independent of the number of design variables, which would be highly relevant for future work. Notice that the adjoint method will most likely not result in a speed-up of what can already be achieved for the presented optimizations with the current framework when using the parallelization techniques in OpenMDAO to compute perturbations for various design variables simultaneously. In fact, given that not all adjoint solvers are as fast as their flow solver counterparts one might see a slight slowdown. However, given that the present study uses the finite difference method which clearly depends on a preceding step size study, the final run times are difficult to predict and discuss without a concrete comparison being done.
Leaving the discussion of possible lower bounds for run times aside, one thing that is certain is that the framework with an adjoint solver will be able to take on new optimization problems altogether since the gradient computation will be independent of the number of design variables. Thus, full Figure 16. A sequence of three different deforming curved tip shapes (red) imposed on the baseline blade (gray) where surface-restricted streamlines can be seen as thin black lines. The flow visualization shows an emerging stall region leading to the impaired flow convergence. Shedding of vorticity is introduced as the blade is further twisted. These shapes were created by the optimizer during an early exploratory optimization. They are easier to solve with, for example, unsteady methods, and the design space was therefore limited enough to avoid the stall region.
shape optimizations using free-form techniques will be manageable -something one can never hope to do with the finite difference method.
Finally, it would be very relevant to validate the numerical results against experimental data in future studies (see, e.g., Barlas et al., 2021a).

Learning outcomes
Based on the above-described detailed high-fidelity shape optimization study, these are the overall findings.
-A thorough literature review showed that there is a lack of high-fidelity shape optimization studies within wind energy in which most works simply are parameter studies.
-Robust mesh deformation is an absolute key feature for high-fidelity shape optimization with y + ∼ 1 and O(10 7 ) cells.
-In order to explore a larger design space (e.g., stall regions) there is a need for enhanced convergence methods which would also bring about an increase in robustness.
-Meticulous setup of the finite difference method will allow for deeply converged design optimization problems even without machine-accurate gradients.
-Although the finite difference method is a viable approach high-fidelity shape design, the authors can conclude based on experience with both direct CFD-based optimizations and surrogate-based optimizations that due to ease of use and a much lower computational cost, one should prefer the surrogate-based approach for optimizations of up to about a dozen design variables if one can accept the drop in model fidelity.

Conclusions
In this study a novel curved tip shape was aerodynamically designed for maximum power using a CFD solver on a 10 MW reference wind turbine with the constraint that the initial steady-state loads should not be compromised. The study showed that a 1.12 % increase in power was possible while satisfying the imposed constraints on loads and geometry. The final curved tip results in a 1 % blade length extension, a 2 % flapwise tip displacement, just below a 2 % edgewise tip displacement, and as much tip curvature as possible within the developed parameterization. Using twist and chord design variables to reduce loads and slim the outermost part of the blade, the novel curved tip shape efficiently extends the blade, and a flow analysis visualized how the final design effectively displaces the tip vortex to mitigate induced drag. A tip design as the one presented could be mounted on already installed wind turbines as a sleeve-like solution or be conceived as part of a modular blade with tips designed for site-specific conditions. Importantly, this study was not aeroelastic but aerodynamic only. Only steady-state flow conditions and normal operation were considered, and a detailed unsteady load analysis based on the IEC (International Electrotechnical Commission) standard was thus not carried out, which would be needed to design the structural geometry of the blade tip and ensure that it is indeed loadneutral.
Turning to the numerical aspect of the study it can be concluded that it is indeed possible to tightly converge direct CFD-based design optimization problems using the finite difference method as long as a meticulous step size study is carried out. However, the finite difference method is found to be extremely expensive on industrial-scale cases (above 14 million cell meshes), and a surrogate-based approach should be favored due to ease of use and implementation as long as a drop in model fidelity can be accepted. Furthermore, the study revealed that robust mesh deformation routines are very important for a successful optimization framework.
A comprehensive literature review on blade tips preceding the optimizations revealed many overall favorable design trends could be identified. Furthermore, it was found that up to a 2.5 %-2.6 % increase in power should be possible for winglet-like load-neutral tip shapes. However, as also pointed out in the literature review there is a void of highfidelity shape optimization results, and this study shows at least one way to set up the essential components in a framework for CFD-based optimization and how to tightly converge the design optimization problems through a meticulous fine tuning of the setup.
Appendix A: Visualization in the rotor plane of the optimized tip shape Figure A1. Baseline (gray) and optimized (red) blade shape from different angles.
Data availability. Data are available upon request to the corresponding author.
Author contributions. MHAM carried out the literature review, wrote the FFDlib, ran all optimizations, and wrote the bulk of the paper. FZ developed PGL and assisted in work related to the parameterization and optimization. Also, FZ assisted in postprocessing and analysis of all results and in the comparison to surrogate-based optimizations. SGH provided meshes and helped guide the study -particularly in the early stage (develop parameterization, ensuring cross-platform compatibility, etc.) -and assisted in the analysis of results. Furthermore, SGH ensured the final parameterization is aligned with the approach used in the DTU Wind Energy FSI framework for future work. TKB guided the project, assisted with expert knowledge on blade tip design, and aligned the present study with the surrounding SmartTip study. NNS, as one of the two original EllipSys3D developers, advised on computational setup and parallelization and implemented the enhanced mesh deformation algorithm used in the present study. He also worked on the EllipSys3D coarse grid solver complexification. All authors took part in writing and editing the paper.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.

Disclaimer.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements.
A special mention of the head of section (ARD), Flemming Rasmussen, DTU Wind Energy, is also warranted. He has through several years supported the development of a high-fidelity design framework without which this study would not have been possible.
Finally, the authors thank PhD student Neil Wu (MDOLab, University of Michigan) and OpenMDAO project lead Justin S. Gray (NASA, Glenn) for assistance with pyOptSparse and OpenMDAO, respectively.
Financial support. This research has been supported by the Innovation Fund Denmark (grant no. 7046-00023B).
Review statement. This paper was edited by Horia Hangan and reviewed by two anonymous referees.