Actuator Line Simulations of Wind Turbine Wakes Using the Lattice Boltzmann Method

The high computational demand of large-eddy simulations (LES) remains the biggest obstacle for a wider applicability of the method in the field of wind energy. Recent progresses of GPU-based (Graphics Processing Unit) lattice Boltzmann frameworks provide significant performance gains alleviating such constraints. The presented work investigates the potential of LES of wind turbine wakes using the cumulant lattice Boltzmann method (CLBM). The wind turbine is represented by the actuator line model (ALM). The implementation is validated and discussed by means of a code-to-code comparison to an 5 established finite-volume Navier-Stokes solver. To this end, the ALM is subjected to both laminar and turbulent inflow while a standard Smagorinsky sub-grid scale model is employed in the two numerical approaches. The resulting wake characteristics are discussed in terms of the firstand second-order statistics as well the spectra of the turbulence kinetic energy. The nearwake characteristics in laminar inflow are shown to match closely with differences of less than 3% in the wake deficit. Larger discrepancies are found in the far-wake and relate to differences in the point of the laminar-turbulent transition of the wake. In 10 line with other studies these differences can be attributed to the different orders of accuracy of the two methods. Consistently better agreement is found in turbulent inflow due to the lower impact of the numerical scheme on the wake transition. In summary, the study outlines the feasibility of wind turbine simulations using the CLBM and further validates the presented set-up. Furthermore, it highlights the computational potential of GPU-based LBM implementations for wind energy applications. For the presented cases, near real-time performance was achieved using a single, off-the-shelf GPU on a local workstation. 15 Copyright statement. TEXT

sensitivity of the blade forces of the ALM to the spatial and temporal resolution of the bulk scheme as well as computational performance.
The objective of this paper is to analyse the wake of a single wind turbine simulated with the ALM and the cumulant lattice Boltzmann method (CLBM), a recently developed high-fidelity collision operator that is particularly suited for high Reynolds number flows (Geier et al., 2015(Geier et al., , 2017b. The main portion of the presented study is based on a code-to-code comparison 5 to a standard finite volume (FV) Navier-Stokes (NS) solver. The primary motivation for this is to extend the aforementioned validation study of this ALM implementation (Asmuth et al., 2019) to the near-and far-wake characteristics. The comparison comprises a laminar and turbulent inflow case, respectively. Furthermore, using the same set-up, we briefly evaluate the impact of a stabilising limiter within the collision operator on the wake characteristics.
To the authors' knowledge, this study constitutes the first application of the CLBM to wind turbine wake simulations. More-10 over, application-oriented studies of the utilized parametrised version of this collision operator (as further outlined in Sect. 2) are generally still limited, see Lenz et al. (2019). Therefore, a further motivation of the study is to show the general potential of wind turbine wake simulations using the LBM and specifically the CLBM. The numerical stability of such simulations using the LBM is not self-evident when using typical, rather coarse grid resolutions.
The remainder of the paper is organised as follows: Sect. 2 provides a brief introduction to the LBM. This includes a 15 description of the underlying numerical concept, characteristics of the cumulant collision model, the use of turbulence models in the CLBM and, lastly, details on the implementation of the ALM. Sect. 3 describes the utilised numerical frameworks and case set-up. In Sect. 4 we present the code-to-code comparison in laminar inflow. A discussion of the results in turbulent inflow is given in Sect. 5. The impact of the third-order cumulant limiter is outlined in Sect. 6. Sect. 7 briefly touches upon aspects of computational performance. Lastly, final conclusions and guidelines for future studies are provided in Sect. 8. 20

The Lattice Boltzmann Method
In the following we provide a brief description of the LBM. This comprises a description of the governing equations as well as more specific topics relevant for the presented studies, such as sub-grid scale modelling and the implementation of the ALM.
For a more detailed description of the fundamentals, the interested reader is referred to the work by Krüger et al. (2016).

25
The LBM solves the kinetic Boltzmann equation, i.e. the transport equation of particle distribution functions (PDF) f in physical and velocity space. PDFs describe the probability to encounter a particle (mass) density of velocity ξ at time t at location x. Solving the kinetic Boltzmann equation thus requires a discretisation in both physical and velocity space. Using a finite set of discrete velocities (referred to as velocity lattice, see, Fig. 1) and discretising in space and time one obtains the lattice Boltzmann equation (LBE) in index notation 30 f ijk (t + ∆t, x + ∆t e ijk ) − f ijk (t, x) = Ω ijk (t, x) , x y z ∆x Figure 1. Schematic of three-dimensional velocity lattices. Coordinate-normal planes marked in yellow. Each vector referring to a discrete velocity e ijk as given in Eq. (1). Velocities of the D3Q19 lattice (Qian et al., 1992) with 19 discrete directions given by orange vectors.
Additional velocity directions considered in the D3Q27 lattice given by red vectors.
where e ijk = (ic, jc, kc) (2) is the particle velocity vector and i, j, k ∈ {−1, 0, 1}. The lattice speed c is chosen such that On uniform Cartesian grids PDFs are therefore inherently advected from their source (black dot in Fig. 1) to the neighboring 5 nodes during one time step avoiding any interpolation in the advection. The collision operator Ω ijk on the right-hand side models the redistribution of f through particle collisions within the control volume. Based on kinetic theory the collision process is modelled as a relaxation of particle distribution functions towards an equilibrium. In the classical and most simple collision model, the single-relaxation-time model (SRT), commonly referred to as lattice Bhatnagar-Gross-Kroog (LBGK) model (Bhatnagar et al., 1954), all PDFs are relaxed towards an equilibrium using a single constant relaxation time τ , viz. 10 The equilibrium distribution f eq ijk is given by the second-order Taylor expansion of the Maxwellian equilibrium where c s is the lattice speed of sound and u and ρ the macroscopic velocity and density, respectively. The weights w ijk are specific to the velocity lattice and ensure mass and momentum conservation of the equilibrium. 15 Macroscopic quantities can generally be obtained from the raw velocity moments of the PDFs with α, β and γ denoting the order of the moment in the referring lattice direction and α + β + γ the total order of the moment.
Following from dimensional analysis the macroscopic mass density ρ is given by the zeroth-order moment m 000 . Analogously, the momentum in x, y and z is obtained from the first-order moment in the referring coordinate direction m 100 , m 010 and m 001 , respectively. The macroscopic velocity and density required for the computation of f eq ijk can thus be obtained locally from the PDFs. Furthermore, starting from a moment expansion of the LBE itself we can show via a Chapman-Enskog expansion that it 5 recovers the (weakly-compressible) Navier-Stokes equations on the macroscopic level. For the sake of brevity details upon the latter are omitted here. A comprehensive overview can be found in Krüger et al. (2016). Nevertheless, it should be noted that with ν being the kinematic viscosity and ω the relaxation rate (He and Luo, 1997;Dellar, 2001).
In summary, the simplicity of the LBM leads to a straightforward explicit algorithm. Numerically, it is realised by decom-10 posing and rearranging Eqs. (1) and (4) into two separate parts. The first becomes the collision step where f * ijk is the post-collision distribution function. And, the second, is the streaming (or propagation) step advecting f * ijk to the neighbouring nodes.

The Cumulant Collision Model
Due to poor numerical stability of the original LBGK model, various alternative approaches have been presented. These mostly relate to the class of multiple-relaxation-time models (MRT), see for instance Lallemand and Luo (2000) and d 'Humières et al. (2002). MRT models transform the pre-collision PDFs f ijk (Eq. (8)) into a velocity moment space. Each moment is then relaxed individually towards a referring equilibrium moment m eq αβγ . The individual relaxation rates of the hydrodynamic moments (up 20 to second order) remain physically motivated with the second-order relaxation rate given by Eq. (4). Relaxation rates of higherorder moments though can be tuned freely. Subsequently, the moments are transformed back into particle distribution space and advected following Eq. (9).
Despite significant stability improvements, several fundamental deficiencies of MRT models render the approach unsuitable for high Reynolds number flows as required for studies of wind turbines in the ABL. Referring to the seminal paper by Geier 25 et al. (2015) such are, among others, the lack of a universal formulation for optimal collisions rates, deficiencies stemming from the rather arbitrary choice of moment space, lacking Galilean invariance and the introduction of hyper-viscosities. Deteriorations of the flow field through local instabilities can be the consequence (Gehrke et al., 2017). To remedy the aforementioned deficiencies Geier et al. (2015) suggest a universal formulation based on statistically independent observable quantities (cumulants) of the PDFs, i.e., the CLMB. After performing the two-sided Laplace-transform of the pre-collision PDFs with Ξ = {Ξ, Υ, Z} denoting the particle velocity ξ = {ξ, υ, ζ} in wave number space, cumulants c αβγ can be obtained as Subsequently, cumulants are relaxed towards the referring equilibrium: Here, c * αβγ denotes the post-collision cumulant and ω αβγ the referring relaxation rate. As shown by Geier et al. (2015), the 5 statistical independence of cumulants unconditionally eliminates the MRT's deficiencies such as the dependency of Galilean invariance and occurence of hyper-viscosities on the choice of relaxation rates.
A simple and widely adopted choice in the CLBM is to set all relaxation rates of higher-order cumulants to one, commonly reffered to as ALLONE cumulant. In this case, higher-order cumulants are instantly relaxed towards the referring equilibrium.
This unconditionally damps all higher-order perturbations providing an inherently stable solution and thereby an extremely 10 robust numerical framework. Numerous studies have shown that the ALLONE CLBM can be readily applied to high Reynolds number flows (see, Geier et al., 2015;Far et al., 2016;Gehrke et al., 2017;Kutscher et al., 2019;Onodera and Idomura, 2018). A further development of the original ALLONE is the parametrised CLBM presented in Geier et al. (2017b). Based on the theory of the so-called magic parameter (Ginzburg and Adler, 1994;Ginzburg et al., 2008) the authors derived a parametrisation to optimise the higher-order relaxation rates. The same authors show that the parametrisation increases the convergence of the 15 CLBM in diffusion to fourth order under diffusive scaling (i.e., ∆t ∝ ∆x 2 ). However, unconditional numerical stability is no longer guaranteed and requires the use of a limiter as outlined in Sect. 2.4.
From a theoretical point of view the parametrised CLBM can arguably be seen as one of the most advanced collision models today, both in terms of accuracy and stability. Nevertheless, the complexity of the collision model makes it more computationally demanding in terms compared to SRT and MRT models. Furthermore, the CLBM is only defined on the 20 D3Q27 velocity lattice as opposed to SRT and MRT models that typically employ D3Q19 lattices. Consequently, it also requires more memory. In addition to the aforementioned theoretical considerations we therefore provide a pre-study on the suitability of other collision models for this application in Appendix A.

Nondimensionalisation
For the sake of simplicity as well as numerical efficiency and accuracy implementations of the LBM are commonly nondi- factor C u = √ 3 u 0 /Ma. It follows that the temporal scaling factor is given by C t = C x /C u , which implies a physical time step ∆t = C t ∆t LB that is inherently proportional to the grid spacing and Mach number. The viscosity in lattice units becomes The order of magnitude of ν LB thus directly depends on the choice of grid resolution and Mach number. In this study we employ the LBM for an incompressible problem. As in the majority of applications, this implies that compressibility effects are assumed to have negligible effects on the flow physics of interests. The Mach number is thus merely 5 required to be small, yet, does not necessarily have to comply with the physically correct value of the problem. For incompressible flows it therefore commonly reduces to a somewhat free parameter affecting numerical accuracy in the incompressible limit (Dellar, 2003;Geier et al., 2015Geier et al., , 2017b, computational demand by means of the time step as well as the magnitude of the viscosity on the lattice level.

10
Early on, LES approaches have been used in LBM frameworks (see, e.g., Hou et al., 1996). The most common choice are eddy-viscosity approaches that are simply adopted from NS frameworks and incorporated by adding the eddy-viscosity ν t to the shear viscosity ν in Eq. (7). Examples thereof range from the standard Smagorinsky model (Hou et al., 1996;Krafczyk et al., 2003) to more advanced models like the wall-adapting local eddy-viscosity model (WALE; Weickert et al., 2010), the shearimproved Smagorinsky model (SISM; Jafari and Mohammad, 2011) as well as dynamic Smagorinsky approaches (Premnath 15 et al., 2009b). Others, on the other hand, suggested LB-specific methods based on the approximate deconvolution of the LBE itself (Sagaut, 2010;Malaspinas and Sagaut, 2011;Nathen et al., 2018).

Implementation of Eddy-viscosity Models
Using a standard constant Smagorinsky model, the eddy-viscosity can be determined locally using the well-known formulation where C s is the Smagorinsky constant, ∆ the filter width (here referring to the grid spacing ∆x) andS the second invariant of the filtered strain rate tensor A clear advantage of the LBM over NS approaches in this context is the local availability of the strain rate tensor. Using the 25 second-order moments or cumulants of the local PDFs, respectively, the components ofS ij can be determined without finite differencing. Further details on the determination ofS ij in cumulant space can be found in Geier et al. (2015Geier et al. ( , 2017b. It should be noted, though, that the strain rates in the CLBM and most MRT models are dependent on the total shear viscosity (ν tot = ν + ν t ) and the bulk viscosity. As opposed to the SRT, whereS ij is only dependent on the total shear viscosity, it is therefore not possible to explicitly determine ν t . Hence, the eddy-viscosity ν t (t) can be computed either explicitly, using 30 ν t (t − ∆t) or, iteratively. Yu et al. (2005), however, showed that the error associated with the implicitness of ν t is generally negligible due to the typically small time steps used in the LBM. We shall therefore refrain from implicitly solving for ν t , in line with similar Smagorinsky approaches in MRT frameworks (Yu et al., 2006;Premnath et al., 2009a).

Stabilising Limiter in the Cumulant LBM
A crucial characteristic of the CLBM is the model's inherent numerical stability. As opposed to many other collision models it does not require the stabilising features of explicit turbulence models, even for under-resolved highly turbulent flows. The 5 stabilising characteristic of the original ALLONE cumulant approach appears rather obvious as it unconditionally resets all higher-order cumulants in each time step. The fourth-order accuracy of the parametrised approach, however, relies on the temporal memory of these higher-order cumulants. Therefore, Geier et al. (2017b) suggest the use of a limiter λ m that is only applied to the relaxation of the third-order cumulants. The relaxation rates of these cumulants, subsequently referred to as ω m , are consequently substituted by where |c m | refers to the magnitude of the referring third-order cumulant. Destabilising accumulation of energy in these cumulants is hereby inhibited as ω ζ approaches 1 for ρλ m |c m |. Nonetheless, the order of the error introduced by the limiter lies well below the leading error of the LBM itself. The fourth-order accuracy of the scheme is thus not affected in the asymptotic limit. Like the ALLONE version, the parametrised CLBM therefore does not require the numerically stabilising features of an 15 explicit subgrid-scale model, yet with a higher order of accuracy. In this study we shall therefore focus on the investigation of the parametrised CLBM.

Implementation of the Actuator Line Model in Lattice Boltzmann Frameworks
The lattice Boltzmann actuator line implementation used in this study closely follows the original description in NS frameworks as presented by Sørensen and Shen (2002). The forces acting on the rotor are determined using the local relative velocity u rel 20 of the referring blade elements along the actuator line. The relative velocity is computed from the sampled velocity in bladenormal (stream-wise) and tangential direction u n and u θ , respectively using where Ω is the rotational velocity of the turbine and r the radial position of the blade element. The local blade force per unit length then reads with e L,D being the unit vector in the direction of the lift and drag force, respectively and c a being the chord length of the referring airfoil section. The lift and drag coefficients C L and C D are provided from tabulated airfoil data as functions of the local angle of attack and Reynolds number. The resulting blade forces are subsequently applied across a volume in the flow field by taking the convolution integral of F with a Gaussian regularisation kernel η , given by where adjusts the width of the regularisation and d is the distance from the centre of the blade element to the point in space where the force is applied. The resulting force is applied at each grid node by simply adding the referring component of ∆t/2 F to the pre-collision first-order cumulants. For the sake of completeness it should be noted that body force formulations 5 generally depend on the collision model. See, for instance, Buick and Greated (2000) and Guo et al. (2008) for a description in SRT and MRT frameworks, respectively.
Differences between ALM implementations in NS and LBM frameworks are obviously small. The latter can be expected given that the link between the model itself and the flow solver is simply made by exchanging information of velocity and body forces. Lastly, it is worth mentioning that the locality of all subroutines of the ALM allows for a perfect parallelisation. 10 The model is therefore efficiently parallelised on the GPU, in line with the general architecture of the utlised LBM solver (see Sect. 3.1) using Nvidia's CUDA toolkit.

Numerical Set-up
In light of the code-to-code comparison the simulations in both frameworks were set-up in the most similar manner possible.
This refers to the grid, the boundary conditions as well as the implementation of the ALM. Nevertheless, certain differences 15 remain unavoidable due to the inherently different numerical approaches. Further details thereupon, as well as the set-up in general, will be given in the following.

The Lattice Boltzmann Solver ELBE
The LBM simulations are performed using the GPU-accelerated Efficient Lattice Boltzmann Environment ELBE 1 (Janßen et al., 2015) mainly developed at Hamburg University of Technology (TUHH). The toolkit comprises various collision models, allows 20 for free-surface modelling  as well as efficient geometry mapping (Mierke et al., 2018). The implementation of the CLBM in ELBE was recently validated by Gehrke et al. (2017Gehrke et al. ( , 2020 and Banari et al. (2020).
Symmetry boundary conditions (zero gradient with no penetration) are applied at the lateral boundaries of the domain, referring to a simple-bounce back with reversed tangential components (Krüger et al., 2016). The velocity at the inlet is prescribed using a Bouzidi-type boundary condition (Bouzidi et al., 2001;Lallemand and Luo, 2003), i.e., a simple bounce- 25 back scheme adjusted for the momentum difference due to the inlet velocity. For the outlet we chose a linear extrapolation anti-reflecting boundary condition as described in Geier et al. (2015).

ELLIPSYS3D
As a NS reference we consult the multi-purpose flow solver ELLIPSYS3D developed at the Technical University of Denmark (DTU) by Michelsen (1994a, b) and Sørensen (1995). The code has been applied to numerous wind power related flow problems and served for several fundamental investigations of the ALM (Sørensen and Shen, 2002;Troldborg, 2008;Troldborg et al., 2010;Sarlak et al., 2015a). 5 The governing equations are formulated in a collocated finite-volume approach. Diffusive and convective terms are discretised using second-order central differences and a blend of third-order QUICK (10%) and fourth-order central differences (90%), respectively. The blended scheme for the convective term was shown to provide sufficient numerical stability while keeping numerical diffusion to a minimum (Troldborg et al., 2010;Bechmann et al., 2011). The pressure correction is solved using the SIMPLE algorithm. Pressure decoupling is avoided using the Rhie-Chow interpolation. 10 Symmetry conditions are applied at the lateral boundaries, equivalently to the LB set-up. The outlet boundary condition prescribes a zero velocity gradient.

Case Set-up
For the evaluation of the ALM we choose one of the most prominent test cases in this context, i.e. the simulation of the NREL 5MW reference turbine (Jonkman et al., 2009). The mean inflow velocity in all presented cases is u 0 = 8m s −1 while the turbine

Code-to-code Comparison in Uniform Inflow
As a starting point we compare the results obtained with the CLBM to the NS reference in uniform laminar inflow. The simplic-5 ity of the case eliminates various uncertainties associated with more complex, yet, possibly more realistic inflow conditions. Also, it becomes more straightforward to analyse the effect of the numerical scheme on the downstream evolution of the wake and particularly the onset of turbulence as recently discussed by Abkar (2018).  Troldborg et al. (2010), the CFL number is thus typically lower than required by the LES to obtain timestep independence. In LBM simulations the timestep is dictated by the Mach number as outlined in Sect. 2.3. A preceding study has shown that the forces determined by the ALM can be significantly more sensitive to the Mach 20 number than the bulk flow depending on the smearing width (Asmuth et al., 2019). Under consideration of this issue we chose Ma = 0.1 referring to CFL = 0.058 for the CLBM cases. This is obviously well below the value required by the ALM, yet inevitable due to the numerical method.
As for the ALM, the blades in all cases are discretised by 64 points. The smearing width is set to 0.078125 D referring to /∆x = {1.25, 1.875, 2.5} for the three different resolutions, respectively. obvious that the dependency of the blade forces on the grid resolution is small in both numerical approaches. The same holds for the differences between the CLBM and the NS solution, even though these are found to be slightly larger than in the former were only found for /∆ ≤ 1. The choice of therefore states a compromise. On the one hand, it ensures numerical stability for the cases with the lowest resolution. On the other hand, is kept reasonably low with respect to the cases with the highest spatial resolution. Note, that unnecessarily large smearing widths would imply larger deviations from the underlying lifting line theory and are therefore undesirable (Martínez-Tossas and Meneveau, 2019). In summary, and in line with other similar codeto-code comparisons (Sarlak, 2014;Sarlak et al., 2015b;Martínez-Tossas et al., 2018), it can be concluded that the agreement 5 in the blade forces is sufficient to facilitate a wake comparison with focus on the behaviour of the bulk scheme. Alternatively, it could be considered to prescribe the body forces along the actuator lines for the sake of a pure wake comparison. Yet, as this study aims for a comparison of the ALM as a whole, including the interaction of the aerodynamic model with the flow solver, this approach is not pursued here.

Wake Characteristics
10 Firstly, we compare the time-averaged cross-stream velocity profiles, given in Fig. 5. Furthermore, Fig. 6 provides a direct comparison of the depicted velocity components of the two numerical approaches at each referring grid resolution by means of the L 2 -relative error along the profile, i.e.
where φ = {ū,v} and n z is the amount of sample points along the profile. 15 It can be seen that the two numerical approaches are in good agreement in the near-wake of the turbine. Up until x = 3 D, differences inū amount to less than 1% while increasing to ∼ 3% at x = 9 D. The differences in the tangential velocity componentv are found somewhat higher with ∼ 5% for x ≤ 3 D increasing to ∼ 10% at x = 9 D. The latter can be related to the fact we also find higher differences in the tangential than in the normal force component as shown in Fig. 4 .
In the near-wake region discussed here, viscous effects usually only play a minor role. This shows, for instance, in a small 20 wake recovery with downstream distance. Also, the rotational velocity does not change significantly. The wake is thus mostly governed by the inviscid flow solution (Troldborg, 2008;Troldborg et al., 2010). Both the NS and the CLBM approach recover the Euler equations at the same order of accuracy. A similar numerical behaviour in this part of the wake should therefore be expected (assuming comparability of all other aspects like boundary conditions and the implementation of the ALM). In light of the motivation for this comparison these results can thus be appreciated. 25 Further downstream (x > 9 D) differences between all compared cases increase significantly. Generally, the vortex-sheet of the near-wake starts to meander and eventually breaks down as the wake transitions to a fully turbulent state. An impression thereof is provided in Fig. 7, showing the downstream evolution of the wake in terms of the contour plots of the instantaneous stream-wise velocity. After the onset of turbulence the wake starts to recover more rapidly while the turbulence slowly decays.
Differences in the velocity in the far-wake both between the two numerical approaches as well as the referring grid resolutions 30 can therefore be related to different downstream positions of the points of transition.
More quantitatively, the break-down of the wake can be observed by means of a drastic increase in the turbulence intensity Ti as depicted in Fig. 8. It shows that the turbulence intensity in all CLBM cases lies at a similar magnitude in the near-wake. At the same time it is notably higher than in the NS cases at the same downstream position. Downstream of x = 6 D it can be seen that Ti increases faster with downstream distance the higher the spatial resolution. Also, it increases earlier in the CLBM than in the NS solutions. In addition to Fig. 8 this process is illustrated in Fig. 9 by means of the stream-wise evolution of Ti at a radial position of r/D = 0.625. It clearly shows the faster increase of Ti at higher spatial resolutions as well as a downstream shift of the build-up in the NS cases.

5
The mechanism of the transition of wind turbine wakes has been extensively described based on ALM simulations, see, e.g., Sarmast et al. (2014). Fundamental studies thereof do, however, mostly use higher spatial resolutions in order to resolve (pseudo-spectral approaches in that study) generally dampen those perturbations less than more diffusive lower-order schemes (referring to second-order collocated finite-volume discretisations, equivalently to the NS reference used here) and thus show a faster growth of turbulence. The same interpretation can indeed be applied to the results shown here. As described in Sect. 2, the parametrisation of the relaxation rates results in a scheme with fourth-order accuracy in diffusion as opposed to the second- order accuracy of the NS finite-volume scheme. At this point we shall briefly comment on the second-order accurate ALLONE CLBM mentioned earlier (Sect. 2). In fact, using this version of the CLBM shifts the transition further downstream when compared to the parametrised CLBM, see Fig. 9. This generally corroborates the aforementioned discussion on the effect of the numerical diffusivity. As for this case, the scheme even appears to be more diffusive than the NS solution. be aware, however, that the diffusivity of the ALLONE CLBM also strongly depends on the Mach number (as opposed the parametrised 5 approach). Nevertheless, a further analysis of the ALLONE CLBM is not the focus of this study and is omitted here for the sake of brevity.
As a last aspect we analyse the one-point turbulence kinetic energy spectra. The spectra shown in Fig. 10   At x = 12 D a pre-transition wake meandering can be seen. The occurrence of this feature is not as confined to a single frequency as the aforementioned blade-passing frequency. Yet, an increased energy level in a frequency band around f m ≈ 0.025Hz (and its higher harmonics) can be observed in all cases. It was illustrated in Fig. 7 that the meandering starts to occur 10 at different positions downstream depending on the resolution and numerical approach. It then steadily increases until the wake becomes fully turbulent. The amplitudes at f m therefore differs depending on how far upstream the meandering started to build up. Also, it again shows that the meandering and subsequent transition occurs earlier in the CLBM cases. Additionally, the signature of the blade passage is still visible in the lower-resolution CLBM cases. This is not the case for the NS reference, despite the smaller meandering at this downstream position. In line with the observations made earlier, this aspect might relate to a higher numerical dissipation of the NS scheme.
Further downstream at x = 24 D the wake is fully turbulent in all CLBM cases, characterised by a sub-intertial range with a typical -5/3-slope. This is also the case for the NS solution with ∆x = D/32. Here, however, the meandering is still more visible due to the later start of the transition of the wake. Also, when comparing both approaches at the highest spatial resolution 5 (bottom right in Fig. 7) it shows that the sub-inertial range of the CLBM approach reaches to higher frequencies. In accordance with that, it appears that the CLBM does indeed resolve smaller turbulent structures, as shown in the contour plot of the Q-criterion (Fig. 11).

Code-to-code Comparison in Turbulent Inflow
Laminar inflow cases allow for a good comparison of fundamental numerical aspects as discussed in Sect. 4. Nevertheless, the 10 case itself remains rather academic as atmospheric inflows are mostly turbulent. Furthermore, Sect. 4 has shown that a direct comparison of the far-wake can be difficult due to the different downstream positions of the laminar-to-turbulent transition of the wake. A turbulent inflow generally accelerates the transition while reducing the dependency of the point of transition on the numerical diffusivity of the scheme. A complementing comparison in turbulent inflow will therefore be presented in the following. For the sake of brevity we limit the discussion to cases with the highest spatial resolution ∆x = D/32. Apart 15 from the introduction of turbulence at the inlet both numerical set-ups remain unchanged. Also note, that the mean resulting blade loads exhibit no notable difference towards the laminar inflow case. Additional discussion beyond Sect. 4.1 is therefore omitted.

Synthetic Turbulence Generation at the Inlet
At the inlet we prescribe homogeneous isotropic turbulence (HIT) based on the von Kármán energy spectrum. The threedimensional field of velocity fluctuations is generated based on the method developed by Mann (1998) using the open-source code TUGEN by Gilling (2009). As we are only interested in HIT the model's shear parameter Γ is set to zero. The length scale of the spectral velocity tensor is chosen as L = 40m = 0.317 D. The mean turbulence intensity is scaled via the coefficient 5 α 2/3 = 0.01. The resulting Ti of the turbulence field measures Ti = 0.028. The length of the turbulence field in the stream-wise direction measures 24576m. Following Taylor's frozen turbulence hypothesis the field is advected with u 0 . The turbulence field is consequently recycled after 6.72 domain flow-through times. The lateral dimensions of the field are set to 1536m (referring to 12.19 D). Since we only use a cross-section of 6 D × 6 D we ensure zero correlation of the velocity fluctuations between the lateral boundaries of the domain. The spatial resolution of the field is 8192 grid points in the stream-wise direction and 64 grid 10 points in the lateral directions. In both numerical approaches the velocity fluctuation is superimposed with the mean inflow velocity u 0 and applied at the inlet. turbulence intensity of 2.3% in both approaches which is slightly lower than the one of the synthetic turbulence field. Such discrepancies have been discussed earlier and are commonly counteracted by scaling a given turbulence field if a desired 15 turbulence intensity is to be matched (see, e.g. Olivares-Espinosa et al., 2018;van der Laan et al., 2019). Some possible explanations of this issue are given by Gilling and Sørensen (2011). Among others, they argue that the discrete representation of the otherwise continuous turbulence field can lead to noticeable discontinuities when being differentiated with low-order schemes. Directly after the inlet the NS solution shows a small increase in Ti followed by a continuous decay throughout the entire domain. The turbulence intensity in the CLBM solution initially drops behind the inlet. However, the subsequent 20 decay up until x = 12 D is lower than in the NS solution. The decay rates of the two approaches seem to align only at the far end of the domain. As a result, the turbulence intensity at the turbine position differs by ∆Ti = 0.0005 while the maximum difference further downstream amounts to ∆Ti = 0.0027. A detailed analysis of the rather fundamental aspects related to these discrepancies goes beyond the scope of this paper. After all, the observed differences remain small enough not to be significant when compared to the turbulence related to the wake flow, as shown later. Fig. 13 depicts the spectra of the turbulent kinetic energy at the turbine position. Chiefly, it shows that the CLBM exhibits a sub-inertial range extending to higher frequencies than the NS solution, similarly to the far-wake turbulence found in laminar inflow (see Fig. 10).

Wake Characteristics
Analogously to Sect. 4, we firstly compare the cross-stream profiles of the mean velocity in Fig. 14. In the stream-wise velocity componentū we find an excellent agreement of the two solutions. When compared to the laminar cases discussed before this not only applies to the near-wake but the entire domain. The difference inū between the cross-stream profiles of the two approaches for x ≤ 12 D amounts to less than 1% in terms of the L 2 -relative error norm as shown in Fig. 15. While steadily 10 increasing with downstream distance, the maximum discrepancy measures 1.6% at x = 24 D. Be aware, that the laminar inflow cases only exhibited similar agreements in the near-wake.
Profiles of the turbulence intensity are shown in Fig. 16. Similarly to the velocity, differences between the CLBM and NS solutions are small. Most importantly, it can be observed that the transition of the wake is triggered at very similar downstream positions. This also explains the significantly better agreement in the velocity. After all, most differences observed in laminar 15 inflow are related to the different downstream positions of the laminar-to-turbulent transition.
In the case discussed here the transition is dominated by instabilities introduced by the ambient turbulence. As opposed to the transition in laminar inflow, the impact of the dissipative characteristics of the numerical scheme here appears to be subordinate, if not negligible. Without imposed turbulence, perturbations triggering the transition grow within the wake itself starting from infinitesimal magnitudes as outlined in Sect. 4. Hence, the transition mainly depends on the growth of such 20 perturbations and eventually the point where they reach a critical magnitude. Consequently, the transition is increasingly delayed the higher this growth is dampened by the numerical dissipation. In contrast, the imposed turbulence states a finite-size perturbation that affects the wake immediately from the rotor plane downstream independent of the numerical scheme and its The spectra of the turbulent kinetic energy at three different downstream positions are provided in Fig. 17. As in the laminar case, a distinct peak at the blade-passing frequency f B and its higher harmonics can be observed in both approaches in the near-wake (x = 1 D). From the velocity profiles in Fig. 14 it can be inferred that the transition of the wake occurs between x = 3 D and x = 6 D, characterised by the change from a typical near-wake to a far-wake Gaussian profile. In the spectra this is reflected by an overall increase in the energy level across all resolved frequencies. Also, the signature of f B is no longer visible. Moving further downstream (x = 18 D) the overall turbulent kinetic energy decreases due to the continuous decay of both ambient and far-wake turbulence. When compared to the previous position, the energy content at smaller scales increases slightly relative to the larger scales. The latter relates to the continuous break-down of the turbulent structures of the wake along 5 the energy cascade. The relative energy increase at higher frequencies appears to be more pronounced in the CLBM solution.
Again, this might relate to the higher dissipation found in the NS solver inducing an earlier cut-off in the sub-inertial range as discussed earlier.
Lastly, we shall comment on the small differences in the ambient turbulence shown earlier. Based on the above elaborations one might expect a more notable impact on the wake characteristics. With regards to this we refer to the study by Sørensen et al. with the given inflows, lying well within the range of the differences observed here.
6 Impact of the Third-order Cumulant Limiter

15
A further aspect of the CLBM to be discussed is the impact of the limiter of the third-order cumulants described in Eq. (15).
The main motivation behind the limiter is to provide a damping of high-wave number perturbations in the CLBM in order to ensure numerical stability. Geier et al. (2017b) showed theoretically and by means of a decaying shear-wave and Taylor-Green vortex that the use of the limiter does not affect the asymptotic order of accuracy of the scheme. Investigations of the effects of the limiter in more applied high-Reynolds-number cases are, however, not available to date. Geier et al. (2017a) and Lenz et al. 20 (2019) presented applications of the parametrised CLBM, yet both did not touch upon the topic discussed here. Then again Pasquali et al. (2017) state that they chose suitable values for λ m manually, close to the stability limit and case-dependent.
Both the effect of λ m on turbulent flows, as well as criteria to choose adequate values thus remain open questions. At the same time, some authors refer to the parametrised CLBM and also the ALLONE as implicit LES (cf. Far et al., 2017;Lenz et al., 2019;Nishimura et al., 2019). However, the latter is solely supported by the fact that the CLBM remains numerically stable in 25 under-resolved turbulent flows without explicit turbulence models (as opposed to many other LBM collision operators). To the authors' knowledge, a full understanding of the dissipation behaviour associated with the limiter (or the ALLONE), especially in under-resolved flows, is still lacking. This again, though, would be clearly required to fully replace an explicit SGS-model.
In the code-to-code comparison the limiter was practically switched off for the sake of comparison. Hence, numerical stability was also solely provided by the Smagorinsky model. Motivated by the lack of experience with the use of the limiter Contour plots of the mean stream-wise velocity and turbulence intensity are shown in Fig. 18. While the mean velocity in the region close to the turbine is almost unaffected by the choice of λ m , the evolution of the turbulence intensity and ultimately the point of transition change drastically. With λ m = 10 0 , Ti grows significantly, closely behind the turbine. At only 3 D downstream the wake is highly turbulent. With λ m = 10 −1 the wake characteristics only change marginally. Increasing λ m from 10 −1 to 10 −2 , however, delays the transition considerably. This implicitly shows that the order of magnitude of the third-5 order cumulants in crucial regions of the wake lies within this range, which can be deduced from Eq. (15). When choosing λ m = 10 −2 the limiter dampens the third-order cumulants considerably when compared to the optimised relaxation rates.
Moreover, the far-wake distribution of Ti more closely resembles taht of the Smagorinsky case than with lower λ m . Turbulent perturbations of the wake do, however, grow over a longer fetch than in the Smagorinsky case, starting in the near-wake.
Moreover, it should be noted that increasing λ m also increases the amplitude of small scale fluctuations in the ambient flow 10 field. Among others, these are likely to be related to acoustic reflections of small scale turbulence on the domain boundaries and/or spurious numerical oscillations. Partially, these can be seen in the Ti contour plots (Fig. 18) upstream of the turbine for the two higher λ m values. More specifically, 1 D upstream of the turbine, we find Ti = O(10 −4 ) for λ m = 10 0 . In comparison, the CLBM case with Smagorinsky model as well as the NS reference discussed earlier exhibit a magnitude that is two and three orders of magnitude lower, respectively. Referring to the discussions of tip-vortex stability by Ivanell et al. (2010) or Sørensen one presented here did not comment on this topic. Deskos et al. (2019), on the other hand, found that the mutual inductance of tip-vortices can be severely disturbed if the diffusivity of the scheme is too low.
The presented case study underlines that the impact of the limiter is sufficiently large to arbitrarily tune the scheme's dissipativity over a wide range. Hence, the choice of the limiter in underresolved flows is by no means irrelevant despite the negligible influence on the asymptotic order of accuracy. On the other hand, the limiter conceivably states a measure to achieve 5 implicit LES characteristics with the CLBM. As mentioned earlier, though, this clearly requires a more systematic understanding and subsequent tuning. Without the latter, the use of classical well-documented SGS-models might remain more practical.
Ultimately, they also provide numerical stability while choices for model parameters can build on well-documented experience.

Computational Performance
We initially outlined that the main motivation for the use of the LBM in this context is the method's superior computational 10 performance. Nevertheless, a detailed discussion is not the focus of this paper. For further details on this topic we refer to our previous study (Asmuth et al., 2019) as well as numerous other publications, see, for instance, Schönherr et al. (2011), Obrecht et al. (2013), Januszewski and Kostur (2014), Hong et al. (2016) or Onodera et al. (2018). In brief, we shall remark that all simulations with the CLBM ran with an average of 1050 MNUPS (Million Node Updates Per Second). A similar single-GPU performance on uniform grids was recently reported by Lenz et al. (2019). For the cases discussed in this study 15 this refers to a wall time of 524s per domain flow-through time on a single Nvidia RTX 2080 Ti on a local workstation.
Putting this into perspective, the wall time per flow-through time of the NS case amounts to 5028s. The latter ran on 1044 CPU cores (Intel Xeon Gold 6130) and thus amounts to 1463 CPUh. A last interesting aspect to remark upon is the ratio of simulated real time to computation time r r2c = ∆t real /∆t comp . The topic was recently addressed in the context of urban flows (Onodera and Idomura, 2018;Lenz et al., 2019) as well as for atmospheric boundary layer flows and wind energy applications 20 (Bauweraerts and Meyers, 2019). A ratio of r r2c > 1 would enable the use of LES for real-time forecasts of, e.g., urban microclimates or wind farm performance and loads. For this specific LBM case we obtain r r2c = 0.902. For the NS approach we get r r2c = 0.094. Despite this obviously only being a case study, real-time LES of wind farms with affordable hardware appears to be possible.

25
The cumulant lattice Boltzmann method was applied to simulate the wake of a single wind turbine in both laminar and turbulent inflow. The turbine was represented by the actuator line model. The presented model was compared against a well-established finite volume Navier-Stokes solver. It was shown that the cumulant lattice Boltzmann implementation of the actuator line model yields comparable first-and second-order statistics of the wake. More specifically, a very good agreement of the two numerical approaches was found in the near-wake in laminar inflow, with differences amounting to less than 3% in terms of the wake could be related to the different numerical diffusivities of the schemes building onto previous similar code-to-code comparisons (Sarlak, 2014;Sarlak et al., 2016;Martínez-Tossas et al., 2018). On the other hand, the comparison in turbulent inflow showed an excellent agreement of the two solutions in both near-and far-wake. Here, differences in the numerical schemes were found to be subordinate as the wake characteristics were dominated by the imposed turbulence. The latter manifested in differences in the wake deficit of less than 1% in large parts of the domain. 5 An additional case-study investigated the impact of the third-order cumulant limiter in laminar inflow. It was shown that the choice of the limiter largely affects the dissipativity of scheme. Likewise, the tuneability of this dampening characteristic clearly shows the potential to be used in a more systematic way and might be exploited as an implicit LES feature. Yet, this requires further fundamental investigations in order to understand and calibrate it or even develop procedures to determine optimal values dynamically. As of now, the use of explicit eddy-viscosity SGS-models thus appears more practical despite a 10 small computational overhead.
As for future applications of the lattice Boltzmann method to more realistic wind-power-related flow cases, the following conclusions can be drawn. First and foremost, the presented study underlines the suitability of the cumulant lattice Boltzmann method for the simulation of highly turbulent engineering flows. The crucial advantage over other collision operators is the superior numerical stability of the method. No other collision operator initially tested in this study was found to be sufficiently 15 robust using the given grid resolutions. The tested single-and multiple-relaxation-time models therefore do not appear suitable for LES of entire wind farms where higher spatial resolutions are not feasible and viscosities on the lattice scale consequently small. The advantages of the parametrised cumulant clearly render it as a preferable collision model for wind turbine simulations and presumably other atmospheric flows. Application-oriented studies of the model are so far limited to this work and the recent study by Lenz et al. (2019). Further investigations of the model are therefore clearly required. This applies espe-20 cially to wall-bounded turbulent flows like atmospheric boundary layers that require the use of wall-models. When compared to Navier-Stokes-based LES, the experience with wall-models in the LBM in general is limited to only a handful of studies to date (Malaspinas and Sagaut, 2014;Pasquali et al., 2017;Wilhelm et al., 2018;Nishimura et al., 2019). More specifically, simulations of wall-modelled atmospheric boundary layers employing Monin-Obukhov-type near-wall treatments have not been reported at all to the authors' knowledge. The latter ultimately remains a crucial step towards the simulation of wind 25 farms using the LBM. Nevertheless, in summary, the presented work underlines the great potential of wind turbine simulations using the LBM. Without suffering losses in accuracy, the computational cost can be significantly reduced when compared to standard NS-based approaches. Considering the reported run-times, even an overcoming of the LES-crisis, i.e. the inability to obtain overnight LES solutions for industrial applications (cf. Löhner, 2019), appears possible in the context of wind farm simulations. Appendix A: Pre-study on the Stability of Collision Operators Generally, the choice of collision operator and lattice should consider stability, accuracy, memory demand and performance.
Based on the seminal works by Geier et al. (2015Geier et al. ( , 2017b the CLBM can undoubtedly be considered superior in terms of the former two. Utilising a D3Q27 lattice though eventually implies an increased memory demand of about 40%. Also, the higher complexity of the CLBM eventually renders the model computionally more expensive.

5
As for this specific set-up, satisfactory stability could only be achieved using the CLBM despite the use of the Smagorinsky model (for the referring formulations in moment space applied to the SRT and MRT models, see Yu et al. (2005Yu et al. ( , 2006). The SRT generally became unstable after only a few time steps. The utilised MRT model (see, Tölke et al., 2006), on the other hand remained mostly numerically stable. Yet, unphysical oscillations in the turbulent regions of the flow led to significant degenerations throughout the entire domain. 10 In addition to stability issues, the isotropy of the D3Q19 lattice was shown to be insufficient. Fig. A1 shows three exemplary cross-stream velocity contours at different downstream positions. At x = 3 D, small deviations from the expected axisymmetric profile can be observed for the MRT. Further downstream a more cross-like structure develops that deviates severely from an expanding circular wake. A similar behaviour on D3Q19 lattices has been described earlier by Geller et al. (2013) and Kang and Hassan (2013) when simulating circular jet and pipe flows, respectively. Both argue that the missing velocity vectors of the 15 D3Q19 lattice cause violations of the rotional invariance of axisymmetric flows. Furthermore, White and Chong (2011) remark that this behaviour might only be obvious when simulating simple axisymmetric flows, possibly with analytical reference solutions. Nevertheless, deteriorations of non-axisymmeric real-world problems should also be anticipated, yet, might be harder to examine. This observation should thus also be taken into account when simulating wind turbines in more realistic, sheared, turbulent inflows. 20 Usually, stability issues as described above can be remedied by using smaller grid spacings. As we consider the latter unfeasible for the described applications, we refrain from further investigations thereof at this point. Moreover, White and Chong (2011) also show that the lacking order of isotropy of the D3Q19 lattice can only partially be reduced under grid refinement. The use of the D3Q27 lattice and the CLBM thus appears as the most suitable choice for the investigation of wind turbine wakes. Lastly, it should be pointed out that performance differences between the investigated collision operators were drafted the original paper. HOE and SI contributed to the conceptualisation of the study, discussion of the results and revision of the manuscript.
Competing interests. The authors declare no coflict of interest.
Acknowledgements. The authors would like to thank Martin Gehrke (TUHH) for the productive colaboration on the implementation and 5 testing of the parametrised cumulant LBM. Also, the many fruitful discussions of the case set-up and results with Niels N. Sørensen (DTU) are highly appreciated.
ELLIPSYS3D simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at NSC.