This paper describes a method to identify the heterogeneous flow characteristics that develop within a wind farm in its interaction with the atmospheric boundary layer. The whole farm is used as a distributed sensor, which gauges through its wind turbines the flow field developing within its boundaries. The proposed method is based on augmenting an engineering wake model with an unknown correction field, which results in a hybrid (grey-box) model. Operational SCADA (supervisory control and data acquisition) data are then used to simultaneously learn the parameters that describe the correction field and to tune those of the engineering wake model. The resulting monolithic maximum likelihood estimation is in general ill-conditioned because of the collinearity and low observability of the redundant parameters. This problem is solved by a singular value decomposition, which discards parameter combinations that are not identifiable given the informational content of the dataset and solves only for the identifiable ones.

The farm-as-a-sensor approach is demonstrated on two wind plants with very different characteristics: a relatively small onshore farm at a site with moderate terrain complexity and a large offshore one in close proximity to the coastline. In both cases, the data-driven correction and tuning of the grey-box model result in much improved prediction capabilities. The identified flow fields reveal the presence of significant terrain-induced effects in the onshore case and of large direction- and ambient-condition-dependent intra-plant effects in the offshore one. Analysis of the coordinate transformation and mode shapes generated by the singular value decomposition helps explain relevant characteristics of the solution, as well as couplings among modeling parameters. Computational fluid dynamics (CFD) simulations are used to confirm the plausibility of the identified flow fields.

Understanding and modeling wind farm flows is one of the key grand challenges
facing wind energy science

Within this very wide field, the present work tries to explore the idea of using the whole wind plant as a distributed sensor that, interacting with the atmospheric boundary layer, responds to it and, consequently, effectively measures the flow developing within its own boundaries. Exploiting this idea, can data from a wind plant be used to detect significant features in the flow, in support of an improved understanding of key driving phenomena? Can the same data be leveraged to derive more accurate flow models? Finally, how can the knowledge already encapsulated in existing models be combined with the information contained in the data?

These questions are explored here in relation to engineering wake models.

Within the plethora of wind farm flow models that have been developed,
engineering wake models have carved an extremely successful niche for
themselves at the lower end of the fidelity spectrum. In fact, they now
support a wide range of use cases, from wind plant design to wind farm flow
control

However, like all models, engineering wake models are not an exact copy of
reality and are unable to precisely match field measurements. For example,
the comprehensive survey of

A first reason for the mismatch between predictions and reality is the
unsuitable calibration of the model parameters. For example, wake recovery is
affected by atmospheric conditions and terrain roughness

Additionally, engineering wake models approximate (but do not exactly resolve) only some (but not all) physical processes that take place in and around a wind plant.

For example, the influence of terrain orography is difficult to capture for
onshore wind farms, and high-fidelity models may be necessary to adequately
resolve all flow effects (for example, see

The interaction of a wind farm with the atmospheric boundary layer (ABL) is another extremely complex process, which has not yet always been properly accounted for in engineering models. In general, several flow regions can be distinguished around and within a wind farm

Yet another poorly understood and modeled effect is the way wakes interact,
mix, and merge together. Current models range from simple superposition laws,
e.g., the sum of squares freestream superposition (SOSFS) method

There are three main approaches to deal with the deficiencies of current engineering wake models.

The first is to eliminate the resolved part of the model altogether and use
a black box to learn the complete system behavior from data. Indeed,
data-driven machine learning methods are a growing trend in many fields,
including fluid mechanics

The second approach is to improve a (white) model by tuning its parameters.
For example,

The third possible approach is to directly acknowledge the hybrid nature of
the problem. This means augmenting the resolved model with parametric
corrections that represent the unmodeled physics, resulting in the so-called
grey-box approach. Data are used to tune the parameters of the resolved model
and to learn the ones of the corrections. These two processes of tuning and
learning are clearly intimately linked, and should be conducted
simultaneously. In the framework of wind farm flows, the approach of
simultaneous tuning and learning (STL) was first proposed by

In the present paper, the STL approach is extended by augmenting a wake model
with a heterogeneous background flow, which can be considered a
correction to the normally assumed uniform ambient flow. In this way, the whole
wind plant becomes a distributed sensor that “feels” the flow that develops
within its boundaries; this has suggested the name of “farm as a sensor” to
this approach. The similar concept of “the turbine as a sensor” has
been developed by the senior author and his collaborators, where a wind
turbine is turned into a sensor that “feels” the inflow at its rotor disk;
interested readers can refer to

The paper is organized as follows. Section

Within a wind plant, the scalar wind speed field

Engineering models such as FLORIS

This work considers the steady-state behavior of wind plants for given
ambient and operating conditions. Consequently, the wind field model includes
only the component

The wind speed field can be causally decomposed as

The second term

The third and last term

The term

In summary, the causal decomposition of the flow speed expressed by
Eq. (

A similar causal decomposition is assumed for wind direction

The functional dependency of the heterogeneous correction term

It is the primary goal of this paper to present a method for computing a
best estimate of the flow fields expressed by
Eq. (

The spatial heterogeneity of field

The parameterization of the

Alternatively, the heterogeneous field

The

In general, FLORIS and similar models are characterized by the following
functional dependency:

Notice that, in addition to the “native” parameters of the FLORIS model,
additional extra parameters can be used to augment the model with ad hoc
correction terms.

Stacking the parameters for the heterogeneous flow correction and the
parameters for wake model tuning, the final vector of the to-be-identified
parameters is

Following a classical approach

This dilemma is overcome by performing the identification through a singular value decomposition (SVD). The SVD-supported identification approach is general and can be applied to various problems: for example,

A steady wind farm flow model can be written as the following nonlinear functional
expression:

The observability of the parameters can be gauged by the inverse of the
Fisher information matrix

An important result of MLE theory is that the

This important result is used to set an observability threshold

Choosing a lower threshold implies that fewer parameters are deemed trustworthy and are retained in the solution; this might reduce the quality of the solution if meaningful terms are discarded, but it will also reduce the computational cost and will typically ease convergence. On the other hand, picking a higher threshold has the opposite effect.
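The role of the threshold can be illustrated with a minimal sketch. It assumes a sensitivity (Jacobian) matrix of the model outputs with respect to the parameters, a known diagonal measurement error covariance, and the standard result that the orthogonal parameter combinations obtained from the SVD of the noise-scaled Jacobian have variance lower bounds equal to the inverse squared singular values. The function name `identifiable_subspace`, its arguments, and the toy numbers are hypothetical and are not taken from the implementation used in this work.

```python
import numpy as np

def identifiable_subspace(jacobian, noise_std, var_threshold):
    """Split the parameter space into retained and discarded directions.

    jacobian      : (n_outputs, n_params) sensitivities of outputs to parameters
    noise_std     : (n_outputs,) standard deviations of the measurement errors
    var_threshold : largest variance accepted for an orthogonal parameter
    """
    # Scale each output row by its noise level (diagonal covariance assumed).
    scaled = jacobian / noise_std[:, None]
    # Rows of vt are orthogonal combinations of the physical parameters.
    _, sing_vals, vt = np.linalg.svd(scaled, full_matrices=False)
    # Lower bound on the variance of each orthogonal parameter: 1 / sigma^2.
    with np.errstate(divide="ignore"):
        variances = 1.0 / sing_vals**2
    keep = variances <= var_threshold
    # Columns of the returned matrix span the identifiable subspace.
    return vt[keep].T, variances

# Toy example (hypothetical numbers): parameters 0 and 1 are perfectly collinear.
J = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 1.0, 1.0]])
V_id, variances = identifiable_subspace(J, np.ones(4), var_threshold=10.0)
print(V_id.shape, variances)
```

In this toy case the difference between the two collinear parameters has no effect on the outputs, so its variance is unbounded and that combination is discarded, whereas their sum and the third parameter are retained. Physical parameters would then be recovered from the retained orthogonal ones through the columns of `V_id`.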

For guiding the solution, it is useful to enforce bounds on the parameters in
the form

The results section is divided into two parts, each examining a specific site.
The Sedini and Anholt wind farms represent a typical mid-size onshore and
large offshore case, respectively. These two plants are characterized by
different wind climates and dominating flow effects, whose very distinct
features are useful for assessing the generality of the proposed STL method.
Furthermore, the quality and quantity of SCADA (supervisory control and data acquisition) data typically differ from site to site on account of different turbine types, acquisition systems,
sampling frequencies, failure rates, miscalibration of sensors, and several
other effects; here again, the use of different plants can help verify the
robustness of a method that operates based on operational data of such
variable quality and quantity. An overview of some key characteristics of the
two wind plants is provided in Table

Comparison of the main characteristics of the Sedini and Anholt wind farms.

Layout and flow correction grid for the Sedini

The Sedini wind farm is located in the north of Sardinia, a large island off
the western coast of Italy. A subgroup of turbines was the subject of a wind
farm flow control test campaign, using both wake steering and axial induction
control. Because of this previous activity, the behavior of the farm had been
already examined with different wake models

The Anholt offshore wind farm is located about 20 km east of the Danish coast in the Kattegat, a shallow sea between the Jutland peninsula and the west coast of Sweden. The presence of the Jutland coastline influences the western inflow to the farm, creating a gradient that was already investigated by

Figure

Layout of the Sedini wind farm. The colormap shows the height
difference with respect to the average terrain elevation. A bold identifier
indicates turbines used to determine the average wind direction

For this study, SCADA data at 10 min sampling frequency were made available
for the years 2015 and 2016, whereas meteorological mast measurements were
made available for the years 2008–2010. Since the two time periods do not
overlap, the mast data were used only to analyze the general climate at the
site

The data were first cleaned of entries where turbines were not reporting to
the acquisition system. Next, for every timestamp, the average wind direction

Because short-term fluctuations

Additional ambient conditions such as TI, shear, and density – although certainly significant for wake behavior and turbine performance and loading – cannot be typically derived in a straightforward manner exclusively from the turbine SCADA data.
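The per-timestamp averaging of the wind direction over a set of reference turbines, mentioned above, needs some care because directions wrap around at 360°. The following minimal sketch assumes a circular (vector) mean, which may differ from the averaging actually used in this work; the function name `mean_wind_direction` and the sample values are hypothetical.

```python
import numpy as np

def mean_wind_direction(directions_deg):
    """Circular mean of wind directions in degrees, wrapped to 0-360.

    The last axis runs over the reference turbines; NaN entries
    (e.g., turbines not reporting) are ignored.
    """
    rad = np.deg2rad(np.asarray(directions_deg, dtype=float))
    # Average the unit vectors rather than the angles themselves.
    mean_sin = np.nanmean(np.sin(rad), axis=-1)
    mean_cos = np.nanmean(np.cos(rad), axis=-1)
    return np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0

# Two timestamps (rows), two reference turbines (columns).
samples = np.array([[350.0, 20.0],
                    [90.0, 90.0]])
print(mean_wind_direction(samples))
```

For the first timestamp the circular mean is approximately 5°, whereas a plain arithmetic mean of 350° and 20° would give the misleading value of 185°.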

The output vector

Likelihood cost function

The STL parameter vector

For the data-driven learning of the heterogeneous background velocity

Although orography-induced effects may in principle result in the
heterogeneity of the wind direction at a site, such an effect could not
be observed at Sedini based on the available dataset. On the other
hand, a global correction of the wind direction proved to be necessary
and very beneficial for the quality of the results. This was achieved
by using a single correction node

The identification of a heterogeneous TI field

The wake model behavior is captured by the wake parameter vector

The error covariance matrix was assumed to be known a priori and diagonal,
i.e.,

The choice of the observability threshold ^{®} Core™ i7-9700 CPU desktop. However, it should be noted that processing time is not a very meaningful metric because the present code was not optimized for speed and processing power improves rapidly over time, quickly rendering execution times obsolete. Figure

Figure

Variance of all orthogonal parameters

As previously mentioned, the wind direction was corrected over the entire
domain by the value

Table

Results of the wake model tuning, with the initial baseline
parameters

Matrix

An examination of the rotation matrix

Inspection of the reduced matrix suggests a few observations. First, the
directional correction

Decrease in the normalized subsector cost function when activating one orthogonal parameter at a time in the sequence

Dominating eigenshapes of the flow corrections

To better understand the nature of the corrections

Figure

The first eigenshape, Fig.

The corrections identified by the proposed method describe a direction-dependent heterogeneous flow field that very significantly improves the matching of the FLORIS model predictions with measured operational data. However, is this identified flow field a reasonable approximation of the true flow over the terrain at this site, or is it just a mathematical correction that happens to improve the results? A definitive answer to this question is probably difficult to give with the limited data and information available. However, a qualitative and quantitative verification of the plausibility of the data-learned field can be obtained by comparing it with an independent CFD simulation of the flow over the terrain.

To perform this plausibility check, Reynolds-averaged Navier–Stokes (RANS) simulations were conducted for the values

To quantify the similarity between the learned and CFD-computed fields, their spatial correlation is calculated as

Spatial correlation of learned and CFD-computed velocity speedups (

Figure

Comparison between the learned

Corrections can be learned with respect to an initial heterogeneous flow field, instead of a uniform one (i.e., utilizing Eq.

Initial CFD-computed heterogeneous flow

Results indicate that the identified wind direction

As shown in the next section, the use of a CFD-computed initial flow field
offers no appreciable reduction in the power error when compared to
the simpler option of starting from an initial uniform background flow.
Indeed, the solution shown in Fig.

Reduction in the overall error by the activation of different correction types

At the convergence of the estimation process, the remaining error is defined as

Four different cases for the analysis of learned corrections on the power matching error.

Figure

Figure

Figure

Normalized measured and calculated power for the two turbines A2-12

Figure

A similar situation is observed for turbine A1-E7 in the general overview
plot of Fig.

The color of the frames of each subplot of Figs.

Layout of the Anholt wind farm with turbine identifiers and
wind direction frequency

Figure

Tuning and learning were performed using the same procedures as in the Sedini
case. However, the two cases are significantly different, impacting the
relative importance of the heterogeneous correction terms of
Eq. (

For the present analysis, SCADA data at 10 min sampling frequency were available from January 2013 until July 2015. The overall problem setup, solution methods, and data preprocessing were the same as those used for the onshore plant, as described in Sect.

In addition to providing a comparison for learned coastline-induced effects,
the WRF time series were used to filter the dataset for atmospheric
stability. Following

Unstable and stable observations were separated, creating two distinct
datasets. Since turbulence intensity could not be inferred from the available
SCADA data, it was assigned based on stability, using

The STL parameter vector

Similarly to the Sedini case, the wind speed corrections were defined
as Reynolds-independent speedup factors

For the wind direction, a single parameter

The wake model tuning parameters

The influence of the Danish coastline about 20 km west of the Anholt wind
plant has already been analyzed by

Figure

WRF-computed

Figure

Speedup factors at the first row of turbines for westerly
winds from 240

The speedup factors from

The speedup factors for

WRF simulations were run without the turbines and therefore cannot include plant-induced effects. On the other hand, STL results are based on measured turbine data, which automatically include such effects.

These results show that the STL method is capable of detecting the inflow
heterogeneity at this site. A similar capability was achieved – albeit in a
less general setting than here – by

Plant-induced flow effects account for various complex, often interrelated
phenomena. At a macroscopic level, a wind farm acts similarly to a local
patch of increased surface roughness in its encounter with the atmospheric
flow, leading to the development of an internal boundary layer

The present method employs a correction term

Learned speedup fields for various wind directions for
STL

The ability to explain the results of data-driven approaches remains a topic of central importance for future research. A possible way to address this need is to resort once again to a grey-box approach by embedding within FLORIS additional models for blockage, local accelerations, gravity waves, and other effects, as well as tuning their parameters based on data, similarly to what is done here for the wake models. The estimated background flow would at that point represent corrections to those models and be in charge of accounting for their deficiencies and any missing physics. This possible extension of the present formulation is not considered further, and the present study is limited to the identification of a “catch-all” correction term without the pretense of being able to fully explain what has been identified. Although the explanation of this term might not be complete, it is still capable of correcting the baseline FLORIS model, substantially improving its match with respect to the measurements.

Notwithstanding these limitations of the present study, an effort was made to
pragmatically separate some effects as much as possible. Specifically,
orography-induced effects were reduced by considering northern and southern
wind directions, where the influence of the neighboring coastlines is
minimal. Conveniently, for these wind directions, the Anholt wind plant
presents a significant streamwise depth, which facilitates the onset of
deep-array effects. This agrees with the findings of

Figure

First, the wind speed fields that are identified for stable conditions deviate significantly from those obtained in unstable conditions. In fact, the flow field seems to have a higher degree of heterogeneity in stable conditions, and, as expected, intra-plant effects appear to be generally more pronounced.

Second, speedups at the edges of the wind farm can be observed for the
directions 0, 30, 180, and 210

Third, there seems to be a streamwise velocity decrease in the background
flow field, especially for stable conditions, indicating the growth of a
fully developed flow region. The higher mixing promoted by unstable
atmospheric conditions probably induces an entrainment of the higher speed
that flows over and around the array

Reduction in the overall error by the activation of different
correction types

An additional problem with the interpretation of the results is due to the
fact that the identified flow correction can be affected by the wake
combination scheme. As the number of wake overlaps grows towards the trailing
edge of the farm, any inaccuracy in the combination model will be amplified
there. The results of the figure were obtained with the FLS model, which
has been reported to more accurately predict power deep inside the farm

These results highlight a problem that deserves attention and further research. In fact, the approach of adding a background correction term to the FLORIS model is somewhat oblivious to the deficiencies of its submodels: for each different wake combination model, a different background flow field is identified that, in the end, is capable of delivering a similar good match of the power predictions with the measurements, compensating possible differences in the behavior of the models. While on the one hand this “obliviousness” is one of the strengths of learning-based data-driven methods, on the other hand it is clearly also one of their main weaknesses because it tends to mask possible problems of the submodels, hindering a full understanding of their true capabilities.
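The sensitivity to the wake combination model can be made concrete with a minimal sketch comparing two superposition laws on normalized velocity deficits. It assumes that FLS denotes freestream linear superposition, as in FLORIS, and uses the sum-of-squares freestream superposition (SOSFS) mentioned earlier; the function names, the uniform per-wake deficit of 0.2, and the 10 m/s free stream are hypothetical illustration values, not results from this study.

```python
import numpy as np

def combine_fls(deficits):
    """Freestream linear superposition: normalized deficits add up."""
    return np.sum(deficits)

def combine_sosfs(deficits):
    """Sum-of-squares freestream superposition: deficits add in quadrature."""
    return np.sqrt(np.sum(np.square(deficits)))

free_stream = 10.0  # hypothetical free-stream speed in m/s
for n_wakes in (1, 2, 4):
    # Each overlapping wake contributes the same normalized deficit.
    deficits = np.full(n_wakes, 0.2)
    u_fls = free_stream * (1.0 - combine_fls(deficits))
    u_sosfs = free_stream * (1.0 - combine_sosfs(deficits))
    print(n_wakes, u_fls, u_sosfs)
```

The two laws coincide for a single wake and drift apart as the number of overlaps grows, which is where a learned background correction would have to absorb the difference between them.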

Stability affects not only the identified background flow but also the
simultaneous tuning of the wake model parameters. For the stable and unstable
cases, Table

Results of the wake model tuning, with the initial baseline
parameters

Next, the performance of the STL method was compared to the baseline untuned
homogeneous-background case, considering the successive activation of the
various correction terms. Figure

The initial error for the baseline model is higher in the stable case than in the unstable one. This is to be expected, since wake and farm effects are more prominent in stable atmospheric conditions.

The flow correction term produces, similarly to the Sedini case, the largest contribution to the improvement in the error. As for Sedini, even here this term contains clear land-induced effects, generated by the neighboring coastline. However, even more prominent effects are driven by the growth of the boundary layer over the farm because of its relatively large streamwise depth.

In contrast to the Sedini case, the wind direction correction

Figure

The present paper has formulated and demonstrated the STL method, which
simultaneously calibrates and augments a steady-state parametric wind farm
flow model; this work extends an earlier less general formulation first
described in

A decomposition of the wind farm flow field by temporal and causal effects forms the basis for the definition of the extra correction terms, together with their functional dependencies and assumed parametric discretizations. The formulation allows for the first time a two-dimensional heterogeneous background flow to be learned directly from operational data. In this sense, the whole wind farm is used as a distributed sensor, which detects the development of the flow within its own boundaries through the response of its wind turbines (which act as local flow sensors). The learned heterogeneous flow is influenced by ambient conditions, terrain orography, roughness, sea state, and plant-induced effects. The learned corrections are not limited to wind speed but can also include heterogeneous wind direction or turbulence intensity fields.

Tuning and learning result in a severely ill-posed identification problem because of the collinearity and/or lack of observability of the redundant unknown parameters. This problem is solved by an SVD-supported MLE. The SVD effectively performs a generalized modal decomposition of the whole solution, which includes the coupled effects of the heterogeneous flow field and of the other tunable model parameters. In this way, combinations of the parameters that are not visible – given the necessarily limited informational content of the available dataset – can be readily discarded, whereas only visible combinations are retained. As a byproduct of this analysis, the examination of the underlying coordinate transformation and resulting mode shapes can be used to reveal interesting features of the solution.

The methodology was showcased via two distinct applications.

For the onshore Sedini farm, the STL revealed the existence of a heterogeneous wind speed field. Augmenting the baseline model with this learned background correction, together with the site-specific tuning of the wake model, resulted in a very significant improvement to the prediction of power output throughout the farm, even when compared to the predictions of the ad hoc tuned baseline model. The learned corrections showed a significant correlation with the terrain elevation, suggesting that the observed heterogeneity of the flow is primarily driven by orographic features of the site. This was further confirmed by over-the-terrain CFD simulations, which also showed a good agreement with the learned corrections. Additionally, the CFD-computed flow field was used as an initial starting guess for the learned correction term; this, however, did not significantly change the results. Furthermore, the STL was able to identify a large bias in the wind direction presumably due to problems with the wind turbine yaw encoders.

For the much larger offshore Anholt farm, the STL revealed the existence of gradients in the inflow, as well as the presence of a strongly direction- and stability-dependent highly heterogeneous intra-plant flow field. Comparison with WRF simulations confirmed the origin of the inflow gradients as being caused by the presence of coastlines in close proximity to the farm, as already observed by other authors. The intra-plant flow exhibited clear instances of local accelerations close to the farm edges, suggesting that the flow is “turning” around the obstacle represented by the farm. The observed intra-plant flow appears to be caused by the growth of the boundary layer over the farm. The flow appears to be very significantly influenced by the irregular shape of the farm and by the spacing of the turbines, which would be difficult to capture with simplified analytical models. However, the interpretation of the results was complicated by the effects caused by the interaction of multiple wakes towards the farm trailing edge. It was in fact observed that changes to the wake combination model can affect the identified background flow. Given the present dataset, it was not possible to disentangle the two effects, which remains an open problem that will necessitate further research. Notwithstanding this limitation, the STL was able to very significantly improve the prediction of power when compared to the tuned baseline, regardless of the wake combination model used.

Future work can further improve the STL approach.

On the white-box side of the problem, it would be interesting to add the most recent generation of intra-plant effects. This could help disentangle the causes for the observed heterogeneous background corrections in large farms. Similarly, one should explore more sophisticated wake combination models than the ones used here given their significant effects on the estimated background flow. Models that are parametric (i.e., that can be tuned) would be of particular interest given the “monolithic” parameter estimation performed by the STL.

On the black-box side of the problem, the use of richer datasets than the ones used here could help illuminate some of the complexities of wind farm flows. For example, operational data accompanied by information on the ambient conditions could help in better discerning the effects of stability on phenomena such as boundary layer growth, blockage, gravity waves, and others. Additionally, extra measurements provided on site by met masts and/or long-range scanning lidars could be fused with the operational data, boosting the informational content of the dataset. Although the grey-box nature of the STL method means that the white-box component can compensate for the lack of information in the data, it is also true that what is not in the white box and not in the data can never be correctly represented by the model. Therefore, future improvements depend to some extent on the richness and quality of the datasets that will be available.

Finally, the STL method should be extended to incorporate unsteady effects by the use of a dynamic version of the baseline engineering wake model. It is envisioned that the steady-state STL could be used, as done here, to adapt the model to represent permanent features of the flow (for example, as caused by a hill), whereas the unsteady STL could be used to render any transient effects (for example, as caused by the finite-speed propagation downstream of set point or inflow changes).

For the Sedini case, RANS simulations were
carried out in OpenFOAM

The learned corrections

A rectangular domain was used because of its simpler mesh generation and
clear identification of inlet and outlet compared to other shapes. For each different wind direction case, the terrain was rotated to align the domain with the inflow. Figure

Terrain elevation around the Sedini wind farm

A regular background mesh was generated with the blockMesh tool that is part
of the OpenFOAM distribution. In the horizontal direction,

As simulations were performed only for neutral conditions, buoyancy effects
were not included. Furthermore, Coriolis effects were also neglected as only
the velocity at hub height is of interest, which is very close to the
surface. The k-

The domain boundary conditions were imposed as follows. A logarithmic
velocity profile was imposed at the inlet, with a roughness length of

A second-order accurate linear discretization scheme was used for the divergence terms. The problem was solved with simpleFoam, an implementation of the SIMPLE algorithm. The setup was first tested in an empty domain, where it was able to establish an equilibrium ABL with constant velocity profile from inlet to outlet. Simulations were run with 322 cores and converged after ca. 1200 iterations.
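The logarithmic inlet profile mentioned above can be sketched as follows. This is a generic neutral log law matched to a reference speed at a reference height; the exact form, roughness length, and reference values used in the simulations are not reproduced here, so the function name `log_law_speed` and all numbers below are hypothetical.

```python
import numpy as np

KAPPA = 0.41  # von Karman constant (commonly used value)

def log_law_speed(z, u_ref, z_ref, z0):
    """Neutral logarithmic profile matched to speed u_ref at height z_ref.

    z  : height(s) above ground in m
    z0 : aerodynamic roughness length in m
    """
    # Friction velocity that reproduces u_ref at z_ref.
    u_star = KAPPA * u_ref / np.log((z_ref + z0) / z0)
    heights = np.asarray(z, dtype=float)
    return (u_star / KAPPA) * np.log((heights + z0) / z0)

# Hypothetical values: 8 m/s at 80 m, roughness length 0.05 m.
heights = np.array([0.0, 10.0, 80.0, 200.0])
print(log_law_speed(heights, u_ref=8.0, z_ref=80.0, z0=0.05))
```

In an empty flat domain with consistent turbulence boundary conditions, a profile of this type is expected to be preserved from inlet to outlet, which is the equilibrium check described above.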

To investigate grid convergence, the mesh was progressively coarsened in both
the horizontal and vertical directions (in the latter case, only in the
bottom section of the domain), obtaining 10 %, 30 %, and 50 % fewer grid points, respectively. Table

Figure

Mesh characteristics for the grid convergence study for the 270

Average relative difference in hub-height speed for grids of increasing coarseness.

Normalized measured and calculated power for all turbines for all 5

Normalized measured and calculated power for all 5

Data of the Sedini wind farm are the property of Enel Green Power S.p.A. Data of the Anholt wind farm are the property of Ørsted A/S. All figures and the data used to generate them can be retrieved in Python pickle and MATLAB formats via

CLB developed the concept of the wind farm as a sensor, formulated the STL algorithm, and supervised the research. RB and AV implemented the method; AV developed the data processing procedures. RB developed the application to the Sedini wind farm and AV to the Anholt plant. RB developed the over-the-terrain CFD approach and performed the numerical simulations. All authors equally contributed to the interpretation of the results. RB and CLB wrote the manuscript, with contributions from AV in the Anholt section. All authors provided important input to this research work through discussions and feedback and by improving the manuscript.

At least one of the (co-)authors is a member of the editorial board of

The authors express their gratitude to Enel Green Power S.p.A. and Ørsted A/S, which granted access to the Sedini and Anholt field data, respectively, and to Achim Fischer for his advice on over-the-terrain CFD.

This work is funded in part by the e-TWINS project (FKZ:03EI6020A), which receives funding from the German Federal Ministry for Economic Affairs and Climate Action (BMWK). Additional funding is provided by the European Union under the Horizon Europe grant no. 101084216 (project MERIDIONAL).

This paper was edited by Jens Nørkær Sørensen and reviewed by two anonymous referees.