the Creative Commons Attribution 4.0 License.
The AWAKEN wind farm benchmark, Part 2: Modeling results
Abstract. Accurately modeling wind farm performance in complex atmospheric flows remains a challenge. This paper presents the modeling results of the American WAKE experimeNt (AWAKEN) wind farm benchmark, a collaborative effort involving 16 research groups from academia and industry within the International Energy Agency Wind Technology Collaboration Programme Task 57. The study evaluates a diverse suite of simulation tools, ranging from fast-running engineering wake models to high-fidelity large-eddy simulations, against a diurnal case study observed during the AWAKEN campaign. The benchmark utilized a three-phase structure to progressively assess model performance as observational data availability increased. Initial blind predictions showed that higher-fidelity models did not uniformly outperform simpler simulation tools. A distinct spatial bias was observed where models struggled to resolve the interplay between a low-level jet, wakes, and terrain-induced flow acceleration. In subsequent phases, leveraging additional measurements for model improvement led to a reduction in mean absolute error across the model ensemble; however, this effect was most pronounced in engineering wake models, where targeted calibration reduced error by up to 40 %. Overall, the study demonstrates that inflow characterization remains a primary prerequisite for accuracy, particularly for models relying on coarse forcing datasets. While the limited ability to resolve local terrain-flow interactions under single-day conditions represents a recognized constraint, the overall findings on wake modeling and real-world validation still provide valuable guidance for model application and for mitigating this limitation.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Wind Energy Science.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
-
RC1: 'Comment on wes-2026-34', Anonymous Referee #1, 26 Apr 2026
-
AC1: 'Reply on RC1', Nicola Bodini, 13 Jun 2026
The comment was uploaded in the form of a supplement: https://wes.copernicus.org/preprints/wes-2026-34/wes-2026-34-AC1-supplement.pdf
-
RC2: 'Comment on wes-2026-34', Anonymous Referee #2, 26 May 2026
This manuscript presents a well-structured and valuable benchmarking exercise that leverages the unique AWAKEN dataset and a carefully designed multi-phase framework. The progressive release of observational data is a clear strength, as it enables a controlled evaluation of how models respond to increasing constraint. The paper provides important insights into model behavior across a wide range of fidelities, and I broadly agree with the central conclusions as well as with the points raised by the other reviewer, particularly regarding the dominant role of inflow characterization and the need to distinguish inflow-related errors from wake-model physics.
However, from a systems and industrial perspective, the manuscript would benefit from a more explicit distinction between predictive capability and state-conditioned calibration. As currently framed, reductions in aggregate error metrics such as mean absolute error risk being interpreted as genuine improvements in model skill. In practice, these improvements are largely achieved after the full release of inflow and wake observations, and therefore reflect the model’s ability to adapt to a well-constrained, fully observed state rather than to predict it. This is particularly relevant in Phase 3, where a significant reduction in MAE is reported for several models. Such improvements are best interpreted as conditional error minimization rather than as evidence of generalizable predictive capability.
Closely related to this is the question of calibration versus transferability. In several cases, improved agreement appears to be achieved through localized tuning or spatially heterogeneous parameter adjustments that allow models to absorb unresolved physical processes, such as terrain-induced flow modification or stability effects, into empirical corrections. While effective for reproducing the specific case studied, this approach does not necessarily transfer to different atmospheric conditions. The manuscript would therefore benefit from a clearer conceptual separation between reconstruction of an observed case and prediction of unseen conditions.
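The distinction between case-specific calibration and transferable skill can be illustrated with a minimal sketch. All numbers below are hypothetical turbine power values chosen for illustration, not benchmark data, and the single additive offset is an assumed stand-in for whatever tuning a participant might apply; it is not a method described in the manuscript.

```python
import numpy as np


def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean(np.abs(pred - obs)))


# Hypothetical turbine power values in MW (illustrative only).
obs_case_a = np.array([2.0, 1.8, 1.5, 1.6])  # case used for tuning
raw_case_a = np.array([2.4, 2.2, 1.9, 2.0])  # untuned model, case A
obs_case_b = np.array([1.0, 1.2, 0.9, 1.1])  # unseen case
raw_case_b = np.array([0.9, 1.1, 0.8, 1.0])  # untuned model, case B

# "Calibration": one additive offset fitted on case A only (assumed form).
offset = float(np.mean(obs_case_a - raw_case_a))

mae_a_before = mae(raw_case_a, obs_case_a)
mae_a_after = mae(raw_case_a + offset, obs_case_a)
mae_b_before = mae(raw_case_b, obs_case_b)
mae_b_after = mae(raw_case_b + offset, obs_case_b)

print(f"case A: {mae_a_before:.2f} -> {mae_a_after:.2f} MW")
print(f"case B: {mae_b_before:.2f} -> {mae_b_after:.2f} MW")
```

In this toy setup the offset removes the error on the tuning case while increasing it on the unseen case, which is the sense in which a lower MAE after data release reflects reconstruction of an observed state and not necessarily predictive skill.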
The reliance on a single diurnal case study further reinforces this limitation. The benchmark focuses on one specific atmospheric realization, characterized by low-level jet dynamics and stability transitions. While this provides a rich and controlled test case, it also introduces a structural risk of overfitting, as model adjustments may implicitly encode features of this particular event. As a result, performance improvements observed within the benchmark cannot be directly extrapolated to longer time scales such as annual energy production or probabilistic metrics like P50 or P90. This does not reduce the scientific value of the study, but it defines a clear boundary on the interpretation of model performance.
Another important aspect concerns the interpretability of the model ensemble. The manuscript acknowledges that subjective modeling choices, or the “human factor,” contribute significantly to variability in results. However, the absence of a structured control across key configuration elements, such as turbulence parameterizations, boundary layer schemes, and mesoscale setup, makes it difficult to attribute observed differences in performance to underlying physical mechanisms. In its current form, the ensemble represents a collection of plausible configurations rather than a fully controlled experiment, which limits its ability to isolate causal drivers of error.
The identification of inflow characterization as a primary determinant of accuracy is one of the most important outcomes of the study. From an industrial perspective, this result has an even stronger implication than currently stated. It indicates that inflow uncertainty effectively defines a lower bound on achievable model accuracy, independent of wake modeling fidelity. Even in cases where turbines are minimally affected by wakes, substantial errors persist, suggesting that upstream atmospheric state representation is the dominant source of uncertainty. This insight should be elevated from a supporting observation to a primary conclusion, as it has direct consequences for pre-construction assessment workflows and uncertainty quantification practices.
I also concur with the other reviewer that the benchmark exposes several fundamental research challenges, including the breakdown of classical boundary layer assumptions in stable conditions, the difficulty of generating realistic turbulent inflow for high-fidelity models, and the strong coupling between terrain-induced flow features and wake dynamics. Rather than reiterating those points, I would emphasize that these challenges collectively define limits on model transferability, not just areas requiring incremental improvement. This reinforces the need to distinguish models that reproduce a single constrained state from those that generalize across atmospheric regimes.
In conclusion, this is a high-quality and important contribution that should be published. However, the manuscript would benefit from a reframing that explicitly separates predictive capability from calibration-driven reconstruction, clarifies the limits of generalization inherent in a single-case study, and more strongly emphasizes inflow uncertainty as a primary limiting factor. Addressing these points would significantly strengthen both the scientific interpretation and the applicability of the results in operational contexts.
Citation: https://doi.org/10.5194/wes-2026-34-RC2
-
AC2: 'Reply on RC2', Nicola Bodini, 13 Jun 2026
The comment was uploaded in the form of a supplement: https://wes.copernicus.org/preprints/wes-2026-34/wes-2026-34-AC2-supplement.pdf
Data sets
AWAKEN wind farm wake benchmark inputs Nicola Bodini https://zenodo.org/records/15623845
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 394 | 189 | 14 | 597 | 41 | 20 | 26 |
This paper presents a highly valuable and extensive benchmark of wind farm flow models against the unique AWAKEN dataset. The multi-phase, blind approach is a significant strength, offering critical insights into model performance and the value of data for model improvement. The finding that inflow characterization is a primary prerequisite for accuracy is an important, well-supported conclusion. However, the framing of model performance, especially regarding the comparison between engineering and higher-fidelity tools, can be misleading and risks underrepresenting the fundamental research challenges that this benchmark uniquely exposes. Specific comments are as follows:
1. The manuscript makes statements such as "initial blind predictions showed that higher-fidelity models did not uniformly outperform simpler simulation tools" (Abstract) and "simpler engineering and steady-state models often matched or outperformed higher-fidelity mesoscale approaches" (Conclusions). While factually correct in terms of the bulk error metrics (e.g., MAE) for this specific case, this framing can be misleading without critical context.
2. The authors correctly note that engineering models directly ingested high-quality, single-point observations (from site A1) in Phase 1. Their performance is therefore not a triumph of simplified physics, but a demonstration of the effectiveness of empirical calibration against a known inflow. In contrast, the higher-fidelity models (WRF, LES) were tasked with a much harder problem: predicting the inflow ab initio from coarser boundary conditions. Their errors are primarily "inflow errors," not necessarily "physics errors" within the wake model. The text should more clearly distinguish between the performance of a model's inflow characterization strategy and its wake physics fidelity. Suggesting that an engineering model "outperforms" an LES model conflates a site-calibrated tool with a predictive one.
3. The results of this benchmark expose profound, fundamental research challenges for high-fidelity modeling that are mentioned but not centered as key findings. The paper should more forcefully articulate these challenges as critical outcomes of the study:
4. The stable case (06:00 UTC) demonstrates the breakdown of Monin-Obukhov Similarity Theory (MOST) (as noted for Participant 6), placing the turbine rotor layer outside the surface layer. Standard RANS models, which rely heavily on equilibrium boundary layer assumptions, are fundamentally challenged by such non-canonical conditions (e.g., low-level jets). The manuscript should explicitly state and discuss the crucial need for improved turbulence closures for wind energy applications.
5. LES is often driven by mesoscale simulations, which do not have turbulence content. This underscores the challenge of generating realistic, turbulent, site-specific inflows for LES, particularly in complex terrain where standard periodic boundary conditions or simplified precursor methods fail. It is suggested to discuss this issue in the revised paper.
6. The observation that terrain-induced flow acceleration frequently outweighed wake losses (e.g., at Site H) and caused specific waked turbines to overproduce is a critical finding. This demonstrates that resolving microscale terrain features is not just a detail but a first-order priority on par with wake parameterization. In LES of atmospheric turbulent flows, the near-wall region is often modelled rather than directly resolved. The classic wall model depends on the logarithmic law of the wall, which, however, fails in complex terrain. This should be highlighted in the paper as a fundamental research challenge.
7. The conclusion section can be strengthened by distilling the key results from the points above into a clear summary of high-priority, fundamental research needs, such as inflow characterization and modeling, non-equilibrium and non-canonical boundary layer physics, and multiscale coupling of terrain and wake effects.
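The separation of "inflow errors" from "physics errors" requested in comment 2 could be expressed as a simple additive split of the signed error for one turbine at one time. This is a sketch only: it assumes the model can be re-run with the observed inflow, and the symbols ($P_m$ for modeled power, $U_m$ and $U_{obs}$ for modeled and observed inflow, $P_{obs}$ for observed power) are illustrative and do not come from the manuscript.

```latex
\underbrace{P_m(U_m) - P_{obs}}_{\text{total error}}
  = \underbrace{P_m(U_m) - P_m(U_{obs})}_{\text{inflow contribution}}
  + \underbrace{P_m(U_{obs}) - P_{obs}}_{\text{wake/physics contribution}}
```

The identity holds for signed errors. Because of the absolute value, MAE does not split additively in the same way, so the two contributions would need to be reported separately and not as fractions of a single MAE.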