The AWAKEN wind farm benchmark, Part 2: Modeling results

Bodini, Nicola; Moriarty, Patrick; Thedin, Regis; Doubrawa, Paula; Archer, Cristina; Blaylock, Myra; Bottasso, Carlo; Carmo, Bruno; Cheung, Lawrence; Dubreuil, Camille; Floors, Rogier; Herges, Thomas; Houck, Daniel; Kanjari, Ali; Kaul, Colleen M.; Kelley, Christopher; LI, Ru; Lundquist, Julie K.; Major, Desirae; Nguyen, Anh Kiet; Optis, Mike; Parada, Luan R. C.; Peña, Alfredo; Quick, Julian; Ricarte, David; Radünz, William C.; Rai, Raj K.; Garcia Santiago, Oscar; Schulte, Jonas; Seim, Knut S.; van der Laan, M. Paul; Vimalakanthan, Kisorthman; Wise, Adam

doi:10.5194/wes-2026-34

Preprints

https://doi.org/10.5194/wes-2026-34

24 Mar 2026

Status: a revised version of this preprint is currently under review for the journal WES.

Nicola Bodini, Patrick Moriarty, Regis Thedin, Paula Doubrawa, Cristina Archer, Myra Blaylock, Carlo Bottasso, Bruno Carmo, Lawrence Cheung, Camille Dubreuil, Rogier Floors, Thomas Herges, Daniel Houck, Ali Kanjari, Colleen M. Kaul, Christopher Kelley, Ru LI, Julie K. Lundquist, Desirae Major, Anh Kiet Nguyen, Mike Optis, Luan R. C. Parada, Alfredo Peña, Julian Quick, David Ricarte, William C. Radünz, Raj K. Rai, Oscar Garcia Santiago, Jonas Schulte, Knut S. Seim, M. Paul van der Laan, Kisorthman Vimalakanthan, and Adam Wise

Abstract. Accurately modeling wind farm performance in complex atmospheric flows remains a challenge. This paper presents the modeling results of the American WAKE experimeNt (AWAKEN) wind farm benchmark, a collaborative effort involving 16 research groups from academia and industry within the International Energy Agency Wind Technology Collaboration Programme Task 57. The study evaluates a diverse suite of simulation tools, ranging from fast-running engineering wake models to high-fidelity large-eddy simulations, against a diurnal case study observed during the AWAKEN campaign. The benchmark utilized a three-phase structure to progressively assess model performance as observational data availability increased. Initial blind predictions showed that higher-fidelity models did not uniformly outperform simpler simulation tools. A distinct spatial bias was observed where models struggled to resolve the interplay between a low-level jet, wakes, and terrain-induced flow acceleration. In subsequent phases, leveraging additional measurements for model improvement led to a reduction in mean absolute error across the model ensemble; however, this effect was most pronounced in engineering wake models, where targeted calibration reduced error by up to 40 %. Overall, the study demonstrates that inflow characterization remains a primary prerequisite for accuracy, particularly for models relying on coarse forcing datasets. While the limited ability to resolve local terrain-flow interactions under single-day conditions represents a recognized constraint, the overall findings on wake modeling and real-world validation still provide valuable guidance for model application and for mitigating this limitation.
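
For readers unfamiliar with the metric, the sketch below shows how a mean absolute error (MAE) and a relative error reduction between two benchmark phases could be computed. It is only an illustration: the function names and all power values are hypothetical, invented so that the reduction lands on the 40 % figure quoted in the abstract, and none of the numbers come from the paper or the AWAKEN dataset.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def relative_reduction(error_before, error_after):
    """Fractional error reduction relative to the earlier value."""
    return (error_before - error_after) / error_before


# Hypothetical per-turbine power values in kW (invented for illustration only).
observed = [1500.0, 1800.0, 2100.0, 1200.0]
blind = [1700.0, 1500.0, 2400.0, 1400.0]        # stand-in for a blind prediction
calibrated = [1620.0, 1620.0, 2280.0, 1320.0]   # stand-in for a calibrated rerun

mae_blind = mean_absolute_error(blind, observed)
mae_calibrated = mean_absolute_error(calibrated, observed)
reduction = relative_reduction(mae_blind, mae_calibrated)

print(f"MAE blind: {mae_blind:.0f} kW, calibrated: {mae_calibrated:.0f} kW, "
      f"reduction: {reduction:.0%}")
```

A bulk metric like this says nothing about where the error comes from, which is why the referee comments further down the page ask for inflow errors and wake-physics errors to be separated.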

How to cite. Bodini, N., Moriarty, P., Thedin, R., Doubrawa, P., Archer, C., Blaylock, M., Bottasso, C., Carmo, B., Cheung, L., Dubreuil, C., Floors, R., Herges, T., Houck, D., Kanjari, A., Kaul, C. M., Kelley, C., LI, R., Lundquist, J. K., Major, D., Nguyen, A. K., Optis, M., Parada, L. R. C., Peña, A., Quick, J., Ricarte, D., Radünz, W. C., Rai, R. K., Garcia Santiago, O., Schulte, J., Seim, K. S., van der Laan, M. P., Vimalakanthan, K., and Wise, A.: The AWAKEN wind farm benchmark, Part 2: Modeling results, Wind Energ. Sci. Discuss. [preprint], https://doi.org/10.5194/wes-2026-34, in review, 2026.

Received: 03 Feb 2026 – Discussion started: 24 Mar 2026

Competing interests: At least one of the (co-)authors is a member of the editorial board of Wind Energy Science.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1424 KB)

Supplement (3006 KB)

Status: final response (author comments only)

RC1: 'Comment on wes-2026-34', Anonymous Referee #1, 26 Apr 2026

This paper presents a highly valuable and extensive benchmark of wind farm flow models against the unique AWAKEN dataset. The multi-phase, blind approach is a significant strength, offering critical insights into model performance and the value of data for model improvement. The finding that inflow characterization is a primary prerequisite for accuracy is an important, well-supported conclusion. However, the framing of model performance, especially regarding the comparison between engineering and higher-fidelity tools, can be misleading and risks underrepresenting the fundamental research challenges that this benchmark uniquely exposes. Specific comments are as follows:

1. The manuscript makes statements such as "initial blind predictions showed that higher-fidelity models did not uniformly outperform simpler simulation tools" (Abstract) and "simpler engineering and steady-state models often matched or outperformed higher-fidelity mesoscale approaches" (Conclusions). While factually correct in terms of the bulk error metrics (e.g., MAE) for this specific case, this framing can be misleading without critical context.

2. The authors correctly note that engineering models directly ingested high-quality, single-point observations (from site A1) in Phase 1. Their performance is therefore not a triumph of simplified physics, but a demonstration of the effectiveness of empirical calibration against a known inflow. In contrast, the higher-fidelity models (WRF, LES) were tasked with a much harder problem: predicting the inflow ab initio from coarser boundary conditions. Their errors are primarily "inflow errors," not necessarily "physics errors" within the wake model. The text should more clearly distinguish between the performance of a model's inflow characterization strategy and its wake physics fidelity. Suggesting that an engineering model "outperforms" an LES model conflates a site-calibrated tool with a predictive one.

3. The results of this benchmark expose profound, fundamental research challenges for high-fidelity modeling that are mentioned but not centered as key findings. The paper should more forcefully articulate these challenges as critical outcomes of the study:

4. The stable case (06:00 UTC) demonstrates the breakdown of Monin-Obukhov Similarity Theory (MOST) (as noted for Participant 6), placing the turbine rotor layer outside the surface layer. Standard RANS models, which rely heavily on equilibrium boundary layer assumptions, are fundamentally challenged by such non-canonical conditions (e.g., low-level jets). The manuscript should explicitly state and discuss the crucial need for improved turbulence closures for wind energy applications.

5. LES is often driven by mesoscale simulations, which do not have turbulence content. This underscores the challenge of generating realistic, turbulent, site-specific inflows for LES, particularly in complex terrain where standard periodic boundary conditions or simplified precursor methods fail. The revised paper should discuss this issue.

6. The observation that terrain-induced flow acceleration frequently outweighed wake losses (e.g., at Site H) and caused specific waked turbines to overproduce is a critical finding. This demonstrates that resolving microscale terrain features is not just a detail but a first-order priority on par with wake parameterization. In LES of atmospheric turbulent flows, the near-wall region is often modelled rather than directly resolved. The classic wall model depends on the logarithmic law of the wall, which, however, fails in complex terrain. This should be highlighted in the paper as a fundamental research challenge.

7. The conclusion section can be strengthened by distilling the key results from the points above into a clear summary of high-priority, fundamental research needs, such as inflow characterization and modeling, non-equilibrium and non-canonical boundary layer physics, and multiscale coupling of terrain and wake effects.
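
As background to comments 4 and 6, the relations usually meant by "the logarithmic law of the wall" and by Monin-Obukhov Similarity Theory can be written as below. These are the standard textbook forms, not equations taken from the manuscript; the symbols ($u_*$ friction velocity, $\kappa \approx 0.4$ von Kármán constant, $z_0$ roughness length, $L$ Obukhov length, $\psi_m$ stability correction function) are introduced here for illustration, and the small $\psi_m(z_0/L)$ term is neglected.

```latex
% Neutral surface layer: logarithmic law of the wall
\bar{u}(z) = \frac{u_*}{\kappa} \ln\!\left(\frac{z}{z_0}\right)

% Monin-Obukhov similarity: stability-corrected profile
\bar{u}(z) = \frac{u_*}{\kappa}
  \left[ \ln\!\left(\frac{z}{z_0}\right) - \psi_m\!\left(\frac{z}{L}\right) \right]
```

Both forms assume a horizontally homogeneous, quasi-stationary surface layer. When the rotor sits above a shallow stable surface layer, or when the terrain is complex, those assumptions no longer hold, which is the limitation the referee is pointing to.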

Citation: https://doi.org/10.5194/wes-2026-34-RC1
- AC1: 'Reply on RC1', Nicola Bodini, 13 Jun 2026
  
  The comment was uploaded in the form of a supplement: https://wes.copernicus.org/preprints/wes-2026-34/wes-2026-34-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/wes-2026-34-AC1
RC2: 'Comment on wes-2026-34', Anonymous Referee #2, 26 May 2026

This manuscript presents a well-structured and valuable benchmarking exercise that leverages the unique AWAKEN dataset and a carefully designed multi-phase framework. The progressive release of observational data is a clear strength, as it enables a controlled evaluation of how models respond to increasing constraint. The paper provides important insights into model behavior across a wide range of fidelities, and I broadly agree with the central conclusions as well as with the points raised by the other reviewer, particularly regarding the dominant role of inflow characterization and the need to distinguish inflow-related errors from wake-model physics.
However, from a systems and industrial perspective, the manuscript would benefit from a more explicit distinction between predictive capability and state-conditioned calibration. As currently framed, reductions in aggregate error metrics such as mean absolute error risk being interpreted as genuine improvements in model skill. In practice, these improvements are largely achieved after the full release of inflow and wake observations, and therefore reflect the model’s ability to adapt to a well-constrained, fully observed state rather than to predict it. This is particularly relevant in Phase 3, where a significant reduction in MAE is reported for several models. Such improvements are best interpreted as conditional error minimization rather than as evidence of generalizable predictive capability.
Closely related to this is the question of calibration versus transferability. In several cases, improved agreement appears to be achieved through localized tuning or spatially heterogeneous parameter adjustments that allow models to absorb unresolved physical processes, such as terrain-induced flow modification or stability effects, into empirical corrections. While effective for reproducing the specific case studied, this approach does not necessarily transfer to different atmospheric conditions. The manuscript would therefore benefit from a clearer conceptual separation between reconstruction of an observed case and prediction of unseen conditions.
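
The distinction between reconstruction and prediction drawn above can be illustrated with a toy example. The sketch below calibrates the wake decay constant of a Jensen (Park) top-hat wake model on one synthetic case and then applies it unchanged to a second synthetic case with different wake recovery. Everything here is an assumption made for illustration: the choice of the Jensen model, the thrust coefficient, the spacings, and the two "true" decay constants are hypothetical and are not taken from the benchmark or from any participant's setup.

```python
import math


def jensen_speed_ratio(x_over_d, ct, k):
    """Waked-to-free-stream wind speed ratio from the Jensen (Park) model."""
    return 1.0 - (1.0 - math.sqrt(1.0 - ct)) / (1.0 + 2.0 * k * x_over_d) ** 2


def mae(predicted, observed):
    """Mean absolute error between paired values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def predict(spacings, ct, k):
    """Speed ratios at each downstream distance (in rotor diameters)."""
    return [jensen_speed_ratio(x, ct, k) for x in spacings]


def calibrate_k(spacings, ct, observed, candidates):
    """Grid search for the decay constant that minimizes MAE on one case."""
    return min(candidates, key=lambda k: mae(predict(spacings, ct, k), observed))


# Hypothetical setup: two synthetic regimes with different wake recovery.
spacings = [4.0, 6.0, 8.0, 10.0]   # downstream distance in rotor diameters
ct = 0.8                           # assumed thrust coefficient
obs_case_a = predict(spacings, ct, 0.03)   # synthetic slow-recovery case
obs_case_b = predict(spacings, ct, 0.08)   # synthetic fast-recovery case

candidates = [i / 100 for i in range(1, 16)]
k_fit = calibrate_k(spacings, ct, obs_case_a, candidates)

in_sample_mae = mae(predict(spacings, ct, k_fit), obs_case_a)
out_of_sample_mae = mae(predict(spacings, ct, k_fit), obs_case_b)

print(f"calibrated k = {k_fit:.2f}")
print(f"MAE on calibration case: {in_sample_mae:.4f}")
print(f"MAE on unseen case:      {out_of_sample_mae:.4f}")
```

The calibrated constant reproduces the case it was tuned on almost perfectly, yet the same constant carries a clearly nonzero error on the second case. A low error after calibration therefore measures reconstruction of one state, not predictive skill.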
The reliance on a single diurnal case study further reinforces this limitation. The benchmark focuses on one specific atmospheric realization, characterized by low-level jet dynamics and stability transitions. While this provides a rich and controlled test case, it also introduces a structural risk of overfitting, as model adjustments may implicitly encode features of this particular event. As a result, performance improvements observed within the benchmark cannot be directly extrapolated to longer time scales such as annual energy production or probabilistic metrics like P50 or P90. This does not reduce the scientific value of the study, but it defines a clear boundary on the interpretation of model performance.
Another important aspect concerns the interpretability of the model ensemble. The manuscript acknowledges that subjective modeling choices, or the “human factor,” contribute significantly to variability in results. However, the absence of a structured control across key configuration elements, such as turbulence parameterizations, boundary layer schemes, and mesoscale setup, makes it difficult to attribute observed differences in performance to underlying physical mechanisms. In its current form, the ensemble represents a collection of plausible configurations rather than a fully controlled experiment, which limits its ability to isolate causal drivers of error.
The identification of inflow characterization as a primary determinant of accuracy is one of the most important outcomes of the study. From an industrial perspective, this result has an even stronger implication than currently stated. It indicates that inflow uncertainty effectively defines a lower bound on achievable model accuracy, independent of wake modeling fidelity. Even in cases where turbines are minimally affected by wakes, substantial errors persist, suggesting that upstream atmospheric state representation is the dominant source of uncertainty. This insight should be elevated from a supporting observation to a primary conclusion, as it has direct consequences for pre-construction assessment workflows and uncertainty quantification practices.
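
One way to see why inflow uncertainty bounds achievable accuracy is the cubic dependence of power on wind speed below rated conditions. The relation below is the standard textbook expression, not an equation from the manuscript, and it assumes constant air density $\rho$, rotor area $A$, and power coefficient $C_P$.

```latex
P = \tfrac{1}{2}\,\rho\,A\,C_P\,U^{3}
\quad\Longrightarrow\quad
\frac{\delta P}{P} \approx 3\,\frac{\delta U}{U}
```

Under these assumptions a 5 % error in the inflow wind speed alone produces a power error of roughly 15 % (about 16 % without the linearization, since $1.05^3 \approx 1.158$), before any wake model is applied.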
I also concur with the other reviewer that the benchmark exposes several fundamental research challenges, including the breakdown of classical boundary layer assumptions in stable conditions, the difficulty of generating realistic turbulent inflow for high-fidelity models, and the strong coupling between terrain-induced flow features and wake dynamics. Rather than reiterating those points, I would emphasize that these challenges collectively define limits on model transferability, not just areas requiring incremental improvement. This reinforces the need to distinguish models that reproduce a single constrained state from those that generalize across atmospheric regimes.
In conclusion, this is a high-quality and important contribution that should be published. However, the manuscript would benefit from a reframing that explicitly separates predictive capability from calibration-driven reconstruction, clarifies the limits of generalization inherent in a single-case study, and more strongly emphasizes inflow uncertainty as a primary limiting factor. Addressing these points would significantly strengthen both the scientific interpretation and the applicability of the results in operational contexts.

Citation: https://doi.org/10.5194/wes-2026-34-RC2
- AC2: 'Reply on RC2', Nicola Bodini, 13 Jun 2026
  
  The comment was uploaded in the form of a supplement: https://wes.copernicus.org/preprints/wes-2026-34/wes-2026-34-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/wes-2026-34-AC2

Supplement

https://doi.org/10.5194/wes-2026-34-supplement

Data sets

AWAKEN wind farm wake benchmark inputs Nicola Bodini https://zenodo.org/records/15623845

Viewed

Total article views: 672 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
449	206	17	672	51	23	27

Views and downloads (calculated since 24 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	49	25	1	75
Apr 2026	176	62	8	246
May 2026	143	95	5	243
Jun 2026	46	10	1	57
Jul 2026	35	14	2	51

Viewed (geographical distribution)

Total article views: 638 (including HTML, PDF, and XML). Of these, 638 have geography defined and 0 have unknown origin.

Latest update: 26 Jul 2026

Short summary

Predicting wind farm energy production is challenging because wind patterns are complex. We tested 16 different models against real data from a major field experiment to see which worked best. Surprisingly, the most expensive and detailed models were not always more accurate than simpler ones. We found that feeding models better weather data was the most effective way to improve accuracy. These results help the industry choose the right tools for designing more efficient wind farms.