This work is distributed under the Creative Commons Attribution 4.0 License.
Convolutional versus graph-based surrogate models for inter-farm wake prediction using multi-fidelity transfer learning
Abstract. Accurate prediction of wind farm wake interactions is important for energy yield assessment, as offshore wind farms increasingly operate in close proximity to one another. This work presents a systematic comparison of two neural network surrogate models for inter-farm wake deficit prediction: a convolutional neural network (CNN) based attention residual U-Net (ARU-Net) and a graph neural network (GNN) based graph neural operator (GNO). Both architectures are trained using multi-fidelity transfer learning: they are first pre-trained on low-fidelity engineering model simulations and then fine-tuned on high-fidelity Reynolds-averaged Navier-Stokes actuator wind farm (RANS-AWF) data. The models are evaluated on procedurally generated wind farm layouts spanning diverse farm sizes, turbine spacings, wind speeds, and ambient turbulence intensities. Both architectures achieve high prediction accuracy but exhibit complementary strengths: evaluated over the wake region (δw ≥ 10−3 m s−1) and averaged across both evaluation grids, the GNO achieves a lower RMSE (0.024 vs. 0.028 m s−1), while the ARU-Net attains a higher F1 score (0.98 vs. 0.91), reflecting its superior wake boundary capture. Transfer learning substantially benefits the ARU-Net, whereas it improves the GNO only marginally.
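A minimal sketch of how the wake-region metrics quoted in the abstract (RMSE and F1 over cells where δw ≥ 10−3 m s−1) could be computed; the masking and positive-class conventions here are assumptions, not taken from the paper:

```python
import numpy as np

def wake_metrics(pred, truth, threshold=1e-3):
    """RMSE and F1 over the wake region, taken here as cells where the true
    deficit is >= threshold (m/s), mirroring the delta_w >= 1e-3 criterion.
    The positive-class convention for F1 is an assumption."""
    mask = truth >= threshold                       # ground-truth wake region
    rmse = float(np.sqrt(np.mean((pred[mask] - truth[mask]) ** 2)))
    pred_mask = pred >= threshold                   # predicted wake region
    tp = np.sum(pred_mask & mask)                   # true positives
    fp = np.sum(pred_mask & ~mask)                  # false positives
    fn = np.sum(~pred_mask & mask)                  # false negatives
    f1 = float(2 * tp / (2 * tp + fp + fn))
    return rmse, f1

# Toy check on a synthetic wake patch: a perfect prediction gives RMSE 0, F1 1.
truth = np.zeros((128, 256))
truth[40:80, 50:200] = 0.5
rmse, f1 = wake_metrics(truth.copy(), truth)
```

On the toy field the prediction equals the truth, so the sketch returns RMSE 0 and F1 of 1 by construction.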
Competing interests: At least one of the (co-)authors is a member of the editorial board of Wind Energy Science.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
- RC1: 'Comment on wes-2026-54', Anonymous Referee #1, 21 Apr 2026
- RC2: 'Comment on wes-2026-54', Anonymous Referee #2, 21 Apr 2026
The manuscript entitled “Convolutional versus graph-based surrogate models for inter-farm wake prediction using multi-fidelity transfer learning” presents a systematic comparison of two neural-network surrogate models for inter-farm wake deficit prediction, i.e. an attention residual U-Net (ARU-Net) and a graph neural operator (GNO), both pre-trained on low-fidelity TurbOPark samples and fine-tuned on high-fidelity RANS-AWF samples. Averaged over the wake region, the GNO achieves a lower RMSE (0.024 vs. 0.028 m/s) while the ARU-Net attains a higher F1 score (0.98 vs. 0.91), and transfer learning is shown to benefit the ARU-Net substantially more than the GNO. The paper is a useful and timely contribution to the wind farm modelling literature.
The paper is clearly written, well structured, and it is enjoyable to read. The comments below can be used to improve the quantitative claims and fix a few production issues of the manuscript.
- Throughout the paper and in the abstract, the RMSE (0.024 vs. 0.028 m/s) is reported, but these absolute values do not tell the reader how good the models actually are; both appear highly accurate from these numbers alone. A more meaningful relative error could be designed to showcase the model performance over analytical models, for example the RMSE of the ML models normalized by the RMSE of the TurbOPark model. A key question to answer is: to what extent are the developed ML models better than engineering wake models? Answering this is key to demonstrating the value of this work.
- The PLayGen-generated layouts seem realistic and meaningful for real-world applications. However, for the evaluations, do all high-fidelity data samples fall into the cluster type? If so, it would be beneficial to add test results for samples of the single-string, parallel-string, and multiple-string types, for a more quantitative evaluation of performance and generalizability.
- The authors clearly describe the reason behind using different layout generation methods for the low-fidelity and high-fidelity datasets. Still, it would be helpful to comment on the pros and cons of PLayGen versus random turbine removal.
- The impact of this work could be enhanced by open-sourcing the dataset (at least the 200 test cases used in the paper). This is not mandatory, but it could serve as a useful benchmark for the community in the future.
- Figure 11 could benefit from adding the analytical model results as well, to illustrate how and where the ML models perform better.
- A general aspect: how are the reported evaluations and metrics affected by intra-farm wakes? Is their contribution to the reported RMSE and F1 scores minor?
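The normalized-error metric suggested in the first comment can be sketched in a few lines; the TurbOPark RMSE used below is purely illustrative, since no such value is reported on this page:

```python
def skill_vs_engineering(rmse_ml, rmse_engineering):
    """Reviewer-suggested relative metric: ML RMSE normalized by the RMSE of
    the engineering (TurbOPark) model against the same RANS-AWF reference.
    Ratios < 1 mean the surrogate beats the engineering model."""
    ratio = rmse_ml / rmse_engineering
    return ratio, 1.0 - ratio        # ratio and fractional improvement

# Illustrative numbers only: 0.024 m/s is the GNO RMSE from the abstract,
# while 0.060 m/s is a made-up TurbOPark RMSE (not reported on this page).
ratio, improvement = skill_vs_engineering(0.024, 0.060)
```

With these hypothetical inputs, a ratio of 0.4 would correspond to a 60 % reduction in RMSE relative to the engineering model, which is the kind of headline number the reviewer asks for.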
Some minor points:
- In abstract, “First, pre-trained on low-fidelity engineering model simulations, and second fine-tuned on high-fidelity Reynolds-averaged Navier-Stokes actuator wind farm (RANS-AWF) data.” This is not a sentence.
- In abstract, “Transfer learning substantially benefits the ARU-Net, while the GNO shows only marginal improvement.” The second half of the sentence is misleading. It should be “transfer learning just improves the GNO marginally”.
- In Introduction, Line 80-85, CNN-based models based on multi-fidelity data is briefly reviewed. Please note the omission of a related work “Multi-fidelity modeling of wind farm wakes based on a novel super-fidelity network”, where a different strategy is used for CNN-based wake modelling based on multi-fidelity data.
- Line 95-100, “pretraining” should be “pretrained”; “fine-tuning” should be “fine-tuned”.
- Why are the CNN grid (256x128) and the GNN grid (257x129) not exactly the same, but differ by one point in each dimension?
- Figure 3 appears later in the main texts than Figure 2. Please order the figures by their order of appearance.
- Figure 13 has no green color, but the main text refers to green. Please fix this, and check the other figures against the related text as well.
- Figure 12: (a-e) labels do not match the ones in the figure.
- Broken cross-references and links: line 562, “As mentioned in ??”; “…made available at Zenodo (?)”. Please fix.
- Line 600: “Regarding fine-tuning the full-tuning and frozen encoder strategy performed comparably for the ARU-Net” reads awkwardly; please rephrase.
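On the grid-size question above, one plausible (and purely speculative) explanation is a cell-centred versus node-centred convention, since an n-cell raster has n + 1 corner nodes; the paper should state the actual reason:

```python
import numpy as np

# Speculative illustration of the 256x128 vs 257x129 off-by-one: if the CNN
# predicts cell-centred values while the GNN places nodes on cell corners,
# the node grid has exactly one more point per dimension. This convention is
# an assumption, not something stated in the manuscript.
nx_cells, ny_cells = 256, 128
x_nodes = np.linspace(0.0, 1.0, nx_cells + 1)    # 257 corner coordinates
x_centres = 0.5 * (x_nodes[:-1] + x_nodes[1:])   # 256 cell-centre coordinates
```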
Citation: https://doi.org/10.5194/wes-2026-54-RC2
Data sets
Wind-Farm-GNO Jens Peter Schøler https://github.com/jenspeterschoeler/Wind-Farm-GNO
Model code and software
Wind-Farm-ARU-Net Frederik Peder Weilmann Rasmussen https://github.com/FPWRasmussen/Wind-Farm-ARU-Net
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 248 | 64 | 16 | 328 | 15 | 18 |
This manuscript presents a rigorous and timely comparison of ARU-Net and GNO surrogate models for inter-farm wake prediction using multi-fidelity transfer learning. The study addresses a critical challenge in offshore wind energy as farm clusters become denser. The key findings offer valuable and actionable insights for surrogate model selection in wind energy applications. The paper is well-structured and the experiments are reproducible. Addressing the requested clarifications on computational efficiency, out-of-distribution robustness, and certain methodological details will further strengthen this solid contribution. I recommend acceptance pending major revisions.
1. Introduction:
- The transition from the motivation for multi-fidelity transfer learning to the side-by-side introduction of CNN and GNN architectures (Page 3, Line 80) is somewhat disjointed. It would be helpful to state that CNNs and GNNs represent the two principal architectural paradigms currently being explored for implementing such multi-fidelity surrogates.
- The literature on inter-farm wake interactions in wind farm clusters should be reviewed in more depth, for example the work reported in Journal of Cleaner Production 2023, 396: 136529 and Energy Conversion and Management 2022, 267: 115897.
- Please ensure that all abbreviations are expanded upon first use. For instance, "SciML" in Section 2.1 should be written as "Scientific Machine Learning (SciML)" on its first occurrence.
2. Methodology:
- The manuscript states that PLayGen generates four distinct layout types for the low-fidelity dataset. However, it appears that the high-fidelity RANS-AWF dataset is restricted predominantly or exclusively to cluster layouts. Could the authors clarify the rationale for this restriction? Is it due to computational constraints, or is the cluster layout considered sufficiently representative for the physics corrections targeted by the high-fidelity data? This clarification is important for understanding the generalizability of the fine-tuned models.
- The ARU-Net is implemented in PyTorch, whereas the GNO is built on JAX/Jraph. Could the authors comment on whether this difference in underlying frameworks could introduce any biases in the reported wall-clock training times (Table 4)? For instance, data loading overhead or graph construction routines may differ in efficiency, potentially affecting the comparison of LoRA versus full fine-tuning efficiency.
- The authors clamp velocities exceeding U∞ to U∞ prior to the log-deficit transformation for the ARU-Net. This effectively forces the ARU-Net to ignore regions of flow acceleration caused by blockage effects. Could this design choice partially explain the observed differences in F1 scores and RMSE between the two models? A brief discussion of whether the GNO's ability (or inability) to predict such speed-ups affects the practical utility of the respective models would be insightful.
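A minimal sketch of the clamp-then-log-deficit transform discussed in the last comment, assuming a natural logarithm with a small offset eps to keep log(0) finite; the authors' exact formulation may differ:

```python
import numpy as np

def log_deficit(u, u_inf, eps=1e-6):
    """Clamp velocities exceeding u_inf, then log-transform the deficit.
    eps is a hypothetical regularizer; the paper's exact transform may differ."""
    u_clamped = np.minimum(u, u_inf)   # speed-ups from blockage are discarded here
    deficit = u_inf - u_clamped        # non-negative by construction
    return np.log(deficit + eps)

u = np.array([7.5, 8.0, 8.3])          # 8.3 m/s exceeds u_inf = 8.0 m/s
z = log_deficit(u, 8.0)                # the clamp maps 8.0 and 8.3 to the same value
```

Note how the clamp makes the 8.0 and 8.3 m/s inputs indistinguishable, which is precisely the loss of blockage-induced speed-up information that the comment raises.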
3. Results and Discussion:
- The manuscript states that the GNO benefits "only marginally" from transfer learning. While the relative improvement is indeed smaller than that observed for the ARU-Net, Table 4 shows a reduction in RMSE from 0.00696 (trained from scratch) to 0.00558 (LoRA fine-tuned), which represents a ~20% relative improvement. In the context of wind farm energy yield assessment, this may not be negligible. The authors should consider softening the language to reflect that while the relative gain is modest compared to the ARU-Net, pre-training still yields the best absolute performance for the GNO.
- The observation that LoRA fine-tuning offered no significant wall-clock time savings despite the substantial reduction in trainable parameters is counterintuitive. Could the authors elaborate on the likely bottleneck? Was the training time dominated by data loading, graph construction, or the forward pass through frozen layers?
- The ARU-Net performance degrades noticeably when interpolated from the CNN grid to the GNN grid. Please specify the interpolation method used (e.g., bilinear, bicubic). Additionally, comment on whether this sensitivity to grid resolution poses a practical concern for deployment scenarios where query points may not align with the training raster.
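For concreteness, a bilinear resampling from the 256x128 CNN raster onto the 257x129 GNN grid could look as follows; whether bilinear is what the authors actually used is exactly the open question raised above:

```python
import numpy as np

def bilinear_resample(field, nx_out, ny_out):
    """Bilinear resampling of a (ny, nx) field onto an (ny_out, nx_out) grid
    spanning the same domain. A plausible reading of the CNN (256x128) to
    GNN (257x129) transfer; the paper's actual scheme is unspecified."""
    ny, nx = field.shape
    xs = np.linspace(0, nx - 1, nx_out)            # targets in source index space
    ys = np.linspace(0, ny - 1, ny_out)
    x0 = np.clip(np.floor(xs).astype(int), 0, nx - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, ny - 2)
    tx = (xs - x0)[None, :]                        # fractional offsets in x
    ty = (ys - y0)[:, None]                        # fractional offsets in y
    f00 = field[np.ix_(y0, x0)]
    f01 = field[np.ix_(y0, x0 + 1)]
    f10 = field[np.ix_(y0 + 1, x0)]
    f11 = field[np.ix_(y0 + 1, x0 + 1)]
    return (f00 * (1 - tx) * (1 - ty) + f01 * tx * (1 - ty)
            + f10 * (1 - tx) * ty + f11 * tx * ty)

# A linear ramp f(y, x) = y + x is reproduced exactly by bilinear interpolation.
src = np.add.outer(np.arange(128.0), np.arange(256.0))
out = bilinear_resample(src, 257, 129)
```

Since bilinear interpolation is exact on linear fields, any resampling error concentrates where the field has curvature, e.g. near sharp wake boundaries, which may be relevant to the degradation the reviewer observes.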
4. Conclusion: Further research directions are suggested.
Recommendation:
1. Ensure all acronyms (e.g., SciML, PEFT, FiLM, SE, AG) are spelled out upon first occurrence in the main text.
2. The description of the connection between CNNs and GNNs in Section 2.1, while mathematically rigorous, is somewhat verbose. Consider condensing the high-level analogy to improve the pacing of the methodology section.
3. The manuscript would benefit from a schematic diagram in the Introduction illustrating the overall multi-fidelity transfer learning workflow. This would enhance the structural clarity for the reader.