Status: this preprint is currently under review for the journal WES.
Evaluating Yawed Turbine Transfer Functions from SCADA Data
Aidan Gettemy, Luke Abbatessa, and Nathan L. Post
Abstract. While nacelle transfer functions (NTFs) have been applied to correct free-stream wind speed measurements at the nacelle of turbines steered into the wind, less is known about the relationship between unsteered (non-yawed) and steered (yawed) wind measurements on turbines performing wake steering. As wake steering becomes an important tool for maximizing collective wind farm power, determining and correcting bias caused by prolonged yaw misalignment on wind measurements is critical to improving collective wind farm control and analysis. We propose a new approach for evaluating NTFs using SCADA data. Using SCADA and wake steering controller 1-minute statistics recorded over 3.5 months at a large utility-scale wind plant, we apply several consensus methods to estimate unsteered turbine measurements for steered turbines using neighboring turbines. Then a bagged tree regressor algorithm is trained to predict the unsteered wind direction, wind speed, and generator power using the measured SCADA data during wake steering using the best consensus estimate as the target value. With the NTFs estimated through the ML model, we define experimentally determined non-linear sensor bias in the measured data as a function of yaw angle.
Received: 31 Oct 2025 – Discussion started: 13 Nov 2025
The manuscript develops a machine learning model to predict the wind speed, wind direction, and power production that yawed turbines would experience if they were unyawed, using only SCADA data as inputs. To support this, it also develops a way to estimate these quantities based on data from neighboring turbines. This work is useful, as it addresses a real problem that will become more important as active wake steering becomes more widespread. Furthermore, the use of real SCADA data demonstrates that the method could be implemented in real-world scenarios. However, the developed model itself is not convincing, and the manuscript does not explain it well. A full overview of the issues I have is given below.
Overall, the manuscript needs major changes before being considered for publication. I recommend adding a simpler model for comparison and a major rewrite of some sections, as discussed below. I would be happy to review a revised version with these changes.
Major comments:
The authors use a machine learning algorithm as their model for predicting the unyawed QoIs based on the yawed measurements. They argue that this is necessary to enable non-linear fitting of the NTFs (line 174). However, later results such as figure 9 seem to give quasi-linear predictions for the majority of the inputs. It therefore seems to me that a simple linear regression between the measured and predicted wind vane and speed could perform well. Furthermore, literature already contains models relating yawed and unyawed power output as a function of turbine angle, such as a cosine power law. Compared to these simpler approaches, the full ML algorithm developed by the authors seems needlessly complex. Including such a simple approach and comparing it against the ML model would greatly enhance the value of this manuscript. If not, the authors should argue more strongly and convincingly why a fully non-linear model is needed, building on earlier literature.
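To make the suggested baselines concrete, here is a minimal sketch of what such a comparison could look like. The function names, the default exponent of 2.0, and the synthetic data are assumptions for illustration only; the exponent in particular would have to be fitted to the authors' data or taken from the literature.

```python
import numpy as np

def unyawed_power_cosine(p_yawed, yaw_deg, exponent=2.0):
    """Invert the cosine power law P_yawed = P_unyawed * cos(yaw)**exponent.

    The exponent is a placeholder (assumption), not a value from the manuscript.
    """
    yaw = np.radians(np.asarray(yaw_deg, dtype=float))
    return np.asarray(p_yawed, dtype=float) / np.cos(yaw) ** exponent

def fit_linear_baseline(measured, reference):
    """Least-squares fit of reference ~ slope * measured + intercept."""
    slope, intercept = np.polyfit(
        np.asarray(measured, dtype=float),
        np.asarray(reference, dtype=float),
        deg=1,
    )
    return float(slope), float(intercept)

# Synthetic illustration only: a noise-free linear relation between the
# measured (yawed) and reference (unyawed) wind speed.
rng = np.random.default_rng(0)
measured_speed = rng.uniform(5.0, 15.0, size=200)
reference_speed = 1.05 * measured_speed - 0.3
slope, intercept = fit_linear_baseline(measured_speed, reference_speed)
```

Scoring both baselines on the same held-out data as the ML model (for example by RMSE) would show directly how much the non-linear model adds.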
The bagged tree regression model used by the authors is not explained in the text, outside of a few sentences giving a conceptual description (lines 175-180). This is not sufficient, and makes it impossible to understand the work without consulting external sources. Please add a description of the algorithm, along with the core equations if possible.
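As an indication of the level of detail meant here, the textbook formulation of bagged regression trees can be stated in a few lines. This is the standard form and may differ from the authors' implementation; the symbols $B$, $\mathcal{D}_b$, $R_{b,m}$, and $c_{b,m}$ are introduced here for illustration.

```latex
% Training data: D = {(x_i, y_i)}, i = 1, ..., n
% Bootstrap replicates: D_b, b = 1, ..., B, each drawn from D with replacement (size n)

% Each base estimator is a regression tree that partitions the input space
% into M_b regions R_{b,m} and predicts a constant in each region:
\[
  f_b(x) = \sum_{m=1}^{M_b} c_{b,m}\, \mathbf{1}\{x \in R_{b,m}\},
  \qquad
  c_{b,m} = \frac{1}{|\{i : x_i \in R_{b,m}\}|} \sum_{i \,:\, x_i \in R_{b,m}} y_i ,
\]
% where the sums over i run over the samples in D_b.

% The bagged prediction is the average over the B trees:
\[
  \hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} f_b(x).
\]
```

Stating the number of trees, the splitting criterion, and the stopping rule alongside these equations would make the model reproducible.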
Overall, the manuscript is poorly written. My main complaints regarding the writing were:
The authors do not differentiate between the \citet and \citep LaTeX commands, resulting in each citation being a full name citation, without brackets. Normally, this would be a minor comment, but since it happens for every single citation it makes the first two sections of this paper unreadable. The paper cannot be accepted without correcting this.
The “wind vane” QoI is defined as the “relative wind direction sensor” (lines 54-55). It is not immediately clear whether this corresponds to wind direction, turbine yaw angle, or some combination of the two. This is complicated by the authors not being consistent in their usage of the term and reverting to “wind direction” several times throughout the manuscript (e.g. figure 2 caption, line 234, line 258, to name but a few). Please differentiate better between yaw and wind direction.
There is no single paragraph clearly stating the goals, methodology, and structure of the paper. The final sentences of sections 1 and 2 give the goal, but only in section 3 is the overall methodology outlined. The manuscript would be greatly improved by a paragraph in section 1 listing how the authors will tackle the problem and which sections of the paper discuss what.
The workflow of this model development is quite complicated and difficult to follow. There are no equations, so it’s hard to see where one model ends and another begins. The manuscript would benefit from some sort of data flow diagram, visualizing what inputs are used by which model and what outputs are produced.
Uncertainty is not considered throughout the manuscript. I suggest the authors incorporate this into their model, as this would greatly enhance its usability for industry applications. If this is not possible, the discussion around model uncertainty should be more in-depth.
Figures 9-11: Building on the previous comment, these plots would highly benefit from uncertainty estimates around the predictions. Based on the results presented here, I am not convinced that the non-linearity for high wind vane angles in figure 9 is not a statistical artifact, since based on figure 6 there does not seem to be much data available at these angles. Please discuss this further.
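One simple way to check whether the behaviour in sparsely populated vane-angle bins is significant is a percentile bootstrap on the per-bin mean bias. The sketch below is an illustration under stated assumptions: the function name, the bin sizes, and the synthetic bias values are hypothetical and not taken from the manuscript.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)

# Synthetic illustration only: same underlying bias and scatter, but one bin
# is well populated and the other (think: large vane angles) is sparse.
rng = np.random.default_rng(1)
dense_bin = rng.normal(loc=0.5, scale=1.0, size=500)
sparse_bin = rng.normal(loc=0.5, scale=1.0, size=15)

_, dense_lo, dense_hi = bootstrap_mean_ci(dense_bin)
_, sparse_lo, sparse_hi = bootstrap_mean_ci(sparse_bin)
```

Plotting such intervals as error bars or shaded bands in figures 9-11 would show whether the apparent non-linearity exceeds the sampling uncertainty.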
Figure 11: A direct comparison with a cosine power law would be very interesting here. Consider adding that to the paper.
Line 181: The authors mention that the ML model can select predictors or input variables. However, the only results they show for this seem to correspond to a sensitivity analysis of the different inputs, which many models, including simple linear regression, can provide. It is very unclear what the added capability of the developed model is in this context.
Line 215: The calculation of the turbulence intensity is not clearly explained. Are the “global average” and the deviation taken over all the turbines for a given timestamp? Or are these values calculated separately for each turbine? In case of the former, the calculated value might be more indicative of general flow non-uniformity throughout the farm, and not just turbulence intensity, which would greatly change the interpretation of some later results. For instance, the discussion on line 222 would be moot, since the high circular differences would then trivially correspond to flow non-uniformity. Furthermore, it’s not clear whether this is the standard method for estimating turbulence intensity based on SCADA data. If so, please refer to relevant literature.
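The two readings of the definition give different quantities, as the following sketch illustrates. The array names and shapes (rows are timestamps, columns are turbines) and the synthetic values are assumptions for illustration; the manuscript does not state which form is used.

```python
import numpy as np

def ti_per_turbine(speed_mean, speed_std):
    """Conventional TI per turbine and timestamp: sigma_u / mean_u.

    Both inputs have shape (n_timestamps, n_turbines) and hold the
    within-interval mean and standard deviation of wind speed.
    """
    return np.asarray(speed_std, dtype=float) / np.asarray(speed_mean, dtype=float)

def spread_across_farm(speed_mean):
    """Across-turbine spread per timestamp: std over turbines / mean over turbines.

    This measures spatial non-uniformity of the flow, not turbulence.
    """
    speed_mean = np.asarray(speed_mean, dtype=float)
    return speed_mean.std(axis=1) / speed_mean.mean(axis=1)

# Case A: spatially uniform but turbulent flow (synthetic).
uniform_mean = np.full((3, 4), 8.0)
uniform_std = np.full((3, 4), 0.8)
ti_a = ti_per_turbine(uniform_mean, uniform_std)   # 0.1 everywhere
spread_a = spread_across_farm(uniform_mean)        # 0 everywhere

# Case B: steady flow at each turbine, but a speed gradient across the farm.
gradient_mean = np.tile([6.0, 8.0, 10.0, 12.0], (3, 1))
gradient_std = np.zeros((3, 4))
ti_b = ti_per_turbine(gradient_mean, gradient_std)  # 0 everywhere
spread_b = spread_across_farm(gradient_mean)        # sqrt(5) / 9 per timestamp
```

In case B the across-farm measure is large even though no turbine sees any turbulence, which is why the interpretation of the later results depends on which definition was used.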
Technical issues and minor comments/suggestions:
Were the QoIs considered here (vane, speed, and power) the only available data? Or was this a selection by the authors?
Line 188: what is the difference between a predictor feature, a parameter, and a model input? Please be more consistent in the terminology.
Line 196: “base estimators” are never defined. I assume this is related to major comment #2, but a clear description would be appreciated.
Figure 2: This figure shows the data before the filtering operations. This should be clearly mentioned in the caption. As a suggestion, consider moving this figure to section 3.2, where the filtering is developed.
Line 212: “stable” conditions has the connotation of stratification effects. Since the authors are discussing non-uniform conditions, consider using “uniform” instead. The same comment applies to line 333, and to “steady” on line 218 (implies transient effects).
Figure 3: Turbine 456 is not plotted in figure 1. Why is this?
Line 239: “While difficult to read due to the sheer number of datapoints”: using a transparent scatter plot or a KDE would solve this issue, and both are used later in the manuscript. Please do the same here.
Figure 4: it’s not clear what value this figure has over figure 5.
Line 266 / Figure 6: Wind speeds below 5 m/s are filtered out. Please limit your plot axes to this same range.
Line 263: “Inferred by the NTF”: does this mean figure 6 shows the dataset the NTF is trained on? Or does it mean that figure 6 shows the output of the NTF? This wording is confusing.
Figure 6: The horizontal line corresponding to the desired 1 ratio is not indicated in several subfigures, which makes the results harder to read.
Lines 301-302: the figure is not misleading, as the second column solves the issue you mention here.
Figure 8: What is the third column showing? There is no reference to it in the caption or the text.
Figure 8, bottom left: Why is the distribution slanted? There are clear borders, and parallel lines of points within the distribution.
Wake steering intentionally yaws upwind wind turbines to decrease wake interactions with downwind turbines. However, measurement of wind conditions may be biased when a turbine is yawed. This work explores an approach to estimate reference wind conditions from raw wind turbine data. The result is a function defined by a machine learning algorithm that can be used to correct measurements on steered turbines.