Dear authors,
Thank you for your submission; I found it an interesting and relevant study. For clarity: I was brought in as a reviewer at a later stage of the reviewing process, and I have reviewed the paper 'as is', without reading your replies to the other reviewers. I am also familiar with your past work.
In general this paper is an ambitious effort to apply different strategies for fatigue damage estimation in the context of lifetime extension (or optimized operation) of offshore wind farms. It touches on several techniques and on several considerations to be made when using any of them. In essence the paper's primary objective is to act as a 'practical' review of several techniques rather than to introduce many novel solutions. I welcome such an effort. The presented techniques and (most) considerations are, in my opinion, relevant and on-topic, and it was interesting to see the results.
** Global comment **
My biggest concern with the paper in its current form is a downside of its ambitious nature. The large number of techniques used, intermingled with one another, sometimes makes it difficult to keep track of the exact methodology followed. This plays a role in particular when the performance and the uncertainty are discussed. Questions like "Are we looking at the error on the total damage (the sum over a year) or at the 10-minute errors on damage?" and "Do we retrain the ANN for every iteration of the 1000 uncertainty assessments?" regularly popped up.
At times it was hard to follow what the boxplots that were presented to us meant. I feel some extra clarification here and there on how exactly some methods are employed would be beneficial, in particular where uncertainties are addressed.
Perhaps it might be a consideration to drop certain segments (e.g. the one on computation time felt unnecessary in my book) or even to reduce the role of GPR (which, due to its early exit, I don't feel ended up contributing much to the paper) in favor of a clearer narrative. But I leave this up to the authors.
** Specific comment **
(Line numbers are used approximately. Some comments are resolved later in the paper; for those I add an UPDATE to my comment.)
- How do the authors relate their work to, e.g., that of the group of Prof. John D. Sørensen, who approaches this problem from a more probabilistic angle?
- One concern when working with 10-minute damages is the role of long-term fatigue cycles (e.g. see "Marsh e.a. - 2016 - Review and application of Rainflow residue process", recently confirmed in "Sadeghi e.a. - 2022 - Fatigue damage calculation of offshore wind turbines' long-term data considering the low-frequency fatigue dynamics"). Such information is obviously lost when binning, as the temporal sequence is lost; moreover, it can be difficult to replicate when drawing from a random distribution in forecasting. A similar concern applies to sequence effects. Do the authors have any thoughts on the matter?
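To make this concern concrete, here is a minimal sketch (simplified three-point rainflow counting with the residue counted as half cycles; the signal values and the damage exponent m = 3 are invented for illustration and are not taken from the paper) showing how a low-frequency cycle spanning several 10-minute blocks disappears when each block is counted separately:

```python
# Illustration of the long-term-cycle concern: a slow mean shift spanning
# several blocks is lost when rainflow counting is applied per block.
# Simplified three-point rainflow; residue counted as half cycles;
# all values and the damage exponent m are illustrative.

def turning_points(xs):
    tp = [xs[0]]
    for x in xs[1:]:
        if len(tp) >= 2 and (tp[-1] - tp[-2]) * (x - tp[-1]) > 0:
            tp[-1] = x            # still moving in the same direction
        elif x != tp[-1]:
            tp.append(x)
    return tp

def rainflow(tp):
    stack, cycles = [], []        # cycles: list of (range, count)
    for p in tp:
        stack.append(p)
        while len(stack) >= 3:
            x = abs(stack[-2] - stack[-1])
            y = abs(stack[-3] - stack[-2])
            if x < y:
                break
            cycles.append((y, 1.0))      # close the inner cycle
            stack[-3:] = [stack[-1]]     # drop its two points
    for a, b in zip(stack, stack[1:]):   # residue -> half cycles
        cycles.append((abs(b - a), 0.5))
    return cycles

def damage(cycles, m=3):          # Miner-style damage, D ~ sum n_i * S_i^m
    return sum(n * s**m for s, n in cycles)

# Three "10-minute" blocks: small cycles riding on a slow mean shift.
blocks = [[0, 2, 0, 2, 0], [10, 12, 10, 12, 10], [0, 2, 0]]
per_block = sum(damage(rainflow(turning_points(b))) for b in blocks)
full = damage(rainflow(turning_points(sum(blocks, []))))
print(per_block, full)   # the big 0->12->0 cycle only appears in `full`
```

Per-block counting only sees the small range-2 cycles; the concatenated signal additionally contains the large low-frequency cycle, which dominates the damage.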
- Line 132 : the authors state that mean values are not relevant for fatigue damage. In fact they are saying that mean stress effects are ignored; this is a (valid) assumption, but I wouldn't say mean values aren't relevant.
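For reference, the standard way mean values re-enter a stress-range-based analysis is through a mean-stress correction, e.g. the modified Goodman relation, which converts a cycle with amplitude S_a and mean S_m into an equivalent fully reversed amplitude:

```latex
% Modified Goodman mean-stress correction (S_u = ultimate strength):
% the equivalent fully reversed amplitude S_{ar} grows with the mean stress.
S_{ar} = \frac{S_a}{1 - S_m / S_u}, \qquad 0 \le S_m < S_u
```

So "ignoring mean stress effects" amounts to setting S_ar = S_a, which is an assumption rather than mean values being irrelevant.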
- Figures 8, 9: scatter plots of such a big dataset don't show the density differences in the data, which I assume is more densely packed towards the middle of the cloud. Perhaps consider a different style of plot (e.g. a 2D density or hexbin plot).
- How does 315° relate to the dominant wind direction?
- Line 215 : the discussion on the “horizontal part” is interesting, but it is perhaps phrased a bit too simplistically, given that there is a common term for it: ‘fatigue limit’. Perhaps a wording like “the S-N curve does not account for a fatigue limit in the material, i.e. a horizontal part at low stress cycles, …” is more appropriate, as I do appreciate the authors’ effort of linking it with such a strong visual cue.
- Line 280 : I personally find that the statement “fill up about 40% of the bins (cf. Fig.9)” overly dramatizes the problem. Yes, Figure 9 does show a lot of “empty space”, but in fairness those missing conditions also represent improbable conditions (e.g. very high waves at low wind speeds), so while 40% of the bins might be empty, their combined probability will lie well below 40%, probably even well below 1%. The story of empty bins is correct, but it might not have much impact on the total outcome due to these low probabilities.
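A quick synthetic sanity check of this intuition (all distributions here are invented for illustration and are not the paper's data): bin a correlated wind-speed/wave-height sample on a regular grid, then measure how much probability mass a much larger reference sample puts in the bins that came up empty.

```python
# Synthetic check: many bins can be empty while carrying almost no
# probability mass. Distributions and grid are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    wind = 10.0 * rng.weibull(2.0, n)            # wind speed, Weibull-like
    wave = 0.2 * wind + rng.gamma(2.0, 0.3, n)   # wave height, correlated with wind
    return wind, wave

wind_edges = np.arange(0.0, 31.0, 1.0)           # 30 wind-speed bins
wave_edges = np.arange(0.0, 8.5, 0.5)            # 16 wave-height bins

w, h = sample(52_560)                            # ~one year of 10-min records
H_year, _, _ = np.histogram2d(w, h, bins=[wind_edges, wave_edges])
empty = H_year == 0                              # bins empty after one year

W, V = sample(5_000_000)                         # large reference sample
H_ref, _, _ = np.histogram2d(W, V, bins=[wind_edges, wave_edges])

frac_empty = empty.mean()                        # fraction of empty bins
mass_empty = H_ref[empty].sum() / H_ref.sum()    # probability mass they carry
print(f"{frac_empty:.0%} of bins empty, carrying {mass_empty:.4%} of the mass")
```

With these (invented) correlated distributions, a large share of the grid is empty, yet the empty bins together carry only a tiny fraction of the probability mass, which is the point of my comment.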
- Eq 13: the authors opt to work with an absolute error on their prediction, but I’m curious to see the results with a sign. E.g. given the conservative nature of filling empty bins, do we end up with a conservative damage estimate? In contrast, the ANN doesn’t have this built-in conservatism; does it still produce conservative predictions? (UPDATE: results with a signed error are presented later.)
- Figure 14: just to be 100% sure, does each box in the boxplot contain only 13 samples?
- Figure 15 : is it possible to give the axes the same size as in the adjacent Figure 14? I feel it is fair to compare these results.
- In section 4.1.4: for the binning and functional extrapolation, I assume you are still using the wind speeds for binning and extrapolation? Perhaps this should be emphasized a bit more strongly.
- Section 4.1.4 : how are the “discrete” statuses fed into the ANN? As numeric values (operation=1, parked=2)?
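For what it's worth, the usual alternative to such ordinal codes is one-hot encoding, which avoids imposing an artificial ordering on the categories ('parked' is not 'twice' operation). A minimal sketch with invented status labels:

```python
# One-hot encoding of a categorical turbine status, avoiding the artificial
# ordering implied by codes like operation=1, parked=2. Labels are invented.
import numpy as np

statuses = ["operation", "parked", "operation", "idling"]
categories = sorted(set(statuses))        # fixed, reproducible category order
one_hot = np.array([[s == c for c in categories] for s in statuses],
                   dtype=float)           # one row per sample, one 1.0 per row

print(categories)
print(one_hot)
```

Each status then enters the network as its own input, rather than as a point on an arbitrary numeric scale.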
- Section 4.2 : I’m a bit confused about this whole segment. For the binned technique you explicitly mention that the processing time goes up dramatically because you need to fill in the empty bins. I conclude that in your 1000 predictions, for every prediction you refill the bins and recalculate the means of each bin (i.e. the training phase of the binning method, retraining the damage map for every iteration)? Do you do something similar for the ANN and GPR, i.e. each iteration drawing new training data and then fitting a new model to it, so basically training 1000 different models? Or are you just evaluating a single well-performing model 1000 times?
If you retrain every iteration, I understand why the computing time blows up. But I don’t think I would ever assess the uncertainty of an ANN by training 1000 models (without much oversight). I would rather have the uncertainty come from feeding a small number of ANNs 1000 different validation datasets (e.g. drawn using bootstrapping) and quantify the uncertainty from those predictions.
If you don’t train an ANN for every evaluation, then the comparison is perhaps unfair, as the training of the binning strategy consists of building these damage lookup tables.
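To sketch what I mean (with a toy stand-in for the trained model: a simple cubic least-squares fit instead of an ANN, and invented data): train once, then bootstrap-resample the validation set to obtain an uncertainty band on the total-damage error, without ever retraining.

```python
# Bootstrapping validation data through a single trained model, instead of
# retraining 1000 models. The "model" is a toy cubic least-squares fit
# standing in for the ANN; data and noise levels are invented.
import numpy as np

rng = np.random.default_rng(1)

n = 4000
wind = rng.uniform(3.0, 25.0, n)                              # invented wind speeds
dmg = 1e-6 * wind**3 * (1.0 + 0.1 * rng.standard_normal(n))   # noisy 10-min damage

# Train once on one half, keep the other half for validation.
X = np.vander(wind, 4)                               # cubic design matrix
train, val = np.arange(0, n, 2), np.arange(1, n, 2)
coef, *_ = np.linalg.lstsq(X[train], dmg[train], rcond=None)
pred_val = X[val] @ coef
dmg_val = dmg[val]

# Bootstrap the validation set (not the model) for an uncertainty band
# on the relative error of the *total* damage.
B = 1000
errs = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(val), len(val))        # resample with replacement
    errs[b] = pred_val[idx].sum() / dmg_val[idx].sum() - 1.0

lo, hi = np.percentile(errs, [2.5, 97.5])
print(f"95% interval on total-damage error: [{lo:+.3%}, {hi:+.3%}]")
```

The expensive step (training) happens once; the 1000 iterations are cheap evaluations on resampled validation data.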
- Figures 20 and 21 : how come the binned strategy is non-conservative in Figure 20 (predicted damage < actual damage), while in Figure 21 it is the other way around? Were the empty bins not conservative enough?
- Figure 25 : the role of when the measurements start is an interesting observation (and understandable). I assume a similar mechanism may also reduce the amount of training data required for the ANN to converge?
These are my comments, I hope they are clear and help to further improve the current work.
Kind regards,