Comment on wes-2021-90

This manuscript proposes a novel algorithm to identify and characterize wakes in 2D datasets. The technique builds on the "thresholding" approaches used to define wake regions. It is potentially powerful because it appears to circumvent the need to hand-select a threshold, which makes it easier to identify wakes in datasets with many samples. The authors demonstrate their algorithm on LES data and then apply it to one day of measurements in the North Sea. In this study, the automatic thresholding algorithm can significantly outperform a constant-threshold algorithm (e.g. Fig 15). It appears to perform better than "Gaussian" techniques at some times and worse at others. All in all, this new algorithm could be broadly applicable in future wake studies, and as such, I believe it has the potential to contribute significantly to scientific studies. However, I have a number of major concerns regarding the manuscript that prevent me from accepting it in its current state. I recommend Major Revisions and detail these concerns below.

expected that wakes would have been segmented differently. I thought that a bounding contour was going to be manually drawn around every wake. This means that the validation study very specifically tests the question "can we automatically pick the best threshold between 0 and 1?" rather than "is our new identification/characterization algorithm the best algorithm for the task?". This is presumably still an important question, but I would like the authors to state more clearly that the primary goal of the text is to automatically pick the best threshold for wake identification.
* The terminology regarding the different wake identification algorithms was ambiguous, which made it difficult to assess which technique was being used at a given time.
  * "Image processing" is a vast field of study, so it is ambiguous to refer to one wake identification algorithm simply as an "image processing" technique. Please be more specific about this terminology, and if this technique comes from the image processing field, provide a citation to that effect.
* Throughout the manuscript, please distinguish more clearly between when "wake identification" and when "wake characterization" is being carried out, in accordance with the cited Quon et al. (2020). Unless its meaning is obvious from the context, please refrain from using "wake detection".
  * e.g. Sec 4.2: The title of this section says "Wake detection". However, this subsection appears to describe wake *characterization* more than wake *identification* (and "detection" implies "identification").
* The structure of the different sections and subsections was unexpected, which made the narrative of the document more difficult to follow.
  * I recommend placing the wake identification methodology subsections next to each other in Section 4, and similarly grouping the wake characterization methodology subsections together.
  * I recommend that Section 2 be made into a subsection of Section 3.
* Is this truly an automatic algorithm that avoids the need to manually preprocess and manually segment the data?
  * L139-144: Is the despiking process manual or automatic? If it is manual, that potentially hinders the ability to apply the ATS algorithm to larger datasets.
  * Sec 4.1: This section has statements like "A similar point in the first derivative graph *can* be used" and "only a second derivative inflection point *can* be used". *Are* they used in your algorithm? Please be precise.
  * L265-266: The use of "preferably" makes me think that this algorithm is not automatic.
* The conclusion does not sufficiently summarize the manuscript.
  * Please summarize the strengths and weaknesses of the novel algorithm, especially relative to the other wake identification/characterization techniques.
* L329: How is the "free flow" wind speed calculated? Is this automatic or manual?

Minor Comments
* L3: What is the difference between a "wake pattern" and a "wake shape"?
* L18: It is a drastic oversimplification to state that the velocity deficit at 5D decreases to 20%. Consider citing review articles such as Stevens and Meneveau (2017) and Porté-Agel et al. (2020) in the introduction.
* L20: Please provide a citation for "The typical turbine spacing in the wind farms is usually 8D", as I believe onshore and offshore spacing differ.
* L38: Is it correct to say that lidar measurements are in situ? I think of lidars as remote sensing instruments.
* L63: This sentence implies that thresholding algorithms are always applied 6-8D downwind of a turbine.
* L69: What is the difference between an "image" and "processed wind speed data"? Is an "image" a raw version of the wind speed data?
* L116-117: I am surprised to hear that you encounter only two types of noise. I imagine there are more factors that confound your signal (e.g. solid objects). Perhaps it is more appropriate to say "two primary challenges that obfuscate the wake signal" rather than "noise". Also, please clarify what is meant by "high wind speed due to the measurement error".
* L121-124: I like this description of lidar limitations.
* L125: Where does the reference wind direction come from? A sonic anemometer on the mast? Wake angles?
* L150: What does "reference data" refer to?
* L163: What is "directional entropy"? How does it differ from Shannon entropy?
* L174-177: Are all the low-entropy scans indicative of strong crosswind effects? The "also" on L175 makes it seem like this is only true sometimes. Also, to clarify, are "cross wind corrupted" scans and "spiked data" scans the same? Finally, what is the difference between mostly blue scans (e.g. index 310) and partially blue scans (e.g. 405)?
* L182: Scans 1-50 and 301-375 show substantial decreases. Why are smaller ranges stated?
* Figure 5: Label the AV7 and AV10 wakes.
* L210: Could you please clarify why the "bimodal subset" shows a bimodal distribution of wind speeds but the "parallel wakes" subset does not? I don't understand why the "parallel wakes" subset would not also show one large peak representing the free flow and a second peak representing the wakes. Does this happen because the "bimodal" wakes largely stay within the field of view of the lidar whereas the "parallel" wakes leave it?
* Sec 3.4: What is the wind speed forcing of the LES?
* L232: You deal with at least two types of velocity distributions: unimodal (Fig 8a) and bimodal (Fig 8c). I am confused why you say you tend to see only one peak. As a reader at this point, I am wondering how the ATS algorithm performs on unimodal vs. bimodal data.
* L248-257: You say "We detect the threshold at the point where… the curvature approaches zero", but you also say that "the curvature graph tail may fluctuate and complicate the detection of zero curvature". I am confused: do you use the curvature, the second derivative, or the first derivative to select your threshold? Are you doing this on polynomial-fit curves or on the raw curves? Please clarify. This is one of the most important sections of the paper, but it is difficult to understand your algorithm from it.
* L264: What does "shape" mean?
* L269: Please demonstrate centerline detection on an instantaneous wake so that we can get a sense of how it behaves on observational data.
* L276: Remove the word "presumably".
* Fig 10c: "helper lines" and "intersections" have the same legend symbol.
* L283-284: Why is the wake direction ambiguous under these conditions? I would think that the wake direction is especially obvious in the "aligned" scenario.
* L285: I asked this earlier, but where does the reference wind direction come from?
* L287-288: This seems like a major limitation. In the conclusion, please note that your algorithm does not work on weak wakes.
* L295: "This feature" implies that the important feature is the transition from a single Gaussian to a double Gaussian. I believe you mean that the Gaussian distribution of wake deficit speeds is the important feature.
* L297: Can you roughly quantify "small plane inclination"?
* L319: Do reference wind directions and actual wake directions often differ significantly? If so, could you quantify a large discrepancy from either the literature or this analysis?
* L343: Is the centerline assumed to be a straight line (as is assumed in the Gaussian calculations), or is it allowed to turn?
* L356: Do additional errors occur or not?
* L371: Does "wake deficit" refer to the Gaussian method?
* L387: What does "it" refer to?
* Figs. 13 and 14: Impressive results!
* Figure 15: Why is the Corrupted data included here but excluded in Table 2?
* L430: Could you please remind me whether the ATS centerline detection only works for the closest wake shape? Is that why the algorithm doesn't detect the centerlines in the far wake?
* L450: The previous section also compared Gauss and the ATS. Please be more specific with this header.
* L465-466: I am surprised that the estimated wake direction deviates so strongly from the reference wind direction. You cite a few studies in the following paragraphs. How large are the deviations in those studies?
* L490: See my comment about L210. Also, does your quantification of "near wake" agree with standard definitions of "near wake"? Please also state why someone would want to distinguish between "near wake" and "far wake" within a lidar scan.
* L505-506: As written, a reader would not understand that you developed a new preprocessing methodology. Please make that clearer.
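To illustrate why the comment on L248-257 asks which quantity is used: for a curve y(x), the signed curvature is κ = y'' / (1 + y'^2)^(3/2), which equals the second derivative only where the slope y' is zero. The minimal Python sketch below is an illustration only. The function name `derivative_and_curvature` and the toy curve y = x^2 are my own assumptions, not taken from the manuscript. It computes both quantities by finite differences and shows that the curvature decays toward zero as the slope grows even though y'' stays constant. A "curvature approaches zero" criterion and a "second derivative approaches zero" criterion can therefore select different thresholds.

```python
import numpy as np


def derivative_and_curvature(x, y):
    """Return y', y'' and the signed curvature of a sampled curve y(x).

    Hypothetical helper for illustration; not the manuscript's ATS code.
    Derivatives use second-order finite differences (np.gradient with
    edge_order=2); curvature is kappa = y'' / (1 + y'^2) ** 1.5.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dy = np.gradient(y, x, edge_order=2)    # first derivative
    d2y = np.gradient(dy, x, edge_order=2)  # second derivative
    kappa = d2y / (1.0 + dy**2) ** 1.5      # signed curvature
    return dy, d2y, kappa


if __name__ == "__main__":
    # Toy curve y = x^2 (an assumption): y'' is constant, but the curvature
    # shrinks as the slope grows, so the two zero-criteria disagree.
    x = np.linspace(0.0, 3.0, 31)
    dy, d2y, kappa = derivative_and_curvature(x, x**2)
    for xi, d2, k in zip(x[::10], d2y[::10], kappa[::10]):
        print(f"x={xi:.1f}  y''={d2:.3f}  curvature={k:.4f}")
```

Applying the same helper to both the raw curve and a polynomial fit of it could also indicate how sensitive the selected threshold is to the fluctuating tail mentioned on L248-257.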