Leveraging Signal Processing and Machine Learning for Automated Fault Detection in Wind Turbine Drivetrains

Jamil, Faras; Peeters, Cédric; Verstraeten, Timothy; Helsen, Jan

doi:https://doi.org/10.5194/wes-2024-114

Preprints

25 Nov 2024

Status: a revised version of this preprint was accepted for the journal WES and is expected to appear here in due course.

Abstract. Wind energy is a sustainable renewable energy source, but it faces significant operating and maintenance costs. This research proposes a hybrid fault detection method that combines physical domain knowledge with machine learning models to provide an overview of the health of wind turbine drivetrain components. Signal processing indicators are computed from raw vibration signals measured by accelerometers placed strategically on the drivetrain components. Because each indicator is sensitive to only certain types of faults, this step produces so many indicators that manual monitoring becomes infeasible. Machine learning models are therefore trained on the signal processing indicators together with SCADA data. Normal behaviour modelling is used to learn the healthy operation of the machine from data collected while the machine was healthy. The trained normal behaviour models then label each indicator as healthy or faulty over time. The labelled state-of-the-art signal processing indicators are fused to provide a high-level health status overview of the drivetrain components. This fusion condenses the relevant information from many condition indicators, which is valuable when managing multiple components per turbine across an entire wind farm. The proposed hybrid fault detection method is validated on an offshore wind farm with multiple years of condition monitoring data. It provides a high-level health overview that non-expert wind farm operators can readily understand, while experts can carry out a comprehensive inspection when a more detailed fault analysis is needed.
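
The pipeline described in the abstract (learn healthy behaviour, label each indicator, fuse the labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the discussion on this page names Bayesian Ridge Regression as the normal behaviour model, whereas the sketch substitutes ordinary least squares on a single operating variable to stay dependency-free. The function names, the 3-sigma threshold, the one-sided (increase-only) alarm rule, and all numbers are assumptions made for illustration.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x (stand-in for the paper's regressor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def train_nbm(load, indicator):
    """Learn healthy behaviour: expected indicator value given an operating variable."""
    a, b = fit_line(load, indicator)
    residuals = [yi - (a + b * xi) for xi, yi in zip(load, indicator)]
    return {"a": a, "b": b, "sigma": stdev(residuals)}

def label(nbm, load, indicator, k=3.0):
    """True (faulty) where a sample exceeds the healthy prediction by more than k sigma."""
    return [
        (yi - (nbm["a"] + nbm["b"] * xi)) > k * nbm["sigma"]
        for xi, yi in zip(load, indicator)
    ]

def fuse(labels_per_indicator):
    """High-level health indicator: number of indicators flagged at each time step."""
    return [sum(flags) for flags in zip(*labels_per_indicator)]

# Hypothetical healthy-period data: a normalised SCADA load and two vibration indicators.
healthy_load = [0.2, 0.4, 0.6, 0.8, 1.0, 0.3, 0.5, 0.7]
healthy_rms = [1.42, 1.79, 2.21, 2.58, 3.00, 1.61, 1.99, 2.40]
healthy_kurtosis = [3.09, 3.22, 3.30, 3.41, 3.48, 3.15, 3.26, 3.34]

rms_model = train_nbm(healthy_load, healthy_rms)
kurtosis_model = train_nbm(healthy_load, healthy_kurtosis)

# Hypothetical monitoring period at constant load; the indicators drift upwards late on.
new_load = [0.5, 0.5, 0.5, 0.5]
new_rms = [2.00, 2.01, 2.50, 2.60]
new_kurtosis = [3.25, 3.24, 3.26, 3.90]

health = fuse([
    label(rms_model, new_load, new_rms),
    label(kurtosis_model, new_load, new_kurtosis),
])
print(health)  # [0, 0, 1, 2]
```

Each entry of `health` counts how many indicators deviate from their learned healthy behaviour at that time step, so an operator can watch one rising number instead of inspecting every indicator trend separately.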

Received: 12 Sep 2024 – Discussion started: 25 Nov 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on wes-2024-114', Anonymous Referee #1, 07 Dec 2024
The paper presents a robust hybrid methodology for fault detection. However, several areas require further improvement to enhance its clarity, rigor, and general applicability.

Clarity of Presentation: Figures, especially 6–9, are vital for understanding the results but are difficult to interpret due to anonymized axes and scales. While confidentiality is necessary, providing normalized or generalized labels (e.g., "Normalized Time," "Fault Indicator Count") would make the visualizations more accessible without compromising sensitive data. Additionally, the Bayesian Ridge Regression model is only briefly introduced; a more detailed justification for its use compared to other regression techniques would improve the methodological section.

Technical Gaps: While SCADA data is mentioned as an input, the integration process with vibration data is underexplored. Elaborating on preprocessing steps or synchronization challenges would provide a more comprehensive picture. Moreover, the paper does not report performance metrics such as precision, recall, or false alarm rates for the NBMs or hybrid system. Quantitative validation is crucial to assessing the method’s practical value.

Scalability Concerns: The approach requires training individual NBMs for each condition indicator across different operating regimes, which is computationally intensive and may be impractical for large-scale deployment. Although the paper acknowledges this issue, it does not propose concrete steps to address it. Exploring techniques like transfer learning, ensemble models, or feature reduction could mitigate this limitation.

Limited Discussion of Related Work: The paper could better contextualize its contributions by comparing the proposed method to other state-of-the-art approaches, such as fully data-driven deep learning systems. Highlighting the relative strengths and weaknesses of the hybrid approach would provide a clearer perspective on its novelty and utility.

Broader Applicability: The reliance on both vibration and SCADA data limits the applicability of the method to turbines equipped with these systems. The paper would benefit from discussing adaptations for turbines with only partial data availability or exploring how the method might generalize to other industrial systems.

Results Presentation: While the paper discusses case studies qualitatively, it lacks numerical summaries or statistical analysis of the detection performance. Providing such data would strengthen the evidence for the method’s effectiveness. Additionally, the scalability of the method across a fleet of wind turbines needs to be demonstrated more convincingly.

In summary, the paper provides valuable contributions to wind turbine condition monitoring but requires refinements in clarity, technical depth, and validation to maximize its impact. These revisions would ensure a more comprehensive and convincing presentation of the hybrid methodology.
Citation: https://doi.org/10.5194/wes-2024-114-RC1
- AC1: 'Reply on RC1', Faras Jamil, 03 Mar 2025
  
  We thank the Reviewer for taking the time to provide detailed feedback on our manuscript. Our response is provided below:
  
  Feedback:
  
  Clarity of Presentation: Figures, especially 6–9, are vital for understanding the results but are difficult to interpret due to anonymized axes and scales. While confidentiality is necessary, providing normalized or generalized labels (e.g., "Normalized Time," "Fault Indicator Count") would make the visualizations more accessible without compromising sensitive data. Additionally, the Bayesian Ridge Regression model is only briefly introduced; a more detailed justification for its use compared to other regression techniques would improve the methodological section.
  
  Response:
  The data used to validate the method is private, and we are not allowed to share specific details regarding the machine configurations or the monitoring period. Therefore, anonymised axes are used to present the results. However, as highlighted in the feedback, the high-level health indicator in each result figure (specifically the (i)th subfigure) shows the fault indicator and counts the number of condition indicators that are exhibiting a fault trend at a given time.
  Various regression techniques were evaluated before selecting Bayesian Ridge Regression. The details regarding the comparison of different regression models are now included in the manuscript, specifically in the "Normal Behavior Models" subsection (line 220).
  
  Feedback:
  
  Technical Gaps: While SCADA data is mentioned as an input, the integration process with vibration data is underexplored. Elaborating on preprocessing steps or synchronization challenges would provide a more comprehensive picture. Moreover, the paper does not report performance metrics such as precision, recall, or false alarm rates for the NBMs or hybrid system. Quantitative validation is crucial to assessing the method’s practical value.
  
  Response:
  As we are using real wind farm data, determining the exact time a fault is introduced into the system is challenging, making it difficult to provide a precise quantifiable performance measurement of the method. However, as suggested by the reviewer, we have conducted a performance analysis on data from 10 wind turbines (in the performance analysis subsection on line 310), based on the condition monitoring report and fault cases confirmed by technicians through manual inspection. This study provides a more quantitative assessment of fault prediction, including evaluation metrics such as precision, recall, and F1-score.
  The integration of SCADA data and vibration data is described in the "Hybrid Condition Monitoring Fault Detection Method" section (line 130).
  
  Feedback:
  
  Scalability Concerns: The approach requires training individual NBMs for each condition indicator across different operating regimes, which is computationally intensive and may be impractical for large-scale deployment. Although the paper acknowledges this issue, it does not propose concrete steps to address it. Exploring techniques like transfer learning, ensemble models, or feature reduction could mitigate this limitation.
  Response:
  
  The current approach offers a detailed analysis of each condition indicator but requires significant computational resources since separate NBMs must be trained for each indicator in every operating regime. To address this challenge, future work will focus on developing an explainable deep learning model that can adapt to all condition indicators simultaneously. This will provide both a high-level assessment of the turbine’s overall health and a detailed analysis of individual fault trends while significantly reducing computational demands.
  
  Feedback:
  
  Limited Discussion of Related Work: The paper could better contextualize its contributions by comparing the proposed method to other state-of-the-art approaches, such as fully data-driven deep learning systems. Highlighting the relative strengths and weaknesses of the hybrid approach would provide a clearer perspective on its novelty and utility.
  Response:
  The related work section has been expanded by adding more information about SCADA-based condition monitoring in the introduction section (line 50).
  
  Feedback:
  
  Broader Applicability: The reliance on both vibration and SCADA data limits the applicability of the method to turbines equipped with these systems. The paper would benefit from discussing adaptations for turbines with only partial data availability or exploring how the method might generalize to other industrial systems.
  Response:
  This research focuses on vibration-based condition monitoring and presents a framework for assessing the health of wind turbine drivetrain components using vibration data. While SCADA-based methods exist in the literature, vibration-based approaches are generally more reliable. Additional details on SCADA-based condition monitoring have been incorporated into the introduction section (line 50). Moreover, future work will focus on extracting operational information directly from high-quality vibration data to reduce reliance on SCADA data. This update is included in the discussion section (line 370).
  
  Feedback:
  
  Results Presentation: While the paper discusses case studies qualitatively, it lacks numerical summaries or statistical analysis of the detection performance. Providing such data would strengthen the evidence for the method’s effectiveness. Additionally, the scalability of the method across a fleet of wind turbines needs to be demonstrated more convincingly.
  Response:
  To quantitatively evaluate the performance of the proposed method, we have added a confusion matrix in the performance analysis subsection (line 310), along with evaluation metrics such as precision, recall, and F1-score, to assess fault detection effectiveness.
  
  Citation: https://doi.org/10.5194/wes-2024-114-AC1
RC2:
'Comment on wes-2024-114', Anonymous Referee #2, 21 Dec 2024

Would be interesting to see one of these studies beyond simulation.

Citation: https://doi.org/10.5194/wes-2024-114-RC2
- AC3: 'Reply on RC2', Faras Jamil, 04 Mar 2025
  
  We sincerely thank the Reviewer for taking the time to review our research and provide valuable feedback. Below is our response:
  Feedback:
  
  Would be interesting to see one of these studies beyond simulation.
  
  Response:
  
  The proposed method has been validated using real wind farm data, with results presented using anonymized axes due to confidentiality constraints. Additionally, to provide a quantitative evaluation, a performance analysis (detailed in the performance analysis subsection on line 310) was conducted on datasets from 10 wind turbines. This analysis includes a confusion matrix, which gives a quantitative assessment of the method.
  
  Citation: https://doi.org/10.5194/wes-2024-114-AC3
RC3:
'Comment on wes-2024-114', Anonymous Referee #3, 23 Dec 2024

Health monitoring, or fault detection, is one of the most significant research directions for improving the management of offshore wind turbines. This manuscript aims to develop a hybrid fault detection method that combines physical domain knowledge with machine learning models to provide an overview of the health of wind turbine drivetrain components. It provides a solid methodology for evaluating wind turbine health, and the performance of the model is validated on real data collected from a wind farm.
Overall, I consider this manuscript well written, with a clear overview of the problem addressed and the contribution provided. The methodology used in the manuscript is well designed and presented. I have one suggestion below that I would like the authors to discuss and address before publication.
In the model validation section, the authors only showcase the prediction results in Figures 7–9 without further statistics on the results. It is not straightforward to get a sense of the model performance. Hence, could the authors provide a more quantified measurement of the performance of the trained machine learning model? For example, what is the percentage of correctly predicted faults? What is the ratio of false positives to false negatives?

Citation: https://doi.org/10.5194/wes-2024-114-RC3
- AC2: 'Reply on RC3', Faras Jamil, 03 Mar 2025
  
  We thank the Reviewer for providing constructive feedback on our research. Our response is provided below:
  Feedback:
  
  In the model validation section, the authors only showcase the prediction results in Figures 7–9 without further statistics on the results. It is not straightforward to get a sense of the model performance. Hence, could the authors provide a more quantified measurement of the performance of the trained machine learning model? For example, what is the percentage of correctly predicted faults? What is the ratio of false positives to false negatives?
  
  Response:
  As suggested by the reviewer, we have conducted a performance analysis on data from 10 wind turbines (in the performance analysis subsection on line 310), based on the condition monitoring report and fault cases confirmed by technicians through manual inspection. This study offers a more quantitative assessment of fault prediction, including false alarms and missed fault detections.
  
  Citation: https://doi.org/10.5194/wes-2024-114-AC2
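
The author replies above refer to a confusion matrix with precision, recall and F1-score, plus false alarms and missed detections. As a reminder of how those quantities relate, here is a small sketch that derives them from the four confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the results reported in the manuscript.

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and false alarm rate from confusion-matrix counts.

    tp: faults correctly flagged        fp: false alarms
    fn: missed faults                   tn: healthy cases correctly left unflagged
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),         # share of alarms that were real faults
        "recall": ratio(tp, tp + fn),            # share of real faults that raised an alarm
        "f1": ratio(2 * tp, 2 * tp + fp + fn),   # harmonic mean of precision and recall
        "false_alarm_rate": ratio(fp, fp + tn),  # share of healthy cases that raised an alarm
    }

# Hypothetical counts for illustration only, NOT the paper's results.
metrics = detection_metrics(tp=6, fp=2, fn=2, tn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With field data the counts themselves are the hard part: as the authors note, the exact onset of a fault is rarely known, so the labels have to come from condition monitoring reports and technician-confirmed inspections.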

Viewed

Total article views: 472 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
333	114	25	472	28	41

Views and downloads (calculated since 25 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	41	11	2	54
Dec 2024	67	27	4	98
Jan 2025	27	9	0	36
Feb 2025	25	3	1	29
Mar 2025	42	23	4	69
Apr 2025	41	14	2	57
May 2025	16	1	3	20
Jun 2025	42	17	8	67
Jul 2025	27	5	1	33
Aug 2025	5	4	0	9

Latest update: 07 Aug 2025

Short summary

A hybrid fault detection method is proposed, which combines physical domain knowledge with machine learning models to automatically detect mechanical faults in wind turbine drivetrain components. It offers detailed insights for experts while giving operators a high-level overview of the machine's health to assist in planning effective maintenance strategies. It was validated on multiple years of wind farm data and the potential faults were accurately predicted, which was confirmed by experts.

