the Creative Commons Attribution 4.0 License.
Minimum Open Data Subset for Wind Power Prediction
Abstract. Accurate wind power prediction is required for grid integration of renewables, minimizing curtailment of renewable energy, and performing resource assessments. Prior research has explored the use of numerical weather prediction, reanalysis datasets, and observational data in power prediction and resource assessment applications. Observational data is spatially limited and often proprietary. Reanalysis datasets are available globally, but have a large spatial resolution and therefore do not capture the effects of complex geography well. Numerical weather prediction simulations allow for high spatial resolution flow models, but require significant processing resources and computational time. This work combines historical wind power production data, observational data, MERRA-2 reanalysis, and WRF model data at three wind farms in Ontario, Canada to determine the optimal data source, combination of data sources, and variables for prediction of wind power using a random forests model. Results show that a model combining select data from all three data sources, including a combination of wind speed, time, and other weather variables, improves predictive performance by up to 57 % over the benchmark power curve model. Analysis of feature importance shows that aggregating wind speed allows the model to make better use of additional weather features. The minimum subset of input data for the best performing model, which achieves a mean absolute error (MAE) of 0.071 across all sites, consists of averaged wind speed, temperature, wind direction, pressure, air density, and time variables (hour, day and month).
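The error metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes power is normalized by rated capacity (which would make an MAE of 0.071 dimensionless) and that improvement is measured as the relative reduction in error against the benchmark. Neither assumption is confirmed by the abstract, and the sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def improvement_pct(benchmark_error, model_error):
    """Relative error reduction over a benchmark, in percent (assumed definition)."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Hypothetical normalized power values (fraction of rated capacity),
# NOT data from the paper.
actual = [0.10, 0.40, 0.80, 0.60]
benchmark = [0.20, 0.30, 0.60, 0.80]  # e.g. a power curve model
model = [0.15, 0.35, 0.75, 0.65]      # e.g. a random forest model

gain = improvement_pct(mae(actual, benchmark), mae(actual, model))
print(f"MAE improvement over benchmark: {gain:.1f} %")
```

RMSE weights large errors more heavily than MAE, so the two metrics can rank models differently when a few large misses dominate.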
Status: closed
RC1: 'Comment on wes-2025-29', Anonymous Referee #1, 12 Apr 2025
Dear authors and editor, while this manuscript is of fair technical value, I do not consider it suitable for publication in WES in its current state for the following reasons:
- objectives and context: the objectives of Wind Resource and Energy Yield Assessments (WRA and EYA) are primarily to estimate the long-term net production of a wind farm. The paper focuses on metrics (MAE and RMSE) which are not commonly used for WRA and EYA. A revised version of the paper should use the commonly accepted framework, so as to bring value to practitioners in the field.
- methods: the methods used by practitioners (measurement data analysis, long-term correction, spatial extrapolation, gross and net (including wakes) energy yield, operational losses) are not discussed in this paper. A revised version of the paper should at least discuss these methods and how the proposed alternatives add value compared to these traditional methods. See for reference the WES article https://wes.copernicus.org/articles/6/311/2021/ and the reports from the Wind Plant Performance Prediction (WP3) NREL project. Also, a revised version of the manuscript should delve much more into details regarding the wind climate (macro-meso-micro) at every wind farm, through a description of the sites in terms of large-scale forcing, orography, roughness, atmospheric stability, wind turbine layout and characteristics. This information (derived from model and measurements) should be used to discuss the differences in model results (the model works differently for the third wind farm). See an example such as https://doi.org/10.1127/metz/2021/1068. Lastly, a revised version of the paper should compare several reanalyses (at least ERA5 should be added).
- readability: the manuscript needs to be shortened, in particular the part on the literature review. I understand the work derives from a Master Thesis, for which the style used (technical report) is fine. But here for a WES paper, the manuscript should be more concise and clearly insist on the novelty/added value of the proposed approach compared to existing frameworks.
All the best,
Rémi Gandoin
Citation: https://doi.org/10.5194/wes-2025-29-RC1
RC2: 'Comment on wes-2025-29', Anonymous Referee #2, 23 Apr 2025
"general comments"
This paper investigates different ways of estimating the historical wind power available in a region. It uses the recorded actual wind power generation from three test locations in Canada to evaluate the skill of different approaches. Three different sources of atmospheric data are used to generate wind power: meteorological stations, WRF forecast, and MERRA-2 reanalysis. The skill of the sources is considered separately, as well as in combination. The importance of different atmospheric variables is also estimated. The model with the best skill scores used 10 variables drawn from all sources, emphasizing the importance of using atmospheric variables beyond simply wind speed. This is a well-written paper, with a logical and clear methodology, and presents interesting results and discussion.
"specific comments"
- As the paper uses inputs from reanalysis and met stations, it should be made clear that this method cannot be used for operational forecasting.
- There are well-published biases within the MERRA-2 reanalysis dataset. As the authors mention in their conclusions, it would probably have been better to have used the ERA5 dataset instead, or, ideally, to have compared the skill of the two for the particular locations used in this study.
- 10 metre met station wind speed was extrapolated (not interpolated) to hub height using a logarithmic wind speed profile. There are many issues with this, including the importance of atmospheric stability; this could be highlighted as a source of uncertainty.
- Is MERRA-2 also extrapolated from 10m wind? Note: ERA5 records wind speed at 10m and 100m, which is an advantage here.
- Was WRF wind output at turbine hub height?
- The paper is based on only four months of data, which is quite a short time period. Results would be more rigorous if at least one year of simulations were generated.
- Table 4: min and max don’t mean much for wind direction. Perhaps something else, like standard deviation, would be more relevant?
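As an illustration of the extrapolation questioned in the comments above (not part of the referee's text), a neutral-stability logarithmic profile can be sketched as below. The roughness length `z0`, the 80 m hub height, and the function name are hypothetical choices made for this sketch, and the manuscript's actual implementation may differ. The formula ignores atmospheric stability, which is exactly the source of uncertainty the referee raises.

```python
import math


def extrapolate_log_law(u_ref, z_ref, z_target, z0):
    """Extrapolate wind speed from z_ref to z_target with the neutral log law.

    u(z_target) = u_ref * ln(z_target / z0) / ln(z_ref / z0)

    All heights and the roughness length z0 are in metres; both heights
    must exceed z0 for the logarithms to be positive.
    """
    if z0 <= 0 or z_ref <= z0 or z_target <= z0:
        raise ValueError("heights must be greater than a positive z0")
    return u_ref * math.log(z_target / z0) / math.log(z_ref / z0)


# Hypothetical inputs: 5 m/s measured at 10 m, open terrain (z0 = 0.03 m),
# extrapolated to an assumed 80 m hub height.
u_hub = extrapolate_log_law(u_ref=5.0, z_ref=10.0, z_target=80.0, z0=0.03)
print(f"Estimated hub-height wind speed: {u_hub:.2f} m/s")
```

Because available power scales roughly with the cube of wind speed below rated output, a modest error in the extrapolated speed translates into a much larger error in predicted power.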
"technical corrections"
- Table 1 does not seem to be referred to in the text.
- Line 168: typo “acquired from the and”
- Table 4: typo: column heading “WRF1” should, I think, be “WF1”
- Line 194: missing reference “acquired from the and”provided power curves (?)”
- Line 222: typo: “Figure 3a” should be “Figure 4a”
- Line 265: perhaps change “Scenario 5” to “the last column”?
Conor Sweeney, UCD, Dublin, Ireland.
Citation: https://doi.org/10.5194/wes-2025-29-RC2
RC3: 'Comment on wes-2025-29', Anonymous Referee #3, 14 May 2025
The paper combines WRF, a random forest model and meteorological stations to improve predictions of wind farm output. This may have relevance for power forecasting. However, in its current shape I unfortunately have to reject the paper. Many figures, tables and the appendices are never discussed: all information provided in the paper should contribute to a clear story and, if not discussed, should be removed. I suggest summarizing the results more clearly, for example by aggregating the results of the three wind farms and presenting fewer numbers, or by reducing the number of variables from the random forest model that you discuss. If they contribute so marginally to better results, just remove them from figures and tables and mention them briefly in the text instead. The title also has to be revised (see below). If you put "open data" in the title, readers would expect some data to be available as part of the paper, but it seems the "open" part refers only to the data used in the paper. If the data were available somewhere such as Zenodo, the paper would be more useful.
Title: The title does not cover the actual contents of the paper. The main ingredients are mesoscale modelling, the random forest model and observations. That should be made clear from the title somehow.
l3: The paper is only relevant for power forecasting and not for wind resource assessment. For wind resource assessment the use of 10 m masts would never be accepted and the small improvements in RMSE are only relevant for power forecasting. For wind resource assessment you will need to take aspects like long-term correction etc. into account as well.
l4: large spatial resolution -> coarse spatial resolution
Fig 1: Add the abbreviations WF1-3 also in the map, this makes the text easier to follow.
l116: What does noa mean? Better to cite the paper instead: https://journals.ametsoc.org/view/journals/bams/104/8/BAMS-D-21-0075.1.xml
l155: How did you include the positions of current wind farms? Which power and thrust curves were used?
l168: acquired from the and verified -> acquired from what?
Table 4: this table contains too much information that is not discussed in the text. Remove or condense.
l226: you mean the impact of precipitation was limited or the actual precipitation (in mm?) was limited?
l205: Why is it a problem if it is always wind speed that is classified as most important? This shows that wind speed IS simply the most important for predicting power. Why would you artificially add more variables to your problem if they don't add more information?
Citation: https://doi.org/10.5194/wes-2025-29-RC3
Viewed
- HTML: 151
- PDF: 44
- XML: 11
- Total: 206
- BibTeX: 12
- EndNote: 14