Minimum Open Data Subset for Wind Power Prediction

von Zuben, Elizabeth; Schell, Kristen R.

doi:10.5194/wes-2025-29

Preprints

20 Mar 2025

Status: this discussion paper is a preprint. It has been under review for the journal Wind Energy Science (WES). The manuscript was not accepted for further review after discussion.

Abstract. Accurate wind power prediction is required for grid integration of renewables, minimizing curtailment of renewable energy, and performing resource assessments. Prior research has explored the use of numerical weather prediction, reanalysis datasets, and observational data in power prediction and resource assessment applications. Observational data is spatially limited and often proprietary. Reanalysis datasets are available globally, but have a large spatial resolution and therefore do not capture the effects of complex geography well. Numerical weather prediction simulations allow for high spatial resolution flow models, but require significant processing resources and computational time. This work combines historical wind power production data, observational data, MERRA-2 reanalysis, and WRF model data at three wind farms in Ontario, Canada to determine the optimal data source, combination of data sources, and variables for prediction of wind power using a random forests model. Results show that a model combining select data from all three data sources, including a combination of wind speed, time, and other weather variables, improves predictive performance by up to 57 % over the benchmark power curve model. Analysis of feature importance shows that aggregating wind speed allows the model to make better use of additional weather features. The minimum subset of input data for the best performing model, which achieves a mean absolute error (MAE) of 0.071 across all sites, consists of averaged wind speed, temperature, wind direction, pressure, air density, and time variables (hour, day and month).
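The two headline figures in the abstract, the mean absolute error and the improvement over the power curve benchmark, can be illustrated with a minimal sketch. The numbers below are made up for illustration and are not taken from the paper; the sketch also assumes that power is expressed as a fraction of rated capacity, which the abstract does not state. The function names are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_improvement(mae_benchmark, mae_model):
    """Fractional reduction in MAE relative to the benchmark (0.5 means 50 % lower)."""
    return (mae_benchmark - mae_model) / mae_benchmark


# Illustrative values only (assumed to be power as a fraction of rated capacity).
observed = [0.10, 0.40, 0.75, 0.90]
power_curve = [0.20, 0.30, 0.95, 0.70]  # hypothetical benchmark predictions
model = [0.15, 0.35, 0.85, 0.80]        # hypothetical model predictions

mae_bench = mean_absolute_error(observed, power_curve)
mae_model = mean_absolute_error(observed, model)
improvement = relative_improvement(mae_bench, mae_model)
print(f"benchmark MAE={mae_bench:.3f}, model MAE={mae_model:.3f}, "
      f"improvement={improvement:.0%}")
```

Read this way, an improvement of "up to 57 %" would mean that the model's error is at most 57 % lower than the benchmark's, under the assumption that the improvement is measured on the same error metric.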

Received: 17 Feb 2025 – Discussion started: 20 Mar 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1: 'Comment on wes-2025-29', Anonymous Referee #1, 12 Apr 2025

Dear authors and editor, while this manuscript is of fair technical value, I do not consider it worth publication in WES in its current state, for the following reasons:
- objectives and context: the objectives of Wind Resource and Energy Yield Assessments (WRA and EYA) are primarily to estimate the long-term net production of a wind farm. The paper focuses on metrics (MAE and RMSE) which are not commonly used for WRA and EYA. A revised version of the paper should use the commonly accepted framework, so as to bring value to practitioners in the field.
- methods: the methods used by practitioners (measurement data analysis, long-term correction, spatial extrapolation, gross and net (including wakes) energy yield, operational losses) are not discussed in this paper. A revised version of the paper should at least discuss these methods and how the proposed alternatives add value compared to these traditional methods. See for reference the WES article https://wes.copernicus.org/articles/6/311/2021/ and the reports from the Wind Plant Performance Prediction (WP3) NREL project. Also, a revised version of the manuscript should delve much more into details regarding the wind climate (macro-meso-micro) at every wind farm, through a description of the sites in terms of large-scale forcing, orography, roughness, atmospheric stability, wind turbine layout and characteristics. This information (derived from model and measurements) should be used to discuss the differences in model results (the model works differently for the third wind farm). See, for example, https://doi.org/10.1127/metz/2021/1068. Lastly, a revised version of the paper should compare several reanalyses (at least ERA5 should be added).

- readability: the manuscript needs to be shortened, in particular the part on the literature review. I understand the work derives from a Master's thesis, for which the style used (technical report) is fine. But here, for a WES paper, the manuscript should be more concise and clearly emphasize the novelty/added value of the proposed approach compared to existing frameworks.
All the best,
Rémi Gandoin

Citation: https://doi.org/10.5194/wes-2025-29-RC1
RC2: 'Comment on wes-2025-29', Anonymous Referee #2, 23 Apr 2025

"general comments"
This paper investigates different ways of estimating the historical wind power available in a region. It uses the recorded actual wind power generation from three test locations in Canada to evaluate the skill of different approaches. Three different sources of atmospheric data are used to generate wind power: meteorological stations, WRF forecast, and MERRA-2 reanalysis. The skill of the sources is considered separately, as well as in combination. The importance of different atmospheric variables is also estimated. The model with the best skill scores used 10 variables drawn from all sources, emphasizing the importance of using atmospheric variables beyond simply wind speed. This is a well-written paper, with a logical and clear methodology, and presents interesting results and discussion.

"specific comments"
As the paper uses inputs from reanalysis and met stations, it should be made clear that this method cannot be used for operational forecasting.

There are well-published biases within the MERRA-2 reanalysis dataset. As the authors mention in their conclusions, it would probably have been better to have used the ERA5 dataset instead, or, ideally, to have compared the skill of the two for the particular locations used in this study.

10 metre met station wind speed was extrapolated (not interpolated) to hub height using a logarithmic wind speed profile. There are many issues with this, including the importance of atmospheric stability; this could be highlighted as a source of uncertainty.
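For readers unfamiliar with the extrapolation referred to here, the following is a minimal sketch of the standard neutral-stability logarithmic wind profile, u(z) = u_ref · ln(z / z0) / ln(z_ref / z0). It is not the authors' implementation: the function name, the roughness length of 0.03 m, and the 80 m hub height are assumptions chosen only for illustration. The sketch ignores atmospheric stability, which is the source of uncertainty the referee points to.

```python
import math


def extrapolate_log_law(u_ref, z_ref, z_target, z0):
    """Scale a wind speed from height z_ref to z_target with the neutral log law.

    u_ref    : wind speed measured at z_ref (m/s)
    z_ref    : measurement height above ground (m)
    z_target : height to extrapolate to, e.g. hub height (m)
    z0       : surface roughness length (m); site-specific, assumed here
    """
    if z0 <= 0 or z_ref <= z0 or z_target <= z0:
        raise ValueError("heights must exceed a positive roughness length")
    return u_ref * math.log(z_target / z0) / math.log(z_ref / z0)


# Hypothetical example: 5 m/s at a 10 m mast, extrapolated to an assumed
# 80 m hub height over terrain with an assumed roughness length of 0.03 m.
u_hub = extrapolate_log_law(u_ref=5.0, z_ref=10.0, z_target=80.0, z0=0.03)
print(f"estimated hub-height wind speed: {u_hub:.2f} m/s")
```

Because the result depends on the assumed roughness length and on neutral conditions, small changes in z0 or in stability shift the hub-height estimate, and the cubic dependence of power on wind speed amplifies that shift.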

Is MERRA-2 also extrapolated from 10m wind? Note: ERA5 records wind speed at 10m and 100m, which is an advantage here.

Was WRF wind output at turbine hub height?

The paper is based on only four months of data, which is quite a short time period. Results would be more rigorous if at least one year of simulations were generated.

Table 4: min and max don’t mean much for wind direction. Perhaps something else, like standard deviation, would be more relevant?
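The point about wind direction can be made concrete with a short sketch. Direction is an angle, so 350° and 10° are only 20° apart even though their minimum and maximum suggest a wide spread. One common alternative, shown here as an assumption rather than as the referee's or the authors' specific choice, is the circular mean and circular standard deviation computed from the mean resultant vector. The function name is hypothetical.

```python
import math


def circular_mean_and_std(directions_deg):
    """Return (circular mean, circular standard deviation) in degrees."""
    if not directions_deg:
        raise ValueError("need at least one direction")
    n = len(directions_deg)
    sin_mean = sum(math.sin(math.radians(d)) for d in directions_deg) / n
    cos_mean = sum(math.cos(math.radians(d)) for d in directions_deg) / n
    # Mean resultant length: 1 for identical directions, near 0 for scattered ones.
    r = min(math.hypot(sin_mean, cos_mean), 1.0)
    mean_deg = math.degrees(math.atan2(sin_mean, cos_mean)) % 360.0
    if r == 0.0:
        return mean_deg, float("inf")  # directions cancel out; spread undefined
    std_deg = math.degrees(math.sqrt(-2.0 * math.log(r)))
    return mean_deg, std_deg


# Two northerly directions on either side of 0 degrees (illustrative values).
mean_deg, std_deg = circular_mean_and_std([350.0, 10.0])
arithmetic_mean = (350.0 + 10.0) / 2  # 180 degrees, i.e. due south: misleading
print(f"circular mean={mean_deg:.1f}, circular std={std_deg:.1f}, "
      f"arithmetic mean={arithmetic_mean:.1f}")
```

The circular mean lands at north (0°, which may print as 360.0 because of floating-point rounding), while the arithmetic mean points south.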

"technical corrections"
Table 1 does not seem to be referred to in the text.

Line 168: typo “acquired from the and”

Table 4: typo: column heading “WRF1” should, I think, be “WF1”

Line 194: missing reference “acquired from the and”provided power curves (?)”

Line 222: typo: “Figure 3a” should be “Figure 4a”

Line 265: perhaps change “Scenario 5” to “the last column”?

Conor Sweeney, UCD, Dublin, Ireland.
Citation: https://doi.org/10.5194/wes-2025-29-RC2
RC3: 'Comment on wes-2025-29', Anonymous Referee #3, 14 May 2025

The paper combines WRF, a random forest model and meteorological stations to improve predictions of wind farm output. This may have relevance for power forecasting. However, in its current shape I unfortunately have to reject the paper. Many figures, tables and the appendices are never discussed: all information provided in the paper should contribute to a clear story and, if not discussed, should be removed. I suggest summarizing the results more clearly, for example by aggregating the results of the three wind farms and presenting fewer numbers, or by reducing the number of random forest variables that you discuss. If they contribute so marginally to better results, just remove them from figures and tables and mention them briefly in the text instead. The title also has to be revised (see below). If you put "open data" in the title, you would expect some data to be available as part of the paper, but it seems like the open part is only about the data that is used in the paper. If the data were available somewhere on Zenodo, the paper would be more useful.
Title: The title does not cover the actual contents of the paper. The main ingredients are mesoscale modelling, the random forest model and observations. That should be made clear from the title somehow.

l3: The paper is only relevant for power forecasting and not for wind resource assessment. For wind resource assessment the use of 10 m masts would never be accepted and the small improvements in RMSE are only relevant for power forecasting. For wind resource assessment you will need to take aspects like long-term correction etc. into account as well.

l4: large spatial resolution -> coarse spatial resolution

Fig 1: Add the abbreviations WF1-3 also in the map, this makes the text easier to follow.

l116: What does noa mean? Better to cite the paper instead: https://journals.ametsoc.org/view/journals/bams/104/8/BAMS-D-21-0075.1.xml

l155: How did you include the positions of current wind farms? Which power and thrust curves were used?

l168: acquired from the and verified -> acquired from what?

Table 4: this table contains too much information that is not discussed in the text. Remove or condense.

l226: Do you mean that the impact of precipitation was limited, or that the actual precipitation (in mm?) was limited?

l205: Why is it a problem if it is always wind speed that is classified as most important? This shows that wind speed IS simply the most important variable for predicting power. Why would you artificially add more variables to your problem if they don't add more information?

Citation: https://doi.org/10.5194/wes-2025-29-RC3

Viewed

Total article views: 1,181 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
795	351	35	1,181	49	97

Views and downloads (calculated since 20 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	50	10	1	61
Apr 2025	55	21	6	82
May 2025	47	17	4	68
Jun 2025	25	9	6	40
Jul 2025	14	18	1	33
Aug 2025	78	13	1	92
Sep 2025	285	7	0	292
Oct 2025	33	18	0	51
Nov 2025	25	47	8	80
Dec 2025	48	38	0	86
Jan 2026	63	17	2	82
Feb 2026	26	61	1	88
Mar 2026	46	75	5	126

Latest update: 26 Mar 2026

Short summary

Wind energy production is directly related to several meteorological variables, but such information is not publicly available at the wind farm location. This research seeks to understand what combination of publicly available meteorological data can accurately predict wind energy production. We find that a combination of publicly available data on wind speed, temperature, wind direction, air pressure and density, and date and time can improve wind power prediction by up to 57 %.

