Minimum Open Data Subset for Wind Power Prediction
Abstract. Accurate wind power prediction is required for integrating renewables into the grid, minimizing curtailment of renewable energy, and performing resource assessments. Prior research has explored the use of numerical weather prediction (NWP), reanalysis datasets, and observational data in power prediction and resource assessment applications. Observational data are spatially limited and often proprietary. Reanalysis datasets are available globally but have a coarse spatial resolution and therefore do not capture the effects of complex geography well. NWP simulations allow for high-resolution flow models but require significant processing resources and computational time. This work combines historical wind power production data, observational data, MERRA-2 reanalysis, and WRF model data at three wind farms in Ontario, Canada, to determine the optimal data source, combination of data sources, and variables for predicting wind power with a random forest model. Results show that a model combining select data from all three input sources (observations, reanalysis, and WRF), including wind speed, time, and other weather variables, improves predictive performance by up to 57 % over the benchmark power curve model. Feature importance analysis shows that aggregating wind speed allows the model to make better use of additional weather features. The minimum subset of input data for the best-performing model, which achieves a mean absolute error (MAE) of 0.071 across all sites, consists of averaged wind speed, temperature, wind direction, pressure, air density, and time variables (hour, day, and month).
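As an illustration of the evaluation described above, the sketch below assembles the minimum feature subset from one record, evaluates a piecewise-linear benchmark power curve, and computes the MAE and the relative improvement of a model over that benchmark. It is a minimal sketch under stated assumptions, not the authors' pipeline: the field names in FEATURES, the power-curve table, and normalizing power to the 0..1 range are all hypothetical. The random forest itself is not shown; the feature vectors would be its inputs (for example, to scikit-learn's RandomForestRegressor).

```python
from bisect import bisect_left

# Hypothetical field names for the minimum feature subset listed in the abstract.
FEATURES = ["wind_speed_avg", "temperature", "wind_direction",
            "pressure", "air_density", "hour", "day", "month"]


def feature_vector(record):
    """Return the minimum-subset features from one record (a dict) as floats."""
    return [float(record[name]) for name in FEATURES]


def power_curve(speed, speeds, powers):
    """Benchmark power curve: linear interpolation over a tabulated curve.

    `speeds` must be sorted ascending. Power is assumed normalized (0..1).
    Speeds outside the table are clamped to the end points; a real curve
    would also model cut-out behaviour.
    """
    if speed <= speeds[0]:
        return powers[0]
    if speed >= speeds[-1]:
        return powers[-1]
    i = bisect_left(speeds, speed)  # speeds[i-1] < speed <= speeds[i]
    x0, x1 = speeds[i - 1], speeds[i]
    y0, y1 = powers[i - 1], powers[i]
    return y0 + (y1 - y0) * (speed - x0) / (x1 - x0)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(mae_model, mae_benchmark):
    """Fractional MAE reduction of a model relative to the benchmark."""
    return (mae_benchmark - mae_model) / mae_benchmark


if __name__ == "__main__":
    # Hypothetical power-curve table, for illustration only.
    curve_speeds = [3.0, 6.0, 9.0, 12.0, 25.0]
    curve_powers = [0.0, 0.2, 0.7, 1.0, 1.0]
    print(power_curve(7.5, curve_speeds, curve_powers))
    print(relative_improvement(0.05, 0.10))
```

Expressing the gain as a relative MAE reduction is one common way to read a "percent improvement over a benchmark"; the abstract does not state which metric the 57 % figure is computed on, so this is an assumption.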