Decreasing Wind Speed Extrapolation Error via Domain-Specific Feature Extraction and Selection

Model uncertainty is a significant challenge in the wind energy industry and can lead to mischaracterization of millions of dollars’ worth of wind resource. Machine learning methods, notably deep artificial neural networks (ANNs), are capable of modeling turbulent and chaotic systems and offer a promising tool to produce high-accuracy wind speed forecasts and extrapolations. This paper uses data collected by profiling Doppler lidars over three field campaigns to investigate the efficacy of using ANNs for wind speed vertical extrapolation in a variety of terrains, and quantifies the role of domain knowledge in ANN extrapolation accuracy. A series of 11 meteorological parameters (features) is used as ANN inputs, and the resulting output accuracy is compared with that of both standard log law and power law extrapolations. It is found that extracted non-dimensional inputs, namely turbulence intensity, current wind speed, and previous wind speed, are the features that most reliably improve the ANN’s accuracy, providing up to a 65% and 52% increase in extrapolation accuracy over log law and power law predictions, respectively. The volume of input data is also found to be important for achieving robust results. One test case is analyzed in depth using dimensional and non-dimensional features, showing that feature non-dimensionalization drastically improves network accuracy and robustness for sparsely sampled atmospheric cases.

well when tasked with wind speed forecasting on a variety of timescales (Bilgili et al., 2007; More and Deo, 2003; Chen et al., 2019). However, wind speed measurements from meteorological towers or remote sensors often must be extrapolated in space as well as time to reach the location of interest (e.g., turbine hub height), adding another layer of forecasting complexity.
In a recent study, Mohandes and Rehman (2018) found that neural networks in conjunction with lidar data can accurately extrapolate wind speeds over flat terrain using wind speeds measured below the targeted height (extrapolation height). However, it is unclear whether this finding holds for more complex terrain. Knowledge of meteorological conditions and site characteristics could be essential for optimal extrapolation accuracy. In the same vein, Li et al. (2019) found that adding turbulence intensity as an input greatly improves wind speed forecasting accuracy, suggesting that the input feature set may be highly influential for machine learning tools applied to meteorological problems. Following such developments, the present study focuses on proper extraction and selection of meteorological features across multiple sites for a neural network designed for vertical extrapolation of wind speed. The novelty of this study lies in addressing the following questions: First, is it possible to improve wind speed extrapolation accuracy under various terrain conditions using neural networks by invoking physics-based input features? Second, which atmospheric features should be selected to optimize the model's prediction capabilities?
Section 2 provides an introduction to neural networks and the list of input features used. Section 3 briefly describes the campaign sites and instrumentation, as well as measurement uncertainty. Section 4 presents the findings of the investigation, and Section 5 provides analysis and discussion. Concluding remarks are given in Section 6.

Model Overview

Neural Network Architecture
Artificial neural networks (ANNs) are a machine learning framework wherein a multi-layered network of nodes computes an output from a given set of inputs while eliciting the (often hidden) patterns underlying a given data structure. A classic feed-forward ANN layout is given in Fig. 1. These networks are loosely inspired by the workings of the human brain and consist of four main elements: a layer of user-defined inputs, one or more hidden layers, an output layer, and the weighted connections that link each layer to the layers before and after it. Each layer is made up of nodes, which gather information from the previous layer, apply an activation function, and pass the transformed information to the next layer. ANNs with multiple hidden layers (deep neural networks) are often much better at unearthing patterns in complex, nonlinear systems. These networks learn best when supplied with large datasets and a well-selected feature set. Poor feature extraction or selection can lead the network to find patterns that are misleading or spurious. In other words: garbage in, garbage out.
ANNs first go through a training phase in which they learn the structure of a system. Batches of training data are fed into the network, which produces an output. This output is then compared to the actual output, which is known a priori. The resulting error is backpropagated through the network, starting from the last layer and ending at the first, to obtain the gradient of the error with respect to every weight, and the weights between layers are then updated via stochastic gradient descent (SGD) to reduce the error. This process is repeated for as many iterations (epochs) as desired, with the network seeing all training data in each iteration.
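As a minimal sketch of the training cycle just described (forward pass, error, backpropagation from the last layer to the first, SGD weight update, repeated over epochs), the NumPy code below trains a one-hidden-layer tanh network. It is illustrative only: for simplicity it uses plain minibatch SGD on a squared-error loss, whereas the study uses MAPE, the Adam optimizer, and a deeper network; the function name `train_sgd` and all hyperparameter values are hypothetical.

```python
import numpy as np


def train_sgd(x, y, n_hidden=8, lr=0.05, epochs=200, batch_size=16, seed=0):
    """Train a one-hidden-layer tanh network with minibatch SGD on squared error.

    x has shape (n_samples, n_inputs) and y has shape (n_samples, 1).
    Returns the weights and the full-dataset loss before training and after each epoch.
    """
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)

    def full_loss():
        pred = np.tanh(x @ W1 + b1) @ W2 + b2
        return float(np.mean((pred - y) ** 2))

    history = [full_loss()]
    for _ in range(epochs):
        # One epoch: every training sample is seen once, in shuffled minibatches.
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            # Forward pass.
            h = np.tanh(xb @ W1 + b1)
            y_hat = h @ W2 + b2
            # Backward pass, last layer first: gradients of the mean squared error.
            d_out = 2.0 * (y_hat - yb) / len(xb)
            grad_W2, grad_b2 = h.T @ d_out, d_out.sum(axis=0)
            d_hidden = (d_out @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)**2
            grad_W1, grad_b1 = xb.T @ d_hidden, d_hidden.sum(axis=0)
            # SGD update: step each weight against its gradient.
            W2 -= lr * grad_W2
            b2 -= lr * grad_b2
            W1 -= lr * grad_W1
            b1 -= lr * grad_b1
        history.append(full_loss())
    return (W1, b1, W2, b2), history
```

Calling `train_sgd(x, y)` on a smooth synthetic target should yield a loss history that falls from its initial value as the weights adapt.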
At the end of each iteration, a set of validation data is given to the network to ensure that the network is not over-fitting the training data. At the end of training, the network is given a third set of data, known as testing data, that the ANN has not seen before. The network's performance is characterized by its prediction accuracy on the testing data, quantified by an error or loss metric C. This study uses the mean absolute percentage error (MAPE, Eq. 1) as the loss metric because it is easy to compare with industry metrics and is insensitive to non-dimensionalization (Appendix A):

$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\frac{|y_i-\hat{y}_i|}{y_i} \tag{1}$$
where N is the number of observations, y_i the observed output, and ŷ_i the network output. The ANN used here has a framework similar to that of Mohandes and Rehman (2018), containing four hidden layers with 30, 15, 10, and 5 nodes, respectively, descending to the final output layer, which has a single node. Research on the effect of increasing the number of hidden layers shows that deeper networks are better able to approximate highly complex systems (Aggarwal, 2018). The number of hidden layers generally depends on the number of input and output arguments used in the ANN as well as the expected nonlinearity in the system, and the appropriate depth is usually determined through trial and error. Increasing the number of hidden layers in our case, however, did not yield higher extrapolation accuracy.
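Eq. 1, and its insensitivity to non-dimensionalization (derived in Appendix A), can be checked numerically with the plain-Python sketch below. It is an illustration rather than the study's code: `mape` and `mse` are hypothetical helper names, the wind speeds are made-up values, and the mean squared error is included only for contrast.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error (Eq. 1), in percent; y_true must be nonzero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 / len(y_true) * sum(abs(y - yh) / abs(y) for y, yh in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error, shown only to contrast its behaviour under rescaling."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)


# Dimensional wind speeds (m/s) and the same values divided by a reference speed a.
u_true, u_pred = [8.0, 10.0], [8.4, 9.5]
a = 8.0
un_true = [u / a for u in u_true]
un_pred = [u / a for u in u_pred]

print(mape(u_true, u_pred), mape(un_true, un_pred))  # both about 5.0 (%)
print(mse(u_true, u_pred), mse(un_true, un_pred))    # second is smaller by a factor a**2
```

Dividing both series by a scales each absolute error and each observed value by the same factor, so every ratio in Eq. 1 is unchanged, whereas the squared error shrinks by a².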

There were also two dropout layers, located after the first and second hidden layers, that protect against over-fitting. The activation function in each hidden layer was the hyperbolic tangent, while the output layer had a linear activation function.
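The architecture described above (four tanh hidden layers of 30, 15, 10, and 5 nodes, dropout after the first two hidden layers, and a single linear output node) can be sketched as a NumPy forward pass. This is a hypothetical illustration, not the study's implementation; the dropout rate, the weight initialization, and the names `init_params` and `forward` are assumptions.

```python
import numpy as np

HIDDEN_SIZES = (30, 15, 10, 5)  # hidden-layer widths given in the text


def init_params(n_inputs, hidden_sizes=HIDDEN_SIZES, seed=0):
    """Random (weights, biases) for each hidden layer plus a one-node output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, 1]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, params, dropout_rate=0.0, rng=None):
    """Forward pass for a batch x of shape (n_samples, n_inputs).

    Dropout is applied after the first and second hidden layers only when an
    rng is supplied (i.e. during training); inverted scaling keeps the expected
    activation unchanged, so no rescaling is needed at test time.
    """
    a = x
    for layer, (W, b) in enumerate(params[:-1]):
        a = np.tanh(a @ W + b)  # hidden nodes: weighted sum followed by tanh
        if rng is not None and layer < 2 and dropout_rate > 0.0:
            keep = rng.random(a.shape) >= dropout_rate
            a = a * keep / (1.0 - dropout_rate)
    W_out, b_out = params[-1]
    return a @ W_out + b_out  # linear output node
```

Without an `rng` the pass is deterministic, which matches how dropout is switched off when evaluating validation and testing data.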
MAPE was used as the cost function C. The Adam optimization algorithm (Kingma and Ba, 2014), an extension of SGD, was used for training, and each trial was stopped after at most 1,000 iterations through the entire training dataset. All datasets were split into three distinct pieces: training data (50%), validation data (25%), and testing data (25%). To minimize bias, the data were randomly re-split before each of the 10 runs for every test case. From these 10 runs, the best, mean, and standard deviation of the testing MAPE were recorded. Tests were performed with different input features and different heights at various site locations to confirm that bias from a given site and/or measurement height was removed.
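The evaluation protocol above (a fresh random 50/25/25 split before each of 10 runs, then the best, mean, and standard deviation of the testing MAPE) might be organized as follows. This is a plain-Python sketch: `split_indices` and `summarize_runs` are hypothetical helpers, the use of the sample standard deviation is an assumption, and the training step itself is left out.

```python
import random
import statistics


def split_indices(n_samples, seed, fractions=(0.50, 0.25, 0.25)):
    """Shuffle sample indices and cut them into training, validation and testing sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def summarize_runs(test_mapes):
    """Best (lowest), mean and sample standard deviation of the testing MAPE."""
    return min(test_mapes), statistics.mean(test_mapes), statistics.stdev(test_mapes)


# A new random split for each of the 10 runs of one test case.
splits = [split_indices(1000, seed=run) for run in range(10)]
train, val, test = splits[0]
print(len(train), len(val), len(test))  # 500 250 250
```

Seeding each run separately makes every split reproducible while still giving each run a different partition of the data.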

Our main hypothesis is that more physically informed meteorological inputs lead to lower model extrapolation error, possibly lower than can be achieved by existing models. All meteorological inputs used in this study are listed alongside their respective definitions in Appendix B. To ensure that the model outperforms both simple analytical extrapolation and an ANN given unprocessed inputs, we consider four base cases. The first is a power law extrapolation, a simple empirical representation of how wind speed varies with height:

$$U_\alpha = U_r\left(\frac{z}{z_r}\right)^{\alpha}$$

where U_α is the streamwise wind speed at the height of interest, U_r the streamwise wind speed at a reference height, z the height of interest, z_r the reference height, and α a power law coefficient that characterizes the shear between z and z_r. The α value was derived dynamically for each individual period (Shu et al., 2016). The second base case is the log law extrapolation under neutral conditions. When the wind speed is known at a reference height, the log law can be written as

$$U_L = U_r\,\frac{\ln\left[(z-d)/z_0\right]}{\ln\left[(z_r-d)/z_0\right]}$$

where U_L is the wind speed at the extrapolation height, d the zero-plane displacement, and z_0 the roughness length. Both d and z_0 are determined based on local topographic information (Holmes, 2018). The log law extrapolation (and, more generally, Monin-Obukhov similarity theory) is expected to perform poorly at the complex terrain sites due to the lack of stationarity and horizontal homogeneity (Fernando et al., 2015).
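The two analytical baselines can be written as short functions. The sketch below is illustrative: estimating α from wind speeds at two measured heights below the target is our assumption and may differ from the dynamic procedure of Shu et al. (2016), and the example heights, speeds, and roughness length are hypothetical.

```python
import math


def shear_exponent(u_low, z_low, u_high, z_high):
    """Power-law exponent alpha implied by speeds measured at two heights."""
    return math.log(u_high / u_low) / math.log(z_high / z_low)


def power_law(u_ref, z_ref, z, alpha):
    """Power-law extrapolation U_alpha = U_r (z / z_r)**alpha."""
    return u_ref * (z / z_ref) ** alpha


def log_law(u_ref, z_ref, z, z0, d=0.0):
    """Neutral log-law extrapolation with roughness length z0 and displacement d."""
    return u_ref * math.log((z - d) / z0) / math.log((z_ref - d) / z0)


# Hypothetical 10-minute means: 6.0 m/s at 60 m and 6.5 m/s at 80 m, target 100 m.
alpha = shear_exponent(6.0, 60.0, 6.5, 80.0)
print(round(alpha, 3), round(power_law(6.5, 80.0, 100.0, alpha), 2))
print(round(log_law(6.5, 80.0, 100.0, z0=0.1), 2))
```

By construction, an α fitted between two heights reproduces the measured speed at either of them, so any extrapolation error comes from assuming the same α holds above the upper reference height.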

The other two base cases use nearly raw meteorological data as input features. The third base case uses only the streamwise wind speeds (U) below the height of interest as inputs, while the fourth base case uses U, wind direction (Dir), and hour (Hr) as inputs. The hour is encoded as a cosine curve to ensure continuity between days, while the direction is rescaled to the range −1 to 1 to alleviate scaling issues.
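One way to implement these encodings is sketched below. The 24 h cosine period follows from the stated goal of continuity between days; the linear mapping of direction from degrees onto −1 to 1 is our assumption, since the exact mapping is not specified, and the function names are hypothetical.

```python
import math


def encode_hour(hour):
    """Cosine encoding of hour of day, so 23:50 and 00:00 map to nearby values."""
    return math.cos(2.0 * math.pi * hour / 24.0)


def encode_direction(direction_deg):
    """Linearly rescale a wind direction in degrees from [0, 360) to [-1, 1)."""
    return (direction_deg % 360.0) / 180.0 - 1.0
```

A cosine alone maps, for example, 06:00 and 18:00 to the same value; pairing it with the corresponding sine would distinguish them. Likewise, a linear direction scale places 1° and 359° at opposite ends of the range, whereas sine and cosine components of direction would keep them adjacent.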
Neural network inputs are taken at 20 m intervals to a maximum of 80 m below the height of interest (e.g., for an output at 120 m, data from 100 m, 80 m, 60 m, and 40 m are used). The lowest measurement height available was 40 m. Because the sites (Section 3) had different instrumentation, only features obtainable from a single profiling lidar are used. All lidar data are 10-minute averages. Three non-dimensional features are extracted from the lidar data, namely turbulence intensity (TI = σ_U/U, where σ_U is the standard deviation of the wind speed), non-dimensional streamwise wind speeds (U_n, non-dimensionalized by U at 20 m below the height of interest), and the non-dimensional streamwise wind speed from the previous time period (U_p). U_p is the only input feature that extends up to the height of interest (i.e., we assume that the previous period's wind speed at the extrapolation height is known). Bold lettering on these three features indicates that they are non-dimensional quantities.
Three additional features are also extracted: vertical wind shear (dudz = ∂U/∂z), local terrain slope in the direction of incoming flow (φ), and vertical wind speed (W). The non-dimensional input features were selected for their robustness (e.g., measurement errors may partially compensate one another when non-dimensional ratios are formed) and for the ability of non-dimensional variables to better represent flow structures (Barenblatt and Isaakovich, 1996). Features are used in various combinations to determine which provide useful information to the network and which provide unnecessary or redundant information that confuses it. A final test included all input features, to show that simply supplying the network with a multitude of data yields poor results.
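A minimal sketch of the feature extraction described in the two paragraphs above is given below, for a single 10-minute period t of lidar data stored as dictionaries mapping height (m) to time series. The data layout, the function names, the choice of normalizing U_p by the current reference speed (the text does not specify its scaling), and the backward finite difference used for dudz are all assumptions for illustration; φ and W are omitted.

```python
def input_heights(z_target, lowest=40, step=20, max_depth=80):
    """Lidar heights used as inputs: every 20 m, up to 80 m below, not below 40 m."""
    return [z_target - k for k in range(step, max_depth + step, step)
            if z_target - k >= lowest]


def extract_features(u, sigma_u, z_target, t):
    """Features for period t; u and sigma_u map height (m) -> list of 10-min values."""
    if t < 1:
        raise ValueError("t must be >= 1 so that a previous period exists")
    heights = input_heights(z_target)
    u_ref = u[z_target - 20][t]  # streamwise speed 20 m below the height of interest
    feats = {}
    for z in heights:
        feats[f"U_n@{z}"] = u[z][t] / u_ref          # non-dimensional wind speed
        feats[f"TI@{z}"] = sigma_u[z][t] / u[z][t]   # turbulence intensity
    # Previous period's speed at the height of interest (assumed known), scaled by
    # the current reference speed (an assumption).
    feats[f"U_p@{z_target}"] = u[z_target][t - 1] / u_ref
    # Vertical shear dU/dz between each input height and the next one below it.
    for upper, lower in zip(heights[:-1], heights[1:]):
        feats[f"dudz@{upper}"] = (u[upper][t] - u[lower][t]) / (upper - lower)
    return feats


# Hypothetical two-period profile (m/s) for a 120 m extrapolation.
u = {40: [5.0, 6.0], 60: [5.5, 6.5], 80: [6.0, 7.0], 100: [6.5, 7.5], 120: [7.0, 8.0]}
sigma_u = {z: [0.5, 0.6] for z in u}
features = extract_features(u, sigma_u, z_target=120, t=1)
```

Note that U_n at the reference height is identically 1, which is what makes the baseline wind speed a constant for the network.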
It is typical industry practice to normalize (i.e., standardize) input variables, wherein an input variable x is scaled to x̃ via

$$\tilde{x} = \frac{x-\mu}{\sigma}$$

where µ is the variable's mean and σ is the variable's standard deviation (Aggarwal, 2018). This technique is particularly useful when input variables have Gaussian distributions and cover multiple scales. However, none of the variables in our study had such a distribution, and many inputs already have similar scaling. Testing showed that standardization had no discernible impact on network performance (not shown), and therefore the input features were left unstandardized. The non-dimensionalization performed followed typical fluid dynamical practices (Barenblatt and Isaakovich, 1996).
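For reference, the standardization x̃ = (x − µ)/σ described above amounts to the following. As is common practice (not a detail stated in the text), µ and σ are estimated on the training data only and then applied unchanged to validation and testing data; the helper names and the example turbulence intensities are hypothetical. The study itself found no benefit from this step and left its inputs unstandardized.

```python
import statistics


def fit_standardizer(train_values):
    """Estimate the mean and (population) standard deviation from training data."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def standardize(values, mu, sigma):
    """Apply x_tilde = (x - mu) / sigma using previously fitted mu and sigma."""
    return [(x - mu) / sigma for x in values]


train_ti = [0.08, 0.10, 0.12]
mu, sigma = fit_standardizer(train_ti)
scaled = standardize(train_ti, mu, sigma)
```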
The subscript 1 (e.g., U_p,1) denotes that the input value was taken only at the height of interest, subscript 2 (e.g., W_2) denotes that the input value was taken at 20 m below the height of interest, and subscript 3 (e.g., U_p,3) denotes that the input value was taken at the height of interest and 40 m below. Input variables without a subscript were taken from all four heights below the extrapolation height. Additionally, because the vast majority of industrial wind turbines do not produce power at very low wind speeds, all cases with a streamwise velocity 20 m below the extrapolation height (U_1) below 3 m s⁻¹ were removed before testing. The highest wind speed recorded at any site was less than 23 m s⁻¹, below the standard cut-out limit of 25 m s⁻¹ (Markou and Larsen, 2009).

Site Description and Instrumentation
Data from three international field campaigns, whose locations can be seen in Fig. 2a, were used in this study. The authors participated in each of these campaigns through instrument deployment and data analysis. The Wind Forecasting Improvement Project 2 (WFIP2) was a multi-year field campaign focused on improving the predictability of hub-height winds for wind energy applications in complex terrain. An 18-month field campaign took place in the US Pacific Northwest from October 2015 to March 2017. Several remote sensing and in-situ sensors were located in a region with distributed commercial wind farms along the Columbia River basin. This study focuses on vertical profiling lidar (Leosphere's Windcube V1) data collected by the University of Colorado at Boulder at the so-called Wasco site over a period of 15 months (Bodini et al., 2019; Lundquist, 2017). The lidar's location can be seen as the orange marker in Fig. 2b. The surrounding terrain is complex (although nominally less so than that at Perdigão, described below), with neighboring wind farms to the east of the lidar. Any periods with missing data at multiple heights were ignored in the analysis. Signal-to-noise ratio (SNR) and availability (30%) thresholds recommended by the manufacturer were used to remove potentially bad data.

The Coupled Air-Sea Processes for Electro-magnetic ducting Research (CASPER) field campaign was focused on measurement and modeling of the Marine Atmospheric Coastal Boundary Layer (MACBL) to better predict the interaction of EM propagation and atmospheric turbulence (Wang et al., 2018). Two field campaigns were conducted during CASPER, can be seen as the blue marker in Fig. 2c. The data were filtered using the SNR and availability threshold recommended by the manufacturer. Datasets available at all heights were selected for this study.
The final campaign is Perdigão, a multinational project that took place in the spring and summer of 2017 aimed at improving microscale modeling for wind energy applications. Conducted in the Castelo Branco region of Portugal, the campaign deployed an array of state-of-the-art sensors to measure wind flow features within and around a complex double-ridge topography. The ridges are spaced approximately 1.4 km apart with a valley in between. Both ridges rise approximately 250 m above the surrounding topography, which mainly consists of rolling hills and farmland. Over four months of data were taken from a Leosphere profiling lidar, denoted by the black marker in Fig. 2d, which was located on top of the northern ridge of the Perdigão double ridge. This location was selected because of the multitude of complex flow patterns observed there during the campaign. A meteorological tower was located adjacent to the lidar, but it only rose to 100 m above ground level, below all extrapolation heights. Profiler data available at all heights were used for this study, and any data below 30% availability over 10 minutes were ignored.
The uncertainty of the wind Doppler lidar measurements is expected to be within 2% (Lundquist et al., 2015; Giyanani et al., 2015; Kim et al., 2016; Newsom et al., 2017; Newman and Clifton, 2017). Owing to a lack of secondary measurements at the locations and heights of interest, all lidar measurements are treated as true.

Results

Table 1 shows, for each case, the best testing extrapolation accuracy at all sites. The total number of (randomly split) validation and testing samples for each case is also shown for reference. The table is color coded, with the best accuracy in yellow and the worst in red. It is immediately apparent that the network's accuracy depends strongly not only on the inputs used but also on the site location and data availability. The site with the highest extrapolation accuracy is the nominally mildly complex WFIP2 site, which also has the largest dataset. The highly complex Perdigão site has the worst extrapolation accuracy, with the accuracy at the offshore CASPER site between the two. The best MAPE achieved at each height (underlined) is below 2% at every site, meeting and often exceeding industry standards (Langreder and Jogararu, 2017).
The power law performed better than the log law and was therefore used as the baseline for comparison in Table 2. As this table shows, the two ANN baseline cases (one using U, Dir, and Hr, and one using only U; first two rows of Tables 1 and 2) performed almost equally well and showed a slight improvement over the power law extrapolation. Because there is no clear distinction between the results of the two cases, Dir and Hr can be presumed to have no effect on prediction accuracy. When U is replaced by U_n, the network accuracy improves again, providing results 10-33% more accurate than the power law extrapolation. TI and U_p,1 are the most beneficial secondary input features when used alongside U_n. While TI improves network accuracy at all sites except CASPER, U_p,1 is more impactful, improving accuracy by up to 52% over the power law extrapolation. TI was chosen as the second input for cases with three input features because it is the most beneficial feature that includes information about the flow's turbulence levels and, to some extent, the atmospheric stability, information that is expected to be highly influential in determining the flow, particularly at the complex terrain sites.
A majority of the third input features, specifically U_α, dudz, φ, φ_2, and W, have negligible or negative effects on extrapolation accuracy. There are, nevertheless, exceptions to this rule: U_α considerably improves accuracy for CASPER, and φ_2 improves accuracy at the 160 m height for Perdigão. With a single exception, the best extrapolation accuracy is obtained when U_n, network. With all inputs, the best extrapolation accuracy is up to 67% worse than that of the input case that obtains the best result (100 m CASPER, Table 1).

Discussion
A brief analysis shows that extracted non-dimensional meteorological input features (U_n, TI, and U_p) drastically improve the network's extrapolation accuracy, allowing it to perform much better than conventional log law and power law extrapolations.
However, this gain in accuracy does not continue as more features are added. As can be seen in Fig. 3, using more than three input features for Perdigão actually reduced network accuracy. This is most obvious when all possible features are given to the network: the excess input noise and redundancy reduce the network's ability to find usable patterns.
Two tests were performed to determine whether this improvement in accuracy is derived from feature non-dimensionalization.
Because the network performed best at Perdigão with input features U_n, TI, and U_p,3, the same inputs were then given to the network in dimensional form (i.e., U, σ_U, and U_p,3). The dark blue bar in Fig. 3 shows that the network performed significantly worse when given dimensional features. In fact, the network performs just as poorly with dimensional features as it does when given all the input features indiscriminately, showing that non-dimensionalization has a significant impact on network performance.
Next, the 160 m Perdigão extrapolation with input features U_n, U_p,3, and TI was analyzed in depth. In order to determine  given in Fig. 4. The left column shows network outputs when given dimensional features, whereas the right column shows the results obtained using non-dimensional features (hereafter referred to as the dimensional and non-dimensional networks, respectively). Fig. 4a and b compare the true wind speed with that predicted by the network. It is immediately obvious that, upon approaching sparsely sampled regions, the dimensional network begins to fail, clearly underpredicting high wind speeds.

The non-dimensional network, however, does not have this problem and accurately extrapolates these higher wind speeds.
An elementary indicator of the network's predictive power is the coefficient of determination R², given by

$$R^2 = 1-\frac{\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^2}$$

where y_i and ŷ_i have the same meanings as in Eq. 1 and ȳ is the mean observed output. Non-dimensionalization improves R² from 99.3% to 99.6%. While this is a clear improvement, it does not tell the whole story. Non-dimensionalization minimizes the network's dependence on wind speed, possibly by forcing it to estimate the amount of shear between the reference and extrapolation heights, which is more easily determined with the assistance of TI. It may therefore be expected that non-dimensionalization reduces error at high wind speeds, where samples are scarce.
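The coefficient of determination can be computed directly from its definition. The plain-Python function below is a hypothetical helper for illustration, not the study's code; it equals 1 for a perfect prediction and can become negative for predictions worse than simply using the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```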
The decrease in error variance is seen in Fig. 4c and d, which show the extrapolation uncertainty

$$\eta = \frac{\sigma_\varepsilon}{\bar{U}}$$

where σ_ε is the standard deviation of the root mean square error and Ū the mean wind speed. For non-dimensional testing, predicted wind speeds are first transformed back into dimensional space (i.e., U_n → U) prior to error calculation in order to find the true wind speed extrapolation uncertainty. The total uncertainty, a measure of error variability, is reported in the top row of Fig. 4, while the change in η with wind speed can be seen in the figure's middle row. At low wind speeds (< 4 m s⁻¹), where the sample size is large, the dimensional network actually outperforms the non-dimensional network. As wind speeds increase, the uncertainties of both networks decrease at a similar rate until the sample size begins to decrease at roughly 10 m s⁻¹. At high wind speeds, the dimensional network's uncertainty begins to increase, eventually rising to almost 2% at extrapolated wind speeds > 15 m s⁻¹. The non-dimensional network's uncertainty, meanwhile, continues to decrease as wind speed increases, eventually reaching values as low as 0.5%. This is again because the non-dimensional network better accounts for the wind shear that is crucial for extrapolation. High wind speeds no longer appear to the network as outliers, allowing it to extrapolate much higher wind speeds than would otherwise be possible.
Non-dimensionalization therefore decreases output variability in sparse dimensional space, producing less volatile outputs and a more robust network.
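The error analysis above (mapping non-dimensional predictions back to wind speeds, then computing the uncertainty and MAPE within wind speed bins) can be sketched as follows. Several details are assumptions: η is taken as the standard deviation of the prediction error divided by the mean observed speed in each bin, bins are defined on the observed extrapolation-height speed, bins with fewer than two samples are skipped, and the function names and example values are hypothetical.

```python
import statistics


def to_dimensional(u_n_pred, u_ref):
    """Undo the scaling U_n = U / U_ref, where U_ref is the speed 20 m below."""
    return [un * ur for un, ur in zip(u_n_pred, u_ref)]


def binned_errors(u_true, u_pred, bin_edges):
    """Per wind-speed bin: (lower, upper, count, eta in %, MAPE in %)."""
    rows = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        pairs = [(t, p) for t, p in zip(u_true, u_pred) if lo <= t < hi]
        if len(pairs) < 2:  # too few samples for a standard deviation
            continue
        errors = [p - t for t, p in pairs]
        mean_u = statistics.mean(t for t, _ in pairs)
        eta = 100.0 * statistics.stdev(errors) / mean_u
        bin_mape = 100.0 * statistics.mean(abs(p - t) / t for t, p in pairs)
        rows.append((lo, hi, len(pairs), eta, bin_mape))
    return rows


u_pred = to_dimensional([1.02, 0.98, 1.05, 1.0], [5.0, 5.5, 6.0, 12.0])
rows = binned_errors([5.0, 5.5, 6.5, 12.0], u_pred, bin_edges=[4.0, 8.0, 16.0])
```

Binning makes the effect of sample sparsity visible: a bin containing only a few high-wind cases yields a noisy η, which is why skipping near-empty bins (or reporting their counts) matters when reading such curves.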
Lastly, the change in MAPE with wind speed can be seen in Fig. 4e and f. As with uncertainty, the dimensional network's MAPE increases dramatically with wind speed due to sample sparsity. Non-dimensionalization once again nearly eliminates this effect, as the MAPE consistently decreases for extrapolated wind speeds < 16 m s⁻¹. Whereas the uncertainty denotes error variability, MAPE denotes overall prediction error. As is clear in Fig. 4a, the dimensional network has an obvious bias at high wind speeds, systematically under-predicting the extrapolated wind speed. This is apparent in Fig. 4e, as MAPE increases to more than 10% at higher wind speeds. The non-dimensional network does not have this problem, again because the network is oblivious to the dimensional wind speed, minimizing the prediction's dependence on total wind speed. We therefore conclude that non-dimensionalization decreases both total error and error variability in regions with a sparsity of samples by eliminating the dependence on wind speed.
CASPER is most sensitive to the choice of input features. This may be due to two factors. First, the site may have flow dynamics for which our current list of inputs cannot account (such as the Catalina Eddy near the Californian Bight (Parish et al., 2013) and marine offshore internal boundary layers (Garratt, 1990) observed near that site). Second, it is likely that the amount of CASPER data available is not adequate for the network to accurately parse more complex hidden patterns.
Less data could lead the network to overemphasize noisy perturbations as opposed to larger meteorological trends. It is telling that even with the small amount of data available the ANN is sometimes more than 50% more accurate than the power law extrapolation technique.

Although the best extrapolation accuracy occurs at WFIP2, the largest improvement over the power law is at CASPER and Perdigão. This may be because the power law extrapolation already performed well at WFIP2, suggesting that WFIP2 may have the simplest flow pattern of the three sites. The amount of data available did not seem to improve network performance but likely stabilized the network against noise.
We determine that, of the features analyzed, the non-dimensional input features U_n, U_p, and TI most reliably improve the efficacy of the ANN. Extracting the non-dimensional wind speed gives the network a better idea of the general trend it needs to spot and adds more uniformity to the input samples. TI specifies the amount of turbulence and hence the momentum diffusive capacity within the system (i.e., velocity gradients), a property that none of the other input features is able to directly convey.
Lastly, providing the ANN with the previous period's wind speed drastically improves accuracy. This is the only feature that contains information about the flow's history. All three of these features are important because they give the network new, insightful information about evolving aspects (dynamics) of the flow.
Some of the other input features (φ, W, Dir) are less impactful for extrapolation, with minor effects that are site and height dependent. Adding irrelevant inputs increases the system's noise and, unless an abundance of data is available, can cause the ANN to model coincidental or conflicting patterns. Other features (dudz, U) provide redundant information. These features typically fail to improve network accuracy, can slow the training process, and are best left out. Lastly, U_α can act as a positive or negative influence on the network because α depends on other parameters such as U_n, U_p, and TI. If the power law model is reasonably accurate or has a clear repetitive bias, U_α could be a useful input feature that provides the ANN with a dependable indication of wind shear. Otherwise, it adds misleading noise to the input feature set, counteracting the guidance that U_n, U_p, and TI would provide toward an accurate extrapolation.
These results indicate that just the right amount of scaled meteorological information is necessary to achieve optimal extrapolation accuracy. It is also useful to simplify the modeled system whenever possible, provided that the simplification does not remove necessary information. An example is the difference in extrapolation accuracy between U and U_n. Before non-dimensionalization, the ANN has to find a baseline wind speed and predict the vertical wind shear. With non-dimensionalization, the baseline wind speed is a constant (U_n at the reference height, 20 m below the height of interest, is always 1), and the network is able to exploit possible self-similarity properties of the velocity profile.
Whatever information is lost during non-dimensionalization is more than compensated for by the improved model robustness and the removal of some measurement inaccuracies, allowing for better generalization over regions of the input domain that have a scarce amount of data (i.e., extrapolated wind speeds > 14 m s⁻¹). This is only a first step in investigating how mindful feature extraction and selection can improve ANN accuracy for meteorological predictions in wind engineering. Further improvement may be possible through the addition of other meteorological elements, particularly atmospheric stability (although we expect that, when inputs consist of different height levels and the turbulence level is specified, the effects of stratification are indirectly taken into account). Future studies are needed to investigate the efficacy of using non-dimensional meteorological variables to improve wind speed forecasting. Recurrent neural networks should also be used to test how alternative combinations of meteorological features, combined with extensive knowledge of the system's history, can improve wind speed forecasting.

Model uncertainty is a vexing problem in the wind energy industry that has vast economic implications. It has been shown that standard wind energy vertical extrapolation methods are outdated and can no longer serve their purpose of efficiently predicting and extrapolating meteorological properties accurately under various conditions (Sfyri et al., 2018; Stiperski et al., 2019). This problem can be mitigated by employing machine learning tools that have made great strides in the past few decades. Newer and the capability to delve into turbulent, nonlinear systems and may therefore be used as a tool to assist models, although blindly using ANNs without a dynamic underpinning is vacuous. Domain knowledge, especially of governing dynamical variables, can greatly assist these systems in finding underlying trends that govern atmospheric phenomena.
This study investigated how feature extraction and selection can increase ANN wind speed vertical extrapolation accuracy.
Various meteorological features were combined to test their effectiveness as ANN inputs. It was found that, on average, ANN vertical extrapolation error decreases by 15% when using U_n as a single input feature rather than U. Two other extracted non-dimensional features, TI and U_p, also led to increased extrapolation accuracy. The accuracy obtained by the ANN was up to 65% and 53% better than that obtained by log law and power law vertical extrapolations, respectively.
Vertical extrapolation error was minimized to as low as 1.06% over 20 m, but too many network inputs actually reduced network accuracy. The 160 m extrapolation at Perdigão was analyzed in depth to determine the effects of feature non-dimensionalization. In addition to an improved correlation with measured wind speeds, non-dimensionalization led to a decrease in both total extrapolation error and variability, particularly at high wind speeds. The non-dimensional input features created a robust network that improved predictions even in rare and underrepresented cases. This shows that with sufficient data and proper feature extraction and selection, ANNs are able to improve upon the current industry-standard vertical extrapolation accuracy.

Future studies are planned to investigate feature extraction and selection for wind speed predictions over a variety of timescales using a recurrent neural network. Identification of robust non-dimensional variables is expected to give ANNs a better perspective of atmospheric conditions. We hope that machine learning tools, combined with proper feature selection and extraction, will reduce atmospheric model uncertainty to a fraction of what it is today.
Input and target variables are altered for each individual test; example code used for this study may be found at https://github.com/dvassall/.

Appendix A: MAPE Magnitude Invariance
Our goal is to ensure that the loss function's magnitude is invariant regardless of output scaling, allowing a fair comparison between dimensional and non-dimensional networks. For a simple feed-forward neural network with j output nodes, the output error can be defined as

$$E = \frac{1}{n}\sum_{c}\sum_{j} e_{cj}$$

where n is the number of samples in a batch, c indexes the samples, and e_cj is the error (given by a user-defined loss function) seen by each output node for each sample in the batch. If we use a mean absolute percentage error loss function, the error metric is

$$e_{cj} = 100\,\frac{|y_{cj}-\hat{y}_{cj}|}{y_{cj}}$$

where y_cj is the true target output, ŷ_cj is the predicted target output, and the vertical lines denote the absolute value. For convenience, consider a single sample in a single batch (the same analysis extends to multiple samples over multiple batches because the summation is linear). We will refer to the true and predicted values as y and ŷ, respectively.
We can now define the true and predicted (dimensional) outputs (y_d and ŷ_d, respectively) as well as the true and predicted non-dimensional outputs (y_n = y_d/a and ŷ_n = ŷ_d/a, respectively, where a is a non-dimensionalization variable unique to each individual case; here a is a wind speed and therefore positive). The dimensional error e_d is

$$e_d = 100\,\frac{|y_d-\hat{y}_d|}{y_d}.$$

Likewise, the non-dimensional error e_n can be written as

$$e_n = 100\,\frac{|y_n-\hat{y}_n|}{y_n} = 100\,\frac{|y_d-\hat{y}_d|/a}{y_d/a} = 100\,\frac{|y_d-\hat{y}_d|}{y_d} = e_d,$$

proving that the error's magnitude is invariant under non-dimensionalization. This is not true for loss metrics such as mean squared error or mean absolute error, where