ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
This module contains classes and procedures for normalizing univariate or multivariate samples by arbitrary amounts along specific directions.
Data Types
interface | getNormed
Generate a sample of shape (nsam), (ndim, nsam), or (nsam, ndim) that is normalized by the specified input shift and scale along the specified axis dim.
interface | setNormed
Return a sample of shape (nsam), (ndim, nsam), or (nsam, ndim) that is normalized by the specified input shift and scale along the specified axis dim.
type | zscore_type
This is the derived type whose instances are meant to signify a sample shifting by an amount equal to the negative of the sample mean and scaling the result by an amount equal to the inverse of the sample standard deviation or an equivalent measure.
Variables
character(*, SK), parameter | MODULE_NAME = "@pm_sampleNorm"
type(zscore_type), parameter | zscore = zscore_type()
This module contains classes and procedures for normalizing univariate or multivariate samples by arbitrary amounts along specific directions.
Normalization can have a wide variety of meanings in science.
In this module, it refers to the creation of a shifted and scaled version of a sample, where the intention is that these normalized values allow the comparison of corresponding normalized values for different datasets in a way that eliminates the effects of certain gross influences.
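The shift-and-scale idea can be sketched in plain Fortran as follows. This is only a minimal illustration that does not call the interfaces of this module. It assumes that the normalized value is computed as (x + shift) * scale for each attribute of a sample of shape (ndim, nsam), and the shift and scale values below are arbitrary placeholders.

```fortran
program shift_scale_demo
    ! Illustrative sketch in standard Fortran; it does not use pm_sampleNorm.
    implicit none
    integer, parameter :: RK = kind(1.d0)
    integer, parameter :: ndim = 2, nsam = 4
    real(RK) :: sample(ndim, nsam), normed(ndim, nsam)
    real(RK) :: shift(ndim), scale(ndim)
    integer :: idim

    ! Each column is one observation; each row is one attribute (column-major fill).
    sample = reshape([real(RK) :: 1, 10, 2, 20, 3, 30, 4, 40], shape(sample))

    ! Arbitrary placeholder amounts, one per attribute.
    shift = [-1._RK, -10._RK]
    scale = [0.5_RK, 0.1_RK]

    ! Assumed convention: shift first, then scale, along the sample axis.
    do idim = 1, ndim
        normed(idim, :) = (sample(idim, :) + shift(idim)) * scale(idim)
    end do

    do idim = 1, ndim
        print "(*(f8.4))", normed(idim, :)
    end do
end program shift_scale_demo
```

Specific normalizations then differ only in how the shift and scale amounts are chosen.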
The procedures of this module facilitate the computation of the following popular sample normalizations, among others:
The standard score (z-score) is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured.
Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
If the population mean and population standard deviation are known, a raw score x is converted into a standard score by,
\begin{equation} z = {x - \mu \over \sigma} ~, \end{equation}
where \(\mu\) is the population mean and \(\sigma\) is the population standard deviation.
When the population mean and the population standard deviation are unknown, the standard score may be estimated by using the sample mean and sample standard deviation as estimates of the population values.
In these cases, the z-score is given by,
\begin{equation} z = {x - {\hat\mu} \over \hat\sigma} ~, \end{equation}
where \(\hat\mu\) is the sample mean and \(\hat\sigma\) is the sample standard deviation.
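The estimated z-score can be sketched with Fortran intrinsics alone. The sketch below is illustrative and does not use the interfaces of this module; the data values are made up, and the use of the unbiased (n - 1) variance estimate is an assumption.

```fortran
program zscore_demo
    ! Illustrative sketch in standard Fortran; it does not use pm_sampleNorm.
    implicit none
    integer, parameter :: RK = kind(1.d0)
    real(RK), parameter :: x(*) = [2._RK, 4._RK, 4._RK, 4._RK, 5._RK, 5._RK, 7._RK, 9._RK]
    real(RK) :: mean, std, z(size(x))

    ! Sample mean as the estimate of the population mean.
    mean = sum(x) / size(x)

    ! Assumption: unbiased (n - 1) sample variance.
    ! Divide by size(x) instead to obtain the biased estimate.
    std = sqrt(sum((x - mean)**2) / (size(x) - 1))

    ! Shift by the negative of the mean, then scale by the inverse of the standard deviation.
    z = (x - mean) / std

    print "(*(f8.4))", z

    ! By construction, the z-scores sum to zero up to round-off.
    if (abs(sum(z)) > 1.e-12_RK) error stop "z-scores do not have zero mean"
end program zscore_demo
```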
Rescaling, also known as min-max scaling or min-max normalization, consists of rescaling the range of features to \([0, 1]\) or \([-1, 1]\).
The choice of target range depends on the nature of the data.
The general formula for min-max rescaling to \([0, 1]\) is:
\begin{equation} \tilde x = \frac {x - {\text{min}}(x)}{{\text{max}}(x)-{\text{min}}(x)} ~, \end{equation}
where \(x\) is an original value and \(\tilde x\) is the normalized value.
For example, suppose that we have students' weight data, and the weights span [160 pounds, 200 pounds].
To rescale this data, we first subtract \(160\) from each student's weight and then divide the result by \(40\) (the difference between the maximum and minimum weights).
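As a quick check of this recipe, a student weighing 170 pounds (an illustrative value, not taken from any dataset) is mapped to
\begin{equation} \tilde x = \frac{170 - 160}{200 - 160} = \frac{10}{40} = 0.25 ~, \end{equation}
while the lightest and heaviest students (160 and 200 pounds) are mapped to \(0\) and \(1\), respectively.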
To rescale to an arbitrary target range \([a, b]\), the formula becomes:
\begin{equation} \tilde x = a + {\frac {(x-{\text{min}}(x))(b-a)}{{\text{max}}(x)-{\text{min}}(x)}} ~, \end{equation}
where \(a\) and \(b\) are the minimum and maximum of the target range.
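The rescaling formula can be sketched in plain Fortran as follows. The function `rescaled` is a hypothetical helper written for this illustration and is not part of this module; the weights are made-up values spanning the range used in the example above.

```fortran
program minmax_demo
    ! Illustrative sketch in standard Fortran; it does not use pm_sampleNorm.
    implicit none
    integer, parameter :: RK = kind(1.d0)
    real(RK), parameter :: weight(*) = [160._RK, 170._RK, 185._RK, 200._RK]

    print "(*(f8.4))", rescaled(weight, 0._RK, 1._RK)   ! target range [0, 1]
    print "(*(f8.4))", rescaled(weight, -1._RK, 1._RK)  ! target range [-1, 1]

contains

    ! Hypothetical helper: map x linearly onto the target range [a, b].
    pure function rescaled(x, a, b) result(xnew)
        real(RK), intent(in) :: x(:), a, b
        real(RK) :: xnew(size(x))
        real(RK) :: xmin, xmax
        xmin = minval(x)
        xmax = maxval(x)
        ! Assumes xmax > xmin; a constant sample would divide by zero.
        xnew = a + (x - xmin) * (b - a) / (xmax - xmin)
    end function rescaled

end program minmax_demo
```

In shift-and-scale terms, this corresponds to shifting by the negative of the minimum, scaling by (b - a) divided by the range, and then adding the offset a.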
Mean normalization is defined as,
\begin{equation} \tilde x = {\frac {x-{\bar {x}}}{{\text{max}}(x)-{\text{min}}(x)}} ~, \end{equation}
where \(x\) is an original value, \(\tilde x\) is the normalized value, and \({\bar{x}} = {\text{average}}(x)\) is the mean of that feature vector.
Another form of mean normalization divides by the standard deviation instead of the range; this form is also called standardization.
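Mean normalization can be sketched in plain Fortran as follows. This is only an illustration with made-up data and does not use the interfaces of this module; the variable names are hypothetical.

```fortran
program mean_norm_demo
    ! Illustrative sketch in standard Fortran; it does not use pm_sampleNorm.
    implicit none
    integer, parameter :: RK = kind(1.d0)
    real(RK), parameter :: x(*) = [160._RK, 170._RK, 185._RK, 200._RK]
    real(RK) :: avg, xnormed(size(x))

    avg = sum(x) / size(x)

    ! Shift by the negative of the mean, then scale by the inverse of the range.
    ! Assumes maxval(x) > minval(x).
    xnormed = (x - avg) / (maxval(x) - minval(x))

    print "(*(f8.4))", xnormed

    ! The normalized values are centered on zero and span a range of exactly one.
    if (abs(sum(xnormed)) > 1.e-12_RK) error stop "normalized values are not centered"
    if (abs(maxval(xnormed) - minval(xnormed) - 1._RK) > 1.e-12_RK) error stop "range is not one"
end program mean_norm_demo
```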
shift and scale arguments), such interfaces were not added to this module for the following reasons:
weight and variance correction arguments, thus significantly complicating the interfaces of this module with little gain.
Final Remarks
If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
This software is distributed under the MIT license with additional terms outlined below.
This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.
character(*, SK), parameter pm_sampleNorm::MODULE_NAME = "@pm_sampleNorm"
Definition at line 132 of file pm_sampleNorm.F90.
type(zscore_type), parameter pm_sampleNorm::zscore = zscore_type()
Definition at line 154 of file pm_sampleNorm.F90.