ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
This module contains classes and procedures for computing properties related to the correlation matrices of random samples.
Data Types
type | corcoef_type
This is an abstract derived type for constructing concrete derived types to distinguish various procedure signatures that require different correlation coefficients (e.g., pearson, spearman, kendall, ...).
interface | getCor
Generate and return the (Pearson) correlation coefficient or matrix of a pair of (weighted) time series x(1:nsam) and y(1:nsam) or of an input (weighted) array of shape (ndim, nsam) or (nsam, ndim), where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
interface | getRho
Generate and return the Spearman rank correlation matrix of the input (weighted) sample of shape (ndim, nsam) or (nsam, ndim), or the Spearman rank correlation coefficient of a pair of (weighted) time series x(1:nsam) and y(1:nsam), where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
type | kendall_type
This is a concrete derived type whose instances are exclusively used to signify the kendall type of correlation coefficients.
type | kendallA_type
This is a concrete derived type whose instances are exclusively used to signify the kendallA type of correlation coefficients.
type | kendallB_type
This is a concrete derived type whose instances are exclusively used to signify the kendallB type of correlation coefficients.
type | pearson_type
This is a concrete derived type whose instances are exclusively used to signify the pearson type of correlation coefficients.
interface | setCor
Return the (weighted) correlation matrix corresponding to the input (weighted) covariance matrix, or return the (weighted) sample Pearson correlation matrix of the input array of shape (ndim, nsam) or (nsam, ndim), or the Pearson correlation coefficient of a pair of (weighted) time series x(1:nsam) and y(1:nsam), where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
interface | setCordance
Compute and return the cordance vector of the input data series x and y.
interface | setRho
Return the Spearman rank correlation matrix of the input (weighted) sample of shape (ndim, nsam) or (nsam, ndim), or the Spearman rank correlation coefficient of a pair of (weighted) time series x(1:nsam) and y(1:nsam), where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
type | spearman_type
This is a concrete derived type whose instances are exclusively used to signify the spearman type of correlation coefficients.
Variables
character(*, SK), parameter | MODULE_NAME = "@pm_sampleCor"
type(kendall_type), parameter | kendall = kendall_type()
This is a scalar parameter object of type kendall_type that is exclusively used to signify the kendall type of correlation coefficients.
type(kendallA_type), parameter | kendallA = kendallA_type()
This is a scalar parameter object of type kendallA_type that is exclusively used to signify the kendallA type of correlation coefficients.
type(kendallB_type), parameter | kendallB = kendallB_type()
This is a scalar parameter object of type kendallB_type that is exclusively used to signify the kendallB type of correlation coefficients.
type(pearson_type), parameter | pearson = pearson_type()
This is a scalar parameter object of type pearson_type that is exclusively used to signify the pearson type of correlation coefficients.
type(spearman_type), parameter | spearman = spearman_type()
This is a scalar parameter object of type spearman_type that is exclusively used to signify the spearman type of correlation coefficients.
This module contains classes and procedures for computing properties related to the correlation matrices of random samples.
The correlation matrix of N random variables X_{1}, \ldots, X_{N} is the N\times N matrix \rho whose (i, j) entry is,
\begin{equation} \rho_{ij} := \up{COR}(X_{i}, X_{j}) = \frac{\up{COV}(X_{i}, X_{j})}{\sigma_{X_{i}} \sigma_{X_{j}}}, \quad {\text{if}} ~ \sigma_{X_{i}} \sigma_{X_{j}} > 0 ~. \end{equation}
Thus the diagonal entries are all identically one.
If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables X_{i} / \sigma(X_{i}) for i = 1, \dots, N.
This applies both to the matrix of population correlations (in which case \sigma is the population standard deviation), and to the matrix of sample correlations (in which case \sigma denotes the sample standard deviation).
Consequently, each is necessarily a positive-semidefinite matrix.
Moreover, the correlation matrix is strictly positive definite if no variable can have all its values exactly generated as a linear function of the values of the others.
The correlation matrix is symmetric because the correlation between X_{i} and X_{j} is the same as the correlation between X_{j} and X_{i}.
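The definition of \rho_{ij} above can be illustrated with a minimal, language-neutral sketch. It is written in Python purely for illustration and does not use or reproduce the ParaMonte interfaces; the function name cov_to_cor and the example covariance matrix are hypothetical.

```python
import math

def cov_to_cor(cov):
    """Convert a covariance matrix (list of rows) to a correlation matrix.

    Hypothetical helper for illustration only; not a ParaMonte procedure.
    """
    ndim = len(cov)
    # The standard deviations are the square roots of the diagonal entries.
    std = [math.sqrt(cov[i][i]) for i in range(ndim)]
    if any(s <= 0.0 for s in std):
        raise ValueError("every variable must have a positive variance")
    # rho_ij = COV(X_i, X_j) / (sigma_i * sigma_j)
    return [[cov[i][j] / (std[i] * std[j]) for j in range(ndim)]
            for i in range(ndim)]

# An arbitrary positive-definite covariance matrix chosen for the example.
cov = [[4.0, 2.0, 0.0],
       [2.0, 9.0, -3.0],
       [0.0, -3.0, 16.0]]
cor = cov_to_cor(cov)
```

In this example the diagonal of cor is identically one and cor equals its own transpose, in line with the properties stated above.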
In terms of the entries of the covariance matrix, the definition above reads,
\begin{equation} \rho_{ij} = \frac{\Sigma_{ij}} { \sqrt{\Sigma_{ii}} \times \sqrt{\Sigma_{jj}} } ~, \end{equation}
where \rho represents the correlation matrix, \Sigma represents the covariance matrix, and \sigma_{X_{i}} = \sqrt{\Sigma_{ii}} represents the standard deviation of X_{i}.
Let \{(X_{1}, Y_{1}), \ldots, (X_{N}, Y_{N})\} be a set of N observations of the joint random variables X and Y.
A concordant pair is a pair of observations, each on two variables, (X_i, Y_i) and (X_j, Y_j), having the property that
\begin{equation} \up{sgn}(X_j - X_i) ~=~ \up{sgn}(Y_j - Y_i) ~, \end{equation}
where \up{sgn} is the signum function defined as:
\begin{equation} \up{sgn}(x) = \begin{cases} -1, & x < 0 ~, \\ 0, & x = 0 ~, \\ 1, & x > 0 ~, \end{cases} \end{equation}
that is, in a concordant pair, both elements of one pair are either greater than, equal to, or less than the corresponding elements of the other pair.
In contrast, a discordant pair is a pair of two-variable observations such that,
\begin{equation} \up{sgn} (X_j - X_i) ~=~ -\up{sgn}(Y_j - Y_i) ~, \end{equation}
that is, if one pair contains a higher value of X then the other pair contains a higher value of Y.
Sample concordance is relevant to computing the Kendall correlation coefficient or in hypothesis testing.
However, in many situations it is also important to distinguish tied pairs from concordant pairs.
Therefore, more precisely, any pair of observations (X_{i}, Y_{i}) and (X_{j}, Y_{j}) is considered concordant, discordant, tied in x, or tied in y.
The generic interface setCordance of this module returns the sample cordance tuple/vector comprising the number of x-ties, y-ties, concordant pairs, and discordant pairs.
The naive method of computing sample concordance quickly becomes expensive for large N, because there are {N \choose 2} = \frac{N (N - 1)}{2} (the binomial coefficient) ways to choose two items from N items.
This gives the naive algorithm a complexity of \mathcal{O}(N^{2}).
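The naive pairwise counting described above can be sketched as follows. This is a language-neutral illustration in Python, not the algorithm or the signature of setCordance; the name count_cordance is hypothetical, and the convention that a pair tied in both x and y increments both tie counters is an assumption made only for this sketch.

```python
def sign(v):
    """Signum function: -1, 0, or +1."""
    return (v > 0) - (v < 0)

def count_cordance(x, y):
    """Naive O(N^2) count of (x-ties, y-ties, concordant, discordant) pairs.

    Hypothetical helper for illustration only; not a ParaMonte procedure.
    Assumption: a pair tied in both x and y is counted in both tie counters.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xties = yties = concordant = discordant = 0
    n = len(x)
    # Visit each of the N (N - 1) / 2 unordered pairs exactly once.
    for i in range(n - 1):
        for j in range(i + 1, n):
            sx = sign(x[j] - x[i])
            sy = sign(y[j] - y[i])
            if sx == 0:
                xties += 1
            if sy == 0:
                yties += 1
            if sx * sy > 0:
                concordant += 1
            elif sx * sy < 0:
                discordant += 1
    return xties, yties, concordant, discordant

counts = count_cordance([1, 2, 2, 4], [1, 3, 2, 2])
```

For these four observations there are six pairs: three concordant, one discordant, one tied in x, and one tied in y.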
A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables.
The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.
This module contains generic algorithms for computing the following popular sample correlation coefficients.
The Pearson product-moment correlation coefficient, also known as r, R, or the Pearson r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.
This is the best-known and most commonly used type of correlation coefficient.
When the term correlation coefficient is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
Definition
The Pearson correlation coefficient, when applied to a population, is commonly represented by the Greek letter \rho (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient.
Given a pair of random variables (X,Y), the formula for \rho is,
\begin{equation} \rho_{X,Y}={\frac {\up{cov} (X,Y)}{\sigma_{X}\sigma_{Y}}} ~, \end{equation}
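where \up{cov} is the covariance, \sigma_{X} is the standard deviation of X, and \sigma_{Y} is the standard deviation of Y.
The formula for \rho can also be expressed in terms of mean and expectation. Since,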
\begin{equation} \up{cov}(X,Y) = \up{\mathbb {E}} [(X-\mu _{X})(Y-\mu _{Y})] ~, \end{equation}
the formula for \rho can also be written as,
\begin{equation} \rho_{X,Y} = {\frac{\up{\mathbb{E}} [(X - \mu_{X})(Y - \mu_{Y})]}{\sigma _{X}\sigma _{Y}}} ~, \end{equation}
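where \mu_{X} and \mu_{Y} are the means of X and Y, \sigma_{X} and \sigma_{Y} are the standard deviations of X and Y, and \up{\mathbb{E}} is the expectation.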
The Pearson correlation coefficient, when applied to a sample, is commonly represented by r_{xy} and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient.
We can obtain a formula for r_{xy} by substituting estimates of the covariances and variances based on a sample into the formula above.
Given paired data \left\{(x_{1}, y_{1}), \ldots,(x_{n}, y_{n})\right\} consisting of n pairs, r_{xy} is defined as,
\begin{equation} r_{xy} = {\frac {\sum_{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}{\sqrt {\sum _{i=1}^{n}(y_{i}-{\bar {y}})^{2}}}}} ~, \end{equation}
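where n is the sample size, x_{i} and y_{i} are the individual sample points indexed by i, and \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i} is the sample mean (and analogously for \bar{y}).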
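The sample formula for r_{xy} above can be evaluated directly, as in the following minimal sketch. It is a language-neutral illustration in Python for unweighted data and is not the implementation or the signature of getCor or setCor; the name pearson_r is hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration only; not a ParaMonte procedure.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Numerator: sum of products of deviations from the sample means.
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations.
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("r is undefined when either series is constant")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

r = pearson_r([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
```

Note that the common normalization factor of the covariance and the variances cancels in the ratio, which is why only raw sums of deviations appear.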
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables.
For a sample of size n, the n raw scores X_{i}, Y_{i} are converted to ranks \up{R}({X_{i}}), \up{R}({Y_{i}}), and r_{s} is computed as
\begin{equation} r_{s} = \rho_{\up {R} (X),\up {R} (Y)} = {\frac {\up {cov} (\up {R} (X),\up {R} (Y))}{\sigma _{\up {R} (X)}\sigma _{\up {R} (Y)}}} ~, \end{equation}
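where \rho denotes the usual Pearson correlation coefficient applied to the rank variables, \up{cov}(\up{R}(X), \up{R}(Y)) is the covariance of the rank variables, and \sigma_{\up{R}(X)} and \sigma_{\up{R}(Y)} are the standard deviations of the rank variables.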
When all n ranks are distinct integers, it can be computed using the formula,
\begin{equation} r_{s} = 1 - {\frac{6\sum d_{i}^{2}}{n(n^{2}-1)}} ~, \end{equation}
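where d_{i} = \up{R}(X_{i}) - \up{R}(Y_{i}) is the difference between the two ranks of each observation and n is the number of observations.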
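The rank-difference formula above can be sketched as follows for the tie-free case. This is a language-neutral illustration in Python and is not the implementation or the signature of getRho or setRho; the names ranks and spearman_rs are hypothetical, and the sketch deliberately rejects tied values because the shortcut formula assumes distinct ranks.

```python
def ranks(values):
    """Return 1-based ranks of a sequence of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(x, y):
    """Spearman coefficient via r_s = 1 - 6 sum(d_i^2) / (n (n^2 - 1)).

    Hypothetical helper for illustration only; not a ParaMonte procedure.
    Assumption: all values within x and within y are distinct (no ties).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this shortcut formula requires distinct values")
    rx, ry = ranks(x), ranks(y)
    # d_i is the difference between the two ranks of observation i.
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

rs = spearman_rs([10.0, 20.0, 30.0, 40.0, 50.0], [2.5, 1.0, 4.0, 3.5, 5.0])
```

Here the ranks of the second series are (2, 1, 4, 3, 5), so the squared rank differences sum to 4 and the coefficient depends only on the orderings, not on the raw values.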
The Kendall rank correlation coefficient, commonly referred to as the Kendall \tau coefficient, is a statistic used to measure the ordinal association between two measured quantities.
It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities.
It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.
Definition
Let (x_{1},y_{1}), \ldots,(x_{n},y_{n}) be a set of observations of the joint random variables X and Y, such that all the values x_{i} and y_{i} are unique.
Any pair of observations (x_{i},y_{i}) and (x_{j},y_{j}), where i < j, is said to be concordant if the sort order of (x_{i}, x_{j}) and (y_{i},y_{j}) agrees,
that is, if either both x_{i} > x_{j} and y_{i} > y_{j} hold or both x_{i} < x_{j} and y_{i} < y_{j} hold; otherwise the pair is said to be discordant.
The Kendall \tau coefficient is defined as:
\begin{equation} \tau = {\frac {({\text{number of concordant pairs}})-({\text{number of discordant pairs}})}{({\text{number of pairs}})}} = 1 - {\frac {2({\text{number of discordant pairs}})}{n \choose 2}} ~, \end{equation}
where {n \choose 2} = \frac{n(n-1)}{2} is the binomial coefficient for the number of ways to choose two items from n items.
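The tie-free definition of \tau above can be sketched as follows. This is a language-neutral illustration in Python using the naive pairwise count, not the implementation of any ParaMonte procedure; the name kendall_tau is hypothetical, and the sketch assumes all x values and all y values are unique, as in the definition.

```python
def kendall_tau(x, y):
    """Kendall tau for data without ties, by naive pairwise counting.

    Hypothetical helper for illustration only; not a ParaMonte procedure.
    Returns (tau, number of concordant pairs, number of discordant pairs).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this definition requires unique values")
    nc = nd = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # With unique values, a pair is concordant exactly when the
            # two orderings agree; otherwise it is discordant.
            if (x[i] < x[j]) == (y[i] < y[j]):
                nc += 1
            else:
                nd += 1
    npairs = n * (n - 1) // 2
    return (nc - nd) / npairs, nc, nd

tau, nc, nd = kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because every pair is either concordant or discordant when there are no ties, the two expressions for \tau in the equation above give the same value for this sample.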
Accounting for ties
A pair \{(x_{i}, y_{i}), (x_{j}, y_{j})\} is said to be tied if and only if x_{i} = x_{j} or y_{i} = y_{j}.
A tied pair is neither concordant nor discordant.
When tied pairs arise in the data, the coefficient may be modified in a number of ways to keep it in the range [−1, 1]:
Tau-a
The Tau-a statistic tests the strength of association of the cross tabulations.
Both variables have to be ordinal.
The Tau-a coefficient does not make any adjustment for ties.
It is defined as:
\begin{equation} \tau_{A} = {\frac {n_{c}-n_{d}}{n_{0}}} ~, \end{equation}
where n_c, n_d, and n_0 are defined below for the Tau-b coefficient.
Tau-b
The Tau-b statistic, unlike Tau-a, makes adjustments for ties.
Values of Tau-b range from −1 (100\% negative association, or perfect inversion) to +1 (100\% positive association, or perfect agreement).
A value of zero indicates the absence of association.
The Kendall Tau-b coefficient is defined as:
\begin{equation} \tau_{B} = \frac{n_{c}-n_{d}}{\sqrt {(n_{0}-n_{1})(n_{0}-n_{2})}} ~, \end{equation}
where
\begin{equation} \begin{aligned} n_{0} &= n(n-1)/2 \\ n_{1} &= \sum _{i}t_{i}(t_{i}-1)/2 \\ n_{2} &= \sum _{j}u_{j}(u_{j}-1)/2 \\ n_{c} &= \text{Number of concordant pairs} \\ n_{d} &= \text{Number of discordant pairs} \\ t_{i} &= \text{Number of tied values in the } i^{\text{th}} \text{ group of ties for the first quantity} \\ u_{j} &= \text{Number of tied values in the } j^{\text{th}} \text{ group of ties for the second quantity} \end{aligned} \end{equation}
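The Tau-a and Tau-b definitions above can be compared on a small sample with ties, as in the following sketch. It is a language-neutral illustration in Python and not the implementation of any ParaMonte procedure; the name kendall_tau_ab is hypothetical.

```python
import math
from collections import Counter

def kendall_tau_ab(x, y):
    """Return (tau_a, tau_b) computed from the definitions in the text.

    Hypothetical helper for illustration only; not a ParaMonte procedure.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    nc = nd = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sx = (x[j] > x[i]) - (x[j] < x[i])
            sy = (y[j] > y[i]) - (y[j] < y[i])
            if sx * sy > 0:
                nc += 1
            elif sx * sy < 0:
                nd += 1
            # Tied pairs (sx == 0 or sy == 0) count as neither.
    n0 = n * (n - 1) // 2
    # n1 and n2 sum t (t - 1) / 2 over the groups of tied values.
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())
    n2 = sum(u * (u - 1) // 2 for u in Counter(y).values())
    denominator = math.sqrt((n0 - n1) * (n0 - n2))
    if denominator == 0.0:
        raise ValueError("tau_b is undefined when a series is constant")
    return (nc - nd) / n0, (nc - nd) / denominator

tau_a, tau_b = kendall_tau_ab([1, 2, 2, 4], [1, 3, 2, 2])
```

In this sample n_c = 3, n_d = 1, n_0 = 6, and n_1 = n_2 = 1, so Tau-b divides by a smaller denominator than Tau-a and is therefore larger in magnitude.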
There are also other definitions of Kendall \tau which are not considered in the current version of the ParaMonte library.
complex
of arbitrary kind type parameter, similar to procedures of pm_sampleVar.
Final Remarks
If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.
This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.
type(kendall_type), parameter pm_sampleCor::kendall = kendall_type()
This is a scalar parameter object of type kendall_type that is exclusively used to signify the kendall type of correlation coefficients.
For example usage, see the documentation of the target procedure requiring this object.
Definition at line 372 of file pm_sampleCor.F90.
type(kendallA_type), parameter pm_sampleCor::kendallA = kendallA_type()
This is a scalar parameter object of type kendallA_type that is exclusively used to signify the kendallA type of correlation coefficients.
For example usage, see the documentation of the target procedure requiring this object.
Definition at line 437 of file pm_sampleCor.F90.
type(kendallB_type), parameter pm_sampleCor::kendallB = kendallB_type()
This is a scalar parameter object of type kendallB_type that is exclusively used to signify the kendallB type of correlation coefficients.
For example usage, see the documentation of the target procedure requiring this object.
Definition at line 502 of file pm_sampleCor.F90.
character(*, SK), parameter pm_sampleCor::MODULE_NAME = "@pm_sampleCor"
Definition at line 292 of file pm_sampleCor.F90.
type(pearson_type), parameter pm_sampleCor::pearson = pearson_type()
This is a scalar parameter object of type pearson_type that is exclusively used to signify the pearson type of correlation coefficients.
For example usage, see the documentation of the target procedure requiring this object.
Definition at line 560 of file pm_sampleCor.F90.
type(spearman_type), parameter pm_sampleCor::spearman = spearman_type()
This is a scalar parameter object of type spearman_type that is exclusively used to signify the spearman type of correlation coefficients.
For example usage, see the documentation of the target procedure requiring this object.
Definition at line 617 of file pm_sampleCor.F90.