ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
See the latest version documentation.
pm_sampleCor Module Reference

This module contains classes and procedures for computing properties related to the correlation matrices of random samples.

Data Types

type  corcoef_type
 This is an abstract derived type for constructing concrete derived types to distinguish various procedure signatures that require different correlation coefficients (e.g., pearson, spearman, kendall, ...).
 
interface  getCor
 Generate and return the (Pearson) correlation coefficient or matrix of a pair of (weighted) time series x(1:nsam) and y(1:nsam) or of an input (weighted) array of shape (ndim, nsam) or (nsam, ndim) where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
 
interface  getRho
 Generate and return the Spearman rank correlation matrix of the input (weighted) sample of shape (ndim, nsam) or (nsam, ndim) or the Spearman rank correlation coefficient of a pair of (weighted) time series x(1:nsam) and y(1:nsam) where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
 
type  kendall_type
 This is a concrete derived type whose instances are exclusively used to signify the kendall type of correlation coefficients.
 
type  kendallA_type
 This is a concrete derived type whose instances are exclusively used to signify the kendallA type of correlation coefficients.
 
type  kendallB_type
 This is a concrete derived type whose instances are exclusively used to signify the kendallB type of correlation coefficients.
 
type  pearson_type
 This is a concrete derived type whose instances are exclusively used to signify the pearson type of correlation coefficients.
 
interface  setCor
 Return the (weighted) correlation matrix corresponding to the input (weighted) covariance matrix or return the (weighted) sample Pearson correlation matrix of the input array of shape (ndim, nsam) or (nsam, ndim) or the Pearson correlation coefficient of a pair of (weighted) time series x(1:nsam) and y(1:nsam) where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
 
interface  setCordance
 Compute and return the Cordance vector of the input data series x and y.
 
interface  setRho
 Return the Spearman rank correlation matrix of the input (weighted) sample of shape (ndim, nsam) or (nsam, ndim) or the Spearman rank correlation coefficient of a pair of (weighted) time series x(1:nsam) and y(1:nsam) where ndim is the number of data dimensions (the number of data attributes) and nsam is the number of data points.
 
type  spearman_type
 This is a concrete derived type whose instances are exclusively used to signify the spearman type of correlation coefficients.
 

Variables

character(*, SK), parameter MODULE_NAME = "@pm_sampleCor"
 
type(kendall_type), parameter kendall = kendall_type()
 This is a scalar parameter object of type kendall_type that is exclusively used to signify the kendall type of correlation coefficients.
 
type(kendallA_type), parameter kendallA = kendallA_type()
 This is a scalar parameter object of type kendallA_type that is exclusively used to signify the kendallA type of correlation coefficients.
 
type(kendallB_type), parameter kendallB = kendallB_type()
 This is a scalar parameter object of type kendallB_type that is exclusively used to signify the kendallB type of correlation coefficients.
 
type(pearson_type), parameter pearson = pearson_type()
 This is a scalar parameter object of type pearson_type that is exclusively used to signify the pearson type of correlation coefficients.
 
type(spearman_type), parameter spearman = spearman_type()
 This is a scalar parameter object of type spearman_type that is exclusively used to signify the spearman type of correlation coefficients.
 

Detailed Description

This module contains classes and procedures for computing properties related to the correlation matrices of random samples.

Correlation matrix

The correlation matrix of N random variables X_{1}, \ldots, X_{N} is the N\times N matrix \rho whose (i, j) entry is,

\begin{equation} \rho_{ij} := \up{COR}(X_{i}, X_{j}) = \frac{\up{COV}(X_{i}, X_{j})}{\sigma_{X_{i}} \sigma_{X_{j}}}, \quad {\text{if}} ~ \sigma_{X_{i}} \sigma_{X_{j}} > 0 ~. \end{equation}

Thus the diagonal entries are all identically one.
If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables X_{i} / \sigma(X_{i}) for i = 1, \dots, N.
This applies both to the matrix of population correlations (in which case \sigma is the population standard deviation), and to the matrix of sample correlations (in which case \sigma denotes the sample standard deviation).
Consequently, each is necessarily a positive-semidefinite matrix.
Moreover, the correlation matrix is strictly positive definite if no variable can have all its values exactly generated as a linear function of the values of the others.
The correlation matrix is symmetric because the correlation between X_{i} and X_{j} is the same as the correlation between X_{j} and X_{i}.

Note
The best way to compute the correlation matrix of a sample is to first compute the covariance matrix of the sample via the relevant procedures in pm_sampleCov and then call the relevant procedures of this module to convert the covariance matrix to the corresponding correlation matrix.
The elements of the correlation matrix can be computed via the following equation,

\begin{equation} \rho_{ij} = \frac{\Sigma_{ij}} { \sqrt{\Sigma_{ii}} \times \sqrt{\Sigma_{jj}} } ~, \end{equation}

where \rho represents the correlation matrix, \Sigma represents the covariance matrix, and \sqrt{\Sigma_{ii}} = \sigma_{X_{i}} is the standard deviation of the i-th variable.
The vector of standard deviations is the element-wise square root of the diagonal of the covariance matrix, which can be readily extracted via getMatCopy.
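The conversion described in this note can be sketched in a few lines of standard Fortran. The module and function names below (`cov2cor_sketch`, `cov_to_cor`) are hypothetical and exist only for this illustration; they are not the interfaces of this module, whose actual procedures are documented under setCor and getCor. The sketch assumes a square covariance matrix with strictly positive diagonal elements.

```fortran
module cov2cor_sketch
    use, intrinsic :: iso_fortran_env, only: rk => real64
    implicit none
    private
    public :: rk, cov_to_cor
contains
    ! Hypothetical helper (not a ParaMonte procedure):
    ! cor(i,j) = cov(i,j) / (sqrt(cov(i,i)) * sqrt(cov(j,j))).
    ! Assumes cov is square with strictly positive diagonal elements.
    pure function cov_to_cor(cov) result(cor)
        real(rk), intent(in) :: cov(:,:)
        real(rk) :: cor(size(cov, 1), size(cov, 2))
        real(rk) :: std(size(cov, 1))
        integer :: i, j
        ! The standard deviations are the square roots of the variances on the diagonal.
        do i = 1, size(cov, 1)
            std(i) = sqrt(cov(i, i))
        end do
        ! Scale every covariance by the product of the two standard deviations.
        do j = 1, size(cov, 2)
            do i = 1, size(cov, 1)
                cor(i, j) = cov(i, j) / (std(i) * std(j))
            end do
        end do
    end function cov_to_cor
end module cov2cor_sketch
```

For the covariance matrix with variances 4 and 9 and covariance 2, the off-diagonal correlation is 2 / (2 \times 3) = 1/3 and the diagonal elements are one.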

Sample Cordance

Let \{(X_{1}, Y_{1}), \ldots, (X_{N}, Y_{N})\} be a set of N observations of the joint random variables X and Y.
A concordant pair is a pair of observations, each on two variables, (X_i, Y_i) and (X_j, Y_j), having the property that

\begin{equation} \up{sgn}(X_j - X_i) ~=~ \up{sgn}(Y_j - Y_i) ~, \end{equation}

where \up{sgn} is the signum function defined as:

\begin{equation} \up{sgn}(x) = \begin{cases} -1, & x < 0 ~, \\ 0, & x = 0 ~, \\ 1, & x > 0 ~, \end{cases} \end{equation}

that is, in a concordant pair, both elements of one pair are either greater than, equal to, or less than the corresponding elements of the other pair.
In contrast, a discordant pair is a pair of two-variable observations such that,

\begin{equation} \up{sgn} (X_j - X_i) ~=~ -\up{sgn}(Y_j - Y_i) ~, \end{equation}

that is, if one pair contains a higher value of X then the other pair contains a higher value of Y.

Sample concordance is relevant to computing the Kendall correlation coefficient or in hypothesis testing.
However, in many situations it is also important to distinguish tied pairs from concordant pairs.
Therefore, more precisely, any pair of observations (X_{i}, Y_{i}) and (X_{j}, Y_{j}) are considered,

  1. X-tied if \up{sgn} (X_j - X_i) = 0.
  2. Y-tied if \up{sgn} (Y_j - Y_i) = 0.
  3. concordant if \up{sgn}(X_j - X_i) = \up{sgn}(Y_j - Y_i) \neq 0, where \up{sgn} is the signum function defined above.
  4. discordant if \up{sgn}(X_j - X_i) = -\up{sgn}(Y_j - Y_i) \neq 0.

The generic interface setCordance of this module returns the sample cordance tuple/vector comprised of the number of x-ties, y-ties, concordant pairs, and discordant pairs.

Sample Cordance Algorithm

The naive method of computing sample concordance compares every pair of observations and quickly becomes expensive for large N, because there are {N \choose 2} = \frac{N (N - 1)}{2} ways (the binomial coefficient) to choose two items from N items.
This makes the naive algorithm of complexity \mathcal{O}(N^{2}).
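The naive pairwise method can be sketched as a double loop over all pairs of observations. The names `cordance_sketch`, `signum`, and `count_cordance` are hypothetical and are not the setCordance interface of this module. The ordering of the output vector, and the choice to count a pair tied in both X and Y in both tie counters, are assumptions of this sketch.

```fortran
module cordance_sketch
    use, intrinsic :: iso_fortran_env, only: rk => real64
    implicit none
    private
    public :: rk, count_cordance
contains
    ! Signum function: -1, 0, or +1 (the intrinsic sign() never returns 0).
    elemental integer function signum(d)
        real(rk), intent(in) :: d
        signum = merge(1, 0, d > 0._rk) - merge(1, 0, d < 0._rk)
    end function signum

    ! Naive O(N**2) count over all N(N-1)/2 pairs of observations.
    ! Output layout (an assumption of this sketch):
    ! [number of X-ties, number of Y-ties, concordant pairs, discordant pairs].
    ! A pair tied in both X and Y is counted in both tie counters.
    pure function count_cordance(x, y) result(cordance)
        real(rk), intent(in) :: x(:), y(:)  ! assumed to have equal sizes
        integer :: cordance(4)
        integer :: i, j, sx, sy
        cordance = 0
        do i = 1, size(x) - 1
            do j = i + 1, size(x)
                sx = signum(x(j) - x(i))
                sy = signum(y(j) - y(i))
                if (sx == 0) cordance(1) = cordance(1) + 1
                if (sy == 0) cordance(2) = cordance(2) + 1
                ! Only untied pairs can be concordant or discordant.
                if (sx /= 0 .and. sy /= 0) then
                    if (sx == sy) then
                        cordance(3) = cordance(3) + 1
                    else
                        cordance(4) = cordance(4) + 1
                    end if
                end if
            end do
        end do
    end function count_cordance
end module cordance_sketch
```

For x = (1, 2, 3, 4) and y = (1, 3, 2, 4), the six pairs contain no ties, five concordant pairs, and one discordant pair, so the function returns (0, 0, 5, 1).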

Correlation coefficient

A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables.
The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.

This module contains generic algorithms for computing the following popular sample correlation coefficients.

Pearson correlation coefficient

The Pearson product-moment correlation coefficient, also known as r, R, or the Pearson r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.
This is the best-known and most commonly used type of correlation coefficient.
When the term correlation coefficient is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.

Definition

The Pearson correlation coefficient, when applied to a population, is commonly represented by the Greek letter \rho (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient.
Given a pair of random variables (X,Y), the formula for \rho is,

\begin{equation} \rho_{X,Y}={\frac {\up{cov} (X,Y)}{\sigma_{X}\sigma_{Y}}} ~, \end{equation}

where

  1. \up{cov} is the covariance.
  2. \sigma_{X} is the standard deviation of X.
  3. \sigma_Y is the standard deviation of Y.

The formula for \rho can also be expressed in terms of mean and expectation. Since,

\begin{equation} \up{cov}(X,Y) = \up{\mathbb {E}} [(X-\mu _{X})(Y-\mu _{Y})] ~, \end{equation}

the formula for \rho can also be written as,

\begin{equation} \rho_{X,Y} = {\frac{\up{\mathbb{E}} [(X - \mu_{X})(Y - \mu_{Y})]}{\sigma _{X}\sigma _{Y}}} ~, \end{equation}

where

  1. \sigma_Y and \sigma_{X} are defined as above.
  2. \mu_{X} is the mean of X.
  3. \mu_{Y} is the mean of Y.
  4. \up{\mathbb{E}} is the expectation.

The Pearson correlation coefficient, when applied to a sample, is commonly represented by r_{xy} and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient.
We can obtain a formula for r_{xy} by substituting estimates of the covariances and variances based on a sample into the formula above.
Given paired data \left\{(x_{1}, y_{1}), \ldots,(x_{n}, y_{n})\right\} consisting of n pairs, r_{xy} is defined as,

\begin{equation} r_{xy} = {\frac {\sum_{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{{\sqrt {\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}{\sqrt {\sum _{i=1}^{n}(y_{i}-{\bar {y}})^{2}}}}} ~, \end{equation}

where

  1. n is the sample size.
  2. x_{i},y_{i} are the individual sample points indexed with i.
  3. \textstyle {\bar{x}} = {\frac{1}{n}} \sum_{i=1}^{n}x_{i} (the sample mean); and analogously for \bar{y}.
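The sample formula above translates almost term by term into array expressions. The following is a minimal sketch for unweighted data, assuming two series of equal length n \geq 2 with non-zero variance. The names `pearson_sketch` and `pearson_r` are hypothetical and are not the getCor or setCor interfaces of this module.

```fortran
module pearson_sketch
    use, intrinsic :: iso_fortran_env, only: rk => real64
    implicit none
    private
    public :: rk, pearson_r
contains
    ! Hypothetical helper (not a ParaMonte procedure): unweighted sample
    ! Pearson correlation coefficient of two equal-length series.
    pure function pearson_r(x, y) result(r)
        real(rk), intent(in) :: x(:), y(:)  ! assumed: size(x) == size(y) >= 2
        real(rk) :: r
        real(rk) :: dx(size(x)), dy(size(y))
        ! Deviations of each series from its sample mean.
        dx = x - sum(x) / size(x)
        dy = y - sum(y) / size(y)
        ! Sum of cross products over the product of the root sums of squares.
        r = sum(dx * dy) / (sqrt(sum(dx**2)) * sqrt(sum(dy**2)))
    end function pearson_r
end module pearson_sketch
```

For x = (1, 2, 3, 4, 5) and y = (2, 1, 4, 3, 5), the sum of cross products is 8 and each sum of squared deviations is 10, so the function returns 0.8 up to rounding.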

Spearman rank correlation coefficient

The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables.
For a sample of size n, the n raw scores X_{i}, Y_{i} are converted to ranks \up{R}({X_{i}}), \up{R}({Y_{i}}), and r_{s} is computed as

\begin{equation} r_{s} = \rho_{\up {R} (X),\up {R} (Y)} = {\frac {\up {cov} (\up {R} (X),\up {R} (Y))}{\sigma _{\up {R} (X)}\sigma _{\up {R} (Y)}}} ~, \end{equation}

where

  1. \rho denotes the usual Pearson correlation coefficient, but applied to the rank variables,
  2. \up{cov} (\up {R} (X),\up {R} (Y)) is the covariance of the rank variables,
  3. \sigma_{\up {R} (X)} and \sigma_{\up {R} (Y)} are the standard deviations of the rank variables.

When all n ranks are distinct integers (that is, when there are no ties), r_{s} can be computed using the simplified formula,

\begin{equation} r_{s} = 1 - {\frac{6\sum d_{i}^{2}}{n(n^{2}-1)}} ~, \end{equation}

where

  1. d_{i}=\up {R} (X_{i})-\up{R} (Y_{i}) is the difference between the two ranks of each observation,
  2. n is the number of observations.
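
As a brief worked illustration of the simplified formula (the numbers are hypothetical and chosen only for this example), consider n = 5 observations whose ranks are \up{R}(X) = (1, 2, 3, 4, 5) and \up{R}(Y) = (2, 1, 4, 3, 5).
The rank differences are d = (-1, 1, -1, 1, 0), so that \sum d_{i}^{2} = 4 and

\begin{equation} r_{s} = 1 - \frac{6 \times 4}{5 (5^{2} - 1)} = 1 - \frac{24}{120} = 0.8 ~, \end{equation}

which agrees with the Pearson correlation coefficient of the two rank vectors, whose sum of cross products of deviations is 8 and whose sums of squared deviations are both 10.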

Kendall rank correlation coefficient

The Kendall rank correlation coefficient, commonly referred to as the Kendall \tau coefficient, is a statistic used to measure the ordinal association between two measured quantities.
It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities.
It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.

Definition

Let (x_{1},y_{1}), \ldots,(x_{n},y_{n}) be a set of observations of the joint random variables X and Y, such that all the values x_{i} and y_{i} are unique.
Any pair of observations (x_{i},y_{i}) and (x_{j},y_{j}), where i < j, is said to be concordant if the sort orders of (x_{i}, x_{j}) and (y_{i},y_{j}) agree,
that is, if either both x_{i} > x_{j} and y_{i} > y_{j} hold or both x_{i} < x_{j} and y_{i} < y_{j} hold; otherwise the pair is said to be discordant.
The Kendall \tau coefficient is defined as:

\begin{equation} \tau = {\frac {({\text{number of concordant pairs}})-({\text{number of discordant pairs}})}{({\text{number of pairs}})}} = 1 - {\frac {2({\text{number of discordant pairs}})}{n \choose 2}} ~. \end{equation}

where {n \choose 2} = \frac{n(n-1)}{2} is the binomial coefficient for the number of ways to choose two items from n items.
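
As a brief worked illustration (with hypothetical data chosen only for this example), consider n = 4 observations with x = (1, 2, 3, 4) and y = (1, 3, 2, 4).
Of the {4 \choose 2} = 6 pairs, only the pair formed by the second and third observations is discordant, leaving 5 concordant pairs, so that

\begin{equation} \tau = \frac{5 - 1}{6} = 1 - \frac{2 \times 1}{6} = \frac{2}{3} ~. \end{equation}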

Accounting for ties

A pair \{(x_{i}, y_{i}), (x_{j}, y_{j})\} is said to be tied if and only if x_{i} = x_{j} or y_{i} = y_{j}.
A tied pair is neither concordant nor discordant.
When tied pairs arise in the data, the coefficient may be modified in a number of ways to keep it in the range [−1, 1]:

Tau-a

The Tau-a statistic tests the strength of association of the cross tabulations.
Both variables have to be ordinal.
The Tau-a coefficient does not make any adjustment for ties.
It is defined as:

\begin{equation} \tau_{A} = {\frac {n_{c}-n_{d}}{n_{0}}} ~, \end{equation}

where n_c, n_d, and n_0 are defined below for the Tau-b coefficient.

Tau-b

The Tau-b statistic, unlike Tau-a, makes adjustments for ties.
Values of Tau-b range from −1 (100\% negative association, or perfect inversion) to +1 (100\% positive association, or perfect agreement).
A value of zero indicates the absence of association.
The Kendall Tau-b coefficient is defined as:

\begin{equation} \tau_{B} = \frac{n_{c}-n_{d}}{\sqrt {(n_{0}-n_{1})(n_{0}-n_{2})}} ~, \end{equation}

where

\begin{aligned} n_{0} &= n(n-1)/2 \\ n_{1} &= \sum _{i}t_{i}(t_{i}-1)/2 \\ n_{2} &= \sum _{j}u_{j}(u_{j}-1)/2 \\ n_{c} &= \text{Number of concordant pairs} \\ n_{d} &= \text{Number of discordant pairs} \\ t_{i} &= \text{Number of tied values in the } i^{\text{th}} \text{ group of ties for the first quantity} \\ u_{j} &= \text{Number of tied values in the } j^{\text{th}} \text{ group of ties for the second quantity} \end{aligned}
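
As a brief worked illustration with ties (hypothetical data chosen only for this example), consider n = 4 observations with x = (1, 2, 2, 3) and y = (1, 2, 3, 3), so that n_{0} = 6.
The second and third observations are tied in x and the third and fourth observations are tied in y, which leaves n_{c} = 4 concordant and n_{d} = 0 discordant pairs.
Each quantity has a single group of two tied values, hence n_{1} = n_{2} = 1 and

\begin{equation} \tau_{A} = \frac{4 - 0}{6} = \frac{2}{3} ~, \qquad \tau_{B} = \frac{4 - 0}{\sqrt{(6 - 1)(6 - 1)}} = \frac{4}{5} ~. \end{equation}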

There are also other definitions of Kendall \tau which are not considered in the current version of the ParaMonte library.

See also
pm_sampling
pm_sampleACT
pm_sampleCCF
pm_sampleCor
pm_sampleCov
pm_sampleConv
pm_sampleECDF
pm_sampleMean
pm_sampleNorm
pm_sampleQuan
pm_sampleScale
pm_sampleShift
pm_sampleWeight
pm_sampleAffinity
pm_sampleVar
Correlation coefficient
Test:
test_pm_sampleCor
Todo:
High Priority: The procedures of this module should be extended to support samples of type complex of arbitrary kind type parameter, similar to procedures of pm_sampleVar.


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, Tuesday 01:45 AM, August 21, 2018, Dallas, TX

Variable Documentation

◆ kendall

type(kendall_type), parameter pm_sampleCor::kendall = kendall_type()

This is a scalar parameter object of type kendall_type that is exclusively used to signify the kendall type of correlation coefficients.

For example usage, see the documentation of the target procedure requiring this object.

See also
kendall
pearson
spearman
corcoef_type
kendall_type
pearson_type
spearman_type



Author:
Amir Shahmoradi, September 1, 2017, 12:00 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin

Definition at line 372 of file pm_sampleCor.F90.

◆ kendallA

type(kendallA_type), parameter pm_sampleCor::kendallA = kendallA_type()

This is a scalar parameter object of type kendallA_type that is exclusively used to signify the kendallA type of correlation coefficients.

For example usage, see the documentation of the target procedure requiring this object.

See also
pearson
kendall
kendallA
kendallB
spearman
corcoef_type
pearson_type
kendall_type
kendallA_type
kendallB_type
spearman_type



Author:
Amir Shahmoradi, September 1, 2017, 12:00 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin

Definition at line 437 of file pm_sampleCor.F90.

◆ kendallB

type(kendallB_type), parameter pm_sampleCor::kendallB = kendallB_type()

This is a scalar parameter object of type kendallB_type that is exclusively used to signify the kendallB type of correlation coefficients.

For example usage, see the documentation of the target procedure requiring this object.

See also
pearson
kendall
kendallA
kendallB
spearman
corcoef_type
pearson_type
kendall_type
kendallA_type
kendallB_type
spearman_type



Author:
Amir Shahmoradi, September 1, 2017, 12:00 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin

Definition at line 502 of file pm_sampleCor.F90.

◆ MODULE_NAME

character(*, SK), parameter pm_sampleCor::MODULE_NAME = "@pm_sampleCor"

Definition at line 292 of file pm_sampleCor.F90.

◆ pearson

type(pearson_type), parameter pm_sampleCor::pearson = pearson_type()

This is a scalar parameter object of type pearson_type that is exclusively used to signify the pearson type of correlation coefficients.

For example usage, see the documentation of the target procedure requiring this object.

See also
kendall
pearson
spearman
corcoef_type
kendall_type
pearson_type
spearman_type



Author:
Amir Shahmoradi, September 1, 2017, 12:00 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin

Definition at line 560 of file pm_sampleCor.F90.

◆ spearman

type(spearman_type), parameter pm_sampleCor::spearman = spearman_type()

This is a scalar parameter object of type spearman_type that is exclusively used to signify the spearman type of correlation coefficients.

For example usage, see the documentation of the target procedure requiring this object.

See also
kendall
pearson
spearman
corcoef_type
kendall_type
pearson_type
spearman_type



Author:
Amir Shahmoradi, September 1, 2017, 12:00 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin

Definition at line 617 of file pm_sampleCor.F90.