ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
pm_statest Module Reference

This module contains classes and procedures for performing various statistical tests. More...

Data Types

interface  getProbKS
 Generate and return the probability of the null hypothesis that sample1 of size nsam1 originates from the same distribution as sample2 of size nsam2, from the Uniform distribution, or from another distribution whose custom CDF is given.
More...
 
interface  setProbKS
 Return the probability and the corresponding Kolmogorov distribution quantile of the null hypothesis that sample1 of size nsam1 originates from the same distribution as sample2 of size nsam2, from the Uniform distribution, or from another distribution whose custom CDF is given.
More...
 

Variables

character(*, SK), parameter MODULE_NAME = "@pm_statest"
 

Detailed Description

This module contains classes and procedures for performing various statistical tests.

Kolmogorov-Smirnov (KS) Test

The Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).
In essence, the test answers the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the second case, "How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?"
It is named after Andrey Kolmogorov and Nikolai Smirnov.

The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case).
In the one-sample case, the distribution considered under the null hypothesis may be continuous, purely discrete or mixed.
In the two-sample case, the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted.
However, the two-sample test can also be performed under more general conditions that allow for discontinuity, heterogeneity, and dependence across samples.
The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.
The Kolmogorov–Smirnov test can be modified to serve as a goodness-of-fit test.
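
As a minimal, self-contained illustration of this distance (independent of this module's interfaces; all names below are illustrative), the following Fortran sketch computes the two-sample statistic \(D = \sup_x |F_1(x) - F_2(x)|\) by merging two pre-sorted samples:

    program demo_kstwo
        implicit none
        integer, parameter :: nsam1 = 5, nsam2 = 6
        ! illustrative pre-sorted samples; real data must be sorted in ascending order first.
        real :: sample1(nsam1) = [0.10, 0.30, 0.50, 0.70, 0.90]
        real :: sample2(nsam2) = [0.20, 0.25, 0.40, 0.60, 0.80, 0.95]
        real :: dist, ecdf1, ecdf2
        integer :: i1, i2
        dist = 0.; ecdf1 = 0.; ecdf2 = 0.; i1 = 1; i2 = 1
        ! Merge the two sorted samples, stepping whichever empirical CDF jumps next
        ! and tracking the largest gap between the two empirical CDFs.
        do while (i1 <= nsam1 .and. i2 <= nsam2)
            if (sample1(i1) < sample2(i2)) then
                ecdf1 = real(i1) / nsam1; i1 = i1 + 1
            else if (sample2(i2) < sample1(i1)) then
                ecdf2 = real(i2) / nsam2; i2 = i2 + 1
            else ! tie across samples: both empirical CDFs jump at the same point.
                ecdf1 = real(i1) / nsam1; i1 = i1 + 1
                ecdf2 = real(i2) / nsam2; i2 = i2 + 1
            end if
            dist = max(dist, abs(ecdf1 - ecdf2))
        end do
        print "(a,f6.3)", "two-sample KS statistic D = ", dist
    end program demo_kstwo

The merge visits every jump point of either empirical distribution function exactly once, so the supremum is found in \(\mathcal{O}(N_1 + N_2)\) time once the samples are sorted.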

Note
The effective sample size in a two-sample KS test is computed as,

\begin{equation} N_e = \frac{N_1 N_2}{N_1 + N_2} ~, \end{equation}

where \(N_1\) and \(N_2\) represent the sizes of the first and the second samples in the test respectively.
The approximation involved in the two-sample KS test becomes asymptotically accurate as the effective sample size \(N_e\) becomes large, but it remains reasonable even for effective sample sizes as small as \(N_e \approx 4\).
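For example, two samples of sizes \(N_1 = 20\) and \(N_2 = 30\) yield an effective sample size of \(N_e = (20 \times 30) / (20 + 30) = 12\), comfortably above this threshold.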

Testing for Normality

In the special case of testing for normality of the distribution, samples are standardized (shifted and scaled) and compared with the standard normal distribution.
This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic.
Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro–Wilk test or Anderson–Darling test.
However, these other tests have their own disadvantages.
For instance, the Shapiro–Wilk test is known not to work well in samples with many identical values.
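
As a hypothetical sketch of this procedure (not a call into this module; the sample values and all names below are illustrative), the following Fortran program standardizes a sample by its own mean and standard deviation and computes the one-sample KS statistic against the standard normal CDF, expressed via the erf intrinsic. As noted above, because the reference parameters are estimated from the sample itself, the standard KS null distribution no longer strictly applies to the resulting statistic:

    program demo_ksnorm
        implicit none
        integer, parameter :: nsam = 7
        ! illustrative sample, pre-sorted in ascending order.
        real :: sample(nsam) = [1.8, 1.9, 2.0, 2.1, 2.2, 2.4, 2.5]
        real :: z(nsam), avg, std, cdf, dist
        integer :: i
        avg = sum(sample) / nsam
        std = sqrt(sum((sample - avg)**2) / (nsam - 1))
        z = (sample - avg) / std ! standardize: zero mean, unit variance (order is preserved).
        dist = 0.
        do i = 1, nsam
            cdf = 0.5 * (1. + erf(z(i) / sqrt(2.))) ! standard normal CDF via the Fortran 2008 erf intrinsic.
            ! the empirical CDF jumps from (i-1)/n to i/n at z(i); check the gap on both sides.
            dist = max(dist, real(i) / nsam - cdf, cdf - real(i - 1) / nsam)
        end do
        print "(a,f6.3)", "one-sample KS statistic D = ", dist
    end program demo_ksnorm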

See also
pm_statest
pm_distKolm
pm_distanceKolm
Test:
test_pm_statest


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help in editing this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts or ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, March 22, 2012, 2:21 PM, National Institute for Fusion Studies, The University of Texas at Austin

Variable Documentation

◆ MODULE_NAME

character(*, SK), parameter pm_statest::MODULE_NAME = "@pm_statest"

Definition at line 85 of file pm_statest.F90.