ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
See the latest version documentation.
pm_distBinom Module Reference

This module contains classes and procedures for computing various statistical quantities related to the Binomial distribution.

Data Types

type  distBinom_type
 This is the derived type for signifying distributions that are of type Binomial as defined in the description of pm_distBinom.
 
interface  getBinomCDF
 Generate and return the Cumulative Distribution Function (CDF) of the Binomial distribution for an input nsuc within the discrete integer support of the distribution.
 
interface  getBinomLogPMF
 Generate and return the natural logarithm of the Probability Mass Function (PMF) of the Binomial distribution for an input nsuc within the discrete integer support of the distribution.
 
interface  setBinomCDF
 Return the Cumulative Distribution Function (CDF) of the Binomial distribution.
 
interface  setBinomLogPMF
 Return the natural logarithm of the Probability Mass Function (PMF) of the Binomial distribution for an input nsuc within the discrete integer support of the distribution.
 

Variables

character(*, SK), parameter MODULE_NAME = "@pm_distBinom"
 

Detailed Description

This module contains classes and procedures for computing various statistical quantities related to the Binomial distribution.

Specifically, this module contains routines for computing the following quantities of the Binomial distribution:

  1. the Probability Mass Function (PMF)
  2. the Cumulative Distribution Function (CDF)
  3. the Random Number Generation from the distribution (RNG)
  4. the Inverse Cumulative Distribution Function (ICDF) or the Quantile Function

The binomial distribution with parameters \(n\) and \(p\) is the discrete probability distribution of the number of successes in a sequence of \(n\) independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome:
success (with probability \(p\)) or failure (with probability \(q = 1 - p\)).
A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process.
For a single trial, i.e., \(n = 1\), the binomial distribution is a Bernoulli distribution.
The binomial distribution is the basis for the popular binomial test of statistical significance.
The binomial distribution is frequently used to model the number of successes in a sample of size \(n\) drawn with replacement from a population of size \(N\).
If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution.
However, for \(N\) much larger than \(n\), the binomial distribution remains a good approximation, and is widely used.
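The description of a binomial variable as the number of successes in \(n\) independent Bernoulli trials can be illustrated with a short simulation. The following is a minimal Python sketch, not the sampling algorithm of this library; the function name `sample_binomial` and the seed value are hypothetical choices made for illustration.

```python
import random


def sample_binomial(n, p, rng):
    """Draw one binomial variate by counting successes in n Bernoulli trials."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("require n >= 0 and 0 <= p <= 1")
    # rng.random() is uniform on [0, 1), so each comparison succeeds with probability p.
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(12345)  # arbitrary fixed seed, chosen only for reproducibility
draws = [sample_binomial(20, 0.3, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should lie close to the expected value n * p = 6
```

The cost of this approach grows linearly with \(n\), so it is suitable for illustration rather than for large \(n\).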

Probability Mass Function

In general, if the random variable \(X\) follows the binomial distribution with parameters \(n \in \mathbb{N}\) and \(p \in [0, 1]\), we write \(X \sim B(n, p)\).
The probability of getting exactly \(k\) successes in \(n\) independent Bernoulli trials (with the same rate \(p\)) is given by the probability mass function:

\begin{equation} \large f(k, n, p) = \Pr(X = k) = \binom{n}{k} p^{k}(1-p)^{n-k} ~, \end{equation}

for \(k = 0, 1, 2, ..., n\), where

\begin{equation} \large \binom{n}{k} = \frac{n!}{k!(n-k)!} ~, \end{equation}

is the binomial coefficient, hence the distribution name.
The formula can be understood as follows:
\(p^k q^{n-k}\), with \(q = 1 - p\), is the probability of obtaining the particular sequence of \(n\) Bernoulli trials in which the first \(k\) trials are successes and the remaining \(n - k\) trials are failures.
Since the trials are independent and the success probability is the same for every trial, any sequence of \(n\) trials with \(k\) successes and \(n - k\) failures has the same probability, regardless of the positions of the successes within the sequence.
There are \(\binom{n}{k}\) such sequences, since the binomial coefficient \(\binom{n}{k}\) counts the number of ways to choose the positions of the \(k\) successes among the \(n\) trials.
The binomial distribution is concerned with the probability of obtaining any one of these sequences. Because the sequences are mutually exclusive, the probability of a single sequence, \(p^k q^{n-k}\), must be added \(\binom{n}{k}\) times, hence,

\begin{equation} \large \Pr(X=k) = \binom{n}{k} p^{k} (1-p)^{n-k} ~. \end{equation}
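For large \(n\) the factorials in the binomial coefficient overflow and the powers of \(p\) underflow, which is one reason to work with the natural logarithm of the PMF. The following is a minimal Python sketch of that idea using the log-Gamma function. It is not the implementation behind the interfaces of this module; the name `binom_log_pmf` is hypothetical, and the sketch assumes \(0 < p < 1\) to avoid taking the logarithm of zero.

```python
import math


def binom_log_pmf(k, n, p):
    """Natural log of Pr(X = k) for X ~ B(n, p), assuming 0 < p < 1."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in the support 0, 1, ..., n")
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    # log C(n, k) = log n! - log k! - log (n - k)!, with log m! = lgamma(m + 1)
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # log1p(-p) evaluates log(1 - p) accurately when p is small
    return log_coef + k * math.log(p) + (n - k) * math.log1p(-p)


# Compare with the exact value C(10, 3) / 2**10 = 120 / 1024.
print(math.exp(binom_log_pmf(3, 10, 0.5)), 120 / 1024)
```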

Cumulative Distribution Function

The cumulative distribution function can be expressed as:

\begin{equation} \large \up{CDF}(k; n, p) = \Pr(X\leq k) = \sum_{i=0}^{\lfloor k\rfloor}{n \choose i}p^{i}(1-p)^{n-i} ~, \end{equation}

where \(\lfloor k\rfloor\) is the floor of \(k\), i.e., the greatest integer less than or equal to \(k\).
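The summation above translates directly into code. The following is a minimal Python sketch under the assumption that \(n\) is small enough for direct summation to be accurate; it is not the algorithm used by this library, and the name `binom_cdf` is hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ B(n, p), by direct summation of the PMF."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("require n >= 0 and 0 <= p <= 1")
    upper = math.floor(k)  # a non-integer k contributes the same mass as floor(k)
    if upper < 0:
        return 0.0  # no probability mass below the support
    upper = min(upper, n)  # no probability mass above n
    total = sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(upper + 1)
    )
    return min(total, 1.0)  # guard against rounding slightly above 1


print(binom_cdf(2.7, 10, 0.3), binom_cdf(2, 10, 0.3))  # the two values coincide
```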

For integer \(k\) with \(0 \leq k < n\), it can also be represented in terms of the regularized incomplete Beta function, as follows:

\begin{eqnarray} \large \up{CDF}(k;n,p) &=& \Pr(X\leq k) \\ &=& I_{1-p}(n-k,k+1) \\ &=& (n-k){n \choose k}\int _{0}^{1-p}t^{n-k-1}(1-t)^{k}\,dt ~. \end{eqnarray}
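The agreement between the Beta-function representation and the direct summation can be checked numerically. The following Python sketch evaluates the integral with a composite Simpson rule from the standard library alone and compares it with the sum; the function names and the number of intervals are hypothetical choices, and the check assumes an integer \(k\) with \(0 \leq k < n\).

```python
import math


def binom_cdf_sum(k, n, p):
    """Pr(X <= k) for X ~ B(n, p), by direct summation (integer k)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def binom_cdf_integral(k, n, p, intervals=2000):
    """(n-k) C(n,k) * integral of t^(n-k-1) (1-t)^k over [0, 1-p], via Simpson."""
    if not 0 <= k < n:
        raise ValueError("the integral form requires 0 <= k < n")
    if intervals <= 0 or intervals % 2:
        raise ValueError("Simpson's rule needs a positive, even number of intervals")

    def integrand(t):
        return t ** (n - k - 1) * (1.0 - t) ** k

    upper = 1.0 - p
    h = upper / intervals
    total = integrand(0.0) + integrand(upper)
    for j in range(1, intervals):
        # Simpson weights: 4 at odd interior nodes, 2 at even interior nodes
        total += (4 if j % 2 else 2) * integrand(j * h)
    return (n - k) * math.comb(n, k) * total * h / 3.0


print(binom_cdf_sum(3, 10, 0.3), binom_cdf_integral(3, 10, 0.3))
```

The quadrature is only a verification device here; production code would normally call a dedicated incomplete Beta function routine instead.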

Sums of binomials

If \(X \sim B(n, p)\) and \(Y \sim B(m, p)\) are independent binomial variables with the same probability \(p\), then \(X + Y\) is again a binomial variable; its distribution is \(Z = X + Y \sim B(n+m, p)\):

\begin{eqnarray} \large \up{P}(Z=k) &=& \sum_{i=0}^{k}\left[{\binom {n}{i}}p^{i}(1-p)^{n-i}\right]\left[{\binom {m}{k-i}}p^{k-i}(1-p)^{m-k+i}\right] \\ &=& \binom{n+m}{k} p^{k}(1-p)^{n+m-k} ~. \end{eqnarray}

A binomially distributed random variable \(X \sim B(n, p)\) can be considered as the sum of \(n\) independent Bernoulli distributed random variables, each with success probability \(p\).
Thus, the sum of two independent binomially distributed random variables \(X \sim B(n, p)\) and \(Y \sim B(m, p)\) is equivalent to the sum of \(n + m\) Bernoulli distributed random variables, which means \(Z = X + Y \sim B(n + m, p)\).
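The convolution identity for the sum of two binomial variables with a common \(p\) can be verified numerically. The following is a minimal Python sketch with hypothetical function names; it convolves the two PMFs term by term and compares the result with the PMF of \(B(n+m, p)\).

```python
import math


def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p), with k in 0, 1, ..., n."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def pmf_of_sum(k, n, m, p):
    """Pr(X + Y = k) for independent X ~ B(n, p) and Y ~ B(m, p), by convolution."""
    # Keep both indices inside their supports: 0 <= i <= n and 0 <= k - i <= m.
    first, last = max(0, k - m), min(k, n)
    return sum(
        binom_pmf(i, n, p) * binom_pmf(k - i, m, p) for i in range(first, last + 1)
    )


n, m, p = 4, 6, 0.35
max_diff = max(
    abs(pmf_of_sum(k, n, m, p) - binom_pmf(k, n + m, p)) for k in range(n + m + 1)
)
print(max_diff)  # expected to be at the level of floating-point rounding
```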

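However, if \(X \sim B(n, p_X)\) and \(Y \sim B(m, p_Y)\) are independent but have different success probabilities, \(p_X \neq p_Y\), then the sum is in general not binomially distributed, and its variance is smaller than the variance of a binomial variable distributed as \(B(n+m,{\bar{p}})\), where \(\bar{p} = (n p_X + m p_Y) / (n + m)\) is the trial-weighted mean success probability.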
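The variance statement can be made explicit with a short calculation. Write \(p_X\) and \(p_Y\) for the success probabilities of the independent variables \(X\) and \(Y\), and take \(\bar{p} = (n p_X + m p_Y)/(n+m)\). Since variances of independent variables add,

\begin{eqnarray} \large (n+m)\,\bar{p}\,(1-\bar{p}) - \up{Var}(X+Y) &=& (n+m)\,\bar{p}\,(1-\bar{p}) - \left[ n p_X (1-p_X) + m p_Y (1-p_Y) \right] \\ &=& n p_X^2 + m p_Y^2 - (n+m)\,\bar{p}^2 \\ &=& \frac{nm}{n+m} (p_X - p_Y)^2 ~\geq~ 0 ~, \end{eqnarray}

with equality only when \(p_X = p_Y\).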

Test:
test_pm_distBinom


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, Oct 16, 2009, 11:14 AM, Michigan

Variable Documentation

◆ MODULE_NAME

character(*, SK), parameter pm_distBinom::MODULE_NAME = "@pm_distBinom"

Definition at line 133 of file pm_distBinom.F90.