ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
pm_distanceBhat Module Reference

This module contains classes and procedures for computing the Bhattacharyya statistical distance between two probability distributions.

Data Types

interface  getDisBhat
 Generate and return the Bhattacharyya distance of two univariate (discrete or continuous) distributions.
 

Variables

character(*, SK), parameter MODULE_NAME = "@pm_distanceBhat"
 

Detailed Description

This module contains classes and procedures for computing the Bhattacharyya statistical distance between two probability distributions.

The Bhattacharyya distance is a quantity which represents a notion of similarity between two probability distributions.
It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations.
The Bhattacharyya distance is not a metric, despite being named a distance, since it does not obey the triangle inequality.

History

Both the Bhattacharyya distance and the Bhattacharyya coefficient are named after Anil Kumar Bhattacharyya, a statistician who worked in the 1930s at the Indian Statistical Institute.
He developed a method to measure the distance between two non-normal distributions and illustrated it with the classical multinomial populations. Although this work was submitted for publication in 1941, it appeared almost five years later in Sankhya.
Consequently, Professor Bhattacharyya started working toward developing a distance metric for probability distributions that are absolutely continuous with respect to the Lebesgue measure. He published his progress in 1942 in the Proceedings of the Indian Science Congress, and the final work appeared in 1943 in the Bulletin of the Calcutta Mathematical Society.

Definition using Probability Theory

For probability distributions \(P\) and \(Q\) on the same domain \(\mathcal{X}\), the Bhattacharyya distance is defined as,

\begin{equation} D_{B}(P, Q) = -\ln\left(\up{BC}(P,Q)\right) ~, \end{equation}

where

\begin{equation} \up{BC}(P, Q) = \sum_{x \in {\mathcal{X}}}{\sqrt {P(x)Q(x)}} ~, \end{equation}

is the Bhattacharyya coefficient for discrete probability distributions.
For continuous probability distributions with \(P(dx) = p(x)dx\) and \(Q(dx) = q(x)dx\), where \(p(x)\) and \(q(x)\) are the probability density functions, the Bhattacharyya coefficient is defined as,

\begin{equation} \up{BC}(P, Q) = \int_{\mathcal{X}}{\sqrt{p(x)q(x)}}\,dx ~. \end{equation}
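The two definitions above can be sketched numerically. The following is a minimal, standalone Python illustration and not the `getDisBhat` interface of this module; all function names are hypothetical. The discrete case sums \(\sqrt{P(x)Q(x)}\) directly, and the continuous case approximates the integral with the trapezoidal rule on a finite interval assumed wide enough to hold nearly all of the probability mass. For two normal densities with equal standard deviation \(\sigma\), the distance has the closed form \((\mu_1 - \mu_2)^2 / (8\sigma^2)\), which serves as a check.

```python
import math

def bhattacharyya_coefficient(p, q):
    """BC(P, Q) = sum_x sqrt(P(x) Q(x)) for discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def bhattacharyya_distance(p, q):
    """D_B(P, Q) = -ln BC(P, Q); infinite when the supports do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0.0 else -math.log(bc)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bhattacharyya_coefficient_continuous(pdf_p, pdf_q, lower, upper, n=20000):
    """Trapezoidal approximation of the integral of sqrt(p(x) q(x)) over [lower, upper]."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        weight = 0.5 if i in (0, n) else 1.0  # half weight at the two end points
        total += weight * math.sqrt(pdf_p(x) * pdf_q(x))
    return total * h

# Discrete example: two distributions on a two-point domain.
d_discrete = bhattacharyya_distance([0.9, 0.1], [0.5, 0.5])

# Continuous example: N(0, 1) versus N(1, 1); closed form is (0 - 1)^2 / 8 = 0.125.
bc_continuous = bhattacharyya_coefficient_continuous(
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 1.0, 1.0),
    lower=-12.0,
    upper=13.0,
)
d_continuous = -math.log(bc_continuous)

print(d_discrete, d_continuous)
```

The integration limits and the number of panels are illustrative choices; heavy-tailed densities would need a wider interval or an adaptive quadrature.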

Definition using Measure Theory

More generally, given two probability measures \(P, Q\) on a measurable space \((\mathcal{X}, \mathcal{B})\), let \(\lambda\) be a sigma-finite measure such that \(P\) and \(Q\) are absolutely continuous with respect to \(\lambda\), that is, such that \(P(dx) = p(x) \lambda(dx)\) and \(Q(dx) = q(x)\lambda(dx)\) for probability density functions \(p, q\) with respect to \(\lambda\) defined \(\lambda\)-almost everywhere.
Such a measure, even such a probability measure, always exists, for example, \(\lambda = \frac{1}{2}(P + Q)\).
Then the Bhattacharyya measure on \(({\mathcal {X}},{\mathcal {B}})\) is defined by,

\begin{equation} \up{bc}(dx|P,Q)={\sqrt {p(x)q(x)}}\,\lambda (dx)={\sqrt {{\frac {P(dx)}{\lambda (dx)}}(x){\frac {Q(dx)}{\lambda (dx)}}(x)}}\lambda (dx) ~. \end{equation}

The definition does not depend on the choice of the measure \(\lambda\). Given another admissible choice \(\lambda'\), pick a measure \(\mu\) such that both \(\lambda\) and \(\lambda'\) are absolutely continuous with respect to \(\mu\), i.e., \(\lambda = l(x)\mu\) and \(\lambda '=l'(x)\mu\). Then,

\begin{equation} P(dx) = p(x)\lambda (dx) = p'(x)\lambda '(dx)=p(x)l(x)\mu (dx)=p'(x)l'(x)\mu (dx) ~, \end{equation}

and similarly for \(Q\).
We then have,

\begin{equation} \up{bc}(dx | P, Q) = {\sqrt{p(x)q(x)}} \, \lambda (dx) = {\sqrt {p(x)q(x)}} \, l(x) \mu(dx) = {\sqrt{p(x)l(x)q(x)l(x)}} \, \mu(dx) = {\sqrt{p'(x)l'(x)q'(x)l'(x)}} \, \mu(dx) = {\sqrt{p'(x)q'(x)}} \, \lambda'(dx) ~. \end{equation}

Then define the Bhattacharyya coefficient as,

\begin{equation} \up{BC}(P,Q) = \int_{\mathcal{X}} \up{bc}(dx|P, Q) = \int_{\mathcal{X}}{\sqrt {p(x)q(x)}}\,\lambda (dx) ~. \end{equation}
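The independence of the coefficient from the reference measure can be checked numerically on a finite domain. The sketch below is a standalone Python illustration with hypothetical names, not part of this module. It represents each measure by its point masses, forms the densities \(p = P/\lambda\) and \(q = Q/\lambda\), and sums \(\sqrt{p(x)q(x)}\,\lambda(x)\) for three different choices of \(\lambda\): the counting measure, the mixture \(\frac{1}{2}(P + Q)\), and an arbitrary positive weighting.

```python
import math

def bc_with_reference(P, Q, lam):
    """Sum of sqrt(p(x) q(x)) * lam(x), with p = P / lam and q = Q / lam.

    P, Q, and lam are lists of point masses on the same finite domain.
    Points where lam vanishes are skipped: by absolute continuity, P and Q
    carry no mass there either.
    """
    total = 0.0
    for Px, Qx, lx in zip(P, Q, lam):
        if lx == 0.0:
            continue
        total += math.sqrt((Px / lx) * (Qx / lx)) * lx
    return total

P = [0.7, 0.2, 0.1, 0.0]
Q = [0.1, 0.3, 0.4, 0.2]

counting = [1.0, 1.0, 1.0, 1.0]                    # counting measure
mixture = [0.5 * (a + b) for a, b in zip(P, Q)]    # lambda = (P + Q) / 2
arbitrary = [2.0, 0.5, 3.0, 0.25]                  # any strictly positive weights

bc_counting = bc_with_reference(P, Q, counting)
bc_mixture = bc_with_reference(P, Q, mixture)
bc_arbitrary = bc_with_reference(P, Q, arbitrary)

print(bc_counting, bc_mixture, bc_arbitrary)  # the three values agree up to rounding
```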

By the above, the quantity \(\up{BC}(P,Q)\) does not depend on \(\lambda\), and by the Cauchy-Schwarz inequality \(0\leq \up{BC}(P,Q)\leq 1\).
In particular, if \(P(dx) = p(x)Q(dx)\) is absolutely continuous with respect to \(Q\) with Radon-Nikodym derivative \(p(x) = {\frac {P(dx)}{Q(dx)}}(x)\), then,

\begin{equation} \up{BC}(P,Q) = \int_{\mathcal{X}}{\sqrt {p(x)}}Q(dx) = \int_{\mathcal{X}} {\sqrt {\frac{P(dx)}{Q(dx)}}} Q(dx) = E_{Q} \left[{\sqrt{\frac{P(dx)}{Q(dx)}}}\right] ~. \end{equation}
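The expectation form suggests a simple Monte Carlo estimator: draw samples from \(Q\) and average the square root of the density ratio. The following standalone Python sketch assumes both log-densities can be evaluated pointwise and that \(Q\) can be sampled; the function names, the seed, and the sample size are illustrative choices and not part of this module. It uses \(P = N(0, 1)\) and \(Q = N(1, 1)\), for which the exact coefficient is \(e^{-1/8}\).

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log-density of the normal distribution N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def bc_monte_carlo(logpdf_p, logpdf_q, sample_q, n, rng):
    """Estimate BC(P, Q) = E_Q[sqrt(dP/dQ)] by averaging over n draws from Q.

    The ratio is formed in log space to avoid overflow or underflow in the tails.
    """
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        total += math.exp(0.5 * (logpdf_p(x) - logpdf_q(x)))
    return total / n

rng = random.Random(12345)  # fixed seed so the run is reproducible
bc_estimate = bc_monte_carlo(
    logpdf_p=lambda x: normal_logpdf(x, 0.0, 1.0),
    logpdf_q=lambda x: normal_logpdf(x, 1.0, 1.0),
    sample_q=lambda r: r.gauss(1.0, 1.0),
    n=200_000,
    rng=rng,
)
bc_exact = math.exp(-0.125)

print(bc_estimate, bc_exact)
```

The estimator is unbiased whenever \(P\) is absolutely continuous with respect to \(Q\), but its variance grows when the two distributions barely overlap, so the sample size would need to increase accordingly.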

Properties of the Bhattacharyya Distance

  1. The conditions \(0\leq \up{BC}\leq 1\) and \(0\leq D_{B}\leq \infty\) hold for the Bhattacharyya coefficient and distance, respectively.
  2. The Bhattacharyya distance \(D_{B}\) does not obey the triangle inequality, though the Hellinger distance \(H(P,Q) = \sqrt{1 - \up{BC}(P,Q)}\) does.
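
The failure of the triangle inequality in the second property can be seen with three distributions on a two-point domain. The sketch below is a standalone Python illustration with hypothetical names; the specific distributions are chosen only for convenience. Going from \(P\) to \(R\) through the intermediate \(Q\) gives a smaller total Bhattacharyya distance than the direct route, while the Hellinger distance behaves as a metric should.

```python
import math

def bc(p, q):
    """Bhattacharyya coefficient of two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def d_bhat(p, q):
    """Bhattacharyya distance; assumes overlapping supports so that BC > 0."""
    return -math.log(bc(p, q))

def hellinger(p, q):
    """Hellinger distance sqrt(1 - BC); max() guards against tiny negative rounding."""
    return math.sqrt(max(0.0, 1.0 - bc(p, q)))

P = [0.9, 0.1]
Q = [0.5, 0.5]
R = [0.1, 0.9]

direct = d_bhat(P, R)                 # -ln(0.6), about 0.5108
detour = d_bhat(P, Q) + d_bhat(Q, R)  # ln(1.25), about 0.2231

triangle_violated = direct > detour
hellinger_ok = hellinger(P, R) <= hellinger(P, Q) + hellinger(Q, R)

print(direct, detour, triangle_violated, hellinger_ok)
```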

Connection with Total Variation Distance (TVD)

The Hellinger distance \(H(P, Q) = \sqrt{1 - \up{BC}(P, Q)}\), which is derived from the Bhattacharyya coefficient, and the total variation distance (or statistical distance) \(\delta(P,Q)\) are related as follows,

\begin{equation} H^{2}(P, Q)\leq \delta(P, Q)\leq {\sqrt{2}}H(P, Q) ~. \end{equation}

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
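
The two bounds can be checked numerically for discrete distributions. The sketch below is a standalone Python illustration with hypothetical names. It takes \(H(P, Q) = \sqrt{1 - \up{BC}(P, Q)}\), the Hellinger form given in the properties list, and \(\delta(P, Q) = \frac{1}{2}\sum_{x}|P(x) - Q(x)|\), the usual definition of the total variation distance on a discrete domain.

```python
import math

def total_variation(p, q):
    """Total variation distance: half the 1-norm of the difference."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance sqrt(1 - BC(P, Q)) for discrete distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # guard against tiny negative rounding

def tvd_bounds_hold(p, q, tol=1e-12):
    """Check H^2 <= delta <= sqrt(2) * H up to a small rounding tolerance."""
    h = hellinger(p, q)
    delta = total_variation(p, q)
    return h * h <= delta + tol and delta <= math.sqrt(2.0) * h + tol

P = [0.9, 0.1]
Q = [0.1, 0.9]

h = hellinger(P, Q)            # sqrt(0.4), about 0.6325
delta = total_variation(P, Q)  # 0.8
print(h * h, delta, math.sqrt(2.0) * h, tvd_bounds_hold(P, Q))
```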

See also
pm_distanceBhat
pm_distanceEuclid
pm_distanceHellinger
pm_distanceKolm
pm_distanceMahal
Test:
test_pm_distanceBhat


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, March 22, 2012, 2:21 PM, National Institute for Fusion Studies, The University of Texas Austin

Variable Documentation

MODULE_NAME

character(*, SK), parameter pm_distanceBhat::MODULE_NAME = "@pm_distanceBhat"

Definition at line 137 of file pm_distanceBhat.F90.