ParaMonte MATLAB 3.0.0
Parallel Monte Carlo and Machine Learning Library
See the latest version documentation.
kde2d.m
Go to the documentation of this file.
1%> \brief
2%> Compute the Gaussian kernel density estimate of the input bivariate data.
3%>
4%> \details
5%> This function offers a fast and accurate bivariate
6%> kernel density estimator with diagonal bandwidth matrix.<br>
7%> The kernel is assumed to be Gaussian.<br>
8%> The two bandwidth parameters are chosen optimally without ever
9%> using/assuming a parametric model for the data or any *rules of thumb*.<br>
10%> Particularly, this algorithm is immune to accuracy failures in the estimation
11%> of multimodal densities with widely separated modes (see examples below).<br>
12%>
13%> \param[in] data : The input matrix of shape ``[nobs, 2]`` containing the
14%> continuous data whose density estimate is to be returned.<br>
15%> \param[in] resol : The input scalar whole number representing the size of
16%> the ``resol`` by ``resol`` grid over which the density
17%> is computed where ``resol`` has to be a power of ``2``,
18%> otherwise ``resol = 2 ^ ceil(log2(resol))`` will be used.<br>
19%> (**optional**. If missing or empty, it will be set to ``2^8``.)
20%> \param[in] xymin : The input vector of size ``2`` containing the lower limits
21%> of the bounding box over which the density is computed.<br>
22%> (**optional**. If missing or empty, it will be set to an appropriate value as prescribed in the note below.)
23%> \param[in] xymax : The input vector of size ``2`` containing the upper limits
24%> of the bounding box over which the density is computed.<br>
25%> (**optional**. If missing or empty, it will be set to an appropriate value as prescribed in the note below.)
26%>
27%> \return
28%> ``bandwidth`` : The output row vector with the two optimal bandwidths for a bivaroate Gaussian kernel.<br>
29%> The format of the vector is ``bandwidth = [bandwidth_x, bandwidth_y]``.<br>
30%> <br>
31%> ``density`` : The output ``resol`` by ``resol`` matrix containing
32%> the density values over the ``resol`` by ``resol`` grid.<br>
33%> This argument is not computed unless the output argument is present.<br>
34%> <br>
35%> ``meshx`` : The x-axis meshgrid over which the variable "density" has been computed.<br>
36%> This output is particularly useful in combination with MATLAB intrinsic ``surf``.<br>
37%> The intended usage is ``surf(meshx, meshy, density)``.<br>
38%> <br>
39%> ``meshy`` : The y-axis meshgrid over which the variable "density" has been computed.<br>
40%> This output is particularly useful in combination with MATLAB intrinsic ``surf``.<br>
41%> The intended usage is ``surf(meshx, meshy, density)``.<br>
42%>
43%> \interface{kde2d}
44%> \code{.m}
45%>
46%> [bandwidth, density, meshx, meshy] = pm.stats.hist.kde2d(data);
47%> [bandwidth, density, meshx, meshy] = pm.stats.hist.kde2d(data, resol);
48%> [bandwidth, density, meshx, meshy] = pm.stats.hist.kde2d(data, resol, xymin);
49%> [bandwidth, density, meshx, meshy] = pm.stats.hist.kde2d(data, resol, xymin, xymax);
50%>
51%> \endcode
52%>
53%> \note
54%> The procedure for computing the default values for the
55%> input arguments ``xymin`` and ``xymax`` are as follows:<br>
56%> \code{.m}
57%> dmax = max(data, [], 1);
58%> dmin = min(data, [], 1);
59%> dlim = dmax - dmin;
60%> xymax = dmax + dlim / 4;
61%> xymin = dmin - dlim / 4;
62%> \endcode
63%>
64%> \see
65%> Kernel density estimation via diffusion, Z. I. Botev, J. F. Grotowski,
66%> and D. P. Kroese (2010) Annals of Statistics, Volume 38, Number 5, pages 2916-2957.<br>
67%> [pm.vis.SubplotContour](@ref SubplotContour)<br>
68%> [pm.vis.CascadeContour](@ref CascadeContour)<br>
69%> [pm.vis.PlotContour](@ref PlotContour)<br>
70%> [pm.vis.TileContour](@ref TileContour)<br>
71%> [pm.vis.SubplotContourf](@ref SubplotContourf)<br>
72%> [pm.vis.CascadeContourf](@ref CascadeContourf)<br>
73%> [pm.vis.PlotContourf](@ref PlotContourf)<br>
74%> [pm.vis.TileContourf](@ref TileContourf)<br>
75%> [pm.vis.SubplotContour3](@ref SubplotContour3)<br>
76%> [pm.vis.CascadeContour3](@ref CascadeContour3)<br>
77%> [pm.vis.PlotContour3](@ref PlotContour3)<br>
78%> [pm.vis.TileContour3](@ref TileContour3)<br>
79%>
80%> \example{mvnmix}
81%> \include{lineno} example/stats/hist/kde2d/mvnmix/main.m
82%> \vis{mvnmix}
83%> \image html example/stats/hist/kde2d/mvnmix/pm.stats.hist.kde2d.mvnmix.png width=700
84%>
85%> \example{mvnmixfar}
86%> \include{lineno} example/stats/hist/kde2d/mvnmixfar/main.m
87%> \vis{mvnmixfar}
88%> \image html example/stats/hist/kde2d/mvnmixfar/pm.stats.hist.kde2d.mvnmixfar.png width=700
89%>
90%> \example{sincos}
91%> \include{lineno} example/stats/hist/kde2d/sincos/main.m
92%> \vis{sincos}
93%> \image html example/stats/hist/kde2d/sincos/pm.stats.hist.kde2d.sincos.png width=700
94%>
95%> \final{kde2d}
96%>
97%> Copyright (c) 2015, Dr. Zdravko Botev
98%> All rights reserved.
99%>
100%> Redistribution and use in source and binary forms, with or without
101%> modification, are permitted provided that the following conditions are
102%> met:
103%>
104%> * Redistributions of source code must retain the above copyright
105%> notice, this list of conditions and the following disclaimer.
106%> * Redistributions in binary form must reproduce the above copyright
107%> notice, this list of conditions and the following disclaimer in
108%> the documentation and/or other materials provided with the distribution
109%> * Neither the name of the The University of New South Wales nor the names
110%> of its contributors may be used to endorse or promote products derived
111%> from this software without specific prior written permission.
112%>
113%> This software is provided by the copyright holders and contributors "as is"
114%> and any express or implied warranties, including, but not limited to, the
115%> implied warranties of merchantability and fitness for a particular purpose
116%> are disclaimed. In no event shall the copyright owner or contributors be
117%> liable for any direct, indirect, incidental, special, exemplary, or
118%> consequential damages (including, but not limited to, procurement of
119%> substitute goods or services; loss of use, data, or profits; or business
120%> interruption) however caused and on any theory of liability, whether in
121%> contract, strict liability, or tort (including negligence or otherwise)
122%> arising in any way out of the use of this software, even if advised of the
123%> possibility of such damage.
124%>
125%> \AmirShahmoradi, May 16 2016, 9:03 AM, Oden Institute for Computational Engineering and Sciences (ICES), UT Austin<br>
126function [bandwidth, density, meshx, meshy] = kde2d(data, resol, xymin, xymax)
127
128 %global nobs asq rangesq
129
130 if nargin < 2
131 resol = 2 ^ 8;
132 end
133
134 %%%% round up resol to the next power of 2;
135
136 resol = 2 ^ ceil(log2(resol));
137
138 nobs = size(data, 1);
139 if nargin < 3
140 dmax = max(data, [], 1);
141 dmin = min(data, [], 1);
142 dlim = dmax - dmin;
143 xymax = dmax + dlim / 2;
144 xymin = dmin - dlim / 2;
145 end
146 scaling = xymax - xymin;
147 if nobs <= size(data, 2)
148 error('data has to be a ``nobs`` by ``2`` array where each row represents a two dimensional observation.');
149 end
150 transformed_data = (data - repmat(xymin, nobs, 1)) ./ repmat(scaling, nobs, 1);
151
152 %%%% Bin the data uniformly using regular grid.
153
154 initial_data = ndhist(transformed_data, resol);
155
156 %%%% Discrete cosine transform of initial data.
157
158 a = dct2d(initial_data);
159
160 %%%% Now compute the optimal bandwidth^2.
161
162 asq = a .^ 2;
163 rangesq = (0 : resol - 1) .^ 2;
164 t_star = getRoot(@(t)(t - evolve(t, nobs, asq, rangesq)), nobs);
165 p_02 = func([0,2], t_star, nobs, asq, rangesq);
166 p_20 = func([2,0], t_star, nobs, asq, rangesq);
167 p_11 = func([1,1], t_star, nobs, asq, rangesq);
168 t_y = (p_02 ^ (3 / 4) / (4 * pi * nobs * p_20 ^ (3 / 4) * (p_11 + sqrt(p_20 * p_02)))) ^ (1 / 3);
169 t_x = (p_20 ^ (3 / 4) / (4 * pi * nobs * p_02 ^ (3 / 4) * (p_11 + sqrt(p_20 * p_02)))) ^ (1 / 3);
170
171 %%%% Smooth the discrete cosine transform of initial data using t_star.
172
173 a_t = exp(-(0 : resol - 1)' .^ 2 * pi ^ 2 * t_x / 2) * exp(-(0 : resol - 1) .^ 2 * pi ^ 2 * t_y / 2) .* a;
174
175 %%%% Now apply the inverse discrete cosine transform.
176
177 if nargout > 1
178 density = idct2d(a_t) * (numel(a_t) / prod(scaling));
179 %%%% remove any negative density values
180 density(density < 0) = eps;
181 [meshx,meshy] = meshgrid(xymin(1) : scaling(1) / (resol - 1) : xymax(1), xymin(2) : scaling(2) / (resol - 1) : xymax(2));
182 end
183
184 bandwidth = sqrt([t_x, t_y]) .* scaling;
185
186end
187
188%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
189%> \cond excluded
190%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
191
192function [out, time] = evolve(t, nobs, asq, rangesq)
193 sumFunc = func([0, 2], t, nobs, asq, rangesq) ...
194 + func([2, 0], t, nobs, asq, rangesq) ...
195 + func([1, 1], t, nobs, asq, rangesq) * 2;
196 time = (2 * pi * nobs * sumFunc) ^ (-1 / 3);
197 out = (t - time) / time;
198end
199
200%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
201
202function out = func(s, t, nobs, asq, rangesq)
203 if sum(s) <= 4
204 sumFunc = func([s(1) + 1, s(2)], t, nobs, asq, rangesq) ...
205 + func([s(1), s(2) + 1], t, nobs, asq, rangesq);
206 const = (1 + 1 / 2 ^ (sum(s) + 1)) / 3;
207 time = (-2 * const * K(s(1)) * K(s(2)) / nobs / sumFunc) ^ (1 / (2 + sum(s)));
208 out = psi(s, time, asq, rangesq);
209 else
210 out = psi(s, t, asq, rangesq);
211 end
212end
213
214%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
215
216function out = psi(s, time, asq, rangesq)
217 %global asq rangesq
218 %%%% The input ``s`` is a vector.
219 w = exp(-rangesq * pi ^ 2 * time) .* [1, .5 * ones(1, length(rangesq) - 1)];
220 wx = w .* (rangesq .^ s(1));
221 wy = w .* (rangesq .^ s(2));
222 out = (-1) ^ sum(s) * (wy * asq * wx') * pi ^ (2 * sum(s));
223end
224
225%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
226
227function out = K(s)
228 out = (-1) ^ s * prod((1 : 2 : 2 * s - 1)) / sqrt(2 * pi);
229end
230
231%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
232
233function data = dct2d(data)
234 %%%%
235 %%%% Compute the two-dimensional discrete cosine transform of data data is an ``ndim`` cube.
236 %%%%
237 [nrow, ncol] = size(data);
238 if nrow ~= ncol
239 error('data is not a square array!')
240 end
241 %%%% Compute weights to multiply DFT coefficients.
242 w = [1; 2 * (exp(-i * (1 : nrow - 1) * pi / (2 * nrow))).'];
243 weight = w(:, ones(1, ncol));
244 data = dct1d(dct1d(data)')';
245 function transform1d = dct1d(x)
246 %%%% Re-order the elements of the columns of x.
247 x = [x(1 : 2 : end, :); x(end : -2 : 2, :)];
248 %%%% Multiply FFT by weights.
249 transform1d = real(weight .* fft(x));
250 end
251end
252
253%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
254
255function data = idct2d(data)
256 %%%% Computes the 2 dimensional inverse discrete cosine transform.
257 [nrow, ncol] = size(data);
258 % Compute weights.
259 w = exp(i * (0 : nrow - 1) * pi / (2 * nrow)).';
260 weights = w(:, ones(1, ncol));
261 data = idct1d(idct1d(data)');
262 function out = idct1d(x)
263 y = real(ifft(weights .* x));
264 out = zeros(nrow, ncol);
265 out(1 : 2 : nrow, :) = y(1 : nrow / 2, :);
266 out(2 : 2 : nrow, :) = y(nrow : -1 : nrow / 2 + 1, :);
267 end
268end
269
270%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
271
272function binned_data = ndhist(data, nbin)
273 %%%% This function computes the histogram of an ndim-dimensional data set.
274 %%%% 'data' is ``ndat`` rows by ``ndim`` columns.
275 %%%% The input ``nbin`` is the number of bins used in each dimension.
276 %%%% The output ``binned_data`` is a hypercube of length ``nbin``.
277 [ndat, ncol] = size(data);
278 bins = zeros(ndat, ncol);
279 for icol = 1 : ncol
280 [dum, bins(:, icol)] = histc(data(:, icol), [0 : 1 / nbin : 1], 1);
281 bins(:, icol) = min(bins(:, icol), nbin);
282 end
283 %%%% Combine the vectors of 1D bin counts into a grid of ND bin counts.
284 binned_data = accumarray(bins(all(bins > 0, 2), :), 1 / ndat, nbin(ones(1, ncol)));
285end
286
287%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
288
289function root = getRoot(getFunc, nobs)
290 %%%% Find smallest root whenever there is more than one.
291 nobs = 50 * (nobs <= 50) + 1050 * (nobs >= 1050) + nobs * ((nobs < 1050) & (nobs > 50));
292 tol = 10^ -12 + 0.01 * (nobs - 50) / 1000;
293 flag = 0;
294 while flag == 0
295 try
296 root = fzero(getFunc, [0, tol]);
297 flag = 1;
298 catch
299 %%%% double search interval.
300 tol = min(tol * 2, .1);
301 end
302 %%%% if all else fails:
303 if tol == .1
304 root = fminbnd(@(x) abs(getFunc(x)), 0, .1);
305 flag = 1;
306 end
307 end
308end
309
310%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
311%> \endcond
312%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function name(in vendor)
Return the MPI library name as used in naming the ParaMonte MATLAB shared library directories.
function list()
Return a list of MATLAB strings containing the names of OS platforms supported by the ParaMonte MATLA...
function abs(in path, in style)
Return the Get absolute canonical path of a file or folder.
This is the CascadeContour3 class for generating instances of 3-dimensional Contour3 Cascade visualiz...
This is the CascadeContour class for generating instances of 2-dimensional Contour Cascade visualizat...
This is the CascadeContourf class for generating instances of 2-dimensional Contourf Cascade visualiz...
This is the Contour3 class for generating instances of 3-dimensional Contour Plot visualizations base...
Definition: PlotContour3.m:30
This is the PlotContour class for generating instances of 2-dimensional Contour Plot visualizations b...
Definition: PlotContour.m:30
This is the PlotContourf class for generating instances of 2-dimensional Contourf Plot visualizations...
Definition: PlotContourf.m:30
This is the SubplotContour3 class for generating instances of 3-dimensional Contour Subplot visualiza...
This is the SubplotContour class for generating instances of 2-dimensional Contour Subplot visualizat...
This is the SubplotContourf class for generating instances of 2-dimensional Contour Subplot visualiza...
This is the pm.vis.TileContour3 class for generating instances of 3-dimensional Contour3 Tile visuali...
Definition: TileContour3.m:20
This is the pm.vis.TileContour class for generating instances of 2-dimensional Contour Tile visualiza...
Definition: TileContour.m:20
This is the pm.vis.TileContourf class for generating instances of 2-dimensional Contourf Tile visuali...
Definition: TileContourf.m:20
function getFunc(in x, in y)
Return the value of the 2-dimensional Himmelblau function at the specified input value.
function kde2d(in data, in resol, in xymin, in xymax)
Compute the Gaussian kernel density estimate of the input bivariate data.
function root()
Return a scalar MATLAB string containing the root directory of the ParaMonte library package.
excluded
Definition: show.m:173
function which(in vendor)
Return the a MATLAB string containing the path to the first mpiexec executable binary found in system...