Compute and return an iteratively-refined set of cluster centers given the input sample using the k-means approach.
More...

Detailed Description

Compute and return an iteratively-refined set of cluster centers given the input sample using the k-means approach.

The k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into \(k\) clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
This results in a partitioning of the data space into Voronoi cells.
The k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem:
the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
For instance, better Euclidean solutions can be found using k-medians and k-medoids.
The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum.
These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both k-means and Gaussian mixture modeling.
They both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the Gaussian mixture model allows clusters to have different shapes.

Algorithm

Given a set of observations \((x_1, x_2, \ldots, x_n)\), where each observation is a \(d\)-dimensional real vector, the k-means clustering aims to partition the \(n\) observations into \(k\) ( \(\leq n\)) sets \(S = \{S_1, S_2, \ldots, S_k\}\) so as to minimize the within-cluster sum of squares (WCSS) (i.e. variance).
Formally, the objective is to find:

\begin{equation} \underset{\mathbf{S}}{\up{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_{i}} \left\|\mathbf{x} -{\boldsymbol{\mu}}_{i}\right\|^{2} = {\underset{\mathbf{S}}{\up{arg\,min}}}\sum_{i=1}^{k}|S_{i}|\up{Var}S_{i} ~, \end{equation}

where \(\mu_i\) is the mean (also called centroid) of points in \(S_{i}\), i.e.

\begin{equation} {\boldsymbol {\mu_{i}}} = {\frac{1}{|S_{i}|}} \sum_{\mathbf{x} \in S_{i}} \mathbf{x} ~, \end{equation}

where \(|S_{i}|\) is the size of \(S_{i}\), and \(\|\cdot\|\) is the \(L^2\)-norm.
This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:

\begin{equation} \underset{\mathbf{S}}{\up{arg\,min}} \sum_{i=1}^{k}\,{\frac {1}{|S_{i}|}}\,\sum_{\mathbf{x}, \mathbf{y} \in S_{i}}\left\|\mathbf{x} - \mathbf{y} \right\|^{2} ~, \end{equation}

The equivalence can be deduced from identity

\begin{equation} |S_{i}|\sum_{\mathbf{x} \in S_{i}}\left\|\mathbf{x} -{\boldsymbol{\mu}}_{i}\right\|^{2} = {\frac{1}{2}}\sum _{\mathbf {x} ,\mathbf {y} \in S_{i}}\left\|\mathbf {x} -\mathbf {y} \right\|^{2} ~. \end{equation}

Since the total variance is constant, this is equivalent to maximizing the sum of squared deviations between points in different clusters.
This deterministic relationship is also related to the law of total variance in probability theory.

Performance improvements

The k-means++ seeding method yields considerable improvement in the final error of k-means algorithm.
For more information, see the documentation of setKmeansPP.

Note: The metric used within this generic interface is the Euclidean distance.

Parameters

[in,out]	rng	: The input/output scalar that can be an object of, type rngf_type, implying the use of intrinsic Fortran uniform RNG. type xoshiro256ssw_type, implying the use of xoshiro256 uniform RNG. (optional**, If this argument is present, then all `intent(inout)` arguments below have `intent(out)` argument and will be initialized using the k-means++ algorithm. If this argument is missing, then the user must initialize all of the following arguments with `intent(inout)` via k-means++ or any other method before passing them to this generic interface.)
[in,out]	membership	: The input/output vector of shape `(1:nsam)` of type `integer` of default kind IK, containing the membership of each input sample in `sample` from its nearest cluster `center`, such that `cluster(membership(i))` is the nearest cluster center to the `i`th sample `sample(:, i)` at a squared-distance of `disq(i)`. If the argument `rng` is missing, then `membership` has `intent(inout)` and must be properly initialized before calling this routine. If the argument `rng` is present, then `membership` has `intent(out)` and will contain the final cluster memberships on output.
[in,out]	disq	: The input/output vector of shape `(1:nsam)` of the same type and kind as the input argument `sample`, containing the Euclidean squared distance of each input sample in `sample` from its nearest cluster `center`. If the argument `rng` is missing, then `disq` has `intent(inout)` and must be properly initialized before calling this routine. If the argument `rng` is present, then `disq` has `intent(out)` and will contain the final squared-distances from cluster centers on output.
[in]	sample	: The input vector, or matrix of, type `real` of kind any supported by the processor (e.g., RK, RK32, RK64, or RK128), containing the sample of `nsam` points in a `ndim`-dimensional space whose corresponding cluster centers must be computed. If `sample` is a vector of shape `(1 : nsam)` and `center` is a vector of shape `(1 : ncls)`, then the input `sample` must be a collection of `nsam` points (in univariate space). If `sample` is a matrix of shape `(1 : ndim, 1 : nsam)` and `center` is a matrix of shape `(1 : ndim, 1 : ncls)`, then the input `sample` must be a collection of `nsam` points (in `ndim`-dimensional space).
[in,out]	center	: The input/output vector of shape `(1:ncls)` or matrix of shape `(1 : ndim, 1 : ncls)` of the same type and kind as the input argument `sample`, containing the set of `ncls` unique cluster centers (centroids) computed based on the input sample memberships and minimum distances. If the argument `rng` is missing, then `center` has `intent(inout)` and must be properly initialized before calling this routine. If the argument `rng` is present, then `center` has `intent(out)` and will contain the final cluster centers on output.
[in,out]	size	: The input/output vector of shape `(1:ncls)` type `integer` of default kind IK, containing the sizes (number of members) of the clusters with the corresponding centers output in the argument `center`. If the argument `rng` is missing, then `size` has `intent(inout)` and must be properly initialized before calling this routine. If the argument `rng` is present, then `size` has `intent(out)` and will contain the final cluster sizes (member counts) on output.
[in,out]	potential	: The input/output vector of shape `(1:ncls)` of the same type and kind as the input argument `sample`, the `i`th element of which contains the sum of squared distances of all members of the `i`th cluster from the cluster center as output in the `i`th element of `center`. If the argument `rng` is missing, then `potential` has `intent(inout)` and must be properly initialized before calling this routine (although its values are not explicitly referenced). If the argument `rng` is present, then `potential` has `intent(out)` and will contain the final cluster potentials (sums of squared distances from cluster centers) on output.
[out]	failed	: The output scalar of type `logical` of default kind LK that is `.true.` if and only if the algorithm fails to converge within the user-specified or default criteria for convergence. Failure occurs only if `any(size < minsize) .or. maxniter < niter`.
[out]	niter	: The output scalar of type `integer` of default kind IK, containing the number of refinement iterations performed within the algorithm to achieve convergence. An output `niter` value larger than the input `maxniter` implies lack of convergence before return. (optional. If missing, the number of refinement iterations will not be output.)
[in]	maxniter	: The input non-negative scalar of type `integer` of default kind IK, containing the maximum number of refinement iterations allowed within the algorithm to achieve convergence. If convergence does not occur within the maximum specified value, the output arguments can be passed again as is to the generic interface (without the optional `rng` argument) to continue the refinement iterations until convergence. A reasonable choice can be `300` or comparable values. (optional, default = `300`.)
[in]	minsize	: The input non-negative scalar of type `integer` of default kind IK, containing the minimum allowed size of each cluster. If any cluster has any number of members below the specified `minsize`, the algorithm will return without achieving convergence. The situation can be detected of any element of the output `size` is smaller than the specified `minsize`. A reasonable choice can be `ndim = ubound(sample, 1)` or comparable values although any non-negative value including zero is possible. (optional, default = `1`.)
[in]	nfail	: The input non-negative scalar of type `integer` of default kind IK, containing the number of times the k-means algorithm is allowed to fail before returning without convergence. (optional. It can be present only if the argument `rng` is also present, allowing random initializations in case of failures.)

Possible calling interfaces ⛓

: use pm_clustering, only: setKmeans

call setKmeansPP(membership(1 : nsam) , disq(1 : nsam), sample(1 : nsam) , center(1 : ncls) , size(1 : ncls), potential(1 : ncls), failed, niter = niter, maxniter = maxniter, minsize = minsize)

call setKmeansPP(membership(1 : nsam) , disq(1 : nsam), sample(1 : ndim, 1 : nsam), center(1 : ndim, 1 : ncls), size(1 : ncls), potential(1 : ncls), failed, niter = niter, maxniter = maxniter, minsize = minsize)

call setKmeansPP(rng, membership(1 : nsam) , disq(1 : nsam), sample(1 : nsam) , center(1 : ncls) , size(1 : ncls), potential(1 : ncls), failed, niter = niter, maxniter = maxniter, minsize = minsize, nfail = nfail)

call setKmeansPP(rng, membership(1 : nsam) , disq(1 : nsam), sample(1 : ndim, 1 : nsam), center(1 : ndim, 1 : ncls), size(1 : ncls), potential(1 : ncls), failed, niter = niter, maxniter = maxniter, minsize = minsize, nfail = nfail)

pm_clustering::setKmeansPP
Compute and return an asymptotically optimal set of cluster centers for the input sample,...
Definition: pm_clustering.F90:787

pm_clustering::setKmeans
Compute and return an iteratively-refined set of cluster centers given the input sample using the k-m...
Definition: pm_clustering.F90:1292

pm_clustering
This module contains procedures and routines for the computing the Kmeans clustering of a given set o...
Definition: pm_clustering.F90:35

Warning: If the argument rng is present, then all arguments associated with setKmeansPP equally apply to this generic interface.
The condition ubound(center, rank(center)) > 0 must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == ubound(disq, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == ubound(size, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == ubound(potential, 1) must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == ubound(membership, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) <= ubound(sample, rank(sample)) must hold for the corresponding input arguments (the number of clusters must be less than or equal to the sample size).
The condition ubound(sample, 1) == ubound(center, 1) must hold for the corresponding input arguments.
The condition 0 <= maxniter must hold for the corresponding input arguments.
The condition 0 <= minsize must hold for the corresponding input arguments.
The condition 0 <= nfail must hold for the corresponding input arguments.
These conditions are verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.; The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release build and impure in debug and testing builds. The procedures of this generic interface are always impure when the input rng argument is set to an object of type rngf_type.

Remarks: The functionality of this generic interface is highly similar to setCenter, with the major difference being that setKmeans simultaneously computes the new cluster centers and sample memberships, whereas setCenter computes the new cluster centers based on given sample membership.

See also: setKmeans
setCenter
setMember
setKmeansPP
k-means clustering

Example usage ⛓

: 1program example

2

3 use pm_kind, only: SK, IK, LK

4 use pm_kind, only: RKG => RKS ! all other real kinds are also supported.

5 use pm_io, only: display_type

6 use pm_distUnif, only: getUnifRand

7 use pm_arrayResize, only: setResized

8 use pm_clustering, only: setKmeansPP

9 use pm_clustering, only: setKmeans, rngf

10

11 implicit none

12

13 logical(LK) :: failed

14 integer(IK) :: ndim, nsam, ncls, itry, niter

15 real(RKG) , allocatable :: sample(:,:), center(:,:), disq(:), potential(:)

16 integer(IK) , allocatable :: membership(:), size(:)

17 type(display_type) :: disp

18

19 disp = display_type(file = "main.out.F90")

20

21 call disp%skip

22 call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")

23 call disp%show("! Compute cluster centers based on an input sample and cluster memberships and member-center distances.")

24 call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")

25 call disp%skip

26

27 do itry = 1, 10

28 call disp%skip()

29 call disp%show("ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);")

30 ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

31 call disp%show("[ndim, nsam, ncls]")

32 call disp%show( [ndim, nsam, ncls] )

33 call disp%show("sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.")

34 sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

35 call disp%show("sample")

36 call disp%show( sample )

37 call disp%show("call setResized(disq, nsam)")

38 call setResized(disq, nsam)

39 call disp%show("call setResized(membership, nsam)")

40 call setResized(membership, nsam)

41 call disp%show("call setResized(center, [ndim, ncls])")

42 call setResized(center, [ndim, ncls])

43 call disp%show("call setResized(potential, ncls)")

44 call setResized(potential, ncls)

45 call disp%show("call setResized(size, ncls)")

46 call setResized(size, ncls)

47 call disp%skip()

48

49 call disp%show("call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.")

50 call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

51 call disp%show("failed")

52 call disp%show( failed )

53 call disp%show("niter")

54 call disp%show( niter )

55 call disp%show("disq")

56 call disp%show( disq )

57 call disp%show("membership")

58 call disp%show( membership )

59 call disp%show("potential")

60 call disp%show( potential )

61 call disp%show("center")

62 call disp%show( center )

63 call disp%show("size")

64 call disp%show( size )

65 call disp%skip()

66 end do

67

68 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

69 ! Output an example for visualization.

70 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

71

72 block

73 integer(IK) :: funit, i

74 ndim = 2

75 ncls = 5

76 nsam = 5000

77 center = getUnifRand(0., 1., ndim, ncls)

78 sample = getUnifRand(0., 1., ndim, nsam)

79 call setResized(disq, nsam)

80 call setResized(membership, nsam)

81 call setResized(potential, ncls)

82 call setResized(size, ncls)

83 ! output the sample and the initial centers.

84 call setKmeansPP(rngf, membership, disq, sample, center, size, potential)

85 open(newunit = funit, file = "setKmeansPP.center.txt")

86 do i = 1, ncls

87 write(funit, "(*(g0,:,','))") i, center(:,i)

88 end do

89 close(funit)

90 open(newunit = funit, file = "setKmeansPP.sample.txt")

91 do i = 1, nsam

92 write(funit, "(*(g0,:,','))") membership(i), sample(:,i)

93 end do

94 close(funit)

95 call setKmeans(membership, disq, sample, center, size, potential, failed)

96 open(newunit = funit, file = "setKmeans.center.txt")

97 do i = 1, ncls

98 write(funit, "(*(g0,:,','))") i, center(:,i)

99 end do

100 close(funit)

101 open(newunit = funit, file = "setKmeans.sample.txt")

102 do i = 1, nsam

103 write(funit, "(*(g0,:,','))") membership(i), sample(:,i)

104 end do

105 close(funit)

106 end block

107

108end program example

pm_arrayResize::setResized
Allocate or resize (shrink or expand) an input allocatable scalar string or array of rank 1....
Definition: pm_arrayResize.F90:249

pm_distUnif::getUnifRand
Generate and return a scalar or a contiguous array of rank 1 of length s1 of randomly uniformly distr...
Definition: pm_distUnif.F90:4150

pm_io::show
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11726

pm_io::skip
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11508

pm_arrayResize
This module contains procedures and generic interfaces for resizing allocatable arrays of various typ...
Definition: pm_arrayResize.F90:81

pm_distUnif
This module contains classes and procedures for computing various statistical quantities related to t...
Definition: pm_distUnif.F90:274

pm_distUnif::rngf
type(rngf_type) rngf
The scalar constant object of type rngf_type whose presence signified the use of the Fortran intrinsi...
Definition: pm_distUnif.F90:2886

pm_io
This module contains classes and procedures for input/output (IO) or generic display operations on st...
Definition: pm_io.F90:252

pm_io::disp
type(display_type) disp
This is a scalar module variable an object of type display_type for general display.
Definition: pm_io.F90:11393

pm_kind
This module defines the relevant Fortran kind type-parameters frequently used in the ParaMonte librar...
Definition: pm_kind.F90:268

pm_kind::LK
integer, parameter LK
The default logical kind in the ParaMonte library: kind(.true.) in Fortran, kind(....
Definition: pm_kind.F90:541

pm_kind::IK
integer, parameter IK
The default integer kind in the ParaMonte library: int32 in Fortran, c_int32_t in C-Fortran Interoper...
Definition: pm_kind.F90:540

pm_kind::SK
integer, parameter SK
The default character kind in the ParaMonte library: kind("a") in Fortran, c_char in C-Fortran Intero...
Definition: pm_kind.F90:539

pm_kind::RKS
integer, parameter RKS
The single-precision real kind in Fortran mode. On most platforms, this is an 32-bit real kind.
Definition: pm_kind.F90:567

pm_io::display_type
Generate and return an object of type display_type.
Definition: pm_io.F90:10282

Example Unix compile command via Intel ifort compiler ⛓
1#!/usr/bin/env sh

2rm main.exe

3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe

4./main.exe

Example Windows Batch compile command via Intel ifort compiler ⛓
1del main.exe

2set PATH=..\..\..\lib;%PATH%

3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe

4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler ⛓
1#!/usr/bin/env sh

2rm main.exe

3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe

4./main.exe

Example output ⛓
1

2!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

3! Compute cluster centers based on an input sample and cluster memberships and member-center distances.

4!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5

6

7ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

8[ndim, nsam, ncls]

9+5, +2, +2

10sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

11sample

12+3.09051657, +1.26056886

13+2.13868189, +2.20415926

14+0.165876448, +4.62417698

15+3.33872294, +4.93674469

16+4.05802822, +0.188024044

17call setResized(disq, nsam)

18call setResized(membership, nsam)

19call setResized(center, [ndim, ncls])

20call setResized(potential, ncls)

21call setResized(size, ncls)

22

23call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

24failed

25F

26niter

27+1

28disq

29+0.00000000, +0.00000000

30membership

31+2, +1

32potential

33+0.00000000, +0.00000000

34center

35+1.26056886, +3.09051657

36+2.20415926, +2.13868189

37+4.62417698, +0.165876448

38+4.93674469, +3.33872294

39+0.188024044, +4.05802822

40size

41+1, +1

42

43

44ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

45[ndim, nsam, ncls]

46+4, +4, +2

47sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

48sample

49+0.964425206, +2.01776719, +0.694973171, +2.03159618

50+2.78230238, +4.44708538, +2.54506230, +3.24920464

51+2.11412859, +4.58920431, +3.20349455, +2.51365948

52+0.577360690, +2.93278432, +3.28652120, +3.56598997

53call setResized(disq, nsam)

54call setResized(membership, nsam)

55call setResized(center, [ndim, ncls])

56call setResized(potential, ncls)

57call setResized(size, ncls)

58

59call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

60failed

61F

62niter

63+1

64disq

65+0.00000000, +2.69745898, +1.59492815, +1.17197776

66membership

67+1, +2, +2, +2

68potential

69+0.00000000, +8.98029804

70center

71+0.964425206, +1.58144557

72+2.78230238, +3.41378403

73+2.11412859, +3.43545294

74+0.577360690, +3.26176524

75size

76+1, +3

77

78

79ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

80[ndim, nsam, ncls]

81+3, +4, +2

82sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

83sample

84+3.49306345, +4.84981823, +0.482063591, +1.34671211

85+2.88380194, +1.71131730, +1.93086922, +0.324480832

86+2.77829385, +0.889991820, +0.842357278, +0.735339522E-1

87call setResized(disq, nsam)

88call setResized(membership, nsam)

89call setResized(center, [ndim, ncls])

90call setResized(potential, ncls)

91call setResized(size, ncls)

92

93call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

94failed

95F

96niter

97+1

98disq

99+1.69529724, +1.69529688, +0.979797423, +0.979797602

100membership

101+1, +1, +2, +2

102potential

103+6.78118849, +3.91918969

104center

105+4.17144108, +0.914387822

106+2.29755974, +1.12767506

107+1.83414280, +0.457945615

108size

109+2, +2

110

111

112ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

113[ndim, nsam, ncls]

114+5, +3, +2

115sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

116sample

117+2.67365503, +1.73565805, +1.08126378

118+0.780725479, +3.58734083, +2.24011850

119+3.19889784, +4.77777624, +4.48111963

120+0.197524428, +4.13293648, +1.16237903

121+1.01525640, +2.67101073, +1.36856318

122call setResized(disq, nsam)

123call setResized(membership, nsam)

124call setResized(center, [ndim, ncls])

125call setResized(potential, ncls)

126call setResized(size, ncls)

127

128call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

129failed

130F

131niter

132+1

133disq

134+0.00000000, +3.21295643, +3.21295691

135membership

136+2, +1, +1

137potential

138+12.8518257, +0.00000000

139center

140+1.40846086, +2.67365503

141+2.91372967, +0.780725479

142+4.62944794, +3.19889784

143+2.64765787, +0.197524428

144+2.01978683, +1.01525640

145size

146+2, +1

147

148

149ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

150[ndim, nsam, ncls]

151+2, +7, +4

152sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

153sample

154+3.36679554, +2.60843635, +3.53108978, +0.765241981, +4.52701473, +1.37547374, +2.27166533

155+0.216679573, +2.14455223, +0.186871588, +4.49863815, +0.525346994, +3.02717018, +4.27078485

156call setResized(disq, nsam)

157call setResized(membership, nsam)

158call setResized(center, [ndim, ncls])

159call setResized(potential, ncls)

160call setResized(size, ncls)

161

162call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

163failed

164F

165niter

166+1

167disq

168+0.203566492, +0.00000000, +0.919158086E-1, +0.580307066, +0.563083470, +0.00000000, +0.580307126

169membership

170+2, +4, +2, +1, +2, +3, +1

171potential

172+2.32122850, +1.13431323, +0.00000000, +0.00000000

173center

174+1.51845360, +3.80830002, +1.37547374, +2.60843635

175+4.38471127, +0.309632719, +3.02717018, +2.14455223

176size

177+2, +3, +1, +1

178

179

180ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

181[ndim, nsam, ncls]

182+2, +3, +3

183sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

184sample

185+3.42633343, +4.87914419, +4.52042580

186+0.704374611, +3.36541390, +2.75196648

187call setResized(disq, nsam)

188call setResized(membership, nsam)

189call setResized(center, [ndim, ncls])

190call setResized(potential, ncls)

191call setResized(size, ncls)

192

193call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

194failed

195F

196niter

197+1

198disq

199+0.00000000, +0.00000000, +0.00000000

200membership

201+2, +3, +1

202potential

203+0.00000000, +0.00000000, +0.00000000

204center

205+4.52042580, +3.42633343, +4.87914419

206+2.75196648, +0.704374611, +3.36541390

207size

208+1, +1, +1

209

210

211ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

212[ndim, nsam, ncls]

213+5, +7, +4

214sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

215sample

216+0.335005224, +3.19071317, +1.81889749, +4.55224800, +2.48839808, +3.59679031, +1.09610105

217+1.55076122, +2.90543056, +1.28109694, +1.30711412, +3.08899903, +0.990077555, +4.96392870

218+0.603176951, +2.59593105, +2.10227966, +4.72403002, +1.45300925, +4.23028374, +3.30410004

219+3.65917563, +4.09179115, +3.23305631, +1.01636267, +1.72277927, +4.31291151, +4.91639090

220+0.135381818, +2.18604064, +1.21946192, +0.665618479, +0.938138664, +1.07311583, +2.86370158

221call setResized(disq, nsam)

222call setResized(membership, nsam)

223call setResized(center, [ndim, ncls])

224call setResized(potential, ncls)

225call setResized(size, ncls)

226

227call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

228failed

229F

230niter

231+1

232disq

233+0.00000000, +1.07133758, +0.00000000, +4.67686510, +4.67686462, +6.47388220, +7.29835606

234membership

235+3, +2, +4, +1, +1, +2, +2

236potential

237+18.7074585, +18.0575886, +0.00000000, +0.00000000

238center

239+3.52032304, +2.62786818, +0.335005224, +1.81889749

240+2.19805670, +2.95314574, +1.55076122, +1.28109694

241+3.08851957, +3.37677169, +0.603176951, +2.10227966

242+1.36957097, +4.44036484, +3.65917563, +3.23305631

243+0.801878572, +2.04095268, +0.135381818, +1.21946192

244size

245+2, +3, +1, +1

246

247

248ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

249[ndim, nsam, ncls]

250+5, +3, +2

251sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

252sample

253+2.67327094, +4.58450317, +2.33191395

254+4.88044834, +3.23485970, +4.64746809

255+4.03332853, +2.09349895, +3.93679595

256+0.102661848, +1.45601845, +4.11710501

257+3.25221419, +2.74718666, +0.947423279

258call setResized(disq, nsam)

259call setResized(membership, nsam)

260call setResized(center, [ndim, ncls])

261call setResized(potential, ncls)

262call setResized(size, ncls)

263

264call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

265failed

266F

267niter

268+1

269disq

270+3.05258465, +3.05258369, +0.00000000

271membership

272+2, +2, +1

273potential

274+0.00000000, +12.2103367

275center

276+2.33191395, +3.62888718

277+4.64746809, +4.05765390

278+3.93679595, +3.06341362

279+4.11710501, +0.779340148

280+0.947423279, +2.99970055

281size

282+1, +2

283

284

285ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

286[ndim, nsam, ncls]

287+1, +2, +1

288sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

289sample

290+4.38341188, +3.85437036

291call setResized(disq, nsam)

292call setResized(membership, nsam)

293call setResized(center, [ndim, ncls])

294call setResized(potential, ncls)

295call setResized(size, ncls)

296

297call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

298failed

299F

300niter

301+1

302disq

303+0.699711740E-1, +0.699713007E-1

304membership

305+1, +1

306potential

307+0.279884934

308center

309+4.11889124

310size

311+2

312

313

314ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

315[ndim, nsam, ncls]

316+2, +7, +4

317sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

318sample

319+4.91269827, +3.94869113, +3.45396805, +0.466822684, +2.31161690, +3.39903259, +2.29473591

320+4.38696861, +3.01287937, +3.21296883, +0.671095252, +1.87571108, +2.92932224, +2.89692926

321call setResized(disq, nsam)

322call setResized(membership, nsam)

323call setResized(center, [ndim, ncls])

324call setResized(potential, ncls)

325call setResized(size, ncls)

326

327call setKmeans(rngf, membership, disq, sample, center, size, potential, failed, niter) ! compute the new clusters and memberships.

328failed

329F

330niter

331+1

332disq

333+0.00000000, +0.122701362, +0.474904366E-1, +0.00000000, +0.260792822, +0.555969737E-1, +0.260792941

334membership

335+1, +4, +4, +2, +3, +4, +3

336potential

337+0.00000000, +0.00000000, +1.04317153, +0.593893051

338center

339+4.91269827, +0.466822684, +2.30317640, +3.60056400

340+4.38696861, +0.671095252, +2.38632011, +3.05172348

341size

342+1, +1, +2, +3

343

344

Postprocessing of the example output ⛓
1#!/usr/bin/env python

2

3import matplotlib.pyplot as plt

4import pandas as pd

5import numpy as np

6import glob

7import sys

8

9

10fontsize = 17

11

12prefixes = { "setKmeansPP" : "Memberships based on k-means++."

13 , "setKmeans" : "k-means refinement of k-means++ memberships."

14 }

15for prefix in list(prefixes.keys()):

16

17 legends = []

18 fig = plt.figure(figsize = 1.25 * np.array([6.4, 4.8]), dpi = 200)

19 ax = plt.subplot()

20

21 fileList = glob.glob(prefix + ".*.txt")

22 if len(fileList) == 2:

23

24 for file in fileList:

25

26 kind = file.split(".")[1]

27 df = pd.read_csv(file, delimiter = ",", header = None)

28

29 if kind == "center":

30 ax.scatter ( df.values[:,1]

31 , df.values[:,2]

32 , zorder = 100

33 , marker = "*"

34 , c = "red"

35 , s = 50

36 )

37 legends.append("center")

38 elif kind == "sample":

39 ax.scatter ( df.values[:,1]

40 , df.values[:,2]

41 , c = df.values[:, 0]

42 , s = 10

43 )

44 legends.append("sample")

45 else:

46 sys.exit("Ambiguous file exists: {}".format(file))

47

48 ax.legend(legends, fontsize = fontsize)

49 plt.xticks(fontsize = fontsize - 2)

50 plt.yticks(fontsize = fontsize - 2)

51 ax.set_xlabel("X", fontsize = 17)

52 ax.set_ylabel("Y", fontsize = 17)

53 ax.set_title(prefixes[prefix], fontsize = fontsize)

54

55 plt.axis('equal')

56 plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")

57 ax.tick_params(axis = "y", which = "minor")

58 ax.tick_params(axis = "x", which = "minor")

59 ax.set_axisbelow(True)

60 plt.tight_layout()

61

62 plt.savefig(prefix + ".png")

63

64 else:

65

66 sys.exit("Ambiguous file list exists.")

Visualization of the example output ⛓

Test:: test_pm_clustering

Final Remarks ⛓

If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Copyright: Computational Data Science Lab

Author:: Amir Shahmoradi, September 1, 2012, 12:00 AM, National Institute for Fusion Studies, The University of Texas at Austin

Definition at line 1292 of file pm_clustering.F90.

The documentation for this interface was generated from the following file:

src/fortran/main/pm_clustering.F90