Compute and return an asymptotically optimal set of cluster centers for the input sample, cluster membership IDs, and sample distances-squared from their corresponding cluster centers.

Detailed Description


See the documentation of pm_clusKmeans for more information on the Kmeans++ clustering algorithm.
The metric used within this generic interface is the Euclidean distance.
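The seeding idea of k-means++ (Arthur and Vassilvitskii, 2007) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the ParaMonte implementation: the names `sq_dist` and `kmeanspp_seed` are hypothetical, points are plain tuples, and the cumulative-sum list plays the role that `csdisq` is described as playing in this interface.

```python
import bisect
import itertools
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(sample, ncls, rng):
    """Pick `ncls` centers from `sample` with probability proportional to D^2.

    Hypothetical helper for illustration only; not a ParaMonte routine.
    """
    # The first center is drawn uniformly from the sample.
    centers = [sample[rng.randrange(len(sample))]]
    disq = [sq_dist(p, centers[0]) for p in sample]
    while len(centers) < ncls:
        # csdisq[i] is the sum of disq[0 .. i-1], so csdisq[0] == 0.
        csdisq = [0.0] + list(itertools.accumulate(disq))
        target = rng.random() * csdisq[-1]
        # Find i such that csdisq[i] <= target < csdisq[i + 1].
        idx = min(bisect.bisect_right(csdisq, target) - 1, len(sample) - 1)
        centers.append(sample[idx])
        # Keep, for every point, the squared distance to its nearest center.
        disq = [min(d, sq_dist(p, centers[-1])) for d, p in zip(disq, sample)]
    return centers, disq


sample = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.5), (8.8, 0.1)]
centers, disq = kmeanspp_seed(sample, 3, random.Random(0))
```

A point that is already a center has a squared distance of zero, so its interval in the cumulative sum is empty and it cannot be drawn again unless every point coincides with a center.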

Parameters

[in,out]	rng	: The input/output scalar that can be an object of either type rngf_type, implying the use of the intrinsic Fortran uniform RNG, or type xoshiro256ssw_type, implying the use of the xoshiro256** uniform RNG.
[out]	membership	: The output vector of shape `(1:nsam)` of type `integer` of default kind IK, containing the membership of each input sample in `sample` in its nearest cluster `center`, such that `center(:, membership(i))` is the nearest cluster center to the `i`th sample `sample(:, i)` at a squared distance of `disq(i)`.
[out]	disq	: The output vector of shape `(1:nsam)` of the same type and kind as the input argument `sample`, containing the squared Euclidean distance of each input sample in `sample` from its nearest cluster `center`.
[out]	csdisq	: The output vector of shape `(1:nsam+1)` of the same type and kind as the input argument `sample`, containing the cumulative sum of the squared Euclidean distance of each input sample in `sample` from its nearest cluster `center`. Although the output contents are rarely useful to the caller, passing this argument improves the efficiency of the algorithm by eliminating the need for internal workspace allocation. This potential speed-up is particularly relevant when the procedure is called repeatedly on samples of the same size.
[in]	sample	: The input vector or matrix of type `real` of any kind supported by the processor (e.g., RK, RK32, RK64, or RK128), containing the sample of `nsam` points in an `ndim`-dimensional space whose corresponding cluster centers must be computed. If `sample` is a vector of shape `(1 : nsam)` and `center` is a vector of shape `(1 : ncls)`, then the input `sample` must be a collection of `nsam` points in univariate space. If `sample` is a matrix of shape `(1 : ndim, 1 : nsam)` and `center` is a matrix of shape `(1 : ndim, 1 : ncls)`, then the input `sample` must be a collection of `nsam` points in `ndim`-dimensional space.
[in]	ncls	: The input scalar of type `integer` of default kind IK, containing the desired number of clusters to be identified in the sample. (optional, default = `ubound(center, rank(center))`. It must be present if and only if the output arguments `center`, `size`, and `potential` are all missing.)
[out]	center	: The output vector of shape `(1:ncls)` or matrix of shape `(1 : ndim, 1 : ncls)` of the same type and kind as the input argument `sample`, containing the set of `ncls` unique random cluster centers (centroids) selected from the input sample based on the computed memberships and minimum sample-cluster distances `disq`. (optional. If missing, no cluster `center` information will be output.)
[out]	size	: The output vector of shape `(1:ncls)` of type `integer` of default kind IK, containing the sizes (number of members) of the clusters with the corresponding centers output in the argument `center`. (optional. If missing, no cluster `size` information will be output.)
[out]	potential	: The output vector of shape `(1:ncls)` of the same type and kind as the input argument `sample`, the `i`th element of which contains the sum of squared distances of all members of the `i`th cluster from the cluster center as output in the `i`th element of `center`. (optional. If missing, no cluster `potential` information will be output.)
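How the outputs described above relate to one another can be shown with a small pure-Python sketch. It assumes the centers are already known and uses a hypothetical helper name, `assign_members`; it only mirrors the definitions of `membership`, `disq`, `size`, and `potential` given in the parameter list and says nothing about how the library computes them internally.

```python
def assign_members(sample, centers):
    """Return (membership, disq, size, potential) for fixed centers.

    Hypothetical helper for illustration; cluster IDs are 1-based to match
    the Fortran convention used in this documentation.
    """
    membership, disq = [], []
    for point in sample:
        # Squared Euclidean distance from this point to every center.
        dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        membership.append(best + 1)
        disq.append(dists[best])
    size = [0] * len(centers)
    potential = [0.0] * len(centers)
    for m, d in zip(membership, disq):
        size[m - 1] += 1        # number of members of cluster m
        potential[m - 1] += d   # sum of squared member-center distances
    return membership, disq, size, potential


sample = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
membership, disq, size, potential = assign_members(sample, centers)
```

In this toy case the first two points join cluster 1 and the last two join cluster 2, so each cluster has size 2 and potential 1.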

Possible calling interfaces

use pm_clusKmeans, only: setKmeansPP

call setKmeansPP(rng, membership(1 : nsam), disq(1 : nsam), csdisq(0 : nsam), sample(1 : ndim, 1 : nsam), ncls)

call setKmeansPP(rng, membership(1 : nsam), disq(1 : nsam), csdisq(0 : nsam), sample(1 : ndim, 1 : nsam), center(1 : ndim, 1 : ncls), size(1 : ncls), potential(1 : ncls))


Warning: The condition ubound(center, rank(center)) > 0 must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == size(disq, 1) must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == size(csdisq, 1) - 1 must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == size(size, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == size(potential, 1) must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == size(membership, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) <= size(sample, rank(sample)) must hold for the corresponding input arguments (the number of clusters must be less than or equal to the sample size).
The condition ubound(sample, 1) == ubound(center, 1) must hold for the corresponding input arguments.
These conditions are verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.
By definition, the number of points in the input sample must be at least as large as the specified number of clusters.
The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with the preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release builds and impure in debug and testing builds.
The procedures of this generic interface are always impure when the input rng argument is set to an object of type rngf_type.
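The shape conditions listed in this warning can be restated as a small checker. The sketch below is a hypothetical pure-Python helper (`check_shapes` is not part of ParaMonte) that takes the relevant extents as integers and reports which of the documented conditions fail; it is meant only as a reading aid for the list above.

```python
def check_shapes(ndim, nsam, ncls, center_ndim,
                 n_membership, n_disq, n_csdisq, n_size, n_potential):
    """Return the list of violated conditions (empty if all hold).

    Hypothetical helper: `ndim`/`nsam` describe `sample`, `center_ndim`/`ncls`
    describe `center`, and the remaining arguments are array lengths.
    """
    conditions = [
        ("ncls > 0", ncls > 0),
        ("nsam == size(disq)", nsam == n_disq),
        ("nsam == size(csdisq) - 1", nsam == n_csdisq - 1),
        ("ncls == size(size)", ncls == n_size),
        ("ncls == size(potential)", ncls == n_potential),
        ("nsam == size(membership)", nsam == n_membership),
        ("ncls <= nsam", ncls <= nsam),
        ("ubound(sample, 1) == ubound(center, 1)", ndim == center_ndim),
    ]
    return [name for name, holds in conditions if not holds]


# All extents consistent: 2-D sample of 6 points, 3 clusters.
ok = check_shapes(2, 6, 3, 2, 6, 6, 7, 3, 3)
# csdisq allocated with nsam elements instead of nsam + 1.
bad = check_shapes(2, 6, 3, 2, 6, 6, 6, 3, 3)
```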

Remarks: The functionality of this generic interface is similar to setMember, with the major difference being that setKmeansPP simultaneously computes the new cluster centers and sample memberships, whereas setMember computes the new sample memberships based on a given set of cluster centers.
The output of setKmeansPP can be directly passed to setCenter to learn the new updated cluster centers and their sizes.
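The center-update step mentioned in these remarks can be sketched as follows, assuming that the updated center of a cluster is the arithmetic mean (centroid) of its members. That assumption, and the helper name `update_centers`, are illustrative only; consult the setCenter documentation for the actual interface and behavior.

```python
def update_centers(sample, membership, ncls):
    """Return (centers, size) where each center is the mean of its members.

    Hypothetical helper; membership IDs are 1-based. A cluster with no
    members gets `None` as its center in this sketch.
    """
    ndim = len(sample[0])
    sums = [[0.0] * ndim for _ in range(ncls)]
    size = [0] * ncls
    for point, m in zip(sample, membership):
        size[m - 1] += 1
        for k, x in enumerate(point):
            sums[m - 1][k] += x
    centers = [
        tuple(s / n for s in row) if n > 0 else None
        for row, n in zip(sums, size)
    ]
    return centers, size


sample = [(0.0, 0.0), (2.0, 0.0), (10.0, 4.0), (12.0, 6.0)]
membership = [1, 1, 2, 2]
centers, size = update_centers(sample, membership, 2)
```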

Note: Dropping the optional arguments can aid runtime performance.
This is particularly relevant when the output of this generic interface is directly passed to the k-means algorithm.

See also: setKmeans
setCenter
setMember
setKmeansPP
Arthur, D.; Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding

Example usage

program example

    use pm_kind, only: SK, IK, LK
    use pm_kind, only: RKG => RKS ! all other real kinds are also supported.
    use pm_io, only: display_type
    use pm_distUnif, only: getUnifRand
    use pm_arrayResize, only: setResized
    use pm_clusKmeans, only: setKmeansPP, rngf

    implicit none

    integer(IK) :: ndim, nsam, ncls, itry
    real(RKG)   , allocatable :: sample(:,:), center(:,:), disq(:), csdisq(:), potential(:)
    integer(IK) , allocatable :: membership(:), size(:)
    type(display_type) :: disp

    disp = display_type(file = "main.out.F90")

    call disp%skip
    call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
    call disp%show("! Compute cluster centers based on an input sample and cluster memberships and member-center distances.")
    call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
    call disp%skip

    do itry = 1, 10
        call disp%skip()
        call disp%show("ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);")
                        ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
        call disp%show("[ndim, nsam, ncls]")
        call disp%show( [ndim, nsam, ncls] )
        call disp%show("sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.")
                        sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
        call disp%show("sample")
        call disp%show( sample )
        call disp%show("call setResized(disq, nsam)")
                        call setResized(disq, nsam)
        call disp%show("call setResized(csdisq, nsam + 1_IK)")
                        call setResized(csdisq, nsam + 1_IK)
        call disp%show("call setResized(membership, nsam)")
                        call setResized(membership, nsam)
        call disp%show("call setResized(center, [ndim, ncls])")
                        call setResized(center, [ndim, ncls])
        call disp%show("call setResized(potential, ncls)")
                        call setResized(potential, ncls)
        call disp%show("call setResized(size, ncls)")
                        call setResized(size, ncls)
        call disp%skip()

        call disp%show("call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.")
                        call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
        call disp%show("disq")
        call disp%show( disq )
        call disp%show("csdisq")
        call disp%show( csdisq )
        call disp%show("membership")
        call disp%show( membership )
        call disp%show("potential")
        call disp%show( potential )
        call disp%show("center")
        call disp%show( center )
        call disp%show("size")
        call disp%show( size )
        call disp%skip()
    end do

    !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    ! Output an example for visualization.
    !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    block
        integer(IK) :: funit, i
        ndim = 2
        ncls = 5
        nsam = 5000
        center = getUnifRand(0., 1., ndim, ncls)
        sample = getUnifRand(0., 1., ndim, nsam)
        call setResized(csdisq, nsam + 1_IK)
        call setResized(disq, nsam)
        call setResized(membership, nsam)
        call setResized(center, [ndim, ncls])
        call setResized(potential, ncls)
        call setResized(size, ncls)
        call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential)
        open(newunit = funit, file = "setKmeansPP.center.txt")
            do i = 1, ncls
                write(funit, "(*(g0,:,','))") i, center(:,i)
            end do
        close(funit)
        open(newunit = funit, file = "setKmeansPP.sample.txt")
            do i = 1, nsam
                write(funit, "(*(g0,:,','))") membership(i), sample(:,i)
            end do
        close(funit)
    end block

end program example


Example Unix compile command via Intel ifort compiler

#!/usr/bin/env sh
rm main.exe
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example Windows Batch compile command via Intel ifort compiler

del main.exe
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
main.exe

Example Unix / MinGW compile command via GNU gfortran compiler

#!/usr/bin/env sh
rm main.exe
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example output

!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
! Compute cluster centers based on an input sample and cluster memberships and member-center distances.
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+4, +6, +4
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+2.13936520, +2.17753863, +0.947716832, +2.24455094, +2.49617219, +4.08531094
+0.651172996, +4.58386707, +4.13581896, +0.258072019, +3.09623241, +4.53492117
+2.17715549, +0.703817308, +1.67591071, +4.84981823, +0.341176987E-2, +0.323704481
+3.63314629, +4.74875069, +4.84903622, +2.16089058, +2.26878262, +2.55698085
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000, +2.66823173, +0.00000000, +0.00000000, +4.78083277, +0.00000000
csdisq
+0.00000000, +0.00000000, +8.59033203, +23.8827496, +23.8827496, +28.6635818, +28.6635818
membership
+2, +4, +4, +3, +1, +1
potential
+4.78083277, +0.00000000, +0.00000000, +2.66823173
center
+4.08531094, +2.13936520, +2.24455094, +0.947716832
+4.53492117, +0.651172996, +0.258072019, +4.13581896
+0.323704481, +2.17715549, +4.84981823, +1.67591071
+2.55698085, +3.63314629, +2.16089058, +4.84903622
size
+2, +1, +1, +2


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+3, +2, +2
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+2.59935498, +2.73426986
+2.57192349, +0.270605683
+4.94351578, +4.36086369
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000, +0.00000000
csdisq
+0.00000000, +0.00000000, +5.65374804
membership
+1, +2
potential
+0.00000000, +0.00000000
center
+2.59935498, +2.73426986
+2.57192349, +0.270605683
+4.94351578, +4.36086369
size
+1, +1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+4, +4, +4
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+4.35638189, +2.88354635, +4.50853157, +2.91200066
+0.780458748, +2.55000567, +4.34746838, +2.98939347
+3.27596521, +1.86596012, +4.73504066, +3.28873634
+0.755269527, +3.99303579, +2.21679306, +4.70031929
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000, +0.00000000, +0.00000000, +0.00000000
csdisq
+0.00000000, +0.00000000, +0.00000000, +0.00000000, +2.71841335
membership
+3, +2, +1, +4
potential
+0.00000000, +0.00000000, +0.00000000, +0.00000000
center
+4.50853157, +2.88354635, +4.35638189, +2.91200066
+4.34746838, +2.55000567, +0.780458748, +2.98939347
+4.73504066, +1.86596012, +3.27596521, +3.28873634
+2.21679306, +3.99303579, +0.755269527, +4.70031929
size
+1, +1, +1, +1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+3, +6, +3
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+4.24514055, +0.897808373, +0.927564502, +3.35978222, +4.77875185, +0.634441078
+4.65616798, +1.51281857, +1.44810057, +3.76173306, +2.64224172, +1.65918112
+0.345850587, +4.36776066, +3.06332016, +1.13277137, +4.39145947, +2.38435626
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000, +0.00000000, +1.70663893, +2.20311761, +0.00000000, +4.02467728
csdisq
+0.00000000, +0.00000000, +0.00000000, +1.70663893, +3.90975666, +20.2476349, +24.2723122
membership
+1, +2, +2, +1, +3, +2
potential
+2.20311761, +5.73131609, +0.00000000
center
+4.24514055, +0.897808373, +4.77875185
+4.65616798, +1.51281857, +2.64224172
+0.345850587, +4.36776066, +4.39145947
size
+2, +3, +1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+4, +7, +4
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+2.47468925, +3.65329885, +2.77629018, +2.35368824, +4.87198305, +0.579373837, +0.306171477
+3.19468379, +1.44800127, +0.577837527, +2.03795338, +3.27591443, +0.616323948E-1, +0.865943432
+0.283397734, +2.40631151, +0.382784009, +4.38773918, +2.25751114, +3.19946575, +3.87321019
+0.704022348, +4.14455366, +3.30926275, +4.21819019, +0.974922180, +1.09258795, +2.93798327
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000, +6.31870413, +0.00000000, +7.46960211, +0.00000000, +4.58097124, +0.00000000
csdisq
+0.00000000, +9.72412682, +16.0428314, +16.0428314, +23.5124340, +23.5124340, +28.0934048, +28.0934048
membership
+4, +3, +3, +1, +2, +1, +1
potential
+12.0505733, +0.00000000, +6.31870413, +0.00000000
center
+0.306171477, +4.87198305, +2.77629018, +2.47468925
+0.865943432, +3.27591443, +0.577837527, +3.19468379
+3.87321019, +2.25751114, +0.382784009, +0.283397734
+2.93798327, +0.974922180, +3.30926275, +0.704022348
size
+3, +1, +2, +1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+4, +1, +1
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+0.168116689
+3.76824617
+3.84335637
+2.20873475
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000
csdisq
+0.00000000, +0.00000000
membership
+1
potential
+0.00000000
center
+0.168116689
+3.76824617
+3.84335637
+2.20873475
size
+1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+5, +4, +2
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+4.59279203, +3.36288404, +4.68041801, +3.04189396
+3.75420284, +0.365414619, +2.62552929, +1.73626685
+4.54929590, +2.66871405, +2.92258310, +2.53342366
+2.00651026, +1.38389969, +1.00021958, +4.89269209
+2.82053685, +2.90131426, +2.05227757, +1.54346645
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+5.53062010, +0.00000000, +0.00000000, +16.1559486
csdisq
+0.00000000, +5.53062010, +13.3071575, +13.3071575, +32.3443832
membership
+1, +2, +1, +2
potential
+5.53062010, +16.1559486
center
+4.68041801, +3.36288404
+2.62552929, +0.365414619
+2.92258310, +2.66871405
+1.00021958, +1.38389969
+2.05227757, +2.90131426
size
+2, +2


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+4, +1, +1
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+3.30441117
+4.63102007
+3.74004626
+1.11510396
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000
csdisq
+0.00000000, +0.00000000
membership
+1
potential
+0.00000000
center
+3.30441117
+4.63102007
+3.74004626
+1.11510396
size
+1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+1, +8, +4
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+3.24985957, +2.60311246, +4.29312801, +2.91351557, +3.01679873, +1.15002632, +1.67796826, +1.60555029
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.113127291, +0.00000000, +0.00000000, +0.00000000, +0.106674125E-1, +0.278722703, +0.00000000, +0.524436310E-2
csdisq
+0.00000000, +0.113127291, +0.209477380, +0.209477380, +0.209477380, +0.220144793, +0.498867512, +0.498867512, +0.504111886
membership
+1, +4, +3, +1, +1, +2, +2, +2
potential
+0.123794705, +0.283967078, +0.00000000, +0.00000000
center
+2.91351557, +1.67796826, +4.29312801, +2.60311246
size
+3, +3, +1, +1


ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);
[ndim, nsam, ncls]
+3, +2, +1
sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
sample
+0.919935107, +3.98182631
+3.51151609, +3.49986529
+4.04880810, +4.76683187
call setResized(disq, nsam)
call setResized(csdisq, nsam + 1_IK)
call setResized(membership, nsam)
call setResized(center, [ndim, ncls])
call setResized(potential, ncls)
call setResized(size, ncls)

call setKmeansPP(rngf, membership, disq, csdisq, sample, center, size, potential) ! compute the new clusters and memberships.
disq
+0.00000000, +9.89087105
csdisq
+0.00000000, +0.00000000, +9.89087105
membership
+1, +1
potential
+9.89087105
center
+0.919935107
+3.51151609
+4.04880810
size
+2

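The printed arrays follow the `(1 : ndim, 1 : nsam)` layout, so each printed row is one dimension and each column is one sample point. As a small sanity check, the last trial above (`ndim = 3`, `nsam = 2`, `ncls = 1`) can be reproduced by hand in pure Python. The variable names are illustrative, and the values are copied from the printed output, so agreement is expected only to single-precision accuracy.

```python
# Rows as printed for `sample` in the last trial: one row per dimension.
rows = [
    [0.919935107, 3.98182631],
    [3.51151609, 3.49986529],
    [4.04880810, 4.76683187],
]

# Transpose so that each entry is one sample point with ndim coordinates.
points = list(zip(*rows))

# The single printed center coincides with the first sample point.
center = points[0]

# Squared Euclidean distance of every point from that center.
disq = [sum((a - b) ** 2 for a, b in zip(p, center)) for p in points]

# With one cluster, its potential is the sum of all squared distances.
potential = sum(disq)
```

The recomputed values agree with the printed `disq` of `+0.00000000, +9.89087105` and the printed `potential` of `+9.89087105` up to rounding.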

Postprocessing of the example output

#!/usr/bin/env python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import glob
import sys
import os

fontsize = 17
fig = plt.figure(figsize = 1.25 * np.array([6.4, 4.8]), dpi = 200)
ax = plt.subplot()

parent = os.path.basename(os.path.dirname(__file__))
pattern = parent + "*.txt"

fileList = glob.glob(pattern)
legends = []
if len(fileList) == 2:
    for file in fileList:

        kind = file.split(".")[1]
        prefix = file.split(".")[0]
        df = pd.read_csv(file, delimiter = ",", header = None)

        if kind == "center":
            ax.scatter  ( df.values[:, 1]
                        , df.values[:,2]
                        , zorder = 100
                        , marker = "*"
                        , c = "red"
                        , s = 50
                        )
            legends.append("center")
        elif kind == "sample":
            ax.scatter  ( df.values[:, 1]
                        , df.values[:,2]
                        , c = df.values[:, 0]
                        , s = 10
                        )
            legends.append("sample")
        else:
            sys.exit("Ambiguous file exists: {}".format(file))

    ax.legend(legends, fontsize = fontsize)
    plt.xticks(fontsize = fontsize - 2)
    plt.yticks(fontsize = fontsize - 2)
    ax.set_xlabel("X", fontsize = 17)
    ax.set_ylabel("Y", fontsize = 17)
    ax.set_title("Membership Scatter Plot", fontsize = fontsize)

    plt.axis('equal')
    plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
    ax.tick_params(axis = "y", which = "minor")
    ax.tick_params(axis = "x", which = "minor")
    ax.set_axisbelow(True)
    plt.tight_layout()

    plt.savefig(prefix + ".png")
else:
    sys.exit("Ambiguous file list exists.")

Visualization of the example output

Test: test_pm_clusKmeans

Final Remarks

If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Copyright: Computational Data Science Lab
If you use or redistribute ideas based on this generic interface implementation, you should also cite the original k-means++ article:
Arthur, D.; Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding

Author: Amir Shahmoradi, September 1, 2012, 12:00 AM, National Institute for Fusion Studies, The University of Texas Austin

Definition at line 1181 of file pm_clusKmeans.F90.

The documentation for this interface was generated from the following file:

src/fortran/main/pm_clusKmeans.F90