Compute and return an asymptotically optimal set of cluster centers for the input sample, cluster membership IDs, and sample distances-squared from their corresponding cluster centers.
More...

Detailed Description

Compute and return an asymptotically optimal set of cluster centers for the input sample, cluster membership IDs, and sample distances-squared from their corresponding cluster centers.

In data mining, the k-means++ is an algorithm for choosing the initial values (or seeds) for the k-means clustering algorithm.
It was proposed in 2007 by David Arthur and Sergei Vassilvitskii, as an approximation algorithm for the NP-hard k-means problem.
It offers a way of avoiding the sometimes poor clustering found by the standard k-means algorithm.

Intuition

The intuition behind k-means++ is that spreading out the \(k\) initial cluster centers is a good thing:
The first cluster center is chosen uniformly at random from the data points that are being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its squared distance from the closest existing cluster center to the point.

Algorithm

The exact algorithm is as follows:

Choose one center uniformly at random among the data points.
For each data point \(x\) not chosen yet, compute \(D(x)\), the distance between \(x\) and the nearest center that has already been chosen.
Choose one new data point at random as a new center, using a weighted probability distribution where a point \(x\) is chosen with probability proportional to \(D(x)^2\).
Repeat Steps 2 and 3 until \(k\) centers have been chosen.
Now that the initial centers have been chosen, proceed using standard k-means clustering.

Performance improvements

The k-means++ seeding method yields considerable improvement in the final error of k-means algorithm.
Although the initial selection in the algorithm takes extra time, the k-means part itself converges very quickly after this seeding and thus the algorithm actually lowers the computation time.
Based on the original paper, the method yields typically 2-fold improvements in speed, and for certain datasets, close to 1000-fold improvements in error.
In these simulations the new method almost always performed at least as well as vanilla k-means in both speed and error.

Note: The metric used within this generic interface is the Euclidean distance.

Parameters

[in,out]	rng	: The input/output scalar that can be an object of, type rngf_type, implying the use of intrinsic Fortran uniform RNG. type xoshiro256ssw_type, implying the use of xoshiro256** uniform RNG.
[out]	membership	: The output vector of shape `(1:nsam)` of type `integer` of default kind IK, containing the membership of each input sample in `sample` from its nearest cluster `center`, such that `cluster(membership(i))` is the nearest cluster center to the `i`th sample `sample(:, i)` at a squared-distance of `disq(i)`.
[out]	disq	: The output vector of shape `(1:nsam)` of the same type and kind as the input argument `sample`, containing the Euclidean squared distance of each input sample in `sample` from its nearest cluster `center`.
[in]	sample	: The input vector, or matrix of, type `real` of kind any supported by the processor (e.g., RK, RK32, RK64, or RK128), containing the sample of `nsam` points in a `ndim`-dimensional space whose corresponding cluster centers must be computed. If `sample` is a vector of shape `(1 : nsam)` and `center` is a vector of shape `(1 : ncls)`, then the input `sample` must be a collection of `nsam` points (in univariate space). If `sample` is a matrix of shape `(1 : ndim, 1 : nsam)` and `center` is a matrix of shape `(1 : ndim, 1 : ncls)`, then the input `sample` must be a collection of `nsam` points (in `ndim`-dimensional space).
[in]	ncls	: The input scalar of type `integer` of default kind IK, containing the number of the desired clusters to be identified in the sample. (optional, default = `ubound(center, 2)`. It must be present if and only if the output arguments `center`, `size`, and `potential` are all missing.)
[out]	center	: The output vector of shape `(1:ncls)` or matrix of shape `(1 : ndim, 1 : ncls)` of the same type and kind as the input argument `sample`, containing the set of `ncls` unique random cluster centers (centroids) selected from the input sample based on the computed memberships and minimum sample-cluster distances `disq`. (optional. If missing, no cluster `center` information will be output.)
[out]	size	: The output vector of shape `(1:ncls)` type `integer` of default kind IK, containing the sizes (number of members) of the clusters with the corresponding centers output in the argument `center`. (optional. If missing, no cluster `size` information will be output.)
[out]	potential	: The output vector of shape `(1:ncls)` of the same type and kind as the input argument `sample`, the `i`th element of which contains the sum of squared distances of all members of the `i`th cluster from the cluster center as output in the `i`th element of `center`. (optional. If missing, no cluster `potential` information will be output.)

Possible calling interfaces ⛓

: use pm_clustering, only: setKmeansPP

call setKmeansPP(rng, membership(1 : nsam) , disq(1 : nsam), sample(1 : ndim, 1 : nsam), ncls)

call setKmeansPP(rng, membership(1 : nsam) , disq(1 : nsam), sample(1 : ndim, 1 : nsam), center(1 : ndim, 1 : ncls), size(1 : ncls), potential(1 : ncls))

pm_clustering::setKmeansPP
Compute and return an asymptotically optimal set of cluster centers for the input sample,...
Definition: pm_clustering.F90:787

pm_clustering
This module contains procedures and routines for the computing the Kmeans clustering of a given set o...
Definition: pm_clustering.F90:35

Warning: The condition ubound(center, rank(center)) > 0 must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == ubound(disq, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == ubound(size, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == ubound(potential, 1) must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == ubound(membership, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) <= ubound(sample, rank(sample)) must hold for the corresponding input arguments (the number of clusters must be less than or equal to the sample size).
The condition ubound(sample, 1) == ubound(center, 1) must hold for the corresponding input arguments.
These conditions are verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.; By definition, the number of points in the input sample must be larger than the specified number of clusters.; The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release build and impure in debug and testing builds. The procedures of this generic interface are always impure when the input rng argument is set to an object of type rngf_type.

Remarks: The functionality of this generic interface is similar to setMember, with the major difference being that setKmeansPP simultaneously computes the new cluster centers and sample memberships, whereas setMember computes the new sample memberships based on a given set of cluster centers.; The output of setKmeansPP can be directly passed to setCenter to the learn the new updated cluster centers and their sizes.

Note: Dropping the optional arguments can aid runtime performance.
This is particularly relevant when the output of this generic interface is directly passed to the k-means algorithm.

See also: setKmeans
setCenter
setMember
setKmeansPP
Arthur, D.; Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding

Example usage ⛓

: 1program example

2

3 use pm_kind, only: SK, IK, LK

4 use pm_kind, only: RKG => RKS ! all other real kinds are also supported.

5 use pm_io, only: display_type

6 use pm_distUnif, only: getUnifRand

7 use pm_arrayResize, only: setResized

8 use pm_clustering, only: setKmeansPP, rngf

9

10 implicit none

11

12 integer(IK) :: ndim, nsam, ncls, itry

13 real(RKG) , allocatable :: sample(:,:), center(:,:), disq(:), potential(:)

14 integer(IK) , allocatable :: membership(:), size(:)

15 type(display_type) :: disp

16

17 disp = display_type(file = "main.out.F90")

18

19 call disp%skip

20 call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")

21 call disp%show("! Compute cluster centers based on an input sample and cluster memberships and member-center distances.")

22 call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")

23 call disp%skip

24

25 do itry = 1, 10

26 call disp%skip()

27 call disp%show("ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);")

28 ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

29 call disp%show("[ndim, nsam, ncls]")

30 call disp%show( [ndim, nsam, ncls] )

31 call disp%show("sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.")

32 sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

33 call disp%show("sample")

34 call disp%show( sample )

35 call disp%show("call setResized(disq, nsam)")

36 call setResized(disq, nsam)

37 call disp%show("call setResized(membership, nsam)")

38 call setResized(membership, nsam)

39 call disp%show("call setResized(center, [ndim, ncls])")

40 call setResized(center, [ndim, ncls])

41 call disp%show("call setResized(potential, ncls)")

42 call setResized(potential, ncls)

43 call disp%show("call setResized(size, ncls)")

44 call setResized(size, ncls)

45 call disp%skip()

46

47 call disp%show("call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.")

48 call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

49 call disp%show("disq")

50 call disp%show( disq )

51 call disp%show("membership")

52 call disp%show( membership )

53 call disp%show("potential")

54 call disp%show( potential )

55 call disp%show("center")

56 call disp%show( center )

57 call disp%show("size")

58 call disp%show( size )

59 call disp%skip()

60 end do

61

62 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

63 ! Output an example for visualization.

64 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

65

66 block

67 integer(IK) :: funit, i

68 ndim = 2

69 ncls = 5

70 nsam = 5000

71 center = getUnifRand(0., 1., ndim, ncls)

72 sample = getUnifRand(0., 1., ndim, nsam)

73 call setResized(disq, nsam)

74 call setResized(membership, nsam)

75 call setResized(center, [ndim, ncls])

76 call setResized(potential, ncls)

77 call setResized(size, ncls)

78 call setKmeansPP(rngf, membership, disq, sample, center, size, potential)

79 open(newunit = funit, file = "setKmeansPP.center.txt")

80 do i = 1, ncls

81 write(funit, "(*(g0,:,','))") i, center(:,i)

82 end do

83 close(funit)

84 open(newunit = funit, file = "setKmeansPP.sample.txt")

85 do i = 1, nsam

86 write(funit, "(*(g0,:,','))") membership(i), sample(:,i)

87 end do

88 close(funit)

89 end block

90

91end program example

pm_arrayResize::setResized
Allocate or resize (shrink or expand) an input allocatable scalar string or array of rank 1....
Definition: pm_arrayResize.F90:249

pm_distUnif::getUnifRand
Generate and return a scalar or a contiguous array of rank 1 of length s1 of randomly uniformly distr...
Definition: pm_distUnif.F90:4150

pm_io::show
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11726

pm_io::skip
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11508

pm_arrayResize
This module contains procedures and generic interfaces for resizing allocatable arrays of various typ...
Definition: pm_arrayResize.F90:81

pm_distUnif
This module contains classes and procedures for computing various statistical quantities related to t...
Definition: pm_distUnif.F90:274

pm_distUnif::rngf
type(rngf_type) rngf
The scalar constant object of type rngf_type whose presence signified the use of the Fortran intrinsi...
Definition: pm_distUnif.F90:2886

pm_io
This module contains classes and procedures for input/output (IO) or generic display operations on st...
Definition: pm_io.F90:252

pm_io::disp
type(display_type) disp
This is a scalar module variable an object of type display_type for general display.
Definition: pm_io.F90:11393

pm_kind
This module defines the relevant Fortran kind type-parameters frequently used in the ParaMonte librar...
Definition: pm_kind.F90:268

pm_kind::LK
integer, parameter LK
The default logical kind in the ParaMonte library: kind(.true.) in Fortran, kind(....
Definition: pm_kind.F90:541

pm_kind::IK
integer, parameter IK
The default integer kind in the ParaMonte library: int32 in Fortran, c_int32_t in C-Fortran Interoper...
Definition: pm_kind.F90:540

pm_kind::SK
integer, parameter SK
The default character kind in the ParaMonte library: kind("a") in Fortran, c_char in C-Fortran Intero...
Definition: pm_kind.F90:539

pm_kind::RKS
integer, parameter RKS
The single-precision real kind in Fortran mode. On most platforms, this is an 32-bit real kind.
Definition: pm_kind.F90:567

pm_io::display_type
Generate and return an object of type display_type.
Definition: pm_io.F90:10282

Example Unix compile command via Intel ifort compiler ⛓
1#!/usr/bin/env sh

2rm main.exe

3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe

4./main.exe

Example Windows Batch compile command via Intel ifort compiler ⛓
1del main.exe

2set PATH=..\..\..\lib;%PATH%

3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe

4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler ⛓
1#!/usr/bin/env sh

2rm main.exe

3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe

4./main.exe

Example output ⛓
1

2!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

3! Compute cluster centers based on an input sample and cluster memberships and member-center distances.

4!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5

6

7ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

8[ndim, nsam, ncls]

9+1, +9, +5

10sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

11sample

12+3.09760118, +4.01347351, +4.67596292, +1.24449110, +4.79837036, +0.988928974, +4.86061478, +1.86515093, +2.04817796

13call setResized(disq, nsam)

14call setResized(membership, nsam)

15call setResized(center, [ndim, ncls])

16call setResized(potential, ncls)

17call setResized(size, ncls)

18

19call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

20disq

21+0.00000000, +0.00000000, +0.340963081E-1, +0.00000000, +0.387436734E-2, +0.653119981E-1, +0.00000000, +0.334988944E-1, +0.00000000

22membership

23+3, +4, +2, +1, +2, +1, +2, +5, +5

24potential

25+0.653119981E-1, +0.379706770E-1, +0.00000000, +0.00000000, +0.334988944E-1

26center

27+1.24449110, +4.86061478, +3.09760118, +4.01347351, +2.04817796

28size

29+2, +3, +1, +1, +2

30

31

32ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

33[ndim, nsam, ncls]

34+2, +4, +3

35sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

36sample

37+0.515738130, +3.02103639, +3.82168722, +0.714060664E-1

38+2.46166372, +2.16211796, +1.28180718, +1.19970536

39call setResized(disq, nsam)

40call setResized(membership, nsam)

41call setResized(center, [ndim, ncls])

42call setResized(potential, ncls)

43call setResized(size, ncls)

44

45call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

46disq

47+0.00000000, +0.00000000, +0.00000000, +1.78996992

48membership

49+2, +1, +3, +2

50potential

51+0.00000000, +1.78996992, +0.00000000

52center

53+3.02103639, +0.515738130, +3.82168722

54+2.16211796, +2.46166372, +1.28180718

55size

56+1, +2, +1

57

58

59ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

60[ndim, nsam, ncls]

61+1, +2, +1

62sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

63sample

64+4.79888678, +1.00716412

65call setResized(disq, nsam)

66call setResized(membership, nsam)

67call setResized(center, [ndim, ncls])

68call setResized(potential, ncls)

69call setResized(size, ncls)

70

71call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

72disq

73+0.00000000, +14.3771620

74membership

75+1, +1

76potential

77+14.3771620

78center

79+4.79888678

80size

81+2

82

83

84ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

85[ndim, nsam, ncls]

86+2, +6, +3

87sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

88sample

89+4.90091419, +1.72373474, +2.38117242, +1.21245646, +3.18957949, +3.30897856

90+4.08058310, +4.79557705, +4.59610128, +2.03943491, +1.72553992, +4.17742968

91call setResized(disq, nsam)

92call setResized(membership, nsam)

93call setResized(center, [ndim, ncls])

94call setResized(potential, ncls)

95call setResized(size, ncls)

96

97call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

98disq

99+2.54363823, +0.472014874, +0.00000000, +4.00754547, +0.00000000, +0.00000000

100membership

101+1, +3, +3, +2, +2, +1

102potential

103+2.54363823, +4.00754547, +0.472014874

104center

105+3.30897856, +3.18957949, +2.38117242

106+4.17742968, +1.72553992, +4.59610128

107size

108+2, +2, +2

109

110

111ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

112[ndim, nsam, ncls]

113+3, +2, +1

114sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

115sample

116+1.14988267, +1.60304523

117+0.300022960E-1, +3.04963923

118+0.715044141, +3.05558562

119call setResized(disq, nsam)

120call setResized(membership, nsam)

121call setResized(center, [ndim, ncls])

122call setResized(potential, ncls)

123call setResized(size, ncls)

124

125call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

126disq

127+14.8016968, +0.00000000

128membership

129+1, +1

130potential

131+14.8016968

132center

133+1.60304523

134+3.04963923

135+3.05558562

136size

137+2

138

139

140ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

141[ndim, nsam, ncls]

142+3, +7, +5

143sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

144sample

145+1.31039524, +2.36078358, +4.13939905, +3.18617964, +3.27243543, +2.89390826, +4.16665030

146+1.68747127, +4.64989996, +4.30699682, +0.235319138E-1, +2.62257528, +4.26669216, +1.83290064

147+4.06491137, +4.63971186, +1.69741392, +1.90310264, +0.353788435, +4.00688076, +4.35267925

148call setResized(disq, nsam)

149call setResized(membership, nsam)

150call setResized(center, [ndim, ncls])

151call setResized(potential, ncls)

152call setResized(size, ncls)

153

154call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

155disq

156+0.00000000, +0.831545353, +0.00000000, +0.00000000, +0.00000000, +0.00000000, +7.66279030

157membership

158+4, +5, +1, +2, +3, +5, +5

159potential

160+0.00000000, +0.00000000, +0.00000000, +0.00000000, +8.49433517

161center

162+4.13939905, +3.18617964, +3.27243543, +1.31039524, +2.89390826

163+4.30699682, +0.235319138E-1, +2.62257528, +1.68747127, +4.26669216

164+1.69741392, +1.90310264, +0.353788435, +4.06491137, +4.00688076

165size

166+1, +1, +1, +1, +3

167

168

169ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

170[ndim, nsam, ncls]

171+5, +5, +5

172sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

173sample

174+3.39266276, +2.42241621, +2.80469370, +0.854519904, +4.38353205

175+0.620661080, +1.61410904, +0.478409827, +0.732208192, +1.41081989

176+0.778235793, +0.114700198E-1, +2.78694415, +2.43016815, +1.54042244

177+3.66197371, +3.86894178, +0.353656113, +4.93827820, +1.48715794

178+1.39233327, +1.44989729, +4.34943295, +4.95386887, +4.16783524

179call setResized(disq, nsam)

180call setResized(membership, nsam)

181call setResized(center, [ndim, ncls])

182call setResized(potential, ncls)

183call setResized(size, ncls)

184

185call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

186disq

187+0.00000000, +0.00000000, +0.00000000, +0.00000000, +0.00000000

188membership

189+1, +5, +4, +2, +3

190potential

191+0.00000000, +0.00000000, +0.00000000, +0.00000000, +0.00000000

192center

193+3.39266276, +0.854519904, +4.38353205, +2.80469370, +2.42241621

194+0.620661080, +0.732208192, +1.41081989, +0.478409827, +1.61410904

195+0.778235793, +2.43016815, +1.54042244, +2.78694415, +0.114700198E-1

196+3.66197371, +4.93827820, +1.48715794, +0.353656113, +3.86894178

197+1.39233327, +4.95386887, +4.16783524, +4.34943295, +1.44989729

198size

199+1, +1, +1, +1, +1

200

201

202ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

203[ndim, nsam, ncls]

204+2, +5, +5

205sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

206sample

207+2.78194094, +2.07066584, +2.54888511, +1.01118147, +1.41291142

208+0.315136909, +3.63018084, +3.14631414, +4.36529112, +4.01214123

209call setResized(disq, nsam)

210call setResized(membership, nsam)

211call setResized(center, [ndim, ncls])

212call setResized(potential, ncls)

213call setResized(size, ncls)

214

215call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

216disq

217+0.00000000, +0.00000000, +0.00000000, +0.00000000, +0.00000000

218membership

219+2, +5, +1, +3, +4

220potential

221+0.00000000, +0.00000000, +0.00000000, +0.00000000, +0.00000000

222center

223+2.54888511, +2.78194094, +1.01118147, +1.41291142, +2.07066584

224+3.14631414, +0.315136909, +4.36529112, +4.01214123, +3.63018084

225size

226+1, +1, +1, +1, +1

227

228

229ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

230[ndim, nsam, ncls]

231+2, +4, +3

232sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

233sample

234+0.713005364, +0.361892879, +0.748941004, +0.368515253

235+3.42950010, +1.48842669, +4.12447643, +4.43461227

236call setResized(disq, nsam)

237call setResized(membership, nsam)

238call setResized(center, [ndim, ncls])

239call setResized(potential, ncls)

240call setResized(size, ncls)

241

242call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

243disq

244+0.00000000, +0.00000000, +0.240907997, +0.00000000

245membership

246+1, +3, +2, +2

247potential

248+0.00000000, +0.240907997, +0.00000000

249center

250+0.713005364, +0.368515253, +0.361892879

251+3.42950010, +4.43461227, +1.48842669

252size

253+1, +2, +1

254

255

256ndim = getUnifRand(1, 5); ncls = getUnifRand(1, 5); nsam = getUnifRand(ncls, 2 * ncls);

257[ndim, nsam, ncls]

258+1, +10, +5

259sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.

260sample

261+3.23012495, +0.428308249, +3.80425668, +4.68496466, +0.673015118, +3.38467789, +4.53701115, +0.359097123E-1, +2.58815289, +0.239347816

262call setResized(disq, nsam)

263call setResized(membership, nsam)

264call setResized(center, [ndim, ncls])

265call setResized(potential, ncls)

266call setResized(size, ncls)

267

268call setKmeansPP(rngf, membership, disq, sample, center, size, potential) ! compute the new clusters and memberships.

269disq

270+0.238866098E-1, +0.357060470E-1, +0.00000000, +0.218902417E-1, +0.00000000, +0.00000000, +0.00000000, +0.413870625E-1, +0.634452105, +0.00000000

271membership

272+1, +2, +4, +3, +5, +1, +3, +2, +1, +2

273potential

274+0.658338726, +0.770931095E-1, +0.218902417E-1, +0.00000000, +0.00000000

275center

276+3.38467789, +0.239347816, +4.53701115, +3.80425668, +0.673015118

277size

278+3, +3, +2, +1, +1

279

280

Postprocessing of the example output ⛓
1#!/usr/bin/env python

2

3import matplotlib.pyplot as plt

4import pandas as pd

5import numpy as np

6import glob

7import sys

8import os

9

10fontsize = 17

11fig = plt.figure(figsize = 1.25 * np.array([6.4, 4.8]), dpi = 200)

12ax = plt.subplot()

13

14parent = os.path.basename(os.path.dirname(__file__))

15pattern = parent + "*.txt"

16

17fileList = glob.glob(pattern)

18legends = []

19if len(fileList) == 2:

20 for file in fileList:

21

22 kind = file.split(".")[1]

23 prefix = file.split(".")[0]

24 df = pd.read_csv(file, delimiter = ",", header = None)

25

26 if kind == "center":

27 ax.scatter ( df.values[:,1]

28 , df.values[:,2]

29 , zorder = 100

30 , marker = "*"

31 , c = "red"

32 , s = 50

33 )

34 legends.append("center")

35 elif kind == "sample":

36 ax.scatter ( df.values[:,1]

37 , df.values[:,2]

38 , c = df.values[:, 0]

39 , s = 10

40 )

41 legends.append("sample")

42 else:

43 sys.exit("Ambiguous file exists: {}".format(file))

44

45 ax.legend(legends, fontsize = fontsize)

46 plt.xticks(fontsize = fontsize - 2)

47 plt.yticks(fontsize = fontsize - 2)

48 ax.set_xlabel("X", fontsize = 17)

49 ax.set_ylabel("Y", fontsize = 17)

50 ax.set_title("Membership Scatter Plot", fontsize = fontsize)

51

52 plt.axis('equal')

53 plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")

54 ax.tick_params(axis = "y", which = "minor")

55 ax.tick_params(axis = "x", which = "minor")

56 ax.set_axisbelow(True)

57 plt.tight_layout()

58

59 plt.savefig(prefix + ".png")

60else:

61 sys.exit("Ambiguous file list exists.")

Visualization of the example output ⛓

Test:: test_pm_clustering

Final Remarks ⛓

If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Copyright: Computational Data Science Lab If you use or redistribute ideas based on this generic interface implementation, you should also cite the original k-means++ article:
Arthur, D.; Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding

Author:: Amir Shahmoradi, September 1, 2012, 12:00 AM, National Institute for Fusion Studies, The University of Texas at Austin

Definition at line 787 of file pm_clustering.F90.

The documentation for this interface was generated from the following file:

src/fortran/main/pm_clustering.F90