ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
See the latest version documentation.
pm_clusKmeans::setCenter Interface Reference

Compute and return the centers of the clusters corresponding to the input sample, cluster membership IDs, and sample distances-squared from their corresponding cluster centers.
More...

Detailed Description

Compute and return the centers of the clusters corresponding to the input sample, cluster membership IDs, and sample distances-squared from their corresponding cluster centers.

This generic interface is the second step in the iterative process of refining a set of initial cluster centers toward a final set of clusters and their members.
As such, this generic interface is of little use with invoking setMember beforehand.
The metric used within this generic interface is the Euclidean distance.

Parameters
[in]membership: The input vector of shape (1:nsam) of type integer of default kind IK, containing the membership of each input sample in sample from its nearest cluster center, such that cluster(membership(i)) is the nearest cluster center to the ith sample sample(:, i) at a squared-distance of disq(i).
[in]disq: The input vector of shape (1:nsam) of the same type and kind as the input argument sample, containing the Euclidean squared distance of each input sample in sample from its nearest cluster center.
[in]sample: The input vector, or matrix of,
  1. type real of kind any supported by the processor (e.g., RK, RK32, RK64, or RK128),
containing the sample of nsam points in a ndim-dimensional space whose corresponding cluster centers must be computed.
  1. If sample is a vector of shape (1 : nsam) and center is a vector of shape (1 : ncls), then the input sample must be a collection of nsam points (in univariate space).
  2. If sample is a matrix of shape (1 : ndim, 1 : nsam) and center is a matrix of shape (1 : ndim, 1 : ncls), then the input sample must be a collection of nsam points (in ndim-dimensional space).
[out]center: The output vector of shape (1:ncls) or matrix of shape (1 : ndim, 1 : ncls) of the same type and kind as the input argument sample, containing the set of ncls cluster centers (centroids) computed based on the input sample memberships and minimum distances.
[out]size: The output vector of shape (1:ncls) type integer of default kind IK, containing the sizes (number of members) of the clusters with the corresponding centers output in the argument center.
[out]potential: The output vector of shape (1:ncls) of the same type and kind as the input argument sample, the ith element of which contains the sum of squared distances of all members of the ith cluster from the cluster center as output in the ith element of center.


Possible calling interfaces

call setCenter(sample(1 : nsam) , membership(1 : nsam) , disq(1 : nsam), center(1 : ncls) , size(1 : ncls), potential(1 : ncls))
call setCenter(sample(1 : ndim, 1 : nsam) , membership(1 : nsam) , disq(1 : nsam), center(1 : ndim, 1 : ncls), size(1 : ncls), potential(1 : ncls))
Compute and return the centers of the clusters corresponding to the input sample, cluster membership ...
This module contains procedures and routines for the computing the Kmeans clustering of a given set o...
Warning
The condition ubound(center, rank(center)) > 0 must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == ubound(disq, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == ubound(size, 1) must hold for the corresponding input arguments.
The condition ubound(center, rank(center)) == ubound(potential, 1) must hold for the corresponding input arguments.
The condition ubound(sample, rank(sample)) == ubound(membership, 1) must hold for the corresponding input arguments.
The condition ubound(sample, 1) == ubound(center, 1) .or. (rank(sample) == 1 .and. rank(center) == 1) must hold for the corresponding input arguments.
The condition all(0 < membership .and. membership <= ubound(center, rank(center))) must hold for the corresponding input arguments.
The condition all(0 <= disq) must hold for the corresponding input arguments.
These conditions are verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.
The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release build and impure in debug and testing builds.
Remarks
This generic interface implements one of the two major steps involved in Kmeans clustering.
See also
setKmeans
setCenter
setMember
setKmeansPP


Example usage

1program example
2
3 use pm_kind, only: SK, IK, LK
4 use pm_kind, only: RKG => RKS ! all other real kinds are also supported.
5 use pm_io, only: display_type
6 use pm_distUnif, only: getUnifRand
8 use pm_clusKmeans, only: setCenter
9 use pm_clusKmeans, only: setMember
10
11 implicit none
12
13 integer(IK) :: ndim, nsam, ncls
14 real(RKG) , allocatable :: sample(:,:), center(:,:), disq(:), potential(:)
15 integer(IK) , allocatable :: membership(:), size(:)
16 type(display_type) :: disp
17
18 disp = display_type(file = "main.out.F90")
19
20 call disp%skip
21 call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
22 call disp%show("! Compute cluster centers based on an input sample and cluster memberships and member-center distances.")
23 call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
24 call disp%skip
25
26 call disp%skip()
27 call disp%show("ndim = getUnifRand(1, 5); nsam = getUnifRand(1, 5); ncls = getUnifRand(1, 5);")
28 ndim = getUnifRand(1, 5); nsam = getUnifRand(1, 5); ncls = getUnifRand(1, 5);
29 call disp%show("[ndim, nsam, ncls]")
30 call disp%show( [ndim, nsam, ncls] )
31 call disp%show("center = getUnifRand(0., 5., ndim, ncls) ! initialize random centers.")
32 center = getUnifRand(0., 5., ndim, ncls) ! initialize random centers.
33 call disp%show("center")
34 call disp%show( center )
35 call disp%show("sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.")
36 sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
37 call disp%show("sample")
38 call disp%show( sample )
39 call disp%show("call setResized(disq, nsam)")
40 call setResized(disq, nsam)
41 call disp%show("call setResized(membership, nsam)")
42 call setResized(membership, nsam)
43 call disp%show("call setResized(potential, ncls)")
44 call setResized(potential, ncls)
45 call disp%show("call setResized(size, ncls)")
46 call setResized(size, ncls)
47 call disp%show("call setMember(membership, disq, sample, center) ! get sample points memberships.")
48 call setMember(membership, disq, sample, center) ! get sample points memberships.
49 call disp%skip()
50
51 call disp%show("call setCenter(membership, disq, sample, center, size, potential) ! now compute the new clusters.")
52 call setCenter(membership, disq, sample, center, size, potential) ! now compute the new clusters.
53 call disp%show("size")
54 call disp%show( size )
55 call disp%show("center")
56 call disp%show( center )
57 call disp%show("potential")
58 call disp%show( potential )
59 call disp%skip()
60
61 call disp%show("call setCenter(membership, disq, sample(1,:), center(1,:), size, potential) ! sample points memberships in one-dimension.")
62 call setCenter(membership, disq, sample(1,:), center(1,:), size, potential) ! sample points memberships in one-dimension.
63 call disp%show("size")
64 call disp%show( size )
65 call disp%show("center(1,:)")
66 call disp%show( center(1,:) )
67 call disp%show("potential")
68 call disp%show( potential )
69 call disp%skip()
70
71 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
72 ! Output an example for visualization.
73 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
74
75 block
76 integer(IK) :: funit, i
77 ndim = 2
78 ncls = 5
79 nsam = 5000
80 center = getUnifRand(0., 1., ndim, ncls)
81 sample = getUnifRand(0., 1., ndim, nsam)
82 call setResized(disq, nsam)
83 call setResized(membership, nsam)
84 call setResized(potential, ncls)
85 call setResized(size, ncls)
86 ! output the sample and the initial centers.
87 call setMember(membership, disq, sample, center)
88 open(newunit = funit, file = "setMember.center.txt")
89 do i = 1, ncls
90 write(funit, "(*(g0,:,','))") i, center(:,i)
91 end do
92 close(funit)
93 open(newunit = funit, file = "setMember.sample.txt")
94 do i = 1, nsam
95 write(funit, "(*(g0,:,','))") membership(i), sample(:,i)
96 end do
97 close(funit)
98 call setCenter(membership, disq, sample, center, size, potential)
99 open(newunit = funit, file = "setCenter.center.txt")
100 do i = 1, ncls
101 write(funit, "(*(g0,:,','))") i, center(:,i)
102 end do
103 close(funit)
104 open(newunit = funit, file = "setCenter.sample.txt")
105 do i = 1, nsam
106 write(funit, "(*(g0,:,','))") membership(i), sample(:,i)
107 end do
108 close(funit)
109 end block
110
111end program example
Allocate or resize (shrink or expand) an input allocatable scalar string or array of rank 1....
Compute and return the memberships and minimum distances of a set of input points with respect to the...
Generate and return a scalar or a contiguous array of rank 1 of length s1 of randomly uniformly distr...
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11726
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11508
This module contains procedures and generic interfaces for resizing allocatable arrays of various typ...
This module contains classes and procedures for computing various statistical quantities related to t...
This module contains classes and procedures for input/output (IO) or generic display operations on st...
Definition: pm_io.F90:252
type(display_type) disp
This is a scalar module variable an object of type display_type for general display.
Definition: pm_io.F90:11393
This module defines the relevant Fortran kind type-parameters frequently used in the ParaMonte librar...
Definition: pm_kind.F90:268
integer, parameter LK
The default logical kind in the ParaMonte library: kind(.true.) in Fortran, kind(....
Definition: pm_kind.F90:541
integer, parameter IK
The default integer kind in the ParaMonte library: int32 in Fortran, c_int32_t in C-Fortran Interoper...
Definition: pm_kind.F90:540
integer, parameter SK
The default character kind in the ParaMonte library: kind("a") in Fortran, c_char in C-Fortran Intero...
Definition: pm_kind.F90:539
integer, parameter RKS
The single-precision real kind in Fortran mode. On most platforms, this is an 32-bit real kind.
Definition: pm_kind.F90:567
Generate and return an object of type display_type.
Definition: pm_io.F90:10282

Example Unix compile command via Intel ifort compiler
1#!/usr/bin/env sh
2rm main.exe
3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example Windows Batch compile command via Intel ifort compiler
1del main.exe
2set PATH=..\..\..\lib;%PATH%
3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
1#!/usr/bin/env sh
2rm main.exe
3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example output
1
2!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
3! Compute cluster centers based on an input sample and cluster memberships and member-center distances.
4!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5
6
7ndim = getUnifRand(1, 5); nsam = getUnifRand(1, 5); ncls = getUnifRand(1, 5);
8[ndim, nsam, ncls]
9+1, +3, +1
10center = getUnifRand(0., 5., ndim, ncls) ! initialize random centers.
11center
12+1.28157318
13sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.
14sample
15+3.64694548, +1.80137849, +2.11540985
16call setResized(disq, nsam)
17call setResized(membership, nsam)
18call setResized(potential, ncls)
19call setResized(size, ncls)
20call setMember(membership, disq, sample, center) ! get sample points memberships.
21
22call setCenter(membership, disq, sample, center, size, potential) ! now compute the new clusters.
23size
24+3
25center
26+2.52124476
27potential
28+6.56046629
29
30call setCenter(membership, disq, sample(1,:), center(1,:), size, potential) ! sample points memberships in one-dimension.
31size
32+3
33center(1,:)
34+2.52124476
35potential
36+6.56046629
37
38

Postprocessing of the example output
1#!/usr/bin/env python
2
3import matplotlib.pyplot as plt
4import pandas as pd
5import numpy as np
6import glob
7import sys
8
9
10fontsize = 17
11
12prefixes = { "setMember" : "Memberships based on initial centers."
13 , "setCenter" : "Inferred new centers based on memberships."
14 }
15for prefix in list(prefixes.keys()):
16
17 legends = []
18 fig = plt.figure(figsize = 1.25 * np.array([6.4, 4.8]), dpi = 200)
19 ax = plt.subplot()
20
21 fileList = glob.glob(prefix + "*.txt")
22 if len(fileList) == 2:
23 for file in fileList:
24
25 kind = file.split(".")[1]
26 df = pd.read_csv(file, delimiter = ",", header = None)
27
28 if kind == "center":
29 ax.scatter ( df.values[:, 1]
30 , df.values[:,2]
31 , zorder = 100
32 , marker = "*"
33 , c = "red"
34 , s = 50
35 )
36 legends.append("center")
37 elif kind == "sample":
38 ax.scatter ( df.values[:, 1]
39 , df.values[:,2]
40 , c = df.values[:, 0]
41 , s = 10
42 )
43 legends.append("sample")
44 else:
45 sys.exit("Ambiguous file exists: {}".format(file))
46
47 ax.legend(legends, fontsize = fontsize)
48 plt.xticks(fontsize = fontsize - 2)
49 plt.yticks(fontsize = fontsize - 2)
50 ax.set_xlabel("X", fontsize = 17)
51 ax.set_ylabel("Y", fontsize = 17)
52 ax.set_title(prefixes[prefix], fontsize = fontsize)
53
54 plt.axis('equal')
55 plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
56 ax.tick_params(axis = "y", which = "minor")
57 ax.tick_params(axis = "x", which = "minor")
58 ax.set_axisbelow(True)
59 plt.tight_layout()
60
61 plt.savefig(prefix + ".png")
62 else:
63 sys.exit("Ambiguous file list exists.")

Visualization of the example output
Test:
test_pm_clusKmeans


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, September 1, 2012, 12:00 AM, National Institute for Fusion Studies, The University of Texas Austin

Definition at line 904 of file pm_clusKmeans.F90.


The documentation for this interface was generated from the following file: