Compute and return the memberships and minimum distances of a set of input points with respect to an input set of cluster centers.
The membership ID of the i-th point sample(:, i) is determined by computing its distance from each cluster center and choosing the cluster ID j for which center(:, j) has the minimum distance from the point among all cluster centers.
This minimum squared distance is output in disq(i).
The metric used within this generic interface is the Euclidean distance.
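The membership rule above can be illustrated with a short Python sketch (Python is used only because the postprocessing scripts on this page are written in it). It is not the library's implementation: the function name `set_member` is hypothetical, arrays are modeled as lists of points (each sample(:, i) or center(:, j) becomes one inner list), cluster IDs are 1-based to mirror Fortran, and ties are resolved in favor of the lowest cluster ID, which is an assumption rather than documented behavior.

```python
def set_member(sample, center):
    """Return (membership, disq) for every point in `sample`.

    sample: list of nsam points, each a list of ndim coordinates.
    center: list of ncls cluster centers, each a list of ndim coordinates.
    Cluster IDs are 1-based to mirror the Fortran convention.
    """
    membership, disq = [], []
    for point in sample:
        best_id, best_dsq = 0, float("inf")
        for j, c in enumerate(center, start=1):
            # Squared Euclidean distance; ranking by it picks the same
            # nearest center as ranking by the Euclidean distance itself.
            dsq = sum((p - q) ** 2 for p, q in zip(point, c))
            if dsq < best_dsq:  # strict "<": ties go to the lowest ID (assumption)
                best_id, best_dsq = j, dsq
        membership.append(best_id)
        disq.append(best_dsq)
    return membership, disq


membership, disq = set_member([[0.0, 0.0], [4.0, 5.0]], [[1.0, 1.0], [4.0, 4.0]])
print(membership, disq)  # [1, 2] [2.0, 1.0]
```

Because the square root is monotonic, comparing squared distances selects the same nearest center as comparing Euclidean distances, so the squared values stored in disq do not affect the memberships.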
- Parameters
  - [in,out] membership : The output (or input/output) scalar or vector of shape (1 : nsam) of type integer of default kind IK, containing the membership (cluster ID) of each input point in sample, such that center(:, membership(i)) (or center(membership(i)) when center is a vector) is the nearest cluster center to the i-th point sample(:, i), at a squared distance of disq(i).
    - If the optional input argument changed is missing, then membership has intent(out).
    - If the optional input argument changed is present, then membership has intent(inout). On input, membership must contain the old cluster memberships of the input sample.
  - [out] disq : The output scalar or vector of shape (1 : nsam) of the same type and kind as the input argument sample, containing the squared Euclidean distance of each input point in sample from its nearest cluster center.
  - [in] sample : The input scalar, vector, or matrix of type real of any kind supported by the processor (e.g., RK, RK32, RK64, or RK128), containing the sample of nsam points in an ndim-dimensional space whose memberships and minimum distances with respect to the input centers must be computed.
    - If sample is a scalar and center is a vector of shape (1 : ncls), then sample must be the coordinate of a single point in univariate space whose distances from the ncls cluster centers must be computed.
    - If sample is a vector of shape (1 : ndim) and center is a matrix of shape (1 : ndim, 1 : ncls), then sample must be a single point in ndim-dimensional space whose distances from the ncls cluster centers must be computed.
    - If sample is a vector of shape (1 : nsam) and center is a vector of shape (1 : ncls), then sample must be a collection of nsam points in univariate space whose distances from the ncls cluster centers must be computed.
    - If sample is a matrix of shape (1 : ndim, 1 : nsam) and center is a matrix of shape (1 : ndim, 1 : ncls), then sample must be a collection of nsam points in ndim-dimensional space whose distances from the ncls cluster centers must be computed.
  - [in] center : The input vector of shape (1 : ncls) or matrix of shape (1 : ndim, 1 : ncls) of the same type and kind as the input argument sample, containing the set of ncls cluster centers (centroids) with respect to which the sample memberships and minimum distances must be computed.
  - [out] changed : The output scalar of type logical of default kind LK that is .false. if and only if the input values of membership and disq do not change for any of the input points in sample.
    In other words, a single membership update is sufficient to set changed to .true. on output.
    (optional. If missing, the arguments membership and disq have intent(out) and both are computed afresh.)
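The following Python sketch illustrates the intent(inout) behavior described for membership when changed is present: the old memberships are read, overwritten in place, and compared on the fly. The function name `set_member_changed` is hypothetical; it tracks only membership changes, and whether the library also compares disq values is not modeled here.

```python
def set_member_changed(membership, disq, sample, center):
    """Recompute memberships in place and report whether any of them changed.

    membership: list of old 1-based cluster IDs on input (intent(inout) analogue),
                overwritten with the new IDs on output.
    disq:       list overwritten with the new minimum squared distances.
    Returns True if at least one membership differs from its old value.
    """
    changed = False
    for i, point in enumerate(sample):
        best_id, best_dsq = 0, float("inf")
        for j, c in enumerate(center, start=1):
            dsq = sum((p - q) ** 2 for p, q in zip(point, c))
            if dsq < best_dsq:
                best_id, best_dsq = j, dsq
        if best_id != membership[i]:
            changed = True  # a single update is enough to flag a change
        membership[i] = best_id
        disq[i] = best_dsq
    return changed


membership, disq = [1, 1], [0.0, 0.0]
sample, center = [[0.0, 0.0], [4.0, 5.0]], [[1.0, 1.0], [4.0, 4.0]]
print(set_member_changed(membership, disq, sample, center))  # True: the second point moves to cluster 2
print(set_member_changed(membership, disq, sample, center))  # False: nothing moves on a repeat call
```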
Possible calling interfaces
call setMember(membership, disq, sample, center(1 : ncls))
call setMember(membership(1 : nsam), disq(1 : nsam), sample(1 : nsam), center(1 : ncls))
call setMember(membership, disq, sample(1 : ndim), center(1 : ndim, 1 : ncls))
call setMember(membership(1 : nsam), disq(1 : nsam), sample(1 : ndim, 1 : nsam), center(1 : ndim, 1 : ncls))
call setMember(membership, disq, sample, center(1 : ncls), changed)
call setMember(membership(1 : nsam), disq(1 : nsam), sample(1 : nsam), center(1 : ncls), changed)
call setMember(membership, disq, sample(1 : ndim), center(1 : ndim, 1 : ncls), changed)
call setMember(membership(1 : nsam), disq(1 : nsam), sample(1 : ndim, 1 : nsam), center(1 : ndim, 1 : ncls), changed)
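The four shape combinations above differ only in how points are laid out, so one nearest-center routine can serve all of them once the inputs are promoted to a common form. The Python sketch below performs that promotion; the helper name `as_points` and the convention of storing a matrix x(1:ndim, 1:n) as a list of its n columns are assumptions for illustration, not part of ParaMonte.

```python
def as_points(sample, center):
    """Promote the documented (sample, center) shape combinations to lists of points.

    Fortran arrays are modeled as nested Python lists, with a matrix
    x(1:ndim, 1:n) stored as a list of its n columns (x[j - 1] is x(:, j)).
    center must be non-empty (ncls > 0), as the library also requires.
    """
    if not isinstance(center[0], (list, tuple)):
        # center(1:ncls) is a vector: univariate space, every point has ndim = 1.
        centers = [[c] for c in center]
        if isinstance(sample, (list, tuple)):
            samples = [[s] for s in sample]  # sample(1:nsam): nsam points
        else:
            samples = [[sample]]  # scalar sample: a single point
    else:
        # center(1:ndim, 1:ncls) is a matrix: ndim-dimensional space.
        centers = [list(c) for c in center]
        if isinstance(sample[0], (list, tuple)):
            samples = [list(s) for s in sample]  # sample(1:ndim, 1:nsam): nsam points
        else:
            samples = [list(sample)]  # sample(1:ndim): a single point
    return samples, centers


print(as_points(2.5, [1.0, 3.0]))  # ([[2.5]], [[1.0], [3.0]])
```

After this promotion, every calling form reduces to the matrix-by-matrix case, which keeps the distance loop itself independent of the input ranks.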
- Warning
  - The following conditions must hold for the corresponding input arguments:
    - ubound(center, rank(center)) > 0
    - ubound(sample, rank(sample)) == ubound(disq, 1)
    - ubound(sample, rank(sample)) == ubound(membership, 1)
    - ubound(sample, 1) == ubound(center, 1) .or. rank(sample) == 0 .or. rank(center) == 1
    These conditions are verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.
  - The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with the preprocessor macro CHECK_ENABLED=1. By default, these procedures are pure in release builds and impure in debug and testing builds.
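The shape conditions in the warning above can be expressed as explicit checks on array shapes. The following Python sketch is hypothetical (the name `check_setmember_shapes` and the shape-tuple convention are not part of the library) and assumes 1-based lower bounds, so that ubound(x, k) equals the k-th extent. It also assumes the per-sample conditions apply only to the multi-sample calling forms, where membership and disq are vectors.

```python
def check_setmember_shapes(membership_shape, disq_shape, sample_shape, center_shape):
    """Raise ValueError unless the documented shape conditions hold.

    Each *_shape is a tuple of extents in Fortran order, e.g. sample(1:ndim, 1:nsam)
    -> (ndim, nsam); a scalar has shape (). With 1-based lower bounds,
    ubound(x, k) == shape[k - 1] and rank(x) == len(shape).
    """
    def ubound(shape, dim):
        return shape[dim - 1]

    rank_sample, rank_center = len(sample_shape), len(center_shape)
    if not ubound(center_shape, rank_center) > 0:
        raise ValueError("ubound(center, rank(center)) > 0 is violated")
    # Assumption: the per-sample conditions apply only to the multi-sample forms,
    # where membership and disq are vectors of shape (1:nsam).
    if rank_sample > 0 and len(disq_shape) == 1 and len(membership_shape) == 1:
        nsam = ubound(sample_shape, rank_sample)
        if nsam != ubound(disq_shape, 1):
            raise ValueError("ubound(sample, rank(sample)) == ubound(disq, 1) is violated")
        if nsam != ubound(membership_shape, 1):
            raise ValueError("ubound(sample, rank(sample)) == ubound(membership, 1) is violated")
    if not (rank_sample == 0 or rank_center == 1
            or ubound(sample_shape, 1) == ubound(center_shape, 1)):
        raise ValueError("ubound(sample, 1) == ubound(center, 1) is violated")


check_setmember_shapes((5,), (5,), (3, 5), (3, 4))  # passes: 5 points in 3-D, 4 centers
```

Explicit exceptions are used instead of `assert` so that the checks are not silently removed when Python runs with optimizations enabled.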
- See also
- setKmeans
setCenter
setMember
setKmeansPP
Example usage
logical(LK) :: changed
integer(IK) :: ndim, nsam, ncls
real(RKG), allocatable :: sample(:,:), center(:,:), disq(:)
integer(IK), allocatable :: membership(:)
type(display_type) :: disp
call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
call disp%show("! Compute memberships of a sample of points from arbitrary dimensional cluster centers.")
call disp%show("!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
call disp%show("ndim = getUnifRand(1, 5); nsam = getUnifRand(1, 5); ncls = getUnifRand(1, 5);")
call disp%show("center = getUnifRand(0., 5., ndim, ncls) ! initialize random centers.")
call disp%show("sample = getUnifRand(0., 5., ndim, nsam) ! Create a random sample.")
call disp%show("call setResized(disq, nsam)")
call disp%show("call setResized(membership, nsam)")
call disp%show("call setMember(membership, disq, sample, center) ! sample points memberships.")
call setMember(membership, disq, sample, center)
call disp%show("call setMember(membership, disq, sample, center, changed) ! sample points memberships.")
call setMember(membership, disq, sample, center, changed)
call disp%show("call setMember(membership(1), disq(1), sample(:,1), center) ! single point membership.")
call setMember(membership(1), disq(1), sample(:,1), center)
call disp%show("call setMember(membership(1), disq(1), sample(:,1), center, changed) ! single point membership.")
call setMember(membership(1), disq(1), sample(:,1), center, changed)
call disp%show("call setMember(membership, disq, sample(1,:), center(1,:)) ! sample points memberships in one-dimension.")
call setMember(membership, disq, sample(1,:), center(1,:))
call disp%show("call setMember(membership, disq, sample(1,:), center(1,:), changed) ! sample points memberships in one-dimension.")
call setMember(membership, disq, sample(1,:), center(1,:), changed)
call disp%show("call setMember(membership(1), disq(1), sample(1,1), center(1,:)) ! single point membership in one-dimension.")
call setMember(membership(1), disq(1), sample(1,1), center(1,:))
call disp%show("call setMember(membership(1), disq(1), sample(1,1), center(1,:), changed) ! single point membership in one-dimension.")
call setMember(membership(1), disq(1), sample(1,1), center(1,:), changed)
integer(IK) :: funit, i
call setMember(membership, disq, sample, center)
call setMember(membership, disq, sample, center, changed)
open(newunit = funit, file = "setMember.center.txt")
write(funit, "(*(g0,:,','))") i, center(:,i)
open(newunit = funit, file = "setMember.sample.txt")
write(funit, "(*(g0,:,','))") membership(i), sample(:,i)
Example Unix compile command via Intel ifort compiler
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Example Windows Batch compile command via Intel ifort compiler
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
Example Unix / MinGW compile command via GNU gfortran compiler
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
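The example program writes setMember.center.txt and setMember.sample.txt with the format "(*(g0,:,','))", one comma-separated row per point: a leading integer (the center index i or the membership membership(i)) followed by the point coordinates. Assuming that layout, the hypothetical Python helper `group_by_membership` below groups the rows of such a file by their leading ID, which can be handy when inspecting cluster assignments outside the plotting script.

```python
import csv
import io
from collections import defaultdict


def group_by_membership(text):
    """Group comma-separated rows "id,x1,...,xndim" by their leading integer ID.

    Assumes one row per point, with the ID first and the ndim coordinates after it.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        groups[int(row[0])].append([float(v) for v in row[1:]])
    return dict(groups)


# In practice, `text` would be the contents of setMember.sample.txt.
text = "1,0.5,1.5\n2,4.0,4.5\n1,0.25,0.75\n"
print(group_by_membership(text))  # {1: [[0.5, 1.5], [0.25, 0.75]], 2: [[4.0, 4.5]]}
```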
Example output
+3.31561542, +2.40937471
+2.37242699, +3.82513094
+1.21023655, +2.40704751
+4.88812494, +0.415560901
call setMember(membership, disq, sample, center)
call setMember(membership, disq, sample, center, changed)
call setMember(membership(1), disq(1), sample(:,1), center)
call setMember(membership(1), disq(1), sample(:,1), center, changed)
call setMember(membership, disq, sample(1,:), center(1,:))
call setMember(membership, disq, sample(1,:), center(1,:), changed)
call setMember(membership(1), disq(1), sample(1,1), center(1,:))
call setMember(membership(1), disq(1), sample(1,1), center(1,:), changed)
Postprocessing of the example output
import matplotlib.pyplot as plt
fig = plt.figure(figsize = 1.25 * np.array([6.4, 4.8]), dpi = 200)
parent = os.path.basename(os.path.dirname(__file__))
pattern = parent + "*.txt"
fileList = glob.glob(pattern)
    kind = file.split(".")[1]
    prefix = file.split(".")[0]
    df = pd.read_csv(file, delimiter = ",", header = None)
        ax.scatter ( df.values[:, 1]
        legends.append("center")
    elif kind == "sample":
        ax.scatter ( df.values[:, 1]
        legends.append("sample")
        sys.exit("Ambiguous file exists: {}".format(file))
    ax.legend(legends, fontsize = fontsize)
    plt.xticks(fontsize = fontsize - 2)
    plt.yticks(fontsize = fontsize - 2)
    ax.set_xlabel("X", fontsize = 17)
    ax.set_ylabel("Y", fontsize = 17)
    ax.set_title("Membership Scatter Plot", fontsize = fontsize)
    plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
    ax.tick_params(axis = "y", which = "minor")
    ax.tick_params(axis = "x", which = "minor")
    ax.set_axisbelow(True)
    plt.savefig(prefix + ".png")
    sys.exit("Ambiguous file list exists.")
Visualization of the example output
- Benchmarks:
Benchmark: The runtime performance of setMember with external membership-change verification vs. in-place verification by the algorithm.
integer(IK) :: fileUnit
integer(IK), parameter :: nsam = 1000
integer(IK), parameter :: ndim = 3_IK
integer(IK), parameter :: ncls = 20_IK
integer(IK) :: membership(nsam)
real(RKG) :: sample(ndim, nsam)
real(RKG) :: center(ndim, ncls)
real(RKG) :: disq(nsam)
type(bench_type), allocatable :: bench(:)
real(RKG) :: mean(ndim)
integer(IK) :: idum = 0
logical(LK) :: mchanged
bench = [ bench_type(name = SK_"default", exec = default, overhead = setOverhead) &
        , bench_type(name = SK_"changed", exec = changed, overhead = setOverhead) &
write(*, "(*(g0,:,' '))")
write(*, "(*(g0,:,' '))") "membership benchmarking..."
write(*, "(*(g0,:,' '))")
open(newunit = fileUnit, file = "main.out", status = "replace")
write(fileUnit, "(*(g0,:,','))") "ClusterCount", (bench(ibench)%name, ibench = 1, size(bench))
write(*, "(*(g0,:,' '))") "Benchmarking default() vs. changed()", nsam
call random_number(disq)
call random_number(center)
call random_number(sample)
do ibench = 1, size(bench)
    bench(ibench)%timing = bench(ibench)%getTiming()
write(fileUnit, "(*(g0,:,','))") icls, (bench(ibench)%timing%mean, ibench = 1, size(bench))
write(*, "(*(g0,:,' '))") idum
subroutine setOverhead()
    if (mchanged) idum = idum + 1
integer(IK) :: membersnew(size(membership))
call setMember(membersnew, disq, sample, center(:, 1 : icls))
mchanged = any(membersnew /= membership) ! .true. if at least one membership changed
call setMember(membership, disq, sample, center(:, 1 : icls), mchanged)
Example Unix compile command via Intel ifort compiler
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Example Windows Batch compile command via Intel ifort compiler
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
Example Unix / MinGW compile command via GNU gfortran compiler
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
Postprocessing of the benchmark output
import matplotlib.pyplot as plt
dirname = os.path.basename(os.getcwd())
df = pd.read_csv("main.out", delimiter = ",")
colnames = list(df.columns.values)
ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
for colname in colnames[1:]:
    plt.plot( df[colnames[0]].values
plt.xticks(fontsize = fontsize)
plt.yticks(fontsize = fontsize)
ax.set_xlabel(colnames[0], fontsize = fontsize)
ax.set_ylabel("Runtime [ seconds ]", fontsize = fontsize)
ax.set_title(" vs. ".join(colnames[1:]) + "\nLower is better.", fontsize = fontsize)
plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
ax.tick_params(axis = "y", which = "minor")
ax.tick_params(axis = "x", which = "minor")
ax.legend ( colnames[1:]
plt.savefig("benchmark." + dirname + ".runtime.png")
ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
plt.plot( df[colnames[0]].values
        , np.ones(len(df[colnames[0]].values))
for colname in colnames[2:]:
    plt.plot( df[colnames[0]].values
            , df[colname].values / df[colnames[1]].values
plt.xticks(fontsize = fontsize)
plt.yticks(fontsize = fontsize)
ax.set_xlabel(colnames[0], fontsize = fontsize)
ax.set_ylabel("Runtime compared to {}".format(colnames[1]), fontsize = fontsize)
ax.set_title("Runtime Ratio Comparison. Lower means faster.\nLower than 1 means faster than {}().".format(colnames[1]), fontsize = fontsize)
plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
ax.tick_params(axis = "y", which = "minor")
ax.tick_params(axis = "x", which = "minor")
ax.legend ( colnames[1:]
plt.savefig("benchmark." + dirname + ".runtime.ratio.png")
Visualization of the benchmark output
Benchmark moral: The procedures under the generic interface setMember compute the new memberships under two different scenarios.
- When the input argument changed is missing, the memberships are computed afresh, and the algorithm does not compare them with any old membership values. Consequently, any such comparison must be done by the user outside the procedure, which requires an additional pass over the membership values.
- When the input argument changed is present, the memberships are computed afresh, but before the old values are overwritten, the new values are compared against them to detect whether any membership changes at all; if so, changed = .true. on output.
From the benchmark results above, it appears that when a comparison with the old membership values is desired, letting the procedure perform the comparison (by passing the optional argument changed) is significantly slower than an external comparison of memberships by the user.
However, the difference is relevant only for a small number of clusters (1 : 4) and becomes negligible for larger numbers of clusters.
This situation matters because it occurs repeatedly within the Kmeans algorithm.
Additionally, note that the above benchmark does not include the cost of maintaining two (old and new) copies of the cluster memberships, which could incur additional allocation costs if it happens repeatedly, e.g., within the Kmeans algorithm.
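The two scenarios above can be made concrete with a small Python sketch. It does not reproduce the Fortran timings; it only shows that an external comparison needs a second list of memberships plus an extra pass over them, whereas the in-place variant folds the comparison into the update loop, and that both report the same flag. The function names `nearest`, `external_check`, and `inplace_check` are hypothetical.

```python
def nearest(point, center):
    """Return the 1-based ID of the nearest center (squared Euclidean distance)."""
    dsq = [sum((p - q) ** 2 for p, q in zip(point, c)) for c in center]
    return dsq.index(min(dsq)) + 1  # first minimum wins on ties


def external_check(old, sample, center):
    """Scenario 1: compute fresh memberships into a new list, then compare in a second pass."""
    new = [nearest(point, center) for point in sample]  # extra copy of the memberships
    changed = any(n != o for n, o in zip(new, old))      # extra pass over them
    return new, changed


def inplace_check(membership, sample, center):
    """Scenario 2: compare each new membership with the old one while overwriting it."""
    changed = False
    for i, point in enumerate(sample):
        new_id = nearest(point, center)
        changed = changed or new_id != membership[i]
        membership[i] = new_id
    return changed


sample = [[0.0], [4.0], [9.0]]
old = [1, 1, 1]
new, changed_ext = external_check(old, sample, [[0.0], [10.0]])
membership = list(old)
changed_in = inplace_check(membership, sample, [[0.0], [10.0]])
print(new == membership, changed_ext, changed_in)  # True True True
```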
- Test:
- test_pm_clusKmeans
Final Remarks
If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.
-
If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
-
If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.
This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.
- Copyright
- Computational Data Science Lab
- Author:
- Amir Shahmoradi, September 1, 2012, 12:00 AM, National Institute for Fusion Studies, The University of Texas Austin
Definition at line 249 of file pm_clusKmeans.F90.