ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
See the latest version documentation.
pm_matrixTrans::setMatTrans Interface Reference

Generate and return the transpose of the input matrix of arbitrary type and kind using a cache-oblivious approach.
More...

Detailed Description

Generate and return the transpose of the input matrix of arbitrary type and kind using a cache-oblivious approach.

In computing, a cache-oblivious (or cache-transcendent) algorithm is a method designed to take advantage of a processor cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter.
An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally.
Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes.
Cache-oblivious algorithms are contrasted with explicit loop tiling, which explicitly breaks a problem into blocks that are optimally sized for a given cache.

Typically, a cache-oblivious algorithm works by a recursive divide-and-conquer algorithm, where the problem is divided into smaller and smaller subproblems.
Eventually, one reaches a subproblem size that fits into the cache, regardless of the cache size.
For example, an optimal cache-oblivious matrix multiplication is obtained by recursively dividing each matrix into four sub-matrices to be multiplied, multiplying the submatrices in a depth-first fashion.
In tuning for a specific machine, one may use a hybrid algorithm which uses loop tiling tuned for the specific cache sizes at the bottom level but otherwise uses the cache-oblivious algorithm.

Parameters
[in,out]source: The input/output matrix (of rank 2) of either
  1. type character of kind any supported by the processor (e.g., SK, SKA, SKD , or SKU) or,
  2. type integer of kind any supported by the processor (e.g., IK, IK8, IK16, IK32, or IK64) or,
  3. type logical of kind any supported by the processor (e.g., LK) or,
  4. type complex of kind any supported by the processor (e.g., CK, CK32, CK64, or CK128) or,
  5. type real of kind any supported by the processor (e.g., RK, RK32, RK64, or RK128) or,
whose contents will be Symmetric or Hermitian transposed.
  1. If the output matrix argument destin is missing, the result of transposition will be written to source.
    This is possible only if the input source is a square matrix.
  2. If the output matrix argument destin is present, the result of transposition will be written to destin.
    As such, the input source has intent(in) will not be modified by the algorithm.
[out]destin: The output matrix of the same type and kind, but transposed shape of source containing the transposition. (optional. If missing, the transposition result will be written to the input source, in which case, source must be square.)
[in]bsize: The input positive scalar integer of default kind IK representing the minimum submatrix size.
Any input source or subset of it whose size along both dimensions is below bsize will be transposed via the default Fortran transpose() procedure.
(optional. default = 32)
[in]operation: The input scalar that can be,
  1. the constant transHerm exclusively when source is of type complex of kind any supported by the processor (e.g., CK, CK32, CK64, or CK128). implying that a Hermitian transpose of the specified subset of source is to be computed and stored.
This argument is merely a convenience to differentiate the different procedure functionalities within this generic interface.
(optional. If missing, the Symmetric transposition will be returned for complex matrices.)


Possible calling interfaces

call setMatTrans(source(1:ndim,1:ndim))
call setMatTrans(source(1:ndim,1:ndim), bsize)
call setMatTrans(source(1:ndim,1:ndim), operation)
call setMatTrans(source(1:ndim,1:ndim), operation, bsize)
call setMatTrans(source(1:nrow,1:ncol), destin(1:ncol,1:nrow))
call setMatTrans(source(1:nrow,1:ncol), destin(1:ncol,1:nrow), bsize)
call setMatTrans(source(1:nrow,1:ncol), destin(1:ncol,1:nrow), operation)
call setMatTrans(source(1:nrow,1:ncol), destin(1:ncol,1:nrow), operation, bsize)
Generate and return the transpose of the input matrix of arbitrary type and kind using a cache-oblivi...
This module contains abstract and concrete derived types and procedures related to various common mat...
type(transHerm_type), parameter transHerm
This is a scalar parameter object of type transHerm_type that is exclusively used to request Hermitia...
Warning
The condition 0 < bsize must hold for the corresponding input arguments.
The condition size(source, 1) == size(source, 2) must hold when the output argument destin is missing.
The condition size(source, 1) == size(destin, 2) .and. size(source, 2) == size(destin, 1) must hold for the corresponding input arguments.
The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release build and impure in debug and testing builds.
Remarks
Based on some relevant benchmarks performed, the contiguous attribute for the input arguments does not appear to have any noticeable impact on the performance of the algorithm in the release optimized compilation modes.
See also
pm_matrixCopy


Example usage

1program example
2
3 use pm_kind, only: SK, IK, LK, CK, RK
5 use pm_distUnif, only: setUnifRand
7 use pm_io, only: display_type
8
9 implicit none
10
11 type(display_type) :: disp
12 disp = display_type(file = "main.out.F90")
13
14 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
15
16 block
17
18 character(2) :: matA(5,10), matB(10,5)
19
20 call disp%skip()
21 call disp%show("call setUnifRand(matA, 'AA', 'ZZ')")
22 call setUnifRand(matA, 'AA', 'ZZ')
23 call disp%show("matA")
24 call disp%show( matA , deliml = SK_"""" )
25 call disp%show("call setMatTrans(matA, matB)")
26 call setMatTrans(matA, matB)
27 call disp%show("matB")
28 call disp%show( matB , deliml = SK_"""" )
29 call disp%skip()
30
31 end block
32
33 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
34
35 block
36
37 character(2) :: matA(10,10)
38
39 call disp%skip()
40 call disp%show("call setUnifRand(matA, 'AA', 'ZZ')")
41 call setUnifRand(matA, 'AA', 'ZZ')
42 call disp%show("matA")
43 call disp%show( matA , deliml = SK_"""" )
44 call disp%show("call setMatTrans(matA)")
45 call setMatTrans(matA)
46 call disp%show("matA")
47 call disp%show( matA , deliml = SK_"""" )
48 call disp%skip()
49
50 end block
51
52 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
53
54end program example
Return a uniform random scalar or contiguous array of arbitrary rank of randomly uniformly distribute...
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11726
This is a generic method of the derived type display_type with pass attribute.
Definition: pm_io.F90:11508
This module contains classes and procedures for computing various statistical quantities related to t...
This module contains classes and procedures for input/output (IO) or generic display operations on st...
Definition: pm_io.F90:252
type(display_type) disp
This is a scalar module variable an object of type display_type for general display.
Definition: pm_io.F90:11393
This module defines the relevant Fortran kind type-parameters frequently used in the ParaMonte librar...
Definition: pm_kind.F90:268
integer, parameter RK
The default real kind in the ParaMonte library: real64 in Fortran, c_double in C-Fortran Interoperati...
Definition: pm_kind.F90:543
integer, parameter LK
The default logical kind in the ParaMonte library: kind(.true.) in Fortran, kind(....
Definition: pm_kind.F90:541
integer, parameter CK
The default complex kind in the ParaMonte library: real64 in Fortran, c_double_complex in C-Fortran I...
Definition: pm_kind.F90:542
integer, parameter IK
The default integer kind in the ParaMonte library: int32 in Fortran, c_int32_t in C-Fortran Interoper...
Definition: pm_kind.F90:540
integer, parameter SK
The default character kind in the ParaMonte library: kind("a") in Fortran, c_char in C-Fortran Intero...
Definition: pm_kind.F90:539
This module contains abstract and concrete derived types that are required for compile-time resolutio...
type(lowDia_type), parameter lowDia
This is a scalar parameter object of type lowDia_type that is exclusively used to request lower-diago...
type(uppLowDia_type), parameter uppLowDia
This is a scalar parameter object of type uppLowDia_type that is exclusively used to request full dia...
type(uppLow_type), parameter uppLow
This is a scalar parameter object of type uppLow_type that is exclusively used to request upper-lower...
type(uppDia_type), parameter uppDia
This is a scalar parameter object of type uppDia_type that is exclusively used to request upper-diago...
type(dia_type), parameter dia
This is a scalar parameter object of type dia_type that is exclusively used to request unit (or Ident...
Generate and return an object of type display_type.
Definition: pm_io.F90:10282

Example Unix compile command via Intel ifort compiler
1#!/usr/bin/env sh
2rm main.exe
3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example Windows Batch compile command via Intel ifort compiler
1del main.exe
2set PATH=..\..\..\lib;%PATH%
3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
1#!/usr/bin/env sh
2rm main.exe
3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example output
1
2call setUnifRand(matA, 'AA', 'ZZ')
3matA
4"WQ", "YF", "IQ", "VL", "QH", "SO", "LM", "WX", "VU", "ZM"
5"AQ", "PC", "HE", "ET", "BZ", "CZ", "LW", "OC", "PD", "DF"
6"VM", "XV", "BK", "DX", "ID", "EP", "GW", "VR", "AH", "DS"
7"FO", "IP", "DO", "IM", "RG", "WA", "XB", "WL", "TU", "LU"
8"JJ", "CU", "GW", "PU", "CJ", "YQ", "LF", "KA", "GM", "GS"
9call setMatTrans(matA, matB)
10matB
11"WQ", "AQ", "VM", "FO", "JJ"
12"YF", "PC", "XV", "IP", "CU"
13"IQ", "HE", "BK", "DO", "GW"
14"VL", "ET", "DX", "IM", "PU"
15"QH", "BZ", "ID", "RG", "CJ"
16"SO", "CZ", "EP", "WA", "YQ"
17"LM", "LW", "GW", "XB", "LF"
18"WX", "OC", "VR", "WL", "KA"
19"VU", "PD", "AH", "TU", "GM"
20"ZM", "DF", "DS", "LU", "GS"
21
22
23call setUnifRand(matA, 'AA', 'ZZ')
24matA
25"HO", "FX", "KW", "XR", "BD", "QD", "LA", "SM", "CI", "YT"
26"PQ", "FZ", "SH", "BE", "FS", "JW", "VY", "FU", "CM", "BS"
27"VJ", "OA", "HV", "DM", "MM", "XS", "MH", "EG", "VF", "NE"
28"IQ", "DW", "HW", "DV", "JH", "OK", "KG", "KL", "KB", "BW"
29"IE", "KL", "MV", "LJ", "IH", "ZD", "YM", "WF", "DY", "IN"
30"RZ", "OC", "FR", "JI", "PB", "WC", "IV", "MA", "PM", "XA"
31"MO", "SG", "JA", "UV", "BU", "VS", "MV", "WG", "MD", "VF"
32"SU", "EO", "RL", "PB", "MJ", "XN", "BG", "SN", "PG", "HH"
33"LT", "ZQ", "FL", "LQ", "RF", "GV", "YD", "VE", "IM", "HN"
34"HD", "YH", "GE", "PY", "GX", "VF", "TT", "PC", "SA", "EO"
35call setMatTrans(matA)
36matA
37"HO", "PQ", "VJ", "IQ", "IE", "RZ", "MO", "SU", "LT", "HD"
38"FX", "FZ", "OA", "DW", "KL", "OC", "SG", "EO", "ZQ", "YH"
39"KW", "SH", "HV", "HW", "MV", "FR", "JA", "RL", "FL", "GE"
40"XR", "BE", "DM", "DV", "LJ", "JI", "UV", "PB", "LQ", "PY"
41"BD", "FS", "MM", "JH", "IH", "PB", "BU", "MJ", "RF", "GX"
42"QD", "JW", "XS", "OK", "ZD", "WC", "VS", "XN", "GV", "VF"
43"LA", "VY", "MH", "KG", "YM", "IV", "MV", "BG", "YD", "TT"
44"SM", "FU", "EG", "KL", "WF", "MA", "WG", "SN", "VE", "PC"
45"CI", "CM", "VF", "KB", "DY", "PM", "MD", "PG", "IM", "SA"
46"YT", "BS", "NE", "BW", "IN", "XA", "VF", "HH", "HN", "EO"
47
48
Benchmarks:


Benchmark :: The runtime performance of setMatTrans vs. Fortran intrinsic transpose().

1#define MatB_ENABLED 0
2! Test the performance of `transpose()` vs. `setMatTrans()`.
3program benchmark
4
5 use iso_fortran_env, only: error_unit
6 use pm_kind, only: IK, RKG => RK, RK, SK
7 use pm_distUnif, only: setUnifRand
8 use pm_bench, only: bench_type
9
10 implicit none
11
12 integer(IK) :: i
13 integer(IK) :: fileUnit
14 integer(IK) :: rank, irank
15 integer(IK) , parameter :: NRANK = 20_IK
16 integer(IK) , parameter :: NBENCH = 2_IK
17 real(RKG) :: dummySum = 0._RKG
18 real(RKG) , allocatable :: matA(:,:)
19#if MatB_ENABLED
20 real(RKG) , allocatable :: matB(:,:)
21#endif
22 type(bench_type) :: bench(NBENCH)
23
24 bench(1) = bench_type(name = SK_"setMatTrans", exec = setMatTrans , overhead = setOverhead)
25 bench(2) = bench_type(name = SK_"transpose", exec = transpose , overhead = setOverhead)
26
27
28 write(*,"(*(g0,:,' '))")
29 write(*,"(*(g0,:,' '))") "setMatTrans() vs. transpose()"
30 write(*,"(*(g0,:,' '))")
31
32 open(newunit = fileUnit, file = "main.out", status = "replace")
33
34 write(fileUnit, "(*(g0,:,', '))") "MatrixRank", (bench(i)%name, i = 1, NBENCH)
35
36 loopOverMatrixRank: do irank = 1, NRANK
37
38 rank = 1.5**irank
39 allocate(matA(rank, rank)); call setUnifRand(matA)
40#if MatB_ENABLED
41 allocate(matB(rank, rank)); call setUnifRand(matB)
42#endif
43 write(*,"(*(g0,:,' '))") "Benchmarking with rank", rank
44
45 do i = 1, NBENCH
46 bench(i)%timing = bench(i)%getTiming(minsec = 0.07_RK)
47 end do
48
49 write(fileUnit,"(*(g0,:,', '))") rank, (bench(i)%timing%mean, i = 1, NBENCH)
50 deallocate(matA)
51#if MatB_ENABLED
52 deallocate(matB)
53#endif
54 end do loopOverMatrixRank
55 write(*,"(*(g0,:,' '))") dummySum
56 write(*,"(*(g0,:,' '))")
57
58 close(fileUnit)
59
60contains
61
62 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
63 ! procedure wrappers.
64 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
65
66 subroutine setOverhead()
67 call getDummy()
68 end subroutine
69
70 subroutine getDummy()
71#if MatB_ENABLED
72 dummySum = dummySum + matB(1,1)
73#else
74 dummySum = dummySum + matA(1,1)
75#endif
76 end subroutine
77
78 subroutine setMatTrans()
79 block
81#if MatB_ENABLED
82 call setMatTrans(matA, matB)
83#else
84 call setMatTrans(matA)
85#endif
86 call getDummy()
87 end block
88 end subroutine
89
90 subroutine transpose()
91 block
92 intrinsic :: transpose
93#if MatB_ENABLED
94 matB = transpose(matA)
95#else
96 matA = transpose(matA)
97#endif
98 call getDummy()
99 end block
100 end subroutine
101
102end program benchmark
Generate and return an object of type timing_type containing the benchmark timing information and sta...
Definition: pm_bench.F90:574
This module contains abstract interfaces and types that facilitate benchmarking of different procedur...
Definition: pm_bench.F90:41
This is the class for creating benchmark and performance-profiling objects.
Definition: pm_bench.F90:386

Example Unix compile command via Intel ifort compiler
1#!/usr/bin/env sh
2rm main.exe
3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example Windows Batch compile command via Intel ifort compiler
1del main.exe
2set PATH=..\..\..\lib;%PATH%
3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
1#!/usr/bin/env sh
2rm main.exe
3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Postprocessing of the benchmark output
1#!/usr/bin/env python
2
3import matplotlib.pyplot as plt
4import pandas as pd
5import numpy as np
6
7import os
8dirname = os.path.basename(os.getcwd())
9
10fontsize = 14
11
12df = pd.read_csv("main.out", delimiter = ", ")
13colnames = list(df.columns.values)
14
15
18
19ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
20ax = plt.subplot()
21
22for colname in colnames[1:]:
23 plt.plot( df[colnames[0]].values
24 , df[colname].values
25 , linewidth = 2
26 )
27
28plt.xticks(fontsize = fontsize)
29plt.yticks(fontsize = fontsize)
30ax.set_xlabel(colnames[0], fontsize = fontsize)
31ax.set_ylabel("Runtime [ seconds ]", fontsize = fontsize)
32ax.set_title(" vs. ".join(colnames[1:])+"\nLower is better.", fontsize = fontsize)
33ax.set_xscale("log")
34ax.set_yscale("log")
35plt.minorticks_on()
36plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
37ax.tick_params(axis = "y", which = "minor")
38ax.tick_params(axis = "x", which = "minor")
39ax.legend ( colnames[1:]
40 #, loc='center left'
41 #, bbox_to_anchor=(1, 0.5)
42 , fontsize = fontsize
43 )
44
45plt.tight_layout()
46plt.savefig("benchmark." + dirname + ".runtime.png")
47
48
51
52ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
53ax = plt.subplot()
54
55plt.plot( df[colnames[0]].values
56 , np.ones(len(df[colnames[0]].values))
57 , linestyle = "--"
58 #, color = "black"
59 , linewidth = 2
60 )
61for colname in colnames[2:]:
62 plt.plot( df[colnames[0]].values
63 , df[colname].values / df[colnames[1]].values
64 , linewidth = 2
65 )
66
67plt.xticks(fontsize = fontsize)
68plt.yticks(fontsize = fontsize)
69ax.set_xlabel(colnames[0], fontsize = fontsize)
70ax.set_ylabel("Runtime compared to {}".format(colnames[1]), fontsize = fontsize)
71ax.set_title("Runtime Ratio Comparison. Lower means faster.\nLower than 1 means faster than {}().".format(colnames[1]), fontsize = fontsize)
72ax.set_xscale("log")
73#ax.set_yscale("log")
74plt.minorticks_on()
75plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
76ax.tick_params(axis = "y", which = "minor")
77ax.tick_params(axis = "x", which = "minor")
78ax.legend ( colnames[1:]
79 #, bbox_to_anchor = (1, 0.5)
80 #, loc = "center left"
81 , fontsize = fontsize
82 )
83
84plt.tight_layout()
85plt.savefig("benchmark." + dirname + ".runtime.ratio.png")

Visualization of the benchmark output

Benchmark moral
  1. The procedures under the generic interface setMatTrans use a cache-oblivious approach to matrix Symmetric transposition.
    As such, they are particularly efficient and cache-friendly for large matrices.
    As such, the generic interface setMatTrans can be significantly faster than the Fortran intrinsic transpose(), depending on the Fortran compiler used.
    This is particularly true for large-order matrices.


Benchmark :: The runtime performance of setMatTrans vs. Fortran intrinsic transpose().

1! Test the performance of `transpose()` vs. `setMatTrans()`.
2program benchmark
3
4 use iso_fortran_env, only: error_unit
5 use pm_kind, only: IK, RKG => RKS, RK, SK
6 use pm_distUnif, only: setUnifRand
7 use pm_bench, only: bench_type
8
9 implicit none
10
11 integer(IK) :: i
12 integer(IK) :: fileUnit
13 integer(IK) :: rank, irank
14 integer(IK) , parameter :: NRANK = 20_IK
15 integer(IK) , parameter :: NBENCH = 2_IK
16 complex(RKG) :: dummySum = 0._RKG
17 complex(RKG), allocatable :: matA(:,:)
18 type(bench_type) :: bench(NBENCH)
19
20 bench(1) = bench_type(name = SK_"setMatTrans(transHerm)", exec = setMatTrans , overhead = setOverhead)
21 bench(2) = bench_type(name = SK_"transpose(conjg())", exec = transpose , overhead = setOverhead)
22
23
24 write(*,"(*(g0,:,' '))")
25 write(*,"(*(g0,:,' '))") "setMatTrans() vs. transpose(conjg())"
26 write(*,"(*(g0,:,' '))")
27
28 open(newunit = fileUnit, file = "main.out", status = "replace")
29
30 write(fileUnit, "(*(g0,:,', '))") "MatrixRank", (bench(i)%name, i = 1, NBENCH)
31
32 loopOverMatrixRank: do irank = 1, NRANK
33
34 rank = 1.5**irank
35 allocate(matA(rank, rank))
36 write(*,"(*(g0,:,' '))") "Benchmarking with rank", rank
37 call setUnifRand(matA)
38
39 do i = 1, NBENCH
40 bench(i)%timing = bench(i)%getTiming(minsec = 0.07_RK)
41 end do
42
43 write(fileUnit,"(*(g0,:,', '))") rank, (bench(i)%timing%mean, i = 1, NBENCH)
44 deallocate(matA)
45
46 end do loopOverMatrixRank
47 write(*,"(*(g0,:,' '))") dummySum
48 write(*,"(*(g0,:,' '))")
49
50 close(fileUnit)
51
52contains
53
54 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
55 ! procedure wrappers.
56 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
57
58 subroutine setOverhead()
59 call getDummy()
60 end subroutine
61
62 subroutine getDummy()
63 dummySum = dummySum + matA(1,1)
64 end subroutine
65
66 subroutine setMatTrans()
67 block
69 call setMatTrans(matA, operation = transHerm)
70 call getDummy()
71 end block
72 end subroutine
73
74 subroutine transpose()
75 block
76 intrinsic :: transpose
77 matA = transpose(conjg(matA))
78 call getDummy()
79 end block
80 end subroutine
81
82end program benchmark
integer, parameter RKS
The single-precision real kind in Fortran mode. On most platforms, this is an 32-bit real kind.
Definition: pm_kind.F90:567

Example Unix compile command via Intel ifort compiler
1#!/usr/bin/env sh
2rm main.exe
3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example Windows Batch compile command via Intel ifort compiler
1del main.exe
2set PATH=..\..\..\lib;%PATH%
3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
1#!/usr/bin/env sh
2rm main.exe
3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Postprocessing of the benchmark output
1#!/usr/bin/env python
2
3import matplotlib.pyplot as plt
4import pandas as pd
5import numpy as np
6
7import os
8dirname = os.path.basename(os.getcwd())
9
10fontsize = 14
11
12df = pd.read_csv("main.out", delimiter = ", ")
13colnames = list(df.columns.values)
14
15
18
19ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
20ax = plt.subplot()
21
22for colname in colnames[1:]:
23 plt.plot( df[colnames[0]].values
24 , df[colname].values
25 , linewidth = 2
26 )
27
28plt.xticks(fontsize = fontsize)
29plt.yticks(fontsize = fontsize)
30ax.set_xlabel(colnames[0], fontsize = fontsize)
31ax.set_ylabel("Runtime [ seconds ]", fontsize = fontsize)
32ax.set_title(" vs. ".join(colnames[1:])+"\nLower is better.", fontsize = fontsize)
33ax.set_xscale("log")
34ax.set_yscale("log")
35plt.minorticks_on()
36plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
37ax.tick_params(axis = "y", which = "minor")
38ax.tick_params(axis = "x", which = "minor")
39ax.legend ( colnames[1:]
40 #, loc='center left'
41 #, bbox_to_anchor=(1, 0.5)
42 , fontsize = fontsize
43 )
44
45plt.tight_layout()
46plt.savefig("benchmark." + dirname + ".runtime.png")
47
48
51
52ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
53ax = plt.subplot()
54
55plt.plot( df[colnames[0]].values
56 , np.ones(len(df[colnames[0]].values))
57 , linestyle = "--"
58 #, color = "black"
59 , linewidth = 2
60 )
61for colname in colnames[2:]:
62 plt.plot( df[colnames[0]].values
63 , df[colname].values / df[colnames[1]].values
64 , linewidth = 2
65 )
66
67plt.xticks(fontsize = fontsize)
68plt.yticks(fontsize = fontsize)
69ax.set_xlabel(colnames[0], fontsize = fontsize)
70ax.set_ylabel("Runtime compared to {}".format(colnames[1]), fontsize = fontsize)
71ax.set_title("Runtime Ratio Comparison. Lower means faster.\nLower than 1 means faster than {}().".format(colnames[1]), fontsize = fontsize)
72ax.set_xscale("log")
73#ax.set_yscale("log")
74plt.minorticks_on()
75plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
76ax.tick_params(axis = "y", which = "minor")
77ax.tick_params(axis = "x", which = "minor")
78ax.legend ( colnames[1:]
79 #, bbox_to_anchor = (1, 0.5)
80 #, loc = "center left"
81 , fontsize = fontsize
82 )
83
84plt.tight_layout()
85plt.savefig("benchmark." + dirname + ".runtime.ratio.png")

Visualization of the benchmark output

Benchmark moral
  1. The procedures under the generic interface setMatTrans use a cache-oblivious approach to matrix Hermitian transposition.
    As such, they are particularly efficient and cache-friendly for large matrices.
    As such, the generic interface setMatTrans can be significantly faster than the Fortran intrinsic transpose(), depending on the Fortran compiler used.
    This is particularly true for large-order matrices.


Benchmark :: The runtime performance of setMatTrans vs. Fortran intrinsic transpose().

1#define MatB_ENABLED 0
2! Test the performance of `transpose()` vs. `setMatTrans()`.
3program benchmark
4
5 use iso_fortran_env, only: error_unit
6 use pm_kind, only: IK, RKG => RK, RK, SK
7 use pm_distUnif, only: setUnifRand
10 use pm_arrayUnique, only: getUnique
11 use pm_bench, only: bench_type
12 use pm_val2str, only: getStr
13
14 implicit none
15
16 integer(IK) :: i
17 integer(IK) :: fileUnit
18 integer(IK) :: iblock
19 integer(IK) :: bsize
20 integer(IK) , parameter :: RANK = 1000_IK
21 real(RKG) :: dummySum = 0._RKG
22 integer(IK) , allocatable :: BlockSize(:)
23 type(bench_type), allocatable :: bench(:)
24 real(RKG) , allocatable :: matA(:,:)
25#if MatB_ENABLED
26 real(RKG) , allocatable :: matB(:,:)
27 allocate(matB(RANK, RANK))
28 call setUnifRand(matB)
29#endif
30 allocate(matA(RANK, RANK))
31 call setUnifRand(matA)
32
33 bench = [ bench_type(name = getReplaced(SK_"setMatTrans(matA(RANK,RANK))", SK_"RANK", getStr(RANK)), exec = setMatTrans, overhead = setOverhead) &
34 , bench_type(name = getReplaced( SK_"transpose(matA(RANK,RANK))", SK_"RANK", getStr(RANK)), exec = transpose, overhead = setOverhead) &
35 ]
36
37 BlockSize = getUnique(int(getLogSpace(log(1._RKG), log(real(2*RANK, RKG)), count = 50_IK), IK))
38
39 write(*,"(*(g0,:,' '))")
40 write(*,"(*(g0,:,' '))") "setMatTransBlock"
41 write(*,"(*(g0,:,' '))")
42
43 open(newunit = fileUnit, file = "main.out", status = "replace")
44
45 write(fileUnit, "(*(g0,:,', '))") "BlockSize", (bench(i)%name, i = 1, size(bench))
46
47 loopOverMatrixRank: do iblock = 1, size(BlockSize)
48
49 bsize = BlockSize(iblock)
50 write(*,"(*(g0,:,' '))") "Benchmarking with block size", bsize
51
52 do i = 1, size(bench)
53 bench(i)%timing = bench(i)%getTiming(minsec = 0.07_RK)
54 end do
55
56 write(fileUnit,"(*(g0,:,', '))") bsize, (bench(i)%timing%mean, i = 1, size(bench))
57
58 end do loopOverMatrixRank
59
60 close(fileUnit)
61
62 write(*,"(*(g0,:,' '))") dummySum
63 write(*,"(*(g0,:,' '))")
64
65contains
66
67 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
68 ! procedure wrappers.
69 !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
70
71 subroutine setOverhead()
72 call getDummy()
73 end subroutine
74
75 subroutine getDummy()
76#if MatB_ENABLED
77 dummySum = dummySum + matB(1,1)
78#else
79 dummySum = dummySum + matA(1,1)
80#endif
81 end subroutine
82
83 subroutine setMatTrans()
84 block
86#if MatB_ENABLED
87 call setMatTrans(matA, matB, bsize)
88#else
89 call setMatTrans(matA, bsize)
90#endif
91 call getDummy()
92 end block
93 end subroutine
94
95 subroutine transpose()
96 block
97 intrinsic :: transpose
98#if MatB_ENABLED
99 matB = transpose(matA)
100#else
101 matA = transpose(matA)
102#endif
103 call getDummy()
104 end block
105 end subroutine
106
107end program benchmark
Generate and return an arrayNew of the same type and kind as the input array, in which the requested ...
Generate count evenly-logarithmically-spaced points over the interval [base**logx1,...
Generate and return a vector of unique values in the input array.
Generate and return the conversion of the input value to an output Fortran string,...
Definition: pm_val2str.F90:167
This module contains procedures and generic interfaces for replacing patterns within arrays of variou...
This module contains procedures and generic interfaces for generating arrays with linear or logarithm...
This module contains procedures and generic interfaces for finding unique values of an input array of...
This module contains the generic procedures for converting values of different types and kinds to For...
Definition: pm_val2str.F90:58

Example Unix compile command via Intel ifort compiler
1#!/usr/bin/env sh
2rm main.exe
3ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Example Windows Batch compile command via Intel ifort compiler
1del main.exe
2set PATH=..\..\..\lib;%PATH%
3ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
4main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
1#!/usr/bin/env sh
2rm main.exe
3gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
4./main.exe

Postprocessing of the benchmark output
1#!/usr/bin/env python
2
3import matplotlib.pyplot as plt
4import pandas as pd
5import numpy as np
6
7import os
8dirname = os.path.basename(os.getcwd())
9
10fontsize = 14
11
12df = pd.read_csv("main.out", delimiter = ", ")
13colnames = list(df.columns.values)
14
15
18
19ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
20ax = plt.subplot()
21
22for colname in colnames[1:]:
23 plt.plot( df[colnames[0]].values
24 , df[colname].values
25 , linewidth = 2
26 )
27
28plt.xticks(fontsize = fontsize)
29plt.yticks(fontsize = fontsize)
30ax.set_xlabel(colnames[0], fontsize = fontsize)
31ax.set_ylabel("Runtime [ seconds ]", fontsize = fontsize)
32ax.set_title(" vs. ".join(colnames[1:])+"\nLower is better.", fontsize = fontsize)
33ax.set_xscale("log")
34ax.set_yscale("log")
35plt.minorticks_on()
36plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
37ax.tick_params(axis = "y", which = "minor")
38ax.tick_params(axis = "x", which = "minor")
39ax.legend ( colnames[1:]
40 #, loc='center left'
41 #, bbox_to_anchor=(1, 0.5)
42 , fontsize = fontsize
43 )
44
45plt.tight_layout()
46plt.savefig("benchmark." + dirname + ".runtime.png")
47
48
51
52ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
53ax = plt.subplot()
54
55plt.plot( df[colnames[0]].values
56 , np.ones(len(df[colnames[0]].values))
57 , linestyle = "--"
58 #, color = "black"
59 , linewidth = 2
60 )
61for colname in colnames[2:]:
62 plt.plot( df[colnames[0]].values
63 , df[colname].values / df[colnames[1]].values
64 , linewidth = 2
65 )
66
67plt.xticks(fontsize = fontsize)
68plt.yticks(fontsize = fontsize)
69ax.set_xlabel(colnames[0], fontsize = fontsize)
70ax.set_ylabel("Runtime compared to {}".format(colnames[1]), fontsize = fontsize)
71ax.set_title("Runtime Ratio Comparison. Lower means faster.\nLower than 1 means faster than {}().".format(colnames[1]), fontsize = fontsize)
72ax.set_xscale("log")
73#ax.set_yscale("log")
74plt.minorticks_on()
75plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
76ax.tick_params(axis = "y", which = "minor")
77ax.tick_params(axis = "x", which = "minor")
78ax.legend ( colnames[1:]
79 #, bbox_to_anchor = (1, 0.5)
80 #, loc = "center left"
81 , fontsize = fontsize
82 )
83
84plt.tight_layout()
85plt.savefig("benchmark." + dirname + ".runtime.ratio.png")

Visualization of the benchmark output

Benchmark moral
  1. The procedures under the generic interface setMatTrans use a cache-oblivious approach to matrix Hermitian transposition.
    As such, they are particularly efficient and cache-friendly for large matrices.
    However, despite its name and goals, the cache-oblivious algorithm is not entirely independent of the cache size (and hence the minimum block size) as evidenced here.
Test:
test_pm_matrixTrans


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Todo:
Normal Priority: The performance of this algorithm could be possibly improved by converting the recursive procedure calls within the implementation to do-loops.
Author:
Amir Shahmoradi, September 1, 2017, 12:00 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas Austin

Definition at line 767 of file pm_matrixTrans.F90.


The documentation for this interface was generated from the following file: