ParaMonte Fortran 2.0.0
Parallel Monte Carlo and Machine Learning Library
pm_mathSum::getDot Interface Reference

Generate and return the expression dot_product(x, y) accurately (almost without roundoff error).

Detailed Description

Generate and return the expression dot_product(x, y) accurately (almost without roundoff error).

Parameters
[in]x: The input contiguous vector of arbitrary size, of
  1. type complex of any kind supported by the processor (e.g., CK, CK32, CK64, or CK128), or
  2. type real of any kind supported by the processor (e.g., RK, RK32, RK64, or RK128),
representing the x values whose dot product with y is to be returned.
[in]y: The input contiguous vector of the same size, type, and kind as the input x,
representing the y values whose dot product with x is to be returned.
[in]method: The input scalar object that can be,
  1. the constant iteration or equivalently, an object of type iteration_type.
    Specifying this option forces the use of the normal iterative approach to summing all array elements.
    This option is equivalent to the default implementations of the Fortran intrinsic sum().
    This approach is the fastest serial method among all, but also generally the least accurate.
  2. the constant recursion or equivalently, an object of type recursion_type.
    Specifying this option forces the use of the recursive pairwise approach to summing all array elements.
  3. the constant kahanbabu or equivalently, an object of type kahanbabu_type.
    Specifying this option forces the use of the Kahan-Babuska compensated approach to summing all array elements.
    This algorithm, while accurate, can be 2 to 4 times more expensive than the iterative approach described above.
  4. the constant fablocked or equivalently, an object of type fablocked_type.
    Specifying this option forces the use of the Fast Accurate Blocked approach to summing all array elements.
  5. the constant nablocked or equivalently, an object of type nablocked_type.
    Specifying this option forces the use of the Naive Blocked approach to summing all array elements.
The presence of this argument is merely for compile-time resolution of the procedures of this generic interface.
(optional, default = fablocked.)
Returns
dotres : The output scalar of the same type and kind as the input x, containing the result of the dot product of the input x and y vectors.
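
To illustrate why the choice of method matters, the following is a minimal sketch in standard Fortran that contrasts a plain iterative accumulation with a Kahan-Babuska (Neumaier variant) compensated accumulation of the element-wise products. It is not the ParaMonte implementation: the module dot_sketch and the functions dot_iterative and dot_compensated are hypothetical names introduced only for this illustration, and the sketch compensates the summation only, not the rounding of the individual products.

```fortran
module dot_sketch
    ! Hypothetical illustration module; not part of the ParaMonte library.
    use iso_fortran_env, only: real64
    implicit none
contains

    ! Plain left-to-right accumulation of the products x(i) * y(i).
    pure function dot_iterative(x, y) result(dotres)
        real(real64), intent(in), contiguous :: x(:), y(:)
        real(real64) :: dotres
        integer :: i
        dotres = 0._real64
        do i = 1, size(x)
            dotres = dotres + x(i) * y(i)
        end do
    end function dot_iterative

    ! Kahan-Babuska (Neumaier variant) compensated accumulation:
    ! the low-order bits lost in each addition are collected in `comp`.
    pure function dot_compensated(x, y) result(dotres)
        real(real64), intent(in), contiguous :: x(:), y(:)
        real(real64) :: dotres, total, comp, term, newtotal
        integer :: i
        total = 0._real64
        comp = 0._real64
        do i = 1, size(x)
            term = x(i) * y(i)
            newtotal = total + term
            if (abs(total) >= abs(term)) then
                comp = comp + ((total - newtotal) + term)   ! low bits of term were lost
            else
                comp = comp + ((term - newtotal) + total)   ! low bits of total were lost
            end if
            total = newtotal
        end do
        dotres = total + comp
    end function dot_compensated

end module dot_sketch

program demo_dot_sketch
    use dot_sketch
    implicit none
    real(real64) :: x(4), y(4)
    ! The two unit terms are absorbed by the huge terms in a plain sum.
    x = [1._real64, 1.e100_real64, 1._real64, -1.e100_real64]
    y = 1._real64
    print *, "iterative   :", dot_iterative(x, y)
    print *, "compensated :", dot_compensated(x, y)
end program demo_dot_sketch
```

Under IEEE double-precision arithmetic with strict evaluation order, the plain loop loses both unit terms and yields 0, while the compensated loop recovers the exact value 2. Compensated summation relies on the written order of operations, so this sketch assumes compilation without value-unsafe floating-point optimizations (for example, without gfortran's -ffast-math).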


Possible calling interfaces

use pm_mathSum, only: getDot, iteration, recursion, kahanbabu, fablocked, nablocked
dotres = getDot(x, y)
dotres = getDot(x, y, method)
Warning
The returned value is 0 if the condition size(x) == 0 holds.
The condition size(x) == size(y) must hold for the corresponding input arguments.
This condition is verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.
The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release build and impure in debug and testing builds.
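
Because the size condition is verified by the library only in builds with CHECK_ENABLED=1, a caller may prefer to validate the input sizes before the call. The following is a minimal sketch of such a caller-side guard, under stated assumptions: the module checked_dot_sketch and the function checked_dot are hypothetical names, and the intrinsic dot_product() stands in for getDot(x, y) so that the sketch compiles without the ParaMonte library.

```fortran
module checked_dot_sketch
    ! Hypothetical illustration module; not part of the ParaMonte library.
    use iso_fortran_env, only: real64
    implicit none
contains

    ! Return the dot product of x and y, or set `failed` if the sizes differ.
    function checked_dot(x, y, failed) result(dotres)
        real(real64), intent(in) :: x(:), y(:)
        logical, intent(out) :: failed
        real(real64) :: dotres
        dotres = 0._real64
        failed = size(x) /= size(y)
        if (failed) return
        ! Stand-in for getDot(x, y); the intrinsic also yields 0 for zero-size input.
        dotres = dot_product(x, y)
    end function checked_dot

end module checked_dot_sketch

program demo_checked_dot
    use checked_dot_sketch
    implicit none
    logical :: failed
    real(real64) :: res
    ! Matching sizes: the result is 1*3 + 2*4 = 11 and failed is false.
    res = checked_dot([1._real64, 2._real64], [3._real64, 4._real64], failed)
    print *, res, failed
    ! Zero-size input: the result is 0 and failed is false.
    res = checked_dot([real(real64) ::], [real(real64) ::], failed)
    print *, res, failed
    ! Mismatched sizes: the result is left at 0 and failed is true.
    res = checked_dot([1._real64], [1._real64, 2._real64], failed)
    print *, res, failed
end program demo_checked_dot
```

Returning a status flag instead of stopping keeps the decision about error handling with the caller; a production code might equally use error stop in the mismatched case.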
See also
get1mexp
getLog1p
getCumSum
getLogAddExp
getLogSubExp
getLogSumExp


Example usage

program example

    use pm_kind, only: SK, IK, LK, RKH
    use pm_distUnif, only: setUnifRand
    use pm_arrayRank, only: getRankDense
    use pm_arraySpace, only: setLinSpace
    use pm_mathSum, only: getDot, iteration, recursion, kahanbabu, fablocked, nablocked
    use pm_io, only: display_type

    implicit none

    real(RKH) :: truth
    real(RKH), allocatable :: dotres(:), relerr(:)
    type(display_type) :: disp

    disp = display_type(file = "main.out.F90")

    ! Single-precision test: compare all methods against a high-precision reference.
    block
        use pm_kind, only: TKG => RKS
        integer(IK), parameter :: lenx = 10**7
        real(TKG) :: x(lenx), y(lenx), lb, ub
        call disp%skip()
        call disp%show("lenx")
        call disp%show( lenx )
        call disp%show("lb = real(lenx, TKG); ub = 1._TKG")
                        lb = real(lenx, TKG); ub = 1._TKG
        call disp%show("call setLinSpace(x, lb, ub) ! call setUnifRand(x)")
                        call setLinSpace(x, lb, ub) ! call setUnifRand(x)
        call disp%show("y = 1 !/ x")
                        y = 1 !/ x
        call disp%show("truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison")
                        truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
        call disp%show("truth")
        call disp%show( truth )
        call disp%show("dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]")
                        dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
        call disp%show("dotres")
        call disp%show( dotres )
        call disp%show("relerr = abs(truth - dotres) / truth")
                        relerr = abs(truth - dotres) / truth
        call disp%show("relerr")
        call disp%show( relerr )
        call disp%show("getRankDense(relerr)")
        call disp%show( getRankDense(relerr) )
        call disp%skip()
    end block

    ! Double-precision test: same comparison with a longer vector.
    block
        use pm_kind, only: TKG => RKD
        integer(IK), parameter :: lenx = 10**8
        real(TKG) :: x(lenx), y(lenx), lb, ub
        call disp%skip()
        call disp%show("lenx")
        call disp%show( lenx )
        call disp%show("lb = real(lenx, TKG); ub = 1._TKG")
                        lb = real(lenx, TKG); ub = 1._TKG
        call disp%show("call setLinSpace(x, lb, ub) ! call setUnifRand(x)")
                        call setLinSpace(x, lb, ub) ! call setUnifRand(x)
        call disp%show("y = 1 !/ x")
                        y = 1 !/ x
        call disp%show("truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison")
                        truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
        call disp%show("truth")
        call disp%show( truth )
        call disp%show("dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]")
                        dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
        call disp%show("dotres")
        call disp%show( dotres )
        call disp%show("relerr = abs(truth - dotres) / truth")
                        relerr = abs(truth - dotres) / truth
        call disp%show("relerr")
        call disp%show( relerr )
        call disp%show("getRankDense(relerr)")
        call disp%show( getRankDense(relerr) )
        call disp%skip()
    end block

    ! Edge cases: empty and very short input vectors.
    block
        call disp%skip()
        call disp%show("[getDot([real ::], [real ::]), getDot([real :: 1], [real :: 1]), getDot([real :: 1, 1], [real :: 1, 1]), getDot([real :: 1, 1, 1], [real :: 1, 1, 1])]")
        call disp%show( [getDot([real ::], [real ::]), getDot([real :: 1], [real :: 1]), getDot([real :: 1, 1], [real :: 1, 1]), getDot([real :: 1, 1, 1], [real :: 1, 1, 1])] )
        call disp%skip()
    end block

end program example

Example Unix compile command via Intel ifort compiler
#!/usr/bin/env sh
rm main.exe
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example Windows Batch compile command via Intel ifort compiler
del main.exe
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
main.exe

Example Unix / MinGW compile command via GNU gfortran compiler
#!/usr/bin/env sh
rm main.exe
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example output
lenx
+10000000
lb = real(lenx, TKG); ub = 1._TKG
call setLinSpace(x, lb, ub) ! call setUnifRand(x)
y = 1 !/ x
truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
truth
+50000005000000.0000000000000000000000
dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
dotres
+50000000188416.0000000000000000000000, +49423623127040.0000000000000000000000, +50000008577024.0000000000000000000000, +50000004382720.0000000000000000000000, +50000000188416.0000000000000000000000, +49999983411200.0000000000000000000000
relerr = abs(truth - dotres) / truth
relerr
+0.962316703768329623167037683296231645E-7, +0.115276363064363693563630643636935635E-1, +0.715404728459527154047284595271540462E-7, +0.123455987654401234559876544012345606E-7, +0.962316703768329623167037683296231645E-7, +0.431775956822404317759568224043177598E-6
getRankDense(relerr)
+3, +5, +2, +1, +3, +4


lenx
+100000000
lb = real(lenx, TKG); ub = 1._TKG
call setLinSpace(x, lb, ub) ! call setUnifRand(x)
y = 1 !/ x
truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
truth
+5000000050000000.00000000000000000000
dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
dotres
+5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000
relerr = abs(truth - dotres) / truth
relerr
+0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000
getRankDense(relerr)
+1, +1, +1, +1, +1, +1


[getDot([real ::], [real ::]), getDot([real :: 1], [real :: 1]), getDot([real :: 1, 1], [real :: 1, 1]), getDot([real :: 1, 1, 1], [real :: 1, 1, 1])]
+0.00000000, +1.00000000, +2.00000000, +3.00000000

Benchmarks:


Benchmark :: The effects of method on runtime efficiency

The following program compares the runtime performance of getDot algorithms with the default Fortran dot_product() intrinsic function.
! Test the performance of `getDot()` with and without the selection `method` argument.
program benchmark

    use pm_bench, only: bench_type
    use pm_distUnif, only: setUnifRand
    use pm_mathCumSum, only: setCumSum
    use pm_kind, only: SK, IK, LK, RKH, RK, RKG => RKD
    use iso_fortran_env, only: error_unit

    implicit none

    integer(IK) :: ibench
    integer(IK) :: iarr
    integer(IK) :: arrlen
    integer(IK) :: fileUnit
    real(RKG) :: dumsum = 0._RKG
    real(RKG) :: array(10**8)
    real(RKG) :: dotres = 0._RKG ! initialized so that getDummy() never reads an undefined value.
    type(bench_type) , allocatable :: bench(:)
    logical(LK) :: underflowEnabled

    bench = [ bench_type(name = SK_"dot_product()", exec = getDotFortran, overhead = setOverhead) &
            , bench_type(name = SK_"fablocked", exec = getDotFAB, overhead = setOverhead) &
            , bench_type(name = SK_"nablocked", exec = getDotNAB, overhead = setOverhead) &
            , bench_type(name = SK_"kahanbabu", exec = getDotKAB, overhead = setOverhead) &
            , bench_type(name = SK_"iteration", exec = getDotIte, overhead = setOverhead) &
            , bench_type(name = SK_"recursion", exec = getDotRec, overhead = setOverhead) &
            ]

    write(*,"(*(g0,:,' '))")
    write(*,"(*(g0,:,' '))") "dot_product() vs. getDot()"
    write(*,"(*(g0,:,' '))")

    open(newunit = fileUnit, file = "main.out", status = "replace")

    call setUnifRand(array)
    !truth = array; call setCumSum(truth)
    write(fileUnit, "(*(g0,:,','))") "Array Size", (bench(ibench)%name, ibench = 1, size(bench))
    loopOverArraySize: do iarr = 2, 26, 2

        arrlen = 2**iarr
        write(*,"(*(g0,:,' '))") "Benchmarking with array size", arrlen
        do ibench = 1, size(bench)
            bench(ibench)%timing = bench(ibench)%getTiming(minsec = 0.07_RK)
        end do
        write(fileUnit,"(*(g0,:,','))") arrlen, (bench(ibench)%timing%mean, ibench = 1, size(bench))

    end do loopOverArraySize
    write(*,"(*(g0,:,' '))") dumsum
    write(*,"(*(g0,:,' '))")

    close(fileUnit)

contains

    !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    ! procedure wrappers.
    !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    subroutine setOverhead()
        call getDummy()
    end subroutine

    ! Accumulate the result so that the compiler cannot optimize the benchmarked calls away.
    subroutine getDummy()
        dumsum = dumsum + dotres
    end subroutine

    subroutine getDotFortran()
        dotres = dot_product(array(1 : arrlen), array(1 : arrlen))
        call getDummy()
    end subroutine

    subroutine getDotFAB()
        use pm_mathSum, only: getDot!, fablocked
        dotres = getDot(array(1 : arrlen), array(1 : arrlen))!, fablocked)
        call getDummy()
    end subroutine

    subroutine getDotNAB()
        use pm_mathSum, only: getDot, nablocked
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), nablocked)
        call getDummy()
    end subroutine

    subroutine getDotKAB()
        use pm_mathSum, only: getDot, kahanbabu
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), kahanbabu)
        call getDummy()
    end subroutine

    subroutine getDotIte()
        use pm_mathSum, only: getDot, iteration
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), iteration)
        call getDummy()
    end subroutine

    subroutine getDotRec()
        use pm_mathSum, only: getDot, recursion
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), recursion)
        call getDummy()
    end subroutine

end program benchmark


Example Unix compile command via Intel ifort compiler

#!/usr/bin/env sh
rm main.exe
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe


Example Windows Batch compile command via Intel ifort compiler

del main.exe
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
main.exe


Example Unix / MinGW compile command via GNU gfortran compiler

#!/usr/bin/env sh
rm main.exe
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe


Postprocessing of the benchmark output

#!/usr/bin/env python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

import os
dirname = os.path.basename(os.getcwd())

fontsize = 14

df = pd.read_csv("main.out", delimiter = ",")
colnames = list(df.columns.values)

# Plot the absolute runtime of each method against the array size.

ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
ax = plt.subplot()

for colname in colnames[1:]:
    plt.plot( df[colnames[0]].values
            , df[colname].values
            , linewidth = 2
            )

plt.xticks(fontsize = fontsize)
plt.yticks(fontsize = fontsize)
ax.set_xlabel(colnames[0], fontsize = fontsize)
ax.set_ylabel("Runtime [ seconds ]", fontsize = fontsize)
ax.set_title(" vs. ".join(colnames[1:])+"\nLower is better.", fontsize = fontsize)
ax.set_xscale("log")
ax.set_yscale("log")
plt.minorticks_on()
plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
ax.tick_params(axis = "y", which = "minor")
ax.tick_params(axis = "x", which = "minor")
ax.legend   ( colnames[1:]
           #, loc='center left'
           #, bbox_to_anchor=(1, 0.5)
            , fontsize = fontsize
            )

plt.tight_layout()
plt.savefig("benchmark." + dirname + ".runtime.png")

# Plot the runtime of each method relative to the first benchmark column.

ax = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
ax = plt.subplot()

plt.plot( df[colnames[0]].values
        , np.ones(len(df[colnames[0]].values))
        , linestyle = "--"
       #, color = "black"
        , linewidth = 2
        )
for colname in colnames[2:]:
    plt.plot( df[colnames[0]].values
            , df[colname].values / df[colnames[1]].values
            , linewidth = 2
            )

plt.xticks(fontsize = fontsize)
plt.yticks(fontsize = fontsize)
ax.set_xlabel(colnames[0], fontsize = fontsize)
ax.set_ylabel("Runtime compared to {}".format(colnames[1]), fontsize = fontsize)
ax.set_title("Runtime Ratio Comparison. Lower means faster.\nLower than 1 means faster than {}.".format(colnames[1]), fontsize = fontsize)
ax.set_xscale("log")
ax.set_yscale("log")
plt.minorticks_on()
plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
ax.tick_params(axis = "y", which = "minor")
ax.tick_params(axis = "x", which = "minor")
ax.legend   ( colnames[1:]
           #, bbox_to_anchor = (1, 0.5)
           #, loc = "center left"
            , fontsize = fontsize
            )

plt.tight_layout()
plt.savefig("benchmark." + dirname + ".runtime.ratio.png")


Visualization of the benchmark output


Benchmark moral

  1. Among all summation algorithms, fablocked_type appears to offer the most accurate result while also being even faster than the default Fortran dot_product() and all other implemented summation algorithms for array sizes > 100.
Test:
test_pm_mathSum


Final Remarks


If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
This software is distributed under the MIT license with additional terms outlined below.

  1. If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
  2. If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Author:
Amir Shahmoradi, August 8, 2024, 10:23 PM, NASA Goddard Space Flight Center, Washington, D.C.

Definition at line 1247 of file pm_mathSum.F90.


The documentation for this interface was generated from the following file: