Generate and return the expression dot_product(x, y) accurately (almost without roundoff error).

Detailed Description

Generate and return the expression dot_product(x, y) accurately (almost without roundoff error).

Parameters

[in]	x	: The input `contiguous` vector of arbitrary size, of type `complex` of any kind supported by the processor (e.g., CK, CK32, CK64, or CK128), or of type `real` of any kind supported by the processor (e.g., RK, RK32, RK64, or RK128), containing the `x` values whose dot product with `y` is to be returned.
[in]	y	: The input `contiguous` vector of the same size, type, and kind as the input `x`, containing the `y` values whose dot product with `x` is to be returned.
[in]	method	: The input scalar object that can be one of the following:
    1. The constant iteration or, equivalently, an object of type iteration_type. Specifying this option forces the use of the normal iterative approach to summing all array elements. This option is equivalent to the default implementations of the Fortran intrinsic `sum()`. This approach is the fastest serial method among all, but also generally the least accurate.
    2. The constant recursion or, equivalently, an object of type recursion_type. Specifying this option forces the use of the recursive pairwise approach to summing all array elements.
    3. The constant kahanbabu or, equivalently, an object of type kahanbabu_type. Specifying this option forces the use of the Kahan-Babuska compensated approach to summing all array elements. This algorithm, while accurate, can be 2-4 times more expensive than the iterative approach discussed above.
    4. The constant fablocked or, equivalently, an object of type fablocked_type. Specifying this option forces the use of the Fast Accurate Blocked approach to summing all array elements.
    5. The constant nablocked or, equivalently, an object of type nablocked_type. Specifying this option forces the use of the Naive Blocked approach to summing all array elements.
    The presence of this argument is merely for compile-time resolution of the procedures of this generic interface.
    (optional, default = fablocked.)
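
The first three approaches named above can be illustrated with a minimal, self-contained sketch in standard Fortran. It is not the ParaMonte implementation of getDot: the program and procedure names (demo_compensated_dot, dotIteration, dotPairwise, dotKahanBabuska, relerr) and the pairwise cutoff of 64 elements are hypothetical choices made only for this illustration, and the compensated variant corrects only the rounding error of the additions, not that of the individual products.

```fortran
! Illustrative sketch only; this is NOT the ParaMonte implementation of getDot().
! All program and procedure names below are hypothetical.
program demo_compensated_dot

    use iso_fortran_env, only: real32, real64
    implicit none

    integer, parameter :: n = 10**7
    real(real32), allocatable :: x(:), y(:)
    real(real64) :: truth
    integer :: i

    allocate(x(n), y(n))
    do i = 1, n
        x(i) = real(n - i + 1, real32) ! n, n - 1, ..., 1: all exact in real32 because n < 2**24
    end do
    y = 1._real32
    truth = 0.5_real64 * real(n, real64) * real(n + 1, real64) ! arithmetic series 1 + 2 + ... + n

    print "(a, es10.3)", "iteration relative error: ", relerr(dotIteration(x, y))
    print "(a, es10.3)", "recursion relative error: ", relerr(dotPairwise(x, y))
    print "(a, es10.3)", "kahanbabu relative error: ", relerr(dotKahanBabuska(x, y))

contains

    ! Relative error of a single-precision result with respect to the double-precision reference.
    pure function relerr(approx) result(err)
        real(real32), intent(in) :: approx
        real(real64) :: err
        err = abs(truth - real(approx, real64)) / truth
    end function

    ! Plain left-to-right accumulation of the products.
    pure function dotIteration(x, y) result(dot)
        real(real32), intent(in) :: x(:), y(:)
        real(real32) :: dot
        integer :: i
        dot = 0._real32
        do i = 1, size(x)
            dot = dot + x(i) * y(i)
        end do
    end function

    ! Recursive pairwise accumulation: split in halves, sum each half, add the two partial sums.
    pure recursive function dotPairwise(x, y) result(dot)
        real(real32), intent(in) :: x(:), y(:)
        real(real32) :: dot
        integer :: half
        if (size(x) <= 64) then ! hypothetical cutoff below which plain iteration is used
            dot = dotIteration(x, y)
        else
            half = size(x) / 2
            dot = dotPairwise(x(:half), y(:half)) + dotPairwise(x(half + 1:), y(half + 1:))
        end if
    end function

    ! Kahan-Babuska (Neumaier) compensation: track the rounding error of every addition.
    pure function dotKahanBabuska(x, y) result(dot)
        real(real32), intent(in) :: x(:), y(:)
        real(real32) :: dot, comp, term, total
        integer :: i
        dot = 0._real32
        comp = 0._real32
        do i = 1, size(x)
            term = x(i) * y(i)
            total = dot + term
            if (abs(dot) >= abs(term)) then
                comp = comp + ((dot - total) + term) ! low-order part of term was lost
            else
                comp = comp + ((term - total) + dot) ! low-order part of dot was lost
            end if
            dot = total
        end do
        dot = dot + comp ! apply the accumulated correction once at the end
    end function

end program demo_compensated_dot
```

Compiler options that permit floating-point reassociation (for example, fast-math modes) can remove the compensation term, so a sketch like this is best compiled with value-safe floating-point settings.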

Returns: dotres : The output scalar of the same type and kind as the input `x`, containing the result of the dot product of the input x and y vectors.

Possible calling interfaces

use pm_mathSum, only: getDot, iteration, recursion, kahanbabu, fablocked, nablocked

dotres = getDot(x, y)
dotres = getDot(x, y, method)

Warning: The returned value is 0 if the condition size(x) == 0 holds.
The condition size(x) == size(y) must hold for the corresponding input arguments.
This condition is verified only if the library is built with the preprocessor macro CHECK_ENABLED=1.
The pure procedure(s) documented herein become impure when the ParaMonte library is compiled with the preprocessor macro CHECK_ENABLED=1.
By default, these procedures are pure in release builds and impure in debug and testing builds.
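
Because the size condition is not verified in builds without CHECK_ENABLED=1, a caller may prefer to verify it before the call. The following is a minimal sketch of such a caller-side guard. The names demo_checked_dot and checkedDot are hypothetical, and the Fortran intrinsic dot_product() stands in for getDot() so that the sketch compiles without the library.

```fortran
! Illustrative sketch only; checkedDot is a hypothetical helper, not part of ParaMonte.
program demo_checked_dot

    use iso_fortran_env, only: real64
    implicit none

    real(real64) :: x(3), y(3), empty(0)

    x = [1._real64, 2._real64, 3._real64]
    y = [4._real64, 5._real64, 6._real64]

    print *, checkedDot(x, y)         ! 1*4 + 2*5 + 3*6 = 32
    print *, checkedDot(empty, empty) ! zero-size input: the result is 0

contains

    function checkedDot(x, y) result(dot)
        real(real64), intent(in) :: x(:), y(:)
        real(real64) :: dot
        ! Verify the documented precondition explicitly on the caller side.
        if (size(x) /= size(y)) error stop "checkedDot: size(x) /= size(y)"
        dot = dot_product(x, y) ! stand-in for getDot(x, y)
    end function

end program demo_checked_dot
```

The guard is not pure because of the `error stop` statement in a non-pure function, which mirrors the trade-off described above: checking costs purity.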

See also: get1mexp
getLog1p
getCumSum
getLogAddExp
getLogSubExp
getLogSumExp

Example usage

program example

    use pm_kind, only: SK, IK, LK, RKH
    use pm_distUnif, only: setUnifRand
    use pm_arrayRank, only: getRankDense
    use pm_arraySpace, only: setLinSpace
    use pm_mathSum, only: getDot, iteration, recursion, kahanbabu, fablocked, nablocked
    use pm_io, only: display_type

    implicit none

    real(RKH) :: truth
    real(RKH), allocatable :: dotres(:), relerr(:)
    type(display_type) :: disp

    disp = display_type(file = "main.out.F90")

    block
        use pm_kind, only: TKG => RKS
        integer(IK), parameter :: lenx = 10**7
        real(TKG) :: x(lenx), y(lenx), lb, ub
        call disp%skip()
        call disp%show("lenx")
        call disp%show( lenx )
        call disp%show("lb = real(lenx, TKG); ub = 1._TKG")
                        lb = real(lenx, TKG); ub = 1._TKG
        call disp%show("call setLinSpace(x, lb, ub) ! call setUnifRand(x)")
                        call setLinSpace(x, lb, ub) ! call setUnifRand(x)
        call disp%show("y = 1 !/ x")
                        y = 1 !/ x
        call disp%show("truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison")
                        truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
        call disp%show("truth")
        call disp%show( truth )
        call disp%show("dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]")
                        dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
        call disp%show("dotres")
        call disp%show( dotres )
        call disp%show("relerr = abs(truth - dotres) / truth")
                        relerr = abs(truth - dotres) / truth
        call disp%show("relerr")
        call disp%show( relerr )
        call disp%show("getRankDense(relerr)")
        call disp%show( getRankDense(relerr) )
        call disp%skip()
    end block

    block
        use pm_kind, only: TKG => RKD
        integer(IK), parameter :: lenx = 10**8
        real(TKG) :: x(lenx), y(lenx), lb, ub
        call disp%skip()
        call disp%show("lenx")
        call disp%show( lenx )
        call disp%show("lb = real(lenx, TKG); ub = 1._TKG")
                        lb = real(lenx, TKG); ub = 1._TKG
        call disp%show("call setLinSpace(x, lb, ub) ! call setUnifRand(x)")
                        call setLinSpace(x, lb, ub) ! call setUnifRand(x)
        call disp%show("y = 1 !/ x")
                        y = 1 !/ x
        call disp%show("truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison")
                        truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
        call disp%show("truth")
        call disp%show( truth )
        call disp%show("dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]")
                        dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
        call disp%show("dotres")
        call disp%show( dotres )
        call disp%show("relerr = abs(truth - dotres) / truth")
                        relerr = abs(truth - dotres) / truth
        call disp%show("relerr")
        call disp%show( relerr )
        call disp%show("getRankDense(relerr)")
        call disp%show( getRankDense(relerr) )
        call disp%skip()
    end block

    block
        call disp%skip()
        call disp%show("[getDot([real ::], [real ::]), getDot([real :: 1], [real :: 1]), getDot([real :: 1, 1], [real :: 1, 1]), getDot([real :: 1, 1, 1], [real :: 1, 1, 1])]")
        call disp%show( [getDot([real ::], [real ::]), getDot([real :: 1], [real :: 1]), getDot([real :: 1, 1], [real :: 1, 1]), getDot([real :: 1, 1, 1], [real :: 1, 1, 1])] )
        call disp%skip()
    end block

end program example
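
The reference value `truth` used in this example follows from the arithmetic-series formula. A short derivation is given here, assuming that `setLinSpace(x, lb, ub)` fills `x` with n = lenx evenly spaced values that include both end points lb and ub. That assumption is consistent with the reference values printed in the example output, but it is not taken from the `setLinSpace` documentation.

```latex
\begin{align}
x_i &= \mathrm{lb} + (i - 1)\,\frac{\mathrm{ub} - \mathrm{lb}}{n - 1},
  \qquad y_i = 1, \qquad i = 1, \dots, n, \\
x \cdot y &= \sum_{i=1}^{n} x_i\, y_i = \sum_{i=1}^{n} x_i
  = \frac{n\,(\mathrm{lb} + \mathrm{ub})}{2}, \\
n = 10^{7},\ \mathrm{lb} = 10^{7},\ \mathrm{ub} = 1 &: \quad
  x \cdot y = \frac{10^{7}\,(10^{7} + 1)}{2} = 50000005000000, \\
n = 10^{8},\ \mathrm{lb} = 10^{8},\ \mathrm{ub} = 1 &: \quad
  x \cdot y = \frac{10^{8}\,(10^{8} + 1)}{2} = 5000000050000000.
\end{align}
```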

Example Unix compile command via Intel ifort compiler

#!/usr/bin/env sh
rm main.exe
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example Windows Batch compile command via Intel ifort compiler

del main.exe
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
main.exe

Example Unix / MinGW compile command via GNU gfortran compiler

#!/usr/bin/env sh
rm main.exe
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example output

lenx
+10000000
lb = real(lenx, TKG); ub = 1._TKG
call setLinSpace(x, lb, ub) ! call setUnifRand(x)
y = 1 !/ x
truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
truth
+50000005000000.0000000000000000000000
dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
dotres
+50000000188416.0000000000000000000000, +49423623127040.0000000000000000000000, +50000008577024.0000000000000000000000, +50000004382720.0000000000000000000000, +50000000188416.0000000000000000000000, +49999983411200.0000000000000000000000
relerr = abs(truth - dotres) / truth
relerr
+0.962316703768329623167037683296231645E-7, +0.115276363064363693563630643636935635E-1, +0.715404728459527154047284595271540462E-7, +0.123455987654401234559876544012345606E-7, +0.962316703768329623167037683296231645E-7, +0.431775956822404317759568224043177598E-6
getRankDense(relerr)
+3, +5, +2, +1, +3, +4


lenx
+100000000
lb = real(lenx, TKG); ub = 1._TKG
call setLinSpace(x, lb, ub) ! call setUnifRand(x)
y = 1 !/ x
truth = (real(ub, TKG) + real(lb, RKH)) * size(x, 1, IK) / 2 ! reference high-precision value for comparison
truth
+5000000050000000.00000000000000000000
dotres = [getDot(x, y), getDot(x, y, iteration), getDot(x, y, recursion), getDot(x, y, kahanbabu), getDot(x, y, fablocked), getDot(x, y, nablocked)]
dotres
+5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000, +5000000050000000.00000000000000000000
relerr = abs(truth - dotres) / truth
relerr
+0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000, +0.00000000000000000000000000000000000
getRankDense(relerr)
+1, +1, +1, +1, +1, +1


[getDot([real ::], [real ::]), getDot([real :: 1], [real :: 1]), getDot([real :: 1, 1], [real :: 1, 1]), getDot([real :: 1, 1, 1], [real :: 1, 1, 1])]
+0.00000000, +1.00000000, +2.00000000, +3.00000000
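
The single-precision results in the first part of this output differ from one another by whole multiples of one floating-point spacing. A short worked calculation follows, assuming that RKS is the IEEE 754 binary32 format with a 24-bit significand. This is an assumption, since the kind is platform dependent.

```latex
\begin{align}
2^{45} \le 5.0000005 \times 10^{13} < 2^{46}
  \;&\Longrightarrow\; \text{spacing} = 2^{45 - 23} = 2^{22} = 4194304, \\
50000004382720 - 50000000188416 &= 4194304, \\
50000008577024 - 50000004382720 &= 4194304, \\
\left| 50000005000000 - 50000004382720 \right| &= 617280 < \tfrac{1}{2} \cdot 4194304 .
\end{align}
```

Under that assumption, the kahanbabu result is the representable value closest to the reference value, so its relative error of about 1.2E-8 is the smallest attainable in this kind for this sum, while the fablocked and recursion results each lie one spacing away from it.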

Benchmarks:

Benchmark :: The effects of method on runtime efficiency

The following program compares the runtime performance of the getDot algorithms with that of the default Fortran dot_product() intrinsic function.

! Test the performance of `getDot()` with and without the `method` argument.
program benchmark
 
    use pm_bench, only: bench_type
    use pm_distUnif, only: setUnifRand
    use pm_mathCumSum, only: setCumSum
    use pm_arrayResize, only: setResized
    use pm_kind, only: SK, IK, LK, RKH, RK, RKG => RKD
    use iso_fortran_env, only: error_unit
 
    implicit none
 
    integer(IK)                         :: ibench
    integer(IK)                         :: iarr
    integer(IK)                         :: arrlen
    integer(IK)                         :: fileUnit
    real(RKG)                           :: dumsum = 0._RKG
    real(RKG)                           :: array(10**8)
    real(RKG)                           :: dotres = 0._RKG ! initialized because setOverhead() may read it before any getDot wrapper has run.
    type(bench_type)    , allocatable   :: bench(:)
    logical(LK)                         :: underflowEnabled
 
    bench = [ bench_type(name = SK_"dot_product()", exec = getDotFortran, overhead = setOverhead) &
            , bench_type(name = SK_"fablocked", exec = getDotFAB, overhead = setOverhead) &
            , bench_type(name = SK_"nablocked", exec = getDotNAB, overhead = setOverhead) &
            , bench_type(name = SK_"kahanbabu", exec = getDotKAB, overhead = setOverhead) &
            , bench_type(name = SK_"iteration", exec = getDotIte, overhead = setOverhead) &
            , bench_type(name = SK_"recursion", exec = getDotRec, overhead = setOverhead) &
            ]
 
    write(*,"(*(g0,:,' '))")
    write(*,"(*(g0,:,' '))") "dot_product() vs. getDot()"
    write(*,"(*(g0,:,' '))")
 
    open(newunit = fileUnit, file = "main.out", status = "replace")
 
        call setUnifRand(array)
        !truth = array; call setCumSum(truth)
        write(fileUnit, "(*(g0,:,','))") "Array Size", (bench(ibench)%name, ibench = 1, size(bench))
        loopOverArraySize: do iarr = 2, 26, 2
 
            arrlen = 2**iarr
            write(*,"(*(g0,:,' '))") "Benchmarking with array size", arrlen
            do ibench = 1, size(bench)
                bench(ibench)%timing = bench(ibench)%getTiming(minsec = 0.07_RK)
            end do
            write(fileUnit,"(*(g0,:,','))") arrlen, (bench(ibench)%timing%mean, ibench = 1, size(bench))
 
        end do loopOverArraySize
        write(*,"(*(g0,:,' '))") dumsum
        write(*,"(*(g0,:,' '))")
 
    close(fileUnit)
 
contains
 
    !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    ! procedure wrappers.
    !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
    subroutine setOverhead()
        call getDummy()
    end subroutine
 
    subroutine getDummy()
        dumsum = dumsum + dotres
    end subroutine
 
    subroutine getDotFortran()
        dotres = dot_product(array(1 : arrlen), array(1 : arrlen))
        call getDummy()
    end subroutine
 
    subroutine getDotFAB()
        use pm_mathSum, only: getDot!, fablocked
        dotres = getDot(array(1 : arrlen), array(1 : arrlen))!, fablocked)
        call getDummy()
    end subroutine
 
    subroutine getDotNAB()
        use pm_mathSum, only: getDot, nablocked
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), nablocked)
        call getDummy()
    end subroutine
 
    subroutine getDotKAB()
        use pm_mathSum, only: getDot, kahanbabu
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), kahanbabu)
        call getDummy()
    end subroutine
 
    subroutine getDotIte()
        use pm_mathSum, only: getDot, iteration
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), iteration)
        call getDummy()
    end subroutine
 
    subroutine getDotRec()
        use pm_mathSum, only: getDot, recursion
        dotres = getDot(array(1 : arrlen), array(1 : arrlen), recursion)
        call getDummy()
    end subroutine
 
end program benchmark

Example Unix compile command via Intel ifort compiler

#!/usr/bin/env sh
rm main.exe
ifort -fpp -standard-semantics -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Example Windows Batch compile command via Intel ifort compiler

del main.exe
set PATH=..\..\..\lib;%PATH%
ifort /fpp /standard-semantics /O3 /I:..\..\..\include main.F90 ..\..\..\lib\libparamonte*.lib /exe:main.exe
main.exe

Example Unix / MinGW compile command via GNU gfortran compiler

#!/usr/bin/env sh
rm main.exe
gfortran -cpp -ffree-line-length-none -O3 -Wl,-rpath,../../../lib -I../../../inc main.F90 ../../../lib/libparamonte* -o main.exe
./main.exe

Postprocessing of the benchmark output

#!/usr/bin/env python
 
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
 
import os
dirname = os.path.basename(os.getcwd()) 
 
fontsize = 14
 
df = pd.read_csv("main.out", delimiter = ",")
colnames = list(df.columns.values)
 
 
 
fig = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
ax = plt.subplot()
 
for colname in colnames[1:]:
    plt.plot( df[colnames[0]].values
            , df[colname].values
            , linewidth = 2
            )
 
plt.xticks(fontsize = fontsize)
plt.yticks(fontsize = fontsize)
ax.set_xlabel(colnames[0], fontsize = fontsize)
ax.set_ylabel("Runtime [ seconds ]", fontsize = fontsize)
ax.set_title(" vs. ".join(colnames[1:])+"\nLower is better.", fontsize = fontsize)
ax.set_xscale("log")
ax.set_yscale("log")
plt.minorticks_on()
plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
ax.tick_params(axis = "y", which = "minor")
ax.tick_params(axis = "x", which = "minor")
ax.legend   ( colnames[1:]
           #, loc='center left'
           #, bbox_to_anchor=(1, 0.5)
            , fontsize = fontsize
            )
 
plt.tight_layout()
plt.savefig("benchmark." + dirname + ".runtime.png")
 
 
 
fig = plt.figure(figsize = 1.25 * np.array([6.4,4.6]), dpi = 200)
ax = plt.subplot()
 
plt.plot( df[colnames[0]].values
        , np.ones(len(df[colnames[0]].values))
        , linestyle = "--"
       #, color = "black"
        , linewidth = 2
        )
for colname in colnames[2:]:
    plt.plot( df[colnames[0]].values
            , df[colname].values / df[colnames[1]].values
            , linewidth = 2
            )
 
plt.xticks(fontsize = fontsize)
plt.yticks(fontsize = fontsize)
ax.set_xlabel(colnames[0], fontsize = fontsize)
ax.set_ylabel("Runtime compared to {}".format(colnames[1]), fontsize = fontsize)
ax.set_title("Runtime Ratio Comparison. Lower means faster.\nLower than 1 means faster than {}.".format(colnames[1]), fontsize = fontsize)
ax.set_xscale("log")
ax.set_yscale("log")
plt.minorticks_on()
plt.grid(visible = True, which = "both", axis = "both", color = "0.85", linestyle = "-")
ax.tick_params(axis = "y", which = "minor")
ax.tick_params(axis = "x", which = "minor")
ax.legend   ( colnames[1:]
           #, bbox_to_anchor = (1, 0.5)
           #, loc = "center left"
            , fontsize = fontsize
            )
 
plt.tight_layout()
plt.savefig("benchmark." + dirname + ".runtime.ratio.png")

Visualization of the benchmark output

[Figures: the runtime plot and the runtime-ratio plot written by the postprocessing script above.]

Benchmark moral

Among all summation algorithms, fablocked_type appears to offer the most accurate result while also being faster than the default Fortran dot_product() and all other implemented summation algorithms for array sizes > 100.

Test: test_pm_mathSum

Final Remarks

If you believe this algorithm or its documentation can be improved, we appreciate your contribution and help to edit this page's documentation and source file on GitHub.
For details on the naming abbreviations, see this page.
For details on the naming conventions, see this page.
This software is distributed under the MIT license with additional terms outlined below.

If you use any parts or concepts from this library to any extent, please acknowledge the usage by citing the relevant publications of the ParaMonte library.
If you regenerate any parts/ideas from this library in a programming environment other than those currently supported by this ParaMonte library (i.e., other than C, C++, Fortran, MATLAB, Python, R), please also ask the end users to cite this original ParaMonte library.

This software is available to the public under a highly permissive license.
Help us justify its continued development and maintenance by acknowledging its benefit to society, distributing it, and contributing to it.

Copyright: Computational Data Science Lab

Author: Amir Shahmoradi, August 8, 2024, 10:23 PM, NASA Goddard Space Flight Center, Washington, D.C.

Definition at line 1247 of file pm_mathSum.F90.

The documentation for this interface was generated from the following file:

src/fortran/main/pm_mathSum.F90