Current view: top level - main - pm_parallelism.F90 (source / functions)
Test: ParaMonte 2.0.0 :: Serial Fortran - Code Coverage Report
Date: 2024-04-08 03:18:57
Coverage: Lines: 17 / 23 (73.9 %) | Functions: 5 / 8 (62.5 %)
Legend: Lines: hit | not hit

          Line data    Source code
       1             : !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
       2             : !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
       3             : !!!!                                                                                                                            !!!!
       4             : !!!!    ParaMonte: Parallel Monte Carlo and Machine Learning Library.                                                           !!!!
       5             : !!!!                                                                                                                            !!!!
       6             : !!!!    Copyright (C) 2012-present, The Computational Data Science Lab                                                          !!!!
       7             : !!!!                                                                                                                            !!!!
       8             : !!!!    This file is part of the ParaMonte library.                                                                             !!!!
       9             : !!!!                                                                                                                            !!!!
      10             : !!!!    LICENSE                                                                                                                 !!!!
      11             : !!!!                                                                                                                            !!!!
      12             : !!!!       https://github.com/cdslaborg/paramonte/blob/main/LICENSE.md                                                          !!!!
      13             : !!!!                                                                                                                            !!!!
      14             : !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      15             : !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      16             : 
      17             : !>  \brief
       18             : !>  This module contains procedures and generic interfaces for facilitating parallel computations and for measuring the performance of parallel Coarray/MPI/OpenMP algorithms.
      19             : !>
      20             : !>  \details
       21             : !>  A primary design goal of the types and procedures in this module is to facilitate parallelism-agnostic coding within a codebase.<br>
      22             : !>  As such, the attributes and procedures within this module are guaranteed to also work in serial applications.<br>
      23             : !>  However, the procedures of this module typically do nothing when the application is serial or the parallelism is not recognized.<br>
      24             : !>
      25             : !>  \test
      26             : !>  [test_pm_parallelism](@ref test_pm_parallelism)
      27             : !>
      28             : !>  \finmain
      29             : !>
      30             : !>  \author
      31             : !>  \AmirShahmoradi, Tuesday March 7, 2017, 3:50 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
      32             : 
      33             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      34             : 
      35             : module pm_parallelism
      36             : 
      37             :     use pm_kind, only: SK, IK, LK
      38             :     use pm_val2str, only: getStr
      39             : 
      40             :     implicit none
      41             : 
      42             :     character(*, SK), parameter :: MODULE_NAME = SK_"@pm_parallelism"
      43             : 
      44             : #if CAF_ENABLED || MPI_ENABLED
      45             :     !>  \brief
       46             :     !>  The scalar constant of type `character` of default kind \SK, whose value is set to `"parallel"`
       47             :     !>  if the ParaMonte library is built with either of the preprocessor macros `-DCAF_ENABLED=1` or `-DMPI_ENABLED=1`,
       48             :     !>  otherwise, it is set to `"serial"`.<br>
      49             :     character(*, SK), parameter :: PARALLELIZATION_MODE = SK_"parallel"
      50             : #else
      51             :     character(*, SK), parameter :: PARALLELIZATION_MODE = SK_"serial"
      52             : #endif
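                      : 
                      :     ! A minimal, hypothetical usage sketch (not part of the library; `setup_comm` is an assumed user-defined procedure):
                      :     ! the build-mode constant above can be used to branch on the parallelism of the current build at runtime.
                      :     !
                      :     !     use pm_parallelism, only: PARALLELIZATION_MODE
                      :     !     if (PARALLELIZATION_MODE == "parallel") call setup_comm() ! set up inter-process communication only in parallel builds.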
      53             : 
      54             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      55             : 
      56             :     !>  \brief
      57             :     !>  This is the [imageis_type](@ref pm_parallelism::imageis_type) type for generating objects with components
      58             :     !>  of type `logical` of default kind \LK that contain information about the current image/processor/thread.<br>
      59             :     !>
      60             :     !>  \details
      61             :     !>  Objects of this type are not meant to be used directly by the end user.<br>
      62             :     !>  This type merely exists to create the `is` component of the [image_type](@ref pm_parallelism::image_type) class.<br>
      63             :     !>
      64             :     !>  \interface{imageis_type}
      65             :     !>  \code{.F90}
      66             :     !>
      67             :     !>      use pm_parallelism, only: imageis_type
      68             :     !>      type(imageis_type) :: imageis
      69             :     !>
      70             :     !>      imageis = imageis_type()
      71             :     !>
      72             :     !>  \endcode
      73             :     !>
      74             :     !>  \finmain{imageis_type}
      75             :     !>
      76             :     !>  \author
      77             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 3:50 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
      78             :     type                                :: imageis_type
      79             :         logical(LK)                     :: first    = .false._LK    !<  \public The scalar `logical` of default kind \LK indicating whether the current process is ID #1.
      80             :         logical(LK)                     :: extra    = .false._LK    !<  \public The scalar `logical` of default kind \LK indicating whether the current process is NOT ID #1.
       81             :         logical(LK)                     :: leader   = .false._LK    !<  \public The scalar `logical` of default kind \LK indicating whether the current process is a leader.<br>
       82             :                                                                     !!          The default value is `.false._LK`.<br>
       83             :                                                                     !!          **It must be set by the user after calling the type constructor, depending on the parallelism type** (conventionally, the image with ID `1` is a leader).<br>
       84             :         logical(LK)                     :: rooter   = .false._LK    !<  \public The scalar `logical` of default kind \LK indicating whether the current process is a follower.<br>
       85             :                                                                     !!          The default value is `.false._LK`.<br>
       86             :                                                                     !!          **It must be set by the user after calling the type constructor, depending on the parallelism type** (conventionally, images with IDs other than `1` are followers).<br>
      87             :     end type
      88             : 
      89             :     !>  \brief
      90             :     !>  This is the [image_type](@ref pm_parallelism::image_type) type for generating objects that contain
      91             :     !>  information about the current image/processor/thread and facilitate its synchronization with other processes,
      92             :     !>  or the global finalization of all inter-process parallel communications (e.g., as is done in MPI applications).
      93             :     !>
      94             :     !>  \details
      95             :     !>  <ol>
      96             :     !>      <li>    Within the Fortran terminology, an **image** (or **process**) is equivalent to a replication of the program having its own set of data objects.<br>
       97             :     !>      <li>    The number of images may be equal to, greater than, or less than the number of available physical processors.
      98             :     !>      <li>    A particular implementation may permit the number of images to be chosen at compile time, at link time, or at execution time.
      99             :     !>      <li>    Each image executes asynchronously, and the normal rules of Fortran apply within each image.
     100             :     !>      <li>    The execution sequence can differ from image to image as specified by the programmer who, with the help
     101             :     !>              of a unique image index, determines the actual path using normal Fortran control constructs and explicit synchronizations.
     102             :     !>  </ol>
     103             :     !>
     104             :     !>  \return
     105             :     !>  `image` :   The output scalar of type [image_type](@ref pm_parallelism::image_type).
     106             :     !>
     107             :     !>  \interface{image_type}
     108             :     !>  \code{.F90}
     109             :     !>
     110             :     !>      use pm_parallelism, only: image_type
     111             :     !>      type(image_type) :: image
     112             :     !>
     113             :     !>      image = image_type()
     114             :     !>
     115             :     !>  \endcode
     116             :     !>
     117             :     !>  \warning
     118             :     !>  Even when the application is MPI/OpenMP-parallel, the `id` assigned to the zeroth process is `1` in this type.<br>
     119             :     !>  This convention follows the rules of the Fortran Coarray parallel programming language.<br>
     120             :     !>
     121             :     !>  \warning
      122             :     !>  For **OpenMP-enabled** ParaMonte library builds, this routine returns the value of the OpenMP intrinsic `omp_get_num_threads()` routine,
      123             :     !>  that is, the number of threads in the team executing the parallel region to which the routine binds.<br>
     124             :     !>  If this constructor is called from the sequential part of an OpenMP-enabled program, this routine returns `1`.<br>
     125             :     !>  To return a value larger than `1`, this routine must be called from within an OpenMP-enabled parallel region.<br>
     126             :     !>  The desired number of OpenMP threads can be set at runtime via [setImageCount()](@ref pm_parallelism::setImageCount),
     127             :     !>  or directly via the OpenMP intrinsic subroutine `omp_set_num_threads()`.<br>
     128             :     !>
     129             :     !>  \finmain{image_type}
     130             :     !>
     131             :     !>  \author
     132             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 3:50 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     133             :     type                                :: image_type
      134             :         integer(IK)                     :: count        = -huge(1_IK)       !<  \public The scalar `integer` of default kind \IK representing the total count of parallel processes available at runtime within the current communication context.
     135             :         integer(IK)                     :: id           = -huge(1_IK)       !<  \public The scalar `integer` of default kind \IK representing the ID of the runtime parallel process starting with `1`: `1`, `2`, `3`, ...
     136             :         type(imageis_type)              :: is           = imageis_type()    !<  \public The scalar of type [imageis_type](@ref pm_parallelism::imageis_type) containing `logical` components that signify the current image role.
     137             :         character(:, SK), allocatable   :: label                            !<  \public The `allocatable` scalar `character` of default kind \SK containing the ID of the current process in the format `@process(ID)`.
     138             :         contains
     139             :         procedure, nopass               :: sync => setImageSynced
     140             :         procedure, nopass               :: finalize => setImageFinalized
     141             :     end type
     142             : 
     143             :     !>  \cond excluded
     144             :     interface image_type
     145             :         module procedure :: constructImage
     146             :     end interface
     147             :     !>  \endcond excluded
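                      : 
                      :     ! A hypothetical usage sketch (assumption: the leader/rooter assignment below follows the
                      :     ! convention described in the component documentation; it is not set by the constructor):
                      :     !
                      :     !     use pm_parallelism, only: image_type
                      :     !     type(image_type) :: image
                      :     !     image = image_type()
                      :     !     image%is%leader = image%is%first           ! conventionally, image #1 leads.
                      :     !     image%is%rooter = .not. image%is%leader    ! all other images follow.
                      :     !     call image%sync()                          ! synchronize all images.
                      :     !     call image%finalize()                      ! finalize inter-process communications before exit.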
     148             : 
     149             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     150             : 
     151             :     !>  \brief
     152             :     !>  Return the predicted parallel Fork-Join speedup scaling behavior for simulations whose image
     153             :     !>  contributions are stochastically accepted only from the first successful image starting from image ID `1`.
     154             :     !>
     155             :     !>  \details
     156             :     !>  This generic interface can be used to compute theoretical speedup gained by a
     157             :     !>  Fork-Join parallel simulation where the probability of contribution of an image to the simulation
     158             :     !>  is less than `1`, meaning that image contributions are stochastic at each step of the Fork-Join cycle.<br>
     159             :     !>  If the image contributions are inspected sequentially from the first image to last and only the first successful image contribution
     160             :     !>  is kept in the simulation, then it can be shown that the contribution of the images to the parallel Fork-Join simulation follows a [Cyclic Geometric Distribution](@ref pm_distGeomCyclic).<br>
     161             :     !>  See [Amir Shahmoradi, Fatemeh Bagheri (2020). ParaDRAM: A Cross-Language Toolbox for Parallel High-Performance Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo Simulations.](https://www.cdslab.org/pubs/2020_Shahmoradi_I.pdf)
     162             :     !>  for theoretical details.<br>
     163             :     !>
     164             :     !>  \param[in]      conProb         :   The input scalar of,
     165             :     !>                                      <ol>
     166             :     !>                                          <li>    type `real` of kind \RKALL,
     167             :     !>                                      </ol>
     168             :     !>                                      containing the average probability of contribution of each image to the Fork-Join simulation.<br>
     169             :     !>                                      For example, `conProb` can be the effective acceptance rate in parallel Fork-Join Monte Carlo or MCMC sampling simulations
      170             :     !>                                      where each image is tasked with returning a single proposal state for subsequent evaluation, and only the first proposal to be accepted
      171             :     !>                                      is used in the next simulation step.<br>
     172             :     !>  \param[in]      seqSecTime      :   The input scalar of the same type and kind as `conProb` containing the time (in seconds) **per image (or process or thread)** of the inherently-sequential sections of the entire Fork-Join simulation.<br>
     173             :     !>  \param[in]      parSecTime      :   The input scalar of the same type and kind as `conProb` containing the time (in seconds) **per image (or process or thread)** of the inherently-parallel sections of the entire Fork-Join simulation.<br>
     174             :     !>  \param[in]      comSecTime      :   The input scalar of the same type and kind as `conProb` containing the time (in seconds) **per image (or process or thread)** of the **pairwise** inter-process communication sections of the entire Fork-Join simulation.<br>
     175             :     !>                                      Assuming all rooter images communicate only with the main image and the entire communication takes `overhead` seconds, then `comSecTime = overhead / (nproc - 1)` with `nproc` representing the total number of images.<br>
     176             :     !>  \param[out]     scaling         :   The output `allocatable` vector of the same type and kind as the input `conProb` containing the predicted speedup scaling of
     177             :     !>                                      the stochastic Fork-Join simulation under a range of possible image (or thread or process) counts given in the output `numproc`.<br>
      178             :     !>                                      On output, the vector `scaling` is resized until the maximum speedup and the corresponding image count are identified.<br>
     179             :     !>                                      The `i`th element of `scaling` represents theoretically-predicted speedup if `numproc(i)` images (threads or processes) were used in the parallel Fork-Join simulation.<br>
     180             :     !>  \param[out]     numproc         :   The output `allocatable` vector of type `integer` of default kind \IK containing the
     181             :     !>                                      number of images for the speedup reported in the corresponding element of `scaling`.<br>
      182             :     !>                                      If the input argument `comSecTime` is positive, then `numproc` is simply a linear range of integers starting with `1` with a jump size of `1`.<br>
      183             :     !>                                      If the input argument `comSecTime` is zero, then `numproc` is a log(2)-linear range of integers starting with `1`, with a jump size of `1` in log(2) space.<br>
      184             :     !>  \param[out]     scalingMaxVal   :   The output scalar of the same type and kind as the input `conProb` containing the maximum achievable speedup (corresponding to `maxval(scaling)`).<br>
      185             :     !>  \param[out]     scalingMaxLoc   :   The output scalar of type `integer` of default kind \IK containing the index of the element of the output `scaling` containing `scalingMaxVal` (i.e., `maxloc(scaling)`).<br>
      186             :     !>  \param[in]      scalingMinLen   :   The input positive scalar of type `integer` of default kind \IK containing the minimum size of the output `scaling`.<br>
     187             :     !>                                      (**optional**, default = `int(2 / max(conProb, 0.001))`)
     188             :     !>
     189             :     !>  \interface{setForkJoinScaling}
     190             :     !>  \code{.F90}
     191             :     !>
     192             :     !>      use pm_parallelism, only: setForkJoinScaling
     193             :     !>
     194             :     !>      call setForkJoinScaling(conProb, seqSecTime, parSecTime, comSecTime, scaling, numproc, scalingMaxVal, scalingMaxLoc, scalingMinLen = scalingMinLen)
     196             :     !>  \endcode
     197             :     !>
     198             :     !>  \warning
     199             :     !>  The condition `0 <= seqSecTime` must hold for the corresponding input arguments.<br>
     200             :     !>  The condition `0 <= parSecTime` must hold for the corresponding input arguments.<br>
     201             :     !>  The condition `0 <= comSecTime` must hold for the corresponding input arguments.<br>
     202             :     !>  The condition `0 <= scalingMinLen` must hold for the corresponding input arguments.<br>
     203             :     !>  The condition `0 <= conProb .and. conProb <= 1` must hold for the corresponding input arguments.<br>
     204             :     !>  \vericons
     205             :     !>
     206             :     !>  \see
     207             :     !>  [pm_sampling](@ref pm_sampling)<br>
     208             :     !>  [pm_distGeomCyclic](@ref pm_distGeomCyclic)<br>
     209             :     !>  [Amir Shahmoradi, Fatemeh Bagheri (2020). ParaDRAM: A Cross-Language Toolbox for Parallel High-Performance Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo Simulations.](https://www.cdslab.org/pubs/2020_Shahmoradi_I.pdf)<br>
     210             :     !>
     211             :     !>  \example{setForkJoinScaling}
     212             :     !>  \include{lineno} example/pm_parallelism/setForkJoinScaling/main.F90
     213             :     !>  \compilef{setForkJoinScaling}
     214             :     !>  \output{setForkJoinScaling}
     215             :     !>  \include{lineno} example/pm_parallelism/setForkJoinScaling/main.out.F90
     216             :     !>  \postproc{setForkJoinScaling}
     217             :     !>  \include{lineno} example/pm_parallelism/setForkJoinScaling/main.py
     218             :     !>  \vis{setForkJoinScaling}
     219             :     !>  \image html pm_parallelism/setForkJoinScaling/setForkJoinScaling.png width=700
     220             :     !>
     221             :     !>  \test
     222             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     223             :     !>
     224             :     !>  \finmain{setForkJoinScaling}
     225             :     !>
     226             :     !>  \author
     227             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     228             : 
     229             :     interface setForkJoinScaling
     230             : 
     231             :     !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     232             : 
     233             : #if RK5_ENABLED
     234             :     PURE module subroutine setForkJoinScaling_RK5(conProb, seqSecTime, parSecTime, comSecTime, scaling, numproc, scalingMaxVal, scalingMaxLoc, scalingMinLen)
     235             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     236             :         !DEC$ ATTRIBUTES DLLEXPORT :: setForkJoinScaling_RK5
     237             : #endif
     238             :         use pm_kind, only: RKC => RK5
     239             :         integer(IK)         , intent(out)                   :: scalingMaxLoc
     240             :         integer(IK)         , intent(in)    , optional      :: scalingMinLen
     241             :         integer(IK)         , intent(out)   , allocatable   :: numproc(:)
     242             :         real(RKC)           , intent(out)   , allocatable   :: scaling(:)
     243             :         real(RKC)           , intent(in)                    :: conProb, seqSecTime, parSecTime, comSecTime
     244             :         real(RKC)           , intent(out)                   :: scalingMaxVal
     245             :     end subroutine
     246             : #endif
     247             : 
     248             : #if RK4_ENABLED
     249             :     PURE module subroutine setForkJoinScaling_RK4(conProb, seqSecTime, parSecTime, comSecTime, scaling, numproc, scalingMaxVal, scalingMaxLoc, scalingMinLen)
     250             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     251             :         !DEC$ ATTRIBUTES DLLEXPORT :: setForkJoinScaling_RK4
     252             : #endif
     253             :         use pm_kind, only: RKC => RK4
     254             :         integer(IK)         , intent(out)                   :: scalingMaxLoc
     255             :         integer(IK)         , intent(in)    , optional      :: scalingMinLen
     256             :         integer(IK)         , intent(out)   , allocatable   :: numproc(:)
     257             :         real(RKC)           , intent(out)   , allocatable   :: scaling(:)
     258             :         real(RKC)           , intent(in)                    :: conProb, seqSecTime, parSecTime, comSecTime
     259             :         real(RKC)           , intent(out)                   :: scalingMaxVal
     260             :     end subroutine
     261             : #endif
     262             : 
     263             : #if RK3_ENABLED
     264             :     PURE module subroutine setForkJoinScaling_RK3(conProb, seqSecTime, parSecTime, comSecTime, scaling, numproc, scalingMaxVal, scalingMaxLoc, scalingMinLen)
     265             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     266             :         !DEC$ ATTRIBUTES DLLEXPORT :: setForkJoinScaling_RK3
     267             : #endif
     268             :         use pm_kind, only: RKC => RK3
     269             :         integer(IK)         , intent(out)                   :: scalingMaxLoc
     270             :         integer(IK)         , intent(in)    , optional      :: scalingMinLen
     271             :         integer(IK)         , intent(out)   , allocatable   :: numproc(:)
     272             :         real(RKC)           , intent(out)   , allocatable   :: scaling(:)
     273             :         real(RKC)           , intent(in)                    :: conProb, seqSecTime, parSecTime, comSecTime
     274             :         real(RKC)           , intent(out)                   :: scalingMaxVal
     275             :     end subroutine
     276             : #endif
     277             : 
     278             : #if RK2_ENABLED
     279             :     PURE module subroutine setForkJoinScaling_RK2(conProb, seqSecTime, parSecTime, comSecTime, scaling, numproc, scalingMaxVal, scalingMaxLoc, scalingMinLen)
     280             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     281             :         !DEC$ ATTRIBUTES DLLEXPORT :: setForkJoinScaling_RK2
     282             : #endif
     283             :         use pm_kind, only: RKC => RK2
     284             :         integer(IK)         , intent(out)                   :: scalingMaxLoc
     285             :         integer(IK)         , intent(in)    , optional      :: scalingMinLen
     286             :         integer(IK)         , intent(out)   , allocatable   :: numproc(:)
     287             :         real(RKC)           , intent(out)   , allocatable   :: scaling(:)
     288             :         real(RKC)           , intent(in)                    :: conProb, seqSecTime, parSecTime, comSecTime
     289             :         real(RKC)           , intent(out)                   :: scalingMaxVal
     290             :     end subroutine
     291             : #endif
     292             : 
     293             : #if RK1_ENABLED
     294             :     PURE module subroutine setForkJoinScaling_RK1(conProb, seqSecTime, parSecTime, comSecTime, scaling, numproc, scalingMaxVal, scalingMaxLoc, scalingMinLen)
     295             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     296             :         !DEC$ ATTRIBUTES DLLEXPORT :: setForkJoinScaling_RK1
     297             : #endif
     298             :         use pm_kind, only: RKC => RK1
     299             :         integer(IK)         , intent(out)                   :: scalingMaxLoc
     300             :         integer(IK)         , intent(in)    , optional      :: scalingMinLen
     301             :         integer(IK)         , intent(out)   , allocatable   :: numproc(:)
     302             :         real(RKC)           , intent(out)   , allocatable   :: scaling(:)
     303             :         real(RKC)           , intent(in)                    :: conProb, seqSecTime, parSecTime, comSecTime
     304             :         real(RKC)           , intent(out)                   :: scalingMaxVal
     305             :     end subroutine
     306             : #endif
     307             : 
     308             :     !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     309             : 
     310             :     end interface
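                      : 
                      :     ! A hypothetical, concrete call sketch of the generic interface above
                      :     ! (the probability and timing values are illustrative assumptions, not library defaults):
                      :     !
                      :     !     use pm_parallelism, only: setForkJoinScaling
                      :     !     real, allocatable :: scaling(:)
                      :     !     integer, allocatable :: numproc(:)
                      :     !     real :: scalingMaxVal
                      :     !     integer :: scalingMaxLoc
                      :     !     ! 10% contribution probability, 1 s sequential time, 10 s parallel time, 1 ms pairwise communication time.
                      :     !     call setForkJoinScaling(0.1, 1.0, 10.0, 0.001, scaling, numproc, scalingMaxVal, scalingMaxLoc)
                      :     !     print *, "predicted peak speedup: ", scalingMaxVal, " at ", numproc(scalingMaxLoc), " images"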
     311             : 
     312             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     313             : 
     314             : !>  \cond excluded
     315             : #if OMP_ENABLED
     316             :     logical(LK) , save :: mv_failed = .false._LK
     317             : #endif
     318             : !>  \endcond excluded
     319             : 
     320             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     321             : 
     322             : contains
     323             : 
     324             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     325             : 
     326             :     !>  \brief
     327             :     !>  Generate and return an object of class [image_type](@ref pm_parallelism::image_type)
     328             :     !>  containing information and statistics of the parallel images/processes/threads available, depending on the type of parallelism requested.<br>
     329             :     !>
     330             :     !>  \details
     331             :     !>  This procedure is the constructor of the type [image_type](@ref pm_parallelism::image_type).<br>
     332             :     !>  See the documentation of [image_type](@ref pm_parallelism::image_type) for further details and example usage.<br>
     333             :     !>
     334             :     !>  \return
     335             :     !>  `image`             :   The output scalar of type [image_type](@ref pm_parallelism::image_type).
     336             :     !>
     337             :     !>  \interface{constructImage}
     338             :     !>  \code{.F90}
     339             :     !>
     340             :     !>      use pm_parallelism, only: image_type
     341             :     !>      type(image_type) :: image
     342             :     !>
     343             :     !>      image = image_type()
     344             :     !>
     345             :     !>  \endcode
     346             :     !>
     347             :     !>  \warning
     348             :     !>  Even when the application is MPI/OpenMP-parallel, the `id` assigned to the zeroth process is `1` in this type.<br>
     349             :     !>  This convention follows the rules of the Fortran Coarray parallel programming language.<br>
     350             :     !>
     351             :     !>  \warning
      352             :     !>  For **OpenMP-enabled** ParaMonte library builds, this routine returns the value of the OpenMP intrinsic `omp_get_num_threads()` routine,
      353             :     !>  that is, the number of threads in the team executing the parallel region to which the routine binds.<br>
     354             :     !>  If this constructor is called from the sequential part of an OpenMP-enabled program, this routine returns `1`.<br>
     355             :     !>  To return a value larger than `1`, this routine must be called from within an OpenMP-enabled parallel region.<br>
     356             :     !>  The desired number of OpenMP threads can be set at runtime via [setImageCount()](@ref pm_parallelism::setImageCount),
     357             :     !>  or directly via the OpenMP intrinsic subroutine `omp_set_num_threads()`.<br>
     358             :     !>
     359             :     !>  \finmain{constructImage}
     360             :     !>
     361             :     !>  \author
     362             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 3:50 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     363          60 :     function constructImage() result(image)
     364             :         type(image_type) :: image
     365             :         !integer(IK), intent(in), optional :: nthread
     366             :         ! setup general processor / coarray image variables
     367          60 :         image%id            = getImageID()
     368          60 :         image%count         = getImageCount()
     369          60 :         image%label         = SK_"@process("//getStr(image%id)//SK_")"
     370          60 :         image%is%first      = image%id == 1_IK
     371          60 :         image%is%extra      = image%id /= 1_IK
     372             :        !image%is%leader      = .false._LK ! ATTN: this is to be set by the user at runtime, depending on the parallelism type.
     373             :        !image%is%rooter      = .false._LK ! ATTN: this is to be set by the user at runtime, depending on the parallelism type.
     374          60 :     end function
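The OpenMP caveat noted above can be seen in a minimal sketch. The snippet below is a hypothetical driver, assuming an OpenMP-enabled ParaMonte build; called from the sequential part of the program, the constructor reports a single image, while called inside a parallel region it reports the team size.

```fortran
! Hypothetical driver sketching the OpenMP behavior of the constructor
! (assumes an OpenMP-enabled ParaMonte build; not part of the library).
program demo_image
    use pm_kind, only: IK
    use pm_parallelism, only: image_type, setImageCount
    implicit none
    type(image_type) :: image
    call setImageCount(4_IK)    ! request 4 OpenMP threads at runtime.
    image = image_type()        ! sequential part: image%count is 1.
    print *, "sequential count = ", image%count
    !$omp parallel private(image)
    image = image_type()        ! parallel region: image%count is the team size.
    print "(a)", image%label    ! e.g., "@process(1)", "@process(2)", ...
    !$omp end parallel
end program demo_image
```

In Coarray/MPI builds the same code compiles, but the image count is fixed by the launcher rather than by `setImageCount()`.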
     375             : 
     376             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     377             : 
     378             :     !>  \brief
     379             :     !>  Synchronize all existing parallel images and return nothing.
     380             :     !>
     381             :     !>  \details
     382             :     !>  This is a static member of the [image_type](@ref pm_parallelism::image_type) class.<br>
     383             :     !>  This procedure synchronizes all images in the current communication world by,
     384             :     !>  <ol>
     385             :     !>      <li>    calling `sync all` in Coarray applications.
     386             :     !>      <li>    calling `mpi_barrier()` in MPI applications.
     387             :     !>      <li>    calling nothing in all other (including serial) applications.
     388             :     !>  </ol>
     389             :     !>
     390             :     !>  \warning
     391             :     !>  This routine contains global Coarray and MPI synchronization barriers and must, therefore, be called by all processes in the current simulation.
     392             :     !>
     393             :     !>  \warning
     394             :     !>  The MPI library must be initialized and not finalized prior to calling this routine for MPI-parallel applications.
     395             :     !>
     396             :     !>  \interface{setImageSynced}
     397             :     !>  \code{.F90}
     398             :     !>
     399             :     !>      use pm_parallelism, only: setImageSynced
     400             :     !>
     401             :     !>      call setImageSynced()
     402             :     !>
     403             :     !>  \endcode
     404             :     !>
     405             :     !>  \test
     406             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     407             :     !>
     408             :     !>  \finmain{setImageSynced}
     409             :     !>
     410             :     !>  \author
     411             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     412         174 :     subroutine setImageSynced()
     413             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     414             :         !DEC$ ATTRIBUTES DLLEXPORT :: setImageSynced
     415             : #endif
     416             : #if     CAF_ENABLED
     417             :         sync all
     418             : #elif   MPI_ENABLED
     419             :         block
     420             :             use mpi !, only: mpi_barrier, mpi_comm_world
     421             :             integer :: ierrMPI
     422             :             call mpi_barrier(mpi_comm_world, ierrMPI)
     423             :         end block
     424             : #elif   OMP_ENABLED
     425             :         !$omp barrier
     426             : #endif
     427         174 :     end subroutine
     428             : 
     429             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     430             : 
     431             :     !>  \brief
     432             :     !>  Finalize the current parallel simulation and return nothing.<br>
     433             :     !>
     434             :     !>  \details
     435             :     !>  This is a static member of the [image_type](@ref pm_parallelism::image_type) class.<br>
     436             :     !>  <ol>
     437             :     !>      <li>    This procedure facilitates a parallelism-agnostic way of finalizing serial and parallel applications by hiding the finalization step within this procedure.<br>
     438             :     !>      <li>    This procedure is primarily relevant to MPI applications where MPI needs to be finalized before the simulation stops.<br>
     439             :     !>      <li>    This procedure performs a `sync all` for Coarray applications and returns.<br>
     440             :     !>      <li>    This procedure performs an `!$omp barrier` for OpenMP applications and nothing for all other (e.g., serial) applications.<br>
     441             :     !>  </ol>
     442             :     !>
     443             :     !>  \interface{setImageFinalized}
     444             :     !>  \code{.F90}
     445             :     !>
     446             :     !>      use pm_parallelism, only: setImageFinalized
     447             :     !>
     448             :     !>      call setImageFinalized()
     450             :     !>  \endcode
     451             :     !>
     452             :     !>  \warning
     453             :     !>  The MPI library must be initialized and not finalized prior to calling this routine for MPI-parallel applications.
     454             :     !>
     455             :     !>  \warning
     456             :     !>  The MPI communications will be shut down upon calling this routine and further inter-process communications will be impossible.
     457             :     !>
     458             :     !>  \test
     459             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     460             :     !>
     461             :     !>  \finmain{setImageFinalized}
     462             :     !>
     463             :     !>  \author
     464             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     465             :     subroutine setImageFinalized() ! LCOV_EXCL_LINE
     466             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     467             :         !DEC$ ATTRIBUTES DLLEXPORT :: setImageFinalized
     468             : #endif
     469             : #if     CAF_ENABLED
     470             :         sync all
     471             : #elif   MPI_ENABLED
     472             :         use mpi !mpi_f08, only: mpi_comm_world, mpi_finalized, mpi_finalize, mpi_barrier
     473             :         implicit none
     474             :         integer :: ierrMPI
     475             :         logical :: isFinalized
     476             :         call mpi_finalized(isFinalized, ierrMPI)
     477             :         if (.not. isFinalized) then
     478             :             call mpi_barrier(mpi_comm_world, ierrMPI)
     479             :             call mpi_finalize(ierrMPI)
     480             :         end if
     481             : #elif   OMP_ENABLED
     482             :         !$omp barrier
     483             : #endif
     484          46 :     end subroutine
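The finalization notes above suggest a parallelism-agnostic program epilogue: synchronize all images, then finalize. A minimal sketch (the workload line is a stand-in for the application's own computations):

```fortran
! Parallelism-agnostic shutdown sketch: every image reaches the barrier,
! then MPI (if enabled) is finalized; works unchanged in serial builds.
program demo_finalize
    use pm_parallelism, only: getImageID, setImageSynced, setImageFinalized
    implicit none
    print *, "work done by image ", getImageID()    ! stand-in for the actual workload.
    call setImageSynced()       ! global barrier: all images must reach this call.
    call setImageFinalized()    ! finalizes MPI; no inter-process communication afterward.
end program demo_finalize
```

Note the ordering: `setImageSynced()` must precede `setImageFinalized()`, since no inter-process communication is possible once MPI is shut down.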
     485             : 
     486             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     487             : 
     488             :     !>  \brief
     489             :     !>  Generate and return the ID of the current Coarray image / MPI process / OpenMP thread, all starting with `1`.<br>
     490             :     !>
     491             :     !>  \return
     492             :     !>  `imageID`   :   The output scalar `integer` of default kind \IK representing the ID of the current image/process/thread.<br>
     493             :     !>                  The output `imageID` is set to,
     494             :     !>                  <ol>
     495             :     !>                      <li>    The value returned by `this_image()` in Coarray applications,
     496             :     !>                      <li>    The value returned by `mpi_comm_rank() + 1` in MPI applications,
     497             :     !>                      <li>    The value returned by `omp_get_thread_num() + 1` in OpenMP applications,
     498             :     !>                      <li>    The value `1` in serial applications,
     499             :     !>                  </ol>
     500             :     !>
     501             :     !>  \interface{getImageID}
     502             :     !>  \code{.F90}
     503             :     !>
     504             :     !>      use pm_parallelism, only: getImageID
     505             :     !>      use pm_kind, only: IK
     506             :     !>      integer(IK) :: id
     507             :     !>
     508             :     !>      id = getImageID() ! The ID (starting at 1) of the current Coarray image, MPI process, or OpenMP thread.
     510             :     !>  \endcode
     511             :     !>
     512             :     !>  \warning
     513             :     !>  The MPI library must be initialized and not finalized prior to calling this routine for MPI-parallel applications.<br>
     514             :     !>  Otherwise, the MPI library will automatically be initialized, which may fail if it has been already finalized.<br>
     515             :     !>
     516             :     !>  \test
     517             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     518             :     !>
     519             :     !>  \finmain{getImageID}
     520             :     !>
     521             :     !>  \author
     522             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     523      387364 :     function getImageID() result(imageID)
     524             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     525             :         !DEC$ ATTRIBUTES DLLEXPORT :: getImageID
     526             : #endif
     527             :         integer(IK) :: imageID
     528             : #if     CAF_ENABLED
     529             :         imageID = this_image()
     530             : #elif   MPI_ENABLED
     531             :         block
     532             :             use mpi !mpi_f08, only : mpi_initialized, mpi_comm_world, mpi_comm_size, mpi_init
     533             :             integer :: rank, ierrMPI
     534             :             logical :: isinit, isfinit
     535             :             call mpi_initialized(isinit, ierrMPI)
     536             :             if (.not. isinit) then
     537             :                 call mpi_finalized(isfinit, ierrMPI)
     538             :                 if (isfinit) error stop MODULE_NAME//"@getImageID(): Error occurred. A finalized MPI library cannot be reinitialized."
     539             :                 call mpi_init(ierrMPI)
     540             :             end if
     541             :             call mpi_comm_rank(mpi_comm_world, rank, ierrMPI)
     542             :             if (ierrMPI /= 0) error stop "Failed to fetch the MPI process rank."
     543             :             imageID = int(rank, IK) + 1_IK
     544             :         end block
     545             : #elif   OMP_ENABLED
     546             :         block
     547             :             use omp_lib, only: omp_get_thread_num
     548             :             imageID = int(omp_get_thread_num() + 1, IK)
     549             :         end block
     550             : #else
     551             :         imageID = 1_IK
     552             : #endif
     553      387364 :     end function
     554             : 
     555             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     556             : 
     557             :     !>  \brief
     558             :     !>  Generate and return the number of available processes in the current parallel world communication.
     559             :     !>
     560             :     !>  \return
     561             :     !>  `imageCount`    :   The output scalar `integer` of default kind \IK representing the number of available processes in the current world communication.<br>
     562             :     !>                      The output `imageCount` is set to,<br>
     563             :     !>                      <ol>
     564             :     !>                          <li>    the value returned by the Fortran intrinsic `num_images()` in Coarray parallel applications,
     565             :     !>                          <li>    the value returned by `mpi_comm_size()` subroutine in MPI parallel applications,
     566             :     !>                          <li>    the value returned by `omp_get_num_threads()` in OpenMP parallel applications,
     567             :     !>                          <li>    the value `1` in serial applications,
     568             :     !>                      </ol>
     569             :     !>
     570             :     !>  \warning
     571             :     !>  For **OpenMP-enabled** ParaMonte library builds, this routine returns the value of the OpenMP-intrinsic `omp_get_num_threads()` routine,
     572             :     !>  that is, the number of threads in the team that is executing the parallel region to which the routine binds.<br>
     573             :     !>  If this generic interface is called from the sequential part of an OpenMP-enabled program, this routine returns `1`.<br>
     574             :     !>  To return a value larger than `1`, this routine must be called from within an OpenMP-enabled parallel region.<br>
     575             :     !>  The desired number of OpenMP threads can be set at runtime via [setImageCount()](@ref pm_parallelism::setImageCount),
     576             :     !>  or directly via the OpenMP intrinsic subroutine `omp_set_num_threads()`.<br>
     577             :     !>
     578             :     !>  \interface{getImageCount}
     579             :     !>  \code{.F90}
     580             :     !>
     581             :     !>      use pm_kind, only: IK
     582             :     !>      use pm_parallelism, only: getImageCount
     583             :     !>      integer(IK) :: imageCount
     584             :     !>
     585             :     !>      imageCount = getImageCount()
     586             :     !>
     587             :     !>  \endcode
     588             :     !>
     589             :     !>  \warning
     590             :     !>  See the warnings associated with [getImageCountMPI](@ref pm_parallelism::getImageCountMPI).<br>
     591             :     !>
     592             :     !>  \test
     593             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     594             :     !>
     595             :     !>  \finmain{getImageCount}
     596             :     !>
     597             :     !>  \author
     598             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     599          60 :     function getImageCount() result(imageCount)
     600             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     601             :         !DEC$ ATTRIBUTES DLLEXPORT :: getImageCount
     602             : #endif
     603             :         integer(IK) :: imageCount
     604             : #if     CAF_ENABLED
     605             :         imageCount = num_images()
     606             : #elif   MPI_ENABLED
     607             :         imageCount = getImageCountMPI()
     608             : #elif   OMP_ENABLED
     609             :         imageCount = getImageCountOMP()
     610             : #else
     611             :         imageCount = 1_IK
     612             : #endif
     613          60 :     end function
     614             : 
     615             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     616             : 
     617             :     !>  \brief
     618             :     !>  Generate and return the number of available processes in the current **MPI-parallel** world communication.
     619             :     !>
     620             :     !>  \return
     621             :     !>  `imageCount`    :   The output scalar `integer` of default kind \IK representing the number of
     622             :     !>                      available processes in the current **MPI-parallel** world communication.<br>
     623             :     !>                      The output `imageCount` is set to,<br>
     624             :     !>                      <ol>
     625             :     !>                          <li>    the value returned by `mpi_comm_size()` subroutine in MPI parallel applications,
     626             :     !>                          <li>    the value `0` if the ParaMonte library build is not for MPI applications,
     627             :     !>                      </ol>
     628             :     !>
     629             :     !>  \interface{getImageCountMPI}
     630             :     !>  \code{.F90}
     631             :     !>
     632             :     !>      use pm_kind, only: IK
     633             :     !>      use pm_parallelism, only: getImageCountMPI
     634             :     !>      integer(IK) :: imageCount
     635             :     !>
     636             :     !>      imageCount = getImageCountMPI()
     637             :     !>
     638             :     !>  \endcode
     639             :     !>
     640             :     !>  \warning
     641             :     !>  The MPI library must be initialized and not finalized prior to calling this routine for MPI-parallel applications.<br>
     642             :     !>  Otherwise, the MPI library will automatically be initialized, which may fail if it has been already finalized.<br>
     643             :     !>
     644             :     !>  \note
     645             :     !>  The C binding of this routine is required for automatic detection of the use of the `mpiexec` launcher in dynamic programming languages.
     646             :     !>
     647             :     !>  \test
     648             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     649             :     !>
     650             :     !>  \finmain{getImageCountMPI}
     651             :     !>
     652             :     !>  \author
     653             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     654           0 :     function getImageCountMPI() result(imageCount) bind(C, name = "getImageCountMPI")
     655             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     656             :         !DEC$ ATTRIBUTES DLLEXPORT :: getImageCountMPI
     657             : #endif
     658             : #if     MPI_ENABLED
     659             :         use mpi !mpi_f08, only : mpi_initialized, mpi_comm_world, mpi_comm_size, mpi_init
     660             :         integer :: nproc
     661             :         integer :: ierrMPI
     662             :         logical :: isinit
     663             :         integer(IK) :: imageCount
     664             :         imageCount = 0_IK
     665             :         call mpi_initialized(isinit, ierrMPI)
     666             :         if (ierrMPI /= 0) return ! LCOV_EXCL_LINE
     667             :         if (.not. isinit) then
     668             :             call mpi_init(ierrMPI) ! LCOV_EXCL_LINE
     669             :             if (ierrMPI /= 0) return ! LCOV_EXCL_LINE
     670             :         end if
     671             :         call mpi_comm_size(mpi_comm_world, nproc, ierrMPI)
     672             :         if (ierrMPI /= 0) return ! LCOV_EXCL_LINE
     673             :         imageCount = int(nproc, IK)
     674             : #else
     675             :         integer(IK) :: imageCount
     676             :         imageCount = 0_IK
     677             : #endif
     678           0 :     end function
     679             : 
     680             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     681             : 
     682             :     !>  \brief
     683             :     !>  Generate and return the number of available processes in the current **OpenMP-parallel** world communication.
     684             :     !>
     685             :     !>  \return
     686             :     !>  `imageCount`    :   The output scalar `integer` of default kind \IK representing the number of
     687             :     !>                      available processes in the current **OpenMP-parallel** world communication.<br>
     688             :     !>                      The output `imageCount` is set to,<br>
     689             :     !>                      <ol>
     690             :     !>                          <li>    the value returned by `omp_get_num_threads()` in OpenMP parallel applications,
     691             :     !>                          <li>    the value `0` if the ParaMonte library build is not for OpenMP applications,
     692             :     !>                      </ol>
     693             :     !>
     694             :     !>  \interface{getImageCountOMP}
     695             :     !>  \code{.F90}
     696             :     !>
     697             :     !>      use pm_kind, only: IK
     698             :     !>      use pm_parallelism, only: getImageCountOMP
     699             :     !>      integer(IK) :: imageCount
     700             :     !>
     701             :     !>      imageCount = getImageCountOMP()
     702             :     !>
     703             :     !>  \endcode
     704             :     !>
     705             :     !>  \test
     706             :     !>  [test_pm_parallelism](@ref test_pm_parallelism)
     707             :     !>
     708             :     !>  \finmain{getImageCountOMP}
     709             :     !>
     710             :     !>  \author
     711             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 4:13 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     712           0 :     function getImageCountOMP() result(imageCount) bind(C, name = "getImageCountOMP")
     713             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     714             :         !DEC$ ATTRIBUTES DLLEXPORT :: getImageCountOMP
     715             : #endif
     716             : #if     OMP_ENABLED
     717             :         use omp_lib, only: omp_get_num_threads
     718             :         integer(IK) :: imageCount
     719             :         imageCount = int(omp_get_num_threads(), IK)
     720             : #else
     721             :         integer(IK) :: imageCount
     722             :         imageCount = 0_IK
     723             : #endif
     724           0 :     end function
     725             : 
     726             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     727             : 
     728             :     !>  \brief
     729             :     !>  Set the number the parallel **threads** for an OpenMP-enabled application.
     730             :     !>
     731             :     !>  \details
     732             :     !>  This generic interface is exclusively relevant to the OpenMP-enabled ParaMonte library builds and applications.<br>
     733             :     !>  This interface offers a convenient consistent cross-platform method of setting the number of runtime OpenMP threads for a parallel code section.<br>
     734             :     !>  It does so by calling the OpenMP intrinsic subroutine `omp_set_num_threads()`.<br>
     735             :     !>
     736             :     !>  \param[in]  count : The input scalar of type `integer` of default kind \IK containing the requested number of OpenMP threads.<br>
     737             :     !>                      This argument is relevant only if the OpenMP parallelism is enabled at the time of building the ParaMonte library.<br>
     738             :     !>                      Unlike Coarray or MPI parallelism paradigms, the number of images (threads) can be (re)set at runtime in OpenMP applications, hence the need for this argument.<br>
     739             :     !>                      If the input `count` is **non-positive**, this generic interface uses the output of OpenMP intrinsic `omp_get_num_procs()` as the number of threads.<br>
     740             :     !>                      This is unlike the default behavior of Intel and GNU OpenMP libraries which set the number of threads to `max(count, 1)`.<br>
     741             :     !>                      (**optional**, default = `omp_get_num_procs()`)
     742             :     !>
     743             :     !>  \devnote
     744             :     !>  Although this procedure exists for all parallelism builds of the ParaMonte library, it does nothing for non OpenMP-enabled library builds.<br>
     745             :     !>  Note that the number of images/processes for Coarray/MPI-enabled applications must be set by the user at the time of running the parallel application.<br>
     746             :     !>  This is unlike the OpenMP parallel applications where the number of threads can be (re)set at runtime during the program execution.<br>
     747             :     !>
     748             :     !>  \interface{setImageCount}
     749             :     !>  \code{.F90}
     750             :     !>
     751             :     !>      use pm_parallelism, only: setImageCount
     752             :     !>
     753             :     !>      call setImageCount(count = count)
     754             :     !>
     755             :     !>  \endcode
     756             :     !>
     757             :     !>  \finmain{setImageCount}
     758             :     !>
     759             :     !>  \author
     760             :     !>  \AmirShahmoradi, Tuesday March 7, 2017, 3:50 AM, Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin
     761           0 :     subroutine setImageCount(count)
     762             : #if __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     763             :         !DEC$ ATTRIBUTES DLLEXPORT :: setImageCount
     764             : #endif
     765             :         integer(IK), intent(in), optional :: count
     766             : #if     OMP_ENABLED
     767             :         block
     768             :             use omp_lib, only: omp_get_num_procs, omp_set_num_threads
     769             :             integer :: count_def
     770             :             count_def = omp_get_num_procs()
     771             :             if (present(count)) then
     772             :                 if (0_IK < count) count_def = int(count)
     773             :             end if
     774             :             call omp_set_num_threads(count_def)
     775             :         end block
     776             : #endif
     777           0 :     end subroutine
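The non-positive `count` convention described above can be sketched as follows. This is a hypothetical usage, assuming an OpenMP-enabled build; a non-positive request falls back to the processor count instead of being clamped to one thread.

```fortran
! Sketch of the non-positive `count` fallback (OpenMP-enabled builds only).
program demo_count
    use pm_kind, only: IK
    use pm_parallelism, only: setImageCount, getImageCount
    implicit none
    integer(IK) :: n
    call setImageCount(0_IK)    ! non-positive: falls back to omp_get_num_procs().
    !$omp parallel
    !$omp single
    n = getImageCount()         ! team size; equals the processor count here.
    !$omp end single
    !$omp end parallel
    print *, "thread count = ", n
end program demo_count
```

Note that `getImageCount()` is queried inside the parallel region; in the sequential part it would return `1` regardless of the requested count.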
     778             : 
     779             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     780             : 
     781             :         !>  \brief
     782             :         !>  Broadcast the error condition from all images/processes to all images/processes.<br>
     783             :         !>
     784             :         !>  \details
     785             :         !>  This procedure exists primarily to aid the detection and graceful
     786             :         !>  handling of runtime errors during the testing of ParaMonte library samplers
     787             :         !>  or less frequently, when calling the samplers from dynamic programming languages or interactive
     788             :         !>  environments such as Jupyter which require graceful handling of any errors to avoid environment shutdown.<br>
     789             :         !>  The use of this procedure is essential to avoid runtime parallel deadlocks.<br>
     790             :         !>
     791             :         !>  \param[in]  failed  :   The input scalar `logical` of default kind \LK that should be `.true.` if and only if a fatal error has occurred.<br>
     792             :         !>                          The value of `failed` is typically the `occurred` component of an object of type [err_type](@ref pm_err::err_type).<br>
     793             :         !>                          On input, the specified value for `failed` is broadcast to all other active processes in the program.<br>
     794             :         !>
     795             :         !>  \return
     796             :         !>  `failedParallelism` :   The output scalar of type `logical` of default kind \LK that is `.true.` if and only if
     797             :         !>                          the input argument `failed` on **any** image/process in the current parallel simulation is `.true.`.<br>
     798             :         !>
     799             :         !>  \interface{isFailedImage}
     800             :         !>  \code{.F90}
     801             :         !>
     802             :         !>      use pm_kind, only: LK
     803             :         !>      use pm_parallelism, only: isFailedImage
     804             :         !>      logical(LK) :: failedParallelism
     805             :         !>      logical(LK) :: failed
     806             :         !>
     807             :         !>      failedParallelism = isFailedImage(failed)
     808             :         !>
     809             :         !>  \endcode
     810             :         !>
     811             :         !>  \warning
     812             :         !>  This procedure must be called in parallel by **all** images or by **none**.<br>
     813             :         !>
     814             :         !>  \warning
     815             :         !>  This function primarily exists for soft handling of fatal errors in parallel
     816             :         !>  testing mode and should therefore ideally not be used in production builds.<br>
     817             :         !>  If it is used repeatedly within an application, it can **significantly degrade** the parallel performance.<br>
     818             :         !>
     819             :         !>  \impure
     820             :         !>
     821             :         !>  \test
     822             :         !>  [test_pm_parallelism](@ref test_pm_parallelism)
     823             :         !>
     824             :         !>  \finmain{isFailedImage}
     825             :         !>
     826             :         !>  \author
     827             :         !>  \AmirShahmoradi, 9:49 PM Friday, March 1, 2013, Institute for Fusion Studies, The University of Texas at Austin
     828           3 :         function isFailedImage(failed) result(failedParallelism)
     829             : #if     __INTEL_COMPILER && DLL_ENABLED && (_WIN32 || _WIN64)
     830             :             !DEC$ ATTRIBUTES DLLEXPORT :: isFailedImage
     831             : #endif
     832             :             logical(LK), intent(in) :: failed
     833             :             logical(LK) :: failedParallelism
     834             : #if         CAF_ENABLED
     835             :             logical(LK), allocatable, save :: failure(:)[:]
     836             :             integer :: iid, imageCount
     837             :             imageCount = num_images()
     838             :             if (.not. allocated(failure)) allocate(failure(imageCount)[*]) ! guard: failure has the save attribute and remains allocated across calls.
     839             :             failure(this_image()) = failed
     840             :             sync all
     841             :             do iid = 1, imageCount
     842             :                 failedParallelism = failure(iid)[iid]
     843             :                 if (failedParallelism) return
     844             :             end do
     845             : #elif       MPI_ENABLED
     846             :             logical, allocatable :: failure(:) ! This must be default kind.
     847             :             integer :: imageCount ! This must be default kind.
     848             :             block
     849             :                 use mpi !mpi_f08, only: mpi_comm_world, mpi_comm_rank, mpi_comm_size, mpi_allgather, mpi_logical
     850             :                 use pm_arrayResize, only: setResized
     851             :                 integer :: ierrMPI
     853             :                 call mpi_comm_size(mpi_comm_world, imageCount, ierrMPI)
     854             :                 call setResized(failure, int(imageCount, IK))
     855             :                 call mpi_allgather  ( logical(failed) & ! LCOV_EXCL_LINE : send buffer
     856             :                                     , 1 & ! LCOV_EXCL_LINE : send count
     857             :                                     , mpi_logical & ! LCOV_EXCL_LINE : send datatype
     858             :                                     , failure(:) & ! LCOV_EXCL_LINE : receive buffer
     859             :                                     , 1 & ! LCOV_EXCL_LINE : receive count
     860             :                                     , mpi_logical & ! LCOV_EXCL_LINE : receive datatype
     861             :                                     , mpi_comm_world & ! LCOV_EXCL_LINE : comm
     862             :                                     , ierrMPI & ! LCOV_EXCL_LINE : error code
     863             :                                     )
     864             :                 !call mpi_alltoall   ( err%occurred &    ! buffer_send   : The buffer containing the data that will be scattered to other processes.<br>
     865             :                 !                    , 1 &               ! count_send    : The number of elements that will be sent to each process.<br>
     866             :                 !                    , mpi_logical &     ! datatype_send : The type of one send buffer element.<br>
     867             :                 !                    , failure &   ! buffer_recv   : The buffer in which store the gathered data.<br>
     868             :                 !                    , imageCount &      ! count_recv    : The number of elements in the message to receive per process, not the total number of elements to receive from all processes altogether.<br>
     869             :                 !                    , mpi_logical &     ! datatype_recv : The type of one receive buffer element.<br>
     870             :                 !                    , mpi_comm_world &  ! communicator  : The communicator in which the all to all takes place.<br>
     871             :                 !                    , ierrMPI & ! LCOV_EXCL_LINE : error code
     872             :                 !                    )
     873             :                 failedParallelism = logical(any(failure), LK)
     874             :             end block
     875             : #elif       OMP_ENABLED
     876             :             !$omp critical
     877             :             mv_failed = mv_failed .or. failed
     878             :             !$omp end critical
     879             :             !$omp barrier
     880             :             failedParallelism = mv_failed
                     :             !$omp barrier ! ensure every thread has read mv_failed before the master resets it.
     881             :             !$omp master
     882             :             mv_failed = .false._LK
     883             :             !$omp end master
                     :             !$omp barrier ! prevent any thread from re-entering the critical section above before the reset completes.
     884             : #else
     885           3 :             failedParallelism = failed
     886             : #endif
     887           3 :         end function
     888             : 
     889             : !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     890             : 
     891             : end module pm_parallelism ! LCOV_EXCL_LINE
