The mathematical objective function
Suppose we want to sample random vectors
\[\mathbf{x}=[x_1,x_2,x_3,x_4] ~~,\]from a 4-dimensional Multivariate Normal (MVN) Probability Density Function (PDF),
\[\mathrm{PDF}_\mathrm{MVN} \equiv \pi ( \mathbf{x} ~|~ \mu, \Sigma) = (2\pi)^{-\frac{k}{2}} \sqrt{|\Sigma|} \exp\bigg( -\frac{1}{2} (\mathbf{x}-\mu)^\mathrm{T}\Sigma^{-1}(\mathbf{x}-\mu) \bigg) ~~,\]that has a mean vector,
\[\mu = [0,0,0,0]\]with a covariance matrix,
\[\Sigma = \begin{bmatrix} 1.0 & 0.5 & 0.5 & 0.5 \\ 0.5 & 1.0 & 0.5 & 0.5 \\ 0.5 & 0.5 & 1.0 & 0.5 \\ 0.5 & 0.5 & 0.5 & 1.0 \end{bmatrix}\]This will serve as our mathematical objective function. Clearly, all of the four dimensions of this MVN are correlated with each other with a Pearson’s correlation coefficient of $0.5$.
Working with the natural logarithm of the objective function
In general, since probabilities are often tiny numbers, we’d want to always work with their natural logarithms in the world of computers to avoid unwanted overflow or underflow of numbers, which can crash our simulations. To highlight the importance of taking the natural logarithms of all input objective functions to the ParaMonte routines, we always name them like getLogFunc()
.
Implementing the objective function in C
Here is a minimalistic implementation of the 4-D multivariate normal distribution objective function in C, which can also be downloaded as logfunc.c.
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Description:
// + Return the natural logarithm of an ndim-dimensional Multivariate Normal (MVN)
// probability density function (PDF) with the Mean and Covariance Matrix as defined below.
// Reference: https://en.wikipedia.org/wiki/Multivariate_normal_distribution
// Input:
// + ndim: The number of dimensions of the domain of the objective function.
// + point: The input 64-bit real-valued vector of length ndim,
// at which the natural logarithm of objective function is computed.
// Output:
// + logFunc: A 64-bit real scalar number representing the natural logarithm of the objective function.
// Author:
// + Computational Data Science Lab, Monday 9:03 AM, May 16 2016, ICES, UT Austin
// Visit:
// + https://www.cdslab.org/paramonte
//
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include "logfunc.h"
double getLogFunc (
int32_t ndim,
double Point[]
)
{
// multivariate normal (MVN) distribution specifications:
const double LOG_INVERSE_SQRT_TWO_PI = log(0.398942280401432); // log(1/sqrt(2*Pi))
const double MEAN[NDIM] = {0., 0., 0., 0.}; // mean vector of the MVN
const double COVMAT[NDIM][NDIM] = {
{1.0,0.5,0.5,0.5},
{0.5,1.0,0.5,0.5},
{0.5,0.5,1.0,0.5},
{0.5,0.5,0.5,1.0}
}; // covariance matrix of the MVN
const double INVCOVMAT[NDIM][NDIM] = {
{+1.6, -0.4, -0.4, -0.4},
{-0.4, +1.6, -0.4, -0.4},
{-0.4, -0.4, +1.6, -0.4},
{-0.4, -0.4, -0.4, +1.6}
}; // inverse covariance matrix of the MVN
const double LOG_SQRT_DET_INV_COV = 0.581575404902840; // logarithm of square root of the determinant of the inverse covariance matrix
// subtract mean vector from the input point
int i;
double NormedPoint[NDIM];
for(i = 0; i < NDIM; i++){
NormedPoint[i] = Point[i] - MEAN[i];
}
// compute the log probability density function of the MVN
double exponentTerm = 0.;
for(i = 0; i < NDIM; i++){
double MatMulResult[NDIM];
MatMulResult[i] = 0.;
int j;
for(j = 0; j < NDIM; j++){
MatMulResult[i] += INVCOVMAT[i][j] * NormedPoint[j];
}
exponentTerm += NormedPoint[i] * MatMulResult[i];
}
double logFunc = NDIM * LOG_INVERSE_SQRT_TWO_PI + LOG_SQRT_DET_INV_COV - 0.5*exponentTerm;
return logFunc;
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Here is the contents of the corresponding C header file, which can also be downloaded as logfunc.h.
#ifndef PM_LOG_FUNC
#define PM_LOG_FUNC
#include <math.h>
#include <stdint.h>
#define NDIM 4 // The number of dimensions of the domain of the objective function.
double getLogFunc (
int32_t , // ndim
double [] // Point
);
#endif
Calling the ParaMonte samplers
Suppose we want to randomly sample the above objective function via the Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler of ParaMonte (the ParaDRAM sampler).
The following C main code can be used to call the ParaMonte library routine runParaDRAM()
,
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Description:
// + Run the Monte Carlo sampler of the ParaMonte library given the input log-target density function `getLogFunc()`.
// Output:
// + The simulation output files.
// Author:
// + Computational Data Science Lab, Monday 9:03 AM, May 16 2016, ICES, UT Austin
// Visit:
// + https://www.cdslab.org/paramonte
//
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "logfunc.h" // getLogFunc, NDIM
#include "paramonte.h" // runParaDRAM
int main(int argc, char *argv[])
{
char inputFile[] = "./paramonte.in";
int32_t inputFileLen = strlen(inputFile);
int32_t ndim = NDIM;
// C rules for argument passing apply here
runParaDRAM ( ndim
, &getLogFunc
, inputFile
, inputFileLen
);
return 0;
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Calling the ParaMonte samplers: The signature header file
Here is the contents of the C paramonte.h
signature header file for the ParaMonte library routines,
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////
//// MIT License
////
//// ParaMonte: plain powerful parallel Monte Carlo library.
////
//// Copyright (C) 2012-present, The Computational Data Science Lab
////
//// This file is part of the ParaMonte library.
////
//// Permission is hereby granted, free of charge, to any person obtaining a
//// copy of this software and associated documentation files (the "Software"),
//// to deal in the Software without restriction, including without limitation
//// the rights to use, copy, modify, merge, publish, distribute, sublicense,
//// and/or sell copies of the Software, and to permit persons to whom the
//// Software is furnished to do so, subject to the following conditions:
////
//// The above copyright notice and this permission notice shall be
//// included in all copies or substantial portions of the Software.
////
//// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
//// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
//// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
//// IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
//// DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
//// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
//// OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
////
//// ACKNOWLEDGMENT
////
//// ParaMonte is an honor-ware and its currency is acknowledgment and citations.
//// As per the ParaMonte library license agreement terms, if you use any parts of
//// this library for any purposes, kindly acknowledge the use of ParaMonte in your
//// work (education/research/industry/development/...) by citing the ParaMonte
//// library as described on this page:
////
//// https://github.com/cdslaborg/paramonte/blob/3548c097f2a25dfc0613061800656d27d0e0ddbe/ACKNOWLEDGMENT.md
////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Description:
// - The C header file containing the prototypes of the ParaMonte library routines to be called from C/C++.
// Prototypes:
// - runParaDRAM()
// Author:
// - Computational Data Science Lab, Monday 9:03 AM, May 16 2016, ICES, UT Austin
//
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#ifndef ParaMonte
#define ParaMonte
#include <stdint.h>
#if defined DLL_ENABLED
extern __declspec(dllimport)
#endif
void runParaDRAM(
// ndim: dimension of the domain of the LogFunc
int32_t ,
// getLogFunc(ndim, Point(ndim)): procedure pointer to the LogFunc
double (*) (
int32_t ,
double []
),
// inputFile: ParaDRAM input file, containing the path the file containing a list of
// all optional input variables and values
char [],
// inputFileLen: the length of the inputFile char vector: int32_t inputFileLen = strlen(inputFile);
int32_t
);
#endif
Calling the ParaMonte samplers: The input file
We will store the simulation specifications in a separate external input file whose path "./paramonte.in"
is given by the variable inputFile
in the main code above. See this page for a detailed description of the structure and convention rules of the input files. Also, see this page for detailed descriptions of the simulation specifications of the ParaDRAM sampler.
inputFile
of the ParaMonte routines, instead of passing the-path-to-this-input-file.Here is the contents of this paramonte.in
input file,
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!%
!% Description:
!% + Run the Monte Carlo sampler of the ParaMonte library given the input log-target density function `getLogFunc()`.
!% Output:
!% + The simulation output files.
!% Author:
!% + Computational Data Science Lab, Monday 9:03 AM, May 16 2016, ICES, UT Austin
!% Visit:
!% + https://www.cdslab.org/paramonte
!%
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!
! USAGE:
!
! -- Comments must begin with an exclamation mark (!).
! -- Comments can appear anywhere on an empty line or, after a value assignment.
! -- All variable assignments are optional and can be commented out. In such cases, appropriate default values will be assigned.
! -- Use ParaDRAM namelist (group) name to group a set of ParaDRAM simulation specification variables.
! -- The order of the input variables in the namelist groups is irrelevant and unimportant.
! -- Variables can be defined multiple times, but only the last definition will be considered as input.
! -- All variable names are case insensitive. However, for clarity, this software follows the camelCase code-writing practice.
! -- String values must be enclosed with either single or double quotation marks.
! -- Logical values are case-insensitive and can be either .true., true, or t for a TRUE value, and .false., false, or f for a FALSE value.
! -- All vectors and arrays in the input file begin with index 1. This is following the convention of
! the majority of science-oriented programming languages: Fortran, Julia, Mathematica, MATLAB, and R.
!
! For comprehensive guidelines on the input file organization and rules, visit:
!
! https://www.cdslab.org/paramonte/notes/usage/paradram/input/
!
! To see detailed descriptions of each of variables, visit:
!
! https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/
!
&ParaDRAM
! Base specifications
description = "
This\n
is a\n
multi-line\n
description." ! strings must be enclosed with "" or '' and can be continued on multiple lines.
! No comments within strings are allowed.
!outputColumnWidth = 25 ! this is an example of a variable that is commented out and
! therefore, its value won't be read by the sampler routine.
! To pass it to the routine, simply remove the ! mark at
! the beginning of the line.
outputRealPrecision = 17
!outputDelimiter = ","
outputFileName = "./out/" ! the last forward-slash character indicates that this
! is the folder where the output files will have to stored.
! However, since no output filename prefix has been specified,
! the output filenames will be assigned a randomly-generated prefix.
!sampleSize = 111
randomSeed = 2136275,
chainFileFormat = "compact"
variableNameList = "variable1" ! Notice the missing fourth variable name here.
, "variable2" ! Any missing variable name will be automatically assigned an appropriate name.
, "variable3"
domainLowerLimitVec = 4*-1.e300 ! repetition pattern rules apply again here. 4 dimensions => 4-element vector of values.
domainUpperLimitVec = 4*1.e300 ! repetition pattern rules apply again here. 4 dimensions => 4-element vector of values.
parallelizationModel = "single chain" ! "singleChain" would also work. Similarly, "multichain", "multi chain", or "multiChain".
!targetAcceptanceRate = 0.23e0
progressReportPeriod = 1000
maxNumDomainCheckToWarn = 100
maxNumDomainCheckToStop = 1000
restartFileFormat = "binary"
overwriteRequested = true ! FALSE, false, .false., .f., and f would be also all valid logical values representing False
silentModeRequested = false ! FALSE, false, .false., .f., and f would be also all valid logical values representing False
!mpiFinalizeRequested = true ! TRUE, true, .true., .t., and t would be also all valid logical values representing True
! MCMC specifications
chainSize = 30000
startPointVec = 4*1.e0 ! four values of 1.e0 are specified here by the repetition pattern symbol *
!sampleRefinementCount = 10
sampleRefinementMethod = "BatchMeans"
randomStartPointDomainLowerLimitVec = 4*-100.e0 ! repetition pattern rules apply again here. 4 dimensions => 4-element vector of values.
randomStartPointDomainUpperLimitVec = 4*100.0 ! repetition pattern rules apply again here. 4 dimensions => 4-element vector of values.
randomStartPointRequested = false
! DRAM specifications
scaleFactor = "2*0.5*Gelman" ! The asterisk here means multiplication since it is enclosed within quotation marks.
proposalModel = "normal" ! or "uniform" as you wish.
adaptiveUpdateCount = 10000000
adaptiveUpdatePeriod = 35
greedyAdaptationCount = 0
!delayedRejectionCount = 5
!delayedRejectionScaleFactorVec = 5*1.
!burninAdaptationMeasure = 1.
!proposalStartStdVec = 4*1.0 ! repetition pattern rules apply again here. 4 dimensions => 4-element vector of values.
!proposalStartCorMat = 1,0,0,0, ! 2-dimensional correlation-matrix definition, although it is commented out and won't be read.
! 0,1,0,0,
! 0,0,1,0,
! 0,0,0,1,
!proposalStartCovMat = 100,0,0,0,
! 0,100,0,0,
! 0,0,100,0,
! 0,0,0,100,
/
Compiling and linking to generate the executable
Now, follow the instructions on this page on how to build your C example for serial and parallel simulations on the Operating System (OS) of you choice: Windows
, Linux
, or macOS
/ Darwin
.
Running the ParaMonte simulations
Once you have built your application, running the simulation is as simple as calling the generated executable on the command-prompt for serial applications, or invoking the MPI/Coarray launcher on the command-prompt for MPI/Coarray parallel simulations. Follow the instructions and tips on this page on how to run your C example on either Windows
, Linux
, or macOS
/ Darwin
Operating System (OS), on either a single processor or in parallel, on multiple processors or clusters of nodes of processors.
main.exe: error while loading shared libraries: libparamonte_c_linux_x64_gnu_release_shared_heap.so: cannot open shared object file: No such file or directorySuch an error message imply that the shared runtime library mentioned in the error message is missing from the known search paths to application. To resolve such runtime errors, you will have to add the path to the required runtime shared library to your Bash environmental variable
LD_LIBRARY_PATH
. For example, the following will prepend the current directory to the list of paths for the runtime library search.
Running the ParaMonte simulations on a single processor
If the executable main.exe
has been built for serial simulations (using one processor), simply call the executable on the command line. The calling method is identical on all three Operating Systems: Windows, Linux, and Darwin (macOS). Here is an example call on Windows OS,
D:\example>main.exe
************************************************************************************************************************************
************************************************************************************************************************************
**** ****
**** ****
**** ParaMonte ****
**** Version 1.0.0 ****
**** May 23 2018 ****
**** ****
**** Department of Physics ****
**** Computational & Data Science Lab ****
**** Data Science Program, College of Science ****
**** The University of Texas at Arlington ****
**** ****
**** originally developed at ****
**** ****
**** Multiscale Modeling Group ****
**** Center for Computational Oncology (CCO) ****
**** Oden Institute for Computational Engineering and Sciences ****
**** Department of Aerospace Engineering and Engineering Mechanics ****
**** Department of Neurology, Dell-Seton Medical School ****
**** Department of Biomedical Engineering ****
**** The University of Texas at Austin ****
**** ****
**** For questions and further information, please contact: ****
**** ****
**** Amir Shahmoradi ****
**** ****
**** shahmoradi@utexas.edu ****
**** amir.shahmoradi@uta.edu ****
**** ashahmoradi@gmail.com ****
**** ****
**** cdslab.org/pm ****
**** ****
**** https://www.cdslab.org/paramonte/ ****
**** ****
**** ****
************************************************************************************************************************************
************************************************************************************************************************************
************************************************************************************************************************************
**** ****
**** Setting up ParaDRAM environment ****
**** ****
************************************************************************************************************************************
ParaDRAM - NOTE: Variable outputFileName detected among the input variables to ParaDRAM:
ParaDRAM - NOTE: ./out/
ParaDRAM - NOTE:
ParaDRAM - NOTE: Absolute path to the current working directory:
ParaDRAM - NOTE: D:\example
ParaDRAM - NOTE:
ParaDRAM - NOTE: Generating the requested directory for ParaDRAM output files:
ParaDRAM - NOTE: .\out\
ParaDRAM - NOTE:
ParaDRAM - NOTE: No user-input filename prefix for ParaDRAM output files detected.
ParaDRAM - NOTE: Generating appropriate filenames for ParaDRAM output files from the current date and time...
ParaDRAM - NOTE:
ParaDRAM - NOTE: ParaDRAM output files will be prefixed with:
ParaDRAM - NOTE: .\out\ParaDRAM_run_20200322_002335_512
ParaDRAM - NOTE: Searching for previous runs of ParaDRAM...
ParaDRAM - NOTE: No pre-existing ParaDRAM run detected.
ParaDRAM - NOTE: Starting a fresh ParaDRAM run...
ParaDRAM - NOTE: Generating the output report file:
ParaDRAM - NOTE: .\out\ParaDRAM_run_20200322_002335_512_process_1_report.txt
ParaDRAM - NOTE: Please see the output report and progress files for further realtime simulation details.
Accepted/Total Func. Call Dynamic/Overall Acc. Rate Elapsed/Remained Time [s]
========================= ========================= =========================
30000 / 130490 0.179 / 0.2291 2.2880 / 0.000
ParaDRAM - NOTE: Computing the Markov chain's statistical properties...
ParaDRAM - NOTE: Computing the final decorrelated sample size...
ParaDRAM - NOTE: Generating the output sample file:
ParaDRAM - NOTE: .\out\ParaDRAM_run_20200322_002335_512_process_1_sample.txt
ParaDRAM - NOTE: Computing the output sample's statistical properties...
ParaDRAM - NOTE: Mission Accomplished.
Running the ParaMonte simulations in parallel on multiple processors
If the executable main.exe
has been built for MPI-enabled parallel simulations (on multiple processors), you will need to invoke the MPI launcher to start the simulation.
- On Windows
For parallel simulations on a single node (of multiple processors), for example on a single computer, the command typically looks like the following,mpiexec -localonly -n 3 main.exe
where the flag
-localonly
ensures the simulation is run only on a single node. This option avoids the invocation of the Hydra service which would require prior registration. The flag-n 3
assigns three MPI tasks to three physical processors for the simulation. To run your simulation on a cluster of nodes, follow the guidelines and instructions provided by the Intel MPI library development team. - On Linux / Darwin (Mac)
The MPI launcher command for parallel simulations on Linux and Darwin are identical and look very similar to the Windows command (except for the flag-localonly
which is not needed),mpiexec -n 3 main.exe
where the flag
-n 3
assigns three MPI tasks to three physical processors for the simulation.
For example, the following command will launch the simulation on 3 processors on a Windows OS,
D:\example>mpiexec -localonly -n 3 main.exe
************************************************************************************************************************************
************************************************************************************************************************************
**** ****
**** ****
**** ParaMonte ****
**** Version 1.0.0 ****
**** May 23 2018 ****
**** ****
**** Department of Physics ****
**** Computational & Data Science Lab ****
**** Data Science Program, College of Science ****
**** The University of Texas at Arlington ****
**** ****
**** originally developed at ****
**** ****
**** Multiscale Modeling Group ****
**** Center for Computational Oncology (CCO) ****
**** Oden Institute for Computational Engineering and Sciences ****
**** Department of Aerospace Engineering and Engineering Mechanics ****
**** Department of Neurology, Dell-Seton Medical School ****
**** Department of Biomedical Engineering ****
**** The University of Texas at Austin ****
**** ****
**** For questions and further information, please contact: ****
**** ****
**** Amir Shahmoradi ****
**** ****
**** shahmoradi@utexas.edu ****
**** amir.shahmoradi@uta.edu ****
**** ashahmoradi@gmail.com ****
**** ****
**** cdslab.org/pm ****
**** ****
**** https://www.cdslab.org/paramonte/ ****
**** ****
**** ****
************************************************************************************************************************************
************************************************************************************************************************************
************************************************************************************************************************************
**** ****
**** Setting up ParaDRAM environment ****
**** ****
************************************************************************************************************************************
ParaDRAM - NOTE: Variable outputFileName detected among the input variables to ParaDRAM:
ParaDRAM - NOTE: ./out/
ParaDRAM - NOTE:
ParaDRAM - NOTE: Absolute path to the current working directory:
ParaDRAM - NOTE: D:\example
ParaDRAM - NOTE:
ParaDRAM - NOTE: Generating the requested directory for ParaDRAM output files:
ParaDRAM - NOTE: .\out\
ParaDRAM - NOTE:
ParaDRAM - NOTE: No user-input filename prefix for ParaDRAM output files detected.
ParaDRAM - NOTE: Generating appropriate filenames for ParaDRAM output files from the current date and time...
ParaDRAM - NOTE:
ParaDRAM - NOTE: ParaDRAM output files will be prefixed with:
ParaDRAM - NOTE: .\out\ParaDRAM_run_20200321_193047_392
ParaDRAM - NOTE: Searching for previous runs of ParaDRAM...
ParaDRAM - NOTE: No pre-existing ParaDRAM run detected.
ParaDRAM - NOTE: Starting a fresh ParaDRAM run...
ParaDRAM - NOTE: Generating the output report file:
ParaDRAM - NOTE: .\out\ParaDRAM_run_20200321_193047_392_process_1_report.txt
ParaDRAM - NOTE: Please see the output report and progress files for further realtime simulation details.
Accepted/Total Func. Call Dynamic/Overall Acc. Rate Elapsed/Remained Time [s]
========================= ========================= =========================
30000 / 130260 0.200 / 0.2298 1.3140 / 0.000
ParaDRAM - NOTE: The time cost of calling the user-provided objective function must be at least 52.5143 times more
ParaDRAM - NOTE: (that is, ~24.6651 microseconds) to see any performance benefits from singlechain parallelization
ParaDRAM - NOTE: model for this simulation. Consider simulating in the serial mode in the future (if used within the
ParaDRAM - NOTE: same computing environment and with the same configuration as used here).
ParaDRAM - NOTE: Computing the Markov chain's statistical properties...
ParaDRAM - NOTE: Computing the final decorrelated sample size...
ParaDRAM - NOTE: Generating the output sample file:
ParaDRAM - NOTE: .\out\ParaDRAM_run_20200321_193047_392_process_1_sample.txt
ParaDRAM - NOTE: Computing the output sample's statistical properties...
ParaDRAM - NOTE: Mission Accomplished.
You may have noticed the following note in the above simulation output,
ParaDRAM - NOTE: The time cost of calling the user-provided objective function must be at least 52.5143 times more
ParaDRAM - NOTE: (that is, ~24.6651 microseconds) to see any performance benefits from singlechain parallelization
ParaDRAM - NOTE: model for this simulation. Consider simulating in the serial mode in the future (if used within the
ParaDRAM - NOTE: same computing environment and with the same configuration as used here).
This indicates that the simulation in parallel does not lead to any efficiency gain. Given the computer specifications on which the above simulation is performed (a decently powerful computer), this may not be surprising. Based on the CPU power and inter-processor communication costs, our example simulation in the above seems to be a good candidate for serial simulations instead.
chainSize=2000
for MCMC simulations performed via ParaDRAM sampler). In general, the number of sampled points should be small enough to finish the test-simulation in a few minutes. Then, check the contents of the output file for this small simulation that has the suffix _report.txt
and look up the part corresponding to the following simulation information near the bottom of the file,
At least two numbers in the information provided in the above box merit your attention before you go ahead to set up your full-production simulation:1. The predicted optimal number of physical computing processes gives the simulation's best estimate for the optimal number of processes that could give you the highest speed-up, given the current sampling efficiency of your sampler. This is important because if your sampler's efficiency is high, then much of the work will be performed by only the very few first processors and all the work done by the rest of processors is wasted. Therefore, if you believe that your sampler's efficiency in the production mode will remain at the same level as in the test run, you may want to set the number of MPI tasks equal to the predicted optimal number by your test simulation.
2. The predicted absolute optimal number of physical computing processes gives the simulation's best estimate for the absolute-maximum number of processes that could give you the highest speed-up, for any conceivable sampling efficiency of your sampler. You may want to make sure that whatever number of processors you request for your production simulation, it would not go beyond this number as it will gradually become detrimental to your simulation's overall performance. Beyond this predicted absolute processor count, the cost of inter-processor communications gradually becomes comparable and eventually more than any time-savings that could be earned by simulating in parallel.
Keep in mind that these predictions for the optimal number of processors are only estimates and can vary widely depending on the runtime setup of your simulation, or the topology of the inter-processor network and communications.
Final notes
Mission accomplished. We have now used the adaptive Markov chain sampler routine of the ParaMonte library to generate random numbers from our objective function of interest. This simulation will generate five files in a folder named ./out/
in the current directory, each of which contains some specific information about the simulation. The purpose of each file is indicated by the suffix at the end of its name: _chain
, _sample
, _report
, _progress
, _restart
. If multichain
parallelism is requested via the input simulation specification variable parallelizationModel = 'multiChain'
, then there will be five simulation output files generated for each processor separately and independently of each other. This is as if we have simulated several independent but simultaneous ParaDRAM simulations of our objective function. More details about the contents of each of the ParaDRAM simulation output files can be found on this page.
Post-processing the ParaMonte simulation results
The post-processing of the output files generated by the ParaMonte routines can be nicely and quickly done via either the ParaMonte Python library or the ParaMonte MATLAB library.