The following variables specify the properties of simulations that are performed via the ParaDRAM routine of the ParaMonte library. ParaDRAM stands for Parallel Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo.
The simulation specifications of ParaMonte’s ParaDRAM sampler
description
The variable 'description' contains general information about the specific ParaDRAM simulation
that is going to be performed. It has no effects on the simulation and serves only as a general description
of the simulation for future reference. The ParaDRAM parser automatically recognizes the C-style '\n'
escape sequence as the new-line character, and '\\' as the backslash character '\' if they are used in
the description. For example, '\\n' will be converted to '\n' on the output, while '\n' translates
to the new-line character. Other C escape sequences are neither supported nor needed. The default
value for description is 'Nothing provided by the user.'.
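The escape handling described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the ParaDRAM parser itself; the function name render_description is hypothetical.

```python
def render_description(text: str) -> str:
    """Expand backslash-n to a new-line and a doubled backslash to a single one."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "\\n":        # backslash followed by 'n'
            out.append("\n")
            i += 2
        elif pair == "\\\\":     # two consecutive backslashes
            out.append("\\")
            i += 2
        else:                    # any other character is copied unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


print(render_description(r"First line\nSecond line with a literal \\n"))
```

Scanning left to right matters here: a doubled backslash is consumed as one unit, so the 'n' that follows it is not mistaken for part of a new-line escape.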
inputFileHasPriority
A logical (boolean) variable. If TRUE (or .true. or true or .t. from within an input file), then the
input specifications of the ParaDRAM simulation will be read from the input file provided by the user, and the
simulation specification assignments from within the programming language environment (if any are made)
will be completely ignored. If inputFileHasPriority is FALSE, then all simulation specifications of the
ParaDRAM sampler that are taken from the user-specified input file will be overwritten by their
corresponding input values that are set from within the user's programming environment (if any are provided).
Note that this feature is useful when, for example, some simulation specifications have to be computed and specified
at runtime and, therefore, cannot be specified before the program execution. Currently, this functionality
(i.e., prioritizing the input file values over input-procedure-argument values) is available only in the
Fortran-interface to the ParaDRAM sampler. The default value is FALSE.
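The precedence rule above amounts to a choice between two merge orders. The following Python sketch shows the rule with plain dictionaries; resolve_specs and its arguments are hypothetical names and do not belong to any ParaMonte interface.

```python
def resolve_specs(file_specs: dict, code_specs: dict,
                  input_file_has_priority: bool = False) -> dict:
    """Combine input-file values with values assigned from within the program."""
    if input_file_has_priority:
        # Assignments made from within the program are ignored entirely.
        return dict(file_specs)
    merged = dict(file_specs)
    merged.update(code_specs)    # in-program assignments overwrite file values
    return merged


from_file = {"chainSize": 50000, "sampleSize": -1}
from_code = {"chainSize": 20000}
print(resolve_specs(from_file, from_code))
print(resolve_specs(from_file, from_code, input_file_has_priority=True))
```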
silentModeRequested
A logical (boolean) variable. If TRUE (or .true. or true or .t. from within an input file), then
the following contents will not be printed in the output report file of the ParaDRAM sampler:
- The ParaMonte library interface, compiler, and platform specifications.
- The ParaDRAM simulation specification descriptions.
Setting this variable to TRUE may break the functionality of the report-file parser methods of the
ParaMonte library in high-level languages (e.g., MATLAB, Python, R, ...). The default value is FALSE.
domainLowerLimitVec
domainLowerLimitVec represents the lower boundaries of the cubical domain of the objective function
to be sampled. It is an ndim-dimensional vector of 64-bit real numbers, where ndim is the number of
variables of the objective function. It is also possible to assign only select values of domainLowerLimitVec
and leave the rest of the components to be assigned the default value. This is POSSIBLE ONLY when
domainLowerLimitVec is defined inside the input file to the ParaDRAM sampler. For example,
having the following inside the input file,
domainLowerLimitVec(3:5) = -100
will only set the lower limits of the third, fourth, and the fifth dimensions to -100,
or,
domainLowerLimitVec(1) = -100, domainLowerLimitVec(2) = -1.e6
will set the lower limit on the first dimension to -100, and -1.e6 on the second dimension,
or,
domainLowerLimitVec = 3*-2.5e100
will only set the lower limits on the first, second, and the third dimensions to -2.5*10^100,
while the rest of the lower limits for the missing dimensions will be automatically set
to the default value.
The default value for all elements of domainLowerLimitVec is: -1.797693134862316E+307.
See also the input simulation specification
domainUpperLimitVec.
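The effect of a partial assignment can be pictured as overwriting selected entries of a vector that starts out filled with the default. The Python sketch below mimics this; fill_limits is a hypothetical helper, and each assignment is written as (first, last, value) with 1-based inclusive indices to mirror the input-file notation.

```python
DEFAULT_LOWER = -1.797693134862316e307   # default lower limit quoted above


def fill_limits(ndim, assignments, default=DEFAULT_LOWER):
    """Return an ndim-long vector with the given index ranges overwritten."""
    limits = [default] * ndim
    for first, last, value in assignments:
        for i in range(first, last + 1):   # 1-based and inclusive
            limits[i - 1] = value
    return limits


# Analogue of: domainLowerLimitVec(3:5) = -100
print(fill_limits(6, [(3, 5, -100.0)]))
# Analogue of: domainLowerLimitVec = 3*-2.5e100 (first three elements only)
print(fill_limits(5, [(1, 3, -2.5e100)]))
```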
domainUpperLimitVec
domainUpperLimitVec represents the upper boundaries of the cubical domain of the objective function
to be sampled. It is an ndim-dimensional vector of 64-bit real numbers, where ndim is the number of
variables of the objective function. It is also possible to assign only select values of domainUpperLimitVec
and leave the rest of the components to be assigned the default value. This is POSSIBLE ONLY when
domainUpperLimitVec is defined inside the input file to the ParaDRAM sampler. For example,
domainUpperLimitVec(3:5) = 100
will only set the upper limits of the third, fourth, and the fifth dimensions to 100, or,
domainUpperLimitVec(1) = 100, domainUpperLimitVec(2) = 1.e6
will set the upper limit on the first dimension to 100, and 1.e6 on the second dimension,
or,
domainUpperLimitVec = 3*2.5e100
will only set the upper limits on the first, second, and the third dimensions to 2.5*10^100,
while the rest of the upper limits for the missing dimensions will be automatically set
to the default value.
The default value for all elements of domainUpperLimitVec is: 1.797693134862316E+307.
See also the input simulation specification
domainLowerLimitVec.
variableNameList
variableNameList contains the names of the variables to be sampled. It is used to construct
the header of the output sample file. Any element of variableNameList that is not set by the user
will be automatically assigned a default name. The default value is 'SampleVariablei' where integer
'i' is the index of the variable.
parallelizationModel
parallelizationModel is a string variable that represents the parallelization method to be used in
the ParaDRAM sampler. The string value must be enclosed by either single or double
quotation marks when provided as input. Two options are currently supported:
parallelizationModel = 'multiChain'
This method uses the Perfect Parallel scheme in which multiple MCMC chains are
generated independently of each other. In this case, multiple output MCMC chain files
will also be generated.
parallelizationModel = 'singleChain'
This method uses the fork-style parallelization scheme. A single MCMC chain file will be
generated in this case. At each MCMC step multiple proposal steps will be checked in
parallel until one proposal is accepted.
Note that in serial mode, there is no parallelism. Therefore, this option does not affect non-parallel
simulations and its value is ignored. The serial mode is equivalent to either of the parallelism
methods with only one simulation image (processor, core, or thread). The default value is
parallelizationModel = 'singleChain'. Note that the input values are case-INsensitive and white-space
characters are ignored.
See also the input simulation specification
mpiFinalizeRequested.
mpiFinalizeRequested
In parallel ParaDRAM simulations via MPI communication libraries, if mpiFinalizeRequested = true
(or T, both case-INsensitive), then a call will be made to the MPI_Finalize() routine from within the ParaDRAM
sampler at the end of the simulation to finalize the MPI communications. Set this variable to false (or f,
both case-INsensitive) if you do not want the ParaDRAM sampler to finalize the MPI communications for you.
This is a low-level simulation specification variable, relevant to simulations that directly involve MPI
parallelism. If you do not have any MPI-routine calls in your main program, you can safely ignore
this variable with its default value. Note that in non-MPI-enabled simulations, such as serial and
Coarray-enabled simulations, the value of this variable is completely ignored. The default value is
TRUE.
See also the input simulation specification
parallelizationModel.
outputFileName
outputFileName contains the path and the base of the filename for the ParaDRAM sampler output files. If not
provided by the user, the default outputFileName is constructed from the current date and time:
ParaDRAM_run_yyyymmdd_hhmmss_mmm
where yyyy, mm, dd, hh, mm, ss, mmm stand respectively for the current year, month, day, hour, minute,
second, and millisecond. In such a case, the default directory for the output files will be the current
working directory of the ParaDRAM sampler. If outputFileName is provided, but ends with a
separator character '/' or '\' (as in Linux or Windows OS), then its value will be used as the directory to
which the ParaDRAM sampler output files will be written. In this case, the output file naming
convention described above will be used. Also, the given directory will be automatically created if
it does not exist already.
See also the input simulation specification
overwriteRequested.
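The naming convention above can be reproduced with Python's datetime module. This is a sketch of the described pattern only; the function names are hypothetical, and the way the sampler itself obtains the time stamp is not specified here.

```python
from datetime import datetime


def default_output_name(now: datetime) -> str:
    """Build ParaDRAM_run_yyyymmdd_hhmmss_mmm from a time stamp."""
    millis = now.microsecond // 1000
    return f"ParaDRAM_run_{now:%Y%m%d}_{now:%H%M%S}_{millis:03d}"


def resolve_output_base(output_file_name, now: datetime) -> str:
    """Apply the rules above: missing name, directory-only name, or full base name."""
    if not output_file_name:
        return default_output_name(now)                      # current directory
    if output_file_name.endswith(("/", "\\")):
        return output_file_name + default_output_name(now)   # directory only
    return output_file_name


print(resolve_output_base("./out/", datetime.now()))
```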
overwriteRequested
A logical (boolean) variable. If true (or .true. or TRUE or .t. from within an input file), then any
existing old simulation files with the same name as the current simulation will be overwritten with
the new simulation output files. Note that if overwriteRequested is set to TRUE, then the restart
functionality is automatically turned off and any existing old simulation output files with the
same name as the current simulation will be overwritten by the ParaDRAM sampler.
The default value is FALSE.
See also the input simulation specification
restartFileFormat,
outputFileName.
targetAcceptanceRate
targetAcceptanceRate sets an optimal target for the ratio of the number of accepted objective function
calls to the total number of function calls by the ParaDRAM sampler. It is a real-valued array of length 2,
whose elements determine the upper and lower bounds of the desired acceptance rate. When the acceptance
rate of the ParaDRAM sampler is outside the specified limits, the sampler's settings will be automatically adjusted
to bring the overall acceptance rate to within the limits specified by targetAcceptanceRate.
When assigned from within a dynamic-language programming environment, such as MATLAB or Python, or from
within an input file, targetAcceptanceRate can also be a single real number between 0 and 1. In such a case, the
ParaDRAM sampler will constantly attempt (with no guarantee of success) to bring the average acceptance ratio
of the sampler as close to the user-provided target ratio as possible. The success of the ParaDRAM sampler
in keeping the average acceptance ratio close to the requested target value depends heavily on:
1) the value of adaptiveUpdatePeriod; the larger, the easier.
2) the value of adaptiveUpdateCount; the larger, the easier.
Note that the acceptance ratio adjustments will only occur every adaptiveUpdatePeriod sampling
steps, for a total of adaptiveUpdateCount times. There is no default value for targetAcceptanceRate,
as the acceptance ratio is not directly adjusted during sampling.
See also the input simulation specification scaleFactor.
sampleSize
The variable sampleSize is an integer that dictates the number of (hopefully, independent and
identically distributed [i.i.d.]) samples to be drawn from the user-provided objective function.
Three ranges of values are possible. If,
sampleSize < 0,
then, the absolute value of sampleSize dictates the sample size in units of the final
effective sample size generated by the sampler. The effective sample is by definition
i.i.d., and free from duplicates. The effective sample size is automatically determined
by ParaDRAM toward the end of the simulation.
For example:
sampleSize = -1 yields the effective i.i.d. sample drawn from the objective
function.
sampleSize = -2 yields a (potentially non-i.i.d.) sample twice as big as the
effective sample.
sampleSize > 0,
then, the specified value will represent the number of points to appear in the final
output sample file. If sampleSize turns out to be less than the estimated effective sample
size, then the resulting final sample will be i.i.d. If sampleSize turns out to be larger
than the effective sample size, then the resulting sample will be potentially non-i.i.d.
The larger this difference, the more non-i.i.d. the resulting final refined sample will be.
For example,
sampleSize = 1000 yields a 1000-point final sample from the objective function.
sampleSize = 0,
in which case, no sample file will be generated.
The default value is sampleSize = -1.
See also the input simulation specification chainSize, sampleRefinementCount, sampleRefinementMethod.
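The three ranges of sampleSize reduce to a small amount of arithmetic once the effective sample size is known. The Python sketch below restates them; the function names are hypothetical, and the effective sample size is simply passed in as a number rather than estimated.

```python
def final_sample_count(sample_size: int, effective_size: int) -> int:
    """Number of points written to the output sample file."""
    if sample_size < 0:
        # Negative values are multiples of the effective sample size.
        return abs(sample_size) * effective_size
    return sample_size            # positive: taken literally; zero: no sample file


def may_be_correlated(sample_size: int, effective_size: int) -> bool:
    """True when more points are requested than the effective sample holds."""
    return final_sample_count(sample_size, effective_size) > effective_size


for requested in (-1, -2, 1000, 0):
    print(requested, final_sample_count(requested, 400), may_be_correlated(requested, 400))
```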
randomSeed
randomSeed is a scalar 32-bit integer that serves as the seed of the random number generator. When it
is provided, the seed of the random number generator will be set in a specific deterministic manner
to enable future replications of the simulation with the same configuration and input specifications.
The default value for randomSeed is an integer vector of processor-dependent size and value that will
vary from one simulation to another. However, enough care has been taken to assign unique random seed
values to the random number generator on each of the parallel threads (or images, processors, cores,
...) under all circumstances.
outputColumnWidth
The variable outputColumnWidth is a non-negative integer number that determines the width of the data
columns in the formatted output files of a ParaDRAM simulation with tabular structure. If it
is set to zero, the ParaDRAM sampler will set the width of each output element to
the minimum possible width without losing the requested output precision. In other words, setting
outputColumnWidth = 0 will result in the smallest size for the formatted output files that are in ASCII
format. The default value is 0.
See also the input simulation specification
outputDelimiter,
outputRealPrecision,
chainFileFormat.
outputDelimiter
outputDelimiter is a string variable, containing a sequence of one or more characters (excluding
digits, the period symbol '.', and the addition and subtraction operators: '+' and '-'), that is used
to specify the boundary between separate, independent information elements in the tabular output
files of the ParaDRAM sampler. The string value must be enclosed by either single or double
quotation marks when provided as input. To output in Comma-Separated-Values (CSV) format, set
outputDelimiter = ','. If the input value is not provided, the default delimiter ',' will be used when
input outputColumnWidth = 0, and a single space character, ' ', will be used when input
outputColumnWidth > 0. A value of '\t' is interpreted as the TAB character. To avoid this interpretation,
use '\\t' to yield '\t' without being interpreted as the TAB character. The default value is ','.
See also the input simulation specification
outputColumnWidth,
outputRealPrecision,
chainFileFormat.
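The default selection and the TAB rule above can be summarized in a short Python sketch. resolve_delimiter is a hypothetical name; a missing input value is represented by None, and only a delimiter that consists entirely of the escape sequence is handled, as in the text.

```python
def resolve_delimiter(output_delimiter=None, output_column_width=0):
    """Pick the delimiter according to the rules stated above."""
    if output_delimiter is None:
        # No input given: comma for minimum-width output, space otherwise.
        return "," if output_column_width == 0 else " "
    if output_delimiter == r"\t":     # backslash + t is read as the TAB character
        return "\t"
    if output_delimiter == r"\\t":    # doubled backslash keeps a literal backslash + t
        return r"\t"
    return output_delimiter


print(repr(resolve_delimiter()))
print(repr(resolve_delimiter(output_column_width=15)))
print(repr(resolve_delimiter(r"\t")))
```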
outputRealPrecision
The variable outputRealPrecision is a 32-bit integer number that determines the precision - that is,
the number of significant digits - of the real numbers in the output files of a ParaDRAM simulation.
Any positive integer is acceptable as the input value of outputRealPrecision. However, any digits of the output
real numbers beyond the accuracy of 64-bit real numbers (approximately 16 digits of significance)
will be meaningless and random. Set this variable to 16 (or larger) if full reproducibility of the
simulation is needed in the future. But keep in mind that larger precisions will result in larger-size
output files. This variable is ignored for binary output (if any occurs during the simulation). The
default value is 8.
See also the input simulation specification
outputColumnWidth,
outputDelimiter,
chainFileFormat.
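The trade-off between precision and reproducibility can be checked directly. The Python sketch below assumes that Python's float is an IEEE 754 64-bit real and uses a hypothetical helper, format_real, that writes a number with a given count of significant digits. With 8 digits the written value generally does not read back to the same number, whereas 17 significant digits always suffice for a 64-bit real.

```python
def format_real(value: float, precision: int) -> str:
    """Write value with `precision` significant digits in exponent notation."""
    # One digit before the decimal point plus precision - 1 digits after it.
    return f"{value:.{precision - 1}e}"


x = 1.0 / 3.0
for digits in (8, 17):
    text = format_real(x, digits)
    print(digits, text, float(text) == x)
```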
chainFileFormat
chainFileFormat is a string variable that represents the format of the output chain file(s) of a
ParaDRAM simulation. The string value must be enclosed by either single or double quotation
marks when provided as input. Three values are possible:
chainFileFormat = 'compact'
This is the ASCII (text) file format which is human-readable but does not preserve the
full accuracy of the output values. It is also a significantly slower mode of chain file
generation, compared to the binary file format (see below). If the compact format is
specified, each of the repeating MCMC states will be condensed into a single entry (row)
in the output MCMC chain file. Each entry will be then assigned a sample-weight that is
equal to the number of repetitions of that state in the MCMC chain. Thus, each row in
the output chain file will represent a unique sample from the objective function. This
will lead to a significantly smaller ASCII chain file and faster output compared to
the verbose chain file format (see below).
chainFileFormat = 'verbose'
This is the ASCII (text) file format which is human-readable but does not preserve the
full accuracy of the output values. It is also a significantly slower mode of chain file
generation, compared to both compact and binary chain file formats (see above and below).
If the verbose format is specified, all MCMC states will have equal sample-weights of 1
in the output chain file. The verbose format can lead to much larger chain file sizes
than the compact and binary file formats. This is especially true if the target objective
function has a very high-dimensional state space.
chainFileFormat = 'binary'
This is the binary file format which is not human-readable, but preserves the exact values
in the output MCMC chain file. It is also often the fastest mode of chain file generation.
If the binary file format is chosen, the chain will be automatically output in the compact
format (but as binary) to ensure the production of the smallest-possible output chain
file. Binary chain files will have the .bin file extension. Use the binary format if
you need full accuracy representation of the output values while having the smallest-size
output chain file in the shortest time possible.
The default value is chainFileFormat = 'compact' as it provides a reasonable trade-off between speed
and output file size while generating human-readable chain file contents. Note that the input values
are case-INsensitive.
See also the input simulation specification
outputColumnWidth,
outputDelimiter,
outputRealPrecision.
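The relation between the verbose and compact formats is essentially a run-length encoding of consecutive repeated states. The Python sketch below shows the idea on an in-memory chain; the function names and the (weight, state) layout are assumptions made for illustration and do not reflect the actual column layout of the chain files.

```python
from itertools import groupby


def to_compact(verbose_chain):
    """Collapse consecutive repeats into (weight, state) pairs."""
    return [(len(list(group)), state) for state, group in groupby(verbose_chain)]


def to_verbose(compact_chain):
    """Expand (weight, state) pairs back into one row per MCMC step."""
    return [state for weight, state in compact_chain for _ in range(weight)]


chain = [(0.1,), (0.1,), (0.1,), (0.5,), (0.2,), (0.2,)]
compact = to_compact(chain)
print(compact)
print(to_verbose(compact) == chain)
```

A state repeats in the verbose chain each time a proposal is rejected, so the weight of a row in the compact form counts how long the chain stayed at that state.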
restartFileFormat
restartFileFormat is a string variable that represents the format of the output restart file(s) which
are used to restart an interrupted ParaDRAM simulation. The string value must be enclosed by either
single or double quotation marks when provided as input. Two values are possible:
restartFileFormat = 'binary'
This is the binary file format which is not human-readable, but preserves the exact values
of the specification variables required for the simulation restart. This full accuracy
representation is required to exactly reproduce an interrupted simulation. The binary
format is also normally the fastest mode of restart file generation. Binary restart files
will have the .bin file extension.
restartFileFormat = 'ASCII'
This is the ASCII (text) file format which is human-readable but does not preserve the
full accuracy of the specification variables required for the simulation restart. It is
also a significantly slower mode of restart file generation, compared to the binary
format. Therefore, its usage should be limited to situations where the user wants to
track the dynamics of simulation specifications throughout the simulation time. ASCII
restart file(s) will have the .txt file extension.
The default value is restartFileFormat = 'binary'. Note that the input values are case-INsensitive.
See also the input simulation specification
outputFileName,
overwriteRequested.
progressReportPeriod
Every progressReportPeriod calls to the objective function, the sampling progress will be reported
to the log file. Note that progressReportPeriod must be a positive integer. The default value is 1000.
maxNumDomainCheckToWarn
maxNumDomainCheckToWarn is an integer number beyond which the user will be warned about the newly-proposed
points being excessively proposed outside the domain of the objective function. For every
maxNumDomainCheckToWarn consecutively-proposed new points that fall outside the domain of the objective
function, the user will be warned until the number of such points reaches maxNumDomainCheckToStop, in which
case the sampler returns a fatal error and the program stops globally. The counter for this warning
message is reset after a proposal sample from within the domain of the objective function is obtained.
When out-of-domain sampling happens frequently, it is a strong indication of something fundamentally wrong in the
simulation. It is, therefore, important to closely inspect and monitor for such frequent out-of-domain samplings.
This can be done by setting maxNumDomainCheckToWarn to an appropriate value determined by the user.
The default value is 1000.
See also the input simulation specification
maxNumDomainCheckToStop.
maxNumDomainCheckToStop
maxNumDomainCheckToStop is an integer number beyond which the program will stop globally with a fatal
error message declaring that the maximum number of proposal-out-of-domain-bounds has been reached. The
counter for this global-stop request is reset after a proposal point is accepted as a sample from
within the domain of the objective function.
When out-of-domain sampling happens frequently, it is a strong indication of something fundamentally wrong in the
simulation. It is, therefore, important to closely inspect and monitor for such frequent out-of-domain samplings.
This can be done by setting maxNumDomainCheckToStop to an appropriate value determined by the user.
The default value is 10000.
See also the input simulation specification
maxNumDomainCheckToWarn.
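The interplay of the two thresholds can be sketched with a single counter of consecutive out-of-domain proposals. This Python sketch is a simplification under stated assumptions: the class and method names are hypothetical, one counter serves both thresholds, and the stop is triggered when the count reaches the limit (the text does not say whether the comparison is strict).

```python
class DomainCheckCounter:
    """Track consecutive out-of-domain proposals (simplified, single counter)."""

    def __init__(self, warn_every=1000, stop_at=10000):
        self.warn_every = warn_every   # plays the role of maxNumDomainCheckToWarn
        self.stop_at = stop_at         # plays the role of maxNumDomainCheckToStop
        self.count = 0

    def record(self, in_domain: bool) -> str:
        if in_domain:
            self.count = 0             # a point inside the domain resets the counter
            return "ok"
        self.count += 1
        if self.count >= self.stop_at:
            return "stop"              # the sampler would abort with a fatal error
        if self.count % self.warn_every == 0:
            return "warn"
        return "ok"


counter = DomainCheckCounter(warn_every=3, stop_at=7)
print([counter.record(False) for _ in range(7)])
```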
chainSize
chainSize determines the number of non-refined, potentially auto-correlated, but unique, samples
drawn by the MCMC sampler before stopping ParaDRAM. For example, if you specify chainSize = 10000,
then 10000 unique sample points (with no duplicates) will be drawn from the target objective function
that the user has provided. The input value for chainSize must be a positive integer of a minimum value
ndim + 1 or larger, where ndim is the number of dimensions of the domain of the objective function to be sampled.
Note that chainSize is different from and always smaller than the length of the constructed MCMC chain.
The default value is 100000.
See also the input simulation specification sampleSize.
randomStartPointDomainLowerLimitVec
randomStartPointDomainLowerLimitVec represents the lower boundaries of the cubical domain from which
the starting point(s) of the MCMC chain(s) will be initialized randomly (only if requested via the
input variable randomStartPointRequested). This happens only when some or all of the elements of the
input variable startPointVec are missing. In such cases, every missing value of input startPointVec will
be set to the center point between randomStartPointDomainLowerLimitVec and randomStartPointDomainUpperLimitVec
in the corresponding dimension. If randomStartPointRequested=TRUE (or True, true, t, all case-INsensitive),
then the missing elements of startPointVec will be initialized to values drawn randomly from within the
corresponding ranges specified by the input variable randomStartPointDomainLowerLimitVec. As an input
variable, randomStartPointDomainLowerLimitVec is an ndim-dimensional vector of 64-bit real numbers,
where ndim is the number of variables of the objective function. It is also possible to assign only
select values of randomStartPointDomainLowerLimitVec and leave the rest of the components to be
assigned the default value. This is POSSIBLE ONLY when randomStartPointDomainLowerLimitVec is defined
inside the input file to ParaDRAM. For example, having the following inside the input file,
randomStartPointDomainLowerLimitVec(3:5) = -100
will only set the lower limits of the third, fourth, and the fifth dimensions to -100,
or,
randomStartPointDomainLowerLimitVec(1) = -100, randomStartPointDomainLowerLimitVec(2) = -1.e6
will set the lower limit on the first dimension to -100, and -1.e6 on the second dimension,
or,
randomStartPointDomainLowerLimitVec = 3*-2.5e100
will only set the lower limits on the first, second, and the third dimensions to -2.5*10^100,
while the rest of the lower limits for the missing dimensions will be automatically set
to the default value.
The default values for all elements of randomStartPointDomainLowerLimitVec are taken from the
corresponding values in the input variable domainLowerLimitVec.
See also the input simulation specification randomStartPointDomainUpperLimitVec, randomStartPointRequested.
randomStartPointDomainUpperLimitVec
randomStartPointDomainUpperLimitVec represents the upper boundaries of the cubical domain from which
the starting point(s) of the MCMC chain(s) will be initialized randomly (only if requested via the
input variable randomStartPointRequested). This happens only when some or all of the elements of the
input variable startPointVec are missing. In such cases, every missing value of input startPointVec will
be set to the center point between randomStartPointDomainUpperLimitVec and randomStartPointDomainLowerLimitVec
in the corresponding dimension. If randomStartPointRequested=TRUE (or True, true, t, all case-INsensitive),
then the missing elements of startPointVec will be initialized to values drawn randomly from within the
corresponding ranges specified by the input variable randomStartPointDomainUpperLimitVec. As an input
variable, randomStartPointDomainUpperLimitVec is an ndim-dimensional vector of 64-bit real numbers,
where ndim is the number of variables of the objective function. It is also possible to assign only
select values of randomStartPointDomainUpperLimitVec and leave the rest of the components to be
assigned the default value. This is POSSIBLE ONLY when randomStartPointDomainUpperLimitVec is defined
inside the input file to ParaDRAM. For example, having the following inside the input file,
randomStartPointDomainUpperLimitVec(3:5) = -100
will only set the upper limits of the third, fourth, and the fifth dimensions to -100,
or,
randomStartPointDomainUpperLimitVec(1) = -100, randomStartPointDomainUpperLimitVec(2) = -1.e6
will set the upper limit on the first dimension to -100, and -1.e6 on the second dimension,
or,
randomStartPointDomainUpperLimitVec = 3*-2.5e100
will only set the upper limits on the first, second, and the third dimensions to -2.5*10^100,
while the rest of the upper limits for the missing dimensions will be automatically set
to the default value.
The default values for all elements of randomStartPointDomainUpperLimitVec are taken from the
corresponding values in the input variable domainUpperLimitVec.
See also the input simulation specification randomStartPointDomainLowerLimitVec, randomStartPointRequested.
startPointVec
startPointVec is a 64-bit real-valued vector of length ndim (the dimension of the domain of the input
objective function). For every element of startPointVec that is not provided as input, the default
value will be the center of the domain of startPointVec as specified by domainLowerLimitVec
and domainUpperLimitVec input variables. If the input variable randomStartPointRequested=TRUE
(or true or t, all case-INsensitive), then the missing elements of startPointVec will be initialized
to values drawn randomly from within the corresponding ranges specified by the input variables
randomStartPointDomainLowerLimitVec and randomStartPointDomainUpperLimitVec.
See also the input simulation specification randomStartPointRequested.
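The rule for filling in missing elements of startPointVec can be sketched as follows. This Python version is illustrative only: init_start_point is a hypothetical name, missing elements are marked with None, and a single pair of lower and upper vectors stands in for whichever domain applies.

```python
import random


def init_start_point(start_point, lower, upper, random_requested=False, rng=None):
    """Fill missing (None) elements with the domain center or a random draw."""
    rng = rng or random.Random()
    result = []
    for value, lo, hi in zip(start_point, lower, upper):
        if value is not None:
            result.append(value)                  # user-provided values take precedence
        elif random_requested:
            result.append(rng.uniform(lo, hi))    # random point within [lo, hi]
        else:
            result.append(0.5 * lo + 0.5 * hi)    # center, written to avoid overflow
    return result


print(init_start_point([1.0, None], [-10.0, -10.0], [30.0, 30.0]))
print(init_start_point([1.0, None], [-10.0, -10.0], [30.0, 30.0],
                       random_requested=True, rng=random.Random(2021)))
```

Passing a seeded generator makes the random start point repeatable between runs, which mirrors the role of randomSeed described earlier.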
randomStartPointRequested
A logical (boolean) variable. If true (or .true. or TRUE or .t. from within an input file), then the
variable startPointVec will be initialized randomly for each MCMC chain that is to be generated by
the sampler. The random values will be drawn from the specified or the default domain of startPointVec,
given by the randomStartPointDomainLowerLimitVec and randomStartPointDomainUpperLimitVec variables. Note that the value of startPointVec, if provided, has precedence
over random initialization. In other words, for every element of startPointVec that is not provided
as input, only that element will be initialized randomly if randomStartPointRequested=TRUE. Also, note
that even if startPointVec is randomly initialized, its random value will be deterministic between
different independent runs of ParaDRAM if the input variable randomSeed is provided by the user. The
default value is FALSE.
See also the input simulation specification startPointVec, randomStartPointDomainLowerLimitVec, randomStartPointDomainUpperLimitVec.
sampleRefinementCount
When sampleSize < 0, the integer variable sampleRefinementCount dictates the maximum number of
times the MCMC chain will be refined to remove the autocorrelation within the output MCMC sample.
For example,
if sampleRefinementCount = 0,
no refinement of the output MCMC chain will be performed, the resulting MCMC sample will
simply correspond to the full MCMC chain in verbose format (i.e., each sampled state has
a weight of one).
if sampleRefinementCount = 1,
the refinement of the output MCMC chain will be done only once if needed, and no more,
even though there may still exist some residual autocorrelation in the output MCMC sample.
In practice, only one refinement of the final output MCMC Chain should be enough to remove
the existing autocorrelations in the final output sample. Exceptions occur when the
Integrated Autocorrelation (IAC) of the output MCMC chain is comparable to or larger than
the length of the chain. In such cases, neither the BatchMeans method nor any other method
of IAC computation will be able to accurately compute the IAC. Consequently, the samples
generated based on the computed IAC values will likely not be i.i.d. and will still be
significantly autocorrelated. In such scenarios, more than one refinement of the MCMC
chain will be necessary. Very small sample size resulting from multiple refinements of
the sample could be a strong indication of the bad mixing of the MCMC chain and the output
chain may not contain true i.i.d. samples from the target objective function.
if sampleRefinementCount > 1,
the refinement of the output MCMC chain will be done for a maximum sampleRefinementCount
number of times, even though there may still exist some residual autocorrelation in the
final output MCMC sample.
if sampleRefinementCount >> 1 (e.g., comparable to or larger than the length of the MCMC chain),
the refinement of the output MCMC chain will continue until the integrated autocorrelation
of the resulting final sample is less than 2, virtually implying that an independent
identically-distributed (i.i.d.) sample has finally been obtained.
Note that to obtain i.i.d. samples from a multidimensional chain, ParaDRAM will, by default, use the
maximum of Integrated Autocorrelation (IAC) among all dimensions of the chain to refine the chain. Note that
the value specified for sampleRefinementCount is used only when the variable sampleSize < 0, otherwise,
it will be ignored. The default value is sampleRefinementCount = 1073741823.
See also the input simulation specification sampleRefinementMethod.
sampleRefinementMethod
sampleRefinementMethod is a string variable that represents the method of computing the Integrated
Autocorrelation Time (IAC) to be used in ParaDRAM for refining the final output MCMC chain and sample.
The string value must be enclosed by either single or double quotation marks when provided as input.
Options that are currently supported include:
sampleRefinementMethod = 'BatchMeans'
This method of computing the Integrated Autocorrelation Time is based on the approach
described in SCHMEISER, B., 1982, Batch size effects in the analysis of simulation output,
Oper. Res. 30 556-568. The batch sizes in the BatchMeans method are chosen to be int(N^(2/3))
where N is the length of the MCMC chain. As long as the batch size is larger than the
IAC of the chain and there are significantly more than 10 batches, the BatchMeans method
will provide reliable estimates of the IAC.
Note that the refinement strategy involves two separate phases of sample decorrelation. At the first stage,
the Markov chain is decorrelated recursively (for as long as needed) based on the IAC of its compact format,
where only the uniquely-visited states are kept in the (compact) chain. Once the Markov chain is refined
such that its compact format is fully decorrelated, the second phase of the decorrelation begins during which
the Markov chain is decorrelated based on the IAC of the chain in its verbose (Markov) format. This process
is repeated recursively for as long as there is any residual autocorrelation in the refined sample.
sampleRefinementMethod = 'BatchMeans-compact'
This is the same as the 'BatchMeans' method above, except that only the first phase of the sample refinement
described above will be performed, that is, the (verbose) Markov chain is refined only based on the
IAC computed from the compact format of the Markov chain. This will lead to a larger final refined sample.
However, the final sample will likely not be fully decorrelated.
sampleRefinementMethod = 'BatchMeans-verbose'
This is the same as the 'BatchMeans' method above, except that only the second phase of the sample refinement
described above will be performed, that is, the (verbose) Markov chain is refined only based on the
IAC computed from the verbose format of the Markov chain. While the resulting refined sample will be fully
decorrelated, the size of the refined sample may be smaller than with the default 'BatchMeans' method.
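The batch-means estimate described above can be illustrated with a short sketch. This is not ParaMonte's
implementation. The function name batch_means_iac is hypothetical, and the estimator used here (batch size
times the variance of the batch means, divided by the variance of the chain) is the textbook batch-means
form, assumed for illustration. Only the batch size int(N^(2/3)) is taken from the description above.

```python
import numpy as np

def batch_means_iac(chain):
    """Estimate the integrated autocorrelation (IAC) of a 1-D chain via batch means."""
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    batch_size = int(n ** (2.0 / 3.0))      # batch size int(N^(2/3)), as described above
    n_batches = n // batch_size
    if n_batches < 2:
        raise ValueError("the chain is too short to form at least two batches")
    # Drop the trailing points that do not fill a complete batch.
    trimmed = chain[: n_batches * batch_size]
    batch_means = trimmed.reshape(n_batches, batch_size).mean(axis=1)
    # For an i.i.d. chain, var(batch means) ~ var(chain) / batch_size, so the ratio is ~1.
    return batch_size * batch_means.var(ddof=1) / trimmed.var(ddof=1)
```

For an uncorrelated chain the returned value should be close to 1, while a strongly autocorrelated chain
yields a correspondingly larger value.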
Note that in order to obtain i.i.d. samples from a multidimensional chain, the MCMC sampler will use the average of
the IACs among all dimensions of the chain to refine the chain. If the maximum, minimum, or median of the IACs is preferred,
add '-max' (or '-maximum'), '-min' (or '-minimum'), or '-med' (or '-median'), respectively, to the value of
sampleRefinementMethod. For example,
sampleRefinementMethod = 'BatchMeans-max'
or,
sampleRefinementMethod = 'BatchMeans-compact-max'
or,
sampleRefinementMethod = 'BatchMeans-max-compact'
Also, note that the value specified for sampleRefinementCount is used only when the variable sampleSize < 0;
otherwise, it will be ignored. The default value is sampleRefinementMethod = 'BatchMeans'.
Note that the input values are case-INsensitive and white-space characters are ignored.
See also the input simulation specification sampleRefinementCount.
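The naming rules above (case-insensitive values, ignored white space, and optional suffixes in any order)
can be sketched as a small parser. This is only an illustration of the rules as documented here, not the
parser used by ParaDRAM. The function name parse_refinement_method and the returned labels 'both',
'compact', and 'verbose' are hypothetical.

```python
import statistics

# Suffixes listed in the description above, mapped to the statistic they select.
AGGREGATORS = {
    "max": max, "maximum": max,
    "min": min, "minimum": min,
    "med": statistics.median, "median": statistics.median,
}
PHASES = {"compact", "verbose"}

def parse_refinement_method(value):
    """Return (phase, aggregate) for a sampleRefinementMethod string."""
    # Remove all white space and ignore case, as stated above.
    tokens = "".join(value.split()).lower().split("-")
    if tokens[0] != "batchmeans":
        raise ValueError("unsupported refinement method: " + value)
    phase = "both"                   # default: both refinement phases are performed
    aggregate = statistics.mean      # default: average of the IACs over all dimensions
    for token in tokens[1:]:
        if token in PHASES:
            phase = token
        elif token in AGGREGATORS:
            aggregate = AGGREGATORS[token]
        else:
            raise ValueError("unrecognized suffix: " + token)
    return phase, aggregate
```

With this sketch, 'BatchMeans-compact-max' and 'BatchMeans-max-compact' yield the same result, and the
returned aggregate function can be applied to the list of per-dimension IAC values.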
scaleFactor
scaleFactor is a positive real number (which must be given as a string), by the square of which
the covariance matrix of the proposal distribution of the MCMC sampler is scaled. In other words, the
proposal distribution will be scaled in every direction by the value of scaleFactor. It can also be
given in units of the string keyword 'gelman' (which is case-INsensitive) after the paper:
Gelman, Roberts, and Gilks (1996): 'Efficient Metropolis Jumping Rules'.
The paper finds that the optimal scaling factor for a Multivariate Gaussian proposal distribution
for the Metropolis-Hastings Markov Chain Monte Carlo sampling of a target Multivariate Normal
Distribution of dimension ndim is given by:
scaleFactor = 2.38/sqrt(ndim) , in the limit of ndim -> Infinity.
Multiples of the gelman scale factors are also acceptable as input and can be specified like the
following examples:
scaleFactor = '1'
multiplies the ndim-dimensional proposal covariance matrix by 1, so essentially no change
occurs to the covariance matrix.
scaleFactor = "1"
same as the previous example. The double-quotation marks act the same way as single-quotation
marks.
scaleFactor = '2.5'
scales the ndim-dimensional proposal distribution by 2.5 in every direction.
scaleFactor = '2.5*Gelman'
scales the ndim-dimensional proposal distribution by 2.5 * 2.38/sqrt(ndim) in every direction.
scaleFactor = "2.5 * gelman"
same as the previous example, but with double-quotation marks. Space characters are
ignored.
scaleFactor = "2.5 * gelman*gelman*2"
equivalent to gelmanFactor-squared multiplied by 5.
Note, however, that the result of the Gelman et al. paper applies only to multivariate normal proposal
distributions, in the limit of infinite dimensions. Therefore, care must be taken when using Gelman's
scaling factor with non-Gaussian proposals and target objective functions. Note that only the product
symbol (*) can be parsed in the string value of scaleFactor. The presence of other mathematical symbols
or multiple appearances of the product symbol will lead to a simulation crash. Also, note that the
prescription of an acceptance range specified by the input variable 'targetAcceptanceRate' will lead to
the dynamic modification of the initial input value of scaleFactor throughout sampling for
adaptiveUpdateCount times. The default scaleFactor string-value is 'gelman' (for all proposals),
which is subsequently converted to 2.38/sqrt(ndim).
See also the input simulation specification targetAcceptanceRate.
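The string forms shown in the examples above can be converted to a number along the following lines.
This is a minimal sketch for illustration, not the parser used by ParaDRAM. The function name
parse_scale_factor is hypothetical. It assumes that the string is a product of numbers and the keyword
'gelman', with space characters ignored and the keyword treated as case-insensitive.

```python
import math

def parse_scale_factor(value, ndim):
    """Convert a scaleFactor string such as '2.5*Gelman' to a number."""
    gelman = 2.38 / math.sqrt(ndim)   # Gelman, Roberts, and Gilks (1996) scale factor
    result = 1.0
    # Remove space characters, ignore case, and split on the product symbol.
    for term in "".join(value.split()).lower().split("*"):
        # float() raises ValueError for anything that is not a plain number.
        result *= gelman if term == "gelman" else float(term)
    return result
```

For example, parse_scale_factor('2.5 * gelman', ndim=4) evaluates 2.5 * 2.38/sqrt(4). Following the
definition above, the proposal covariance matrix would then be multiplied by the square of the
returned value.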
proposalModel
proposalModel is a string variable containing the name of the proposal distribution for the MCMC
sampler. The string value must be enclosed by either single or double quotation marks when provided
as input. Options that are currently supported include:
proposalModel = 'normal'
This is equivalent to the multivariate normal distribution, which is the most widely used
proposal model with MCMC samplers.
proposalModel = 'uniform'
The proposals will be drawn uniformly from within an ndim-dimensional ellipsoid whose
covariance matrix and scale are initialized by the user and optionally adaptively updated
throughout the simulation.
The default value is 'normal'.
See also the input simulation specification
proposalStartCovMat,
proposalStartCorMat,
proposalStartStdVec.
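The difference between the two proposal models can be illustrated with a short sketch. This is not
ParaDRAM's implementation. The function name propose is hypothetical, and the ellipsoid is assumed here
to be the set of points x with (x - c)' * inverse(cov) * (x - c) <= 1 around the current state c.
ParaDRAM may define the size of the ellipsoid differently.

```python
import numpy as np

def propose(current, cov, model="normal", rng=None):
    """Draw one proposed state around `current` for the given proposal model."""
    rng = np.random.default_rng() if rng is None else rng
    current = np.asarray(current, dtype=float)
    ndim = current.size
    chol = np.linalg.cholesky(cov)           # lower-triangular factor, cov = chol @ chol.T
    if model == "normal":
        step = chol @ rng.standard_normal(ndim)
    elif model == "uniform":
        # Uniform point in the unit ball: random direction, radius distributed as U^(1/ndim).
        direction = rng.standard_normal(ndim)
        direction /= np.linalg.norm(direction)
        radius = rng.random() ** (1.0 / ndim)
        # The linear map by chol turns the unit ball into the assumed ellipsoid.
        step = chol @ (radius * direction)
    else:
        raise ValueError("unsupported proposal model: " + model)
    return current + step
```

Normal proposals have unbounded support, whereas uniform proposals never leave the ellipsoid centered on
the current state.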
proposalStartCovMat
proposalStartCovMat is a real-valued positive-definite matrix of size (ndim,ndim), where ndim is the
dimension of the sampling space. It serves as the best-guess starting covariance matrix of the proposal
distribution. To bring the sampling efficiency of ParaDRAM to within the desired requested range,
the covariance matrix will be adaptively updated throughout the simulation, according to the user's
requested schedule. If proposalStartCovMat is not provided by the user or it is completely missing
from the input file, its value will be automatically computed via the input variables proposalStartCorMat
and proposalStartStdVec (or via their default values, if not provided).
The default value of proposalStartCovMat is an ndim-by-ndim Identity matrix.
See also the input simulation specification
proposalModel,
proposalStartCorMat,
proposalStartStdVec.
proposalStartCorMat
proposalStartCorMat is a real-valued positive-definite matrix of size (ndim,ndim), where ndim is the
dimension of the sampling space. It serves as the best-guess starting correlation matrix of the
proposal distribution used by ParaDRAM. It is used (along with the input vector proposalStartStdVec)
to construct the covariance matrix of the proposal distribution when the input covariance matrix is
missing in the input list of variables. If the covariance matrix is given as input to ParaDRAM, any
input values for proposalStartCorMat, as well as proposalStartStdVec, will be automatically ignored
by ParaDRAM. As input to ParaDRAM, the variable proposalStartCorMat along with proposalStartStdVec
is especially useful in situations where obtaining the best-guess covariance matrix is not trivial.
The default value of proposalStartCorMat is an ndim-by-ndim Identity matrix.
See also the input simulation specification
proposalModel,
proposalStartCovMat,
proposalStartStdVec.
proposalStartStdVec
proposalStartStdVec is a real-valued positive vector of length ndim, where ndim is the dimension of
the sampling space. It serves as the best-guess starting Standard Deviation of each of the components
of the proposal distribution. If the initial covariance matrix (proposalStartCovMat) is missing as
an input variable to ParaDRAM, then proposalStartStdVec (along with the input variable proposalStartCorMat)
will be used to construct the initial covariance matrix of the proposal distribution of the MCMC
sampler. However, if proposalStartCovMat is present as an input argument to ParaDRAM, then the input
proposalStartStdVec along with the input proposalStartCorMat will be completely ignored and the input
value for proposalStartCovMat will be used to construct the initial covariance matrix of the proposal
distribution of ParaDRAM. The default value of proposalStartStdVec is a vector of unit values (i.e.,
ones) of length ndim.
See also the input simulation specification
proposalModel,
proposalStartCovMat,
proposalStartCorMat.
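The precedence among proposalStartCovMat, proposalStartCorMat, and proposalStartStdVec described above
can be sketched as follows. The function name start_cov_mat is hypothetical and the code is only an
illustration of the documented rules. It uses the standard identity Cov(i,j) = Cor(i,j) * Std(i) * Std(j).

```python
import numpy as np

def start_cov_mat(ndim, cov_mat=None, cor_mat=None, std_vec=None):
    """Return the initial proposal covariance matrix from the user inputs."""
    if cov_mat is not None:
        # A user-provided covariance matrix takes precedence; the other two inputs are ignored.
        return np.asarray(cov_mat, dtype=float)
    # Defaults: identity correlation matrix and a vector of ones.
    cor = np.eye(ndim) if cor_mat is None else np.asarray(cor_mat, dtype=float)
    std = np.ones(ndim) if std_vec is None else np.asarray(std_vec, dtype=float)
    return cor * np.outer(std, std)
```

With no inputs, the result is the ndim-by-ndim identity matrix, in agreement with the default value of
proposalStartCovMat.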
adaptiveUpdatePeriod
Every adaptiveUpdatePeriod calls to the objective function, the parameters of the proposal distribution
will be updated. The variable adaptiveUpdatePeriod must be a positive integer. The smaller the
value of adaptiveUpdatePeriod, the easier it will be for the ParaDRAM kernel to adapt the proposal
distribution to the covariance structure of the objective function. However, this will happen at the
expense of slower simulation runtime as the adaptation process can become computationally expensive,
in particular, for very high dimensional objective functions (ndim >> 1). The larger the value of
adaptiveUpdatePeriod, the easier it will be for the ParaDRAM kernel to keep the sampling efficiency
close to the requested target acceptance rate range (if specified via the input variable
targetAcceptanceRate). However, too large values for adaptiveUpdatePeriod will only delay the adaptation
of the proposal distribution to the global structure of the objective function that is being sampled.
If adaptiveUpdatePeriod >= chainSize, then no adaptive updates to the proposal distribution will be
made. The default value is 4 * ndim, where ndim is the dimension of the domain of the objective
function to be sampled.
See also the input simulation specification
adaptiveUpdateCount,
greedyAdaptationCount.
adaptiveUpdateCount
adaptiveUpdateCount represents the total number of adaptive updates that will be made to the parameters
of the proposal distribution to increase the sampling efficiency of ParaDRAM. Every adaptiveUpdatePeriod
calls to the objective function, the parameters of the proposal distribution will be updated until the
total number of adaptive updates reaches the value of adaptiveUpdateCount. This variable must be a
non-negative integer. As
a rule of thumb, it may be appropriate to set the input variable chainSize > 2 * adaptiveUpdatePeriod
* adaptiveUpdateCount, to ensure ergodicity and stationarity of the MCMC sampler. If adaptiveUpdateCount=0,
then the proposal distribution parameters will be fixed to the initial input values throughout the
entire MCMC sampling. The default value is 1073741823.
See also the input simulation specification
adaptiveUpdatePeriod,
greedyAdaptationCount.
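The interplay between adaptiveUpdatePeriod and adaptiveUpdateCount can be illustrated with a short
sketch. Both function names are hypothetical. For simplicity, the sketch treats the chain length and the
number of objective function calls as the same quantity, which is an assumption that does not hold in
general.

```python
from itertools import islice

def adaptation_points(chain_size, adaptive_update_period, adaptive_update_count):
    """Return the call counts at which the proposal distribution would be updated."""
    # An update is due every adaptive_update_period calls, strictly before the chain ends,
    # so that a period >= chain_size yields no update at all.
    due = range(adaptive_update_period, chain_size, adaptive_update_period)
    # At most adaptive_update_count updates are made.
    return list(islice(due, adaptive_update_count))

def suggested_min_chain_size(adaptive_update_period, adaptive_update_count):
    """Smallest chainSize satisfying the rule of thumb chainSize > 2 * period * count."""
    return 2 * adaptive_update_period * adaptive_update_count + 1
```

For instance, with adaptiveUpdatePeriod = 8 and adaptiveUpdateCount = 3, the updates in this sketch occur
after 8, 16, and 24 calls, and the rule of thumb suggests a chainSize of at least 49.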
greedyAdaptationCount
If greedyAdaptationCount is set to a positive integer then the first greedyAdaptationCount number of
the adaptive updates of the sampler will be made using only the 'unique' accepted points in the MCMC
chain. This is useful, for example, when the function to be sampled by ParaDRAM is high dimensional, in
which case, the adaptive updates to ParaDRAM's sampler distribution will be less likely to lead to numerical
instabilities, for example, a singular covariance matrix for the multivariate proposal sampler. The
variable greedyAdaptationCount must be a non-negative integer, and not larger than the value of
adaptiveUpdateCount. If it is larger, it will be automatically set to adaptiveUpdateCount for the
simulation. The default value is 0.
See also the input simulation specification
adaptiveUpdatePeriod,
adaptiveUpdateCount.
burninAdaptationMeasure
burninAdaptationMeasure is a 64-bit real number between 0 and 1, representing the adaptation measure
threshold below which the simulated Markov chain will be used to generate the output ParaDRAM sample.
In other words, any point in the output Markov Chain that has been sampled during significant adaptation
of the proposal distribution (as determined by burninAdaptationMeasure) will not be included in the
construction of the final ParaDRAM output sample. This is to ensure that the generation of the output
sample will be based on the part of the simulated chain that is practically guaranteed to be Markovian
and ergodic. If this variable is set to 0, then the output sample will be generated from the part of
the chain where no proposal adaptation has occurred. This non-adaptive or minimally-adaptive part of
the chain may not even exist if the total adaptation period of the simulation (as determined by
adaptiveUpdateCount and adaptiveUpdatePeriod input variables) is longer than the total length of the
output MCMC chain. In such cases, the resulting output sample may have a zero size. In general, when
good mixing occurs (e.g., when the input variable chainSize is very large) any specific value of
burninAdaptationMeasure becomes practically irrelevant. The default value for burninAdaptationMeasure
is 1.00000000000000, implying that the entire chain (with the exclusion of an initial automatically-determined
burnin period) will be used to generate the final output sample.
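One way to picture the role of this threshold is sketched below. The sketch assumes that an adaptation
measure between 0 and 1 is available for every point of the chain and that the output sample is built
from the part of the chain after the last point whose measure exceeds the threshold. Both the function
name burnin_start and this selection rule are assumptions for illustration and may differ from what
ParaDRAM does internally.

```python
import numpy as np

def burnin_start(adaptation_measure, threshold=1.0):
    """Return the index of the first chain point kept for the output sample."""
    measure = np.asarray(adaptation_measure, dtype=float)
    # Positions at which the adaptation was still significant.
    above = np.flatnonzero(measure > threshold)
    # Keep everything after the last such position (or the whole chain if there is none).
    return 0 if above.size == 0 else int(above[-1]) + 1
```

With the default threshold of 1, no measure can exceed the threshold, so the whole chain is kept. With a
threshold of 0, only the part of the chain after the last adaptation remains, which can be empty.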
delayedRejectionCount
delayedRejectionCount is an integer in the range 0 <= delayedRejectionCount <= 1000 that represents the total
number of stages for which rejections of new proposals will be tolerated by ParaDRAM before going back to the
previously accepted point (state). Possible values are:
delayedRejectionCount = 0
indicating no deployment of the delayed rejection algorithm.
delayedRejectionCount > 0
which implies a maximum delayedRejectionCount number of rejections will be tolerated.
For example, delayedRejectionCount = 1 means that at any point during the sampling, if a proposal
is rejected, ParaDRAM will not go back to the last sampled state. Instead, it will continue to
propose a new state from the last rejected proposal. If the new state is again rejected based on the rules of
ParaDRAM, then the algorithm will not tolerate further rejections, because the maximum number of
rejections to be tolerated has been set by the user to be delayedRejectionCount = 1. The algorithm
then goes back to the original last-accepted state and will begin proposing new states from that
location. The default value is delayedRejectionCount = 0.
See also the input simulation specification
delayedRejectionScaleFactorVec.
delayedRejectionScaleFactorVec
delayedRejectionScaleFactorVec is a real-valued positive vector of length delayedRejectionCount
by which the covariance matrix of the proposal distribution of the ParaDRAM sampler is scaled when the
Delayed Rejection (DR) scheme is activated (by setting delayedRejectionCount>0). At each ith stage
of the DR process, the proposal distribution from the last stage is scaled by the factor
delayedRejectionScaleFactorVec(i). Missing elements of the delayedRejectionScaleFactorVec in the
input to ParaDRAM will be set to the default value. The default value at all stages is 0.5^(1/ndim)
where ndim is the number of dimensions of the domain of the objective function. This default value effectively
reduces the volume of the covariance matrix of the proposal distribution by half compared to the last DR stage.
See also the input simulation specification
delayedRejectionCount.
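The default value and the filling of missing elements can be sketched as follows. The function name
dr_scale_factors is hypothetical, and the sketch assumes that missing elements are the trailing ones,
which the description above does not state explicitly.

```python
def dr_scale_factors(ndim, delayed_rejection_count, user_values=()):
    """Return one scale factor per delayed-rejection stage."""
    default = 0.5 ** (1.0 / ndim)    # default value at every stage
    factors = list(user_values)[:delayed_rejection_count]
    # Fill the stages for which no value was given with the default.
    factors += [default] * (delayed_rejection_count - len(factors))
    return factors
```

The default value raised to the power ndim equals 0.5. This is the factor by which an ndim-dimensional
volume shrinks when every axis is scaled by the default value, which is the halving described above.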