Source code for paramonte._paradram

####################################################################################################################################
####################################################################################################################################
####
####   MIT License
####
####   ParaMonte: plain powerful parallel Monte Carlo library.
####
####   Copyright (C) 2012-present, The Computational Data Science Lab
####
####   This file is part of the ParaMonte library.
####
####   Permission is hereby granted, free of charge, to any person obtaining a
####   copy of this software and associated documentation files (the "Software"),
####   to deal in the Software without restriction, including without limitation
####   the rights to use, copy, modify, merge, publish, distribute, sublicense,
####   and/or sell copies of the Software, and to permit persons to whom the
####   Software is furnished to do so, subject to the following conditions:
####
####   The above copyright notice and this permission notice shall be
####   included in all copies or substantial portions of the Software.
####
####   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
####   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
####   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
####   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
####   DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
####   OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
####   OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
####
####   ACKNOWLEDGMENT
####
####   ParaMonte is an honor-ware and its currency is acknowledgment and citations.
####   As per the ParaMonte library license agreement terms, if you use any parts of
####   this library for any purposes, kindly acknowledge the use of ParaMonte in your
####   work (education/research/industry/development/...) by citing the ParaMonte
####   library as described on this page:
####
####       https://github.com/cdslaborg/paramonte/blob/3548c097f2a25dfc0613061800656d27d0e0ddbe/ACKNOWLEDGMENT.md
####
####################################################################################################################################
####################################################################################################################################

import numpy as np
import typing as tp

from _ParaMonteSampler import ParaMonteSampler
from _TabularFileContents import TabularFileContents
import _paramonte as pm

newline = pm.newline

####################################################################################################################################
#### ParaDRAM class
####################################################################################################################################

class ParaDRAM(ParaMonteSampler):
    """

    This is the **ParaDRAM** class to generate instances of **serial** and **parallel**
    **Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo**
    sampler class of the ParaMonte library. The ``ParaDRAM`` class is a child of
    the ``ParaMonteSampler`` class.

    All ParaDRAM class attributes are optional and all attributes can be set after
    a ParaDRAM instance is returned by the constructor.

    Once you set the optional attributes to your desired values,
    call the ParaDRAM sampler via the object's method ``runSampler()``.

    .. _example-serial-usage:

    **Example serial usage**

    Copy and paste the following code enclosed between the
    two comment lines in your python/ipython/jupyter session
    (ensure the indentations of the pasted lines comply with Python rules):

    .. code-block:: python
        :linenos:

        ##################################
        import numpy as np
        import paramonte as pm
        def getLogFunc(point):
            # return the log of a multivariate Normal
            # density function with ndim dimensions
            return -0.5 * np.dot(point, point)
        pmpd = pm.ParaDRAM()
        pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                        , getLogFunc = getLogFunc   # the objective function
                        )
        ##################################

    where,

        ndim
            represents the number of dimensions of the domain of
            the user's objective function ``getLogFunc(point)`` and,

        getLogFunc(point)
            represents the user's objective function to be sampled,
            which must take a single input argument ``point`` of type
            numpy-float64 array of length ``ndim`` and must return the
            natural logarithm of the objective function.

    .. _example-parallel-usage:

    **Example parallel usage**

    Copy and paste the following code enclosed between the
    two comment lines in your python/ipython/jupyter session
    (ensure the indentations of the pasted lines comply with Python rules):

    .. code-block:: python
        :linenos:

        ##################################
        with open("main_mpi.py", "w") as file:
            file.write  ('''
        import numpy as np
        import paramonte as pm
        def getLogFunc(point):
            # return the log of the standard multivariate
            # Normal density function with ndim dimensions
            return -0.5 * np.dot(point, point)
        pmpd = pm.ParaDRAM()
        pmpd.mpiEnabled = True
        pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                        , getLogFunc = getLogFunc   # the objective function
                        )
        ''')
        ##################################

    where,

        ndim
            represents the number of dimensions of the domain of
            the user's objective function ``getLogFunc(point)`` and,

        getLogFunc(point)
            represents the user's objective function that is to be sampled.
            This function must take a single input argument ``point`` of type
            numpy-float64 array of length ``ndim`` and must return the natural
            logarithm of the objective function.

        mpiEnabled
            is a logical (boolean) indicator that, if ``True``, will
            cause the ParaDRAM simulation to run in parallel
            on the requested number of processors.
            The default value is ``False``.

    The above will generate a Parallel-ParaDRAM-simulation Python script
    named ``main_mpi.py`` in the current working directory of Python.
    Note the only difference between the serial and parallel simulation
    scripts: the extra line ``pmpd.mpiEnabled = True``, which forces the
    ParaMonte library to invoke the parallel sampler to run the simulation.

    Assuming that you already have an MPI runtime library installed on your
    system (see the **Tips on parallel usage** below), you can now execute
    the Python script file ``main_mpi.py`` in parallel in two ways:

    1.  from inside ipython or jupyter, type the following,

        .. code-block:: bash

            !mpiexec -n 3 python main_mpi.py

    2.  outside of the Python environment, from within a Bash shell
        (on Linux or Mac) or from within an Anaconda command prompt
        on Windows, type the following,

        .. code-block:: bash

            mpiexec -n 3 python main_mpi.py

    **Note:**

    On the Windows platform, if you are using the Intel MPI library,
    we recommend that you also specify the extra flag ``-localonly``,

    .. code-block:: bash

        mpiexec -localonly -n 3 python main_mpi.py

    This will cause the simulations to run in parallel only on a single node,
    but more importantly, it will also prevent the use of the Hydra service and
    the requirement for its registration. If you are not on a Windows cluster
    (e.g., you are using your personal device), then we highly recommend
    specifying this flag.

    In all of the above cases, the script ``main_mpi.py`` will run on 3 processors.
    Feel free to change the number of processors to any number desired, but do
    not request more than the available number of physical cores on your system.

    **Tips on parallel usage**

    For up-to-date detailed instructions on how to run simulations in parallel visit:

        https://www.cdslab.org/paramonte

    You can also use the following commands on the Python command-line,

    .. code-block:: python
        :linenos:

        ##################################
        import paramonte as pm
        pm.verify() # verify the existence of parallel simulation prerequisites
        ##################################

    to obtain specific information on how to run a parallel simulation,
    in particular, in relation to your current installation of ParaMonte.
    In general, for parallel simulations:

    0.  Ensure you need and will get a speedup by running the ParaDRAM sampler
        in parallel. Typically, if a single evaluation of the objective function
        takes much longer than a few milliseconds, your simulation may then
        benefit from the parallel run.

    1.  Ensure you have an MPI library installed, preferably, the Intel MPI
        runtime libraries. An MPI library should be automatically installed
        on your system with ParaMonte. If needed, you can download the Intel
        MPI library from their website and install it.

    2.  Ensure the ParaDRAM object property ``mpiEnabled`` is ``True``
        (the default is ``False``).

    3.  Before running the parallel simulation, in particular, on Windows systems,
        you may need to define the necessary MPI environmental variables on your
        system. To get information on how to define the variables, use the
        paramonte module's function ``verify()``, as described above.

    4.  Call your main Python code from a Python-aware, mpiexec-aware
        command-line via,

        .. code-block:: bash

            mpi_launcher -n num_process python name_of_your_python_code.py

        where,

        1.  "mpi_launcher" is the name of the MPI launcher
            of the MPI runtime library that you have installed.
            For example, the Intel MPI library's launcher is named mpiexec,
            also recognized by Microsoft, MPICH, and OpenMPI.
            Note that on supercomputers, the MPI launcher is usually
            something other than ``mpiexec``, for example:
            ``ibrun``, ``mpirun``, ...

        2.  "num_process" represents the number of cores
            on which you want to run the program. Replace this
            with an integer number, like 3 (meaning 3 cores).

            Do not assign more processes than the available number of
            physical cores on your device/cluster. Assigning more cores
            than physically available on your system will only slow down
            your simulation.

    Once the above script is saved in the file ``main_mpi.py``, open a Python-aware and
    MPI-aware command prompt to run the simulation in parallel via the MPI launcher,

    .. code-block:: bash

        mpiexec -n 3 python main_mpi.py

    This will execute the Python script ``main_mpi.py`` on three processes (images).
    Keep in mind that on Windows systems you may need to define MPI environmental
    variables before a parallel simulation, as described above.

    **ParaDRAM Class Attributes**

    See also:

        https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/

    All input specifications (attributes) of a ParaDRAM simulation are optional.
    However, it is recommended that you provide as much information as possible
    about the specific ParaDRAM simulation and the objective function to be sampled
    via ParaDRAM simulation specifications.

    The ParaDRAM simulation specifications have lengthy comprehensive descriptions
    that appear in full in the output report file of every ParaDRAM simulation.

    The best way to learn about individual ParaDRAM simulation attributes
    is to run a minimal serial simulation with the following Python script,

    .. code-block:: python
        :linenos:

        ##################################
        from paramonte import ParaDRAM
        pmpd = ParaDRAM()
        pmpd.spec.outputFileName = "./test"
        def getLogFunc(point): return -sum(point**2)
        pmpd.runSampler( ndim = 1, getLogFunc = getLogFunc )
        ##################################

    Running this code will generate a set of simulation output files (in the current
    working directory of Python) that begin with the prefix ``test_process_1``. Among
    these, the file ``test_process_1_report.txt`` contains the full description of all
    input specifications of the ParaDRAM simulation as well as other information
    about the simulation results and statistics.

    **Parameters**

        None. The simulation specifications can be set once an object is instantiated.
        All simulation specification descriptions are collectively available at:

            https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/

        Note that this is the new interface. The previous ParaDRAM class interface
        optionally took all simulation specifications as input. Over time, however,
        that approach became more of a liability than a benefit. All simulation
        specifications must now be set solely after a ParaDRAM object is
        instantiated, instead of via the ParaDRAM class constructor.

    **Attributes**

        buildMode
            optional string argument with the default value "release".
            Possible choices are:

                "debug"
                    to be used for identifying sources of bugs
                    and causes of code crashes.

                "release"
                    to be used in all other normal scenarios
                    for maximum runtime efficiency.

        mpiEnabled
            optional logical (boolean) indicator which is ``False`` by default.
            If it is set to ``True``, it will cause the ParaDRAM simulation
            to run in parallel on the requested number of processors.
            See the class documentation guidelines above for
            information on how to run a simulation in parallel.

        reportEnabled
            optional logical (boolean) indicator which is ``True`` by default.
            If it is set to ``True``, it will cause extensive guidelines to be
            printed on the standard output as the simulation or post-processing
            continues, with hints on the next possible steps that could be taken
            in the process. If you do not need such help and information, set
            this variable to ``False`` to silence all output messages.

        inputFile
            optional string input representing the path to
            an external input namelist of simulation specifications.

                **WARNING**

                **Use this optional argument only if you know the consequences**.
                Specifying an input file will cause the ParaDRAM sampler to ignore
                all other simulation specifications set by the user via the sampler
                instance's ``spec``-component attributes.

        spec
            A frozen class containing all simulation specifications.
            All simulation attributes are by default set to appropriate
            values at runtime. To override the default simulation
            specifications, set the ``spec`` attributes to some
            desired values of your choice. For possible values, see:

                https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/

            If you need help on any of the simulation specifications, try
            the supplied ``helpme()`` function in this component, like,

            .. code-block:: python
                :linenos:

                ##################################
                import paramonte as pm
                pmpd = pm.ParaDRAM()            # instantiate a ParaDRAM sampler class
                pmpd.spec.helpme()              # get help on all simulation specification
                pmpd.spec.helpme("chainSize")   # get help on "chainSize" specifically
                ##################################

    **Methods**

        See below for information on the methods.

    **Returns**

        Object of class ParaDRAM sampler.

    ---------------------------------------------------------------------------
    """

    def __init__(self):
        """

        The constructor for the ParaDRAM class. It takes no input parameters;
        all class attributes can be set after the object is constructed.

            **Parameters**

                None

        """

        super().__init__(methodName = "ParaDRAM")

        #### ParaMonte specifications
        #
        #self.spec = pm.utils.FrozenClass()
        #
        #### ParaMonte variables
        #self.spec.sampleSize = sampleSize
        #self.spec.randomSeed = randomSeed
        #self.spec.description = description
        #self.spec.outputFileName = outputFileName
        #self.spec.outputDelimiter = outputDelimiter
        #self.spec.chainFileFormat = chainFileFormat
        #self.spec.variableNameList = variableNameList
        #self.spec.restartFileFormat = restartFileFormat
        #self.spec.outputColumnWidth = outputColumnWidth
        #self.spec.outputRealPrecision = outputRealPrecision
        #self.spec.silentModeRequested = silentModeRequested
        #self.spec.domainLowerLimitVec = domainLowerLimitVec
        #self.spec.domainUpperLimitVec = domainUpperLimitVec
        #self.spec.parallelizationModel = parallelizationModel
        #self.spec.progressReportPeriod = progressReportPeriod
        #self.spec.targetAcceptanceRate = targetAcceptanceRate
        #self.spec.mpiFinalizeRequested = mpiFinalizeRequested
        #self.spec.maxNumDomainCheckToWarn = maxNumDomainCheckToWarn
        #self.spec.maxNumDomainCheckToStop = maxNumDomainCheckToStop
        #### ParaMCMC variables
        #self.spec.chainSize = chainSize
        #self.spec.scaleFactor = scaleFactor
        #self.spec.startPointVec = startPointVec
        #self.spec.proposalModel = proposalModel
        #self.spec.proposalStartCovMat = proposalStartCovMat
        #self.spec.proposalStartCorMat = proposalStartCorMat
        #self.spec.proposalStartStdVec = proposalStartStdVec
        #self.spec.sampleRefinementCount = sampleRefinementCount
        #self.spec.sampleRefinementMethod = sampleRefinementMethod
        #self.spec.randomStartPointRequested = randomStartPointRequested
        #self.spec.randomStartPointDomainLowerLimitVec = randomStartPointDomainLowerLimitVec
        #self.spec.randomStartPointDomainUpperLimitVec = randomStartPointDomainUpperLimitVec
        #### ParaDRAM variables
        #self.spec.adaptiveUpdateCount = adaptiveUpdateCount
        #self.spec.adaptiveUpdatePeriod = adaptiveUpdatePeriod
        #self.spec.greedyAdaptationCount = greedyAdaptationCount
        #self.spec.delayedRejectionCount = delayedRejectionCount
        #self.spec.burninAdaptationMeasure = burninAdaptationMeasure
        #self.spec.delayedRejectionScaleFactorVec = delayedRejectionScaleFactorVec
        #
        #self.spec.helpme = SpecDRAM.helpme
        #self.spec._freeze()

    ################################################################################################################################
    #### runSampler
    ################################################################################################################################

    def runSampler  ( self
                    , ndim          : int
                    , getLogFunc    : tp.Callable[[tp.List[float]], float]
                    , inputFile     : tp.Optional[str] = None
                    ) -> None:
        """

        Run ParaDRAM sampler and return nothing.

            **Parameters**

                ndim
                    An integer representing the number of dimensions of the
                    domain of the user's objective function ``getLogFunc(point)``.
                    It must be a positive integer.

                getLogFunc(point)
                    represents the user's objective function to be sampled,
                    which must take a single input argument ``point`` of type
                    numpy-float64 array of length ``ndim`` and must return the
                    natural logarithm of the objective function.

                inputFile (optional)
                    A string input representing the path to an external
                    input namelist of simulation specifications.

                        **WARNING**

                        Use this optional argument with caution and only
                        if you know what you are doing. Specifying this option
                        will cause the sampler to ignore all other simulation
                        specifications set by the user via the ``spec``
                        component of the sampler instance.

            **Returns**

                None

        """

        # ndim must be a positive integer.
        if not isinstance(ndim,int) or ndim<1:
            pm.abort( msg   = "The input argument ndim must be a positive integer," + newline
                            + "representing the number of dimensions of the domain of" + newline
                            + "the user's objective function getLogFunc()." + newline
                            + "You have entered ndim = " + str(ndim)
                    , methodName = self._methodName
                    , marginTop = 1
                    , marginBot = 1
                    )

        # getLogFunc must be callable.
        if not callable(getLogFunc):
            pm.abort( msg   = "The input argument getLogFunc must be a callable function." + newline
                            + "It represents the user's objective function to be sampled," + newline
                            + "which must take a single input argument of type numpy" + newline
                            + "float64 array of length ndim and must return the" + newline
                            + "natural logarithm of the objective function."
                    , methodName = self._methodName
                    , marginTop = 1
                    , marginBot = 1
                    )

        # inputFile, if given, must be a string.
        if inputFile is not None and not isinstance(inputFile,str):
            pm.abort( msg   = "The input argument ``inputFile`` must be of type str." + newline
                            + "It is an optional string input representing the path to" + newline
                            + "an external input namelist of simulation specifications." + newline
                            + "USE THIS OPTIONAL ARGUMENT WITH CAUTION AND" + newline
                            + "ONLY IF YOU KNOW WHAT YOU ARE DOING." + newline
                            + "Specifying this option will cause the sampler to ignore" + newline
                            + "all other simulation specifications set by the user via" + newline
                            + "the ``spec`` component of the sampler instance." + newline
                            + "You have entered inputFile = " + str(inputFile)
                    , methodName = self._methodName
                    , marginTop = 1
                    , marginBot = 1
                    )

        # Adapt the user's single-argument function to the two-argument
        # (ndim, point) callback passed to the underlying sampler: only the
        # first ndim elements of point are forwarded, as a numpy array.
        def getLogFunc2arg(ndim,point):
            PointVec = np.array(point[0:ndim])
            return getLogFunc(PointVec)

        self._runSampler( ndim
                        , getLogFunc2arg
                        , inputFile
                        )

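####   The wrapping step at the end of ``runSampler()`` (``getLogFunc2arg``) can be tried in
####   isolation. The block below is a minimal sketch of that adapter pattern using only the
####   Python standard library, so plain lists stand in for numpy arrays. The names
####   ``make_two_arg_callback`` and ``log_std_normal`` are hypothetical and are not part of
####   the ParaMonte API.

```python
def make_two_arg_callback(get_log_func):
    """Wrap a one-argument log-density into an (ndim, point) callback.

    Hypothetical helper for illustration only: it mirrors the idea of
    getLogFunc2arg, but is not ParaMonte code.
    """
    def callback(ndim, point):
        # Forward only the first ndim entries, as a fresh list.
        return get_log_func(list(point[:ndim]))
    return callback


def log_std_normal(point):
    """Natural log of the standard Normal density, up to an additive constant."""
    return -0.5 * sum(x * x for x in point)


callback = make_two_arg_callback(log_std_normal)

# A buffer longer than ndim: the trailing entries must be ignored.
buffer = [1.0, 2.0, 99.0, 99.0]
value = callback(2, buffer)
print(value)
```

####   Slicing to ``ndim`` before calling the user's function keeps the user-facing contract
####   simple: the objective function only ever sees a point of exactly ``ndim`` elements.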
################################################################################################################################ #### readMarkovChain ################################################################################################################################

    def readMarkovChain ( self
                        , file          : tp.Optional[str] = None
                        , delimiter     : tp.Optional[str] = None
                        , parseContents : tp.Optional[bool] = True
                        , renabled      : tp.Optional[bool] = False
                        ) -> tp.List[TabularFileContents] :
        """

        Return a list of the unweighted verbose (Markov-chain) contents
        of a set of ParaDRAM output chain files, whose names begin with
        the user-provided input variable ``file``. This method is to be
        used only for the postprocessing of the output chain file(s) of
        an already finished ParaDRAM simulation. It is not meant to be
        called by all processes in parallel mode, although it is possible.

            **Parameters**

                file (optional)
                    A string representing the path to the chain file with
                    the default value of ``None``.
                    The path only needs to uniquely identify the simulation
                    to which the chain file belongs. For example, specifying
                    ``"./mydir/mysim"`` as input will lead to a search for a
                    file that begins with ``"mysim"`` and ends with
                    ``"_chain.txt"`` inside the directory ``"./mydir/"``.
                    If there are multiple files with such a name, then all
                    of them will be read and returned as a list.
                    If this input argument is not provided by the user, the
                    value of the object attribute ``outputFileName``
                    will be used instead. At least one of the two mentioned
                    routes must provide the path to the chain file; otherwise,
                    this method will break by calling ``sys.exit()``.

                delimiter (optional)
                    An input string representing the delimiter used in the
                    output chain file. If it is not provided as an input
                    argument, the value of the corresponding object attribute
                    ``outputDelimiter`` will be used instead. If neither of
                    the two is available, the default comma delimiter
                    ``","`` will be assumed and used.

                parseContents (optional)
                    If set to ``True``, the contents of the file will be
                    parsed and stored in a component of the object named
                    ``contents``. The default value is ``True``.

                renabled (optional)
                    If set to ``False``, the contents of the file(s) will be
                    stored as a list in a (new) component of the ParaDRAM
                    object named ``markovChainList`` and ``None`` will be the
                    return value of the method. If set to ``True``, the list
                    will instead be returned by the method.
                    The default value is ``False``.

            **Returns**

                A list of objects, each of which has the following properties:

                    file
                        The full absolute path to the chain file.

                    delimiter
                        The delimiter used in the chain file.

                    ndim
                        The number of dimensions of the domain of the objective
                        function from which the chain has been drawn.

                    count
                        The number of unique (weighted) points in the chain file.
                        This is essentially the number of rows in the chain file
                        minus one (representing the header line).

                    plot
                        A structure containing the graphics tools for the
                        visualization of the contents of the file.

                    df
                        The unweighted (Markovian) contents of the chain file in the
                        form of a pandas-library DataFrame (hence called ``df``).

                    contents
                        Corresponding to each column in the chain file, a property
                        with the same name as the column header is also created
                        for the object, which contains the data stored in that
                        column of the chain file.
                        These properties are all stored in the attribute ``contents``.

                If ``renabled = True``, the list of objects will be returned as the
                return value of the method. Otherwise, the list will be stored in a
                component of the ParaDRAM object named ``markovChainList``.

        """

        return self._readTabular( file = file
                                , fileType = "markovChain"
                                , delimiter = delimiter
                                , parseContents = parseContents
                                , renabled = renabled
                                )

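####   The file-matching rule described for the ``file`` argument (a name that begins with the
####   given prefix and ends with ``"_chain.txt"``) can be illustrated with the standard library.
####   This is only a sketch of the described rule, not the library's implementation; the helper
####   ``find_chain_files`` and the file names created below are hypothetical.

```python
import glob
import os
import tempfile


def find_chain_files(file_prefix, suffix="_chain.txt"):
    """Return the sorted paths of all files named <file_prefix>*<suffix>.

    Hypothetical helper for illustration; glob.escape() guards against
    wildcard characters that may appear in the user-supplied prefix.
    """
    pattern = glob.escape(os.path.abspath(file_prefix)) + "*" + suffix
    return sorted(glob.glob(pattern))


# Demonstrate the rule in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as workdir:
    for name in ( "mysim_process_1_chain.txt"
                , "mysim_process_2_chain.txt"
                , "mysim_process_1_report.txt"   # wrong suffix: not matched
                , "other_process_1_chain.txt"    # wrong prefix: not matched
                ):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write("")
    matches = find_chain_files(os.path.join(workdir, "mysim"))
    found = [os.path.basename(path) for path in matches]

print(found)
```

####   Because every match is returned, a prefix shared by several simulations yields several
####   entries, which is consistent with the method returning a list rather than a single object.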
################################################################################################################################
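####   The ``readMarkovChain()`` docstring distinguishes the unique (weighted) points counted by
####   ``count`` from the unweighted (Markovian) contents stored in ``df``. The sketch below
####   shows the general idea of expanding a weighted chain into its unweighted form, assuming
####   each unique point carries a non-negative integer weight equal to the number of consecutive
####   times the chain visited it. That storage layout and the helper ``expand_weighted_chain``
####   are assumptions made for illustration; the source does not state how the library
####   performs this step.

```python
def expand_weighted_chain(points, weights):
    """Repeat each point by its integer weight, preserving order.

    Hypothetical helper: 'points' is a sequence of states and 'weights'
    the matching sequence of non-negative integer weights.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    verbose = []
    for point, weight in zip(points, weights):
        if weight < 0:
            raise ValueError("weights must be non-negative")
        verbose.extend([point] * weight)
    return verbose


# Three unique points with made-up weights 2, 1, and 3.
compact_points = [(0.1, 0.2), (0.3, 0.1), (0.0, 0.5)]
compact_weights = [2, 1, 3]

markov_chain = expand_weighted_chain(compact_points, compact_weights)
print(len(compact_points), len(markov_chain))
```

####   Under this assumption, the length of the unweighted chain equals the sum of the weights,
####   while the number of unique points is the number of rows in the compact form.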