Source code for paramonte._paradram

####################################################################################################################################
####################################################################################################################################
####
####   MIT License
####
####   ParaMonte: plain powerful parallel Monte Carlo library.
####
####   Copyright (C) 2012-present, The Computational Data Science Lab
####
####   This file is part of the ParaMonte library.
####
####   Permission is hereby granted, free of charge, to any person obtaining a
####   copy of this software and associated documentation files (the "Software"),
####   to deal in the Software without restriction, including without limitation
####   the rights to use, copy, modify, merge, publish, distribute, sublicense,
####   and/or sell copies of the Software, and to permit persons to whom the
####   Software is furnished to do so, subject to the following conditions:
####
####   The above copyright notice and this permission notice shall be
####   included in all copies or substantial portions of the Software.
####
####   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
####   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
####   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
####   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
####   DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
####   OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
####   OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
####
####   ACKNOWLEDGMENT
####
####   ParaMonte is an honor-ware and its currency is acknowledgment and citations.
####   As per the ParaMonte library license agreement terms, if you use any parts of
####   this library for any purposes, kindly acknowledge the use of ParaMonte in your
####   work (education/research/industry/development/...) by citing the ParaMonte
####   library as described on this page:
####
####       https://github.com/cdslaborg/paramonte/blob/3548c097f2a25dfc0613061800656d27d0e0ddbe/ACKNOWLEDGMENT.md
####
####################################################################################################################################
####################################################################################################################################

import numpy as np
import typing as tp

from _ParaMonteSampler import ParaMonteSampler
from _TabularFileContents import TabularFileContents
import _paramonte as pm

newline = pm.newline

####################################################################################################################################
#### ParaDRAM class
####################################################################################################################################

[docs]class ParaDRAM(ParaMonteSampler):
    """

    This is the **ParaDRAM** class to generate instances of **serial** and **parallel**
    **Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo**
    sampler class of the ParaMonte library. The ``ParaDRAM`` class is a
    child of the ``ParaMonteSampler`` class.

    All ParaDRAM class attributes are optional and all
    attributes can be set after a ParaDRAM instance
    is returned by the constructor.

    Once you set the optional attributes to your desired values,
    call the ParaDRAM sampler via the object's method ``runSampler()``.

    .. _example-serial-usage:

    **Example serial usage**

    Copy and paste the following code enclosed between the
    two comment lines in your python/ipython/jupyter session
    (ensure the indentations of the pasted lines comply with Python rules):

    .. code-block:: python
        :linenos:

        ##################################
        import numpy as np
        import paramonte as pm
        def getLogFunc(point):
            # return the log of a multivariate Normal
            # density function with ndim dimensions
            return -0.5 * np.dot(point, point)
        pmpd = pm.ParaDRAM()
        pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                        , getLogFunc = getLogFunc   # the objective function
                        )
        ##################################

    where,

        ndim

            represents the number of dimensions of the domain of
            the user's objective function ``getLogFunc(point)`` and,

        getLogFunc(point)

            represents the user's objective function to be sampled,
            which must take a single input argument ``point`` of type
            numpy-float64 array of length ``ndim`` and must return the
            natural logarithm of the objective function.

    .. _example-parallel-usage:

    **Example parallel usage**

    Copy and paste the following code enclosed between the
    two comment lines in your python/ipython/jupyter session
    (ensure the indentations of the pasted lines comply with Python rules):

    .. code-block:: python
        :linenos:

        ##################################
        with open("main_mpi.py", "w") as file:
            file.write  ('''
        import numpy as np
        import paramonte as pm
        def getLogFunc(point):
            # return the log of the standard multivariate
            # Normal density function with ndim dimensions
            return -0.5 * np.dot(point, point)
        pmpd = pm.ParaDRAM()
        pmpd.mpiEnabled = True
        pmpd.runSampler ( ndim = 4 # assume 4-dimensional objective function
                        , getLogFunc = getLogFunc   # the objective function
                        )
        ''')
        ##################################

    where,

        ndim

            represents the number of dimensions of the domain of
            the user's objective function ``getLogFunc(point)`` and,

        getLogFunc(point)

            represents the user's objective function that is to be sampled.
            This function must take a single input argument ``point`` of type
            numpy-float64 array of length ndim and must return the natural
            logarithm of the objective function.

        mpiEnabled

            is a logical (boolean) indicator that, if ``True``, will
            cause the ParaDRAM simulation to run in parallel
            on the requested number of processors.
            The default value is ``False``.

    The above will generate a Parallel-ParaDRAM-simulation Python script in the
    current working directory of Python. Note the only difference between the
    serial and parallel simulation scripts: the extra line ``pmpd.mpiEnabled = True``
    which forces the ParaMonte library to invoke the parallel sampler to run the simulation.

    Assuming that you already have an MPI runtime library installed on your
    system (see the **Tips on parallel usage** below), you can now execute this
    Python script file ``main.py`` in parallel in two ways:

    1.  from inside ipython or jupyter, type the following,

        .. code-block:: bash

            !mpiexec -n 3 python main_mpi.py

    2.  outside of Python environment,
        from within a Bash shell (on Linux or Mac) or,
        from within an Anaconda command prompt on Windows,
        type the following,

        .. code-block:: bash

            mpiexec -n 3 python main_mpi.py

    **Note:**

    On Windows platform, if you are using the Intel MPI library,
    we recommend that you also specify the extra flag -localonly,

    .. code-block:: bash

        mpiexec -localonly -n 3 python main_mpi.py

    This will cause the simulations to run in parallel only on a single node,
    but more importantly, it will also prevent the use of Hydra service and
    the requirement for its registration. If you are not on a Windows cluster,
    (e.g., you are using your personal device), then we highly recommend
    specifying this flag.


    In all cases in the above, the script ``main.py`` will run on 3 processors.
    Feel free to change the number of processors to any number desired. But do
    not request more than the available number of physical cores on your system.

    **Tips on parallel usage**

    For up-to-date detailed instructions on how to run simulations in parallel visit:

        https://www.cdslab.org/paramonte

    You can also use the following commands on the Python command-line,

    .. code-block:: python
        :linenos:

        ##################################
        import paramonte as pm
        pm.verify() # verify the existence of parallel simulation prerequisites
        ##################################

    to obtain specific information on how to run a parallel simulation,
    in particular, in relation to your current installation of ParaMonte.
    In general, for parallel simulations:

    0.  Ensure you need and will get a speedup by running the ParaDRAM sampler
        in parallel. Typically, if a single evaluation of the objective function
        takes much longer than a few milliseconds, your simulation may then
        benefit from the parallel run.

    1.  Ensure you have an MPI library installed, preferably, the Intel MPI
        runtime libraries. An MPI library should be automatically installed
        on your system with ParaMonte. If needed, you can download the Intel
        MPI library from their website and install it.

    2.  Ensure the ParaDRAM object property ``mpiEnabled`` is ``True``
        (the default is ``False``).

    3.  Before running the parallel simulation, in particular, on Windows systems,
        you may need to define the necessary MPI environmental variables on your system.
        To get information on how to define the variables, use the paramonte module's
        function, ``verify()``, as described in the above.

    4.  Call your main Python code from a Python-aware mpiexec-aware command-line via,

        .. code-block:: bash

            mpi_launcher -n num_process python name_of_yor_python_code.py

        where,

        1.  "mpi_launcher" is the name of the MPI launcher
            of the MPI runtime library that you have installed.
            For example, the Intel MPI library's launcher is named mpiexec,
            also recognized by Microsoft, MPICH, and OpenMPI.
            Note that on supercomputers, the MPI launcher is usually
            something other than ``mpiexec``, for example:
            ``ibrun``, ``mpirun``, ...

        2.  "num_process" represents the number of cores
            on which you want to run the program. Replace this
            with the an integer number, like, 3 (meaning 3 cores).

            Do not assign more processes than the available number of
            physical cores on your device/cluster. Assigning more cores
            than physically available on your system will only slow down
            your simulation.

    Once the above script is saved in the file ``main_mpi.py``, open a Python-aware and
    MPI-aware command prompt to run the simulation in parallel via the MPI launcher,

    .. code-block:: bash

        mpiexec -n 3 python main_mpi.py

    This will execute the Python script ``main_mpi.py`` on three processes (images).
    Keep in mind that on Windows systems you may need to define MPI environmental
    variables before a parallel simulation, as described in the above.

    **ParaDRAM Class Attributes**

    See also:

        https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/

    All input specifications (attributes) of a ParaDRAM simulation are optional.
    However, it is recommended that you provide as much information as possible
    about the specific ParaDRAM simulation and the objective function to be sampled
    via ParaDRAM simulation specifications.

    The ParaDRAM simulation specifications have lengthy comprehensive descriptions
    that appear in full in the output report file of every ParaDRAM simulation.

    The best way to learn about individual ParaDRAM simulation attributes
    is to a run a minimal serial simulation with the following Python script,

    .. code-block:: python
        :linenos:

        ##################################
        from paramonte import ParaDRAM
        pmpd = ParaDRAM()
        pmpd.spec.outputFileName = "./test"
        def getLogFunc(point): return -sum(point**2)
        pmpd.runSampler( ndim = 1, getLogFunc = getLogFunc )
        ##################################

    Running this code will generate a set of simulation output files (in the current
    working directory of Python) that begin with the prefix ``test_process_1``. Among
    these, the file ``test_process_1_report.txt`` contains the full description of all
    input specifications of the ParaDRAM simulation as well as other information
    about the simulation results and statistics.

    **Parameters**

        None. The simulation specifications can be set once an object is instantiated.
        All simulation specification descriptions are collectively available at:

            https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/

        Note that this is the new interface. The previous ParaDRAM class interface
        used to optionally take all simulation specifications as input. However,
        overtime, this approach has become more of liability than any potential
        benefit. All simulation specifications have to be now to be set solely
        after a ParaDRAM object is instantiated, instead of setting the
        specifications via the ParaDRAM class constructor.

    **Attributes**

        buildMode

            optional string argument with the default value "release".
            possible choices are:

                "debug"

                    to be used for identifying sources of bug
                    and causes of code crash.

                "release"

                    to be used in all other normal scenarios
                    for maximum runtime efficiency.

        mpiEnabled

            optional logical (boolean) indicator which is ``False`` by default.
            If it is set to ``True``, it will cause the ParaDRAM simulation
            to run in parallel on the requested number of processors.
            See the class documentation guidelines in the above for
            information on how to run a simulation in parallel.

        reportEnabled

            optional logical (boolean) indicator which is ``True`` by default.
            If it is set to ``True``, it will cause extensive guidelines to be
            printed on the standard output as the simulation or post-processing
            continues with hints on the next possible steps that could be taken
            in the process. If you do not need such help and information set
            this variable to ``False`` to silence all output messages.

        inputFile

            optional string input representing the path to
            an external input namelist of simulation specifications.

            **WARNING**

            **Use this optional argument only if you know the consequences**.
            Specifying an input file will cause the ParaDRAM sampler to ignore
            all other simulation specifications set by the user via the sampler
            instance's `spec`-component attributes.

        spec

            A frozen class containing all simulation specifications.
            All simulation attributes are by default set to appropriate
            values at runtime. To override the default simulation
            specifications, set the `spec` attributes to some
            desired values of your choice. For possible values, see:

                https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/

            If you need help on any of the simulation specifications, try
            the supplied ``helpme()`` function in this component, like,

            .. code-block:: python
                :linenos:

                ##################################
                import paramonte as pm
                pmpd = pm.ParaDRAM()          # instantiate a ParaDRAM sampler class
                pmpd.spec.helpme()            # get help on all simulation specification
                pmpd.spec.helpme("chainSize") # get help on "chainSize" specifically
                ##################################

    **Methods**

        See below for information on the methods.

    **Returns**

        Object of class ParaDRAM sampler.

    ---------------------------------------------------------------------------
    """

    def __init__(self):
        """

        The constructor for ParaDRAM class.
        All input parameters are optional and all class
        attributes can be changed after the object construction.

            **Parameters**

                None

        """

        super().__init__(methodName = "ParaDRAM")

        #### ParaMonte specifications
        #
        #self.spec = pm.utils.FrozenClass()
        #
        #### ParaMonte variables
        #self.spec.sampleSize                           = sampleSize
        #self.spec.randomSeed                           = randomSeed
        #self.spec.description                          = description
        #self.spec.outputFileName                       = outputFileName
        #self.spec.outputDelimiter                      = outputDelimiter
        #self.spec.chainFileFormat                      = chainFileFormat
        #self.spec.variableNameList                     = variableNameList
        #self.spec.restartFileFormat                    = restartFileFormat
        #self.spec.outputColumnWidth                    = outputColumnWidth
        #self.spec.outputRealPrecision                  = outputRealPrecision
        #self.spec.silentModeRequested                  = silentModeRequested
        #self.spec.domainLowerLimitVec                  = domainLowerLimitVec
        #self.spec.domainUpperLimitVec                  = domainUpperLimitVec
        #self.spec.parallelizationModel                 = parallelizationModel
        #self.spec.progressReportPeriod                 = progressReportPeriod
        #self.spec.targetAcceptanceRate                 = targetAcceptanceRate
        #self.spec.mpiFinalizeRequested                 = mpiFinalizeRequested
        #self.spec.maxNumDomainCheckToWarn              = maxNumDomainCheckToWarn
        #self.spec.maxNumDomainCheckToStop              = maxNumDomainCheckToStop
        #### ParaMCMC variables
        #self.spec.chainSize                            = chainSize
        #self.spec.scaleFactor                          = scaleFactor
        #self.spec.startPointVec                        = startPointVec
        #self.spec.proposalModel                        = proposalModel
        #self.spec.proposalStartCovMat                  = proposalStartCovMat
        #self.spec.proposalStartCorMat                  = proposalStartCorMat
        #self.spec.proposalStartStdVec                  = proposalStartStdVec
        #self.spec.sampleRefinementCount                = sampleRefinementCount
        #self.spec.sampleRefinementMethod               = sampleRefinementMethod
        #self.spec.randomStartPointRequested            = randomStartPointRequested
        #self.spec.randomStartPointDomainLowerLimitVec  = randomStartPointDomainLowerLimitVec
        #self.spec.randomStartPointDomainUpperLimitVec  = randomStartPointDomainUpperLimitVec
        #### ParaDRAM variables
        #self.spec.adaptiveUpdateCount                  = adaptiveUpdateCount
        #self.spec.adaptiveUpdatePeriod                 = adaptiveUpdatePeriod
        #self.spec.greedyAdaptationCount                = greedyAdaptationCount
        #self.spec.delayedRejectionCount                = delayedRejectionCount
        #self.spec.burninAdaptationMeasure              = burninAdaptationMeasure
        #self.spec.delayedRejectionScaleFactorVec       = delayedRejectionScaleFactorVec
        #
        #self.spec.helpme = SpecDRAM.helpme
        #self.spec._freeze()

    ################################################################################################################################
    #### runSampler
    ################################################################################################################################

[docs]    def runSampler  ( self
                    , ndim          : int
                    , getLogFunc    : tp.Callable[[tp.List[float]], float]
                    , inputFile     : tp.Optional[str] = None
                    ) -> None:
        """

        Run ParaDRAM sampler and return nothing.

            **Parameters**

                ndim

                    An integer representing the number of dimensions of the
                    domain of the user's objective function ``getLogFunc(point)``.
                    It must be a positive integer.

                getLogFunc(point)

                    represents the user's objective function to be sampled,
                    which must take a single input argument ``point`` of type
                    numpy-float64 array of length ``ndim`` and must return the
                    natural logarithm of the objective function.

                inputFile (optional)

                    A string input representing the path to an external
                    input namelist of simulation specifications.

                        **WARNING**

                        Use this optional argument with caution and only
                        if you know what you are doing. Specifying this option
                        will cause the sampler to ignore all other simulation
                        specifications set by the user via the ``spec``
                        component of the sampler instance.

            **Returns**

                None

        """

        if not isinstance(ndim,int) or ndim<1:
            pm.abort( msg   = "The input argument ndim must be a positive integer," + newline
                            + "representing the number of dimensions of the domain of" + newline
                            + "the user's objective function getLogFunc()." + newline
                            + "You have entered ndim = " + str(ndim)
                    , methodName = self._methodName
                    , marginTop = 1
                    , marginBot = 1
                    )

        if not callable(getLogFunc):
            pm.abort( msg   = "The input argument getLogFunc must be a callable function." + newline
                            + "It represents the user's objective function to be sampled," + newline
                            + "which must take a single input argument of type numpy" + newline
                            + "float64 array of length ndim and must return the" + newline
                            + "natural logarithm of the objective function."
                    , methodName = self._methodName
                    , marginTop = 1
                    , marginBot = 1
                    )

        if inputFile is not None and not isinstance(inputFile,str):
            pm.abort( msg   = "The input argument ``inputFile`` must be of type str." + newline
                            + "It is an optional string input representing the path to" + newline
                            + "an external input namelist of simulation specifications." + newline
                            + "USE THIS OPTIONAL ARGUMENT WITH CAUTION AND" + newline
                            + "ONLY IF YOU KNOW WHAT YOU ARE DOING." + newline
                            + "Specifying this option will cause the sampler to ignore" + newline
                            + "all other simulation specifications set by the user via" + newline
                            + "the ``spec`` component of the sampler instance." + newline
                            + "You have entered inputFile = " + str(inputFile)
                    , methodName = self._methodName
                    , marginTop = 1
                    , marginBot = 1
                    )

        def getLogFunc2arg(ndim,point):
            PointVec = np.array(point[0:ndim])
            return getLogFunc(PointVec)

        self._runSampler( ndim
                        , getLogFunc2arg
                        , inputFile
                        )

    ################################################################################################################################
    #### readMarkovChain
    ################################################################################################################################

[docs]    def readMarkovChain ( self
                        , file          : tp.Optional[str] = None
                        , delimiter     : tp.Optional[str] = None
                        , parseContents : tp.Optional[bool] = True
                        , renabled      : tp.Optional[bool] = False
                        ) -> tp.List[TabularFileContents] :
        """

        Return a list of the unweighted verbose (Markov-chain) contents
        of a set of ParaDRAM output chain files, whose names begin the
        user-provided input variable ``file``. This method is to be only
        used for the postprocessing of the output chain file(s) of an
        already finished ParaDRAM simulation. It is not meant to be called
        by all processes in parallel mode, although it is possible.

            **Parameters**

                file (optional)

                    A string representing the path to the chain file with
                    the default value of ``None``.
                    The path only needs to uniquely identify the simulation
                    to which the chain file belongs. For example, specifying
                    ``"./mydir/mysim"`` as input will lead to a search for a file
                    that begins with ``"mysim"`` and ends with ``"_chain.txt"``
                    inside the directory ``"./mydir/"``. If there are multiple
                    files with such name, then all of them will be read
                    and returned as a list.
                    If this input argument is not provided by the user, the
                    value of the object attribute ``outputFileName`` will be
                    used instead. At least one of the two mentioned routes
                    must provide the path to the chain file otherwise,
                    this method will break by calling ``sys.exit()``.

                delimiter (optional)

                    An input string representing the delimiter used in the
                    output chain file. If it is not provided as input argument, the
                    value of the corresponding object attribute ``outputDelimiter``
                    will be used instead. If none of the two are available,
                    the default comma delimiter ``","`` will be assumed and used.

                parseContents (optional)

                    If set to ``True``, the contents of the file will be parsed
                    and stored in a component of the object named ``contents``.
                    The default value is ``True``.

                renabled (optional)

                    If set to False, the contents of the file(s) will be stored
                    as a list in a (new) component of the ParaDRAM object named
                    ``markovChainList`` and ``None`` will be the return value
                    of the method. If set to True, the reverse will done.
                    The default value is ``False``.

            **Returns**

                A list of objects, each of which has the following properties:

                    file

                        The full absolute path to the chain file.

                    delimiter

                        The delimiter used in the chain file.

                    ndim

                        The number of dimensions of the domain of the objective
                        function from which the chain has been drawn.

                    count

                        The number of unique (weighted) points in the chain file.
                        This is essentially the number of rows in the chain file
                        minus one (representing the header line).

                    plot

                        A structure containing the graphics tools for the
                        visualization of the contents of the file.

                    df

                        The unweighted (Markovian) contents of the chain file in the
                        form of a pandas-library DataFrame (hence called ``df``).

                    contents

                        corresponding to each column in the progress file, a property
                        with the same name as the column header is also created
                        for the object which contains the data stored in that column
                        of the progress file. These properties are all stored in the
                        attribute ``contents``.

                If ``renabled = True``, the list of objects will be returned as the
                return value of the method. Otherwise, the list will be stored in a
                component of the ParaDRAM object named ``markovChainList``.

        """

        return self._readTabular( file = file
                                , fileType = "markovChain"
                                , delimiter = delimiter
                                , parseContents = parseContents
                                , renabled = renabled
                                )

    ################################################################################################################################