Encapsulation | Data Science with Python

This note provides a brief review of the notions of encapsulation and data hiding in the Object-Oriented Programming (OOP) paradigm via Python.

Encapsulation and data hiding

Classes can be used for many purposes in scientific programming and computation. One of the most frequently encountered tasks is to represent mathematical functions that have a set of parameters in addition to one or more independent variables. These functions, together with all of their essential and auxiliary variables, frequently have to be passed to other functions to perform some task. Implementing such problems using only the procedural programming tools that we have learned so far can lead to code that is unsafe, cluttered, and hard to work with. To understand why and how this happens, consider the following scientific problem.

A common programming challenge in numerical computing

Consider a function that takes in some parameters as input, for example, the equation of motion of a stone thrown upward in the air. The physics equation describing this motion is $y(t) = v_0t-\frac{1}{2}gt^2$. This equation gives the position $y$ from the ground as a function of time. Therefore, in physics, $y$ is viewed as a function of $t$.

Mathematically speaking, however, $y$ also depends on two other parameters, $v_0$ and $g$, although it is unnatural to view $y$ as a function of these parameters. One can therefore write $y(t;v_0,g)$ to emphasize that $t$ is the independent variable, while $v_0$ and $g$ are the parameters of the projectile motion model proposed above. Strictly speaking, $g$ is a fixed parameter (as long as the experiment is done on the surface of the earth), so only $v_0$ and $t$ can be arbitrarily chosen in the formula.

It would then be better to write $y(t;v_0)$. Here is an implementation of this function,

def getHeight(time,initVelocity):
    gravityConstant = 9.81
    return initVelocity * time - 0.5*gravityConstant*time**2

getHeight(1,10)

5.095

This function gives the height of the projectile as a function of time. Now suppose we want to differentiate height ($y(t)$) with respect to time ($t$) in order to obtain the velocity at the given moment in time. We could write the following generic code for differentiation to do so,

def getDiff(getFunc, x, deltaX = 1.e-5):
    return ( getFunc(x+deltaX) - getFunc(x) ) / deltaX

But here is the catch. The getDiff() function works only with a function getFunc() that takes exactly one argument. In other words, if we want to pass getHeight() to getDiff(), we have to redefine getHeight() so that it takes only one argument. You may wonder why we do not change getDiff() instead. For this simple problem, that could be a solution. For larger problems, however, you are more likely to use sophisticated routines and modules that have already been developed by the community. The authors of these libraries cannot know the specific problem that you are dealing with. Therefore, they write generic library functions that expect an input function with a specific interface: in this case, a differentiation routine that takes as input a function of a single variable. This situation is also frequently encountered with high-performance integration routines.
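The interface mismatch described above can be reproduced directly. The following is only a minimal sketch: it repeats getHeight() and getDiff() from this note, uses math.sin as an arbitrary one-argument function chosen for illustration, and introduces the names slopeOfSine and mismatchDetected, which are not part of the original code.

```python
import math

def getHeight(time, initVelocity):
    gravityConstant = 9.81
    return initVelocity * time - 0.5 * gravityConstant * time**2

def getDiff(getFunc, x, deltaX=1.e-5):
    # forward-difference approximation; getFunc is called with ONE argument
    return (getFunc(x + deltaX) - getFunc(x)) / deltaX

# A one-argument function fits the interface: the slope of sin at 0 is close to 1.
slopeOfSine = getDiff(math.sin, 0.0)
print(slopeOfSine)

# getHeight() needs two arguments, so getDiff() cannot call it as is.
try:
    getDiff(getHeight, 1)
    mismatchDetected = False
except TypeError as error:
    mismatchDetected = True
    print("TypeError:", error)
```

The TypeError is raised inside getDiff(), at the point where it calls getFunc(x + deltaX) without a value for initVelocity.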

One, perhaps bad, solution to the above problem is to use global variables. To do so, we can define a wrapper function that wraps around getHeight() and hides the extra input parameter initVelocity from the view of getDiff(),

# wrapper function for getHeight
def getHeightWrapper(time):
    return getHeight(time,initVelocity)

This function will work only if initVelocity is a global variable that is initialized before any attempt to call getHeightWrapper(). Here is an example call to the wrapper,

initVelocity = 10 # note that initVelocity has to be defined globally
getHeightWrapper(1)

5.095

Now, we can pass this wrapper function to getDiff() to take its derivative with respect to time,

initVelocity = 10 # note that initVelocity has to be defined globally
getDiff(getHeightWrapper,1)

0.18995094990259528
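As a sanity check on this number, the forward difference used in getDiff() can be worked out by hand for $y(t;v_0) = v_0t-\frac{1}{2}gt^2$. In the sketch below, $h$ stands for deltaX (a symbol introduced here for brevity), with $v_0 = 10$, $g = 9.81$, $t = 1$, and $h = 10^{-5}$:

```latex
\begin{aligned}
\frac{y(t+h) - y(t)}{h}
  &= \frac{v_0 h - \frac{1}{2} g \left(2th + h^2\right)}{h} \\
  &= v_0 - g t - \frac{1}{2} g h \\
  &= 10 - 9.81 - 4.905 \times 10^{-5} \approx 0.18995095
\end{aligned}
```

The exact derivative is $v_0 - gt = 0.19$, so the small discrepancy in the printed value comes from the truncation term $\frac{1}{2}gh$ of the forward difference, not from the wrapper.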

The use of global variables is generally considered bad programming practice. Why global variables are problematic in the present case becomes clear when we need to work with several versions of a function. Suppose we want to work with two versions of getHeight(time,initVelocity), one with initVelocity=10 and one with initVelocity=5. Every time we call getHeightWrapper(), we must remember which version of the function we are working with, and set initVelocity accordingly prior to the call,

initVelocity = 10
print( getDiff(getHeightWrapper,1) )
initVelocity = 5
print( getDiff(getHeightWrapper,1) )

0.18995094990259528  
-4.810049050085752

Another problem is that the variable names such as initVelocity are now exposed in the code and could potentially overwrite (or get overwritten by) some other variables. In the best case scenario, such name clashes will cause a syntax or runtime error, but frequently, they go unnoticed in the code causing the program to yield wrong results which, depending on the context in which they occur, could be devastating.

Another major problem with global variables is that they can cause side effects. For example, if the value of initVelocity is mistakenly changed somewhere else in the program, the change can go unnoticed and affect every later call to getHeightWrapper() in an unintended way. This is one reason why a golden rule of programming tells us to limit the use of global variables as much as possible.
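A small sketch of such a side effect, assuming the code runs as a top-level script so that initVelocity really is a module-level global. The helper resetExperiment() and the names heightBefore and heightAfter are hypothetical and exist only to illustrate the point.

```python
initVelocity = 10  # global parameter read by the wrapper

def getHeight(time, initVelocity):
    gravityConstant = 9.81
    return initVelocity * time - 0.5 * gravityConstant * time**2

def getHeightWrapper(time):
    # reads the global initVelocity at call time
    return getHeight(time, initVelocity)

def resetExperiment():
    # hypothetical helper: rebinds the global as a side effect
    global initVelocity
    initVelocity = 5

heightBefore = getHeightWrapper(1)  # computed with initVelocity = 10
resetExperiment()                   # nothing at this call site hints at the change
heightAfter = getHeightWrapper(1)   # same call, now computed with initVelocity = 5
print(heightBefore, heightAfter)
```

The two calls to getHeightWrapper(1) look identical, yet they return different heights because an unrelated function changed the global in between.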

So, is there a good remedy? The answer is yes: the class concept solves all the problems described above.

Class representation of a function

A class contains a set of variables (data) and a set of functions, held together as one unit. The variables are visible in all the functions in the class. That is, we can view the variables as “global” in these functions. These characteristics also apply to modules, and modules can be used to obtain many of the same advantages that classes offer. However, classes are technically very different from modules. You can make many instances of a class, each with its own data, while there can be only one copy of a module. When you master both modules and classes, you will clearly see the similarities and differences. Now we continue with a specific example of a class.

Consider the function $y(t;v_0) = v_0t - \frac{1}{2}gt^2$. We may say that $v_0$ and $g$, represented by the variables initVelocity and gravityConstant, constitute the data. A Python function, say getHeight(time), is then needed to compute the value of $y(t;v_0)$ and this function must have access to the data initVelocity and gravityConstant, while time is an argument. A programmer experienced with classes will then suggest collecting the data initVelocity and gravityConstant, and the function getHeight(time), together as a class.

A class usually has another function, called the constructor, for initializing the data. The constructor is always named __init__. Every class must have a name, often starting with a capital letter. For our problem here, we can choose Projectile as the name of the class since it represents a (vertical) projectile motion. The next step is to implement this class in Python. A complete implementation of the class Projectile could look like the following,

class Projectile():

    gravityConstant = 9.81

    def __init__(self, initVelocity):
        self.initVelocity = initVelocity

    def getHeight(self,time):
        return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

A class creates a new data type, here named Projectile, so when we use the class to make objects, those objects are of type Projectile. All the standard Python objects, such as lists, tuples, strings, floating-point numbers, integers, …, are instances of built-in Python classes, and each time the user creates an object of one of these types, the Python interpreter creates one instance of the corresponding class. An object of a user-defined class (like Projectile) is usually called an instance. We need such an instance in order to use the data in the class and call the getHeight() function. The following statement constructs an instance of Projectile bound to the variable named projectile,

projectile = Projectile(initVelocity=10)

Seemingly, we call the class Projectile as if it were a function. Indeed, Projectile(initVelocity=10) is automatically translated by Python to a call to the constructor __init__() in class Projectile. The arguments in the call, here initVelocity, are always passed on as arguments to __init__() after the self argument. That is, initVelocity gets the value 10 and self is simply omitted from the call. This may be confusing, but the rule is that self is never written explicitly when a method is called through an instance; Python supplies the instance as self automatically. With the instance projectile, we can compute the value of $y(t=0.1;v_0=10)$ by the statement,

height = projectile.getHeight(0.1)
print(height)

0.95095

Note that the self input argument is dropped in the call to getHeight(). To access functions and variables in a class, one must prefix the function and variable names by the name of the instance and a dot: the getHeight function is reached as projectile.getHeight, and the variables are reached as projectile.initVelocity and projectile.gravityConstant. One could, for example, print the value of initVelocity in the instance projectile by writing,

print(projectile.initVelocity)
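What happens to self can be made explicit by calling the method through the class instead of through the instance. This is only an illustrative sketch, not a recommended calling style; the names heightViaInstance and heightViaClass are introduced here for the comparison.

```python
class Projectile():

    gravityConstant = 9.81

    def __init__(self, initVelocity):
        self.initVelocity = initVelocity

    def getHeight(self, time):
        return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

projectile = Projectile(initVelocity=10)

# usual call: Python supplies the instance as self
heightViaInstance = projectile.getHeight(0.1)

# equivalent call through the class: the instance is passed explicitly as self
heightViaClass = Projectile.getHeight(projectile, 0.1)

print(heightViaInstance == heightViaClass)  # True
```

Both calls run the same code with the same arguments, which is why the self parameter appears in the method definition but not in the usual call.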

We have already introduced the term instance for an object of a specific class. Functions in classes are commonly called methods, and variables (data) in classes are called data attributes. Methods are also known as method attributes. For example, in our sample class Projectile we have two methods or method attributes, __init__() and getHeight(), two data attributes, initVelocity and gravityConstant, and four attributes in total (__init__, getHeight, initVelocity, gravityConstant). Note that the names of attributes can be chosen freely, just as names of ordinary Python functions and variables. However, the constructor must have the name __init__(), otherwise it is not automatically called when new instances are created. You can do whatever you want in whatever method, but it is a common convention to use the constructor for initializing the variables in the class.
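The attributes listed above can be inspected with the built-in vars(). The sketch below repeats the Projectile class and introduces the instance names fast and slow, plus the helper variables instanceData and classNames, for illustration only. It suggests one detail worth knowing: initVelocity is stored on each instance, whereas gravityConstant and the methods are stored on the class and shared by all instances.

```python
class Projectile():

    gravityConstant = 9.81                 # defined in the class body

    def __init__(self, initVelocity):
        self.initVelocity = initVelocity   # stored on the instance

    def getHeight(self, time):
        return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

fast = Projectile(initVelocity=10)
slow = Projectile(initVelocity=5)

# data stored on the instance itself
instanceData = vars(fast)
print(instanceData)  # {'initVelocity': 10}

# names stored on the class
classNames = [name for name in ("gravityConstant", "__init__", "getHeight")
              if name in vars(Projectile)]
print(classNames)  # ['gravityConstant', '__init__', 'getHeight']

# both instances see the same class-level gravityConstant
print(fast.gravityConstant == slow.gravityConstant)  # True
```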

With this class, we can now call getDiff() to take the derivative of the height of the projectile motion we have defined, without the need to create global variables,

projectile = Projectile(initVelocity=10)
print( getDiff(projectile.getHeight,1) )

0.18995094990259528
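The two-version scenario that was awkward with global variables becomes straightforward with instances, since each instance carries its own initVelocity. A minimal sketch, with the instance names fast and slow and the result names velocityFast and velocitySlow chosen here for illustration:

```python
class Projectile():

    gravityConstant = 9.81

    def __init__(self, initVelocity):
        self.initVelocity = initVelocity

    def getHeight(self, time):
        return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

def getDiff(getFunc, x, deltaX=1.e-5):
    return (getFunc(x + deltaX) - getFunc(x)) / deltaX

fast = Projectile(initVelocity=10)
slow = Projectile(initVelocity=5)

# each bound method remembers which instance it belongs to
velocityFast = getDiff(fast.getHeight, 1)  # close to 10 - 9.81 = 0.19
velocitySlow = getDiff(slow.getHeight, 1)  # close to 5 - 9.81 = -4.81
print(velocityFast)
print(velocitySlow)
```

No variable has to be reset between the two calls, and the order of the calls does not matter.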

Now, it may seem a bit redundant to type projectile.getHeight() to get the height. Amazingly, Python provides a neat shortcut for such instances via the __call__() method to create callable objects.

Callable objects

If you recall, computing the value of the mathematical function represented by class Projectile, with projectile as the name of the instance, is performed by writing projectile.getHeight(). If we could write just projectile(), the projectile instance would look like an ordinary function. Such a syntax is indeed possible and offered by the special method named __call__,

class Projectile():

    gravityConstant = 9.81

    def __init__(self, initVelocity):
        self.initVelocity = initVelocity

    def __call__(self, time):
        return self.getHeight(time)

    def getHeight(self,time):
        return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

then writing,

projectile = Projectile(initVelocity=10)
print(projectile(1))
print(projectile.getHeight(1))

5.095  
5.095

would yield identical results. With this __call__ method, the getHeight method could be even considered as redundant and the class could be written more concisely as,

class Projectile():
    gravityConstant = 9.81
    def __init__(self, initVelocity): self.initVelocity = initVelocity
    def __call__(self, time): return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

A good programming convention is to include a __call__ method in all classes that represent a mathematical function. Objects that are instances of classes with __call__ methods are said to be callable objects, just as plain functions are callable objects as well. The call syntax for callable objects is the same, regardless of whether the object is a function or a class instance.

You can always test whether an object is callable with the built-in function callable(),

projectile = Projectile(initVelocity=10)
callable(projectile)

True
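Because the call syntax is the same for functions and callable instances, a callable Projectile can be handed to getDiff() directly. The following sketch compares it with a plain function; getHeightPlain() is a hypothetical function with the value 10 hard-coded, written only for this comparison.

```python
class Projectile():

    gravityConstant = 9.81

    def __init__(self, initVelocity):
        self.initVelocity = initVelocity

    def __call__(self, time):
        return self.initVelocity * time - 0.5 * self.gravityConstant * time**2

def getDiff(getFunc, x, deltaX=1.e-5):
    # getDiff() only requires that getFunc can be called with one argument
    return (getFunc(x + deltaX) - getFunc(x)) / deltaX

def getHeightPlain(time):
    # hypothetical plain-function counterpart with initVelocity fixed at 10
    return 10 * time - 0.5 * 9.81 * time**2

projectile = Projectile(initVelocity=10)

velocityFromInstance = getDiff(projectile, 1)
velocityFromFunction = getDiff(getHeightPlain, 1)
print(velocityFromInstance, velocityFromFunction)

print(callable(projectile), callable(getHeightPlain), callable(3.0))  # True True False
```

getDiff() cannot tell the two apart, which is the practical benefit of giving function-like classes a __call__ method.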