This lecture focuses on array computing and code vectorization, as well as methods of plotting data in Python.
Vectorization and array computing
With regards to capabilities of Python for scientific calculations, there are conflicting opinions. On the scientific side of the opinion spectrum, some people think that Python is not good enough for number crunching (as a result of which, new programming languages such as Julia have been developed). However, there are people at the other extreme who believe that Python is too much oriented towards scientific computation (as a result of which, new programming languages have emerged, such as Google’s Go language.
So far in this course, you may have noticed that all numerical vector calculations were either performed with lists, tuples, or dictionaries. Sadly, Python standard does not have an intrinsic special way of defining and manipulating numerical vectors and arrays, unlike most High Performance Computing (HPC) languages for scientific computations (such as Fortran, Ada, or C). However, there are powerful Python modules that enable a Python programmer to use Python efficiently for numerical analysis as well.
If you expect to use Python heavily and mostly for scientific computation in future, you should keep in mind that Python's builtin list, tuple and dictionary types can be very slow for number crunching.
Vectors, arrays and the Numerical Python (numpy) package
In Python, a list can be heterogeneous meaning that not all its elements are of the same type. An array object in Python can be viewed as a variant of a list, but with the following assumptions:
- All elements must be of the same type, preferably integer, real, or complex numbers, for efficient numerical computing and storage.
- The number of elements must be known when the array is created.
- Arrays are not part of standard Python – one needs an additional package called Numerical Python, often abbreviated as NumPy. The Python name of the package, to be used in import statements, is
numpy
. - With numpy, a wide range of mathematical operations can be done directly on complete arrays, thereby removing the need for loops over array elements. This is commonly called vectorization.
- Arrays with one index are often called vectors. Arrays with two indices are used as an efficient data structure for tables, instead of lists of lists. Arrays can also have three or more indices.
The number of elements of an array can be changed, but keep in mind that this can cause significant computational cost. Creating an array of a given length is frequently referred to as allocating the array. It means that a part of the computer’s memory is marked for being occupied by this array.
To create a numpy array, you will have to first import it,
import numpy as np
The tradition is to import numpy
as np
. To convert a list to a numpy array,
In [3]: import numpy as np
In [4]: a = [1,2,3,4,5]
In [5]: a = np.array(a)
In [6]: type(a)
Out[6]: numpy.ndarray
In [7]: a
Out[7]: array([1, 2, 3, 4, 5])
To create a new array of length n, filled with zeros,
a = np.zeros(n)
Note that numpy automatically identifies the appropriate type for all array elements, whether int
, float
, or etc.
In [10]: a[1]
Out[10]: 2
In [11]: type(a[1])
Out[11]: numpy.int32
Even if there is a single float
element in the list, then all elements in the list will be converted to float in the numpy array by default,
In [11]: type(a[1])
Out[11]: numpy.int32
In [12]: a = [1,2,3,4,5.0]
In [13]: a = np.array(a)
In [14]: type(a[1])
Out[14]: numpy.float64
If you want to get the desired element type, then you will have to ask numpy for it explicitly,
In [17]: a = [1,2,3.5,4.9,5.0]
In [18]: a = np.array(a, int) # convert all elements in the list to integer
In [19]: a
Out[19]: array([1, 2, 3, 4, 5])
You can see the full list of input arguments to np.array function here.
A similar function np.zeros_like(c)
generates an array of zeros where the length of the generated array is that of the input array c and the element type is the same as those in c.
In [33]: b = [1,2,3,4,5,6,7]
In [34]: a = np.zeros_like(b)
In [35]: a
Out[35]: array([0, 0, 0, 0, 0, 0, 0])
Often one wants an array to have $n$ elements with uniformly distributed values in an interval $[p,q]$. The numpy function linspace
creates such arrays,
In [36]: a = np.linspace(1, 100, 53)
In [37]: a
Out[37]:
array([ 1. , 2.90384615, 4.80769231, 6.71153846,
8.61538462, 10.51923077, 12.42307692, 14.32692308,
16.23076923, 18.13461538, 20.03846154, 21.94230769,
23.84615385, 25.75 , 27.65384615, 29.55769231,
31.46153846, 33.36538462, 35.26923077, 37.17307692,
39.07692308, 40.98076923, 42.88461538, 44.78846154,
46.69230769, 48.59615385, 50.5 , 52.40384615,
54.30769231, 56.21153846, 58.11538462, 60.01923077,
61.92307692, 63.82692308, 65.73076923, 67.63461538,
69.53846154, 71.44230769, 73.34615385, 75.25 ,
77.15384615, 79.05769231, 80.96153846, 82.86538462,
84.76923077, 86.67307692, 88.57692308, 90.48076923,
92.38461538, 94.28846154, 96.19230769, 98.09615385, 100. ])
Vectorization
Loops over very long arrays may run slowly. An advantage of arrays is that, with arrays, loops can be avoided the whole array be manipulated directly and simultaneously. If you are a Fortran programmer, you are likely already quite familiar with the powerful idea of array computing and vectorization. If not, then consider the following example,
x = np.linspace(0, 2, 201)
In [39]: x
Out[39]:
array([ 0. , 0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16,
0.18, 0.2 , 0.22, 0.24, 0.26, 0.28, 0.3 , 0.32, 0.34,
0.36, 0.38, 0.4 , 0.42, 0.44, 0.46, 0.48, 0.5 , 0.52,
0.54, 0.56, 0.58, 0.6 , 0.62, 0.64, 0.66, 0.68, 0.7 ,
0.72, 0.74, 0.76, 0.78, 0.8 , 0.82, 0.84, 0.86, 0.88,
0.9 , 0.92, 0.94, 0.96, 0.98, 1. , 1.02, 1.04, 1.06,
1.08, 1.1 , 1.12, 1.14, 1.16, 1.18, 1.2 , 1.22, 1.24,
1.26, 1.28, 1.3 , 1.32, 1.34, 1.36, 1.38, 1.4 , 1.42,
1.44, 1.46, 1.48, 1.5 , 1.52, 1.54, 1.56, 1.58, 1.6 ,
1.62, 1.64, 1.66, 1.68, 1.7 , 1.72, 1.74, 1.76, 1.78,
1.8 , 1.82, 1.84, 1.86, 1.88, 1.9 , 1.92, 1.94, 1.96,
1.98, 2. ])
Now, if you wanted to calculate the sin
of the elements of x
in the traditional way, you would do,
In [41]: from math import sin
In [42]: sinX = [sin(i) for i in x]
This approach however, can be quite time consuming and computationally costly, because for-loops are very slow in Python, up to a few hundred times than what you get in Fortran or C.
A more appropriate solution to the above problem is use the sin
function from numpy module, which enables vectorization,
sinX = np.sin(x)
You see, with the above numpy call, there is no need for a for-loop. The above Python code is an example of a vectorized code and the previous code which contained for-loop is an example scalar code. The numpy functions are capable of handling arrays as input. Compare the performance of the two codes in the above example,
In [45]: %timeit np.sin(x)
The slowest run took 11.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.21 µs per loop
In [46]: %timeit [sin(i) for i in x]
10000 loops, best of 3: 23.1 µs per loop
The vectorized code in this example appears to be more than one order of magnitude (more than 10 times) faster than faster than the scalar version of the code.
Why is the vectorized code faster in Python?
The reason is that numpy uses precompiled Fortran and C loops to loop over the elements of the input array. loops in Fortran and C have far less overhead than loops in Python. Similar to the above example, you can define your own functions that are also vectorized, for example,
def f(x):
return x**2*np.exp(-x**2)
x = np.linspace(-3, 3, 101)
y = f(x)
The numpy package also has a method for Automatic vectorization of scalar functions (function that only take scalar arguments), for example,
func_vec = np.vectorize(func_scalar)
However, for serious programming, I do not recommend you to use this numpy functionality as it can be slow and inefficient.
vectorization of if-blocks
For vectorization of calculations involving booleans and if conditions, the solution can be problem dependent, but one common easy way of addressing simple boolean problems could be where
method in numpy package. For example, suppose you have an list of numbers and you would like to perform a task on all negative numbers in the array, say set them all to zero, and leave the positive numbers intact. One solution would be the following,
In [57]: x = np.array([1,-1,3,-5,-6,8,7,4,10])
In [58]: np.where(x<0,0,x)
Out[58]: array([ 1, 0, 3, 0, 0, 8, 7, 4, 10])
Aliasing vs. copying arrays
If you recall from lecture , there is a difference between aliasing and copying sequence objects in Python. The same rules also hold for numpy arrays, meaning that if you need an independent copy of an existing array, then you have to use copy
method to generate it,
In [63]: a = np.array([1,2,3,4,5])
In [64]: b = a.copy()
In [65]: b[0] = -1
In [66]: a
Out[66]: array([1, 2, 3, 4, 5])
In [67]: b
Out[67]: array([-1, 2, 3, 4, 5])
otherwise a simple equality assignment like b = a
will only create an alias for numpy array a
.
In [68]: a = np.array([1,2,3,4,5])
In [69]: b = a
In [70]: b[0] = -1
In [71]: a
Out[71]: array([-1, 2, 3, 4, 5])
In-place arithmetic in Python
Consider two arrays a
and b
of the same shape. The expression a += b
means a = a + b
. There are however hidden differences between the two. In the statement a = a + b
, the sum a + b
is first computed, yielding a new array, and then the name a
is bound to this new array. The old array a is lost unless there are other names assigned to this array. In the statement a += b
, elements of b
are added directly into the elements of a
(in memory). There is no hidden intermediate array as in a = a + b
. This implies that a += b
is more efficient than a = a + b
since Python avoids making an extra array. In other words, the operators +=, *=, and similar operators, perform in-place arithmetic in arrays.
Allocating arrays in Python
We have already seen in the above that the np.zeros
function is useful for making a new array of a given size. Very often the size and the type of array elements are known a priori or has to match another existing array’s shape and type b
. There are two ways of achieving this goal,
In [66]: a
Out[66]: array([1, 2, 3, 4, 5])
In [67]: b
Out[67]: array([-1, 2, 3, 4, 5])
In [68]: a
Out[68]: array([1, 2, 3, 4, 5])
In [69]: b = a.copy()
In [70]: c = np.zeros(a.shape, a.dtype)
In [71]: a.shape
Out[71]: (5,)
In [72]: a.
a.T a.argsort a.compress a.cumsum a.dumps a.imag a.min a.prod a.reshape a.shape a.sum a.tostring
a.all a.astype a.conj a.data a.fill a.item a.nbytes a.ptp a.resize a.size a.swapaxes a.trace
a.any a.base a.conjugate a.diagonal a.flags a.itemset a.ndim a.put a.round a.sort a.take a.transpose
a.argmax a.byteswap a.copy a.dot a.flat a.itemsize a.newbyteorder a.ravel a.searchsorted a.squeeze a.tobytes a.var
a.argmin a.choose a.ctypes a.dtype a.flatten a.max a.nonzero a.real a.setfield a.std a.tofile a.view
a.argpartition a.clip a.cumprod a.dump a.getfield a.mean a.partition a.repeat a.setflags a.strides a.tolist
Notice how the attribute a.dtype
(dtype standing for data type), and x.shape
(a tuple) were used in the above example. The shape attribute in array objects holds the shape, i.e., the size of each dimension. The method size
returns the total number of elements in the array.
Sometimes one may also want to ensure that an object is an array, and if not, turn it into an array. The np.asarray
function is useful in such cases,
a = np.asarray(a)
Note that one could have also use,
a = np.array(a)
There is no difference in the output, but note that the second approach does one redundant step, because in the first approach, if the input object is already an array, then there is no need in converting it to an array.
Multidimensional NumPy arrays
Creating multidimensional arrays is very much the same as vectors in numpy. The only thing to keep in mind is that the shape of the array is given as a tuple to np.array()
. For example, to initialize a 3D array of size (0:3,0:5,0:2), you would do,
In [86]: a = np.zeros((3,5,2))
In [87]: a
Out[87]:
array([[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]]])
The arrays created so far have been of type ndarray
. NumPy also has a matrix type called matrix
or mat
for one- and two-dimensional arrays. One-dimensional arrays are then extended with one extra dimension such that they become matrices, i.e., either a row vector or a column vector,
In [99]: x1 = np.array([1, 2, 3], float)
In [100]: x2 = np.matrix(x1) # or np.mat(x1)
In [102]: x3 = np.mat(x1).T # transpose = column vector
In [103]: x3
Out[103]:
matrix([[ 1.],
[ 2.],
[ 3.]])
In [104]: type(x3)
Out[104]: numpy.matrixlib.defmatrix.matrix
A special feature of matrix objects in NumPy is that the multiplication operator represents the matrix-matrix, vector-matrix, or matrix-vector product as we know from linear algebra. However, keep in mind that the multiplication operator between standard ndarray objects is different from multiplication between numpy matrices. The ndarray
multiplication is simply a vectorized version of scalar multiplication,
In [105]: a = np.array([1,2,3])
In [106]: b = np.array([1,2,3])
In [107]: a*b
Out[107]: array([1, 4, 9])
whereas, the matrix multiplication would yield,
In [108]: aMat = np.mat(a)
In [109]: bMat = np.mat(b)
In [110]: aMat*bMat.T
Out[110]: matrix([[14]])
In [111]: aMat.T*bMat
Out[111]:
matrix([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
If you intend to use Python and MATLAB together for your projects, then I recommend you to consider programming with matrices in Python instead of ndarray
objects, because the matrix type in Python behaves quite similar to matrices in MATLAB.
Numpy has a lot more to offer for linear algebra operation, that far beyond the scope of this lecture. More information about algebraic operations in NumPy can be found here.
Symbolic linear algebra
There also a package SymPy that supports symbolic computations for linear algebra operations as well,
In [116]: import sympy as sym
In [117]: a = sym.Matrix([[2, 0], [0, 5]])
In [118]: a**-1 # inverse of matrix a
Out[118]:
Matrix([
[1/2, 0],
[ 0, 1/5]])
In [119]: a.inv() # same as above, inverse of a
Out[119]:
Matrix([
[1/2, 0],
[ 0, 1/5]])
In [120]: a.det() # determinant of a
Out[120]: 10
In [121]: a.eigenvals() # eigenvalues of a
Out[121]: {2: 1, 5: 1}
In [122]: a.eigenvects() # eigenvectors of a
Out[122]:
[(2, 1, [Matrix([
[1],
[0]])]), (5, 1, [Matrix([
[0],
[1]])])]
A tutorial on sympy
can be found here.
Curve plotting in Python
The workhorse of plotting in Python is Matplotlib which is a Python 2D plotting library capable of producing publication quality figures. The usage of matplotlib is very similar to MATLAB.
Matplotlib, the workhorse of plotting in Python
To see how plotting with Matplotlib works, let’s start with a simple example of 2D curve plotting,
import numpy as np
import matplotlib.pyplot as plt
def f(x):
return x**2*np.exp(-x**2)
x = np.linspace(0, 3, 51) # 51 points between 0 and 3
y = np.zeros(len(x)) # allocate y with float elements
for i in range(len(x)):
y[i] = f(x[i])
plt.plot(x, y)
plt.show()
If you try the above code in IPython, the out on screen would be something like the following,
You can also save the figure output as a file by,
In [8]: plt.plot(x, y)
Out[8]: [<matplotlib.lines.Line2D at 0x1bff2e479e8>]
In [9]: plt.savefig('simple_curve.pdf') # produces PDF file.
In [10]: plt.savefig('simple_curve.png') # produces PNG file.
In [11]: pwd
Out[11]: 'C:\\Users\\Amir' # files are saved here
Just like MATLAB, the figures could be also decorated with axis labels, plot title, legend and a lot more, in a syntax very much like MATLAB,
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['x^2*exp(-x^2)'])
plt.axis([0, 3, -0.05, 0.6]) # [xmin, xmax, ymin, ymax]
plt.title('A simple Matplotlib decorated plot')
plt.savefig('simple_curve_decorated.png')
plt.show()
which outputs this file in your current directory,
Plotting multiple curves in one figure
Again, similar to MATLAB, this can be achieved by the statement hold('on')
like the following,
def f(x):
return x**2*np.exp(-x**2)
def g(x):
return x*np.exp(-x)
x = np.linspace(0, 3, 51) # 51 points between 0 and 3
yf = np.zeros(len(x)) # allocate y with float elements
yg = np.zeros(len(x)) # allocate y with float elements
for i in range(len(x)):
yf[i] = f(x[i])
yg[i] = g(x[i])
plt.plot(x, yf, 'r-') # plot with color red, as line
plt.hold('on')
plt.plot(x, yg, 'bo') # # plot with color blue, as points
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['x^2*exp(-x^2)' , 'x*exp(-x)'])
plt.axis([0, 3, -0.05, 0.6]) # [xmin, xmax, ymin, ymax]
plt.title('multiple Matplotlib curves in a decorated plot')
plt.savefig('multiple_curves_decorated.png')
plt.show()
The output of the code is a PNG figure available here.
If you need to discontinue multiple plots on the same figure, again, as in MATLAB, you use hold('off')
.
Subplots in Matplotlib
Suppose you wanted to generate the same curves as in the above example, but each in a different plot, but in the same figure. One way to do this would be like the following,
plt.figure() # generates a new figure as in MATLAB
plt.subplot(2,1,1) # create a 2-row, 1-column subplot, and this is the 1st subplot.
plt.plot(x, yf, 'r-') # plot with color red, as line
plt.subplot(2,1,2) # this is the 2nd subplot.
plt.plot(x, yg, 'bo') # plot with color blue, as points
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['x*exp(-x)'])
plt.axis([0, 3, -0.05, 0.6]) # [xmin, xmax, ymin, ymax]
plt.title('an example Matplotlib subplot')
plt.savefig('two_by_one_subplot.png')
plt.show()
The output of the code is a PNG figure available here.
Note that since the decorations appeared only for the second subplot, only the second one in the figure above is decorated with plot title, legend, etc. Also, note that the figure()
method creates a new plot window on the screen.
Other plotting packages
For more complicated 2D/3D or vector field plotting, you may find Matplotlib inadequate. To address these inadequacies, other packages have been developed which provide interface to more advanced plotting software such as, MATLAB, Gnuplot, Grace, OpenDX, VTK, and others.
Easyviz from SciTools
Because each of the above mentioned visualization software has its own plotting syntax, a Python module easyviz
has been developed which provides a universal interface for any of the above mentioned back-end plotting software. In other words, the user can request eazyvis to use one of the above-mentioned software as the plotting engine in Python, while the syntax of the Python code is universal and the same for all of them, and this is achieved by using eazyvis
. Just like Matplotlib, the syntax of eazyvis
has been also purposefully made very similar to MATLAB.
The Easyviz module is part of the SciTools package, which consists of a set of Python tools building on Numerical Python, ScientificPython, the comprehensive SciPy environment, and other packages for scientific computing with Python. However, keep in mind that SciTools strictly requires Python v2.7 and Numerical Python.
Mayavi visualization package
Mayavi is another advanced, free, scientific data visualizer for Python, with emphasis on three-dimensional visualization techniques. The package is written in Python, and uses the Visualization Toolkit (VTK) in C++ for rendering graphics. Since VTK can be configured with different backends, so can Mayavi. Mayavi is cross platform and runs on most platforms like Mac OS X, Windows, and Linux.