This lecture further explains topics on Input/Output processes and error handling in Python, as well as methods of testing the accuracy and robustness of your code.
I/O (continued)
So far in this course, we have indirectly discussed several methods of getting input information from the user, and several methods of outputting the result of a Python program. This lecture attempts to formalize the previous discussions and introduces more general and efficient methods of interaction between the code and its users.
Methods of inputting data
Let’s begin with an example code, explaining the meaning of input/output (I/O) in Python,
from math import exp
a = 0.1
b = 1
x = 0.6
y = a*exp(b*x)
print(y)
0.1822118800390509
In the above code, $a$, $b$, and $x$ are examples of input data to a code, and $y$ is an example of code output. When input values are written directly into the source like this, the input data is said to be hardcoded in the program.
In general, in any programming language, including Python, you should avoid hardcoding input information into your program.
If data is hardcoded, then every time it has to change, the user has to change the content of the code, and this is not considered good programming style for software development.
In general, input data can be fed to a program in four different ways:
- let the user answer questions in a dialog in the terminal window,
- let the user provide input on the command line,
- let the user provide input data in a file,
- let the user write input data in a graphical interface.
Input data from terminal window
We have already introduced and used this method frequently in previous lectures, via Python’s built-in function input(). If we were to get the input data for the above example code via the terminal window, an example would be the following,
from math import exp
a,b,x = input('input the values for a,b,x (comma separated): ').split(",")
y = float(a)*exp(float(b)*float(x))
print(y)
input the values for a,b,x (comma separated): 0.1,1,0.6
0.1822118800390509
Input data from command line
This approach, which we discussed in a previous lecture, is most popular in Unix-like environments, where most users are accustomed to using the Bash command line. However, it can be readily used in a Windows environment as well. For this approach, the Python module sys can accomplish what we desire,
from math import exp
import sys
a,b,x = sys.argv[1],sys.argv[2],sys.argv[3]
y = float(a)*exp(float(b)*float(x))
print(y)
Now if you save this code in a file and run it on the Bash command line, the program expects you to enter three float numbers following the name of the program,
$ python input_via_sys.py 0.1 1 0.6
0.1822118800390509
ATTENTION: Notice the convention for command-line arguments
1. As you see in the above example, the name of the program is considered as the first command line argument (sys.argv[0]). Also, the arguments must be separated by white space, and should appear in the proper order.
2. If a value contains white space (e.g., a string value with a white space character in it), then it has to be enclosed in quotation marks, '' or "".
3. Also note that all input command-line arguments are taken as string values. Therefore, you will have to convert them to the proper type (e.g., float, int, ...) once they are read from the command line.
Variable number of command line arguments
If the number of input arguments on the command line is not known a priori, then you can get a list of all input arguments using sys.argv[1:] and then use a for-loop to loop over its individual elements, or use the len() function to find the total number of input arguments.
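As a minimal sketch of this idea, the following example averages an arbitrary number of numeric arguments. The helper name mean_of_args is an assumption made for illustration; in a real script you would pass sys.argv[1:] to it, while here a fixed list stands in for the command line so that the example runs anywhere.

```python
def mean_of_args(args):
    """Return the mean of a list of numeric strings (e.g. sys.argv[1:])."""
    # Command-line arguments always arrive as strings, so convert each one.
    numbers = [float(arg) for arg in args]
    return sum(numbers) / len(numbers)

# In a script you would write: mean_of_args(sys.argv[1:])
example_args = ['1', '3', '4.5']
for position, arg in enumerate(example_args, start=1):
    print('argument', position, 'is', arg)
print(len(example_args), 'arguments, mean =', mean_of_args(example_args))
```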
Option-value pairs as command-line input
Once the number of input arguments to your code increases, the process of inputting data as command line arguments can get complicated. Ideally, the user should be able to enter data in any arbitrary order. This can be done by indicating the meaning of each input by a flag before the input value. For example, suppose you were to find the location $y(t)$ of an object thrown up in the air vertically, given that the object started at $y=y_0$, at $t=0$ with an initial velocity $v_0$, and thereafter was subject to a constant acceleration $a$,
\(y(t) = y_0 + v_0t + \frac{1}{2}at^2 ~.\)
Obviously, this formula requires four input variables: $y_0$, $v_0$, $a$, and $t$, and we don’t want the program user to have to memorize their order of entry on the command line. The solution is to identify the meaning of each input using a flag preceding the input value. This can be done using the argparse Python module. The details of the usage of this module go beyond the limited time of our class. However, I recommend that you have a look at the syntax and usage of the argparse module, as you will find it very handy in your Python codes, projects, and software development.
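To give a flavor of what this looks like, here is a small sketch using argparse for the formula above. The flag names (--y0, --v0, --a, --t) and the default values are assumptions chosen for this example, not something prescribed by the lecture. The list passed to parse_args stands in for sys.argv[1:].

```python
import argparse

def parse_args(argv):
    """Parse option-value pairs such as: --t 2 --v0 10 (in any order)."""
    parser = argparse.ArgumentParser(
        description='Compute y(t) = y0 + v0*t + 0.5*a*t**2')
    # Flag names and defaults below are illustrative assumptions.
    parser.add_argument('--y0', type=float, default=0.0, help='initial position')
    parser.add_argument('--v0', type=float, default=0.0, help='initial velocity')
    parser.add_argument('--a', type=float, default=-9.81, help='constant acceleration')
    parser.add_argument('--t', type=float, required=True, help='time')
    return parser.parse_args(argv)

def position(y0, v0, a, t):
    return y0 + v0*t + 0.5*a*t**2

# In a script you would write: args = parse_args(sys.argv[1:])
args = parse_args(['--t', '2', '--v0', '10', '--a', '-10'])
print(position(args.y0, args.v0, args.a, args.t))
```

Because each value is labeled by its flag, the user can supply the options in any order and omit those that have a default; type=float also takes care of the string-to-number conversion.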
Input data from file
In cases where the input data is large, the command-line arguments and input from terminal window are not efficient anymore. In such cases, the most common approach is to let the code read input data from a file, the path to which is most often given to the code from the command line or the terminal window.
Reading a file line by line
To read a file, say a file named data.in, one first needs to open it,
In [1]: myfile = open('data.in', 'r')
In [2]: type(myfile)
Out[2]: _io.TextIOWrapper
In [5]: myfile.
myfile.buffer myfile.detach myfile.fileno myfile.line_buffering myfile.newlines myfile.readline myfile.seekable myfile.writable
myfile.close myfile.encoding myfile.flush myfile.mode myfile.read myfile.readlines myfile.tell myfile.write
myfile.closed myfile.errors myfile.isatty myfile.name myfile.readable myfile.seek myfile.truncate myfile.writelines
The function open creates a file object, stored in the variable myfile. The second input argument to open, 'r', tells the function that the purpose of this file opening is to read data (as opposed to, for example, writing data, or both reading and writing).
Now you can use a for loop to read the data in this file line by line:
for line in myfile:
    print(line)
1
3
4
5
6
7
88
65
What is printed here is the content of the file data.in, line by line.
Alternative method of reading file data
Instead of reading one line at a time, as in the above, we can load all lines into a single list of strings,
In [9]: myfile = open('data.in', 'r')
In [10]: lines = myfile.readlines()
In [11]: type(lines)
Out[11]: list
Note that each element of lines contains one line of the file, including its trailing newline character.
In [15]: lines
Out[15]: ['1\n', '3\n', '4\n', '5\n', '6\n', '7\n', '88\n', '65\n']
The action of the method readlines() is equivalent to a for-loop like the following,
In [16]: myfile = open('data.in', 'r')
...: lines = []
...: for line in myfile:
...:     lines.append(line)
...: lines
...:
Out[16]: ['1\n', '3\n', '4\n', '5\n', '6\n', '7\n', '88\n', '65\n']
or this list comprehension format,
In [19]: myfile = open('data.in', 'r')
...: lines = [line for line in myfile]
...: lines
...:
Out[19]: ['1\n', '3\n', '4\n', '5\n', '6\n', '7\n', '88\n', '65\n']
Now suppose you were to calculate the mean of the numbers in this file. You could simply use the following list comprehension code to do so,
In [22]: mean = sum([float(line) for line in lines])/len(lines)
...: print(mean)
22.375
Note that once you read the file, you can close it using,
myfile.close()
The with statement
In modern Python code you will more often see the with statement for reading a file, like the following,
In [34]: with open('data.in', 'r') as myfile:
...:     for line in myfile:
...:         print(line)
...:
1
3
4
5
6
7
88
65
This is technically equivalent to,
In [35]: myfile = open('data.in', 'r')
...: for line in myfile:
...:     print(line)
...: myfile.close()
...:
1
3
4
5
6
7
88
65
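The difference here is that with the modern with statement, there is no need to close the file explicitly: the file is closed automatically when the with block ends, even if an error occurs inside the block.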
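The following sketch illustrates that the with statement closes the file even when the loop is interrupted by an exception. The temporary file and its content are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway input file (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), 'data.in')
with open(path, 'w') as outfile:
    outfile.write('1\n3\nnot-a-number\n')

try:
    with open(path, 'r') as myfile:
        for line in myfile:
            float(line)      # raises ValueError on the third line
except ValueError:
    pass

# The with block has closed the file although the loop did not finish.
print(myfile.closed)
```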
The old while True construction
The call myfile.readline() returns a string containing the text of the current line. A new myfile.readline() statement will read the next line. When myfile.readline() returns an empty string, the end of the file has been reached and the code must stop reading the file. The traditional way of telling the code to stop at the end of the file is a while loop like the following,
In [36]: myfile = open('data.in', 'r')
...: while True:
...:     line = myfile.readline()
...:     if not line:
...:         break
...:     print(line)
1
3
4
5
6
7
88
65
Reading an entire file as a single string
While the readlines() method returns a list of the lines in the file, the read() method returns a single string containing the entire content of the file.
In [37]: myfile = open('data.in', 'r')
...: s = myfile.read()
In [38]: s
Out[38]: '1\n3\n4\n5\n6\n7\n88\n65\n'
In [39]: print(s)
1
3
4
5
6
7
88
65
The major advantage of this method of reading file content is that you can then immediately apply string methods directly on the file content.
In [48]: myfile = open('data.in', 'r')
...: numbers = [float(x) for x in myfile.read().split()]
...: mean = sum(numbers)/len(numbers)
...:
In [49]: print(mean)
22.375
Converting user input to live Python objects
One of the cool features in Python I/O is that you can provide text containing valid Python code as input to a program and then turn that text into live Python objects as if the text were lines of code written directly into the program beforehand. This is a very powerful tool for letting users specify function formulas, for instance, as input to a program. The program code itself has no knowledge about the kind of function the user wants to work with, yet at run time the user’s desired formula enters the computations. To achieve this goal, one can use Python’s built-in functions eval and exec, which we refer to here as magic functions (not to be confused with special methods such as __init__, which are sometimes called magic methods).
The magic eval function
The eval function takes a string as argument and evaluates this string as a Python expression. The result of an expression is an object. For example,
In [10]: eval('1+2')
Out[10]: 3
This is equivalent to typing,
In [11]: 1+2
Out[11]: 3
or another example,
In [12]: a = 1
In [13]: b = 2
In [14]: c = eval('a+b')
In [15]: c
Out[15]: 3
or,
In [19]: from math import sqrt
In [20]: eval('sqrt(4)')
Out[20]: 2.0
Note, however, that in all of the above examples the eval function evaluates a Python expression; that is, this function cannot execute a Python statement.
Now the cool thing about this function is that, you can directly apply it to the user input. For example, suppose the user is asked to input a Python expression and then the code is supposed to evaluate the input just like a simple calculator,
eval(input('Input an arithmetic expression to evaluate: '))
Input an arithmetic expression to evaluate: 2 + 3.0/5 + exp(7)
1099.2331584284584
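Note that the calculator example only works if the names used in the expression (here exp) have been imported beforehand. The sketch below makes this explicit by passing eval a dictionary of permitted names, and it also shows what happens when the input is a statement rather than an expression. The helper name evaluate and the choice of permitted names are assumptions for illustration; limiting the names in this way is a convenience, not a security barrier against malicious input.

```python
from math import exp, sqrt

# Names the user's expression may refer to (an assumption for this sketch).
allowed_names = {'exp': exp, 'sqrt': sqrt}

def evaluate(expression):
    """Evaluate an arithmetic expression string using only allowed_names."""
    return eval(expression, {'__builtins__': {}}, allowed_names)

print(evaluate('2 + 3.0/5 + exp(7)'))

# eval accepts expressions only; a statement such as an assignment fails.
try:
    evaluate('a = 1')
except SyntaxError as err:
    print('Not an expression:', err)
```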
The magic exec function
Similar to the eval function, there is also an exec magic function that executes a string containing arbitrary Python code, not just a Python expression. This is a powerful idea since it enables the user to write a formula as input to the program, available to the program in the form of a string object. The program can then convert this formula to callable Python code, or a function, using the magic exec function.
In [21]: exec('import math')
In [22]: exec('a=1; b=2; c=a+b')
In [23]: a,b,c
Out[23]: (1, 2, 3)
One could even input a full function definition to the exec function,
myFuncString = input('Input a Python function definition of interest, named func: ')
exec(myFuncString) # defines func in the current namespace; exec itself returns None
Input a Python function definition of interest, named func: def func(x): return 2*x + 1
func(x=1)
3
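When exec is called inside a function, or when you want to keep the user's definitions separate from your own variables, it helps to pass exec an explicit dictionary and pick the new function out of it. The following is a small sketch of that pattern; the string func_string stands in for what the user would type at the input() prompt.

```python
# Stands in for: func_string = input('Input a function definition named func: ')
func_string = 'def func(x): return 2*x + 1'

namespace = {}                 # exec stores the names it defines in this dict
result = exec(func_string, namespace)
func = namespace['func']       # retrieve the newly defined function

print(result)                  # exec itself always returns None
print(func(x=1))
```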
Now, since this is such a useful functionality in Python, there is already a package, scitools, that converts an input expression to a Python function,
from scitools.StringFunction import StringFunction
myfuncString = input('Input a Python expression to build your requested Python function: ')
myfunc = StringFunction(myfuncString)
The only major caveat with this module is that, at the moment, it only works with Python 2.x, and not Python 3. So, the above code will not work on your Python 3 platform.
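If you need similar behavior in Python 3, a rough substitute can be sketched with eval. The helper string_function below is hypothetical: it is not part of scitools and does not reproduce the StringFunction interface, it only illustrates the idea of turning a formula string into a callable. The set of math names made available to the formula is likewise an assumption.

```python
from math import sin, cos, exp, log, sqrt, pi

def string_function(expression, variable='x'):
    """Turn a formula string such as '2*x + 1' into a function of one variable."""
    math_names = {'sin': sin, 'cos': cos, 'exp': exp,
                  'log': log, 'sqrt': sqrt, 'pi': pi}
    # Build the source text of a lambda and let eval create the function object.
    return eval('lambda {}: {}'.format(variable, expression), math_names)

# Stands in for a formula typed by the user at an input() prompt.
myfunc = string_function('2*x + exp(0)')
print(myfunc(1))
```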
Methods of outputting data
Two major methods of data output are,
- writing to the terminal window, as previously done using the print() function, or,
- writing to an output file.
We have already extensively discussed printing output to the terminal window. Writing data to file is also easy.
Writing to a file
Similar to reading from a file, in order to write to a file, one first has to open the file, this time for the purpose of writing, which is indicated by 'w' or 'a',
outfile = open(filename, 'w') # write to a new file, or overwrite file
One could also append some output to an existing file using the 'a' indicator as input to open(),
outfile = open(filename, 'a') # append to the end of an existing file
In both cases, the string-valued variable filename contains the path to the file that should be created or manipulated. Suppose we want to write the output of the code in the previous section to a new file. All you would need to do is the following,
myfile = open('data.in', 'r')
numbers = [float(x) for x in myfile.read().split()]
mean = sum(numbers)/len(numbers)
outfile = open('data.out','w')
outfile.write(str(mean) + '\n')
myfile.close()
outfile.close()
This will result in the creation of a new file named data.out which contains the value of the mean variable. Note that the addition of the character '\n' at the end of the write statement is necessary; otherwise, the next write to the file will not appear on a new line.
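The difference between the 'w' and 'a' modes, and the role of the '\n' character, can be seen in the following sketch. The file is created in a temporary directory (an assumption made so the example does not touch your own files), and the with statement is used so that each file is closed automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.out')

with open(path, 'w') as outfile:      # 'w' creates the file or overwrites it
    outfile.write('first line\n')

with open(path, 'a') as outfile:      # 'a' appends to the existing content
    outfile.write('second line\n')
    outfile.write('no newline, ')     # without '\n' the next write continues this line
    outfile.write('same line\n')

with open(path, 'r') as infile:
    lines = infile.readlines()
print(lines)
```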
Writing a table of data to a file
Now suppose you were to write the following list to an output file,
data = [[ 0.75, 0.29619813, -0.29619813, -0.75 ],
[ 0.29619813, 0.11697778, -0.11697778, -0.29619813],
[-0.29619813, -0.11697778, 0.11697778, 0.29619813],
[-0.75, -0.29619813, 0.29619813, 0.75 ]]
One solution would be the following,
outfile = open('table.out', 'w')
for row in data:
    for column in row:
        outfile.write( '{:14.8f}'.format(column) )
    outfile.write('\n')
outfile.close()
This code would result in the creation of an output file named table.out, which contains the content of the data variable in a nicely formatted style like the following,
    0.75000000    0.29619813   -0.29619813   -0.75000000
    0.29619813    0.11697778   -0.11697778   -0.29619813
   -0.29619813   -0.11697778    0.11697778    0.29619813
   -0.75000000   -0.29619813    0.29619813    0.75000000
Error handling in Python
A good code has to be able to handle exceptional situations that may occur during the code execution. These exceptions may occur during data input from either command line, terminal window, or an input file. They may also occur as a result of repeated operations on the input data, inside the code. For example, in lecture 7, we explained a way of handling the wrong number of input command line arguments. This and similar measures to handle nicely the unexpected runtime errors is what’s called error and exception handling.
A simple way of error handling is to write multiple if-blocks each of which handle a special exceptional situation. That is, to let the code execute some statements, and if something goes wrong, write the program in such a way that can detect this and jump to a set of statements that handle the erroneous situation as desired.
A more modern and flexible way of handling such potential errors in Python is through the following Python construction,
try:
    <Python statements>
except <error type>:
    <Python statements>
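As a first small illustration of this construction, consider converting user-supplied text to a number. The helper to_float is a hypothetical name used only for this sketch.

```python
def to_float(text, default=None):
    """Convert text to a float; return default if text is not a valid number."""
    try:
        return float(text)      # may raise ValueError
    except ValueError:
        return default          # executed only if the conversion failed

print(to_float('0.6'))
print(to_float('amir'))
```

The statements in the try block run normally; only if one of them raises a ValueError does the program jump to the except block instead of stopping with a traceback.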
For example, if we were to rewrite the command line argument section of the prime-number code from lecture 7 to handle exceptions that arise due to a ValueError (e.g., a non-integer input), it would look something like the following,
if __name__ == "__main__":
    import sys
    if len( sys.argv ) != 2: # check the number of arguments to be exactly 2.
        print('''
Error: Exactly two arguments must be given on the command line.
Usage:''')
        print(" ", sys.argv[0], "<a positive integer number>", '\n')
        sys.exit(' Program stopped.\n')
    else:
        try:
            n = int(sys.argv[1])
            print('Here is a list of all prime numbers smaller than {}:'.format(n))
            get_primes(n)
        except ValueError:
            print('The input you entered is not an integer!\n Try again...')
            sys.exit(1)
The statement sys.exit(1) aborts the program with a nonzero exit status, which signals to the calling shell that an error occurred. The whole code can be found here. Now if we run the original code with a non-integer input, we would get the following Python error,
$ ../7/cmd_find_primes.py amir
Traceback (most recent call last):
File "../7/cmd_find_primes.py", line 34, in <module>
n = int(sys.argv[1])
ValueError: invalid literal for int() with base 10: 'amir'
whereas, if we run the newly written code, the non-integer error is nicely handled by outputting a gentle error message to the user and exiting the program gracefully.
$ ./cmd_find_primes_modern.py amir
The input you entered is not an integer!
Try again...
The type of error occurring in the above example was ValueError. There can be, however, many other types of errors and exceptions. For this reason, Python has a built-in list of exceptions that frequently occur in programming.
The raise statement
Instead of the print statement in the above except block, one can use Python's raise statement to signal the error together with a message from the programmer. For example, the previous code could be modified to the following code,
if __name__ == "__main__":
    import sys
    if len( sys.argv ) != 2: # check the number of arguments to be exactly 2.
        print('''
Error: Exactly two arguments must be given on the command line.
Usage:''')
        print(" ", sys.argv[0], "<a positive integer number>", '\n')
        sys.exit(' Program stopped.\n')
    else:
        try:
            n = int(sys.argv[1])
            print('Here is a list of all prime numbers smaller than {}:'.format(n))
            get_primes(n)
        except ValueError:
            raise ValueError('The input you entered is not an integer!\n Try again...')
            sys.exit(1) # never reached: the raise statement above already leaves this block
Executing the code with wrong input would give,
$ ./cmd_find_primes_raise.py amir
Traceback (most recent call last):
File "./cmd_find_primes_raise.py", line 35, in <module>
n = int(sys.argv[1])
ValueError: invalid literal for int() with base 10: 'amir'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./cmd_find_primes_raise.py", line 39, in <module>
raise ValueError('The input you entered is not an integer!\n Try again...')
ValueError: The input you entered is not an integer!
Try again...
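The double traceback appears because the new exception is raised while the original one is still being handled. If only the friendlier message should be shown, Python 3 allows writing raise ... from None, which suppresses the original exception in the traceback. A minimal sketch, in which the helper name parse_int is an assumption made for illustration:

```python
def parse_int(text):
    """Convert text to int, raising a ValueError with a friendlier message."""
    try:
        return int(text)
    except ValueError:
        # 'from None' hides the original int() error from the traceback.
        raise ValueError('{!r} is not an integer. Try again...'.format(text)) from None

try:
    parse_int('amir')
except ValueError as err:
    message = str(err)
    suppressed = err.__suppress_context__   # True when 'from None' was used
    print(message)
```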
A more elegant and cleaner way of handling and outputting the error would be to use the following syntax, in this modified code,
if __name__ == "__main__":
    import sys
    if len( sys.argv ) != 2: # check the number of arguments to be exactly 2.
        print('''
Error: Exactly two arguments must be given on the command line.
Usage:''')
        print(" ", sys.argv[0], "<a positive integer number>", '\n')
        sys.exit(' Program stopped.\n')
    else:
        try:
            n = int(sys.argv[1])
            print('Here is a list of all prime numbers smaller than {}:'.format(n))
            get_primes(n)
        except ValueError as err:
            print(err)
            sys.exit(1)
With the following output,
$ ./cmd_find_primes_raise_as_err.py amir
invalid literal for int() with base 10: 'amir'
In the statement except ValueError as err: one could use Exception to catch all types of errors instead of only ValueError exceptions, or use a tuple syntax such as except (ValueError, IndexError) as err: to cover these two exceptions.
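The tuple syntax can be sketched as follows. The helper first_argument_as_int is a hypothetical name, and the lists passed to it stand in for sys.argv.

```python
def first_argument_as_int(argv):
    """Return argv[1] as an int, or None (with a message) if that is impossible."""
    try:
        return int(argv[1])
    except (ValueError, IndexError) as err:
        # type(err).__name__ tells which of the two exceptions occurred.
        print('{}: {}'.format(type(err).__name__, err))
        return None

print(first_argument_as_int(['prog.py', '12']))
first_argument_as_int(['prog.py', 'amir'])   # handled as a ValueError
first_argument_as_int(['prog.py'])           # handled as an IndexError
```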
Code verification and unit testing
In the previous lecture we discussed the process of creating modules and collecting functions in one file as a personal module to be used later. As soon as your collection of codes and functions grows, you will need a unified way of ensuring that all your functions work appropriately, regardless of any future internal changes made to the functions. This is what unit testing exists for. Unit testing is a software development process in which the smallest testable parts of an application, called units, are individually and independently scrutinized for proper operation. Unit testing can be done manually, but if you have a long list of functions (which you most often have), you will want to automate the testing process.
The grand goal in unit testing is to reduce the risk of encountering potential problems when running the code in the smallest possible units of the code. This means,
- ensuring the code has the correct behavior when given the proper input data.
- ensuring the code is robust to exceptions and invalid input data, meaning that it does not crash when it reaches unexpected situations during execution, but handles the error gracefully.
Because of the goals for which the unit tests are designed, they are mostly written and used by the developers of the code.
Unit test frameworks
There are many ways to write tests for codes. If you asked each software developer to write a unit test for a specific piece of software, each would likely come up with their own set of rules and tests. You would end up with many tests that are generally only usable by the developer who wrote them. That is why you should select a unit test framework. A unit test framework provides consistency in how the unit tests for your project are written. There are many test frameworks to choose from for just about any language you want to program with, including Python. Just as with programming languages, almost every programmer has a strong opinion about which test framework is the best. Research what’s out there and use the one that meets the needs of your organization (for example, there is one experienced Python programmer in our ECL class who does not like any of the existing unit test frameworks for Python, and wants to write his own as the project of this course!).
The framework will provide a consistent testing structure to create maintainable tests with reproducible results. From a product quality and business point of view, those are the most valuable reasons to use a unit test framework. When you write a code, you should also think of a quick and simple way to develop and verify your logic in isolation. Once you make sure you have it working solidly by itself, then you can proceed to integrate it into the larger project solutions with great confidence.
Three widely used unit testing frameworks for Python are unittest (part of the standard library), nose, and pytest, which automate as much as possible the process of testing all of your codes, whenever required. The last, pytest, appears to be the most popular unit testing framework as of today.
Conventions for test functions
The simplest way of using the testing frameworks (e.g., pytest or nose) is to write a set of test functions, scattered around in files, such that pytest or nose can automatically find and run all of these test functions. To achieve the goal, the test functions need to follow certain conventions:
- The name of a test function starts with test_.
- A test function cannot take any arguments.
- Any test must be formulated as a boolean condition.
- An AssertionError exception is raised if the boolean condition is False (i.e., when the test fails).
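A minimal test function that follows these conventions might look like the sketch below. The function double is a hypothetical function under test, introduced only for this example.

```python
def double(x):
    """Function under test (a hypothetical example)."""
    return 2*x

def test_double():                       # name starts with test_, no arguments
    expected = 8
    computed = double(4)
    success = computed == expected       # the boolean condition
    # assert raises AssertionError (with this message) if success is False
    assert success, 'double(4) returned {}, expected {}'.format(computed, expected)

test_double()   # silent when the test passes
```

If these functions are saved in a file whose name starts with test_, running pytest in that directory should find and run test_double automatically, without the explicit call on the last line.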
Testing function accuracy
Suppose we have written the following function which runs the Newton’s method for solving an algebraic equation of the form $f(x)=0$, and we would like to write a test function that ensures its correct behavior.
def newton(f, dfdx, x, eps=1E-7):
    n = 0 # iteration counter
    while abs(f(x)) > eps:
        x = x - f(x)/dfdx(x)
        n += 1
    return x, f(x), n
Our goal is to write a function that tests the validity of the output of the function for a special case for which we know the results a priori. In the case of the above code, the function output is not a fixed result, but an approximate float number $x_0$ which satisfies the condition $|f(x_0)|<\epsilon$, where $\epsilon$ is a prescribed number close to zero. Therefore, we first have to come up with a mathematical test input function to the function newton, for which we have calculated the correct answer a priori, and then check whether the above code gives the same answer. Since the output of the function newton is a float that depends on the machine precision, we cannot expect the function to output the exact same result every time the code is run on any computer. Therefore, we have to define our test such that the function passes the test even if the result is not exactly what we expect, but still close enough to the correct answer. Here is an example test function for the above code using the sin(x) function as the test input function to newton(),
def test_newton_sin():
    from math import sin, cos, pi
    def f(x):
        return sin(x)
    def dfdx(x):
        return cos(x)
    x_ref = 0.000769691024206
    f_x_ref = 0.000769690948209
    n_ref = 3
    x, f_x, n = newton(f, dfdx, x=-pi/3, eps=1E-2)
    tol = 1E-15 # tolerance for comparing real numbers
    assert abs(x_ref - x) < tol , "The test for the value of x_0 failed" # is x correct?
    assert abs(f_x_ref - f_x) < tol , "The test for the function value failed" # is f_x correct?
    assert n == n_ref , "The test for the number of iterations failed" # is n correct?
Note that in the above test function, the function name begins with test_, the function takes no arguments, and it raises an AssertionError if any of the tested conditions is False. Now if you run the test,
test_newton_sin()
you will notice that the function passes the test. However, if in the above test we set eps=1E-10 and run the test again, you will get an assertion error like the following,
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-20-8be9faac8d8e> in <module>()
----> 1 test_newton_sin()
<ipython-input-18-263651ba410f> in test_newton_sin()
14 x, f_x, n = newton(f, dfdx, x=-pi/3, eps=1E-10)
15 tol = 1E-15 # tolerance for comparing real numbers
---> 16 assert abs(x_ref - x) < tol , "The test for the value of x_0 failed" # is x correct?
17 assert abs(f_x_ref - f_x) < tol , "The test for the function value failed" # is f_x correct?
18 assert n == 3 , "The test for the number of iterations failed" # is f_x correct? # is n correct?
AssertionError: The test for the value of x_0 failed
One could also write exact tests for the function newton, which test for an exact result that is known a priori, for example by using a linear mathematical function as input to newton.
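As a sketch of such an exact test, consider the linear function $f(x)=2x-4$, whose root is $x=2$. For a linear function, a single Newton step from any starting point lands on the root, so the root, the function value, and the number of iterations can all be compared exactly. The particular function and starting value are assumptions chosen so that every intermediate number is exactly representable as a float; newton is repeated here so that the example is self-contained.

```python
def newton(f, dfdx, x, eps=1E-7):
    n = 0  # iteration counter
    while abs(f(x)) > eps:
        x = x - f(x)/dfdx(x)
        n += 1
    return x, f(x), n

def test_newton_linear():
    f = lambda x: 2*x - 4       # root at x = 2
    dfdx = lambda x: 2
    x, f_x, n = newton(f, dfdx, x=0, eps=1E-7)
    # Exact comparisons are safe here: 0 - (-4)/2 is exactly 2.0.
    assert x == 2, 'The test for the value of the root failed'
    assert f_x == 0, 'The test for the function value failed'
    assert n == 1, 'The test for the number of iterations failed'

test_newton_linear()
```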
Testing function robustness
The above newton function is very basic and suffers from several problems:
- for divergent iterations it will iterate forever,
- it can divide by zero in f(x)/dfdx(x),
- it can perform integer division in f(x)/dfdx(x) if the code is run under Python 2 (in Python 3, the / operator always performs true division),
- it does not test whether the arguments have acceptable types and values.
A more robust implementation dealing with these potential problems would look like the following:
def Newton(f, dfdx, x, eps=1E-7, maxit=100):
    if not callable(f): raise TypeError( 'f is %s, should be function or class with __call__' % type(f) )
    if not callable(dfdx): raise TypeError( 'dfdx is %s, should be function or class with __call__' % type(dfdx) )
    if not isinstance(maxit, int): raise TypeError( 'maxit is %s, must be int' % type(maxit) )
    if maxit <= 0: raise ValueError( 'maxit=%d <= 0, must be > 0' % maxit )
    n = 0 # iteration counter
    while abs(f(x)) > eps and n < maxit:
        try:
            x = x - f(x)/float(dfdx(x))
        except ZeroDivisionError:
            raise ZeroDivisionError( 'dfdx(%g)=%g - cannot divide by zero' % (x, dfdx(x)) )
        n += 1
    return x, f(x), n
Now, for this more robust code (compared to the earlier version, newton), we also have to write a set of tests examining the robustness of the code subject to potential exceptions. For example, one can write a test function that examines the behavior of Newton subject to an input mathematical function that is known to lead to divergent (infinite) iterations if the initial starting point $x$ is not sufficiently close to the root of the function. One such example is $f(x)=\tanh(x)$, for which a starting search value of $x=20$ would lead to infinite iterations in Newton’s method. So we can set maxit=12 in our robust Newton code, and test that the actual number of iterations reaches this limit. Given our prior knowledge that, for this function, the value of $x$ will also have diverged after 12 iterations, we could also add a test for the value of $x$, like the following,
def test_Newton_divergence():
    from math import tanh
    f = tanh
    # note: this is not the exact derivative of tanh, but it makes the iterates grow rapidly
    dfdx = lambda x: 10./(1 + x**2)
    x, f_x, n = Newton(f, dfdx, 20, eps=1E-4, maxit=12)
    assert n == 12
    assert x > 1E+50
test_Newton_divergence()
The example given here only tests the robustness of Newton() in handling divergent situations. For other potential problems, one has to write other test functions, some of which will be given as exercises.
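To indicate what such additional tests could look like, here is a sketch of two of them: one checks that a non-callable f is rejected with a TypeError, the other that a vanishing derivative leads to a ZeroDivisionError. The test names and the particular inputs are assumptions for illustration; Newton is repeated (with the one-line if statements unfolded) so that the example is self-contained.

```python
def Newton(f, dfdx, x, eps=1E-7, maxit=100):
    if not callable(f):
        raise TypeError('f is %s, should be function or class with __call__' % type(f))
    if not callable(dfdx):
        raise TypeError('dfdx is %s, should be function or class with __call__' % type(dfdx))
    if not isinstance(maxit, int):
        raise TypeError('maxit is %s, must be int' % type(maxit))
    if maxit <= 0:
        raise ValueError('maxit=%d <= 0, must be > 0' % maxit)
    n = 0  # iteration counter
    while abs(f(x)) > eps and n < maxit:
        try:
            x = x - f(x)/float(dfdx(x))
        except ZeroDivisionError:
            raise ZeroDivisionError('dfdx(%g)=%g - cannot divide by zero' % (x, dfdx(x)))
        n += 1
    return x, f(x), n

def test_Newton_f_is_not_callable():
    try:
        Newton(4.2, lambda x: 1, x=0)    # 4.2 is a number, not a function
    except TypeError:
        success = True
    else:
        success = False
    assert success, 'Newton accepted a non-callable f'

def test_Newton_zero_derivative():
    # f(x) = x**2 + 1 has derivative 2x, which vanishes at the start value x = 0.
    try:
        Newton(lambda x: x**2 + 1, lambda x: 2*x, x=0)
    except ZeroDivisionError:
        success = True
    else:
        success = False
    assert success, 'Newton did not report the division by zero'

test_Newton_f_is_not_callable()
test_Newton_zero_derivative()
```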
Summary: unit testing
Unit testing is a component of test-driven development (TDD), a pragmatic methodology that takes a meticulous approach to building a product by means of continual testing and revision.
Unit testing has a steep learning curve. The development team needs to learn what unit testing is, how to unit test, what to unit test and how to use automated software tools to facilitate the process on an on-going basis. The great benefit to unit testing is that the earlier a problem is identified, the fewer compound errors occur. A compound error is one that doesn’t seem to break anything at first, but eventually conflicts with something down the line and results in a problem.
There is a lot more to unit testing and the existing Python frameworks for it than we discussed here. However, covering all those topics would require a dedicated course on unit testing, which is certainly beyond the capacity of this course. But if you are interested in learning more, I recommend that you refer to the documentation of one of the three unit testing frameworks mentioned above. There are also books already written on this topic, an example of which is available here.