This homework aims to give you some experience with Python I/O, error handling in your code, and testing your code for accuracy and robustness.
1. Write a simple program named sum.py that takes an arbitrary-size list of input floats from the command line and prints their sum on the terminal with the following message:
$ python sum.py 1 2 1 23
The sum of 1 2 1 23 is 27.0
Note that you will need to use Python’s built-in function sum().
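A minimal sketch of sum.py follows, assuming every argument is a valid number (no error handling yet). The helper name sum_message is hypothetical; it only separates the message formatting from sys.argv so that the logic can be tested:

```python
import sys

def sum_message(args):
    """Build the output message from a list of command-line strings (hypothetical helper)."""
    values = [float(arg) for arg in args]   # convert every argument to float
    return 'The sum of {} is {}'.format(' '.join(args), sum(values))

if __name__ == '__main__':
    print(sum_message(sys.argv[1:]))        # sys.argv[0] is the script name itself
```

Because every value is converted with float(), the total prints as 27.0 rather than 27.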
2. Similar to the previous problem, write a simple program named sum_via_eval.py that takes an arbitrary-size list of input numbers from the command line and prints their sum on the terminal, this time using Python’s eval function. The program output should look like the following:
$ python sum_via_eval.py 1 2 1 23
The sum of 1 2 1 23 is 27
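One possible sketch of sum_via_eval.py joins the arguments with + and hands the resulting string to eval; the helper name sum_via_eval_message is hypothetical. Keep in mind that eval executes any Python expression it is given, so this approach is only acceptable for trusted input such as this exercise:

```python
import sys

def sum_via_eval_message(args):
    """Build the output message by evaluating e.g. '1+2+1+23' (hypothetical helper)."""
    expression = '+'.join(args)   # ['1', '2', '1', '23'] -> '1+2+1+23'
    return 'The sum of {} is {}'.format(' '.join(args), eval(expression))

if __name__ == '__main__':
    if len(sys.argv) > 1:
        print(sum_via_eval_message(sys.argv[1:]))
    else:
        # eval('') would raise a SyntaxError, so guard against an empty argument list
        print('Usage: python sum_via_eval.py <number> [<number> ...]')
```

Since eval parses 1 and 23 as integer literals, the result prints as 27 rather than 27.0, which is why the two expected outputs differ.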
3. Consider this data file. It contains information about the amino acids in a protein called 1A2T. Each amino acid in a protein is labeled by a single letter. There are 20 amino acids in nature, and each has a maximum total surface area (in units of square Angstroms) given by the following table:
'A': 129.0
'R': 274.0
'N': 195.0
'D': 193.0
'C': 167.0
'Q': 225.0
'E': 223.0
'G': 104.0
'H': 224.0
'I': 197.0
'L': 201.0
'K': 236.0
'M': 224.0
'F': 240.0
'P': 159.0
'S': 155.0
'T': 172.0
'W': 285.0
'Y': 263.0
'V': 174.0
However, when these amino acids sit next to each other to form a protein chain, they partially cover each other, so that only part of each amino acid's surface is exposed, while the rest is hidden from the outside world by neighboring amino acids. Therefore, one would expect an amino acid at the core of a spherical protein to have almost zero exposed surface area.
Now, given the above information, write a Python program that takes two command-line arguments. The first is the path to the input file 1A2T_A.dssp, which contains the partially exposed surface area of each amino acid in protein 1A2T; the second is the path to the output file of the code (e.g., ./readDSSP.out). Then,
- the code reads the content of this file, and
- extracts the names of the amino acids in this protein from the data column with the header AA (look at line number 25 inside the input data file; below AA is the column containing the one-letter names of the amino acids in this protein), and
- also extracts the partially exposed surface area of each of these amino acids, which appears in the column with the header ACC, and
- then uses the above table of maximum surface area values to calculate the fractional exposed surface area of each amino acid in this protein (i.e., for each amino acid, fraction_of_exposed_surface = ACC / maximum_surface_area_from_table), and
- finally for each amino acid in this protein, it prints the one-letter name of the amino acid, its corresponding partially exposed surface area (ACC from the input file), and its corresponding fractional exposed surface area (name it RSA) to the output file given by the user on the command line.
- In the first column of the output file, the code should also write the name of the protein (which is simply the name of the input file, 1A2T_A) on each line. Note that your code should extract the protein name from the input filename, by removing the file extension and any other unnecessary information (such as the directory path) from the command-line string. Here is an example output of the code.
- Your code should also handle the error of receiving fewer or more than 2 command-line arguments. That is, if the number of input arguments is 1 or 3, it should print the following message on screen and stop.
$ ./readDSSP.py ./1A2T_A.dssp
Usage:
./readDSSP.py <input dssp file> <output summary file>
Program aborted.
or,
$ ./readDSSP.py ./1A2T_A.dssp ./readDSSP.out amir
Usage:
./readDSSP.py <input dssp file> <output summary file>
Program aborted.
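The argument check and the protein-name extraction might be sketched as below. The function names check_args and protein_name are hypothetical, the usage text follows the example runs above, and the caller is assumed to stop the program (for instance with sys.exit()) whenever check_args returns None:

```python
import os

# Message shown in the example runs above
USAGE = ('Usage:\n'
         './readDSSP.py <input dssp file> <output summary file>\n'
         'Program aborted.')

def check_args(argv):
    """Return (input_path, output_path), or print the usage message and return None."""
    if len(argv) != 3:          # script name plus exactly two user arguments
        print(USAGE)
        return None
    return argv[1], argv[2]

def protein_name(path):
    """Strip the directory and the extension: './1A2T_A.dssp' -> '1A2T_A'."""
    return os.path.splitext(os.path.basename(path))[0]
```

In readDSSP.py itself, one would call args = check_args(sys.argv) and abort if args is None.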
To achieve the above goal, you will have to create a dictionary from the above table, with the amino acid names as keys and the maximum surface areas as the corresponding values. Name your code readDSSP.py and submit it to your repository.
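The table can be typed in directly as the requested dictionary. The names MAX_ASA and rsa below are hypothetical; rsa simply applies RSA = ACC / maximum surface area, as described in the list of steps above:

```python
# Maximum total surface area (square Angstroms) of each amino acid, from the table above
MAX_ASA = {
    'A': 129.0, 'R': 274.0, 'N': 195.0, 'D': 193.0, 'C': 167.0,
    'Q': 225.0, 'E': 223.0, 'G': 104.0, 'H': 224.0, 'I': 197.0,
    'L': 201.0, 'K': 236.0, 'M': 224.0, 'F': 240.0, 'P': 159.0,
    'S': 155.0, 'T': 172.0, 'W': 285.0, 'Y': 263.0, 'V': 174.0,
}

def rsa(aa, acc):
    """Fractional exposed surface area of amino acid `aa` whose exposed area is `acc`."""
    return acc / MAX_ASA[aa]
```

For example, an alanine with ACC = 64.5 has RSA = 64.5 / 129.0 = 0.5.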
Write your code so that it checks for the existence of the output file. If the file already exists, the code must not remove its content; instead, it appends the new data to the existing file. Otherwise, if the file does not exist, the code creates a new output file as requested by the user. To do so, you will need to use the os.path.isfile function from module os.
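A possible sketch of the append-or-create logic follows. The function name write_summary, its arguments, and the space-separated column layout (protein name, amino acid, ACC, RSA) are assumptions, since the example output file is not reproduced here. Opening a file in mode 'a' would also create it if it were missing; the explicit os.path.isfile check mirrors what the assignment asks for and makes the two cases visible:

```python
import os

def write_summary(path, protein, records, max_asa):
    """Write one line per (aa, acc) record; append if `path` exists, else create it.

    Returns the file mode that was used ('a' or 'w').
    """
    mode = 'a' if os.path.isfile(path) else 'w'
    with open(path, mode) as out:
        for aa, acc in records:
            rsa = acc / max_asa[aa]          # fractional exposed surface area
            out.write('{} {} {} {}\n'.format(protein, aa, acc, rsa))
    return mode
```

Running the program twice with the same output path should therefore leave two copies of the data in the file rather than one.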
ATTENTION: In some rows, instead of a one-letter amino acid name, there is an exclamation mark (!). In such cases, your code should detect the abnormality and skip that row, because it does not contain amino acid information.
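Extracting the AA and ACC columns could look like the sketch below. It assumes the standard fixed-width DSSP layout, in which the data rows follow the header line starting with "  #  RESIDUE", the one-letter name sits at character index 13, and ACC occupies indices 34 to 37; check these positions against line 25 of your own file before relying on them. The name parse_dssp is hypothetical:

```python
def parse_dssp(lines):
    """Return a list of (one-letter name, ACC) tuples from the lines of a DSSP file."""
    records = []
    in_data = False
    for line in lines:
        if not in_data:
            # everything up to and including the column-header line is metadata
            if line.startswith('  #  RESIDUE'):
                in_data = True
            continue
        if len(line) < 38:
            continue                   # skip blank or truncated lines
        aa = line[13]                  # AA column (assumed fixed position)
        if aa == '!':
            continue                   # chain break: this row holds no amino acid
        acc = float(line[34:38])       # ACC column (assumed fixed position)
        records.append((aa, acc))
    return records
```

Inside readDSSP.py, it could be used as: with open(input_path) as f: records = parse_dssp(f).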
4. Consider the simplest program for evaluating the formula $y(t) = v_0t-\frac{1}{2}gt^2$,
v0 = 3; g = 9.81; t = 0.6
y = v0*t - 0.5*g*t**2
print(y)
(A) Write a program that takes the necessary input data ($v_0$ and $t$, in that order) as command-line arguments.
(B) Extend your program from part (A) with exception handling such that missing command-line arguments are detected. For example, if the user has not entered enough input arguments, the code should raise an IndexError exception. In the except IndexError block, the code should use the input function to ask the user for the missing input data.
(C) Add another exception handling block that tests whether the $t$ value read from the command line lies between $0$ and $2v_0/g$. If not, the code raises a ValueError exception in the if block that checks the legal values of $t$, and notifies the user about the legal interval for $t$ in the exception message.
Here are some example runs of the code,
$ ./projectile.py
Both v0 and t must be supplied on the command line
v0 = ?
5
t = ?
4
Traceback (most recent call last):
File "./projectile.py", line 17, in <module>
'must be between 0 and 2v0/g = {}'.format(t,2.0*v0/g))
ValueError: t = 4.0 is a non-physical value.
must be between 0 and 2v0/g = 1.019367991845056
$ ./projectile.py
Both v0 and t must be supplied on the command line
v0 = ?
5
t = ?
0.5
y = 1.27375
$ ./projectile.py 5 0.4
y = 1.2151999999999998
$ ./projectile.py 5 0.4 3
y = 1.2151999999999998
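Parts (A) to (C) might fit together as in the sketch below, which reproduces the messages of the example runs. The function names read_inputs and height are hypothetical; a script would call v0, t = read_inputs(sys.argv) and then print('y = {}'.format(height(v0, t))). Arguments beyond the second are simply ignored, as in the last example run:

```python
g = 9.81  # gravitational acceleration (m/s^2)

def read_inputs(argv):
    """Return (v0, t) from a sys.argv-style list, prompting the user if either is missing."""
    try:
        v0 = float(argv[1])   # raises IndexError if the argument is missing
        t = float(argv[2])
    except IndexError:
        print('Both v0 and t must be supplied on the command line')
        v0 = float(input('v0 = ?\n'))
        t = float(input('t = ?\n'))
    return v0, t

def height(v0, t):
    """Return y(t) = v0*t - 0.5*g*t**2; raise ValueError if t is outside [0, 2*v0/g]."""
    if not 0 <= t <= 2.0 * v0 / g:
        raise ValueError('t = {} is a non-physical value.\n'
                         'must be between 0 and 2v0/g = {}'.format(t, 2.0 * v0 / g))
    return v0 * t - 0.5 * g * t**2
```

Keeping the argument reading and the physics in separate functions lets each exception be tested on its own without typing input interactively.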
5. Consider the function Newton that we discussed in lecture 8:
def Newton(f, dfdx, x, eps=1E-7, maxit=100):
    if not callable(f): raise TypeError( 'f is %s, should be function or class with __call__' % type(f) )
    if not callable(dfdx): raise TypeError( 'dfdx is %s, should be function or class with __call__' % type(dfdx) )
    if not isinstance(maxit, int): raise TypeError( 'maxit is %s, must be int' % type(maxit) )
    if maxit <= 0: raise ValueError( 'maxit=%d <= 0, must be > 0' % maxit )
    n = 0 # iteration counter
    while abs(f(x)) > eps and n < maxit:
        try:
            x = x - f(x)/float(dfdx(x))
        except ZeroDivisionError:
            raise ZeroDivisionError( 'dfdx(%g)=%g - cannot divide by zero' % (x, dfdx(x)) )
        n += 1
    return x, f(x), n
This function is supposed to handle exceptions such as divergent iterations (which we discussed in the lecture) and division by zero. The latter error happens when dfdx(x)=0 in the above code. Write a test code that ensures the above code correctly detects a division by zero and raises ZeroDivisionError; if it does not, your test should fail with an AssertionError.
(Hint: To do so, you need to choose a test mathematical function as input to Newton. One example could be $f(x)=\cos(x)$ with a starting search value $x=0$. This results in the derivative value $f'(0)=-\sin(0)=0$, which should lead to a ZeroDivisionError exception. Now, write a test function test_Newton_div_by_zero that explicitly handles this exception by introducing a boolean variable success that is True if the exception is raised and False otherwise.)
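A sketch of such a test, following the hint above, is shown below. So that the block runs on its own, it repeats Newton from above unchanged; the lambda used as the derivative of cos is the only other assumption. If Newton ever stops raising ZeroDivisionError for this input, the final assert fails with an AssertionError:

```python
from math import cos, sin

def Newton(f, dfdx, x, eps=1E-7, maxit=100):
    # Copy of the function above, repeated so this block is self-contained
    if not callable(f): raise TypeError('f is %s, should be function or class with __call__' % type(f))
    if not callable(dfdx): raise TypeError('dfdx is %s, should be function or class with __call__' % type(dfdx))
    if not isinstance(maxit, int): raise TypeError('maxit is %s, must be int' % type(maxit))
    if maxit <= 0: raise ValueError('maxit=%d <= 0, must be > 0' % maxit)
    n = 0  # iteration counter
    while abs(f(x)) > eps and n < maxit:
        try:
            x = x - f(x)/float(dfdx(x))
        except ZeroDivisionError:
            raise ZeroDivisionError('dfdx(%g)=%g - cannot divide by zero' % (x, dfdx(x)))
        n += 1
    return x, f(x), n

def test_Newton_div_by_zero():
    """Newton must raise ZeroDivisionError for f(x) = cos(x) started at x = 0."""
    success = False
    try:
        # f'(0) = -sin(0) = 0, so the very first Newton step divides by zero
        Newton(cos, lambda x: -sin(x), 0.0)
    except ZeroDivisionError:
        success = True
    assert success, 'Newton did not raise ZeroDivisionError when dfdx(x) = 0'
```

The test can be run simply by calling test_Newton_div_by_zero(); it returns silently when the exception is raised as expected.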