BIO-210: Applied software engineering for life sciences
Python Introduction III - Numpy 2 and branching operations#
A deeper dive into Numpy#
Numpy is a widely used Python library for scientific computing. During the last lesson you already learnt quite a few features of Numpy. Today, let’s explore more features!
import numpy as np
Slicing operations (refresh)#
Let’s review together how to index a multi-dimensional array using slicing
a = np.arange(1,101).reshape(10,10)
print('By default, indexing with colon will return all rows and columns')
b = a[:,:] #[all rows, all columns]
print(b)
print('We can define the start and the end of the indexed rows')
b = a[1:3,:] #[rows 1 and 2, all columns]
print(b)
print('or the start and the end of the indexed columns')
b = a[:,1:3] #[all rows, columns 1 and 2]
print(b)
print('We can also specify the start and the end for both rows and columns')
b = a[4:7,1:3] #[rows 4 to 6, columns 1 and 2]
print(b)
Sometimes, it can be useful to skip indexes. This can be achieved by adding another colon (:) followed by a value that specifies the step between consecutive selected indexes (for example, a value of 4 selects every fourth element). Therefore, we can summarize all slicing operations with the following notation: [start_idx : end_idx : skip_idx].
print('Print every fourth row')
b = a[::4,:]
print(b)
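The three parts of the notation can also be combined in a single slice. Here is a minimal sketch on a one-dimensional array; the names v, every_third and evens are introduced only for this illustration.

```python
import numpy as np

v = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]

# start at index 1, stop before index 8, move in steps of 3
every_third = v[1:8:3]
print(every_third)       # [1 4 7]

# an omitted start or end defaults to the whole axis
evens = v[::2]
print(evens)             # [0 2 4 6 8]
```

Note that the end index is always excluded, exactly as in the two-part slices above.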
While here we are working with relatively small arrays, in real life you might work with large datasets having several axes. When that’s the case, the ellipsis syntax (...) is useful to skip multiple axes. For example, imagine we have a 5D array and we want to keep every axis in full except the last one, where we only want the first two elements. We can use the following syntax:
mat = np.random.randn(4,4,4,4,4)
print(mat)
print('mat.shape:', mat.shape)
mat_sub = mat[..., :2] # equivalent to mat[:,:,:,:,:2]
print('mat_sub.shape:', mat_sub.shape)
Exercise 0. Index matrix a and print all even numbers between 40 (excluded) and 70.
# Your code here
Basic statistical functions#
NumPy contains various statistical functions that are used for data analysis. These functions are useful, for example, to find the maximum or the minimum element of a vector. They are also used to compute common statistical quantities such as the standard deviation and the variance.
The functions mean and std are used to calculate the mean and standard deviation of the input data (e.g., of an array). Besides calculating the result for the whole data, they can also be used to calculate it along a specific axis.
a = np.array([[1, 2], [3, 4]])
print("The full matrix:\n", a)
print("The mean of the whole matrix is:", np.mean(a))
print("The standard deviation of the whole matrix is:", np.std(a))
print("The mean of each column is:", np.mean(a, axis=0))
print("The mean of each row is:", np.mean(a, axis=1))
print("The standard deviation of each column is:", np.std(a, axis=0))
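One way to read the axis argument is as "the axis that gets collapsed". The sketch below uses a non-square array so that the resulting shapes make this visible; the names m, col_means and row_means are only illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): 2 rows, 3 columns

col_means = np.mean(m, axis=0)     # collapses the rows: one value per column
row_means = np.mean(m, axis=1)     # collapses the columns: one value per row

print(col_means, col_means.shape)  # [2.5 3.5 4.5] (3,)
print(row_means, row_means.shape)  # [2. 5.] (2,)
```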
Now, let’s generate a random array drawn from a Gaussian distribution \(\mathcal{N}\left(3, 6.25\right)\). The Numpy function random.randn samples values from a standard Gaussian distribution \(\mathcal{N}\left(0, 1\right)\). Therefore, to get samples from \(\mathcal{N}\left(3, 6.25\right)\), we need to multiply the samples by the standard deviation (i.e., \(\sqrt{6.25} = 2.5\)) and then add the mean (i.e., \(3\)).
a = 3 + 2.5 * np.random.randn(2, 4)
Exercise 1. Calculate the mean and standard deviation first of the whole matrix a and then along the first axis of matrix a.
# Your code here
Is it close to what you expect? How would you create another matrix a, in which the mean and the standard deviation are closer to the expected ones?
# Your code here
Exercise 2. Besides mean and std, Numpy also offers the functions min, max, median, argmin and argmax to calculate the minimum, maximum and median values, the index of the minimum and the index of the maximum of the array. Apply these functions to the matrix a and along its axis 0 (think of it as coordinates of your array, with axis 0 along rows and axis 1 along columns). Take a closer look at the example above to help you understand the importance of this parameter! If you still feel confused, check out this article.
# Your code here
Numpy also supports non-standard numbers, such as np.inf, which represents infinity, and np.nan, which represents “not-a-number”. These can be the results of operations such as division by 0:
a = np.array([0, 1, -4]) / 0
print("Dividing by 0 can generate np.nan or np.inf (also negative) as a result:", a)
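Because these special values behave differently from ordinary numbers, it often helps to detect them explicitly. The following is a small sketch using np.isnan, np.isinf and np.isfinite; the array values and the mask names are made up for the example.

```python
import numpy as np

values = np.array([np.nan, np.inf, -np.inf, 2.0])

nan_mask = np.isnan(values)        # True only where the entry is NaN
inf_mask = np.isinf(values)        # True for both +inf and -inf
finite_mask = np.isfinite(values)  # True for ordinary numbers only

print(nan_mask)
print(inf_mask)
print(values[finite_mask])         # keeps only the finite entries
```

A direct comparison such as values == np.nan does not work for this purpose, because NaN is never equal to anything, including itself.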
Standard operations, when applied to data containing np.nan, will also return np.nan:
a = [0, np.nan, 1]
print("The mean of a vector with a NaN is: ", np.mean(a))
However, Numpy offers functions that can ignore NaNs, such as nanmax, nanmin and nanmean. Let’s create an array including NaN values and test these functions.
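As a quick sketch of the difference, the snippet below compares np.mean and np.nanmean on an array with one missing entry. The array temps and its values are hypothetical and serve only as an illustration.

```python
import numpy as np

# hypothetical measurements, one of them missing
temps = np.array([36.5, np.nan, 37.1, 36.9])

plain_mean = np.mean(temps)       # the NaN propagates to the result
robust_mean = np.nanmean(temps)   # the NaN is ignored: mean of the 3 valid values

print(plain_mean)
print(robust_mean)
```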
Exercise 3. Apply the following functions of numpy to the array a: min, max and nanmin, nanmax.
# Your code here
Exercise 4. We want to write some code which, given a point, finds the closest one in a set of other points. Such a function is important, for example, in information theory, as it is the basic operation of the vector quantization (VQ) algorithm. In the simple, two-dimensional case shown below, the values refer to the weight and height of an athlete. The set of weights and heights represents different classes of athletes. We want to assign the athlete to the class it is closest to. Finding the closest point requires calculating the Euclidean distance between the athlete’s parameters and each of the classes of athletes. Now, let’s define an athlete with
\[\left[\text{weight, height}\right] = \left[111.0, 188.0\right]\]
and an array of 4 classes
\[\left[\begin{matrix} 102.0 & 203.0 \\ 132.0 & 193.0 \\ 45.0 & 155.0 \\ 57.0 & 173.0 \end{matrix}\right]\]
In the next cell, write some code which returns the index of the class of athletes that the athlete should be assigned to.
observation = np.array([111.0, 188.0])
codes = np.array([[102.0, 203.0],
[132.0, 193.0],
[45.0, 155.0],
[57.0, 173.0]])
diff = codes - observation # the broadcast happens here
print(diff.shape)
dist = np.sqrt(np.sum(diff**2, axis=-1))
print(np.argmin(dist))
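As an alternative sketch, np.linalg.norm with an axis argument computes the same Euclidean distances in a single call. The broadcast still happens in the subtraction: an array of shape (4, 2) minus an array of shape (2,) gives an array of shape (4, 2). The name closest is introduced only for this illustration.

```python
import numpy as np

observation = np.array([111.0, 188.0])
codes = np.array([[102.0, 203.0],
                  [132.0, 193.0],
                  [45.0, 155.0],
                  [57.0, 173.0]])

# (4, 2) - (2,) broadcasts to (4, 2); the norm along axis 1 gives 4 distances
dist = np.linalg.norm(codes - observation, axis=1)
closest = np.argmin(dist)

print(dist.round(2))
print(closest)   # 0
```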
Linear algebra examples#
Linear algebra is at the core of Data Science. That’s why NumPy offers array-like data structures and dedicated operations and methods. Let’s first have a look together at the dot function as an example, which computes the matrix product of two vectors or matrices.
a = np.array([[1,2,3],[2,0,3],[7,-5,1]])
b = np.array([[3,-1,5],[-2,-6,4], [0,4,4]])
print('a @ b: \n', np.dot(a,b))
print('a @ b: \n', a.dot(b))
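The printed labels above use the @ symbol, which is Python’s matrix-multiplication operator. For two-dimensional arrays it gives the same result as np.dot, while the * operator multiplies element by element. Here is a minimal sketch with small matrices chosen for this example.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

matmul = a @ b        # matrix product, same as np.dot(a, b) for 2-D arrays
elementwise = a * b   # element-by-element product, not a matrix product

print(matmul)         # [[2 1]
                      #  [4 3]]
print(elementwise)    # [[0 2]
                      #  [3 0]]
```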
Exercise 5. Define two random matrices, a and b, of size (4x2). Transpose b and save in c the matrix product between a and b transposed.
# Your code here
Exercise 6. Can the c matrix be inverted? Check it out by computing its determinant and, if it exists, get the inverse matrix.
# Your code here
Exercise 7. Using the inverse matrix and the matrix-multiplication operator, you can now solve a matrix-vector equation. Let’s now find the vector \(x\) that solves the equation
\[Ax = b\]
given \(A=\left(\begin{matrix} 2 & 1 & -2\\ 3 & 0 & 1\\ 1 & 1 & -1\end{matrix}\right)\) and \(b=\left(\begin{matrix}-3 \\ 5 \\ -2 \end{matrix}\right)\).
# Your code here
Exercise 8. Computing the inverse can be very time-consuming. Therefore, it is usually better to take advantage of the highly optimized NumPy functions to solve linear equations. Try to solve the same exercise as before but using NumPy’s function linalg.solve to compute \(x\).
# Your code here
Branching operations#
if, else and elif#
In Python, similarly to C-like languages, branching operations are implemented using the if keyword. If the expression is true, the statement following it will be executed. Optionally, the else keyword specifies the statement to execute when the expression is false. Both if and else need a colon (:) at the end of the line, and the statements they control must be indented, as in the following example:
r = np.random.randn()
if r > 0:
    print("The random number is positive")
else:
    print("The random number is negative")
In case you want to create multiple branches by applying more than one condition, you can use the keyword elif as in the following example:
animal = "cat"
if animal == "cat":
    print("meow")
elif animal == "dog":
    print("woof")
elif animal == "cow":
    print("moo")
else:
    print(f"I don't know the {animal}'s call, sorry :(")
Exercise 9. Let’s try to implement a calculator using if, else and elif. The head of the calculator is already written below. You can input a, b and option when running the code. There should be 4 allowed operations:
- addition (1)
- subtraction (2)
- multiplication (3)
- division (4)
If the option is not one of the 4, the calculator should print “Invalid option”. Implement the missing code using the if, elif and else statements along with the appropriate operations.
print("Welcome to CALCULATOR!")
a = float(input("Enter the first number: "))
b = float(input("Enter the second number: "))
print("Choose one of the following operations:")
print("1 - addition")
print("2 - subtraction")
print("3 - multiplication")
print("4 - division")
option = int(input(""))
# Your code here
print("The result is ", result)
break and continue#
The break statement in Python terminates the current loop and resumes execution at the next statement, just like the traditional break found in C. On the other hand, the continue statement skips all the remaining code in the current iteration of the loop and moves the control back to the top of the loop.
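The difference between the two statements can be sketched with a short loop over numbers. This is only an illustration; the list kept and the thresholds are arbitrary choices for the example.

```python
kept = []
for n in range(10):
    if n % 2 == 1:
        continue   # odd number: skip the rest of this iteration
    if n > 6:
        break      # first even number above 6: leave the loop entirely
    kept.append(n)

print(kept)        # [0, 2, 4, 6]
```

Here continue only affects the current iteration, while break ends the whole loop, so the numbers 8 and 9 are never examined.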
Exercise 10. Try to use a for loop and the continue statement to remove all the "h"s in the string "hello, haha, python". The result should be stored in a new string.
# Your code here
Exercise 11. Try to use a for loop and the break statement to only keep the letters before "p" in the string "hello, haha, python". The result should be stored in a new string.
# Your code here