BIO-210: Applied software engineering for life sciences

Python Introduction III - Numpy 2 and branching operations

A deeper dive into Numpy

Numpy is a widely used Python library for scientific computing. During the last lesson you already learnt quite a few features of Numpy. Today, let’s explore more features!

import numpy as np

Slicing operations (refresh)

Let’s review together how to index a multi-dimensional array using slicing.

a = np.arange(1,101).reshape(10,10)

print('By default, indexing with colon will return all rows and columns')
b = a[:,:]  #[all rows, all columns]
print(b)

print('We can define the start and the end of indexed rows')
b = a[1:3,:]  #[rows 1 and 2, all columns]
print(b)

print('or the start and the end of indexed columns')
b = a[:,1:3]  #[all rows, columns 1 and 2]
print(b)

print('We can also specify the start and the end for both rows and columns')
b = a[4:7,1:3]  #[rows 4 to 6, columns 1 and 2]
print(b)
By default, indexing with colon will return all rows and columns
[[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]]
We can define the start and the end of indexed rows
[[11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]
or the start and the end of indexed columns
[[ 2  3]
 [12 13]
 [22 23]
 [32 33]
 [42 43]
 [52 53]
 [62 63]
 [72 73]
 [82 83]
 [92 93]]
We can also specify the start and the end for both rows and columns
[[42 43]
 [52 53]
 [62 63]]

Sometimes, it can be useful to skip indexes. This can be achieved by adding another colon (:) followed by the step, i.e. the distance between two consecutive selected indexes (a step of 4 keeps every fourth element). Therefore, we can summarize all slicing operations with the following notation [start_idx : end_idx : step]. As usual, start_idx is included and end_idx is excluded.

print('Print every fourth row')
b = a[::4,:]
print(b)
Print every fourth row
[[ 1  2  3  4  5  6  7  8  9 10]
 [41 42 43 44 45 46 47 48 49 50]
 [81 82 83 84 85 86 87 88 89 90]]
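As a small sketch of the start, end and step notation on a 1-D array (the array `v` and the variable names are made up for this illustration), the third value selects every n-th element, and a negative step walks through the array backwards:

```python
import numpy as np

v = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]

every_third = v[1:8:3]     # start at index 1, stop before index 8, step of 3
reversed_v = v[::-1]       # negative step: traverse the array from the end
last_three = v[-3:]        # negative indices count from the end

print(every_third)         # [1 4 7]
print(reversed_v)          # [9 8 7 6 5 4 3 2 1 0]
print(last_three)          # [7 8 9]
```

The same notation applies independently to each axis of a multi-dimensional array, as in `a[::4,:]` above.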

While here we are working with relatively small arrays, in real life you might work with large datasets having several axes. When that’s the case, the ellipsis syntax (...) is useful to select multiple axes in full without writing a colon for each of them. For example, imagine we have a 5D array and we want to keep every element along all axes except the last one, along which we only want the first two elements. We can use the following syntax:

a = np.random.randn(4,4,4,4,4)
print('a.shape:', a.shape)
a_sub = a[..., :2] # equivalent to a[:,:,:,:,:2]
print('a_sub.shape:', a_sub.shape)
a.shape: (4, 4, 4, 4, 4)
a_sub.shape: (4, 4, 4, 4, 2)
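The ellipsis does not have to come first. As an illustrative sketch (the array `x` and its shape are arbitrary choices for this example), it can also stand for the middle or the trailing axes:

```python
import numpy as np

x = np.zeros((2, 3, 4, 5, 6))

first_last = x[0, ..., 0]   # same as x[0, :, :, :, 0]
leading = x[1, 2, ...]      # same as x[1, 2, :, :, :]

print(first_last.shape)     # (3, 4, 5)
print(leading.shape)        # (4, 5, 6)
```

Note that only one ellipsis is allowed per indexing expression, since NumPy could not otherwise tell how many axes each one replaces.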

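Exercise 0. Index the 10x10 matrix from the slicing refresher and print all even numbers between 40 (excluded) and 70 (included). Since a was overwritten by the 5D array above, we first define it again.

a = np.arange(1,101).reshape(10,10)
b = a[4:7,1::2]  #[rows 4 to 6, every second column starting from column 1]
print(b)
[[42 44 46 48 50]
 [52 54 56 58 60]
 [62 64 66 68 70]]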

Basic statistical functions

NumPy contains various statistical functions that are used for data analysis. These functions are useful, for example, to find the maximum or the minimum element of a vector. They are also used to compute common statistics such as the standard deviation, the variance, etc.

The functions mean and std are used to calculate the mean and standard deviation of the input data (e.g., of an array). Besides calculating the result for the whole data, they can also calculate it along a specific axis.

a = np.array([[1, 2], [3, 4]])

print("The full matrix:\n", a)
print("The mean of the whole matrix is:", np.mean(a))
print("The standard deviation of the whole matrix is:", np.std(a))
print("The mean of each column is:", np.mean(a, axis=0))
print("The mean of each row is:", np.mean(a, axis=1))
print("The standard deviation of each column is:", np.std(a, axis=0))
The full matrix:
 [[1 2]
 [3 4]]
The mean of the whole matrix is: 2.5
The standard deviation of the whole matrix is: 1.118033988749895
The mean of each column is: [2. 3.]
The mean of each row is: [1.5 3.5]
The standard deviation of each column is: [1. 1.]
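One detail worth knowing: by default, np.std divides by the number of elements N (the population standard deviation). The sketch below, reusing the same 2x2 matrix, shows how the `ddof` argument switches to the sample estimate that divides by N - 1, and how `keepdims` preserves the reduced axis; the variable names are chosen only for this example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

pop_std = np.std(a)              # default ddof=0: divide by N
sample_std = np.std(a, ddof=1)   # ddof=1: divide by N - 1

# keepdims=True keeps the reduced axis with length 1,
# which is convenient for broadcasting against the original array
col_mean = np.mean(a, axis=0, keepdims=True)
centered = a - col_mean          # each column now has zero mean

print(pop_std, sample_std)
print(col_mean.shape)            # (1, 2)
print(centered)
```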

Now, let’s generate a random array drawn from a Gaussian distribution \(\mathcal{N}\left(3, 6.25\right)\). The Numpy function random.randn samples values from a standard Gaussian distribution \(\mathcal{N}\left(0, 1\right)\). Therefore, to get samples from \(\mathcal{N}\left(3, 6.25\right)\), we need to multiply the sampled values by the standard deviation (i.e., \(\sqrt{6.25} = 2.5\)) and then add the mean (i.e., \(3\)).

a = 3 + 2.5 * np.random.randn(2, 4)

Exercise 1. Calculate the mean and standard deviation first of the whole matrix a and then along the first axis of matrix a.

print(np.mean(a))
print(np.std(a))
print(np.mean(a, axis=0))
print(np.std(a, axis=0))
3.064807678395922
1.7832098983100468
[2.13287193 4.17379435 2.6700895  3.28247493]
[0.44933934 0.66132021 0.98292674 2.96857167]

Is it close to what you expect? How would you create another matrix a, in which the mean and the standard deviation are closer to the expected ones?

a = 3 + 2.5 * np.random.randn(100, 100)
print(np.mean(a))
print(np.std(a))
print(np.mean(a, axis=0))
print(np.std(a, axis=0))
2.9904579566166567
2.5072731201850615
[3.10541875 2.49603685 2.91885578 3.33744535 3.2932903  2.90978897
 2.67220505 2.78440997 2.86235316 3.22165684 2.41504156 2.70757555
 3.35381106 3.10099544 2.55861295 3.01582228 2.91416112 3.01357986
 3.10593719 2.98477492 3.07971777 3.55893887 3.29258474 3.12801052
 2.89650106 3.03472116 3.02446446 2.30289049 2.98279343 3.14805667
 2.58144433 3.07794682 3.14388897 3.07571043 2.76484406 2.94509178
 3.03467247 3.0301527  2.80982711 2.7456566  2.56985151 3.38005056
 2.6727635  3.01561375 2.90215531 3.46032185 2.97004436 3.2630465
 2.73746091 3.27658269 3.33390036 2.89638696 3.51980733 2.71361656
 2.81537787 3.19019628 3.02086331 2.80267468 3.45409111 3.14725347
 2.97773914 2.90822528 3.59174743 2.80860135 3.15160842 3.17729413
 3.323342   2.74538924 2.91385382 2.85295356 2.98961219 3.11351895
 2.85689969 2.29009684 2.44987694 2.82805582 3.20735298 2.78372852
 2.98537614 2.98109643 2.9340485  3.08334186 2.80671126 2.84375314
 3.3683201  2.93189283 2.91794566 2.82591942 3.28682059 3.0231723
 2.62286117 3.02667836 3.06347233 2.50996961 3.15915902 3.18570559
 3.21963063 3.03779893 3.30290273 3.38757651]
[2.53342115 2.41790511 2.3203647  2.31210795 2.4559351  2.38367383
 2.28846081 2.43956671 2.22881098 2.54276834 2.47344991 2.43783144
 2.65646629 2.52511308 2.34249211 2.57910457 2.24073984 2.29793795
 2.11607518 2.50901768 2.48044361 1.95820197 2.35989814 2.60757372
 2.70943395 2.39189439 2.62085345 2.41793423 2.31107848 2.16036742
 2.43171675 2.32530875 2.45138558 2.36673165 2.2163198  2.55553786
 2.68074543 2.96407682 2.54510516 2.58989536 2.63860587 2.57802201
 2.55400134 2.45494325 2.82940711 2.33820408 2.65858018 2.93109802
 2.56721141 2.46666815 2.53849804 2.50371734 2.38016277 2.67721737
 2.36046411 2.54074031 2.79263678 2.69559571 2.55346842 2.65356925
 2.46698576 2.42267783 2.73290837 2.61081388 2.29035837 2.35867405
 2.42371753 2.73804487 2.56595728 2.45645919 2.52767296 2.3054107
 2.45086725 2.78753204 2.27771945 2.60589718 2.85370621 2.49000124
 2.82519971 2.31781459 2.53789844 2.51881077 2.36218411 2.1428078
 2.45933708 2.5202895  2.55450988 2.72492414 2.66102325 2.36518402
 2.70344124 2.20895881 2.42275011 2.29220193 2.38876095 2.29171825
 2.56611213 2.57688247 2.40043694 2.48681623]
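The effect of the sample size can be made explicit with a short experiment. This is only a sketch: it uses NumPy's `default_rng` generator with an arbitrarily chosen seed (not part of the lesson's code) so that the run is reproducible, and it compares the estimates for increasingly large samples.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
mu, sigma = 3.0, 2.5

# The estimates get closer to (mu, sigma) as the number of samples grows
for n in (8, 10_000, 1_000_000):
    sample = mu + sigma * rng.standard_normal(n)
    print(n, sample.mean(), sample.std())

large = mu + sigma * rng.standard_normal(1_000_000)
mean_error = abs(large.mean() - mu)
std_error = abs(large.std() - sigma)
print(mean_error, std_error)
```

The error of the estimated mean shrinks roughly as sigma divided by the square root of the number of samples, which is why the 2x4 matrix gave a poor estimate while the 100x100 matrix is already close.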

Exercise 2. Besides mean and std, Numpy also offers the functions min, max, median, argmin and argmax to calculate the minimum, maximum and median values, and the indices of the minimum and of the maximum of an array. Apply these functions to the whole matrix a and along its axis 0. Think of the axes as the coordinates of your array: axis 0 runs down the rows and axis 1 runs across the columns, so reducing along axis 0 returns one value per column. Note that, without an axis argument, argmin and argmax return an index into the flattened array. Take a better look at the example above to help you understand the importance of this parameter! If you still feel confused check out this article.

print(np.min(a))
print(np.max(a))
print(np.median(a))
print(np.argmin(a))
print(np.argmax(a))
print(np.min(a, axis=0))
print(np.max(a, axis=0))
print(np.median(a, axis=0))
print(np.argmin(a, axis=0))
print(np.argmax(a, axis=0))
-6.979279296567148
12.032330825220777
2.9771355543544926
4035
3841
[-3.12832309 -5.31194792 -2.8816978  -2.72265791 -2.37058577 -2.7162038
 -2.7007688  -3.75540962 -1.48343833 -3.51255916 -3.63385632 -5.53817595
 -2.9676817  -4.31863239 -5.35568812 -3.76613221 -2.28588299 -2.46622853
 -2.40289339 -3.13675217 -5.03052703 -1.39053094 -1.91904117 -3.79454246
 -3.21374104 -2.7022795  -1.68386541 -3.01199193 -2.10728892 -2.0145649
 -3.56639044 -2.65559535 -3.27633916 -3.29520907 -3.31410924 -6.9792793
 -4.5583797  -2.84875665 -3.85645354 -2.34909069 -4.58729219 -2.78805561
 -2.14821199 -3.01051143 -6.52258767 -2.93503089 -4.28876851 -4.24326889
 -4.29138525 -5.63549462 -2.54672068 -3.50023718 -3.14840135 -2.88378325
 -2.3918024  -3.03145261 -5.14022247 -3.85196265 -2.42263187 -1.917807
 -2.47422481 -3.21945669 -3.69215113 -2.16874898 -1.68640033 -1.42160212
 -1.96567707 -3.20291171 -3.45131056 -1.94554154 -3.02927807 -2.431723
 -2.48302331 -4.36795739 -4.48227289 -5.32018809 -5.07392065 -3.15461078
 -3.74152395 -3.17104011 -3.35323813 -2.43510933 -3.09558109 -1.87901746
 -2.3233309  -3.42450093 -4.97817636 -4.00562773 -2.17910957 -2.42223214
 -3.31389104 -1.69009649 -2.35036601 -3.84374566 -3.79652811 -2.38618476
 -3.7914225  -4.62115825 -2.4723096  -2.67757098]
[ 8.6224609   8.60173785  8.91367475  8.86535794  9.41664213  7.92720893
  7.36609452  8.22295798  7.36707772 10.03900024  7.84599785  9.62529338
 10.7602499   9.31849044  8.63676511  9.17772903  7.25526954  8.2176893
  7.85413254 10.95691758  8.98159813  8.06385451 11.34213562 10.42732429
  9.71138218  7.21104617 10.61404078  8.45614149  8.77628201  7.38522388
  9.75950826  9.49468176  8.66224993  8.38678763  8.36761743  8.09255087
  9.02006688  9.20399608  9.96731008  8.71123437  9.85517232 12.03233083
 10.44778152 11.10201768  9.78606348  8.94284974 10.52778802  9.54594183
 10.13766102  9.57312038  9.32248231 10.84571309  8.96466831  9.2992305
 10.47749162  9.38477042  9.61563535 11.97214746  9.50802376 10.39927453
 10.62007593  9.87658083  9.75966065 11.42104795  9.61789093 10.76392813
  9.05338474  9.40080592  8.90009035 10.01275632  7.88276372  9.0924306
 10.19846151  9.07527209  8.02655395  9.06707314  9.0510807  10.3921829
  9.94152666  8.69352364 10.01838181  8.9458616   8.12957031  8.13553265
  9.48599968  9.20002515  8.47912164  9.0714022   9.74852604  9.63150831
  9.92675606  7.92559222  7.93196064  8.47069947  9.37531047  8.38168966
  9.55033695 10.57183178  9.15402527  9.71483443]
[3.19362295 2.59446267 2.96571203 3.45750122 3.04568117 2.89730356
 2.84137127 2.85262497 2.94548271 3.08486138 2.30059676 2.84198312
 3.61858939 3.11350766 2.78633522 2.85489904 2.93920905 3.09245307
 3.06921265 3.01982673 2.92438664 3.38197257 3.31104351 2.91633187
 3.03523572 3.06783001 2.7725262  1.84611958 3.0675667  3.47154729
 2.44412895 3.23264319 3.18608284 3.26832053 2.77411539 3.07234315
 3.10085324 3.08166979 2.813815   2.62618365 2.83797945 2.99844153
 2.7958101  2.99394017 2.88816556 3.69482077 2.75997302 3.31675232
 2.61994819 3.24834757 3.23025361 3.03240471 3.49350264 2.31433126
 2.85641867 3.10453878 3.063164   2.6219501  3.57488039 2.96460403
 3.09473154 2.81717489 3.95003087 2.3015068  3.33605769 2.62129407
 3.20258061 2.82377334 2.95712402 2.62494657 3.40322145 3.23371035
 2.91661429 2.10590471 2.43402573 2.96734082 3.08811686 2.77367034
 3.04226342 3.02577113 3.05571548 2.93465662 2.91956885 2.85333176
 3.15513283 2.79690966 3.03819062 2.97778545 3.3710827  2.96372007
 2.52193453 2.84422363 2.83236774 2.65904656 3.30512607 3.18612721
 2.93306766 3.24797648 3.35750721 3.45847637]
[84 96 14 16 29 55 23 59 84 94 47 48 81 53 94 59 13 71 28 14 80 40 99 46
 23 46 89 44 24 20 19  4 65 72 83 40 51 77 49 43 96 20 11 89 65 85 40 69
 75 81 53 47 81  0 59 95 54 28 65 79 33  1 98 54 64 93 36 42 47 31 99 65
 59 81 73 23 95 88 44 76 21 23  8 26 70 53 76 28 34 24 31  0 90 75 28 55
  3 68 49 63]
[45 66 20 23 97 40 47 97 30 20 72 56 16 82 74 39  3 63 18 71 54 83 16 81
 11 72 48 31 85 52 62 51 10 90 65 86 12 18 77 69 84 38 28 86 35 93  5 56
 25 13 76 66 74 34 38 22 45 72 34  4 70 19 54  2 16 82 23 41 40 56 32 93
 48 18 25 39 93 73 45 41 74 16 99 96 74 11 54 30 59 39 22 40 56 42 43 98
 36  9 96 70]
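In the output above, np.argmin(a) returned 4035, an index into the flattened 100x100 array, which corresponds to row 40 and column 35 (4035 = 40 * 100 + 35). A minimal sketch of how to recover such coordinates with np.unravel_index, on a small made-up matrix `m`:

```python
import numpy as np

m = np.array([[5, 2, 9],
              [7, 0, 4]])

flat_idx = np.argmin(m)                          # index into the flattened array
row, col = np.unravel_index(flat_idx, m.shape)   # convert to (row, column)

print(flat_idx)        # 4
print(row, col)        # 1 1
print(m[row, col])     # 0
```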

Numpy also supports special floating-point values, such as np.inf, which represents infinity, and np.nan, which represents “not-a-number”. These can be the results of operations such as division by 0:

a = np.array([0, 1, -4]) / 0
print("Dividing by 0 can generate np.nan or np.inf (also negative) as a result:", a)
Dividing by 0 can generate np.nan or np.inf (also negative) as a result: [ nan  inf -inf]
/var/folders/kf/0ks7zrps72scr4cj79sdx08m0000gq/T/ipykernel_69182/972857553.py:1: RuntimeWarning: divide by zero encountered in true_divide
  a = np.array([0, 1, -4]) / 0
/var/folders/kf/0ks7zrps72scr4cj79sdx08m0000gq/T/ipykernel_69182/972857553.py:1: RuntimeWarning: invalid value encountered in true_divide
  a = np.array([0, 1, -4]) / 0

Standard operations, when applied to data containing np.nan, will also return np.nan:

a = [0, np.nan, 1]
print("The mean of a vector with a NaN is: ", np.mean(a))
The mean of a vector with a NaN is:  nan

However, Numpy offers functions that can ignore NaNs, such as nanmax, nanmin and nanmean. Let’s create an array including NaN values and test these functions.

Exercise 3. Apply the following functions of numpy to the array a: min, max and nanmin, nanmax.

a = np.array([1, 2, np.nan, np.inf])
print(np.min(a))
print(np.max(a))
print(np.nanmin(a))
print(np.nanmax(a))
nan
nan
1.0
inf
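Note that the nan-functions ignore NaN but not infinity, which is why nanmax still returns inf above. If you need to filter the values yourself, a possible approach (a sketch, not the only way) is to build boolean masks with np.isnan and np.isfinite:

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan, np.inf])

nan_mask = np.isnan(a)          # True only where the value is NaN
finite_mask = np.isfinite(a)    # False for NaN, inf and -inf

without_nan = a[~nan_mask]      # drops NaN, keeps inf
finite_only = a[finite_mask]    # keeps only ordinary numbers

print(without_nan)
print(finite_only)
print(np.mean(finite_only))     # 1.5
```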

Exercise 4. We want to write some code which, given a point, finds the closest one in a set of other points. Such a function is important, for example, in information theory, as it is the basic operation of the vector quantization (VQ) algorithm. In the simple, two-dimensional case shown below, the values refer to the weight and height of an athlete. The set of weights and heights represents different classes of athletes. We want to assign the athlete to the class they are closest to. Finding the closest point requires calculating the Euclidean distance between the athlete’s parameters and each of the classes of athletes. Now, let’s define an athlete with \(\left[\text{weight, height}\right] = \left[111.0, 188.0\right]\) and an array of 4 classes \([[102.0, 203.0], [132.0, 193.0], [45.0, 155.0], [57.0, 173.0]]\). In the next cell, write some code which returns the index of the class of athletes that the athlete should be assigned to.

observation = np.array([111.0, 188.0])
codes = np.array([[102.0, 203.0],
               [132.0, 193.0],
               [45.0, 155.0],
               [57.0, 173.0]])
diff = codes - observation    # the broadcast happens here
print(diff.shape)
dist = np.sqrt(np.sum(diff**2, axis=-1))
print(np.argmin(dist))
(4, 2)
0
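The same distances can also be computed with np.linalg.norm, which takes an axis argument. The following is an equivalent sketch of the solution above, not a different algorithm; the name `closest` is introduced here for illustration.

```python
import numpy as np

observation = np.array([111.0, 188.0])
codes = np.array([[102.0, 203.0],
                  [132.0, 193.0],
                  [45.0, 155.0],
                  [57.0, 173.0]])

# Broadcasting subtracts the observation from every row of codes;
# norm with axis=1 then returns one Euclidean distance per class
dist = np.linalg.norm(codes - observation, axis=1)
closest = int(np.argmin(dist))

print(dist)
print(closest)   # 0
```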

Linear algebra examples

Linear algebra is at the core of data science. That’s why NumPy offers array data structures together with dedicated operations and methods. Let’s first have a look together at the dot function as an example, which computes the matrix product of two vectors or matrices.

a = np.array([[1,2,3],[2,0,3],[7,-5,1]])
b = np.array([[3,-1,5],[-2,-6,4], [0,4,4]])
print('a @ b: \n', np.dot(a,b))
print('a @ b: \n', a.dot(b))
a @ b: 
 [[-1 -1 25]
 [ 6 10 22]
 [31 27 19]]
a @ b: 
 [[-1 -1 25]
 [ 6 10 22]
 [31 27 19]]
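For 2-D arrays, the `@` operator gives the same result as np.dot, whereas `*` multiplies element by element. A short sketch with the same matrices makes the difference visible (the variable names are chosen for this example):

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 0, 3], [7, -5, 1]])
b = np.array([[3, -1, 5], [-2, -6, 4], [0, 4, 4]])

matrix_product = a @ b   # same result as np.dot(a, b) for 2-D arrays
elementwise = a * b      # element-by-element product, NOT a matrix product

print(matrix_product)
print(elementwise)
```

Mixing up the two is a common source of bugs, because both run without error whenever the shapes happen to be compatible.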

Exercise 5. Define two random matrices, a and b, of sizes (4x2). Transpose b and save in c the matrix product between a and b transposed.

a = np.random.randn(4, 2)
b = np.random.randn(4, 2)
b = np.transpose(b)
c = np.dot(a, b)

Exercise 6. Can the c matrix be inverted? Check it out by computing its determinant and, if it exists, get the inverse matrix.

if np.abs(np.linalg.det(c)) > 1e-6:
    inv_c = np.linalg.inv(c)
    print(inv_c)
else: 
    print("The determinant is too small")
The determinant is too small
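The determinant is (numerically) zero for a structural reason: c is the product of a (4x2) and a (2x4) matrix, so its rank is at most 2, while a 4x4 matrix needs rank 4 to be invertible. As a sketch, np.linalg.matrix_rank can be used to inspect this; the seed below is arbitrary and only makes the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed, only for reproducibility
a = rng.standard_normal((4, 2))
b = rng.standard_normal((4, 2))
c = a @ b.T                        # shape (4, 4), rank at most 2

det_c = np.linalg.det(c)
rank_c = np.linalg.matrix_rank(c)  # numerical rank estimated from singular values

print(c.shape)
print(det_c)                       # tiny value, zero up to rounding errors
print(rank_c)                      # typically 2 for random inputs
```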

Exercise 7. Using the inverse matrix and the matrix-multiplication operator, you can now solve a matrix-vector equation. Let’s now find the vector \(x\) that solves the equation \(Ax = b\) given \(A=\left(\begin{matrix} 2 & 1 & -2\\ 3 & 0 & 1\\ 1 & 1 & -1\end{matrix}\right)\) and \(b=\left(\begin{matrix}-3 \\ 5 \\ -2 \end{matrix}\right)\).

A = np.array([[2,1,-2],[3,0,1],[1,1,-1]])
A_inv = np.linalg.inv(A)
b = np.transpose(np.array([[-3,5,-2]]))
print(f"The shape of A is {A.shape}")
print(f"The shape of A_inv is {A_inv.shape}")
print(f"The shape of b is {b.shape}")
x = np.dot(A_inv, b)
print("The solution is:\n", x)
The shape of A is (3, 3)
The shape of A_inv is (3, 3)
The shape of b is (3, 1)
The solution is:
 [[ 1.]
 [-1.]
 [ 2.]]

Exercise 8. Computing the inverse explicitly can be very time-consuming, and it is usually less accurate numerically than solving the system directly. Therefore, it is always better to take advantage of the highly optimized NumPy functions to solve linear equations. Try to solve the same exercise as before but using NumPy’s function linalg.solve to compute \(x\).

x = np.linalg.solve(A,b)
print(x)
[[ 1.]
 [-1.]
 [ 2.]]
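It is good practice to check a computed solution by substituting it back into the equation. A minimal sketch using np.allclose, which compares floating-point arrays up to a small tolerance (here b is written as a 1-D array, which linalg.solve also accepts):

```python
import numpy as np

A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
b = np.array([-3, 5, -2])      # 1-D right-hand side

x = np.linalg.solve(A, b)
residual = A @ x - b           # should be zero up to rounding errors

print(x)
print(np.allclose(A @ x, b))   # True
```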

Branching operations

if, else and elif

In Python, similarly to C-like languages, branching operations are implemented using the if keyword. If the expression is true, the statement following it will be executed. Otherwise, it is possible to specify the statement to execute in case the expression is false, by using the else keyword. Both if and else need a colon (:) at the end of the line, and the statements they control must be indented, as in the following example:

r = np.random.randn()
if r > 0:
    print("The random number is positive")
else:
    print("The random number is negative")
The random number is negative

In case you want to create multiple branches by applying more than one condition, you can use the keyword elif as in the following example:

animal = "cat"

if animal == "cat":
    print("meow")
elif animal == "dog":
    print("woof")
elif animal == "cow":
    print("moo")
else:
    print(f"I don't know the {animal}'s call, sorry :(")
meow
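The conditions of an if/elif/else chain are tested from top to bottom, and only the first branch whose condition is true is executed. As a small sketch (the function name `describe_sign` is made up for this example), the chain below also handles the case of a value that is exactly zero:

```python
def describe_sign(value):
    """Return a short description of the sign of a number."""
    if value > 0:
        return "positive"
    elif value < 0:
        return "negative"
    else:
        # reached only when both conditions above are false
        return "zero"

for v in (2.5, -1.0, 0.0):
    print(v, "is", describe_sign(v))
```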

Exercise 9. Let’s try to implement a calculator using if, else and elif. The first part of the calculator, which reads the inputs, is already written below. You can input a, b and option when running the code. There should be 4 allowed operations:

  • addition (1)

  • subtraction (2)

  • multiplication (3)

  • division (4)

If the option is not one of the 4, the calculator should print “Invalid option”. Implement the missing code using the if, elif and else statements along with the appropriate operations.

print("Welcome to CALCULATOR!")

a = float(input("Enter the first number: "))
b = float(input("Enter the second number: "))

print("Choose one of the following operations:")
print("1 - addition")
print("2 - subtraction")
print("3 - multiplication")
print("4 - division")

option = int(input(""))

if option == 1:
    result = a + b
elif option == 2:
    result = a - b
elif option == 3:
    result = a * b
elif option == 4:
    result = a / b
else:
    result = None  # no valid operation was selected

if result is None:
    print("Invalid option")
else:
    print("result: %f" % result)
Welcome to CALCULATOR!
Choose one of the following operations:
1 - addition
2 - subtraction
3 - multiplication
4 - division
result: 2.000000
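One possible variation, shown here only as a sketch, wraps the branching logic in a function so that it can be tested without typing inputs. The function name `calculate` and the choice of returning None for an invalid option or a division by zero are assumptions made for this example.

```python
def calculate(a, b, option):
    """Return the result of the chosen operation, or None if it cannot be computed."""
    if option == 1:
        return a + b
    elif option == 2:
        return a - b
    elif option == 3:
        return a * b
    elif option == 4:
        if b == 0:
            return None   # division by zero is undefined
        return a / b
    else:
        return None       # unknown option

for option in (1, 2, 3, 4, 7):
    outcome = calculate(4.0, 2.0, option)
    if outcome is None:
        print(option, "-> Invalid option or operation")
    else:
        print(option, "->", outcome)
```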

break and continue

The break statement in Python terminates the current loop and resumes execution at the next statement, just like the traditional break found in C. On the other hand, the continue statement skips all the remaining code in the current iteration of the loop and moves the control back to the top of the loop.
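To make the difference concrete, here is a small sketch that uses both statements in the same loop (the list name `collected` is chosen for this example): continue skips the even numbers, and break stops the loop as soon as a number larger than 7 is reached.

```python
collected = []
for n in range(10):
    if n % 2 == 0:
        continue    # skip even numbers and go to the next iteration
    if n > 7:
        break       # leave the loop entirely
    collected.append(n)

print(collected)    # [1, 3, 5, 7]
```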

Exercise 10. Try to use a for loop and the continue statement to remove all the "h"s in the string "hello, haha, python". The result should be stored in a new string.

str_1 = "hello, haha, python"
str_2 = ""
for letter in str_1:
    if letter == 'h':
        continue
    str_2 += letter
print(str_2)
ello, aa, pyton

Exercise 11. Try to use a for loop and the break statement to only keep the letters before "p" in the string "hello, haha, python". The result should be stored in a new string.

str_1 =  "hello, haha, python"
str_2 = ""
for letter in str_1:
    if letter == 'p':
        break
    str_2 += letter
print(str_2)
hello, haha,