Python basics
Objectives
Knowing what types exist
Knowing the most common data structures (collections): lists, tuples, dictionaries, and sets
Creating and using functions
Knowing what a library is
Knowing what
import
doesBeing able to “read” an error
Motivation for Python
Free
Huge ecosystem of examples, libraries, and tools
Relatively easy to read and understand
Similar in scope and use cases to R, Julia, and Matlab
Basic types
# int
num_measurements = 13
# float
some_fraction = 0.25
# string
name = "Bruce Wayne"
# bool
value_is_missing = False
skip_verification = True
# we can print values
print(name)
# and we can do arithmetics with ints and floats
print(5 * num_measurements)
print(1.0 - some_fraction)
Python is dynamically typed: We do not have to define that an integer is an
int
, we can use it this way and Python will infer it.However, one can use type annotations in Python (see also mypy).
Now you also know that we can add
# comments
to our code.
Data structures for collections: lists, dictionaries, sets, and tuples
# lists are good when order is important
scores = [13, 5, 2, 3, 4, 3]
# first element
print(scores[0])
# we can add items to lists
scores.append(4)
# lists can be sorted
scores.sort()
print(scores)
# dictionaries are useful if you want to look up
# elements in a collection by something else than position
experiment = {"location": "Svalbard", "date": "2021-03-23", "num_measurements": 23}
print(experiment["date"])
# we can add items to dictionaries
experiment["instrument"] = "a particular brand"
print(experiment)
if "instrument" in experiment:
print("yes, the dictionary 'experiment' contains the key 'instrument'")
else:
print("no, it doesn't")
Lists
are good when order is important, and it needs to be changedDictionaries
are mappings key→value.Sets
are useful for unordered collections where you want to make sure that there are no repetitions.There are also
tuples
that are similar to lists but their items cannot be modified.
You can put:
dictionaries inside lists
lists inside dictionaries
dictionaries inside dictionaries
lists inside lists
tuples inside …
…
Iterating over collections
Often we wish to iterate over collections.
Iterating over a list:
scores = [13, 5, 2, 3, 4, 3]
for score in scores:
print(score)
# example with f-strings
for score in scores:
print(f"the score is {score}")
We don’t have to call the variable inside the for-loop “score”. This is up to us. We can do this instead (but is this more understandable for humans?):
scores = [13, 5, 2, 3, 4, 3]
for x in scores:
print(x)
Iterating over a dictionary:
experiment = {"location": "Svalbard", "date": "2021-03-23", "num_measurements": 23}
for key in experiment:
print(experiment[key])
# another way to iterate
for (key, value) in experiment.items():
print(key, value)
Functions
Functions are like reusable recipes. They receive ingredients (input arguments), then inside the function we do/compute something with these arguments, and they return a result.
def add(a, b): result = a + b return result
Together we can write a function which sums all elements in a list (this is not the most elegant way to do this but it is easier at this point):
def add_all_elements(sequence): """ This function adds all elements. This here is a docstring, a documentation string for a function. """ s = 0.0 for element in sequence: s += element return s measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] print(add_all_elements(measurements))
We reuse this function to write a function which computes the mean:
def arithmetic_mean(sequence): # we are reusing add_all_elements written above s = add_all_elements(sequence) n = len(sequence) return s / n measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] mean = arithmetic_mean(measurements) print(mean)
Functions can call other functions. Functions can also get other functions as input arguments.
Functions can return more than one thing:
def uppercase_and_lowercase(text): u = text.upper() l = text.lower() return u, l some_text = "SequenceOfCharacters" uppercased_text, lowercased_text = uppercase_and_lowercase(some_text) print(uppercased_text) print(lowercased_text)
Why functions?
Less repetition
Simplify reading and understanding code
Simpler re-use of code
Make it easier for Python to deallocate memory
Reading error messages
Here we introduce a mistake and we together try to make sense of the traceback:
Libraries
We can look at libraries as collections of functions. We can import the libraries/modules and then reuse the functions defined inside these libraries.
Try this:
import numpy
measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = numpy.std(measurements)
print(result)
This means numpy
contains a function called std
which apparently computes the standard deviation
(check also its documentation).
Often you see this in tutorials (the module is imported and renamed to a shortcut):
import numpy as np
result = np.std([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
It is possible to create own modules to collect own functions for reuse.
Great resources to learn more
Real Python Tutorials (great for beginners)
The Python Tutorial (great for beginners)
The Hitchhiker’s Guide to Python! (intermediate level)
Effective Python (once you know the basics and want to write better code)
Exercises
Exercise: create a function that computes the standard deviation
Arithmetic mean:
\[\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i\]Standard deviation:
\[\sqrt{ \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2 }\]In other words the computation is similar but we need to sum over squares of differences and at the end take a square root.
Take this as a starting point:
# we have written this one together previously def arithmetic_mean(sequence): s = 0.0 for element in sequence: s += element n = len(sequence) return s / n def standard_deviation(sequence): # here we need to do some work: # mean = ? # s = ? n = len(sequence) return (s / n) ** 0.5
If this is the input list:
measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Then the result would be: 2.872…
Solution 1 (longer but hopefully easier to understand)
# we have written this one together previously
def arithmetic_mean(sequence):
s = 0.0
for element in sequence:
s += element
n = len(sequence)
return s / n
# notice how this function reuses the other
def standard_deviation(sequence):
mean = arithmetic_mean(sequence)
s = 0.0
for element in sequence:
s += (element - mean) ** 2
n = len(sequence)
return (s / n) ** 0.5
Solution 2 (more compact)
def arithmetic_mean(sequence):
return sum(sequence) / len(sequence)
def standard_deviation(sequence):
mean = arithmetic_mean(sequence)
s = sum([(x - mean) ** 2 for x in sequence])
n = len(sequence)
return (s / n) ** 0.5
Exercise: working with a dictionary
We have this dictionary as a starting point:
grades = {"Alice": 80, "Bob": 95}
Add the grades of few more (fictious) persons to this dictionary.
Print the entire dictionary.
What happens when you add a name which already exists (with a different grade)?
Print the grade for one particular person only.
What happens when you try to print the result for a person that wasn’t there?
Try also these:
print(grades.keys()) print(grades.values()) print(grades.items())
Solution
We can add more people like this:
grades["Craig"] = 56
grades["Dave"] = 28
grades["Eve"] = 75
Print the entire dictionary with:
print(grades)
We get:
{'Alice': 80, 'Bob': 95, 'Craig': 56, 'Dave': 28, 'Eve': 75}
Adding an entry which already exists updates the entry (please try it).
Printing the result for one particular person:
print(grades["Eve"])
Printing the result for a person which does not exists, gives a KeyError
.
The outputs of these three:
print(grades.keys())
print(grades.values())
print(grades.items())
… are either the only the keys or only the values, or in the case of items()
,
key-value pairs (tuples):
dict_keys(['Alice', 'Bob', 'Craig', 'Dave', 'Eve'])
dict_values([80, 95, 56, 28, 75])
dict_items([('Alice', 80), ('Bob', 95), ('Craig', 56), ('Dave', 28), ('Eve', 75)])
The exercises below use if-statements.
Optional exercise/ homework: removing duplicates
This list contains duplicates:
measurements = [2, 2, 1, 17, 3, 3, 2, 1, 13, 14, 17, 14, 4]
Write a function which removes duplicates from the list and sorts the list. In this case it would produce:
[1, 2, 3, 4, 13, 14, 17]
Solution 1 (longer but hopefully easier to understand)
The function sorted
sorts a sequence but it creates a new sequence.
This is useful if you need a sorted result without changing the original sequence.
We could have achieved the same result with list.sort()
.
def remove_duplicates_and_sort(sequence):
new_sequence = []
for element in sequence:
if element not in new_sequence:
new_sequence.append(element)
return sorted(new_sequence)
Solution 2 (more compact)
Converting to set removes duplicates. Then we convert back to list:
def remove_duplicates_and_sort(sequence):
new_sequence = list(set(sequence))
return sorted(new_sequence)
Optional exercise/ homework: counting how often an item appears
Back to our list with duplicates:
measurements = [2, 2, 1, 17, 3, 3, 2, 1, 13, 14, 17, 14, 4]
Your goal is to write a function which will return a dictionary mapping each number to how often it appears. In this case it would produce:
{2: 3, 1: 2, 17: 2, 3: 2, 13: 1, 14: 2, 4: 1}
Solution 1 (longer but hopefully easier to understand)
def how_often(sequence):
counts = {}
for element in sequence:
if element in counts:
counts[element] += 1
else:
counts[element] = 1
return counts
Solution 2 (more compact)
The point of this solution is to show that for such common operations, ready-made functions and objects already exist and is is worth to check out the documentation about the collections module.
from collections import Counter, defaultdict
def how_often_alternative1(sequence):
return dict(Counter(sequence))
def how_often_alternative2(sequence):
counts = defaultdict(int)
for element in sequence:
counts[element] += 1
return dict(counts)