Concepts in refactoring and modular code design
Starting questions for the collaborative document
What does “modular code development” mean for you?
What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?
What do you know now about programming that you wish somebody told you earlier?
Do you design a new code project on paper before coding? Discuss pros and cons.
Do you build your code top-down (starting from the big picture) or bottom-up (starting from components)? Discuss pros and cons.
Would you prefer your code to be 2x slower if it was easier to read and understand?
Pure functions
Pure functions have no notion of state: They take input values and return values
Given the same input, a pure function always returns the same value
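Calls to a pure function can be cached or optimized away, because the result depends only on the arguments
Pure function == data: a call can be replaced by its return value without changing the program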
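As a minimal sketch of what "optimized away" can mean in practice, the standard library's `functools.lru_cache` can cache a pure function, so a repeated call is answered by a lookup instead of a recomputation. The function `surface_area` mirrors the one used later in this lesson. Caching it here is only an illustration, not something the lesson prescribes.

```python
from functools import lru_cache
import math


@lru_cache(maxsize=None)
def surface_area(radius):
    # Pure: the result depends only on radius, so caching cannot change behavior.
    return 4.0 * math.pi * radius**2


first = surface_area(1560.8)   # computed (cache miss)
second = surface_area(1560.8)  # looked up (cache hit), the body does not run again

info = surface_area.cache_info()
print(first == second, info.hits, info.misses)
```

The same trick applied to a function that reads global state could return stale results, which is one reason purity matters for optimization.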
a) pure: no side effects
def fahrenheit_to_celsius(temp_f):
    temp_c = (temp_f - 32.0) * (5.0/9.0)
    return temp_c

temp_c = fahrenheit_to_celsius(temp_f=100.0)
print(temp_c)
b) stateful: side effects
f_to_c_offset = 32.0
f_to_c_factor = 0.555555555
temp_c = 0.0
def fahrenheit_to_celsius_bad(temp_f):
    global temp_c
    temp_c = (temp_f - f_to_c_offset) * f_to_c_factor

fahrenheit_to_celsius_bad(temp_f=100.0)
print(temp_c)
Pure functions are easier to:
Test
Understand
Reuse
Parallelize
Simplify
Optimize
Compose
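A small sketch of three items from the list above (test, compose, reuse), built on the pure `fahrenheit_to_celsius` from example a). The helper `celsius_to_kelvin` and the composed `fahrenheit_to_kelvin` are hypothetical names introduced only for this illustration.

```python
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * (5.0 / 9.0)


def celsius_to_kelvin(temp_c):
    # Hypothetical helper, not part of the original lesson.
    return temp_c + 273.15


def fahrenheit_to_kelvin(temp_f):
    # Compose: the output of one pure function feeds the next.
    return celsius_to_kelvin(fahrenheit_to_celsius(temp_f))


# Test: no setup, no teardown, no global variables to reset between tests.
assert abs(fahrenheit_to_celsius(32.0) - 0.0) < 1e-9
assert abs(fahrenheit_to_kelvin(32.0) - 273.15) < 1e-9

# Reuse: the same function works unchanged on many inputs.
temps_c = [fahrenheit_to_celsius(t) for t in (32.0, 50.0, 212.0)]
print(temps_c)
```

Testing the stateful variant from example b) would first require setting the module-level variables to known values and then reading the result back from a global.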
Mathematical functions are pure:
Unix shell commands are stateless:
$ cat somefile | grep somestring | sort | uniq | ...
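The pipeline idea carries over to Python: each stage takes a list of lines and returns a new list, without touching shared state. The sketch below is only an analogy. The functions `grep` and `uniq` are hypothetical stand-ins written for this example, not wrappers around the shell tools.

```python
def grep(lines, pattern):
    # Keep only the lines that contain the pattern.
    return [line for line in lines if pattern in line]


def uniq(lines):
    # Like the shell's uniq: drop duplicates that are adjacent.
    result = []
    for line in lines:
        if not result or result[-1] != line:
            result.append(line)
    return result


lines = ["pear", "apple pie", "apple pie", "banana", "apple"]

# Equivalent in spirit to: grep apple | sort | uniq
out = uniq(sorted(grep(lines, "apple")))
print(out)
```

Because no stage modifies its input, `lines` is unchanged afterwards and each stage can be tested on its own.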
But I/O and network and disk and databases are not pure!
I/O is impure
Keep I/O on the “outside” of your code
Keep the “inside” of your code pure/stateless
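One possible way to arrange this, sketched under the assumption of a small script that converts a file of Fahrenheit readings: the functions `parse_temperatures`, `convert_all`, `format_report`, and `main`, and both file names, are hypothetical and chosen only for illustration. All file access sits in `main`, and everything it calls is pure.

```python
import tempfile
from pathlib import Path


def parse_temperatures(text):
    # Pure: text in, list of floats out.
    return [float(line) for line in text.splitlines() if line.strip()]


def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * (5.0 / 9.0)


def convert_all(temps_f):
    # Pure: returns a new list and leaves the input untouched.
    return [fahrenheit_to_celsius(t) for t in temps_f]


def format_report(temps_c):
    # Pure: list of floats in, text out.
    return "\n".join(f"{t:.1f}" for t in temps_c)


def main(input_path, output_path):
    # Impure "outside": the only place that reads or writes files.
    text = Path(input_path).read_text()
    report = format_report(convert_all(parse_temperatures(text)))
    Path(output_path).write_text(report + "\n")


# Usage with a temporary directory, so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "temps_f.txt"
    dst = Path(tmp) / "temps_c.txt"
    src.write_text("32.0\n212.0\n")
    main(src, dst)
    result = dst.read_text()

print(result)
```

The pure functions can be tested with plain strings and lists, and only `main` needs a real or temporary file.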
From classes to functions
Object-oriented programming and functional programming both have their place and value.
Here is an example of expressing the same thing in Python in 4 different ways. Which one do you prefer?
As a class:
import math


class Moon:
    def __init__(self, name, radius, contains_water=False):
        self.name = name
        self.radius = radius  # in kilometers
        self.contains_water = contains_water

    def surface_area(self) -> float:
        """Calculate the surface area of the moon assuming a spherical shape."""
        return 4.0 * math.pi * self.radius**2

    def __repr__(self):
        return f"Moon(name={self.name!r}, radius={self.radius}, contains_water={self.contains_water})"


europa = Moon(name="Europa", radius=1560.8, contains_water=True)

print(europa)
print(f"Surface area (km^2) of {europa.name}: {europa.surface_area()}")
As a dataclass:
from dataclasses import dataclass
import math


@dataclass
class Moon:
    name: str
    radius: float  # in kilometers
    contains_water: bool = False

    def surface_area(self) -> float:
        """Calculate the surface area of the moon assuming a spherical shape."""
        return 4.0 * math.pi * self.radius**2


europa = Moon(name="Europa", radius=1560.8, contains_water=True)

print(europa)
print(f"Surface area (km^2) of {europa.name}: {europa.surface_area()}")
As a named tuple:
import math
from collections import namedtuple


def surface_area(radius: float) -> float:
    return 4.0 * math.pi * radius**2


Moon = namedtuple("Moon", ["name", "radius", "contains_water"])

europa = Moon(name="Europa", radius=1560.8, contains_water=True)

print(europa)
print(f"Surface area (km^2) of {europa.name}: {surface_area(europa.radius)}")
As a dict:
import math


def surface_area(radius: float) -> float:
    return 4.0 * math.pi * radius**2


europa = {"name": "Europa", "radius": 1560.8, "contains_water": True}

print(europa)
print(f"Surface area (km^2) of {europa['name']}: {surface_area(europa['radius'])}")
How to design your code before writing it
Document-driven development can be a nice approach:
Write the documentation/tutorial first
Write the code to make the documentation true
Refactor the code to make it clean and maintainable
However, it is almost impossible to design everything correctly from the start, so make the code easy to change, and to keep it easy to change, keep it simple.
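One lightweight way to try document-driven development in Python is to write the docstring, including usage examples, before the function body, and then let the standard `doctest` module check that the code makes the documentation true. This is a sketch of one possible workflow, not the only way to follow the steps above. The function reuses `surface_area` from the Moon examples.

```python
import doctest
import math


def surface_area(radius):
    """Return the surface area of a sphere with the given radius.

    Step 1: these examples were written first, as documentation.

    >>> surface_area(0.0)
    0.0
    >>> round(surface_area(1.0), 4)
    12.5664
    """
    # Step 2: the implementation that makes the documentation true.
    return 4.0 * math.pi * radius**2


# Run the examples embedded in the docstring and collect the outcome.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(surface_area, globs={"surface_area": surface_area}):
    runner.run(test)

results = runner.summarize()
print(results.failed, results.attempted)
```

When the function lives in a file of its own, running `python -m doctest` on that file performs the same check from the command line.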