Concepts in refactoring and modular code design

Starting questions for the collaborative document

What does “modular code development” mean for you?
What best practices can you recommend to arrive at well structured, modular code in your favourite programming language?
What do you know now about programming that you wish somebody told you earlier?
Do you design a new code project on paper before coding? Discuss pros and cons.
Do you build your code top-down (starting from the big picture) or bottom-up (starting from components)? Discuss pros and cons.
Would you prefer your code to be 2x slower if it was easier to read and understand?

Pure functions

Pure functions have no notion of state: They take input values and return values
Given the same input, a pure function always returns the same value
Function calls can be optimized away
Pure function == data

a) pure: no side effects

def fahrenheit_to_celsius(temp_f):
    temp_c = (temp_f - 32.0) * (5.0/9.0)
    return temp_c

temp_c = fahrenheit_to_celsius(temp_f=100.0)
print(temp_c)

b) stateful: side effects

f_to_c_offset = 32.0
f_to_c_factor = 0.555555555
temp_c = 0.0

def fahrenheit_to_celsius_bad(temp_f):
    global temp_c
    temp_c = (temp_f - f_to_c_offset) * f_to_c_factor

fahrenheit_to_celsius_bad(temp_f=100.0)
print(temp_c)

Pure functions are easier to:

Test
Understand
Reuse
Parallelize
Simplify
Optimize
Compose

Mathematical functions are pure:

\[f(x, y) = x - x^2 + x^3 + y^2 + xy\]

\[(f \circ g)(x) = f(g(x))\]

Unix shell commands are stateless:

$ cat somefile | grep somestring | sort | uniq | ...

But I/O and network and disk and databases are not pure!

I/O is impure
Keep I/O on the “outside” of your code
Keep the “inside” of your code pure/stateless

Image comparing code that is mostly impure to code where the impure parts are on the outside

From classes to functions

Object-oriented programming and functional programming both have their place and value.

Here is an example of expressing the same thing in Python in 4 different ways. Which one do you prefer?

As a class:

import math


class Moon:
    def __init__(self, name, radius, contains_water=False):
        self.name = name
        self.radius = radius  # in kilometers
        self.contains_water = contains_water

    def surface_area(self) -> float:
        """Calculate the surface area of the moon assuming a spherical shape."""
        return 4.0 * math.pi * self.radius**2

    def __repr__(self):
        return f"Moon(name={self.name!r}, radius={self.radius}, contains_water={self.contains_water})"


europa = Moon(name="Europa", radius=1560.8, contains_water=True)

print(europa)
print(f"Surface area (km^2) of {europa.name}: {europa.surface_area()}")

As a dataclass:

from dataclasses import dataclass
import math


@dataclass
class Moon:
    name: str
    radius: float  # in kilometers
    contains_water: bool = False

    def surface_area(self) -> float:
        """Calculate the surface area of the moon assuming a spherical shape."""
        return 4.0 * math.pi * self.radius**2


europa = Moon(name="Europa", radius=1560.8, contains_water=True)

print(europa)
print(f"Surface area (km^2) of {europa.name}: {europa.surface_area()}")

As a named tuple:

import math
from collections import namedtuple

def surface_area(radius: float) -> float:
    return 4.0 * math.pi * radius**2

Moon = namedtuple("Moon", ["name", "radius", "contains_water"])

europa = Moon(name="Europa", radius=1560.8, contains_water=True)

print(europa)
print(f"Surface area (km^2) of {europa.name}: {surface_area(europa.radius)}")

As a dict:

import math

def surface_area(radius: float) -> float:
    return 4.0 * math.pi * radius**2

europa = {"name": "Europa", "radius": 1560.8, "contains_water": True}

print(europa)
print(f"Surface area (km^2) of {europa['name']}: {surface_area(europa['radius'])}")

How to design your code before writing it

Document-driven development can be a nice approach:
- Write the documentation/tutorial first
- Write the code to make the documentation true
- Refactor the code to make it clean and maintainable
But also it’s almost impossible to design everything correctly from the start -> make it easy to change -> keep it simple