Motivation
Objectives
Appreciate the importance of testing software
Understand various benefits of testing
Untested software can be compared to uncalibrated detectors
“Before relying on a new experimental device, an experimental scientist always establishes its accuracy. A new detector is calibrated when the scientist observes its responses to known input signals. The results of this calibration are compared against the expected response.”
[From Testing and Continuous Integration with Python, created by K. Huff]
With testing, simulations and analysis using software can be held to the same standards as experimental measurement devices!
What can go wrong when research software has bugs? Look no further:
Testing in a nutshell
In software tests, expected results are compared with observed results in order to establish accuracy. In the examples below, why do we not compare all digits directly with the expected result?
def fahrenheit_to_celsius(temp_f):
    """Converts temperature in Fahrenheit
    to Celsius.
    """
    temp_c = (temp_f - 32.0) * (5.0/9.0)
    return temp_c


# This is the test function: `assert` raises an error if something
# is wrong.
def test_fahrenheit_to_celsius():
    temp_c = fahrenheit_to_celsius(temp_f=100.0)
    expected_result = 37.777777
    assert abs(temp_c - expected_result) < 1.0e-6
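The tolerance in the assertion above is the answer to the question: floating-point arithmetic is inexact, and the expected value is usually typed with a limited number of digits. The following is a minimal sketch of that idea using only the standard library. `math.isclose` is one common way to write the comparison; it is an assumption of this sketch and not part of the original example.

```python
import math


def fahrenheit_to_celsius(temp_f):
    """Converts temperature in Fahrenheit to Celsius."""
    return (temp_f - 32.0) * (5.0 / 9.0)


temp_c = fahrenheit_to_celsius(temp_f=100.0)

# The expected value was typed with 8 significant digits, so an exact
# comparison fails even though the function is correct.
exact_match = temp_c == 37.777777

# Comparing within an absolute tolerance accepts the small difference.
close_enough = math.isclose(temp_c, 37.777777, abs_tol=1.0e-6)

# Rounding errors alone can also break exact comparisons.
sum_is_exact = (0.1 + 0.2) == 0.3

print(exact_match, close_enough, sum_is_exact)
```

Choosing the tolerance is a judgment call: it should be loose enough to absorb rounding and tight enough to catch real errors.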
#include <cmath>  // std::abs
#include <cstdlib>
#include <iostream>

using namespace std;

/* Converts temperature in Fahrenheit to Celsius. */
double fahrenheit_to_celsius(double temp_f) {
  auto temp_c = (temp_f - 32.0) * (5.0 / 9.0);
  return temp_c;
}

/* This is the test function: `throw` raises an error if something is wrong. */
void test_fahrenheit_to_celsius() {
  auto temp_c = fahrenheit_to_celsius(100.0);
  auto expected_result = 37.777777;
  try {
    if (abs(temp_c - expected_result) > 1.0e-6) throw "Error";
  } catch (char const* err) {
    cout << err << endl;
  }
}

int main() {
  cout << fahrenheit_to_celsius(20) << endl;
  test_fahrenheit_to_celsius();
  return EXIT_SUCCESS;
}
# Converts temperature in Fahrenheit to Celsius.
fahrenheit_to_celsius <- function(temp_f)
{
  temp_c <- (temp_f - 32.0) * (5.0/9.0)
  temp_c
}

# This is the test function: `stopifnot` raises an error if something
# is wrong.
test_fahrenheit_to_celsius <- function()
{
  temp_c <- fahrenheit_to_celsius(temp_f = 100.0)
  expected_result <- 37.777777
  stopifnot(abs(temp_c - expected_result) < 1.0e-6)
}
using Test

"""
    fahrenheit_to_celsius(temp_f::Float)

Converts temperature in Fahrenheit to Celsius.
"""
function fahrenheit_to_celsius(temp_f)
    temp_c = (temp_f - 32.0) * (5.0/9.0)
    return temp_c
end

# This is the test section
@testset "Test fahrenheit_to_celsius" begin
    temp_c = fahrenheit_to_celsius(100.0)
    expected_result = 37.777777
    @test abs(temp_c - expected_result) < 1.0e-6
end
program temperature_conversion
  implicit none

  call test_fahrenheit_to_celsius()

contains

  function fahrenheit_to_celsius(temp_f) result(temp_c)
    implicit none
    real temp_f
    real temp_c

    temp_c = (temp_f - 32.0) * (5.0/9.0)
  end function fahrenheit_to_celsius

  subroutine test_fahrenheit_to_celsius()
    implicit none
    real temp_c
    real expected_result

    temp_c = fahrenheit_to_celsius(100.0)
    expected_result = 37.777777

    if( abs(temp_c - expected_result) > 1.0e-6) then
      write(*,*) 'Error'
    else
      write(*,*) 'Pass'
    end if
  end subroutine test_fahrenheit_to_celsius

end program temperature_conversion
Or you can test whole programs:
$ python3 run-test.py
running: sample_data/set1.csv --output=tests/set1.txt
CORRECT
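The contents of `run-test.py` are not shown here. As a rough sketch of what such a script could do, the example below runs a whole program as a subprocess and compares its output with a stored reference. The helper name `run_end_to_end`, the stand-in command, and the `WRONG` label are hypothetical and chosen only for illustration.

```python
import subprocess
import sys


def run_end_to_end(command, expected_output):
    """Runs a whole program and compares its output with a reference."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    if completed.stdout.strip() == expected_output.strip():
        return "CORRECT"
    return "WRONG"


# Hypothetical stand-in for a real analysis program: a one-line Python
# command that prints the mean of three numbers.
command = [sys.executable, "-c", "print(sum([1.0, 2.0, 3.0]) / 3)"]

result = run_end_to_end(command, expected_output="2.0")
print(result)
```

In a real project the command would call your own program on sample data, and the reference output would typically be read from a file kept under version control.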
What can tests help you do?
Preserve expected functionality
Check that existing features still work when you add new ones.
Help users of your code
Verify it’s installed correctly and works.
See examples of what it should do.
Help other developers modify it
Change things with confidence that nothing is breaking.
Get a warning when documentation or examples go out of date.
Manage complexity
If code is easy to test, it’s probably easier to maintain.
The next lesson Modular code development demonstrates this.
Discussion: When is it OK not to add tests?
Vote in the notes and we’ll discuss soon. It is always a balance: there is no “always” or “never”.
A Jupyter or R Markdown notebook that produces a plot, where you know by looking at the plot whether it worked?
A short, “obviously correct” Python or R script which you never intend to reuse?
A short, simple, “obviously correct” shell script?
Can you give other examples?
Discussion: What’s easy and hard to test?
Discussion: Testing in practice
Use the collaborative notes to answer these questions:
Give examples of things (from your work) that are easy to test.
Give examples of things (from your work) that are hard to test.
Types of tests
Test functions one at a time - Unit tests
Test how parts work together - Integration tests
Test the whole thing running - End-to-end tests
For example, running on sample data.
Check same results as before - Regression tests
Write the test first (the expected output), then write code to make the test pass - Test-driven development
GitHub or GitLab runs tests automatically - Continuous integration
Report that tells you which lines were/were not run by tests - Code coverage
Framework that runs tests for you - Testing framework
See Quick Reference for some examples.
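To make the first few categories concrete, here is a small sketch that reuses the temperature conversion from earlier. The helper functions `mean` and `mean_celsius` and the test names are hypothetical, introduced only to show how a unit test, an integration test, and a regression test differ in scope.

```python
def fahrenheit_to_celsius(temp_f):
    """Converts temperature in Fahrenheit to Celsius."""
    return (temp_f - 32.0) * (5.0 / 9.0)


def mean(values):
    """Returns the arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def mean_celsius(temps_f):
    """Combines the two functions above."""
    return mean([fahrenheit_to_celsius(t) for t in temps_f])


def test_unit_fahrenheit_to_celsius():
    # Unit test: one function, one known input.
    assert abs(fahrenheit_to_celsius(32.0) - 0.0) < 1.0e-6


def test_integration_mean_celsius():
    # Integration test: conversion and averaging working together.
    assert abs(mean_celsius([32.0, 212.0]) - 50.0) < 1.0e-6


def test_regression_mean_celsius():
    # Regression test: compare against a stored reference value.
    reference = 37.777777
    assert abs(mean_celsius([100.0]) - reference) < 1.0e-6
```

If pytest is installed, saving this in a file whose name starts with `test_` and running `pytest` should collect and run the three `test_` functions.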
What should you do?
Not all code needs perfect test coverage.
If code is interactive-only (Jupyter Notebook), it’s usually hard to test.
But also hard to run: the next lesson will discuss!
At least an end-to-end test is often easy to add.
Add tests of tricky functions.
If you’d have to run it over and over to test while writing, why not make it a proper test?
It’s easy to have GitLab/GitHub run the tests.
It’s nice to push without thinking, and the system tells you when something is broken.
Learning how to test well makes the rest of your code better, too.
Where to start
A simple script or notebook probably does not need an automated test.
If you have nothing yet
Start with an end-to-end test.
Describe in words how you check whether the code still works.
Translate the words into a script.
Run the script automatically on every code change.
If you want to start with unit-testing
You want to rewrite a function? Start adding a unit test right there first.
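As an illustration of adding a unit test before a rewrite, the sketch below pins down the behavior of an existing function first and then runs the same check against the rewritten version. All names here (`mean_celsius_old`, `mean_celsius_new`, `check_mean_celsius`) are hypothetical.

```python
def mean_celsius_old(temps_f):
    """Existing function that we want to rewrite."""
    total = 0.0
    count = 0
    for temp_f in temps_f:
        total = total + (temp_f - 32.0) * (5.0 / 9.0)
        count = count + 1
    return total / count


def mean_celsius_new(temps_f):
    """Rewritten version; it must behave like the old one."""
    temps_c = [(temp_f - 32.0) * (5.0 / 9.0) for temp_f in temps_f]
    return sum(temps_c) / len(temps_c)


def check_mean_celsius(func):
    # Written before the rewrite, against the old function.
    assert abs(func([32.0, 212.0]) - 50.0) < 1.0e-6
    assert abs(func([100.0]) - 37.777777) < 1.0e-6


# Step 1: the check passes for the existing code.
check_mean_celsius(mean_celsius_old)
# Step 2: after the rewrite, the same check must still pass.
check_mean_celsius(mean_celsius_new)
```

Once the new version passes, the old function can be removed and the check kept as a regular unit test.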