Reproducible research

Computer programs are expected to produce the same output for the same inputs. Is that true for research software? Can you give some examples? What can we do about it?

Word-count example

Let’s look at an example project which follows the project structure guidelines given above.

Since we’ll continue working with this repository, import it into your GitHub namespace by clicking “Use this template”. This creates a new repository in your account from the template.

This project counts the frequency distribution of words in a given text, plots the results, and tests Zipf’s law. The repository contains subdirectories for raw data, source code, documentation, processed data, and results, as well as README and LICENSE files.
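To make the analysis concrete, here is a minimal sketch of the two core steps: counting word frequencies and comparing them with Zipf’s law, which predicts that the count of the word at rank r is roughly the count of the most frequent word divided by r. This is not the code from the word-count repository. The function names `word_frequencies` and `zipf_ratios` are hypothetical, and the tokenizer is an assumption: it keeps only the ASCII letters a–z and apostrophes, so accented words would be split.

```python
from __future__ import annotations

import re
from collections import Counter


def word_frequencies(text: str) -> list[tuple[str, int]]:
    """Count words case-insensitively; return (word, count) pairs, most frequent first."""
    # Assumption: a "word" is a run of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_ratios(freqs: list[tuple[str, int]], top: int = 3) -> list[float]:
    """Compare observed counts with Zipf's law for the `top` most frequent words.

    Zipf's law predicts count(rank r) ~ count(rank 1) / r, so each returned
    value is observed / predicted; values close to 1.0 agree with the law.
    """
    if not freqs:
        return []
    first = freqs[0][1]
    return [count / (first / rank) for rank, (_, count) in enumerate(freqs[:top], start=1)]


sample = "The the THE the the the cat cat cat sat sat."
freqs = word_frequencies(sample)
print(freqs)               # [('the', 6), ('cat', 3), ('sat', 2)]
print(zipf_ratios(freqs))  # [1.0, 1.0, 1.0]
```

The toy sample is constructed to follow Zipf’s law exactly; on a real book the ratios only approximate 1.0.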

What are the requirements.txt, Dockerfile, and Snakefile files for? Do you think this project is reproducible?

Research software problems:

What can we do about it?

Compare requirements.txt:





E: What problems do you foresee if you specify only minimum version constraints, such as scipy>=1.0?
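One way to see the problem: installers such as pip, by default, install the newest release that satisfies a requirement. The toy resolver below is a simplified sketch, not pip’s actual resolution algorithm. It only understands `==` and `>=` with plain numeric versions, and the release lists are made up for illustration. It shows that the same requirements line with `>=` can resolve to a different version once a new release appears, while a pinned line keeps resolving to the same one.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '1.10.1' into a comparable tuple.

    Numeric comparison matters: as strings '1.10.1' < '1.3.0', but as tuples
    (1, 10, 1) > (1, 3, 0). Pre-releases and other suffixes are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def pick_version(requirement: str, available: list[str]) -> str:
    """Return the newest available version allowed by a '==' or '>=' requirement.

    Imitates the default choice of installers such as pip: among the
    releases that satisfy the requirement, pick the newest one.
    """
    for op in ("==", ">="):
        if op in requirement:
            _, bound = requirement.split(op)
            break
    else:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    wanted = parse_version(bound.strip())
    if op == "==":
        allowed = [v for v in available if parse_version(v) == wanted]
    else:
        allowed = [v for v in available if parse_version(v) >= wanted]
    if not allowed:
        raise LookupError(f"no release satisfies {requirement!r}")
    return max(allowed, key=parse_version)


# Hypothetical release lists of one package at two points in time.
releases_then = ["1.0.0", "1.2.1", "1.3.0"]
releases_now = releases_then + ["1.10.1"]

print(pick_version("scipy>=1.0", releases_then))   # 1.3.0
print(pick_version("scipy>=1.0", releases_now))    # 1.10.1 (same line, different result)
print(pick_version("scipy==1.3.0", releases_now))  # 1.3.0 (pinned: unchanged)
```

Pinning exact versions, for example by recording the output of `pip freeze`, avoids this drift. The trade-off is that the pins then have to be updated deliberately.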

Related questions:

Exercise: snakemake

20 minutes, until xx:37 https://coderefinery.github.io/reproducible-research/05-workflow-management/#exercise-using-snakemake
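A core idea behind workflow tools like Make and Snakemake is that a step only needs to be re-run when its outputs are missing or out of date. The sketch below implements the classic Make-style modification-time rule. Snakemake also considers modification times, alongside other rerun triggers, so this sketch shows only part of what it checks. The function name `needs_update` and the file names `book.txt` and `book.dat` are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def needs_update(outputs: list[Path], inputs: list[Path]) -> bool:
    """Return True if a step's outputs must be (re)built from its inputs.

    Rebuild when there are no outputs, when any output is missing, or when the
    oldest output is older than the newest input (modification-time rule).
    """
    if not outputs or any(not out.exists() for out in outputs):
        return True
    newest_input = max((p.stat().st_mtime for p in inputs), default=float("-inf"))
    oldest_output = min(out.stat().st_mtime for out in outputs)
    return oldest_output < newest_input


# Demo with hypothetical file names: one input text, one word-count output.
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book.txt"
    counts = Path(tmp) / "book.dat"
    book.write_text("the cat and the hat\n")
    print(needs_update([counts], [book]))  # True: output does not exist yet

    counts.write_text("the 2\n")
    os.utime(book, (1_000, 1_000))    # pretend the input was written first
    os.utime(counts, (2_000, 2_000))  # ...and the output afterwards
    print(needs_update([counts], [book]))  # False: output is up to date

    os.utime(book, (3_000, 3_000))    # input edited after the output was made
    print(needs_update([counts], [book]))  # True: output is stale
```

A workflow manager applies this check to every step in the dependency graph. As a result, editing one raw data file re-runs only the steps downstream of it.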

FAIR software

How to make your software reproducible

What alternatives to Zenodo are there?


Social coding and open software

You can find the slides here: https://cicero.xyz/v3/remark/0.14.0/github.com/coderefinery/social-coding/master/talk.md/#1

Snakemake demonstration

