How to document your research software

Popular tools and solutions

Questions

  • What tools are out there?

  • What are their pros and cons?

Objectives

  • Choose the right tool for the right reason.


In-code documentation

  • Comments, function docstrings, …

  • Advantages

    • Good for programmers

    • Version controlled alongside code

    • Can be used to auto-generate documentation for functions/classes

  • Disadvantage

    • Probably not enough for users of the code

For a closer look at this see the In-code documentation episode.
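
As a minimal sketch of what in-code documentation can look like, the hypothetical function below (not part of the lesson) combines a docstring with a regular comment. The NumPy-style section headings in the docstring are one common convention and only an assumption here; other styles work equally well.

```python
def kinetic_energy(mass, velocity):
    """Return the kinetic energy in joules.

    Parameters
    ----------
    mass : float
        Mass in kilograms.
    velocity : float
        Speed in metres per second.

    Returns
    -------
    float
        The kinetic energy, 0.5 * mass * velocity**2.
    """
    # A comment is for programmers reading the source;
    # the docstring above is what documentation tools pick up.
    return 0.5 * mass * velocity**2


# The docstring is stored on the function object, so tools (and help())
# can read it without parsing the source file.
summary_line = kinetic_energy.__doc__.strip().splitlines()[0]
print(summary_line)
```

Because the docstring lives next to the code, it is version controlled with it, which is what allows documentation generators to build function references automatically.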


README files

  • Advantages

    • Versioned (goes with the code development)

    • It is often good enough to have a README.md or README.rst along with your code/script

  • If you use README files, use either RST or Markdown

  • A great guide to README files: Make a README

For a closer look at this see the Writing good README files episode.


Plain Text formats: reStructuredText and Markdown

# This is a section in Markdown   This is a section in RST
                                  ========================

## This is a subsection           This is a subsection
                                  --------------------

Nothing special needed for        Nothing special needed for
a normal paragraph.               a normal paragraph.

                                  ::

    This is a code block          This is a code block


**Bold** and *emphasized*.        **Bold** and *emphasized*.

A list:                           A list:
- this is an item                 - this is an item
- another item                    - another item

There is more: images,            There is more: images,
tables, links, ...                tables, links, ...
  • Two of the most popular lightweight markup languages.

  • reStructuredText (RST) has more features than Markdown but the choice is a matter of taste.

  • There are (unfortunately) many flavors of Markdown.

  • Motivation to stick to a standard text-based format: it makes it easier to move the documentation to other tools that also expect a standard format as the project/organization grows.

  • We use MyST flavored Markdown in the Sphinx and Markdown episode and the Hosting websites/homepages on GitHub Pages example.

  • Nice resource to learn Markdown: Learn Markdown in 60 seconds

  • Pandoc can convert between MD and RST (and many other formats).
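
To make the difference between the two heading syntaxes concrete, here is a toy sketch that rewrites Markdown `#` and `##` headings as underlined RST headings. The function name `md_headings_to_rst` and the choice of `=` and `-` as underline characters are assumptions for illustration; for real conversions a tool such as Pandoc is the better choice, since this sketch ignores code blocks, deeper heading levels, and everything else.

```python
import re

# Underline character per heading level. RST does not fix these characters;
# "=" and "-" simply follow the comparison shown above.
UNDERLINES = {1: "=", 2: "-"}


def md_headings_to_rst(text):
    """Convert level-1 and level-2 Markdown headings to RST headings."""
    out = []
    for line in text.splitlines():
        match = re.match(r"^(#{1,2})\s+(.*\S)\s*$", line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            out.append(title)
            # The RST underline must be at least as long as the title.
            out.append(UNDERLINES[level] * len(title))
        else:
            # Paragraphs, lists, bold and emphasis look the same in both.
            out.append(line)
    return "\n".join(out)


example = "# This is a section\n\n## This is a subsection\n\nA normal paragraph."
print(md_headings_to_rst(example))
```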


HTML static site generators

There are many tools that generate documentation that can be viewed locally or hosted on the web.

Here are some HTML static site generators that are relevant in our communities. These tools offer some or all of these features:

  • API reference generation: the source code is read and scanned for docstrings, which are then rendered

  • Search: they offer a “whole site” search feature (non-trivial when viewing only one page).

  • Validation: check that the code snippets in the documentation match the real behaviour of the code.

  • Continuous checks: regenerate automatically every time you save, so that you can catch errors early
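
The idea behind API reference generation can be sketched in a few lines with Python's standard `inspect` module. This is only an illustration of the principle, not how any of the tools below are implemented; `add`, `undocumented`, and `render_api_reference` are hypothetical names.

```python
import inspect


def add(a, b):
    """Return the sum of a and b."""
    return a + b


def undocumented(x):
    return x


def render_api_reference(functions):
    """Render name, signature, and docstring of each function as Markdown."""
    lines = []
    for func in functions:
        signature = inspect.signature(func)
        # getdoc() returns the cleaned-up docstring, or None if there is none.
        doc = inspect.getdoc(func) or "(no docstring)"
        lines.append(f"## `{func.__name__}{signature}`")
        lines.append("")
        lines.append(doc)
        lines.append("")
    return "\n".join(lines)


reference = render_api_reference([add, undocumented])
print(reference)
```

Real generators add cross-references, type information, and HTML rendering on top of this, but the input is the same: the docstrings that already live in the code.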

  • Sphinx ← this is how this lesson material is built

    • Generate HTML/PDF/LaTeX from RST and Markdown (MyST)

    • Most Python projects use Sphinx, but Sphinx is not limited to Python.

    • Read the Docs hosts public Sphinx documentation for free!

    • API Reference generation: via autodoc or autoapi

    • Search:

      • limited, keyword-based, client-side (JavaScript that runs in the browser)

      • full-text, server-side on Read the Docs

    • Validation: via doctest

  • MkDocs: A Markdown-first static site generator.

    • API Reference generation: via mkdocstrings

    • Search: client-side via the search plugin (JavaScript that runs in the browser, lunr.js)

  • Doxygen:

    • API Reference generation: also supports Python

  • pkgdown

    • API Reference generation: via roxygen2 and Rdconv

    • Uses RMarkdown and a LaTeX-like syntax

    • Search:

      • client-side (JavaScript that runs in the browser, fuse.js)

      • also typically available in RStudio

    • Long-form documentation for R is typically contained in vignettes.

  • Doxygen (C/C++)

    • API Reference generation out of the box, generates static call graph

    • Focus on documentation directly in the source code

    • Markdown-like syntax, with its own flavour and special commands

    • Search:

      • limited keyword-based client-side

      • full text search server-side

  • Sphinx can also be used to generate documentation for C++ projects, using the XML output from Doxygen via Breathe

  • Doxygen (Fortran):

    • API Reference generation out of the box, generates static call graph (but has limited Fortran parsing capabilities)

    • Focus on documentation directly in the source code

    • Markdown-like syntax, with its own flavour and special commands

    • Search:

      • limited keyword-based client-side

      • full text search server-side

  • FORD

    • Python-based

    • Search: client-side (JavaScript that runs in the browser, lunr.js)

  • Documenter.jl

    • Uses Markdown (JuliaMarkdown flavour)

    • Parses Julia code and in-code documentation/docstrings

    • Search: client-side (but typically the whole site is loaded for search on every page)

    • Validation: runs the code examples (doctests) and checks their output

  • RustDoc

    • Uses Markdown (CommonMark flavour)

    • Search: client-side (JavaScript that runs in the browser, elasticlunr.js)

    • Validation: validates code examples when run with --test
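
Several of the tools above validate the code examples in the documentation. The principle can be sketched with Python's standard `doctest` module: the examples in a docstring are executed and their real output is compared with the documented output. The function `celsius_to_kelvin` is a hypothetical example, and the snippet drives `doctest` by hand only for illustration; Sphinx, Documenter.jl, and RustDoc each have their own mechanisms.

```python
import doctest


def celsius_to_kelvin(temperature):
    """Convert a temperature from degrees Celsius to kelvin.

    >>> celsius_to_kelvin(0.0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    return temperature + 273.15


# Extract the ">>>" examples from the docstring ...
parser = doctest.DocTestParser()
test = parser.get_doctest(
    celsius_to_kelvin.__doc__,
    {"celsius_to_kelvin": celsius_to_kelvin},  # names visible to the examples
    "celsius_to_kelvin",                       # test name used in reports
    None,                                      # no source file
    0,                                         # line offset
)

# ... then run them and compare actual output with documented output.
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(f"attempted={results.attempted}, failed={results.failed}")
```

If the function changes and the documented output no longer matches, `failed` becomes non-zero, which is how outdated examples are caught early.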

These are general-purpose static website generators that match the philosophy of the other tools presented so far, but might be better suited for blogging, reports or other kinds of publications:

  • Hugo

  • Hexo

  • Zola ← this is what we use for our project website and workshop websites

  • Jekyll: the default for GitHub Pages

  • Franklin.jl: focuses on technical blogging for the Julia community

  • Quarto: converts Markdown to websites, PDFs, e-books, and many other things (dynamic notebook-based documents)

Discussion

Do you know an awesome tool or feature that should be in this list? Let us know! (Open a PR)

Hosting Documentation on the Web

GitHub, GitLab, and Bitbucket make it possible to serve HTML pages:

  • GitHub Pages

  • Bitbucket Pages

  • GitLab Pages

Read the Docs is also free to use for open source code and can be connected to common software forges.


Wikis

  • Popular solutions (but many others exist):

    • MediaWiki

    • Dokuwiki

  • Advantage

    • Barrier to write and edit is low

  • Disadvantages

    • Typically disconnected from source code repository (reproducibility)

    • Difficult to serve multiple versions

    • Difficult to check out a specific old version

    • Typically needs to be hosted and maintained


LaTeX/PDF

  • Advantage

    • Popular and familiar in the physics and mathematics community

  • Disadvantages

    • PDF format is not ideal for copy-pasting of examples

    • Possible, but not trivial to automate rebuilding documentation after every Git push


Keypoints

  • Some popular solutions make reproducibility and maintenance of multiple code versions difficult.

© Copyright CodeRefinery contributors.