Sharing code and data

Questions

  • How can I share research code and data?

Objectives

  • Discuss the pros and cons of open science

  • Learn how to mint a DOI for your project

Instructor note

  • 10 min teaching

  • 15 min exercises


Sharing research data

The Open Science movement encourages researchers to share research output beyond the contents of a published academic article (and possibly supplementary information).

Research comic

Arguments in favor (from Wikipedia):

  • Open access publication of research reports and data allows for rigorous peer-review

  • Science is publicly funded so all results of the research should be publicly available

  • Open Science will make science more reproducible and transparent

  • Open Science has more impact

  • Open Science will help answer uniquely complex questions

Arguments against (from Wikipedia):

  • Too much unsorted information overwhelms scientists

  • Potential misuse

  • The public will misunderstand science data

  • Increasing the scale of science will make verification of any discovery more difficult

  • Low-quality science

Research comic

(This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. The image was obtained from https://zenodo.org/record/3332808.)


FAIR principles

Research comic

(This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. The image was obtained from https://zenodo.org/record/3332808.)

“FAIR” is the current buzzword for data management. You may be asked about it, for example, when writing data management plans for grants:

  • Findable

    • Will anyone else know that your data exists?

    • Solutions: put the data, or at least a description of it, in a standard repository. Get a digital object identifier (DOI).

  • Accessible

    • Once someone knows that the data exists, can they get it?

    • Usually solved by putting the data in a repository, but non-open data may require additional access procedures.

  • Interoperable

    • Is your data in a format that can be used by others, like csv instead of PDF?

    • Formats can go beyond csv; see, for example, the 5-star open data scheme.

  • Reusable

    • Is there a license allowing others to re-use?

Even though this is usually discussed under the label “open data”, FAIR is about thinking through these questions and making good decisions, even when the data is not open.

FAIR principles are usually discussed in the context of data, but they also apply to research software.

Note that FAIR principles do not require data/software to be open.
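As a minimal sketch of what the Interoperable and Reusable points can look like in practice, the following Python snippet writes a small table as csv and stores a machine-readable description, including a license, next to it. The function name, file names, column names, values, and metadata keys are all made up for illustration; they do not follow any particular metadata standard, and a real repository will ask for its own fields.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_metadata(rows, fieldnames, data_path, metadata):
    """Write rows to a csv file and a JSON description next to it."""
    data_path = Path(data_path)
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Sidecar file, e.g. measurements.csv -> measurements.json
    metadata_path = data_path.with_suffix(".json")
    description = dict(metadata, data_file=data_path.name, columns=list(fieldnames))
    metadata_path.write_text(json.dumps(description, indent=2), encoding="utf-8")
    return metadata_path


# Hypothetical example data, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    rows = [
        {"sample": "a", "temperature_K": 293.1},
        {"sample": "b", "temperature_K": 295.4},
    ]
    sidecar = save_with_metadata(
        rows,
        fieldnames=["sample", "temperature_K"],
        data_path=Path(tmp) / "measurements.csv",
        metadata={"title": "Example measurements", "license": "CC-BY-4.0"},
    )
    print(sidecar.read_text(encoding="utf-8"))
```

Keeping the description in a separate plain-text file is only one possible design; the point is that both the data and the terms of reuse are readable without special software.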

Discuss open science

  • Do you share any other research outputs besides published articles and possibly source code?

  • Discuss pros and cons of sharing research data.


Exercise - Connecting repositories to Zenodo

Sharing-1: Get a DOI from Zenodo

Digital object identifiers (DOIs) are the backbone of the academic reference and metrics system. In this exercise we will see how to make a GitHub repository citable by archiving it on Zenodo, a general-purpose open access repository created by OpenAIRE and CERN.

  1. Sign in to Zenodo using your GitHub account. For this exercise, use the sandbox service: https://sandbox.zenodo.org/login/. This is a test version of the real Zenodo platform.

  2. Go to https://sandbox.zenodo.org/account/settings/github/.

  3. Find the repository you wish to publish, and flip the switch to ON.

  4. Go to GitHub and create a release by clicking “Create a new release” on the right-hand side (a release is based on a Git tag, but is a higher-level GitHub feature). You will need to enter a tag name (e.g. v0.1) in the “Choose a tag” box.

  5. Creating a new release will trigger Zenodo into archiving your repository, and a DOI badge will be displayed next to your repository after a minute or two. You can include it in your GitHub README file: click the DOI badge and copy the relevant format (Markdown, RST, HTML).
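Step 5 can also be illustrated in code. The sketch below builds the resolver link for a DOI and a Markdown badge line for a README. The `https://doi.org/` prefix is the standard DOI resolver; the badge image URL pattern, the function names, and the example DOI are assumptions made for illustration, so in practice copy the exact snippet that Zenodo shows when you click the badge.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_url(doi):
    """Return the resolver URL for a DOI such as '10.5281/zenodo.1234567'."""
    doi = doi.strip()
    # Very rough sanity check: DOIs start with "10." and contain a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + doi


def markdown_badge(doi, badge_base="https://zenodo.org/badge/DOI/"):
    """Return a Markdown badge line; badge_base is an assumed URL pattern."""
    return f"[![DOI]({badge_base}{doi.strip()}.svg)]({doi_url(doi)})"


# Hypothetical DOI, used only as a placeholder.
print(markdown_badge("10.5281/zenodo.1234567"))
```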


Services for sharing and collaborating on research data

To find a research data repository for your data, you can search on the Registry of Research Data Repositories (re3data) platform and filter by country, content type, discipline, etc.

International:

  • Zenodo: A general-purpose open access repository created by OpenAIRE and CERN. It integrates with GitHub and allows researchers to upload files up to 50 GB.

  • Figshare: Online digital repository where researchers can preserve and share their research outputs (figures, datasets, images and videos). Users can make all of their research outputs available in a citable, shareable and discoverable manner.

  • EUDAT: European platform for researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment.

  • Dryad: A general-purpose home for a wide diversity of datatypes, governed by a nonprofit membership organization. A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable.

  • The Open Science Framework: Gives free accounts for collaboration around files and other research artifacts. Each account can store up to 5 GB of files without any problem, and your content remains private until you make it public.

Sweden:

Norway:

Denmark:

Finland:


Resources for data management


Further reading on reproducibility and open science

Keypoints

  • Consider sharing research outputs other than articles.