Social coding

Objectives

  • Get familiar with terminology around licensing.

  • Get practical advice on software licensing and citation.

These materials are adapted from the CodeRefinery lesson Social Coding. This is not legal advice! Please always consult your organization's legal team when in doubt.

Relevance for data stewards

You may not be writing or releasing software yourself, but you are often asked to:

  • Advise on licensing choices

  • Interpret reuse conditions

  • Assess whether software outputs can be shared

  • Help make software citable and compliant with policy.

Social coding is as much about people, rules, and expectations as it is about tools.

Comparing sharing papers and sharing code

Image shows that we are motivated to share our published papers because we are rewarded with academic credit in the form of citations.

Citation as one form of academic credit to motivate sharing papers.

Sharing papers and academic credit:

  • The goal is maximum visibility and maximum reuse.

  • The more interesting science is done referencing my paper, the better for me.

  • Nobody actively tries to limit the reach of their papers.

Getting improvements back and also getting citations can motivate us to share code

Different ways we can benefit from sharing code.

Sharing code:

  • “I did all the ground work and they get to do the interesting science?”

  • Sharing code and encouraging derivative work may boost your academic impact.

  • But will your work be visible if it is used two levels deep down?

Journal policies as motivation for sharing

From Science editorial policy:

“We require that all computer code used for modeling and/or data analysis that is not commercially available be deposited in a publicly accessible repository upon publication. In rare exceptional cases where security concerns or competing commercial interests pose a conflict, code-sharing arrangements that still facilitate reproduction of the work should be discussed with your Editor no later than the revision stage.”

From Nature editorial policy:

“An inherent principle of publication is that others should be able to replicate and build upon the authors’ published claims. A condition of publication in a Nature Research journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript.”

These policies may surface during data stewardship consultations, not just at submission time.

However, a study showed that despite these policies, many people still do not share their code 😞. The paper includes samples of charming author responses such as:

“When you approach a PI for the source codes and raw data, you better explain who you are, whom you work for, why you need the data and what you are going to do with it.”

Motivation for open source software

  • Enable derivative work

  • Do not lock yourself out of your own code

  • Attract developers who want to be able to show the coding work on their CVs

  • Tightly regulated domains require open source

  • Open-source software (OSS) can lead to more engagement from industry which may lead to more impact

  • If it’s not open, it is not likely to become standard

Sharing software is also scary

  • Fear of being scooped: A license can help prevent it, and you can release when you are ready. In any case, it is very unlikely that others will understand your code and publish before you without involving you in a collaboration. Sharing is a form of publishing.

  • Exposes possibly “ugly code”: In practice almost nobody will judge the quality of your code. “Software, once written, is never really finished” (N. Asparouhova).

  • Others may find bugs and mistakes: Isn’t this good? Would you not prefer to use code which gives people the chance to locate bugs? If you don’t release, people will assume there are bugs anyway.

  • Others may require support and ask too many questions: This can become a problem, so use tools and community and protect your time. You aren’t required to support anyone. You can also “archive” a repository to disable most forms of interaction (issues, PRs). A note in the README on the level of support also helps.

  • Fear of losing control over the direction of the project: Open source does not mean everybody can change your version.

  • “Bad” derivative projects may appear: It will be clear which is the official version.

Code reusability: What contributes to reusability?

What contributes to you being able to reuse stuff that others make, and others (or you) being able to reuse your stuff? When you find a repository with code you would like to reuse, you may look at the following things to determine its reusability:

  • Date of last code change: Is the project abandoned?

  • Release history: How about stability and backwards-compatibility?

  • Versioning: Will it be painful to upgrade?

  • Number of open pull requests and issues: Are they followed up?

  • Installation instructions: Will it be difficult to get it running?

  • Example: Will it be difficult to get started?

  • License: Am I allowed to use it?

  • Contribution guide: How can I contribute, and how are decisions made?

  • Code of conduct: Which behaviors are unacceptable and discouraged? How will violations of the code of conduct be handled?

  • Trust and community: Did somebody you trust recommend it?

… most of which you can learn in the CodeRefinery workshop!

How our work connects to the work of others

Our work depends on ideas, articles, data, and software

Whether and what we can share depends on how we obtained the components.

  • Our work depends on outputs from others. Research of others depends on our outputs.

  • Whether you can share your output depends on how you obtained your input.

  • A repository that is private today might become public one day.

  • Sometimes “OTHERS” are you yourself in the future in a different group/job.

  • Our code often starts by changing other code: derivative work.

  • Software licenses matter. And this is what we will discuss in the next episode.

Taxonomy of software licenses

“Software licensing and open source explained with cakes”.

European Union Public Licence (EUPL): guidelines July 2021

European Commission, Directorate-General for Informatics, Schmitz, P., European Union Public Licence (EUPL): guidelines July 2021, Publications Office, 2021, https://data.europa.eu/doi/10.2799/77160

Comments:

  • Arrows represent compatibility (A -> B: B can reuse A)

  • Proprietary/custom: Derivative work typically not possible (no arrow goes from proprietary to open)

  • Permissive: Derivative work does not have to be shared

  • Copyleft/reciprocal: Derivative work must be made available under the same license terms

  • NC (non-commercial) and ND (no derivatives) exist for data licenses but not really for software licenses
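
The arrow notation above (A -> B: B can reuse A) can be sketched as a small lookup table. This is a deliberately simplified, category-level illustration and not legal advice: the three category names and the helper functions are hypothetical, and real compatibility depends on the specific licenses involved (for example, two different copyleft licenses are not necessarily compatible with each other).

```python
# Simplified, category-level reading of the arrows in the figure.
# REUSABLE_IN[A] lists the categories B for which "A -> B" holds,
# i.e. a project released under B can reuse code released under A.
# Assumption: categories only; specific licenses need a case-by-case check.
REUSABLE_IN = {
    "permissive": {"permissive", "copyleft", "proprietary"},
    "copyleft": {"copyleft"},  # derivative work keeps the same license terms
    "proprietary": set(),      # derivative work is typically not possible
}


def can_reuse(source: str, target: str) -> bool:
    """Return True if code in category `source` may flow into `target`."""
    return target in REUSABLE_IN[source]


def blocking_dependencies(dependencies: dict[str, str], target: str) -> list[str]:
    """List dependencies whose license category blocks a release under `target`."""
    return sorted(
        name
        for name, category in dependencies.items()
        if not can_reuse(category, target)
    )


if __name__ == "__main__":
    # Hypothetical dependency names, for illustration only.
    deps = {"plotting-lib": "permissive", "solver": "copyleft"}
    print(blocking_dependencies(deps, "permissive"))
    print(blocking_dependencies(deps, "copyleft"))
```

A table like this can help structure a first conversation about reuse conditions. For a real project, a tool such as the Joinup Licensing Assistant and institutional legal support remain the right places to confirm compatibility.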

Great resource for comparing software licenses: Joinup Licensing Assistant

As we do here, data stewards should be careful not to give legal advice, but can:

  • Explain common licenses

  • Highlight compatibility issues

  • Point researchers to institutional or legal support when needed.

Software citation

Questions

  • Is putting software on GitHub/GitLab/… publishing?

  • Where to publish software?

  • How can software be cited?

  • How can I make my software citable?

Is putting software on GitHub/GitLab/… publishing?

FAIR principles

FAIR principles. (c) Scriberia for The Turing Way, CC-BY.

Is it enough to make the code public for the code to remain findable and accessible?

  • No. Nothing prevents me from deleting my GitHub repository or rewriting the Git history, and we have no guarantee that GitHub will still be around in 10 years.

  • Make your code citable and persistent: In addition to sharing the code publicly, get a persistent identifier (PID) such as a DOI by depositing a release with a service like Zenodo.

How to make your software citable

Checklist for making a release of your software citable:

  • Assigned an appropriate license

  • Described the software using an appropriate metadata format

  • Clear version number

  • Authors credited

  • Procured a persistent identifier

  • Added a recommended citation to the software documentation

This checklist is adapted from: N. P. Chue Hong, A. Allen, A. Gonzalez-Beltran, et al., Software Citation Checklist for Developers (Version 0.9.0). Zenodo. 2019b. (DOI)
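
Some items on this checklist leave a trace in the repository itself, so a rough first pass can be automated. The following is a minimal sketch, assuming common file-name conventions (`LICENSE`, `CITATION.cff`, `codemeta.json`, `README`); the `CHECKS` table and the `missing_items` function are hypothetical names, and the script only checks that files exist, not that their content is appropriate.

```python
from pathlib import Path

# Assumed file-name conventions; adjust these to the project at hand.
CHECKS = {
    "license": ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"),
    "citation metadata": ("CITATION.cff", "codemeta.json"),
    "documentation": ("README", "README.md", "README.rst"),
}


def missing_items(repo: Path) -> list[str]:
    """Return the checklist items for which no candidate file exists in `repo`."""
    return [
        item
        for item, names in CHECKS.items()
        if not any((repo / name).is_file() for name in names)
    ]


if __name__ == "__main__":
    # Check the current working directory and report what is missing.
    for item in missing_items(Path(".")):
        print(f"missing: {item}")
```

Items such as a clear version number, credited authors, and a persistent identifier still need a human look, since they depend on the content of the metadata rather than on the presence of a file.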

Our practical recommendations:

This is an example of a simple CITATION.cff file:

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Doe
    given-names: Jane
    orcid: https://orcid.org/1234-5678-9101-1121
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11

More about CITATION.cff files:

Papers with focus on scientific software

Where can I publish papers which are primarily focused on my scientific software? A great list/summary is provided in this blog post: “In which journals should I publish my software?” (Neil P. Chue Hong)

How to cite software

Great resources

  • A. M. Smith, D. S. Katz, K. E. Niemeyer, and FORCE11 Software Citation Working Group, “Software citation principles,” PeerJ Comput. Sci., vol. 2, no. e86, 2016 (DOI)

  • D. S. Katz, N. P. Chue Hong, T. Clark, et al., Recognizing the value of software: a software citation guide [version 2; peer review: 2 approved]. F1000Research 2021, 9:1257 (DOI)

  • N. P. Chue Hong, A. Allen, A. Gonzalez-Beltran, et al., Software Citation Checklist for Authors (Version 0.9.0). Zenodo. 2019a. (DOI)

  • N. P. Chue Hong, A. Allen, A. Gonzalez-Beltran, et al., Software Citation Checklist for Developers (Version 0.9.0). Zenodo. 2019b. (DOI)

The recommended format for a software citation ensures that the following information is provided as part of the reference (from Katz, Chue Hong, Clark, 2021, which also contains software citation examples):

  • Creator

  • Title

  • Publication venue

  • Date

  • Identifier

  • Version

  • Type
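
To see how these seven elements fit together, here is a minimal sketch that assembles them into a single reference string. The `SoftwareReference` class, the `format_reference` function, and the ordering and punctuation of the output are illustrative assumptions, not a layout prescribed by the guide; journals and citation styles will differ. The example values reuse the placeholder metadata from the CITATION.cff example, with "Zenodo" assumed as the publication venue.

```python
from dataclasses import dataclass


@dataclass
class SoftwareReference:
    """The seven recommended elements of a software citation."""

    creator: str
    title: str
    venue: str
    date: str        # year or full release date
    identifier: str  # DOI without the resolver prefix
    version: str
    software_type: str = "Computer software"


def format_reference(ref: SoftwareReference) -> str:
    """Assemble the elements into one reference string (illustrative layout)."""
    return (
        f"{ref.creator} ({ref.date}). {ref.title} (Version {ref.version}) "
        f"[{ref.software_type}]. {ref.venue}. https://doi.org/{ref.identifier}"
    )


if __name__ == "__main__":
    # Placeholder values, not a real software record.
    ref = SoftwareReference(
        creator="Doe, J.",
        title="My Research Software",
        venue="Zenodo",
        date="2021",
        identifier="10.5281/zenodo.1234",
        version="2.0.4",
    )
    print(format_reference(ref))
```

Keeping the elements as separate fields makes it easy to spot which one is missing when a researcher asks how to cite a tool.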

FAIR for Research Software (FAIR4RS)

The FAIR4RS Principles adapt the original FAIR data principles to improve the Findability, Accessibility, Interoperability, and Reusability of research software. They provide guidelines to make research software more open, discoverable, citable, and reusable, which supports reproducibility and open science. The summary of the FAIR4RS below is a compressed version of the principles listed in Barker, Michelle, et al. “Introducing the FAIR Principles for research software.” Scientific Data 9.1 (2022): 622.

  • Findable: Software should have a globally unique and persistent identifier
    (e.g. clear versioning of releases, DOI from Zenodo when depositing a software release, software metadata in citation file).

  • Accessible: Software and metadata should be retrievable using open protocols
    (e.g. downloading from GitHub or Zenodo).

  • Interoperable: Software should use open formats and standards to work with other tools
    (e.g. input/output files in CSV, dependencies listed in requirements.txt or environment.yml, configuration files in yaml).

  • Reusable: Software should have clear licensing, documentation, and provenance
    (e.g. a standard license and a README with usage instructions, authors listed with ORCIDs).

There are great resources to self-evaluate the FAIRness of your research software:

  • FAIR Software Checklist, which provides a questionnaire and even a badge showing how FAIR the software is

  • FAIR Software NL, which highlights with nice visuals the five most important elements of FAIR4RS (1. Publicly accessible repository with version control; 2. License; 3. Software registry; 4. Software citation file; 5. Software quality checklist)

Keypoints

  • You cannot ignore licensing: default is “no one can make copies or derivative works”.

  • CITATION.cff files can make it easier for others to cite software and provide credit.