Questions and notes from workshop day 5 (Social Coding)

How is the weather today at your location?

Have you shared some of your code with colleagues? Or has a colleague shared code with you? How did they share it?

Social coding

https://coderefinery.github.io/social-coding/social-coding/

Question 1: Why would I want to share my scripts/code/data?

Choose many. Vote by adding an o character:

Question 2: The most concerning thing for me if I share my software now:

Choose one. Vote by adding an o character:

Question 3: Why is software often treated differently from papers?

Free-form answers:

Question 4: When you find a repository with code/library you would like to reuse, what are the things you look at to decide whether you use it?

Free-form answers:

  1. Do reviewers ever check the code? Does anyone in academia have time to do that?

    • As a reviewer I do check it. And there are even good tools to anonymise the code when peer review requires anonymity (I will post the link later)
      • How about other reviewers? Do most of them check the code, or do most of them not?
  2. I find it frustrating when someone has done part of the analysis in paid-for software and the remainder in open source, so the full analysis is not possible to replicate. +2

    • Indeed. Some vendors provide a "runtime" that can run code written in their proprietary language, but often the code just stays "private".
  3. Is the code checked for plagiarism?

    • We will get to that :)
  4. Are there tools, aside from AI platforms, that will review your code or suggest better formatting (e.g. Markdown)?

    • There are, for example, linters: tools that check code for style violations and likely errors. They don't do the kind of refactoring an AI model might suggest. See https://aaltoscicomp.github.io/python-for-scicomp/productivity/
  5. On the importance of licenses: IIRC, in order to finally publish this code https://www.4c-multiphysics.org/history/ the legal department of a university had to chase up every contributor since the beginning of time so that everybody could agree on a license. Is this a common problem?

    • I don't know how common it is, but some research groups have been developing code for years with multiple contributors, so I guess it would be a similar case if their code suddenly needed to become a product.
  6. Are there ways to check that the code I am using from someone else (given the licenses allow) is original and not plagiarized from another source?

    • There are some code plagiarism checkers, but as far as I know they are only proprietary like: https://codequiry.com/
  7. Are you obliged to explain your old code to people who become interested in it? For how long would you do that on a regular basis?

    • I guess nobody should be required to explain their old code; it is more an ethical than a legal matter. As researchers we want to be transparent, and if someone doubts our implementation we should at least clarify their doubts.
  8. What if I take inspiration from someone's code but don't use exactly the same code? +1

    • It depends. :) In a way, "clean room" design can be that (taking inspiration from how a certain function takes some inputs and produces some outputs, but not necessarily writing its content in the same way).
  9. What's the difference, from the legal point of view, between taking someone else's code and using it (via 'using', 'import', 'include', or at link time) and actually modifying it for your own purposes or copy/pasting it into your codebase?

    • If I use external code (proprietary or not) in my code, all I need to do ethically is tell others how they can reproduce/import/link the external code (e.g. build a conda environment so that you can import the same dependencies I have). Once somebody else's code becomes part of my code (in a subfolder, or even as bits of code inside my own code), I need to check that their license is compatible with my license for such a derivative work.
  10. Now that I've learned how to use GitHub in this course: if I've previously published a paper, how do I go about linking the paper to the newly available code (previously in the supplementary information)? +1

    • You should publish it in a repository (like Zenodo or an institutional one). Many repositories have integration with GitHub. Then you get a DOI and a licence. The metadata in the repository can contain the link to the paper. If the paper is already published, it's difficult to link from the paper to the code, since publishers often won't allow modifications of the paper.
    • I think it would be useful to have an overview of the different repositories. I thought GitHub was the repository... What's the difference between using that vs Zenodo vs OSF (and probably many more)?
    • Overview of repositories: https://www.re3data.org/, but I think there are very few code-specific repositories. A data repository like Zenodo will provide a DOI and mandatory metadata that make your research output (more) FAIR. A repository should treat your published code with the same value as a paper, so you can't just delete it. If you change it, the DOI is versioned so that individual versions can be tracked.
    • OK, thanks. It really feels like there are way too many options... It is quite overwhelming to look at that website as a novice and try to decide which is the best to use. There always seem to be more layers!
    • I totally understand. Ask your academic library for advice or try to find the repositories used in your discipline. If you have an institutional repository, use that one.
  11. I've seen citation options on OSF, for code that is linked to a paper. Is it better to cite the code directly rather than the paper?

    • I think it's field dependent, but in general citations to papers are the "metric" for academic careers, so researchers tend to maximise those rather than citations to code (or to data).
  12. I am very nervous about people seeing my messy code. How do I find the sweet spot between sharing my code and hiding my mess?

    • Start from day 1 with non-messy code by having lots of comments and structure. Pretend they are already reading your code.
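The answer to question 9 says that when you only import or link external code, what you owe others is a way to get the same dependencies. A minimal sketch of one way to record them, assuming your packages were installed into the current Python environment (with pip or conda): it writes `name==version` lines in the format of a pip `requirements.txt` file. The package names in the usage block are hypothetical placeholders, and note that the installed (distribution) name can differ from the import name (scikit-learn is imported as `sklearn`). Conda users can get a fuller record with `conda env export > environment.yml`.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_versions(distribution_names):
    """Return (pinned, missing): 'name==version' lines for installed packages,
    and the names that are not installed in the current environment."""
    pinned, missing = [], []
    for name in distribution_names:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pinned, missing


if __name__ == "__main__":
    # Hypothetical dependency list: replace with the packages your own scripts import.
    pinned, missing = pin_versions(["numpy", "pandas", "matplotlib"])
    # Saving these lines as requirements.txt lets others run: pip install -r requirements.txt
    print("\n".join(pinned))
    if missing:
        print("Not installed here:", ", ".join(missing))
```

Pinning exact versions is a deliberate choice for reproducibility; it says nothing about licenses, which still need checking if you copy code rather than import it.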

Software Licensing

Link to the material: https://coderefinery.github.io/social-coding/software-licensing/

Question 5: Which of these are derivative works?

Choose many. Vote by adding an o character:

  1. Comment/question: It confuses me that, in the figure on licenses, the two horizontal arrows have different colours. Does the colour carry any meaning? I missed it if that was explained.

    • Yes: red marks the proprietary licenses (basically copyright/IPR/patents) and green the spectrum of open-source licenses. Public domain (zero license) is at the bottom in grey.
    • Hmm, right, but I mean the arrows pointing across.
      • Yes: code under a very permissive license can become fully red (your changes to the library can stay private), while weak copyleft allows you to use the library, but if you change parts of it you need to "give back" to the community (green arrow; this is what Björn mentioned about the Mozilla Public License, MPL). Sorry if this wasn't super clear :)
      • OK, I think I see the point, but I'm still struggling with the graphical illustration of it. An arrow into the red seems to suggest the code becoming closed.
  2. Are there cheat sheets for all these licences, explained in simple words "you can do this, you can't do that"?

    • Good question! Nothing that I remember, but check, for example, https://fair-software.nl/
  3. Are there any licences that are restrictive only towards the big tech giants?

    • They can do what they want. :) Look at all the copyrighted material they have been able to use for their AI models. :)
  4. Do you check the licence of every single library you install/use in, e.g., Python?

    • Our university's legal team would say "yes" to this question.
      • They are not the ones using it and writing code, are they? Do you consult them for every single one? :)
      • Of course they are not, but they need to handle the legal claims. I'm not a researcher, but we need to handle this when assisting researchers in publishing data and code. In the spirit of this course, try to document the libraries etc. that you reuse in your code from the beginning :-)
    • Open-source licenses are based on common sense :) so if you do not plan to modify the software, you can just download and use it. The issue is when some supposedly open products add a "by the way, if you are an organisation with more than 200 employees, you need to pay us for using our software" clause (this has happened with Anaconda https://www.cdotrends.com/story/4173/anaconda-threatens-legal-action-over-licensing-terms; they recently removed non-profit/public sector from their license).
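As a first step towards answering question 4 for a Python project, the standard library can list the license each installed package declares in its own metadata. This is a minimal sketch, not a legal review: it assumes the packages of interest are installed in the current environment, and declared metadata can be missing, outdated, or disagree with the LICENSE file shipped with the package, so treat "UNKNOWN" entries as a prompt to look at the project itself.

```python
from importlib.metadata import distributions


def installed_licenses():
    """Map each installed distribution's name to the license it declares.

    Checks, in order: the SPDX 'License-Expression' field (newer metadata),
    the free-text 'License' field, and any 'License :: ...' trove classifiers.
    """
    result = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue  # skip installs whose metadata file is missing
        name = str(meta.get("Name") or "")
        if not name:
            continue  # skip broken installs without a name
        text = str(meta.get("License-Expression") or meta.get("License") or "").strip()
        if text and text != "UNKNOWN":
            # Some packages paste the full license text here; keep the first line only.
            info = text.splitlines()[0]
        else:
            classifiers = meta.get_all("Classifier") or []
            info = "; ".join(
                c.split(" :: ")[-1] for c in classifiers if c.startswith("License ::")
            )
        result[name] = info or "UNKNOWN"
    return result


if __name__ == "__main__":
    for name, info in sorted(installed_licenses().items(), key=lambda kv: kv[0].lower()):
        print(f"{name:30} {info}")
```

Saving this listing alongside your code is one cheap way to follow the advice above about documenting the libraries you reuse from the beginning.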

Comments on LLMs for coding

Questions continued

Exercise

Do you cite software that you use? How?

If I wanted to cite your code/scripts, what would I need to do?
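One common answer is a CITATION.cff file (Citation File Format) in the root of the repository; GitHub shows a "Cite this repository" option for repositories that have one. The sketch below produces a minimal file with the format's required fields (cff-version, message, title, authors) plus version and DOI. If you would rather be cited through your paper (compare question 11), the optional preferred-citation entry points readers there. Assumptions: the function name is hypothetical, the paper is assumed to share the code's authors, values must not contain double quotes, and all titles, names and DOIs in the usage block are placeholders.

```python
def citation_cff(title, authors, version, doi=None, paper=None):
    """Return the text of a minimal CITATION.cff file (Citation File Format 1.2.0).

    authors: list of (family_names, given_names) tuples.
    paper:   optional dict with 'title', 'journal', 'year' and 'doi' keys; if given,
             it becomes the preferred citation (assumed to share the code's authors).
    Values are written as "..." strings, so they must not contain double quotes.
    """
    def author_lines(indent):
        out = []
        for family, given in authors:
            out.append(f'{indent}- family-names: "{family}"')
            out.append(f'{indent}  given-names: "{given}"')
        return out

    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        "authors:",
        *author_lines("  "),
    ]
    if doi:
        lines.append(f'doi: "{doi}"')
    if paper:
        lines += [
            "preferred-citation:",
            "  type: article",
            f'  title: "{paper["title"]}"',
            f'  journal: "{paper["journal"]}"',
            f'  year: {int(paper["year"])}',
            f'  doi: "{paper["doi"]}"',
            "  authors:",
            *author_lines("    "),
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical placeholder values; replace them with your own.
    text = citation_cff(
        title="My analysis scripts",
        authors=[("Doe", "Jane")],
        version="1.0.0",
        paper={"title": "A hypothetical paper", "journal": "Journal of Examples",
               "year": 2024, "doi": "10.1234/example"},
    )
    print(text)  # save this output as CITATION.cff in the repository root
```

Writing the file by hand works just as well; generating it only helps if you already keep this metadata somewhere else in your project.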

Feedback

Today was (vote for all that apply):

One good thing about today:

One thing to improve for next time:

Any other feedback?

General questions continued:


Funding

CodeRefinery is a project within the Nordic e-Infrastructure Collaboration (NeIC).

Contact

support@coderefinery.org
