Sharing data
Services for sharing and collaborating on research data
To find a research data repository for your data, you can search on the Registry of Research Data Repositories (re3data) platform and filter by country, content type, discipline, etc.
International:
Zenodo: A general-purpose open access repository created by OpenAIRE and CERN. It integrates with GitHub and allows researchers to upload files up to 50 GB.
Figshare: Online digital repository where researchers can preserve and share their research outputs (figures, datasets, images and videos). Users can make all of their research outputs available in a citable, shareable and discoverable manner.
EUDAT: European platform for researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment.
Dryad: A general-purpose home for a wide diversity of datatypes, governed by a nonprofit membership organization. A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable.
The Open Science Framework: Provides free accounts for collaborating on files and other research artifacts. Each account can hold up to 5 GB of files, and everything remains private until you make it public.
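The size figures quoted above can be turned into a quick pre-upload check. The following is a minimal sketch, not an official tool: it treats both quoted figures as a total budget for one dataset (a simplification, since one is described per upload and the other per account), assumes decimal gigabytes, and uses the hypothetical helper names `total_size_bytes` and `services_that_fit`. Check each service's current documentation before relying on the numbers.

```python
from pathlib import Path

GB = 10**9  # decimal gigabyte (assumption; services may count differently)

# Limits as quoted in the list above, not fetched from the services themselves.
SIZE_LIMITS_BYTES = {
    "Zenodo": 50 * GB,
    "Open Science Framework": 5 * GB,
}


def total_size_bytes(directory):
    """Sum the sizes of all regular files below `directory`."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def services_that_fit(size_bytes, limits=SIZE_LIMITS_BYTES):
    """Return, sorted by name, the services whose quoted limit covers `size_bytes`."""
    return sorted(name for name, limit in limits.items() if size_bytes <= limit)
```

For a dataset directory (the path `my_dataset` here is hypothetical), `services_that_fit(total_size_bytes("my_dataset"))` would list the services from the table that could hold it under these assumptions.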
Sweden:
Norway:
NSD - Norwegian Center for Research Data, for any kind of data
Dataverse.no - Dataverse network, based at University of Tromsø but open for other institutions
Denmark:
Finland:
Resources for data management
Licensing of datasets and databases
The EU has a database directive which restricts data mining on databases.
Its effect is somewhat similar to copyright, and it fills a gap, because copyright itself would not apply to data mining.
A good license also grants the right to data mine, so this is not a major concern.
When you can use datasets:
The license allows it
Your country has exceptions for research
The data doesn’t come from the EU
License text, slides, images, and supporting information under a Creative Commons license, and get a DOI using Zenodo, Figshare, OSF, or other services.
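As an illustration of what recording a Creative Commons license alongside a deposit can look like, here is a minimal sketch that builds a metadata record in the shape Zenodo's deposit API is understood to accept (`title`, `upload_type`, `description`, `creators`, `license`). The helper name `build_deposit_metadata` is hypothetical, the license identifier `cc-by-4.0` is an assumption to verify against Zenodo's license list, and Figshare and OSF use different schemas. The sketch only prepares the record and sends nothing over the network.

```python
import json


def build_deposit_metadata(title, description, creators,
                           license_id="cc-by-4.0", upload_type="dataset"):
    """Build a metadata record for a deposit (sketch; field names assumed from Zenodo's docs)."""
    if not creators:
        raise ValueError("at least one creator is required")
    return {
        "metadata": {
            "title": title,
            "upload_type": upload_type,
            "description": description,
            # Each creator is a dict with a "name" key, e.g. "Family, Given".
            "creators": [{"name": name} for name in creators],
            # Assumed identifier for CC BY 4.0; confirm before uploading.
            "license": license_id,
        }
    }


record = build_deposit_metadata(
    title="Example measurements",
    description="Supporting data for an example article.",
    creators=["Doe, Jane"],
)
print(json.dumps(record, indent=2))
```

Once such a record is uploaded and published through the service's web interface or API, the service mints the DOI; the DOI is not something you choose yourself.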
Licensing and machine learning/AI
Is it data? Is it software? We need to consider the AI solution, the training data, the production data, the AI output, and AI evolutions.
How about ethics? How about liability?
Models can be reverse-engineered and training data can be extracted
What if the model generates an outcome that is dangerous? (Thanks to E. Glerean for pointing these issues out to us.)
Some resources
Further reading
Good talks on open reproducible research can be found here.
“Top 10 FAIR Data & Software Things” are brief guides that can be used by the research community to understand how they can make their research (data and software) more FAIR.
Publishing research software: An MIT Libraries webpage on why to publish software, where to publish software, and how to make software citable.