Sharing data

Services for sharing and collaborating on research data

To find a research data repository for your data, you can search on the Registry of Research Data Repositories (re3data) platform and filter by country, content type, discipline, etc.

International:

  • Zenodo: A general-purpose open access repository created by OpenAIRE and CERN. It integrates with GitHub and allows researchers to upload files up to 50 GB.

  • Figshare: Online digital repository where researchers can preserve and share their research outputs (figures, datasets, images and videos). Users can make all of their research outputs available in a citable, shareable and discoverable manner.

  • EUDAT: European platform for researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment.

  • Dryad: A general-purpose home for a wide diversity of datatypes, governed by a nonprofit membership organization. A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable.

  • The Open Science Framework: Offers free accounts for collaborating on files and other research artifacts. Each account can store up to 5 GB of files, and the content stays private until you make it public.

Sweden:

Norway:

Denmark:

Finland:

Portugal:


Resources for data management


Licensing of datasets and databases

  • The EU has a database directive which restricts data mining on databases.

  • The directive has an effect somewhat similar to copyright, and it covers data mining, to which copyright itself would not apply.

  • A good license also grants the right to data mine, so this is not a major concern.

When you can use datasets:

  • The license allows it

  • Your country has exceptions for research

  • The data doesn’t come from the EU
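The three conditions above can be read as a simple checklist in which any one of them is enough. The following is a minimal sketch of that reading, not legal guidance; the class `DatasetSituation` and the function `reasons_use_may_be_permitted` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetSituation:
    """Hypothetical description of a dataset you would like to use."""
    license_allows_use: bool          # the dataset's license permits your use
    research_exception_applies: bool  # your country has a research exception
    originates_in_eu: bool            # the data comes from the EU


def reasons_use_may_be_permitted(situation):
    """Return the conditions from the checklist that hold for this dataset.

    An empty list means none of the three conditions applies.
    """
    reasons = []
    if situation.license_allows_use:
        reasons.append("the license allows it")
    if situation.research_exception_applies:
        reasons.append("your country has exceptions for research")
    if not situation.originates_in_eu:
        reasons.append("the data doesn't come from the EU")
    return reasons


# Example: an EU dataset with a permissive license, no research exception.
example = DatasetSituation(
    license_allows_use=True,
    research_exception_applies=False,
    originates_in_eu=True,
)
print(reasons_use_may_be_permitted(example))
```

Returning the list of matching conditions, rather than a single yes/no, keeps it visible which condition you are relying on.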

License text, slides, images, and supporting information under a Creative Commons license, and get a DOI using Zenodo, Figshare, OSF, or other services.
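As an illustration of what this can look like in practice, here is a rough sketch that prepares the descriptive metadata for a Zenodo upload. It only builds and prints a JSON payload and does not contact Zenodo. The field names (`title`, `upload_type`, `description`, `creators`, `license`) follow Zenodo's deposit REST API as far as we know, and the license identifier `cc-by-4.0` is an assumption; check both against the current Zenodo documentation before relying on them. The function name `build_zenodo_metadata` and the example title and creator are placeholders.

```python
import json


def build_zenodo_metadata(title, description, creators,
                          upload_type="presentation",
                          license_id="cc-by-4.0"):
    """Build a metadata payload for a Zenodo deposition (sketch only).

    `license_id` is assumed to be an identifier Zenodo accepts for
    Creative Commons Attribution 4.0; verify it before uploading.
    """
    if not creators:
        raise ValueError("at least one creator is needed")
    return {
        "metadata": {
            "title": title,
            "upload_type": upload_type,
            "description": description,
            # Zenodo expects creators as a list of objects with a name.
            "creators": [{"name": name} for name in creators],
            "license": license_id,
        }
    }


# Placeholder values, not a real record.
payload = build_zenodo_metadata(
    title="Sharing data (lesson slides)",
    description="Slides for a lesson on sharing research data.",
    creators=["Doe, Jane"],
)
print(json.dumps(payload, indent=2))
```

The DOI itself is assigned by Zenodo once the record is published; the payload above only carries the description and license that go with it.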


Licensing and machine learning/AI

This section is probably more relevant to developers of AI models and AI systems than to their users.

Is it data? Is it software? It depends. We need to consider the AI system as a whole, the training data, the production data, the AI output, and how it is put into service. AI models are like the engine of a car: they cannot do anything without the rest of the car's infrastructure. AI systems are the whole car, with the AI model and all the software and hardware needed to actually use it.

Depending on what you are going to share, there might be things to consider beyond the license.

For example, large language models are often shared under open source software licenses on HuggingFace, which is like a GitHub/GitLab for AI models (see for example the OLMO model). Many so-called open-source models are actually just open-weights models: only the trained neural network weights are shared, while the training data, training code, and full documentation are often kept private. This lack of transparency raises concerns about reproducibility and accountability, and the phenomenon is sometimes called “open washing” (ref). Models are also shared with a model card, a documentation tool for transparency that provides a comprehensive snapshot of a model’s characteristics and ethical considerations (see Ch.8 Glerean 2025).
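To make the distinction between open-weights and fully open releases concrete, here is a minimal sketch that records what a release actually shares and renders a very small plain-text model card. The class `ModelRelease`, its fields, and the card layout are hypothetical and chosen for illustration; they do not follow any official model card template, and the example model is invented.

```python
from dataclasses import dataclass


@dataclass
class ModelRelease:
    """Hypothetical record of what is published alongside a model."""
    name: str
    license_id: str
    weights_shared: bool
    training_data_shared: bool
    training_code_shared: bool
    intended_use: str
    ethical_considerations: str


def openness_label(release):
    """Describe how open the release is, based on what is shared."""
    if (release.weights_shared and release.training_data_shared
            and release.training_code_shared):
        return "weights, training data, and training code shared"
    if release.weights_shared:
        return "open weights only"
    return "weights not shared"


def render_model_card(release):
    """Render a short plain-text model card for the release."""
    lines = [
        f"Model card: {release.name}",
        f"License: {release.license_id}",
        f"Openness: {openness_label(release)}",
        f"Intended use: {release.intended_use}",
        f"Ethical considerations: {release.ethical_considerations}",
    ]
    return "\n".join(lines)


# Invented example: weights are published, data and code are not.
release = ModelRelease(
    name="example-model",
    license_id="apache-2.0",
    weights_shared=True,
    training_data_shared=False,
    training_code_shared=False,
    intended_use="Research on text classification.",
    ethical_considerations="Not evaluated for use on personal data.",
)
print(render_model_card(release))
```

Stating the openness level explicitly in the card is one way to avoid describing an open-weights release as open source.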

What about ethics? What about liability?

As AI models (e.g. the deep network weights) and AI systems (the model with all the software and infrastructure to query it) become more available, the EU AI Act can place legal (and ethical!) requirements on the developer of the AI model/system. In general researchers do not need to worry. However, if a research-purpose AI model/system could be used for something harmful, one should consider, ethically if not legally, whether such a model/system should be implemented at all.

What about the training data inside the model? Large models can memorize and unintentionally reveal parts of their training data. This raises concerns about copyright, trade secrets, and personal data. News publishers and artists are suing AI companies for unauthorized use of their content in training. It is still unclear how traditional data licenses can apply to data that has been transformed into model weights.

More resources


Further reading