RDA 9th Plenary Posters from InfoLab (MCR + JRS)


Application MCR

Name & surname

Cristina Ribeiro
(in poster: MCR, JAC, YK, EMF, JRS)

Affiliation

INESC TEC—Engineering Faculty, University of Porto

Email address

mcr@fe.up.pt

RDA member ID

Cristina Ribeiro

Country

Portugal

Poster Title

Data Publication at the University of Porto with Dendro, CKAN and EUDAT B2Share

Abstract

A large research university spans multiple domains, each with its own model for how researchers deal with data generated or used in science. Some areas have international e-infrastructure initiatives that support data management, namely with resources for data storage, access and long-term preservation. This is usually not the case for groups in the long tail, which have scarce resources but valuable and sometimes unique data. We define processes and toolsets for data management by researchers at the University of Porto. The strategy is based on providing researchers with 1) a platform for data organisation and description within project groups (Dendro), 2) support for defining the metadata models that suit their domains, and 3) interfaces with data repository platforms as preservation infrastructures and outlets for complete datasets.

At the University of Porto we are experimenting with a toolset comprising Dendro, an open-source data description platform developed in house (https://github.com/feup-infolab/dendro), the B2Share tool, part of the EUDAT Collaborative Data Infrastructure, and a CKAN instance for data deposit within the University. Several data deposit modes are proposed, depending on the readiness of the data, the availability of domain-specific metadata vocabularies, and the sensitivity of the data. We illustrate three cases in different domains, each using the tools differently. The first case uses only the CKAN repository, where datasets are recorded and described with the help of a data curator. In the second case the researcher describes the data in Dendro, which then uploads the datasets and their associated metadata to B2Share. The third case is similar to the second, but the datasets stay within university boundaries and are deposited in the CKAN repository.
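The three cases above can be summarised as a small lookup, sketched here in Python. This is only an illustration: the names `Repository`, `DepositMode`, `MODES` and `choose_mode` are hypothetical, and the two yes/no questions used to pick a case are a simplifying assumption rather than the actual criteria applied at U.Porto.

```python
from dataclasses import dataclass
from enum import Enum


class Repository(Enum):
    CKAN = "CKAN instance at U.Porto"
    B2SHARE = "EUDAT B2Share"


@dataclass(frozen=True)
class DepositMode:
    described_by: str       # who or what produces the metadata
    target: Repository      # where the dataset is deposited


# The three cases from the abstract, keyed by case number.
MODES = {
    1: DepositMode("data curator, directly in CKAN", Repository.CKAN),
    2: DepositMode("researcher, in Dendro", Repository.B2SHARE),
    3: DepositMode("researcher, in Dendro", Repository.CKAN),
}


def choose_mode(described_in_dendro: bool, must_stay_internal: bool) -> int:
    """Pick a case number (assumed decision rule, for illustration only)."""
    if not described_in_dendro:
        return 1  # curator-assisted deposit in CKAN
    # Dendro-described data goes to CKAN when it must stay inside the university.
    return 3 if must_stay_internal else 2


if __name__ == "__main__":
    case = choose_mode(described_in_dendro=True, must_stay_internal=False)
    print(case, MODES[case])
```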

Scientific Discipline / Research Area

Research data management / Data repositories, data publication workflows, data publication platforms

The approach to data publication using data description and domain-dependent metadata models is currently addressed in several RDA WGs and IGs. Work by the team at the University of Porto will contribute to the RDA Plenary 9 along 3 lines:
- At the EUDAT pre-plenary workshop we collaborate with the EUDAT Semantic Working Group to improve the discoverability and the interoperability of multi-disciplinary resources;
- At the RDA Iberia pre-plenary workshop we discuss the mission and operational view of national and regional RDA groups;
- In the Repository Platforms for Research Data IG we present the current work on data repositories and data description at U.Porto.

This poster reports work related to other WGs and IGs that we follow and contribute to, namely the “Long tail of research data IG”, the “Metadata Standards Directory WG”, the “Metadata Standards Catalog WG”, and the “Research Data Provenance IG”.

Attachment

(no file yet)


Application JRS

Name & surname

João Rocha da Silva
(in poster: JRS, NP, Will, Bruno, MCR)

Affiliation

INESC TEC—Engineering Faculty, University of Porto

Email address

joaorosilva@gmail.com

RDA member ID

João Rocha da Silva

Country

Portugal

Poster Title

The Dendro data management platform: Bringing data to life even before the repository

Abstract

Research data management should start well before the end of the research workflow, ideally from the moment of data creation. Research teams can change over the course of a project, and there is no guarantee that at the later stages it will still be possible to describe the production context of the datasets well enough to allow their reuse. Dendro is an open-source research data management platform under development since 2014 at the FEUP Information Systems Laboratory, aimed at assisting researchers in the collaborative storage, description and sharing of their datasets. From a preservation point of view, its data model is built entirely on ontologies (for interoperability), it is fully open-source (including its dependencies), and it implements standards for some common operations (for example, OAI-PMH for exposing metadata and BagIt for file bundling). It also integrates with the most widely used repository platforms (EUDAT's B2Share, CKAN, Zenodo, Figshare, DSpace, etc.). From a metadata quality point of view, Dendro helps researchers combine domain-specific and generic descriptors when describing their data, in order to yield higher-quality descriptions.
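As an illustration of what BagIt file bundling involves, the following Python sketch writes a minimal bag by hand, using only the standard library. It is not taken from Dendro's code; the function name `make_bag` and the example file name are hypothetical, and the layout follows the BagIt 1.0 structure (a `bagit.txt` declaration, a `data/` payload directory and a checksum manifest).

```python
import hashlib
from pathlib import Path
from typing import Mapping


def make_bag(bag_dir: Path, files: Mapping[str, bytes]) -> None:
    """Write a minimal BagIt-style bag: declaration, payload and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        # Payload files live under data/.
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest format: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    declaration = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    (bag_dir / "bagit.txt").write_bytes(declaration.encode("utf-8"))
    manifest = "\n".join(manifest_lines) + "\n"
    (bag_dir / "manifest-sha256.txt").write_bytes(manifest.encode("utf-8"))


if __name__ == "__main__":
    make_bag(Path("example_bag"), {"readings.csv": b"a,b\n1,2\n"})
```

A receiving repository can recompute the checksums listed in the manifest to confirm that the payload arrived intact.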

We are currently implementing several Dendro instances as part of the data management workflow of U.Porto, INESC TEC and InBIO, as part of our ongoing TAIL project, which targets the long tail of science. Our goal is to provide an efficient and agile workflow that enables our researchers to deposit data quickly and with high-quality metadata that allows their reuse and the reproducibility of their results. Ideally, we want to accelerate this process enough that researchers who wish to can include persistent references to their datasets in the camera-ready versions of their papers.

Dendro and its installation scripts for all major operating systems are publicly available on GitHub: https://github.com/feup-infolab/dendro and https://github.com/feup-infolab/dendro-install.

Scientific Discipline / Research Area

Research data management / Data repositories, data management platforms, linked open data, open source tools

Dendro is an open-source platform dedicated to research data management. At the RDA Plenary, we hope to establish contacts and find collaboration opportunities, either with research groups willing to test Dendro or with developers interested in helping to improve this open-source platform. From experience we know that the more users we contact, the more challenging and interesting issues we gather to help steer the development of Dendro. We need these inputs, and we believe that we can also give back to the RDA by publishing our proposed solutions to these problems.

Coming up with integration scenarios will yield interesting lessons learned not just for our research group but for the entire data management community, because the problem of managing data before the deposit in a repository has been largely under-explored.

Attachment

(no file yet)