Knocking on data repositories’ doors - How to build an integrated search index for social and economic data
Wednesday, September 28, 2016 - 14:00
Information infrastructure projects and initiatives aspire to develop a common culture and practice of data sharing. Achieving this goal depends largely on internationally compatible infrastructures that facilitate sustainable data references as well as integrated search and retrieval capabilities within research data. The proposed presentation will introduce the approach of the German project da|raSearchNet. da|raSearchNet aims at establishing an integrated search network for social and economic research data that enables users to search up-to-date references to data holdings conveniently and on an international basis. The point of departure and core of the network is the database of the da|ra registration agency for social and economic data (www.da-ra.de/en), which already includes searchable metadata from registered data publishers, among them the considerable holdings of the German GESIS data archive and the US-based ICPSR. The content of the database is currently being expanded significantly through continuing registration activities and by harvesting data references from relevant international data providers. To integrate metadata from multiple sources in an automated and efficient way, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is used. da|raSearchNet harvests metadata records from several data providers around the world, generates a centralized index based on da|ra quality measures, and makes it searchable via a search interface.
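The harvesting step described above can be illustrated with a minimal Python sketch. It is not the da|raSearchNet implementation: the function names, the base URL `https://example.org/oai`, and the sample response are hypothetical, and the sketch assumes the provider exposes unqualified Dublin Core (`oai_dc`). It shows how a `ListRecords` request is built and how records and the resumption token (used for paging) can be read from a response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords URL; a resumption token replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        header = rec.find(OAI + "header")
        # Skip records the provider has flagged as deleted.
        if header is None or header.get("status") == "deleted":
            continue
        records.append({
            "identifier": header.findtext(OAI + "identifier"),
            "datestamp": header.findtext(OAI + "datestamp"),
            "titles": [e.text for e in rec.iter(DC + "title") if e.text],
            "rights": [e.text for e in rec.iter(DC + "rights") if e.text],
        })
    token_el = root.find(".//" + OAI + "resumptionToken")
    has_token = token_el is not None and token_el.text
    return records, (token_el.text.strip() if has_token else None)


# Hypothetical sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2016-09-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:2</identifier>
        <datestamp>2016-09-02</datestamp>
      </header>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", token=token)
```

A harvester would fetch `next_url`, parse the response in the same way, and repeat until no resumption token is returned.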
The presentation will focus on the experiences and challenges of harvesting metadata via OAI-PMH, assessing the quality of the metadata, and creating the search index. Besides describing the tools applied (Elasticsearch, Kibana, etc.), it will discuss issues such as the criteria for selecting data sources, the harvesting policy, the capabilities of the OAI-PMH services, and the quality, formats, and licenses of the provided metadata. First results of the harvesting and indexing process will be shown in a beta version of the da|raSearchNet interface.
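As a rough illustration of how a metadata quality check might feed the indexing step, the Python sketch below filters harvested records and serializes the remainder in the newline-delimited JSON format expected by the Elasticsearch bulk API. The abstract does not specify the da|ra quality measures, so the list of required fields, the index name `research-data`, and the function names are assumptions made purely for this example.

```python
import json

# Assumed minimal completeness rule; the actual da|ra quality measures
# are not described in this abstract.
REQUIRED_FIELDS = ("identifier", "title", "creator", "date")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def bulk_payload(records, index="research-data"):
    """Serialize complete records as an Elasticsearch bulk request body.

    Each document is preceded by an action line; the body ends with a newline.
    """
    lines = []
    for rec in records:
        if missing_fields(rec):
            continue  # incomplete records are left out of the index
        action = {"index": {"_index": index, "_id": rec["identifier"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical harvested records, for illustration only.
harvested = [
    {"identifier": "oai:example.org:1", "title": "Example survey",
     "creator": "Example Institute", "date": "2016"},
    {"identifier": "oai:example.org:2", "title": "", "creator": "", "date": ""},
]
payload = bulk_payload(harvested)
```

The resulting `payload` string could be sent to the `_bulk` endpoint of an Elasticsearch cluster; records that fail the check could instead be logged for review with the data provider.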
The outcome will be a report/paper, which is planned to be made available to the RDA Working Group “Research Data Repository Interoperability” (https://rd-alliance.org/group/research-data-repository-interoperability-...).
Target Audience:
- Data providers and operators of data repositories, especially repositories that expose structured metadata via OAI-PMH
- Service providers that make OAI-PMH service requests to harvest metadata
- Data Managers
- Data Archivists
Benefits for Audience:
- Learn about a new service (da|raSearchNet) and share experiences
- Best practices on a technical level (OAI-PMH and Elasticsearch)
- Quality assurance when selecting data sources
- Standardization and interoperability of research data repository platforms
Topic 4: Working with data
|Brigitte Hausstein||GESIS Leibniz Institute for the Social Sciences||http://www.da-ra.de|