Enabling direct data access to social science research data within the GESIS Data Catalogue DBK
Friday, September 30, 2016 - 09:00
This talk presents the idea of giving researchers direct, machine-actionable access to social science research data from the GESIS Data Catalogue DBK by using the open source system The Data Tank.
Currently, researchers can access more than 5600 social science research datasets, mostly surveys, by registering an account at http://dbk.gesis.org/dbksearch/. Depending on the access conditions, most of the data can be downloaded immediately or accessed after a shopping cart order has been processed. Due to privacy regulations, some data require special user agreements or are only available under certain security provisions (e.g. on-site usage). The predominant data formats currently provided are SPSS and Stata; others may be obtained on request.
This setting fits the traditional social science workflow well, but may not suit more advanced research settings today. Researchers increasingly need more diverse data formats, have to apply interdisciplinary analysis methods, link data from different sources, and work with data that change rapidly. Some of these challenges could be addressed by providing an online API that lets researchers access the data directly in different formats. One example is the open source data management system The Data Tank (http://thedatatank.com/). With this technology it is possible to publish data through a RESTful API, so that data users can either access the data programmatically or download specific pre-configured formats (e.g. XML, HTML, JSON, or CSV). The Data Tank software also comes with some support for data visualization and for semantic web standards such as RDF. By attaching this technology to a user library within the GESIS Data Catalogue DBK, most of the datasets would become available in more formats, for direct online processing, and in an always up-to-date version. Naturally, the current access restrictions due to privacy regulations would have to be respected within this service.
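To illustrate what programmatic access in several formats could look like, here is a minimal Python sketch. It assumes that a resource is addressed as collection/resource followed by a format extension; this pattern, the host datatank.example.org, and the function names are illustrative assumptions, not a confirmed GESIS or Data Tank interface. The sketch only builds URLs and parses response bodies, so it runs without network access.

```python
import csv
import io
import json
from urllib.parse import quote

# Hypothetical base URL of a Data Tank instance; not a real GESIS endpoint.
BASE_URL = "https://datatank.example.org"

# Formats named in the text as pre-configured download options.
SUPPORTED_FORMATS = {"xml", "html", "json", "csv"}


def resource_url(collection: str, resource: str, fmt: str = "json") -> str:
    """Build the URL of a published resource in the requested format.

    The collection/resource.<format> layout is an assumption made for
    this sketch and should be checked against the deployed service.
    """
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    path = "/".join(quote(part, safe="") for part in (collection, resource))
    return f"{BASE_URL}/{path}.{fmt}"


def parse_payload(body: str, fmt: str) -> list:
    """Turn a CSV or JSON response body into a list of row dictionaries."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(body)))
    if fmt == "json":
        data = json.loads(body)
        # Wrap a single JSON object so callers always receive a list.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"no parser for format: {fmt}")
```

A client would fetch resource_url("surveys", "study", "csv") with any HTTP library and pass the response text to parse_payload. Access restrictions would still need to be enforced on the server, for example through authenticated requests.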
This approach would follow the existing provision of metadata in different formats via the OAI-PMH protocol (https://dbk.gesis.org/dbkoai/?verb=Identify). This service delivers the metadata in DDI-Codebook, DDI-Lifecycle, Dublin Core, and DCAT formats. A similar interface for the data themselves, as described above, seems appropriate. For researchers, this personal data library within the DBK would be a seamless access point for both manual and automated data access. Furthermore, it could be extended to enable direct data sharing as well. Currently, GESIS offers this functionality as a separate service, datorium (https://datorium.gesis.org). It uses DOI names as persistent identifiers for citing data and syntax files, making it easier for researchers to share their work and get credit for it. Integrating those functions into a personal data library for direct data access within the DBK could address the needs of social scientists for additional data formats and data linking possibilities in a rapidly evolving data world.
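The existing OAI-PMH metadata service can already be queried in this machine-actionable way. The following Python sketch builds request URLs for the endpoint named above and extracts the metadata prefixes from a ListMetadataFormats response. The sample response is hand-written for illustration: oai_dc is the standard Dublin Core prefix in OAI-PMH, while oai_ddi is an assumed placeholder, and the prefixes actually offered by the DBK should be taken from the live service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH endpoint named in the text.
OAI_ENDPOINT = "https://dbk.gesis.org/dbkoai/"
# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_request_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL, e.g. verb=Identify or verb=ListRecords."""
    params = {"verb": verb, **arguments}
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


def metadata_prefixes(response_xml: str) -> list:
    """Extract the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:metadataPrefix", OAI_NS)]


# Hand-written sample response; the second prefix is a made-up placeholder.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ddi</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""
```

Calling oai_request_url("ListRecords", metadataPrefix="oai_dc") yields a URL that a harvester could fetch to retrieve Dublin Core records. A data interface modelled on this pattern would let the same client code move from metadata to the data.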