The UK Data Service’s Open Data Platform: enabling access to new and novel forms of data
Thursday, September 29, 2016 - 16:00
Theme: Challenges facing users and service providers
As part of a capacity-building programme to enable access to new and novel forms of data, the UK Data Service has invested in an ‘open data platform’ (ODP) for big data. This is built using the Hortonworks Data Platform 2.4 (see Figure 1). We have commissioned Hortonworks to help upskill the developer and data teams in the Service, so that they can deliver enterprise data services using open technology that is already established in the industrial and commercial sectors. This ensures that the UK Data Service is well placed to contribute value and impact for the academic, business, third and government sectors. Hortonworks supports the UK Data Service in constructing and launching the ODP, while the UK Data Service Big Data Team is being upskilled not just to operate and maintain the ODP going forward, but to develop and support it as a best-practice technical model for big data social science, which other institutions can implement using an entirely open source platform.
The ODP promotes big data technologies based on open source software from the Apache Hadoop ecosystem and optimises testing among and across the ecosystem’s vendors, e.g. the Hortonworks Data Platform 2.4, IBM Open Platform 4.0 with Apache Hadoop, and Pivotal HD 3.0. The UK Data Service ODP will work directly with specific Apache projects, adhering to the Apache Software Foundation (ASF) guidelines for the contribution of ideas and code built on the Hortonworks Data Platform 2.2. A key benefit of the ODP will be that team members can collaborate on various Apache projects, as well as other open big data projects, with the goal of meeting enterprise-class requirements across other UK Research Council Big Data investments. The UK Data Service ODP is also intended to promote a set of standard open source technologies and services that will increase compatibility among big data solutions for the social sciences, and simplify the process for applications and tools to integrate with and run on any compliant system.
Figure 1: The Hortonworks Data Platform
This flagship investment will allow researchers to access and analyse data in an environment which meets their needs and, more importantly for us, will allow us to provide a data access point for all these new and novel forms of data. We will be able to service many shapes of data. Key to this is the use of HBase to flatten data into data granules and generate the supporting RDF metadata. We are working on being able to scale using microdata, aggregate data and qualitative data.
Figure 2: HBase (Data) and RDF (Metadata)
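The idea of flattening data into granules with supporting RDF metadata can be illustrated with a minimal sketch. The abstract does not describe the actual row-key design, column layout or metadata vocabulary used in the ODP, so everything here is an assumption made for illustration: the `flatten` and `describe` functions, the `d:` column prefix, the zero-padded row key, and the `http://example.org/odp/` namespace are all hypothetical. The sketch uses only the Python standard library and does not connect to HBase; it simply shows one cell value per granule and a few N-Triples lines describing the dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical namespace, for illustration only (not a real UK Data Service URI).
BASE = "http://example.org/odp/"

# A granule is modelled here as (row key, column, value), loosely mirroring
# how a wide-column store such as HBase addresses a single cell.
Granule = Tuple[str, str, str]


def flatten(dataset_id: str, records: List[Dict[str, object]]) -> List[Granule]:
    """Flatten tabular records into one granule per cell value."""
    granules: List[Granule] = []
    for index, record in enumerate(records):
        # Assumed row-key scheme: dataset identifier plus zero-padded row index.
        row_key = f"{dataset_id}:{index:08d}"
        for variable, value in sorted(record.items()):
            granules.append((row_key, f"d:{variable}", str(value)))
    return granules


def describe(dataset_id: str, records: List[Dict[str, object]]) -> List[str]:
    """Emit N-Triples lines describing the dataset and its variables."""
    subject = f"<{BASE}dataset/{dataset_id}>"
    variables = sorted({name for record in records for name in record})
    triples = [
        f'{subject} <http://purl.org/dc/terms/identifier> "{dataset_id}" .'
    ]
    for name in variables:
        # hasVariable is a made-up predicate in the hypothetical namespace.
        triples.append(
            f"{subject} <{BASE}vocab/hasVariable> <{BASE}variable/{name}> ."
        )
    return triples


# Example usage with two invented microdata records.
records = [{"age": 34, "region": "Essex"}, {"age": 51, "region": "Kent"}]
granules = flatten("survey1", records)
metadata = describe("survey1", records)

for granule in granules:
    print(granule)
for line in metadata:
    print(line)
```

Keeping the data cells and the descriptive triples separate, as in this sketch, is one way to let the same metadata layer describe microdata, aggregate data and qualitative data, although the schema actually used by the Service may differ.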
The UK Data Service is investing in the big data service model to deliver large volumes of open data, implement our Secure Lab principles for sensitive data, and explore data models that can scale for complex social science concepts. The initial part of this capacity-building process is to develop a hybrid Open Data Platform approach using both cloud (tracking to open data) and on-premises solutions (tracking to secure data, ISO 27001 and Safe Rooms). Through 2016, by running this hybrid model, we will understand how to resource, optimise and deliver a complete big data service model for the social sciences. This service also includes dashboards built off Hive and Zeppelin.
This paper contributes to the sharing of knowledge about the ongoing and operational development of an ODP to serve research data in the social sciences domain. Recommendations on configuration of the ODP and tools for dealing with complex data types will be shared.
Authors: Darren Bell and Nathan Cunningham, UK Data Archive, University of Essex
Darren Bell is involved in improving the effectiveness, efficiency and agility of data within the UK Data Service. He is responsible for a cohesive data framework providing a structured collection of processes, techniques, artefact descriptions, reference models and guidance for the production and use of the UK Data Service architecture description.
Nathan Cunningham is Functional Director of Big Data with strategic and operational responsibility for the UK Data Service support for the ESRC's Big Data Network. He leads a team of technologists and data scientists providing strategic and technical oversight of the UK Data Service's Open Data Platform (ODP).
Target Audience:
Data repositories and research tool providers dealing with social science digital data collections, and those wishing to exploit platforms for archiving, sharing, and enabling the use and linking of new and novel forms of data.
Benefits for Audience:
Detailed information on Hortonworks ODP configuration, workflows, underlying metadata schemas and tools specifically designed for research data