The Australian National Research Data Lifecycle Framework
Thursday, September 29, 2016 - 14:00
The short story:
The Data LifeCycle Framework (DLCF) is a nationwide effort to connect eResearch resources and services so that researchers can make use of existing national, state-based, local, and commercial eResearch tools, including storage, compute, identity, and networks. Three national eResearch projects (Research Data Services (RDS), National eResearch Collaboration Tools and Resources (Nectar), and the Australian National Data Service (ANDS)) have partnered with Australia's national research network provider (AARNet) and identity federation (the Australian Access Federation) to define a pathway and scaffold for linking together the immensely wide range of services they supply to the research community.
The DLCF aims, eventually, to connect critical elements of the data journey, from grant approval through to project finalisation, results publication, and archiving, while leveraging existing eResearch investment to provide a flexible and dynamic national framework to support research.
A shared information model capturing information throughout the data lifecycle will enable institutions to locate, process, and reuse data effectively, thereby reducing the institutional risk associated with being unable to locate data sources.
Institutions will have the flexibility to use elements of the framework that are common across institutions while continuing to maintain their internal business processes and procedures if desired. There will be the opportunity to positively impact the direction of ongoing framework development through continuous feedback, together with the option to opt-in or opt-out of components of the framework.
The DLCF isn't a national Data Management Plan, and it doesn't address the complete research data lifecycle, but it does lay the foundations for a national view and understanding of research data: its location, relevant uses, owners, custodians, and provenance. It aims to facilitate institutional, organisational, and potentially broader DMPs. With our colleagues at ANDS, Nectar, AAF, and AARNet, we've broken off a small part of the question for closer analysis and action: defining the APIs and metadata that enable the connection of institutional Data Management Plans, in what is called the DMP Connector program.
We're excited that one of our collaborators on the project, Guido Aben of AARNet, will also be speaking about this program. Where we will be talking about the "What" and the "Why", Guido and his team are tackling the "How": the operational aspects of the proposed initial system, Data Management Plan Connectors, specifically around the AARNet CloudStor service and the substantial contribution it offers towards creating a ubiquitous, flexible, and accessible file storage environment.
The DLCF intends to provide the connecting infrastructure between existing local data management processes and policies and the wide array of national, local, state-based, and commercial infrastructures available to researchers.
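As an illustration only, since the DMP Connector APIs and metadata were still being defined at the time of writing, a connector might pass a minimal project record from an institutional Data Management Plan to a storage provider along these lines. The `DMPRecord` class, its field names, and the `provision_storage` stub are all hypothetical, not part of any published DLCF specification.

```python
from dataclasses import dataclass, field

@dataclass
class DMPRecord:
    """Hypothetical sketch of a minimal record a DMP Connector
    might pass between an institutional DMP and a storage service."""
    project_id: str          # unique project identifier, e.g. from grant approval
    owner: str               # nominated primary contact ("owner") of the data
    custodian: str           # organisation acting as custodian
    storage_quota_gb: int    # requested storage allocation
    collaborators: list = field(default_factory=list)

def provision_storage(record: DMPRecord) -> dict:
    """Sketch: a storage provider consumes the record and returns an
    allocation keyed by the unique project identifier. The endpoint
    URL is invented for illustration."""
    return {
        "project_id": record.project_id,
        "endpoint": f"https://storage.example.edu.au/{record.project_id}",
        "quota_gb": record.storage_quota_gb,
        "access": [record.owner] + record.collaborators,
    }

record = DMPRecord(project_id="ARC-2016-0042",
                   owner="researcher@uni.example.edu.au",
                   custodian="Example University",
                   storage_quota_gb=500)
allocation = provision_storage(record)
```

The point of the sketch is the shape of the exchange, not the fields themselves: storage is provisioned from the project identifier, and access rights follow the nominated owner, as described in the outcomes below.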
The big picture
The research data lifecycle is an immensely broad field of consideration. The slide deck below outlines some of the considerations in play during the scoping phases of this body of work, identifying the component parts which are currently in use and available and how they might be connected. It includes a range of examples of connecting varied infrastructures to provide the ability to describe the lifecycle of research data.
This talk is aimed at research data management policy makers, organisational and national infrastructure providers across data generation, storage, manipulation, and management, as well as organisations aiming to raise their communities' awareness of and ability to exploit data management tools and processes.
Benefits for Audience:
This talk represents a perspective on the Australian academic research community's current thinking around a national approach to research data management; where we are aiming to provide uplift to the vast pools of un-managed data existing within organisations while, at the same time, avoiding negative impact on existing data processing workflows. Attendees will be provided with an overview of experiences, concepts, tools, and the discussions being held around the experience of layering a generalised national framework across many diverse and varied local and national communities.
- Topic 2: Services enabling research
- Topic 3: A changing environment, changing research
- Topic 4: Working with data
Outputs and Outcomes
Through an architecture able to perform across geographically separated locations the following outcomes are targeted:
• Automated provisioning of high-performance, durable, and reliable storage with access and authorisation rights based on unique project identifiers
• Ability for nominated primary contact (“owner”) of project data to share and control access to this space
• Automated and integrated workflow and/or quality assured data ingest processes, including generation of searchable metadata, from instruments, people, and other sources
• Immediate visibility of the storage to sector and public cloud resources, and simplified shipping of data to, and management of data at, national peak or local high-performance facilities or local stores across multiple locations
• A comprehensive audit trail of events performed on datasets, recorded in a searchable metadata repository
• Upon completion of use of data, the facility to ‘package up’ the dataset(s) and ship them to one or more repositories while registering metadata with one or more data directories, including the Australian National Data Service’s Research Data Australia index
• Flexible data interrogation, discovery, rediscovery, and recovery via RDA and other directories.
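The audit-trail and packaging outcomes above can be pictured with a small sketch. Nothing here reflects an actual DLCF or CloudStor interface; the event-log structure and the `record_event` and `package_dataset` helpers are invented for illustration under the assumption that every action on a dataset is logged and that end-of-use packaging produces a checksummed manifest suitable for registering with a directory such as Research Data Australia.

```python
import hashlib
from datetime import datetime, timezone

audit_trail = []  # stands in for a searchable metadata repository

def record_event(dataset_id: str, action: str, actor: str) -> None:
    """Append one auditable event for a dataset."""
    audit_trail.append({
        "dataset": dataset_id,
        "action": action,
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def package_dataset(dataset_id: str, files: dict) -> dict:
    """Sketch of 'packaging up' a dataset at end of use: checksum the
    contents and emit a manifest that could accompany the data to a
    repository and be registered with a data directory."""
    manifest = {
        "dataset": dataset_id,
        "files": {name: hashlib.sha256(data).hexdigest()
                  for name, data in files.items()},
        "prior_events": len(audit_trail),
    }
    record_event(dataset_id, "packaged", "system")
    return manifest

record_event("ds-001", "ingested", "instrument-07")
manifest = package_dataset("ds-001", {"results.csv": b"a,b\n1,2\n"})
```

The design choice being illustrated is that provenance falls out of the event log: because ingest, processing, and packaging all pass through `record_event`, the lifecycle of the dataset can later be interrogated from the metadata repository alone.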
The pilot program, with Australia’s academic network provider, AARNet, provides the foundational functional and technical designs upon which further workflows and services can be built, based on AARNet’s ownCloud implementation, CloudStor. The core outcomes for this body of work, which are applicable across our current and potential future architectures, include:
• Simpler access for a wider audience of researchers to nationally funded and local storage resources
• Implementation and refinement of an extensible research data metadata schema based on international best practice
• Improved flexibility and utility of national and local shared and private storage resources for collaboration, engagement, interrogation, and re-use
• Improved data provenance through improved life-cycle tracking
• Evaluation of, and blueprint designs for, ingest and metadata harvesting tools
• Improved utility of the data through simplified access to sector and public cloud compute environments
• Improved durability and re-use of data through tracking of archiving to trusted repositories
• Development of enhanced federated identity services across infrastructures and services
The target output is a flexible, end-to-end research data management solution spanning multiple user, organisation, and domain groups and multiple geographical locations. Rather than a processing platform, this is a data library framework, flexible in performance, interrogation, and sharing, which can then feed into workflow-driven analysis, processing, and access platforms.
Ian Duncan, Research Data Australia