DIGITAL INFRASTRUCTURES for RESEARCH 2018 | Serving the user base

The INDIGO-DataCloud Service portfolio and scientific usage scenarios

Date: 
Thursday, September 29, 2016 - 16:00

Overview:

The session focuses on the achievements of the INDIGO-DataCloud project and on their applicability to concrete implementations in scientific domains, e-infrastructures and resource providers. The INDIGO catalog of the MidnightBlue software release is available at:

https://www.indigo-datacloud.eu/news/indigo-service-catalogue-online

Cloud computing has evolved quickly in the past few years in both the public and private sectors. However, there are numerous areas of interest to scientific communities where cloud computing uptake is still lacking, because of poor integration with scientific applications and because of complex workflows and constraints of cloud-provided services.

In this context, INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation - www.indigo-datacloud.eu), a project funded under the Horizon 2020 framework program of the European Union, is developing a data and computing platform targeted at scientific communities, deployable on multiple hardware platforms, and provisioned over hybrid e-Infrastructures.

The INDIGO-DataCloud platform features contributions from leading European distributed resource providers and developers, and engages users from various Virtual Research Communities (VRCs). It is based on open source solutions and addresses scientific challenges in Grid, Cloud, HPC and local infrastructures and, in the case of Cloud platforms, provides IaaS, PaaS and SaaS solutions.

In particular, the PaaS layer aims at supporting geographic brokering and deployments across multiple sites. The PaaS computing core includes services for the deployment and management of jobs, long-running services and virtual infrastructures across multiple Cloud sites based on popular open-source Cloud Management Platforms (CMPs), namely OpenNebula and OpenStack. These services are coherently integrated with virtualized storage, federated AAI and enhanced networking.

A key aspect of the platform is that, at the site level, operations leverage scalable and interoperable solutions with minimal impact on the sites. For example, the deployment of customized virtual infrastructures is achieved at the site level by means of the IM in the case of OpenNebula, and Heat in the case of OpenStack. A common language is used to define the deployments, by adopting and extending the TOSCA Simple Profile in YAML version 1.0 standard. The Orchestrator service, the entry point for the INDIGO-DataCloud PaaS, receives a TOSCA description that flows through the PaaS layer, interacting with other services (Monitoring, Brokering, QoS/SLA, etc.), and is finally processed by either IM or Heat at the site level to enact the required virtual infrastructure.
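The last step of this flow, handing the TOSCA description to IM or Heat depending on the site's CMP, can be illustrated with a minimal Python sketch. It is not the Orchestrator's actual code: the `Site` class, the `SITE_ENGINE` table and the `dispatch` function are hypothetical names introduced here, and the interaction with Monitoring, Brokering and QoS/SLA services is left out.

```python
from dataclasses import dataclass

# Assumption: a plain lookup table stands in for the real routing logic.
# The pairing itself (IM for OpenNebula, Heat for OpenStack) comes from the text.
SITE_ENGINE = {"opennebula": "IM", "openstack": "Heat"}


@dataclass
class Site:
    name: str
    cmp: str  # Cloud Management Platform running at the site


def dispatch(tosca_template: str, site: Site) -> dict:
    """Return a record of which site-level engine would process the template."""
    engine = SITE_ENGINE.get(site.cmp.lower())
    if engine is None:
        raise ValueError(f"unsupported CMP: {site.cmp!r}")
    return {"site": site.name, "engine": engine, "template": tosca_template}


if __name__ == "__main__":
    # Placeholder template text; a real request would carry a full TOSCA document.
    template = "tosca_definitions_version: tosca_simple_yaml_1_0"
    for site in (Site("site-a", "OpenNebula"), Site("site-b", "OpenStack")):
        print(dispatch(template, site)["engine"])
```

The same template is passed unchanged to either engine, which is the point of adopting a common description language across sites.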

With respect to data management, the main goal of INDIGO-DataCloud is to provide users with a unified interface for management of their data in the form of a globally accessible distributed file system. Users will be able to access and manage private as well as research data in the same way, whether accessing them from their laptop or from virtual machines running in the cloud.

Another important aspect of the INDIGO platform is computation scheduling, which can be a major issue for data-intensive scientific computations. INDIGO addresses this by providing detailed information about the distribution of files and datasets. This information enables the INDIGO scheduling components to decide where to run computational tasks by weighing, on the one hand, the penalty of staging data to a site where it is not available and, on the other hand, the compute power available at the sites where the data is already replicated. In addition to information about file replica distribution, the platform provides detailed monitoring metrics, which are collected for each user, each space and each storage provider in order to choose the best sites to deploy services or applications.
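The trade-off described above can be sketched as a simple cost comparison. The following Python fragment is only an illustration under stated assumptions: the `SiteInfo` fields, the cost model (staging time plus work divided by free cores) and the function names are hypothetical and do not describe the actual INDIGO scheduling components.

```python
from dataclasses import dataclass


@dataclass
class SiteInfo:
    name: str
    has_replica: bool       # is the input dataset already stored at this site?
    free_cores: int         # compute power currently available
    staging_seconds: float  # assumed time to copy the dataset to this site


def estimated_completion(site: SiteInfo, core_seconds: float) -> float:
    """Assumed cost model: staging penalty (if any) plus ideal parallel run time."""
    if site.free_cores <= 0:
        return float("inf")
    staging = 0.0 if site.has_replica else site.staging_seconds
    return staging + core_seconds / site.free_cores


def choose_site(sites: list, core_seconds: float) -> SiteInfo:
    """Pick the site with the lowest estimated completion time."""
    return min(sites, key=lambda s: estimated_completion(s, core_seconds))


if __name__ == "__main__":
    sites = [
        SiteInfo("data-site", has_replica=True, free_cores=10, staging_seconds=0.0),
        SiteInfo("big-site", has_replica=False, free_cores=100, staging_seconds=600.0),
    ]
    print(choose_site(sites, core_seconds=3_600).name)
    print(choose_site(sites, core_seconds=360_000).name)
```

Under this model a small task stays where the data already is, while a large task can justify the staging penalty of moving data to a site with more free cores.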

At the Cloud site level, the computing platform provided by INDIGO enhances the IaaS layers with additional features that are currently missing. Firstly, by introducing TOSCA support for the CMPs, it enables infrastructure orchestration at the site level, based on a common standardized language. Secondly, the adoption of Docker containers as first-class resources in the CMPs enables lightweight isolation among computing resources and easy integration with image repositories (e.g. Docker Hub). In fact, components such as OneDock, which introduces Docker support in OpenNebula, have already been released with significant outreach in the OpenNebula community. The scheduling algorithms for both OpenStack and OpenNebula are being improved, adding support for preemptible instances (making it possible to evacuate running workloads when higher-priority workloads need the resources) and queuing of requests (enabling users to perform HTC tasks on cloud resources). These two improvements provide a better experience for end users and a more efficient utilization of computational resources from the resource provider's standpoint. The adoption of a two-level orchestration mechanism (at the PaaS level and within each IaaS Cloud) provides a scalable approach to provisioning customized computing resources across Cloud sites to support the computational requirements identified by user communities.
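How preemptible instances and request queuing interact can be illustrated with a toy model. This Python sketch is not the OpenStack or OpenNebula scheduler: `ToySite`, `Instance` and the eviction policy (evict the most recently started preemptible instances first and put them back at the head of the queue) are assumptions made here for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare instances by identity, not by field values
class Instance:
    name: str
    cores: int
    preemptible: bool = False


@dataclass
class ToySite:
    capacity: int
    running: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)

    def free(self) -> int:
        return self.capacity - sum(i.cores for i in self.running)

    def submit(self, inst: Instance) -> str:
        """Start the instance if possible, otherwise queue it."""
        if self.free() < inst.cores and not inst.preemptible:
            # A normal request may evict preemptible instances to make room.
            victims = [i for i in reversed(self.running) if i.preemptible]
            if self.free() + sum(v.cores for v in victims) >= inst.cores:
                for v in victims:
                    if self.free() >= inst.cores:
                        break
                    self.running.remove(v)
                    self.queue.appendleft(v)  # evicted work waits to be rerun
        if self.free() >= inst.cores:
            self.running.append(inst)
            return "started"
        self.queue.append(inst)
        return "queued"


if __name__ == "__main__":
    site = ToySite(capacity=4)
    print(site.submit(Instance("batch-1", 2, preemptible=True)))
    print(site.submit(Instance("batch-2", 2, preemptible=True)))
    print(site.submit(Instance("batch-3", 2, preemptible=True)))
    print(site.submit(Instance("service", 2)))
    print([i.name for i in site.running], [i.name for i in site.queue])
```

In this model the site stays fully used by preemptible batch work until a normal request arrives, which is one way to read the efficiency argument made above.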

The talks proposed for this session will give an overview of the services that INDIGO provides at three levels: resource virtualization, service federation at the PaaS level, and the Scientific Gateway.

Both the INDIGO service portfolio and a few of the most common usage scenarios in which the services could be exploited will be described.

The last slot gives the audience the opportunity to explore which services are most relevant to them, and gives the project the possibility to better understand what can be done to improve the service portfolio according to the needs of the different relevant actors (end users, resource providers, infrastructure managers, or participants in other projects or initiatives).
 

AGENDA:

  • Introduction [Davide Salomoni] 10 min

  • Resource virtualization services (INDIGO development at the level of IaaS) [Patrick Fuhrmann] 15 min

  • Federation services (INDIGO development at the level of PaaS) [Giacinto Donvito] 15 min

  • INDIGO development in the context of the Scientific Gateway [Riccardo Bruno] 15 min

  • Scenarios for the deployment of INDIGO Services [Giacinto Donvito] 15 min

  • Q&A and user perspective 20 min

 

The output of the session will be:

  • a report about the status and the features of the released services,

  • a set of best practices on how to exploit the INDIGO services in concrete use cases.

 

The foreseen target Audience is:

  • Resource providers willing to implement advanced resource virtualization services that could enhance the flexibility and the features available at each individual site

  • Advanced users and application and service developers who could benefit from the services INDIGO provides in order to improve their applications, services or VREs

  • e-infrastructure managers, who could enrich the set of services and capabilities of their infrastructures that are available by default to the user communities working in a given e-infrastructure.

 

Benefits for Audience:

The audience will have the opportunity to learn about the features and the status of the services provided by the INDIGO-DataCloud project.

This session will not only provide technical details about the implementation of the services, but also show how to exploit them in a real production scenario.

The session will also give the opportunity to discuss how those services could be used in the context of the already available European e-infrastructure.