EUBra-BIGSEA: cloud services with QoS guarantees for Big Data Analytics
Thursday, September 29, 2016 - 11:30
EUBra-BIGSEA (Europe - Brazil Collaboration on Big Data Scientific Research Through Cloud-Centric Applications) is a collaboration project funded under the third Europe-Brazil coordinated call. It aims to develop advanced cloud services to support Big Data analytics.
The project addresses two main issues:
- Proactive, self-adaptive and intelligent resource allocation, estimating the right configuration of virtual resources for each specific algorithm family and programming model, with automatic dynamic reallocation of resources and a learning system that models the algorithms’ behavior.
- Multi-platform support for data analytics frameworks, OLAP database queries and generic programming models that exploit the inherent parallelism of the applications. These programming models are executed on top of the above platform.
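The learning system mentioned in the first item can be pictured as fitting a performance model from past executions. The following is only a minimal sketch, not the project's actual model: it assumes a hypothetical runtime law T(n) = a + b / n (a serial part plus a perfectly parallel part), and the function names and measurements are invented for illustration.

```python
def fit_runtime_model(observations):
    """Least-squares fit of T(n) = a + b / n from (cores, seconds) pairs.

    The model form is an assumption made for this sketch.
    """
    xs = [1.0 / cores for cores, _ in observations]
    ys = [seconds for _, seconds in observations]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need observations at two or more distinct core counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # parallelisable work (core-seconds)
    a = mean_y - b * mean_x  # serial part (seconds)
    return a, b


def predict_runtime(model, cores):
    """Predicted runtime in seconds for a given number of cores."""
    a, b = model
    return a + b / cores


# Hypothetical measurements of one algorithm family: (cores, runtime in seconds)
runs = [(1, 110.0), (2, 60.0), (4, 35.0), (8, 22.5)]
model = fit_runtime_model(runs)
```

A model of this kind lets an allocator ask "how long would this job take on n cores?" before any resources are provisioned, which is the information a proactive policy needs.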
The EUBra-BIGSEA cloud-based architecture will integrate:
- Container-based embedding of the application-specific software dependencies.
- Combination of Mesos and YARN schedulers to leverage the benefits of both schedulers for generic applications and Hadoop-based workloads.
- Fine-grained monitoring through OpenStack Monasca, connected to reactive resource reallocation that fine-tunes the allocation of resources defined by the proactive policies.
- Application topology specification based on TOSCA blueprints, which makes it easier to remain agnostic to the Cloud Management Framework, using Infrastructure Manager.
- Vertical and horizontal scaling APIs linked to the monitoring system by CloudVAMP and EC3.
- Application programming through COMPSs and Spark.
- Job scheduling internally managed by Apache Chronos and Marathon.
- Privacy preserving policies to re-annotate data products depending on source data and the specific algorithms.
- Reactive and proactive policies triggered by performance models and optimisation algorithms capable of minimising infrastructure costs while providing QoS guarantees.
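The interplay between proactive and reactive policies in the list above can be illustrated with a small sketch. It is an illustration under stated assumptions, not the project's optimisation algorithm: the runtime model T(n) = a + b / n, the linear scaling of progress with cores, and all function and parameter names are hypothetical.

```python
import math


def proactive_cores(a, b, deadline_s, max_cores):
    """Smallest core count whose predicted runtime a + b / n meets the deadline.

    Returns None when no allowed configuration can meet it.
    """
    if deadline_s <= a:
        return None  # the serial part alone already exceeds the deadline
    cores = max(math.ceil(b / (deadline_s - a)), 1)
    return cores if cores <= max_cores else None


def reactive_cores(current_cores, progress, elapsed_s, deadline_s, max_cores):
    """Re-estimate the core count from monitored progress (fraction in (0, 1]).

    Assumes, for this sketch, that the progress rate scales linearly with cores.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    remaining_s = deadline_s - elapsed_s
    if remaining_s <= 0:
        return max_cores  # deadline already passed: use everything allowed
    remaining_work = 1 - progress
    # Cores needed so that the remaining work finishes in the remaining time.
    needed = math.ceil(
        remaining_work * elapsed_s * current_cores / (progress * remaining_s)
    )
    return min(max(needed, 1), max_cores)


# Proactive step: plan from the performance model before deployment.
planned = proactive_cores(a=10.0, b=100.0, deadline_s=40.0, max_cores=16)

# Reactive step: monitoring reports 25% done after 20 s, so the job is behind.
adjusted = reactive_cores(planned, progress=0.25, elapsed_s=20.0,
                          deadline_s=40.0, max_cores=16)
```

Choosing the smallest configuration that meets the deadline is one simple way to read "minimise infrastructure costs while providing QoS guarantees"; the reactive step then corrects the plan, scaling up or down, when monitored progress deviates from the prediction.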
EUBra-BIGSEA is an API-oriented project whose main aim is to develop open-source software on top of active and reasonably mature components. Its results will therefore be of great interest to at least two audiences:
- Infrastructure providers who would like to set up a framework that deals with multiple types of data analytics workloads without statically partitioning their datacentres, and that can automatically scale up and down.
- Data scientists who would like to set up an environment to run their applications in Spark, OPHIDIA or directly in high-level languages such as Python or Java on top of a scalable system.
Benefits for Audience:
This contribution will present and discuss several topics that will be interesting for the target audience:
- A discussion of suitable technologies for resource management in adaptive, container-based infrastructures. Attendees will understand the pros and cons of the existing technologies and how they complement each other, which could ease their decision.
- The model of resource estimation, provisioning, fine-tuning and undeployment, compared with static user-driven provisioning, as a way to reduce idle “zombie” instances and overestimated resource allocations.
- An opportunity to learn and discuss how the application use cases on traffic management resemble their own data analytics problems, so that they can leverage our approach.
Topic 4: Working with data
|Ignacio Blanquer||Universitat Politècnica de València||www.eubra-bigsea.eu|