Data processing approaches
Thursday, September 29, 2016 - 11:30
Chair: Paolo Manghi
In the data processing stack, major challenges lie at the level of the hardware resources involved, e.g. data storage, computing hardware, and bandwidth (moving big data around). Resources are "expensive", hence their cost must be shared among stakeholders based on an economy of scale; most importantly, their proper usage and optimisation should be as transparent as possible to scientists, who may not necessarily be IT people. Scientists should be able to use their data processing services without being responsible for how storage and computation are elastically optimised at the iron level, or for how big data is moved across the Internet to support experiments as efficiently as possible. This session includes three experiences touching on these aspects: how an orchestrated and autonomic usage of Cloud services can help in the construction of data pipelines and guarantee QoS, and how scientists can be served with transparent "share & sync" functionalities that optimise the movement of data across the Australian national network of data repositories.
EUBra-BIGSEA: cloud services with QoS guarantees for Big Data Analytics
Dynamic creation of data pipelines in clouds
Australian Data Lifecycle project: giving users a "data pump"