Serving the long tail
Wednesday, September 28, 2016 - 16:00
The long tail of science (also referred to as “LToS” in the rest of this abstract) refers to the large number of individual researchers and small laboratories who do not have access to dedicated computational resources and online services to manage and analyse large amounts of data. In science terms, the long tail is made up of scientific/research projects handled by individual laboratories or small groups of researchers, as opposed to large, expensive collaborations such as the Worldwide LHC Computing Grid (WLCG) and other Research Infrastructures.
The long tail is almost invisible. Most of its members lack the technical know-how and expertise to use e-Infrastructure technologies, and are handicapped by the lack of resources dedicated to storing, analysing and sharing scientific data. With limited resources and expertise, even simple data discovery, analysis and sharing are difficult tasks. Long tail users are usually interested in accessing and using large computing, storage and data resources for the short term (to perform a specific action), and in doing so through user-friendly interfaces and processes that do not require a deep understanding of distributed computing.
Processes for allocating compute and storage resources to structured user communities with a visible presence at the European scale have been well established in EGI for several years. However, individual researchers and small research teams often struggle to access such services at national or regional compute centres because of complex and long allocation processes and the lack of available local capacity. Recognising the need of long tail users for simpler, harmonised and guaranteed access to digital services, the EGI community started to design and develop a new platform in 2014. The first version of this platform, called the ‘EGI Platform for the long tail of science’, was released by the EGI-Engage project in 2016. The talk will introduce this platform to e-infrastructures, Research Infrastructures and members of the scientific community, and present the possibilities for engaging with it either as a user or as a service provider.
The platform for the LToS provides extremely simple access to e-infrastructure resources and services that can be relevant for the long tail. From EGI, the platform makes available:
• High-throughput computing sites for running compute/data-intensive jobs.
• Cloud sites for compute/data-intensive jobs and/or for hosting scientific services.
• Storage resources for storing job input and output data, and for setting up data catalogues for short/mid term use.
• Science gateways that operate as graphical web environments for designing and executing scientific applications in the platform.
• ‘Ready to use’ scientific applications that are already integrated with the underlying cloud and/or HTC services, and are offered ‘as services’ through the science gateways.
• Consultancy, support and training for users.
At the heart of the platform is a 'user registration portal'. This is where new users enter the platform. Users can log in to the portal with Google, Facebook and EGI SSO accounts. Within the registration portal, users can define their personal profile and submit resource access requests. User profiles are vetted and resource requests are validated by the platform support team (directly, or indirectly through the network of NGI support teams). Default allocations are usually approved when the user profile seems valid. Custom allocations and suspicious profiles are vetted by interacting with the user by phone, by email, or even in a face-to-face meeting with an EGI.eu member or someone from the local NGI.
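The approval rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the portal's actual implementation: the names `AccessRequest`, `Decision` and `triage` are hypothetical, and the sketch assumes the support team has already recorded whether a profile looks valid. Because default allocations are only "usually" approved, the real process may apply further checks.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    MANUAL_REVIEW = "manual_review"


@dataclass
class AccessRequest:
    """Hypothetical record of one resource access request."""
    user_id: str
    profile_looks_valid: bool    # outcome of the support team's profile check
    is_default_allocation: bool  # False for a custom resource request


def triage(request: AccessRequest) -> Decision:
    """Approve default requests from valid-looking profiles.

    Custom allocations and suspicious profiles are routed to manual
    vetting (phone, email or a face-to-face meeting).
    """
    if request.is_default_allocation and request.profile_looks_valid:
        return Decision.APPROVE
    return Decision.MANUAL_REVIEW
```

For example, `triage(AccessRequest("alice", True, True))` yields `Decision.APPROVE`, while a custom request from the same user is sent to manual review.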
After an allocation is approved, the user can consume the granted resources via any of the science gateways (also called VREs) connected to the platform. Each VRE/gateway is optimised for specific data/compute-intensive use cases, and users are given guidance to find the most suitable solution for their own situation. Moreover, the EGI.eu User Community Support Team and a network of people from various NGIs can provide direct consultancy and support to users.
The platform was created by simplifying and customising some of the existing EGI access policies specifically for this platform, and by developing a few new services that implement these simplified policies. In particular, the following developments were made:
• A User Registration Portal (URP) was developed, which guides the user through the access workflow and helps the support team run the user approval process, e.g. with web forms and email notifications. The portal provides the identity federation for the platform, i.e. a user can log in to this portal and to any science gateway with the same user account.
• The so-called ‘per-user sub-proxy’ (PUSP) extension of X.509 robot certificates. This enables the creation of user-specific proxy certificates from robot certificates. Science gateways can use such user-specific proxies to access cloud and HTC sites and manage compute and data tasks on the users’ behalf. The proxies ensure full traceability of user actions within the distributed infrastructure, and remain transparent to the users themselves, who see username/password-based login on all the user-facing interfaces.
• A library and a REST web service, both offered to science gateway developers as tools to generate PUSPs from robot certificates. The library can be used in gateways that operate with local robot certificates (on USB smart cards); the web service is for gateways that cannot (or prefer not to) hold a robot certificate locally.
• A cloud and HTC resource pool (defined as the ‘vo.access.egi.eu’ Virtual Organisation in EGI) that offers compute and storage capacity to the platform users. The providers in this pool accept PUSPs for interactions initiated from the platform science gateways. The pool currently includes HTC resources from INFN-Catania, INFN-Bari, CYFRONET-LCG2 and BEgrid-ULB-VUB, and cloud resources from INFN-Catania and INFN-Bari.
• Science gateways that provide user-friendly interfaces to define and run scientific applications on platform resources (in cloud or HTC clusters). The gateways use the identity federation of the User Registration Portal to allow access to approved users, and use the PUSP mechanism to interact with cloud and HTC resources. The platform currently includes the following science gateways: WS-PGRADE/gUSE and the Catania Science Gateway.
• Integration of scientific applications with the platform, so that these can be easily executed from the science gateways on the connected pool of resources. The following applications are currently available in the platform: molecular docking with AutoDock Vina, Statistical R, Chipster, ClustalW2 and the Semantic Search Engine (SSE).
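The traceability idea behind per-user sub-proxies can be illustrated with a small Python sketch. It models only the naming side (one shared robot identity, one distinguishable identity per portal user) and does not create or sign real X.509 proxy certificates. The robot subject, the `CN=user:<tag>` layout and the function names `user_tag`, `pusp_subject` and `trace_user` are assumptions made for this illustration, not the format used by the platform's library or web service.

```python
import hashlib
from typing import Iterable, Optional

# Hypothetical robot certificate subject, for illustration only.
ROBOT_DN = "/DC=org/DC=example/CN=Robot: example science gateway"


def user_tag(portal_user_id: str) -> str:
    """Derive a stable, opaque tag from a portal account name."""
    return hashlib.sha256(portal_user_id.encode("utf-8")).hexdigest()[:16]


def pusp_subject(robot_dn: str, portal_user_id: str) -> str:
    """Build a per-user subject by appending a user-specific CN to the robot DN.

    The "CN=user:<tag>" layout is an assumption of this sketch.
    """
    return f"{robot_dn}/CN=user:{user_tag(portal_user_id)}"


def trace_user(subject: str, robot_dn: str,
               known_users: Iterable[str]) -> Optional[str]:
    """Map a per-user subject back to the portal account it was derived from."""
    for candidate in known_users:
        if pusp_subject(robot_dn, candidate) == subject:
            return candidate
    return None
```

A gateway following this pattern would present a different subject for each portal user while holding a single robot credential, and an operator who knows the list of registered accounts could map any subject seen at a resource provider back to one account, without the user ever handling a certificate.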
These innovations are integrated into a single offering within the LToS platform, and are now also reused in some community-specific service platforms that operate in EGI. For example, the EGI training infrastructure and the DARIAH Competence Centre of the EGI-Engage project have adopted the per-user sub-proxy mechanism in their infrastructures to provide easy-to-use, traceable access for less technically experienced users (particularly newcomers to EGI and researchers in the humanities).
The EGI LToS platform directly targets researchers and research teams who do not fit into any established ‘Virtual Organization community’, and/or do not have suitable and usable services within their established VO community. The presentation targets such people, as well as the national/regional support teams who work with the long tail of science. The talk will inform them about the main capabilities of the platform so they can become active users.
A second target group of this talk is service providers (cloud, HTC, storage, VRE and applications) who would like to make their services easy to use for members of the long tail. The talk will give them a short guideline on how to become a service provider within the platform.
A third group targeted by the talk is managers and system architects from Research Infrastructure communities. For them, the talk provides an example of a distributed platform that integrates services and makes them easy to use for fragmented groups of users. By repurposing the components of the EGI LToS platform, these communities can build their own similar platforms to reach the long tail.
Benefits for Audience:
Participants will learn how to profit from the LToS e-Infrastructure and become active users.
Topic 1: Challenges facing users and service providers
|Giuseppe La Rocca||EGI.eu|