DIGITAL INFRASTRUCTURES for RESEARCH 2018 | Serving the user base

E-Infrastructures for the Bioinformatics long tail of science: a user perspective

Wednesday, September 28, 2016 - 16:00


The advent of digital infrastructures has, without doubt, transformed research in almost all scientific disciplines. This is especially evident in the Life Sciences, where the vast amounts of data produced daily demand the use of large-scale infrastructures. A multitude of platforms, technologies and solutions provide researchers with services that address fundamental needs for producing high-quality, high-impact research, such as access to literature, data and computational resources, all within the context of high-speed international connectivity. However, despite their overall robustness and availability to the wider scientific community, knowledge and use of these services among researchers, especially in the long tail of science, is limited and fragmented at best. This is particularly true in Bioinformatics, mostly because of the remarkable diversity in the backgrounds and competencies of the researchers working in the field, as well as the intrinsic requirements and characteristics of the research studies involved.

The aim of this talk is both to provide a clear overview of the current situation in Bioinformatics and to offer advice on strategies that could be adopted to make researchers knowledgeable about the existing services. Particular focus will be given to researchers belonging to the long tail of science, i.e. the individual researchers and small laboratories who, as opposed to large, expensive collaborations, do not have easy access to local computational resources and online services.

It is clear that the fast pace of data production makes Bioinformatics an exciting field to work in, but at the same time it creates a challenging environment in which one needs to keep in-depth biological knowledge up to date. This is further compounded by the fact that there are anywhere between 2 and 100+ biologists for every bioinformatician (in SME/Academia and Large Pharma, respectively), i.e. a significant discrepancy between the number of people knowledgeable in biological data analytics, who systematically produce tools, applications and data structures, and the number of people who simply depend on the existence and/or use of those tools. This severe imbalance in expertise introduces significant lags in project progression, even for simple data processing and for getting results back from the teams. To complicate things further, beyond the technical know-how in the labs, not all data is the same. Because the data is not all produced on the same machines, under the same conditions, or even measuring the same parameters, integrating it presents a further hurdle to making decisions using this growing resource. Finally, there is surprisingly little application of standardized workflows in the projects coming through most bioinformatics teams. This in turn leaves less time for developing new algorithms and resources that not only merit their own publications but are also essential to speed up data processing, integration and analysis and, hence, research. In this scenario, e-Infrastructure services and solutions can greatly facilitate standardization and reproducibility in the everyday work of bioinformaticians, thus allowing them to devote more time to new and innovative research.

This is a far cry from the synergistic and innovative collaboration of digital infrastructures hand in hand with researchers and citizens, as envisioned by the Open Science Cloud Initiative; it is therefore becoming imperative to increase the uptake of digital infrastructure solutions in bioinformatics research. E-Infrastructures already offer services that are being used in different bioinformatics studies; the next step should be to provide researchers with the right toolkit to guide them through the process. Tools would need to automatically analyze the data with meaningful algorithms, display results in an interpretable manner and, most importantly, be easily reproducible without the need to write a series of small novels capturing the settings and processes that were required to produce the results in the first pass.

The way forward will definitely require a stronger training process that will eventually allow for wider adoption of the tools and for the presence of data scientists and engineers in research labs, working in close collaboration with other researchers. Training scientists in the use of solutions offered by the e-Infrastructures should clearly reflect their particular needs. As such, careful design of domain-specific training materials that can be readily shared and used during and after training, together with guidelines accessible to a wider audience, will greatly improve both the impact of the tools involved and the overall uptake of the corresponding solutions.

In this context, organizations such as ELIXIR, RDA, CODATA and GOBLET have a key role in surveying the needs of the Life Science community; making researchers aware of existing services and resources and encouraging their use; sensitizing service providers to the difficulties that Life Science researchers can encounter in accessing and using e-Infrastructures; delivering ad hoc training; and developing related training materials and guidelines to help researchers become familiar with e-Infrastructures.


Target Audience:

This talk aims to provide a clear overview of the current status of e-Infrastructure use in the long tail of science for Bioinformatics. As such, it would be of particular interest to (a) service providers, i.e. stakeholders offering digital solutions that can be exploited in the field of Bioinformatics (e.g. data management, storage, processing and analysis), (b) training institutions that produce and/or offer training services for Life Scientists in e-Infrastructures (e.g. Data Science and Engineering, Bioinformatics), and finally (c) Life Science researchers who are active in the field of Bioinformatics but make limited use of the current e-Infrastructure services.


Benefits for Audience:

This talk will elaborate on the particular requirements and specifications evident in the daily work of Bioinformatics teams in the long tail of science. Coupled with the vision of the Open Science Initiative, it will highlight the key aspects that need to be addressed in order to achieve a convergence of the existing e-Infrastructure solutions in the field of Bioinformatics while, at the same time, increasing the uptake of these solutions by the wider community.


Topic 1: Challenges facing users and service providers

Topic 4: Working with data


Presenter (Organisation):
Fotis Psomopoulos (Aristotle University of Thessaloniki)

Co-authors (Organisation):
Allegra Via (IBBE-CNR)
Pedro Fernandes (IGC)
Eija Korpelainen (CSC)
Afonso Duarte (ITQB-NOVA)