BioExcel/MolSSI Workshop on Workflows in Biomolecular Simulations

As part of their collaborative framework, BioExcel CoE and MolSSI are organizing a series of workshops on the application of workflow solutions for biomolecular modeling and simulations. This first event will be hosted by BioExcel in Barcelona, with a planned follow-up in 2019 in the US hosted by MolSSI.

Background

Workflows are an increasingly important aspect of biomolecular simulation science. Over the years there have been a variety of projects around the world to develop workflow systems that would be well suited to the demands of this type of research, but to date it has been difficult to achieve widespread community uptake, or to develop a good model for sustainability.

The purpose of this workshop is to bring together current projects from both sides of the Atlantic that involve, as an element of their activity, the development and application of workflow systems. The aim is to foster a dialogue between them that supports the development of interoperable and/or harmonized solutions that have the best chance of being long-term sustainable products, because they have a worldwide community of molecular simulation scientists using them day-to-day and supporting their continued development.

This workshop is not intended as a place for projects to make “sales pitches”. Rather, we seek participants willing and able to discuss both the strengths and weaknesses of their approaches, the opportunities for synergy with other work, and where there are still gaps in provision and unmet needs.

The process

In order to focus discussion on issues close to end-user needs, participants are asked to come prepared to discuss how their particular workflow system might be applied to two common workflow patterns, but there is no expectation that participants arrive with the example “solved”.

Example 1: The simulation-analysis loop

Workflows for enhanced sampling methods are often of this type – e.g. those that aim to calculate free energies or identify rare states (such as cryptic ligand binding sites). Key features are: looping, scatter/gather operations, large numbers of independent parallel simulations (the number of which may, or may not, be known in advance), and decision points.
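To make the pattern concrete, the following is a minimal sketch in plain Python, not tied to any particular workflow system. Everything in it is an assumption made for illustration: run_simulation is a hypothetical stand-in that returns a random number instead of running a real simulation, and the convergence test (standard error of the mean below a tolerance) is just one possible decision point.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_simulation(seed):
    """Hypothetical stand-in for one independent simulation.

    Returns a mock observable drawn from a unit normal distribution.
    """
    rng = random.Random(seed)
    return rng.gauss(0.0, 1.0)


def analyse(samples):
    """Analysis step: standard error of the mean over all gathered samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return (variance / n) ** 0.5


def simulation_analysis_loop(batch_size=8, tolerance=0.1, max_rounds=50):
    """Loop: scatter a batch, gather results, analyse, decide whether to stop.

    Assumes batch_size >= 2 so that the variance is defined after round one.
    The total number of simulations is not known in advance.
    """
    samples = []
    error = float("inf")
    with ThreadPoolExecutor() as pool:
        for round_index in range(max_rounds):
            start = round_index * batch_size
            seeds = range(start, start + batch_size)
            # Scatter: launch the batch in parallel; gather: collect results.
            samples.extend(pool.map(run_simulation, seeds))
            error = analyse(samples)
            # Decision point: stop once the estimate is precise enough.
            if error < tolerance:
                break
    return len(samples), error
```

Calling simulation_analysis_loop() returns the number of simulations that were needed and the final error estimate. In a real workflow each call to run_simulation would be a job submitted to a cluster or cloud resource, which is where the questions of execution and portability arise.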

Example 2: The pipeline

Workflows designed to perform complex analyses on large datasets are often of this type – e.g. those analyzing the effects of protein mutations on drug efficacy, or reverse docking procedures (searching for the most likely protein target for a given ligand). Key features are: independent execution of the pipeline for each member of the input dataset, the need to interface a heterogeneous collection of tools that may not originally have been designed to work with each other, and a significant likelihood that some input combinations will “fail” somewhere along the pipeline.
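As an illustration only, the sketch below shows this pattern in plain Python. The two stages, parse and transform, are hypothetical placeholders for real tools (they simply parse a "name=value" string and take a reciprocal), chosen so that some inputs fail at different points. The idea shown is that a failure in one pipeline instance is recorded, with the stage where it occurred, without stopping the others.

```python
def parse(record):
    """Hypothetical first tool: turn 'name=value' text into (name, float)."""
    name, value = record.split("=")
    return name, float(value)


def transform(item):
    """Hypothetical second tool: compute the reciprocal of the value."""
    name, value = item
    return name, 1.0 / value


def run_pipeline(record, stages):
    """Run one input through every stage, recording where it fails, if at all."""
    result = record
    for stage in stages:
        try:
            result = stage(result)
        except Exception as exc:
            # Report the failure instead of raising, so other inputs still run.
            return {
                "input": record,
                "ok": False,
                "failed_at": stage.__name__,
                "error": repr(exc),
            }
    return {"input": record, "ok": True, "result": result}


def run_all(records, stages):
    """Execute the pipeline independently for each member of the dataset."""
    return [run_pipeline(record, stages) for record in records]


if __name__ == "__main__":
    for outcome in run_all(["a=2", "b=0", "c"], [parse, transform]):
        print(outcome)
```

Because each input is processed independently, run_all could be replaced by a parallel map or a batch submission without changing run_pipeline. How failures are reported back to the user is one of the design choices on which workflow systems differ.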

In both cases, participants are invited to consider their approach to these workflows from two angles:

  a) How would the workflow be written, and how easy would it be for a new user to write, from scratch, their own workflow that was of this pattern? Where do you see strengths, weaknesses, and gaps in provision?
  b) How would this workflow be executed, and how easy would it be for a new user to execute it on their own computational infrastructure, whatever that might be? Where do you see strengths, weaknesses, and gaps in provision?

Date/Time
11 Dec, 2018 – 14 Dec, 2018

Location
Barcelona Supercomputing Center