Selectome (phase 2)

Long Title: Selectome: positive Darwinian selection on the Grid - Phase 2
Université de Lausanne
Universität Zürich
Swiss Institute of Bioinformatics (unfunded partner)
Domain: Grid
Status: finished
Start Date: 01.01.2012
End Date: 31.12.2012
Project Leader: H. Stockinger
Deputy Project Leader: S. Maffioletti
Website: http://selectome.unil.ch

The second phase of this project extends the existing software based on gc3pie to run the Selectome workflow in a full production environment.

See also Selectome Phase 1


Component Description
Software package: gcodeml Source code of gcodeml
gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution. Publication by Sébastien Moretti, Riccardo Murri, Sergio Maffioletti, Arnold Kuzniar, Briseis Castella, Nicolas Salamin, Marc Robinson-Rechavi and Heinz Stockinger. HealthGrid Conference, Amsterdam, The Netherlands, May 21-23, 2012

The project provides a software tool, gcodeml, that taps into the power of a computational Grid (particularly Switzerland's SMSCG infrastructure and related ARC-based systems) to run CPU-intensive calculations that support the study of the evolution of species. Life scientists thereby get an up-to-date Selectome database (http://selectome.unil.ch) with an easy-to-use web interface to biological knowledge.
Life science researchers gain access to much more knowledge about the evolution of genes than was previously available. Because the gcodeml application is modular, similar bioinformatics applications can be enabled on the same infrastructure with minimal code changes.

The project has met its objective of providing a computational engine for the Selectome database. This enables the Selectome team at UNIL to create future versions of Selectome using the computational Grid resources of Swiss scientific and academic partners. The main results of the project are:

  • Stable, production-ready software called gcodeml, available both as source code and as an RPM package
  • New scientific workflow for the pre-processing and final processing of data to feed the Selectome database

A new Selectome database is planned to be made available in the first semester of 2013. Many of the required codeml calculations will be run on SMSCG using gcodeml.


The project will work in collaboration with NGI-CH to take advantage of additional computational resources by extending the software to use the Grid infrastructure operated by the EGI project. This will considerably improve the previous work. The second phase will:

  • provide an efficient solution to satisfy the computational needs imposed by the Selectome application.
  • further develop the existing code from the first phase of the project, which resulted from a very successful collaboration between UNIL, SIB and UZH (GC3), and deploy it in production mode. Additionally, this application can be considered one of the few Swiss life science applications that are already well established and make true use of a Grid environment.
  • make regular full Selectome releases possible, which will be of value for the life science community working in the domain of evolutionary biology and phylogenetics.

Given that life science applications are major drivers in Grid computing, the proposed project will both help to promote the Swiss Grid infrastructure and provide dedicated support to life scientists.


The project includes two technical work packages:

  • Enhancements of Grid-enabled workflow for Selectome and Codeml: command line-based, fault-tolerant gridification of Codeml as the computational back-end to Selectome.
    1. Improvements of current gc3pie-based submission system for Codeml (gcodeml)
    2. Extending the existing submission system to EGI's middleware
    3. Software testing and evaluation (incl. scalability tests)
  • Full production runs of Selectome/Codeml: full Selectome releases will be run on both SMSCG and EGI.
    1. Extension and improvements of the Grid-based workflow for Selectome using Codeml
    2. Prepare and run Selectome releases based on the existing and developed software