Selectome (phase 1)
|Long Title:||Selectome: positive Darwinian selection on the Grid - Phase 1|
|Université de Lausanne|
Swiss Institute of Bioinformatics
|Project Leader:||H. Stockinger|
|Deputy Project Leaders:||N. Salamin, M. Robinson-Rechavi|
See also Selectome Phase 2
The project provides a software tool, gcodeml, that taps into the power of a computational
Grid to run the CPU-intensive calculations needed to study the evolution of species.
It enables the Selectome team at UNIL to create future versions of Selectome using the computational Grid resources
of Swiss scientific and academic partners. In this way, life scientists will have an up-to-date Selectome database
with an easy-to-use web interface to biological knowledge.
Grid/Selectome has selected GC3's gc3pie framework as the underlying software on which to build a fault-tolerant submission system for codeml jobs.
|Software package gcodeml||The software is part of the gc3pie distribution and is available in the SVN repository of the gc3pie project. Additionally, a copy of the gcodeml code is kept in the private SVN repository of SIB/Vital-IT. It is available on request: firstname.lastname@example.org|
|Selectome: a Database of Positive Selection||Selectome web site: The main entry point|
UNIL, SIB/Vital-IT and UZH/GC3 will extend the existing software based on gc3pie from UZH to run the Selectome workflow in a full production environment during Phase 2.
The overall scientific goal is to:
Positive selection analysis (i.e. the use of the PAML/Codeml package) is applied by several thousand scientists in the life sciences (PAML has more than 2700 citations). There is therefore a large potential user community for Selectome within and outside Switzerland. The SMSCG project is seen as a major driver for calculating new data sets and providing a better service to that user community.
The envisaged scientific application is based on the concept of Darwinian selection, the force
that drives evolutionary diversification and functional change in living organisms. The group of Prof.
Marc Robinson-Rechavi has developed and operates a database of such Darwinian selection, called
Selectome, which is freely available to scientists of the life science community worldwide.
The actual data behind the database are pre-calculated using phylogenetic approaches, in particular with the reference software Codeml (part of the PAML package).
Although the database/service is already online, its data need to be updated on a regular basis, and many more data sets remain to be pre-calculated. The group does not have sufficient CPU resources to process all the data. Because the problem is embarrassingly parallel, it is of major interest to run the computation on the SMSCG computing Grid. In addition to being CPU intensive, the application generates large amounts of data (on the order of several hundred GB) that need to be properly managed.
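To illustrate why an embarrassingly parallel workload suits a fault-tolerant submission system, the following minimal Python sketch runs independent jobs concurrently and retries each one a bounded number of times. It is not gcodeml and does not use the gc3pie API: the names `run_with_retries`, `run_all`, and `make_flaky_job`, as well as the task labels, are hypothetical, and a local thread pool stands in for Grid resources.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_retries(job, task, max_attempts=3):
    """Call job(task), retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return job(task)
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
    raise RuntimeError(
        f"{task!r} failed after {max_attempts} attempts"
    ) from last_error


def run_all(job, tasks, max_workers=4, max_attempts=3):
    """Run independent tasks in parallel; return (results, failures) dicts."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(run_with_retries, job, task, max_attempts): task
            for task in tasks
        }
        for future in as_completed(futures):
            task = futures[future]
            try:
                results[task] = future.result()
            except RuntimeError as exc:
                failures[task] = exc
    return results, failures


def make_flaky_job(always_fail=()):
    """Hypothetical stand-in for one analysis job (assumption, not codeml).

    Each task fails on its first attempt and succeeds afterwards, except
    tasks listed in always_fail, which never succeed.
    """
    attempts = {}
    lock = threading.Lock()

    def job(task):
        with lock:
            attempts[task] = attempts.get(task, 0) + 1
            attempt = attempts[task]
        if task in always_fail or attempt == 1:
            raise OSError(f"simulated failure for {task}")
        return f"{task}: done on attempt {attempt}"

    return job


if __name__ == "__main__":
    job = make_flaky_job(always_fail={"family_c"})
    done, failed = run_all(job, ["family_a", "family_b", "family_c"])
    for task in sorted(done):
        print(done[task])
    for task in sorted(failed):
        print("FAILED:", failed[task])
```

Because no task depends on the result of another, a failed task can be retried or reported in isolation without affecting the rest of the batch, which is the property that makes this kind of workload a natural fit for distributed resources.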
The technical programme is divided into three technical work packages which are accompanied by a management work package: