ETHZ.7 | VM-MAD |
Long Title: | Virtual Machines Management and Advanced Deployment |
Leading Organization: | ETH Zürich |
Participating Organizations: | Universität Zürich; SWITCH - Teleinformatikdienste für Lehre und Forschung |
Domain: | Grid |
Status: | finished |
Start Date: | 15.02.2011 |
End Date: | 15.07.2012 |
Project Leader: | P. Kunszt |
Deputy Project Leader: | Ch. Panse |
Component | Description |
Public Resource | Repository of the code produced by the VM-MAD project (public) |
Online Documentation | Documentation with installation instructions, module descriptions, and available commands (public) |
Project Wiki | Internal project wiki (protected; access granted on request to project partners and universities) |
Orchestrator | 'Passive' component that monitors the state of the LRMS (compute cluster) and adds or removes compute nodes based on a set of policies defined by the system administrator (implemented as a set of Python classes; a sketch follows this table) |
VM Policies | Example of a local policy (FGCZ), defined in a Python class (start, stop, running jobs) |
Batch System Interface | Sun Grid Engine monitoring module, plus a module that reads accounting files from the LRMS to simulate cloud/grid behavior with real-world data under different system configurations and parameters |
Provider Interface | Uses Apache Libcloud, which supports Amazon EC2, Rackspace, GoGrid, and others; SMSCG can also be used through the GC3Pie framework (see the Libcloud sketch below) |
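To make the interplay between the Orchestrator and a site policy concrete, here is a minimal sketch under assumed names: `ClusterState`, `FGCZPolicy`, and `Orchestrator` below are illustrative stand-ins, not the actual VM-MAD classes.

```python
# Minimal sketch of the Orchestrator/policy interplay. All names here
# (ClusterState, FGCZPolicy, Orchestrator, lrms, provider) are
# illustrative assumptions, not the actual VM-MAD API.
import time
from dataclasses import dataclass

@dataclass
class ClusterState:
    pending_jobs: int   # jobs waiting in the LRMS queue
    running_jobs: int   # jobs currently executing
    idle_nodes: int     # compute nodes with nothing to do

class FGCZPolicy:
    """Example site policy: grow on backlog, shrink when idle."""
    def decide(self, state: ClusterState) -> int:
        if state.pending_jobs > 0 and state.idle_nodes == 0:
            return +1       # request one extra node
        if state.pending_jobs == 0 and state.idle_nodes > 0:
            return -1       # release one idle node
        return 0            # steady state: no change

class Orchestrator:
    """'Passive' loop: poll the LRMS, apply the policy, act on the verdict."""
    def __init__(self, lrms, provider, policy, interval=60):
        self.lrms = lrms            # object exposing poll() -> ClusterState
        self.provider = provider    # object exposing start_node()/stop_node()
        self.policy = policy
        self.interval = interval    # seconds between polls

    def run(self):
        while True:
            state = self.lrms.poll()
            delta = self.policy.decide(state)
            if delta > 0:
                self.provider.start_node()   # cloud-burst: add a VM
            elif delta < 0:
                self.provider.stop_node()    # shrink back
            time.sleep(self.interval)
```

The design point is that the Orchestrator stays generic and 'passive': all site-specific decisions live in the policy object, so a site like FGCZ only has to swap in its own `decide()` logic.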
The project delivered a solution and a set of procedures and best practices that allow a local resource
provider to cloud-burst towards a public or private cloud provider and towards the SMSCG infrastructure.
The final result is a mechanism to seamlessly and dynamically expand a computational batch cluster in
response to peak loads and/or to urgent situations where the immediate availability of computational resources is of critical importance.
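The Provider Interface listed above names Apache Libcloud as the bridge to commercial clouds. As a hedged sketch, starting an EC2 worker through Libcloud's standard driver API might look as follows; the credentials, region, AMI id, and size id are placeholders, not values from the project.

```python
# Sketch of starting one EC2 worker via Apache Libcloud's driver API.
# Credentials, region, AMI id, and size id below are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def start_worker(access_key, secret_key):
    cls = get_driver(Provider.EC2)                 # select the EC2 driver
    driver = cls(access_key, secret_key, region='us-east-1')
    # Look up the site-specific virtual appliance by its (placeholder) AMI id
    image = driver.list_images(ex_image_ids=['ami-12345678'])[0]
    # Pick an instance size by id (placeholder)
    size = [s for s in driver.list_sizes() if s.id == 'm1.small'][0]
    return driver.create_node(name='vmmad-worker', image=image, size=size)
```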
Know-how and experience in configuring and operating a cloud-bursting computational cluster, as well as
in monitoring and controlling the cloud-bursting features, have been acquired. A set of technical documents
and procedural guidelines has been published.
The project focused on the FGCZ use case, and a set
of site-specific virtual appliances was created. These appliances were used to cloud-burst
the FGCZ computational cluster towards both the Amazon EC2 and SMSCG infrastructures. The approach,
the software components, and the methodology have been made publicly available so that other resource providers can benefit.
A cloud-burst simulator has also been developed as part of the software stack that dynamically controls
the cloud-bursting capability of a site. Such a simulator can be used by a local provider to test and
verify the potential advantages of a cloud-bursting policy with its own real usage data.
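As a hedged illustration of the simulator idea, the sketch below replays finished jobs from a Sun Grid Engine accounting file (colon-separated records, per the standard accounting(5) format) and computes the average queue wait, a number a provider could compare across candidate bursting policies. The metric is an illustrative simplification, not the VM-MAD simulator itself.

```python
# Sketch: replay finished jobs from an SGE accounting file and report the
# mean queue wait. Field positions follow the standard accounting(5)
# format; the metric is an illustrative stand-in for the real simulator.
import csv

def read_jobs(path):
    """Yield (submit, start, end) epoch seconds for each finished job."""
    with open(path) as f:
        for row in csv.reader(f, delimiter=':'):
            if not row or row[0].startswith('#'):
                continue                          # skip comments/blanks
            submit, start, end = int(row[8]), int(row[9]), int(row[10])
            if start and end:                     # keep finished jobs only
                yield submit, start, end

def mean_wait(path):
    """Average seconds jobs spent waiting between submission and start."""
    waits = [start - submit for submit, start, _ in read_jobs(path)]
    return sum(waits) / len(waits) if waits else 0.0

print(mean_wait('accounting'))   # path to the site's SGE accounting file
```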
Virtualization offers many benefits to cluster administrators and to end users: it helps manage
increasingly complex software stacks and reduces the effort needed to migrate software to the latest
environments. Especially for the many high-throughput applications that do not need parallel processing
(the vast majority of scientific codes), these advantages make virtualization a very useful
technology in this context.
Once the cluster environment can be virtualized, the same can be done in a Grid context and extended to
commercial cloud environments. In order to reuse virtual machines locally and across multiple Grid sites
(and commercial clouds), a repository of virtual machines (VM-Repo) should be established, along with a
mechanism to select the right VMs and to submit them to the individual Grid sites. Finally, the VMs must be made
available to the end user through a dedicated, dynamic batch queue.
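What selecting 'the right VM' from such a VM-Repo could look like is sketched below; the repository layout and the tag-based metadata are assumptions made for illustration only.

```python
# Sketch of tag-based appliance selection from a VM repository; the
# metadata schema and entries are assumed examples, not the real VM-Repo.
VM_REPO = [
    {"name": "fgcz-proteomics-appliance", "tags": {"sge", "fgcz"}},
    {"name": "generic-sge-worker",        "tags": {"sge"}},
]

def select_vm(required_tags):
    """Return the first appliance whose tags cover the requirements."""
    needed = set(required_tags)
    for vm in VM_REPO:
        if needed <= vm["tags"]:          # all required tags present?
            return vm["name"]
    raise LookupError("no appliance matches %s" % sorted(needed))

print(select_vm({"sge", "fgcz"}))   # -> fgcz-proteomics-appliance
```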
In summary, the components listed in the table above will be established.