ETHZ.13
RSNAS
Long Title: Remote Scalable Network Attached Storage
Leading Organization: ETH Zürich
Participating Organizations: Swiss National Supercomputing Centre, Universität Zürich
Domain: Grid
Status: finished
Start Date: 10.07.2012
End Date: 28.02.2013
Project Leader: P. Kunszt
Deputy Project Leader: M. De Lorenzi
The result of this project is a working system that provides NAS services anywhere in the country, linked directly to the CSCS storage, or indeed to any other storage running the same GPFS-based software.
The results have been presented at the HPC Advisory Council and will also be written up in other publications later in the year, including the technical details of the work.
Remote Scalable Network Attached Storage explored a technology by IBM that makes the large
remote storage at the Swiss National Supercomputing Centre CSCS in Lugano available to the SystemsX.ch projects
in Zurich and Basel as if it were local storage. Network Attached Storage systems are easy-to-use storage
systems, usually for smaller amounts of data, that can be attached to any computer or device.
The storage at CSCS is a very large (currently 8PB) hierarchical storage system (meaning data
is stored more than once so that it cannot be lost). Due to its very large size, the cost per
Terabyte is quite low. Several hundred TB can be rented at very low cost by research projects that
need them for just 3-4 years, provided the data can be made available "locally".
Such a system with 200TB was prototyped at ETH and the University of Zurich, and know-how on how to design and configure it was built up. The setup was tested extensively in close collaboration with CSCS and IBM.
A tested technology for data sharing can now be offered to other projects. CSCS is already offering this technology to other large projects: the new Human Brain Project will use it to access data stored in Lugano from Lausanne.
The main use case for RSNAS came from SystemsX.ch, which
will continue to use it during the next 4 years of operations, including for new projects at other universities.
Depending on demand, further NAS head nodes are now easily deployable.
This effort will also be rolled into the CRUS Bridge project for the Swiss Academic Compute Cloud (SwissACC), led by UZH.
Currently, two user communities exist: one at ETH and one at UZH. The storage is used as the project
storage for the imaging and proteomics platforms, which in turn support several labs at both ETH and UZH,
counting on the order of 100 people.
Within RSNAS, a remote storage setup between CSCS and ETHZ will be developed and tested, and a working
instance at UZH will be set up. The basic idea is to expose the scalable HSM operated at
CSCS over the WAN as a local Network Attached Storage remotely at ETHZ.
This prototype will prove that exposing the user's share as a local Network Attached Storage (NAS) inside
the University's own network can be replicated as a technology and used in many other projects that
need to share data across the country, whether for Systems Biology research storage or, more generally,
for accessing large remote storage systems. In a sense, this can be seen as essential technology
for building a federated Swiss Storage 'Cloud' in the future.
Currently, remote users need to copy large amounts of data to their local storage systems after their projects
at CSCS have completed or before their projects start. With the new technology it is possible to expose
the user's share (up to several hundred Terabytes) as a local Network Attached Storage (NAS) inside the
University's own network (in this case ETH Zurich and the University of Zurich). A first installation
is already in place at ETHZ, directly connected to the CSCS HSM storage, with a SystemsX.ch share of
currently 100TB.
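To make the idea concrete, the minimal sketch below shows what such a share looks like from a client machine inside the university network: it is mounted over NFS and then behaves like an ordinary local directory. The export path and mount point are assumed placeholders, not the configuration actually used in the project, and mounting requires root privileges.

    import subprocess
    from pathlib import Path

    # Assumed placeholders: the real export path and mount point are site-specific.
    NAS_HEAD = "storagex.ethz.ch"        # NAS head node mentioned later in this report
    EXPORT = "/systemsx/project_share"   # hypothetical export path
    MOUNTPOINT = Path("/mnt/rsnas")

    def mount_share() -> None:
        """Mount the remote share via NFSv4 (requires root privileges)."""
        MOUNTPOINT.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["mount", "-t", "nfs4", f"{NAS_HEAD}:{EXPORT}", str(MOUNTPOINT)],
            check=True,
        )

    def list_share() -> None:
        """Quick sanity check: the share should look like any local directory."""
        for entry in sorted(MOUNTPOINT.iterdir())[:10]:
            print(entry.name)

    if __name__ == "__main__":
        mount_share()
        list_share()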
Scalable storage is an essential prerequisite for modern research, especially in the Life Sciences.
Many projects underestimate the needs of their data repositories, and the groups are not well prepared
to deal with the large volumes that are generated within large collaborative projects.
In addition, there are no good concepts for how to share and access data across collaborating institutions.
Copying terabytes of data over the wide area network is a slow process, and both sides
need large data storage infrastructures to accommodate all the data. Changes and additions need to be
carefully synchronized inside collaborations. Local policies often prevent easy access to collaborative
data as well; many historically grown firewall rules make sharing and collaboration difficult.
Several research groups will need large data analytics resources in the future, like those currently
available at CSCS from Cray and SGI. How the computational resources of the ETHZ Brutus cluster and
the UZH Schroedinger cluster can be attached (i.e. whether the data is available at all sites) will be tested within this project.
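How such a test might look on a cluster node is sketched below: a small check, run for example on a Brutus or Schroedinger node, that reports whether the shared project data is actually visible and readable there. The mount point is an assumed placeholder rather than the project's actual path.

    import os
    import socket
    from pathlib import Path

    # Assumed placeholder: the actual mount point on the cluster nodes is site-specific.
    SHARE = Path("/mnt/rsnas")

    def check_share(sample_bytes: int = 1 << 20) -> None:
        """Report whether the remote share is mounted and readable on this node."""
        host = socket.gethostname()
        if not os.path.ismount(SHARE):
            print(f"{host}: {SHARE} is not mounted")
            return
        # Find the first regular file and read a small sample from it.
        first_file = next((p for p in SHARE.rglob("*") if p.is_file()), None)
        if first_file is None:
            print(f"{host}: share is mounted but no files are visible")
            return
        with open(first_file, "rb") as f:
            sample = f.read(sample_bytes)
        print(f"{host}: share mounted, read {len(sample)} bytes from {first_file.name}")

    if __name__ == "__main__":
        check_share()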
A technology was prototyped to remotely access the CSCS storage from ETH in Zurich as if it were
a local Network Attached Storage (NAS). A server at ETHZ (called storagex.ethz.ch) passes requests
through to CSCS and provides NFS access to the data at CSCS, but many open questions remain.
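One of those open questions is how close access through the NAS head comes to genuinely local storage. A rough first answer can be obtained with a sequential-read measurement like the sketch below. The file path is a placeholder, and repeated runs are distorted by the client-side cache, so each measurement needs a file that has not been read before.

    import time
    from pathlib import Path

    # Assumed placeholder: any large file on the NFS-mounted share will do.
    TEST_FILE = Path("/mnt/rsnas/benchmark/large_file.dat")
    BLOCK = 4 * 1024 * 1024  # read in 4 MiB blocks

    def sequential_read_mb_per_s(path: Path) -> float:
        """Time a sequential read of the whole file and return the throughput in MB/s."""
        total = 0
        start = time.perf_counter()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(BLOCK)
                if not chunk:
                    break
                total += len(chunk)
        elapsed = time.perf_counter() - start
        return (total / 1e6) / elapsed

    if __name__ == "__main__":
        print(f"{sequential_read_mb_per_s(TEST_FILE):.1f} MB/s over the WAN-attached share")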
This project does not start completely from scratch; it extends an already working setup in order
to make it more usable and to cover a wider range of use cases.
The technology used comes from IBM: General Parallel File System (GPFS) and parallel Network File System (pNFS).
The following technologies will be considered: