Friday, February 24, 2012

Interesting Discussions at CloudScape IV

There were interesting discussions at CloudScape IV yesterday.

Two of these were 'HPC in the Cloud' and 'Data Management in the Cloud'.

The topic 'HPC in the Cloud' is discussed again and again at numerous events, as it was yesterday.

My top three take-away messages:

  • There is an inflation of terms: what 'HPC' means in statements like 'HPC in clouds' must be defined more carefully. Can 32 cores used with parallel computing techniques really be considered HPC today? Where is the boundary, given that HPC has traditionally meant large-scale systems on the TOP500 list (e.g. towards 300,000 cores and more as we look towards Exascale)?
  • Scientific 'ready-to-run applications' exist (e.g. BLAST, NAMD, etc.) and are useful in the cloud (scaling up to ~1024 cores, or more generally a few hundred cores). However, many HPC applications are developed during the active use of HPC systems, meaning that the code evolves over time (e.g. linking new libraries, testing new mathematical models) with numerous compiling and tuning steps along the way. Applications that are re-submitted with exactly the same code (like blastp, for example) are not the majority of the HPC applications scaling up to thousands of cores on HPC systems. The cloud approach is not well suited to this 'scientific application development process', but it has benefits for mature, small-scale HPC applications.
  • 'Application enabling support' is a key ingredient in getting large-scale HPC applications to run efficiently. The knowledge of scaling up scientific codes based on numerical laws and mathematical models is typically provided by dedicated support staff at HPC centers (e.g. the Simulation Labs in Juelich). It is unclear who provides this unique knowledge, including partly hardware-level tunings (e.g. network topologies, shapes, etc.), when public cloud systems are used. In many cases this is experience that experts obtain over years of working closely with both the systems and the scientific applications. Is there a helpdesk at Amazon where you can ask for a scaling workshop, or ask where the MPI communication bottlenecks in your HPC code are (e.g. performance analysis teams)?
But these are only some of the issues. An interesting report in this context is [1].
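The scaling concerns above can be made concrete with Amdahl's law: even a small serial fraction in a code caps the achievable speedup, no matter how many cores are added. A minimal sketch (the 1% serial fraction is an illustrative assumption, not a measurement of any real application):

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: maximum speedup on `cores` processors when a
    fraction of the work (serial_fraction) cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# With just 1% serial code, speedup stays below 100x even at
# Exascale-like core counts -- illustrating why scaling to
# hundreds of thousands of cores demands such careful tuning:
for cores in (32, 1024, 300_000):
    print(cores, round(amdahl_speedup(0.01, cores), 1))
```

This is, of course, only the simplest model; real scaling behavior also depends on communication patterns, network topology, and the other factors mentioned above.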

Another topic was 'Data Management in the Cloud', also a frequent subject at events like yesterday's.

My top three take-away messages here:

  • Trust issues still remain. Laws and certifications can help establish a higher degree of trust in emerging cloud offerings that store sensitive or unique data for mankind (e.g. the Gutenberg Bible) for decades to come. A clever mix of private and public clouds might also be a possible direction.
  • Data transfer between clouds without high costs (e.g. a research group downloading shared data on a daily basis over years) and the transformation of data formats are important, especially on the PaaS and SaaS levels. Here it is not transparent what is possible and what is not; open standards broadly adopted by key vendors might help to improve the situation. However, this adoption needs to be guided by a reference model and derived architectures highlighting how the standards available for different technical areas combine on different levels, mainly because standards alone are not sufficient to ensure interoperability.
  • The complexity of research datasets is underestimated: they differ from 'the simple Flickr image' for which today's data clouds are optimized. This includes complex directory structures surrounding datasets and large data frameworks covering persistent identifiers, data policies (e.g. moving data when a data silo is 70% filled), and complex metadata frameworks reaching into ontologies specific to user communities. Standards exist on many levels and in many fields, from DICOM, CIM, and NetCDF to cloud data standards such as CDMI. But how it all fits together, given this complexity, needs to be explored much more thoroughly, especially in setups that combine emerging research infrastructures, existing community data centers with data silos that provide data trust guarantees (e.g. 50 years), and emerging cloud offerings.
Again, these are only some of the issues, and the reason why the European data infrastructure EUDAT [2] is currently not working with cloud offerings when it comes to concrete data replication in its federated infrastructure across Europe.
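A data policy of the kind mentioned above (e.g. migrating data once a silo reaches 70% of its capacity) can be sketched as a simple rule. All names, numbers, and the `Silo` structure here are hypothetical illustrations, not EUDAT or CDMI APIs:

```python
from dataclasses import dataclass

@dataclass
class Silo:
    """A hypothetical data silo with capacity and usage in bytes."""
    name: str
    capacity: int
    used: int

def needs_migration(silo: Silo, threshold: float = 0.70) -> bool:
    """Policy rule: flag a silo for data migration once its fill
    level reaches the threshold (70% by default, as in the example)."""
    return silo.used / silo.capacity >= threshold

# Illustrative check: only the silo at or above 70% is flagged.
silos = [Silo("silo-a", 100, 50), Silo("silo-b", 100, 80)]
to_migrate = [s.name for s in silos if needs_migration(s)]
print(to_migrate)
```

In a real federated infrastructure such rules would of course sit inside a policy engine and interact with persistent identifiers and metadata, which is exactly the complexity that still needs exploring.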

Nevertheless, the cloud surely provides many opportunities for the community, some of which perhaps just need to be better explored and communicated over time.

Finally, please note that these are my personally-biased views from CloudScape IV and do not necessarily reflect the agreement of the whole audience of the event.

Best regards from Brussels to the community,
Morris Riedel
Juelich Supercomputing Centre
EMI Strategic Director &
EUDAT Data Replication Task Force Leader


