Today's session on MPI was interesting and fruitful. Users and application developers, as well as Grid site administrators, laid out the problems they face in using MPI on the Grid, and a panel discussion on the way forward followed. The discussion focused on two points: the use of a robust mechanism for monitoring the EGEE infrastructure, and enforcement against sites that do not comply with the agreed recommendations and rules regarding MPI, so that they stop supporting MPI and/or stop publishing that they support it.
In my humble opinion, the effort to provide MPI on the Grid should focus on three axes: job management, monitoring of resources, and documentation of MPI. Regarding job management, users should be able to choose the exact resources that match their needs, something that is still at an early stage. At the moment a user may only define how many MPI processes he or she wants to use, and no information about fine-grained CPU topology can be passed through the WMS to the CE. With respect to monitoring, a robust mechanism should be put in place that both alerts Grid site administrators and dynamically notifies users about appropriate resources. Finally, documentation should once again target both site administrators and users, each from a different point of view. Best practices should thus be documented and enforced both for installing and configuring MPI and for running MPI jobs.
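To make the job management limitation concrete, here is a minimal sketch in Python that renders the kind of job description a user hands to the WMS. It assumes the gLite JDL attributes `JobType`, `NodeNumber` and `Executable` as I recall them; the exact names and values depend on the middleware version and should be checked against the JDL documentation. The helper `render_mpi_jdl` and the script name `mpi-wrapper.sh` are hypothetical.

```python
def render_mpi_jdl(executable: str, processes: int) -> str:
    """Render a minimal JDL description for an MPI job.

    Assumption: JobType/NodeNumber/Executable follow the gLite JDL
    conventions; verify them for the middleware release in use.
    """
    if processes < 1:
        raise ValueError("processes must be a positive integer")
    attributes = [
        ("JobType", '"MPICH"'),
        # The process count is the only MPI-specific knob expressed here.
        ("NodeNumber", str(processes)),
        ("Executable", f'"{executable}"'),
    ]
    return "\n".join(f"{name} = {value};" for name, value in attributes)


if __name__ == "__main__":
    # Hypothetical wrapper script name, used only for illustration.
    print(render_mpi_jdl("mpi-wrapper.sh", 8))
```

Note what the sketch cannot say: there is a process count, but nothing about how those processes should be laid out across nodes or cores.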
None of these or similar ideas, which would lead us to a Grid with working MPI installations, should be too difficult to implement. All it takes is a strong initiative and community momentum, and after today's session I am confident that both are in place.