Data clustering research in CMS

Koen Holtman

Speaker: Koen Holtman

  The clustering of objects in an object database is the mapping of objects to locations on physical storage media such as disk farms and tapes. The performance of the database, and of the physics application built on top of it, depends crucially on a good match between the object clustering and the database access patterns of that application. We discuss the results and conclusions of a three-year research project on clustering and reclustering, performed by CMS as part of its contribution to RD45. We focus on the implications of the project results for the long-term LHC computing strategy and risk analysis. We give an overview of the risks related to the I/O capacity needs of LHC physics analysis, and discuss how automatic reclustering systems can mitigate some of these risks. Based on our project experience, we also speculate on which risks can be successfully handled, for example through large-scale simulation studies.

