Visualization Tools for Monitoring and Evaluation of Distributed Computing Systems

Ray Cowan (1), Gilbert Grosdidier (2) (for the BaBar Prompt Reconstruction and Computing groups)
  1. Laboratory for Nuclear Science, M.I.T.
  2. Lab de l'Accelerateur Lineaire, Orsay

Speaker: Ray Cowan

  We describe several tools used to evaluate the operation of distributed computing systems, including tools developed for the visual presentation of data accumulated from several sources. Examples are taken from the BaBar Prompt Reconstruction system, which consists of more than 200 individual nodes and transmits 500 gigabytes/day to an object-oriented data store. Each node records its actions in a log file; these log files, together with other performance logs, supply the data the tools require. One tool, a log analyzer and browser based on Perl/PerlTk, was developed to spot failures in the log files. It was built primarily to narrow the search for synchronous events ("hiccups") across the nodes to a few useful lines per node, instead of a full log file of several megabytes for each node. It is also used to navigate through these log files and other failure reports, and as a presenter for monitoring the whole system. Another tool presents each node's activities in parallel to help detect situations where resource demands by one node affect the activities of others. These tools have contributed to the understanding of several problems encountered during this system's development.
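The idea of narrowing many per-node logs down to a few lines around synchronous failures can be illustrated with a minimal sketch. This is not the Perl/PerlTk tool described above; it is a Python illustration under stated assumptions. The log line layout, the severity keywords, the node names, the window size, and the function names `failure_lines` and `synchronous_events` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed (hypothetical) log line layout: "YYYY-MM-DD HH:MM:SS LEVEL message"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
FAILURE_LEVELS = {"ERROR", "FATAL"}  # assumed severity keywords
EPOCH = datetime(1970, 1, 1)


def failure_lines(lines):
    """Yield (timestamp, line) for each line that looks like a failure."""
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3 or parts[2] not in FAILURE_LEVELS:
            continue
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], TIME_FORMAT)
        except ValueError:
            continue  # skip lines whose timestamp does not parse
        yield stamp, line


def synchronous_events(logs, window_seconds=10, min_nodes=2):
    """Bucket failures into fixed time windows and keep only the windows
    in which at least `min_nodes` different nodes reported a failure.

    `logs` maps a node name to an iterable of that node's log lines.
    Returns {window_index: {node_name: [failure lines]}}.
    """
    buckets = defaultdict(dict)
    for node, lines in logs.items():
        for stamp, line in failure_lines(lines):
            seconds = int((stamp - EPOCH).total_seconds())
            key = seconds // window_seconds
            buckets[key].setdefault(node, []).append(line)
    return {
        key: hits
        for key, hits in sorted(buckets.items())
        if len(hits) >= min_nodes
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {
        "node001": [
            "2000-05-01 12:00:03 ERROR lost connection to data store",
            "2000-05-01 12:05:00 INFO event processed",
        ],
        "node002": ["2000-05-01 12:00:07 FATAL lost connection to data store"],
        "node003": ["2000-05-01 12:30:00 ERROR disk full"],
    }
    for window, hits in synchronous_events(sample).items():
        for node, lines in sorted(hits.items()):
            for line in lines:
                print(window, node, line)
```

Fixed windows keep the sketch short, but two failures that straddle a window boundary would be missed; a sliding window over time-sorted failures would be a natural refinement.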

Presentation: Adobe Acrobat PDF
Short Paper: Adobe Acrobat PDF

