Last update: Apr 1, 2000
Speaker: Antonia Ghiselli
Condor is a software project that supports High Throughput Computing on large collections of computing resources, developed by the Computer Science Department of the University of Wisconsin-Madison. In collaboration with the Condor team, a Condor pool on a wide area network has been deployed as a general-purpose computing resource for INFN.
The characteristics of the INFN WAN Condor pool were defined as the result of an experimental phase. The objectives of that phase were to identify and address specific INFN requirements: suitability for INFN computing, policies and rules for access to execution machines, and network-aware checkpointing.
The article describes how these requirements have been satisfied through a specific sub-pool and checkpoint domain design. In particular, the network-aware checkpointing mechanism makes it possible to optimize checkpointing for large jobs and to limit and control network traffic.
A network monitoring system evaluates the link bandwidth between checkpoint servers and execution machines and updates the bandwidth parameter on each execution machine. This mechanism allows a dynamic and reliable association between execution machines and checkpoint servers according to the network demand of the job.
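The association step described above can be illustrated with a minimal Python sketch. It is not the INFN or Condor implementation: the server names, the function names, the units, and the selection rule (choose the server with the shortest estimated transfer time, and reject it if that time exceeds a limit) are all assumptions made for illustration.

```python
from typing import Dict, Optional


def estimated_transfer_seconds(image_mb: float, bandwidth_mbit_s: float) -> float:
    """Estimate the time needed to move a checkpoint image over a link.

    image_mb is the checkpoint image size in megabytes; bandwidth_mbit_s is
    the measured link bandwidth in megabits per second (hypothetical units).
    """
    if bandwidth_mbit_s <= 0:
        return float("inf")  # an unusable link never completes the transfer
    return image_mb * 8.0 / bandwidth_mbit_s


def choose_checkpoint_server(
    measured_bandwidth: Dict[str, float],
    image_mb: float,
    max_seconds: float,
) -> Optional[str]:
    """Pick the checkpoint server with the shortest estimated transfer time.

    measured_bandwidth maps a (hypothetical) server name to the bandwidth
    last reported by the monitoring system for this execution machine.
    Returns None when no server can take the image within max_seconds.
    """
    best_server: Optional[str] = None
    best_time = float("inf")
    for server, bandwidth in measured_bandwidth.items():
        seconds = estimated_transfer_seconds(image_mb, bandwidth)
        if seconds < best_time:
            best_server, best_time = server, seconds
    return best_server if best_time <= max_seconds else None


if __name__ == "__main__":
    # Hypothetical measurements for one execution machine.
    links = {"ckpt-a": 2.0, "ckpt-b": 34.0}
    print(choose_checkpoint_server(links, image_mb=100.0, max_seconds=60.0))
```

In this sketch a job with a large image is matched only to a server reachable over a sufficiently fast link, which is one simple way to keep checkpoint traffic off slow wide-area links.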
The present Condor WAN pool consists of 200 machines, distributed over 20 sites and connected to the national research network Garr-B, with 6 checkpoint domains. The number of machines is expected to grow to about 1000, with one sub-pool and one checkpoint domain per site.