Monday, June 20, 2005

Improving scalability of submission node in Condor by using BitTorrent peer-to-peer downloading protocol

DSL Projects 2005 by Mark Silberstein: "Improving scalability of submission node in Condor by using BitTorrent peer-to-peer downloading protocol.

Condor is a high-throughput computing system that runs distributed applications on thousands of computers by harvesting their idle cycles (see www.condorproject.org for more details).

Jobs are submitted to Condor for execution via submission nodes, called schedds. Often many jobs are invoked with the same (sometimes very large) input files and the same executables, differing only in their input parameters. For remote execution to succeed, these input files must be transferred to each remote computer. Because every job pulls its own copy from the schedd, this transfer can impose a significant load on the schedd and make it a file-transfer bottleneck.
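To make the bottleneck concrete, here is a back-of-envelope sketch, assuming every job downloads a full private copy of the shared input directly from the schedd. The function names and the workload numbers (500 jobs, a 2 GiB input, a 1 Gbit/s uplink) are hypothetical and only illustrate how the load scales.

```python
def schedd_upload_bytes(num_jobs: int, input_bytes: int) -> int:
    """Bytes the schedd must send if every job pulls its own full copy
    of the shared input directly from the schedd (no sharing between jobs)."""
    return num_jobs * input_bytes


def transfer_seconds(total_bytes: int, uplink_bytes_per_s: float) -> float:
    """Lower bound on the time needed to push total_bytes through the
    schedd's uplink, ignoring protocol overhead and disk limits."""
    return total_bytes / uplink_bytes_per_s


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical workload: 500 jobs sharing one 2 GiB input file.
    jobs, size = 500, 2 * GiB
    uplink = 1e9 / 8  # hypothetical 1 Gbit/s uplink, in bytes per second
    total = schedd_upload_bytes(jobs, size)
    print(f"schedd sends {total / GiB:.0f} GiB")
    print(f"at least {transfer_seconds(total, uplink) / 3600:.1f} hours")
```

With these assumed numbers the schedd has to send 1000 GiB, which takes roughly 2.4 hours even at full line rate; the load grows linearly with the number of jobs, which is exactly what peer-to-peer distribution tries to break.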

BitTorrent (bittorrent.com) is a protocol that can drastically reduce the load on an HTTP server from which large files are downloaded. It does so by exploiting the fact that multiple simultaneous downloaders can collaborate: they exchange with one another the parts of the file they already have, which reduces the bandwidth they need from the server.
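The effect of piece exchange on server load can be illustrated with a toy, round-based simulation. This is a deliberately simplified model, not BitTorrent's actual algorithm: it ignores per-peer bandwidth limits, tit-for-tat choking, and rarest-first piece selection, and the function `simulate_swarm` is hypothetical. It only assumes one server holding every piece and downloaders that prefer fetching a missing piece from another downloader whenever one already has it.

```python
import random


def simulate_swarm(num_peers: int, num_pieces: int, seed: int = 0) -> tuple[int, int]:
    """Toy model of a swarm downloading one file split into num_pieces pieces.

    The server starts with every piece; the num_peers downloaders start empty.
    In each round every incomplete downloader obtains exactly one missing piece,
    taking it from another downloader if any has it at the start of the round,
    and from the server otherwise. Returns (server_uploads, peer_uploads).
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    have = [set() for _ in range(num_peers)]
    server_uploads = peer_uploads = 0
    while any(len(h) < num_pieces for h in have):
        snapshot = [set(h) for h in have]  # pieces each peer held at round start
        for i in range(num_peers):
            missing = [p for p in range(num_pieces) if p not in have[i]]
            if not missing:
                continue
            # Missing pieces that some other downloader could serve this round.
            from_peers = [
                p for p in missing
                if any(p in snapshot[j] for j in range(num_peers) if j != i)
            ]
            if from_peers:
                have[i].add(rng.choice(from_peers))
                peer_uploads += 1
            else:
                have[i].add(rng.choice(missing))
                server_uploads += 1
    return server_uploads, peer_uploads


if __name__ == "__main__":
    peers, pieces = 20, 50
    server, p2p = simulate_swarm(peers, pieces)
    print(f"direct download: server sends {peers * pieces} pieces")
    print(f"swarm: server sends {server}, peers exchange {p2p}")
```

Every piece must leave the server at least once, so the server's count can never drop below `num_pieces`; the interesting quantity is how far below the direct-download cost of `num_peers * num_pieces` it falls once downloaders start serving each other.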

This project will attempt to apply the protocol and its existing implementation to the schedd.
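One concrete step a BitTorrent-enabled schedd would have to perform is preparing its input files for distribution. In the BitTorrent metainfo format, a file is split into fixed-size pieces and the SHA-1 digest of each piece is published (the `pieces` field is the concatenation of these 20-byte digests), so downloaders can verify every piece they receive from an untrusted peer. The sketch below shows only that hashing step; the 256 KiB piece size and the helper names are assumptions, and how the schedd would actually publish the result is left open by the proposal.

```python
import hashlib

# Hypothetical piece size; the torrent creator chooses this value.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces (the last one may be shorter)
    and return the 20-byte SHA-1 digest of each piece, in order."""
    return [
        hashlib.sha1(data[off:off + piece_length]).digest()
        for off in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against its published digest before use."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    data = b"x" * (2 * PIECE_LENGTH + 1)  # stand-in for a large input file
    hashes = piece_hashes(data)
    print(len(hashes), "pieces")
    print(verify_piece(2, b"x", hashes))
```

Verifying each piece against the digests published by the schedd is what allows execute nodes to accept data from one another safely instead of trusting only the submission node.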

The evaluation will be performed on the Condor pools at the Technion (100 computers) and at the University of Wisconsin, Madison (thousands of computers).

Requirements: CDP, Introduction to Networks, Distributed Systems

Advantage: Knowledge of Python, experience in web technologies, C

Duration: a one-semester project, with an option to continue over the summer."
