Wednesday, May 25, 2005

Kademlia

Kademlia - Wikipedia, the free encyclopedia: "Kademlia is an overlay protocol designed for decentralized peer-to-peer (P2P) computer networks. It specifies the structure of the network and regulates how nodes communicate and exchange information. Kademlia nodes communicate with each other over the UDP transport protocol and store data by implementing a Distributed Hash Table (DHT). On top of an existing LAN/WAN (such as the Internet), a new virtual network is created in which each node is identified by a number, its 'Node ID'. This number not only identifies the node; the Kademlia algorithm also uses it to locate other nodes and stored values.

A node that wants to join the network must first go through a bootstrap process. In this phase, the node needs the IP address of another node that is already participating in the Kademlia network (obtained from the user or from a stored list). If the joining node has never participated in the network before, it computes a random ID number. The ID space is so large that this number is, with overwhelming probability, not already assigned to any other node. The node uses this ID until it leaves the network.

The Kademlia algorithm is based on computing the 'distance' between two nodes. This distance is the bitwise exclusive or (XOR) of the two node IDs, interpreted as an integer.

This 'distance' has nothing to do with geography. It measures how far apart two IDs are within the ID range. Thus it can and does happen that, for example, a node in Germany and one in Australia are 'neighbours'.

Information in Kademlia is stored as so-called 'values', each of which is attached to a 'key'.

When searching for a key, the algorithm explores the network in several steps. Each step gets closer to the key being searched for, until the contacted node returns the value or no closer nodes can be found. The number of nodes contacted during a search grows only logarithmically with the size of the network: if the number of participants doubles, a node needs to query only about one more node per search, not twice as many.

Further advantages come particularly from the decentralized structure, which clearly increases resistance to denial-of-service attacks. Even if a whole set of nodes is flooded, the effect on network availability is limited, because the network recovers by knitting itself around these 'holes'."
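
The bootstrap paragraph of the excerpt says a joining node picks a random ID. A node cannot actually check every ID already in use, so in practice uniqueness rests on the size of the ID space. Here is a minimal Python sketch of that idea. It assumes 160-bit IDs, as in the original Kademlia paper. ID_BITS, random_node_id and collision_bound are illustrative names, not taken from any particular implementation.

```python
import secrets

ID_BITS = 160  # assumption: 160-bit IDs, as in the original Kademlia paper

def random_node_id(bits: int = ID_BITS) -> int:
    """Return a uniformly random node ID in the range [0, 2**bits)."""
    return secrets.randbits(bits)

def collision_bound(num_nodes: int, bits: int = ID_BITS) -> float:
    """Union (birthday) bound on the chance that any two of num_nodes random IDs coincide."""
    return num_nodes * (num_nodes - 1) / 2 / 2**bits

node_id = random_node_id()
print(f"{node_id:040x}")       # 160 bits print as 40 hex digits
print(collision_bound(10**9))  # about 3.4e-31 even for a billion nodes
```

Even with a billion participants, a duplicate ID is far less likely than a hardware fault, which is why a freshly generated random ID can be treated as unique.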
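
The distance rule from the excerpt (XOR of two IDs, read as an integer) fits in a few lines of Python. This sketch uses 4-bit IDs instead of full-size ones so the arithmetic can be checked by hand. xor_distance is an illustrative name.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b

# 4-bit IDs keep the arithmetic easy to check by hand (real IDs are much longer).
a, b, c = 0b1010, 0b1000, 0b0010
print(xor_distance(a, b))  # 1010 ^ 1000 = 0010 -> 2
print(xor_distance(a, c))  # 1010 ^ 0010 = 1000 -> 8
print(xor_distance(a, a))  # a node is at distance 0 from itself -> 0
```

XOR compares IDs bit by bit from the top, so IDs that share a longer leading prefix come out closer. 1010 and 1000 agree on their first two bits and are at distance 2. 1010 and 0010 differ in the top bit and are at distance 8. The metric is also symmetric, since a ^ b == b ^ a.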
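
The excerpt only says that values are attached to keys. In the original Kademlia design, keys live in the same ID space as nodes, and each value is stored on the k nodes whose IDs are XOR-closest to its key. The Python sketch below illustrates that placement under stated assumptions. Keys are hashed with SHA-1, and the "network" is a toy in-memory list with global knowledge of all nodes. ToyNode, put, get and K = 3 are hypothetical choices for illustration; the paper suggests k = 20. A real node would locate the k closest nodes with the step-by-step search rather than by sorting a global list.

```python
import hashlib
import random

ID_BITS = 160  # assumption: keys and node IDs share a 160-bit space
K = 3          # hypothetical replication factor; the paper suggests k = 20

def key_to_id(key: str) -> int:
    """Map an application key into the ID space; SHA-1 yields exactly 160 bits."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")

class ToyNode:
    """Hypothetical in-memory stand-in for one node's local key/value store."""
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.store = {}

def k_closest(nodes, target: int, k: int = K):
    """The k nodes whose IDs have the smallest XOR distance to target."""
    return sorted(nodes, key=lambda n: n.node_id ^ target)[:k]

def put(nodes, key: str, value: bytes) -> None:
    """Store the value on the k nodes closest to the hashed key."""
    kid = key_to_id(key)
    for node in k_closest(nodes, kid):
        node.store[kid] = value

def get(nodes, key: str):
    """Ask the k closest nodes for the value; None if none of them has it."""
    kid = key_to_id(key)
    for node in k_closest(nodes, kid):
        if kid in node.store:
            return node.store[kid]
    return None

rng = random.Random(42)
network = [ToyNode(rng.getrandbits(ID_BITS)) for _ in range(50)]
put(network, "example-key", b"hello")
print(get(network, "example-key"))  # b'hello'
print(get(network, "missing-key"))  # None
```

Storing each value on several nodes rather than one means that a value survives even if some of the nodes closest to its key go offline.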
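
The search paragraph of the excerpt can be simulated with a simplified, single-path version of the lookup. Real Kademlia queries several nodes in parallel (α = 3 in the paper) and tracks a shortlist of the k closest nodes seen; this sketch leaves both out. Its routing tables are built with global knowledge. Each node keeps at most K random contacts per "bucket", where bucket i holds the nodes at distance d with 2^i <= d < 2^(i+1). K = 8 and all function names are assumptions for illustration.

```python
import random

ID_BITS = 160  # assumption: 160-bit IDs
K = 8          # assumption: contacts kept per bucket (the paper's k is 20)

def bucket_index(a: int, b: int) -> int:
    """Index i of the bucket holding b in a's table: 2**i <= a ^ b < 2**(i + 1)."""
    return (a ^ b).bit_length() - 1

def build_routing_tables(ids, rng):
    """Toy routing tables: up to K randomly chosen contacts per non-empty bucket.

    Uses global knowledge of every ID, which no real node has; real Kademlia
    fills its buckets from the messages it sees.
    """
    tables = {}
    for me in ids:
        buckets = {}
        for other in ids:
            if other != me:
                buckets.setdefault(bucket_index(me, other), []).append(other)
        tables[me] = [c for members in buckets.values()
                      for c in rng.sample(members, min(K, len(members)))]
    return tables

def lookup(tables, start: int, target: int):
    """Greedy single-path search toward target.

    At each step, ask the current node for its contact closest to target and
    move there; stop when no known contact is closer than the current node.
    Returns the list of nodes on the search path, beginning with start.
    """
    path = [start]
    current = start
    while True:
        best = min(tables[current], key=lambda c: c ^ target)
        if best ^ target >= current ^ target:
            return path
        current = best
        path.append(current)

rng = random.Random(1)
for n in (128, 256, 512, 1024):
    ids = [rng.getrandbits(ID_BITS) for _ in range(n)]
    tables = build_routing_tables(ids, rng)
    runs = [lookup(tables, rng.choice(ids), rng.getrandbits(ID_BITS)) for _ in range(200)]
    avg = sum(len(p) for p in runs) / len(runs)
    print(f"{n:5d} nodes: {avg:.2f} nodes on the search path on average")
```

Every node keeps at least one contact in each non-empty bucket, so each hop fixes at least one more leading bit of the target. A search for an existing node's ID therefore always ends at that node. As n doubles, the average path length should grow only slowly rather than doubling with it, in line with the logarithmic scaling the excerpt describes.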
