CS 268 Computer Networks: P2P DHT Intro

Hari balakrishnan, Frank Kaashoek, David Karger, Robert Morris, Ion Stoica, "Looking up Data in P2P Systems"

General

A really nice introduction to P2P systems. It mostly focuses on how to look up data in a decentralized and scalable manner without discussing how to build a system like file-sharing on top of it.
P2P has other aspects than illegal music sharing to it
Some more words on resilence and definition of correctness would be nice: network will fail, any key could be manipulated or only node's share of keys will be lost or manipulated in case of a failing or malicious node
the paper mixed desired features (like scalability) of DHT and mechanisms (like distance function)

Desired Features of DHT

hash function: name -> hash (in "ID space"), DHT: hash -> node, node stores value for key, application decides what value means (value could be reference where file is stored)
no centralized servers, hierarchy
load-balanced key lookup
"routing" information at each hop scales logarithmically
"routing" hops scale logarithmically
adaptive to nodes concurrently leaving and joining
low overhead
correctness while "routing" tables are still converging, but performance penalty allowed
fault tolerance
advanced: proximity routing
advanced: isolating the impact of malicious nodes
advanced: queries like keyword search

How are DHT's different or similar from normal routing?

routing forwards packets until reach destination, DHT's redirect key lookups to a different node until the right node is found
the table of a DHT maps ID space to nodes, the routing table at A maps space of nodes to nodes connected to A
the table of a DHT can map an ID to an arbitray node, a routing table is constrained to connected nodes

Mechanisms

All DHT algorithms are inspired by putting structure on the ID space and equipping with a distance function
chord: one-dimensional ring, distance is numeric difference (not absolute value, i.e. unidirectional), "routing" table points to nodes at distance 1, 2, 4, 8, 16, 32, 64, 128, ...
tree-like routing: nodes are leaves of (binary) tree, distance is prefix-length with numerical difference as tie-breaker, "routing" table at node has an entry for each prefix-length pointing to a random node who is not in that prefix
n-dimensional space: each node covers a hypercube of ID's, "routing" table contains neighboring hypercubes, distance probably=Euclidean distance in n-space (was not explained in paper)
algorithms maintaining keys, routing information for node joins, leaves, load-balancing. Algorithms maintain invariants for correctness, converge to efficient routing tables for performance

CS 268 Computer Networks