Wednesday, October 29, 2008

P2P DHT Intro

Hari balakrishnan, Frank Kaashoek, David Karger, Robert Morris, Ion Stoica, "Looking up Data in P2P Systems"

General
  • A really nice introduction to P2P systems. It mostly focuses on how to look up data in a decentralized and scalable manner without discussing how to build a system like file-sharing on top of it.
  • P2P has other aspects than illegal music sharing to it
  • Some more words on resilence and definition of correctness would be nice: network will fail, any key could be manipulated or only node's share of keys will be lost or manipulated in case of a failing or malicious node
  • the paper mixed desired features (like scalability) of DHT and mechanisms (like distance function)
Desired Features of DHT
  • hash function: name -> hash (in "ID space"), DHT: hash -> node, node stores value for key, application decides what value means (value could be reference where file is stored)
  • no centralized servers, hierarchy
  • load-balanced key lookup
  • "routing" information at each hop scales logarithmically
  • "routing" hops scale logarithmically
  • adaptive to nodes concurrently leaving and joining
  • low overhead
  • correctness while "routing" tables are still converging, but performance penalty allowed
  • fault tolerance
  • advanced: proximity routing
  • advanced: isolating the impact of malicious nodes
  • advanced: queries like keyword search
How are DHT's different or similar from normal routing?
  • routing forwards packets until reach destination, DHT's redirect key lookups to a different node until the right node is found
  • the table of a DHT maps ID space to nodes, the routing table at A maps space of nodes to nodes connected to A
  • the table of a DHT can map an ID to an arbitray node, a routing table is constrained to connected nodes
Mechanisms
  • All DHT algorithms are inspired by putting structure on the ID space and equipping with a distance function
  • chord: one-dimensional ring, distance is numeric difference (not absolute value, i.e. unidirectional), "routing" table points to nodes at distance 1, 2, 4, 8, 16, 32, 64, 128, ...
  • tree-like routing: nodes are leaves of (binary) tree, distance is prefix-length with numerical difference as tie-breaker, "routing" table at node has an entry for each prefix-length pointing to a random node who is not in that prefix
  • n-dimensional space: each node covers a hypercube of ID's, "routing" table contains neighboring hypercubes, distance probably=Euclidean distance in n-space (was not explained in paper)
  • algorithms maintaining keys, routing information for node joins, leaves, load-balancing. Algorithms maintain invariants for correctness, converge to efficient routing tables for performance

No comments: