Thursday, October 30, 2008

DHT versus DNS

Today we had a great class discussion about Chord, DNS, and company. Here is my list of questions and thoughts that I never got to ask in class; some were at least partially answered during the discussion.

Pro DHT:
  • Google does not use Chord and relies on more traditional hierarchical systems, but should we trust all of our data to single companies, as hierarchical systems require?
  • Authority: Can we trust the data if we replaced hierarchical DNS by Chord? Yes, if we have a hierarchical system of digital signatures (data and public keys signed by the root, country registrars, and domain/subdomain owners), bootstrapped by root signatures. This corresponds to us needing to know the IP addresses of the root servers today.
  • DHTs like Chord do (theoretically) a beautiful job at load balancing and scalability (compare BGP, where each router needs roughly 200,000 prefix entries of routing information, with Chord, which needs only 32 fingers for 2^32 IDs)
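To make the authority argument above concrete, here is a toy sketch of a hierarchical signature chain: root signs the registrar's key, the registrar signs the domain owner's key, the owner signs the record. Real deployments would use public-key signatures; the `toy_sign` scheme below (a hash over a shared secret) is only a stand-in so the example stays self-contained, and all keys and names are made up.

```python
import hashlib

def toy_sign(secret: str, data: str) -> str:
    # Stand-in for a real public-key signature: hash over secret + data.
    return hashlib.sha256((secret + "|" + data).encode()).hexdigest()

def toy_verify(secret: str, data: str, sig: str) -> bool:
    return toy_sign(secret, data) == sig

# Chain of trust: root -> country registrar -> domain owner -> DNS record.
root_key, registrar_key, owner_key = "root-secret", "de-secret", "example-secret"

chain = [
    (root_key,      "registrar:de:" + registrar_key),
    (registrar_key, "domain:example.de:" + owner_key),
    (owner_key,     "A example.de 192.0.2.1"),
]
signatures = [toy_sign(k, d) for k, d in chain]

# A client bootstrapped with only the root key can verify the chain link by link.
assert all(toy_verify(k, d, s) for (k, d), s in zip(chain, signatures))
# A tampered record no longer verifies.
assert not toy_verify(owner_key, "A example.de 203.0.113.9", signatures[2])
```

The point is that the trust anchor shrinks to one root key, so it no longer matters which DHT node happens to store the record.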
Pro DNS:
  • Malicious nodes can still spoof the non-existence of a certain domain in DHT.
  • DHTs provide no geographic locality and don't allow for techniques like anycast
  • DNS recursive resolvers can be placed at strategic points: e.g., my provider can put a resolver between its clients and the DNS name servers, so DNS queries follow the path packets would take instead of detouring, which would add latency
  • DNS has designated servers
  • DNS caching is very efficient because 99% of all queries are probably for the same 1000 domains. This also underlines the importance of putting caches at good strategic places, i.e. each ISP should put its own cache close to its edge routers
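The caching argument can be sketched in a few lines: a DNS-style cache is just a map from name to value where each entry expires after its TTL, and every miss or expiry would trigger a recursive lookup. This is a minimal illustration, not how any particular resolver is implemented.

```python
import time

class TTLCache:
    """Minimal DNS-style cache: entries expire after their TTL (seconds)."""
    def __init__(self):
        self._store = {}  # name -> (expiry_timestamp, value)

    def put(self, name, value, ttl):
        self._store[name] = (time.monotonic() + ttl, value)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None          # miss: would recurse to authoritative servers
        expiry, value = entry
        if time.monotonic() > expiry:
            del self._store[name]
            return None          # expired: treated as a miss
        return value

cache = TTLCache()
cache.put("example.com", "192.0.2.1", ttl=300)
assert cache.get("example.com") == "192.0.2.1"   # hit: no upstream query needed
assert cache.get("unknown.org") is None           # miss: upstream query needed
```

With a heavily skewed query distribution, almost all lookups are hits, which is exactly why placing such caches at the ISP edge pays off.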

Some more words on Active Networks

  • It seems to me that Active Networks can only be a compromise between security and usability.
  • In the paper, the API was so restricted that the only thing an active node could really do was forward packets (and maybe modify the header a little). Using code for this decision is superfluous; source routing would give the client the same control.
  • This would be more interesting if active nodes could actually process data in some way, like aggregating information, so maybe Active Networks are interesting for sensor-network-like applications.

Wednesday, October 29, 2008

Chord and tree-like diagrams

I made a picture to show how tree-like DHTs forward queries



and how Chord forwards queries

The Beauty of Chord

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications"

Chapter 3 motivates applications.

Underlying geometry
  • ID space is a ring
  • each ID in ID space is mapped to a node. A node corresponds to a block of consecutive IDs. For redundancy, these blocks can overlap.
  • a node maintains a "routing" table (called fingers) which has an entry to reach the node which is responsible for the ID with distance 1, 2, 4, 8, 16, 32, 64, 128, ...
  • for handling node failures, each node also maintains a table of its k immediate successors in the ID space
  • this really is a small-world network: the entries for its immediate successors make it highly clustered, while the long-distance routing table entries keep the path length short (logarithmic in the size of the ID space)
Mechanisms
  • Node join (say A): it is assigned a block in ID space. It computes the IDs at distance 1, 2, 4, 8, 16, 32, ... from its own block, uses the existing routing tables of other nodes to query which nodes are responsible for these IDs, and constructs its routing table from the answers. It also traces back which nodes' routing tables have to change (at most log n of them, querying each costs log n) and updates those nodes. Finally, data has to be moved, which is left to the application.
  • every node periodically runs stabilization to correct routing tables to account for new nodes
The routing table could list several nodes per entry n whose distances in ID space are less than but close to 2^n. That way, lookup failures could be reduced when nodes fail.
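The geometry and lookup mechanism described above can be sketched end to end. This is a toy Chord on an 8-bit ring with a handful of made-up node IDs: each finger i points to the first node at clockwise distance at least 2^i, and a lookup repeatedly jumps to the closest preceding finger of the key.

```python
# Toy Chord: 8-bit ID ring, fingers at distances 1, 2, 4, ..., 128.
M = 8
RING = 2 ** M                                 # 256 IDs
nodes = sorted([5, 30, 77, 120, 200, 250])    # hypothetical node IDs

def successor(ident):
    """First node at or after ident, wrapping around the ring."""
    ident %= RING
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def in_interval(x, a, b):
    """True if x lies in the clockwise interval (a, b] on the ring."""
    return a < x <= b if a < b else x > a or x <= b

def between(x, a, b):
    """True if x lies strictly in the clockwise interval (a, b)."""
    return a < x < b if a < b else x > a or x < b

def fingers(n):
    """Finger i of node n is successor(n + 2^i), for i = 0..M-1."""
    return [successor(n + 2 ** i) for i in range(M)]

def lookup(n, key):
    """Route from node n toward successor(key); returns (owner, hop count)."""
    hops = 0
    while not in_interval(key, n, successor(n + 1)):
        candidates = [f for f in fingers(n) if between(f, n, key)]
        if not candidates:
            break
        n = max(candidates, key=lambda f: (f - n) % RING)  # closest preceding finger
        hops += 1
    return successor(key), hops

owner, hops = lookup(5, 130)
assert owner == 200        # node 200 is responsible for key 130
assert hops <= M           # at most log2 of the ID space size
```

Each jump at least halves the remaining clockwise distance to the key, which is where the logarithmic hop count comes from.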

P2P DHT Intro

Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, "Looking up Data in P2P Systems"

General
  • A really nice introduction to P2P systems. It mostly focuses on how to look up data in a decentralized and scalable manner without discussing how to build a system like file-sharing on top of it.
  • P2P has other aspects than illegal music sharing to it
  • Some more words on resilience and a definition of correctness would be nice: the network will fail; in case of a failing or malicious node, can any key be manipulated, or is only that node's share of keys lost or manipulated?
  • the paper mixes desired features of DHTs (like scalability) with mechanisms (like the distance function)
Desired Features of DHT
  • hash function: name -> hash (in "ID space"), DHT: hash -> node, node stores value for key, application decides what value means (value could be reference where file is stored)
  • no centralized servers, hierarchy
  • load-balanced key lookup
  • "routing" information at each hop scales logarithmically
  • "routing" hops scale logarithmically
  • adaptive to nodes concurrently leaving and joining
  • low overhead
  • correctness while "routing" tables are still converging, but performance penalty allowed
  • fault tolerance
  • advanced: proximity routing
  • advanced: isolating the impact of malicious nodes
  • advanced: queries like keyword search
How are DHTs different from or similar to normal routing?
  • routing forwards packets until they reach the destination; DHTs redirect key lookups to a different node until the right node is found
  • the table of a DHT maps ID space to nodes, the routing table at A maps space of nodes to nodes connected to A
  • the table of a DHT can map an ID to an arbitrary node; a routing table is constrained to connected nodes
Mechanisms
  • All DHT algorithms are inspired by putting structure on the ID space and equipping it with a distance function
  • chord: one-dimensional ring, distance is numeric difference (not absolute value, i.e. unidirectional), "routing" table points to nodes at distance 1, 2, 4, 8, 16, 32, 64, 128, ...
  • tree-like routing: nodes are leaves of (binary) tree, distance is prefix-length with numerical difference as tie-breaker, "routing" table at node has an entry for each prefix-length pointing to a random node who is not in that prefix
  • n-dimensional space: each node covers a hypercube of IDs, the "routing" table contains neighboring hypercubes; the distance is probably Euclidean distance in n-space (not explained in the paper)
  • algorithms maintaining keys, routing information for node joins, leaves, load-balancing. Algorithms maintain invariants for correctness, converge to efficient routing tables for performance
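The two distance functions named above differ in a way that is easy to miss in prose, so here is a small sketch: Chord's distance is unidirectional (it wraps clockwise and is not symmetric), while tree-like schemes compare shared prefix length first and break ties numerically. The bit widths and IDs are arbitrary.

```python
M = 8  # 8-bit IDs, as a small example

def chord_distance(a, b):
    """Chord: clockwise numeric distance from a to b on the ring (unidirectional)."""
    return (b - a) % (2 ** M)

def prefix_distance(a, b):
    """Tree-like: fewer shared leading bits = farther; numeric gap breaks ties.
    Returned as a tuple so comparisons are lexicographic."""
    shared = 0
    for i in range(M - 1, -1, -1):
        if (a >> i) & 1 == (b >> i) & 1:
            shared += 1
        else:
            break
    return (M - shared, abs(a - b))

assert chord_distance(250, 5) == 11      # wraps around the ring
assert chord_distance(5, 250) == 245     # not symmetric: clockwise only
assert prefix_distance(0b10100000, 0b10101111) < prefix_distance(0b10100000, 0b00000001)
```

The routing table structure falls out of the distance function: Chord fingers halve the remaining ring distance, while prefix routing fixes one more leading bit (or digit) per hop.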

Monday, October 27, 2008

Are Active Networks just Computer Viruses?

David Wetherall, "Active network vision and reality: lessons from a capsule-based system"

If I think about it: the way capsules propagate through a network is by injecting code into the router software to spread to the next router, which is really how a worm/virus spreads through a network. Several active networks have already been implemented: the Code Red worm, Melissa, happy99.exe, Chernobyl. Some of this research eventually led to imprisonment; apparently the FBI doesn't like research on Active Networks. There are even organizations fighting Active Networks, for example Symantec. And unfortunately, Windows is a little ahead of Linux in terms of Active Network implementations.

Active Network basics
  • a packet is called a capsule and carries code (perhaps by reference) executed on the router (active node) to process and eventually forward it
  • benefits: flexible network layer, decoupling services from the underlying infrastructure, enables fast deployment of new features across WANs
  • concerns: mobile code raises security and performance issues
Characterization of Active networks
  • capsule model of programmability
  • accessibility of that model to all users
  • applications that can be constructed
Mechanisms in the paper
  • code is carried by reference: an MD5 hash of the code serves as the "type of service" field, which keeps the capsule header comparable in size to an IP header
  • code also given a human readable name, distributed through DNS-like directory service
  • code cached locally
  • capsules are forwarded as plain IP packets between conventional routers and processed as capsules only by active routers: this allows gradual deployment, deployment on heterogeneous networks, and routers split into a fast hardware path for some packets and a slow software path for others
  • code is written in type-safe language (e.g. Java ) or with proof
  • code is executed in isolation from capsules of different type, in sandbox
  • code's runtime and resources are limited
  • code may not change type of service when routing or creating new packets
  • code must lower TTL
  • code can only create limited number of new capsules (e.g. limited by total TTL)
  • but still: code is certified by trusted authority (IETF)
  • API for capsule's code: query node environment, soft-state storage, routing, possibly loss information
  • tests run on ANTA kit, in Java, as overlay network on UDP
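The code-by-reference and TTL rules above can be sketched together. This is a toy model, not the ANTS implementation: the "type of service" field is the MD5 digest of the capsule's code, a node fetches and caches unknown code on first use (here from a stand-in `KNOWN_CODE` table playing the role of the previous hop), verifies it against the digest, and enforces the TTL rule.

```python
import hashlib

CODE_CACHE = {}  # digest -> code string, per active node

def publish(source_code: str) -> str:
    """The capsule's 'type of service' field is the MD5 digest of its code."""
    return hashlib.md5(source_code.encode()).hexdigest()

def fetch_from_previous_hop(digest):
    # Hypothetical stand-in for requesting the code from the previous node.
    return KNOWN_CODE[digest]

def process_capsule(capsule):
    digest = capsule["type"]
    if digest not in CODE_CACHE:
        code = fetch_from_previous_hop(digest)
        # Self-certifying reference: spoofed code cannot match its digest.
        if hashlib.md5(code.encode()).hexdigest() != digest:
            raise ValueError("code does not match its digest")
        CODE_CACHE[digest] = code
    if capsule["ttl"] <= 0:
        return None          # resource rule: capsules must die at TTL zero
    capsule["ttl"] -= 1      # code must lower TTL at every active node
    return capsule           # real code would now decide where to forward it

forward_src = "def forward(capsule): pass"
KNOWN_CODE = {publish(forward_src): forward_src}

capsule = {"type": publish(forward_src), "ttl": 2, "payload": b"hi"}
assert process_capsule(capsule)["ttl"] == 1   # first hop: cache miss, then fetch
assert process_capsule(capsule)["ttl"] == 0   # second hop: cache hit
assert process_capsule(capsule) is None       # TTL exhausted: dropped
```

Naming code by its own hash is what makes the reference self-certifying: a node never has to trust the peer it fetched the code from, only the digest in the capsule.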
Findings
  • code distribution provides insignificant overhead
  • analysis of protection attempts to answer the following questions: 1. Can the node runtime be corrupted by service code? 2. Can distributed service code be corrupted or spoofed? 3. Can the cached state be inadvertently manipulated by another service?
  • analysis of resource management: 1. Can a capsule consume a large, possibly unbounded amount of resources at a single node? 2. Across a large number of nodes? 3. Can an application running on an end-system be used to inject a large, possibly unbounded number of capsules into the network?
Possible Applications
  • Characterization of services: expressible, compact, fast, incrementally deployable
  • new services based on network layer state
  • multicast, reliable multicast support, ECN, PIP, anycast
Comments
  • The idea of active networks is very interesting, particularly regarding the many algorithms and techniques like fair queueing, ECN, XCP, RED, multicast, QoS, and anycast which exist but are difficult to deploy on existing infrastructure.
  • But unfortunately, the security and resource concerns restrict the abilities of the capsule's code (e.g. isolation from other flows) too much to implement some of these algorithms. The list of remaining techniques did not convince me. Anycast works reasonably well implemented in BGP and through DNS. Maybe it is the revival of multicast?

RON

David Andersen, Hari Balakrishnan, Frans Kaashoek, Robert Morris, "Resilient Overlay Networks"

Shortcomings of today's routing structure
  • A quarter of randomly picked connections between two hosts suffer outages of more than half an hour over a three-day period
  • BGP filters and summarizes routing information in the interest of scalability and policy enforcement, delays information to prevent oscillations
  • BGP deems a connection alive as long as the BGP session is still alive; however, persistent congestion on a link might degrade E2E performance
  • BGP policies often do not use private peering links to provide failover capabilities
Proposal
  • Overlay Network of nodes routing packets between them based on RON header
  • Size: a couple of dozen nodes (cluster maintained through RON membership manager)
  • Virtual link between each pair of distinct nodes
  • optimizes different application-specific metrics: API provides explicit way for application to prioritize among throughput, delay, packet loss rate and to define failures
  • fast failure detection and recovery (<20sec) through constant active and passive monitoring of virtual links
  • expressive policy routing
  • Data path: application --conduit API--> entry node --> RON routers --> exit node
  • link state, hop-by-hop routing
  • RON routers and RON membership managers communicate through RON network (versus IP)
  • RON headers have tag identifying flow, keeps flow to a chosen path
  • hysteresis to avoid route flapping
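The routing proposal above boils down to: probe every virtual link, pick the direct or one-intermediate-hop path that minimizes the application's metric, and use hysteresis to avoid flapping. A minimal sketch, with made-up probe data and latency as the only metric:

```python
# RON-style path selection sketch: direct path vs. one intermediate RON node.
latency = {  # measured virtual-link latencies in ms (hypothetical probe data)
    ("A", "B"): 80, ("A", "C"): 20, ("C", "B"): 30,
    ("A", "D"): 25, ("D", "B"): 70,
}

def path_cost(path):
    """Application-specific metric; here simply summed latency."""
    return sum(latency[(a, b)] for a, b in zip(path, path[1:]))

def best_path(src, dst, others):
    """Consider the direct virtual link and every one-hop detour."""
    candidates = [[src, dst]] + [
        [src, n, dst] for n in others
        if (src, n) in latency and (n, dst) in latency
    ]
    return min(candidates, key=path_cost)

def maybe_switch(current, candidate, margin=0.95):
    """Hysteresis: only switch if the candidate is at least 5% cheaper."""
    return candidate if path_cost(candidate) < margin * path_cost(current) else current

route = best_path("A", "B", ["C", "D"])
assert route == ["A", "C", "B"]           # 50 ms detour beats the 80 ms direct link
route = maybe_switch(route, ["A", "B"])   # direct link is worse; keep current route
assert route == ["A", "C", "B"]
```

Restricting candidates to a single intermediate hop matches the observation in the comments below that most chosen routes use only one RON hop anyway, and it keeps probing overhead quadratic rather than worse.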
Results
  • outages are routed around within 20 seconds
  • packet loss rate of at most 30%
  • in most cases RON reduces latency
  • in some cases, RON can significantly improve TCP performance

Comments
  • The exact routing mechanism is unclear; it looks like hop-by-hop routing, but wouldn't source routing be better for optimizing the application-specific metric, optimizing it only once at the source? (It doesn't really matter, because most of the time the chosen route has only one RON hop between the two ends.)
  • What would large-scale deployment of RON mean for the Internet? Would RON further inflate data paths? Would RON nodes deployed in two peering ASes turn these into ASes de facto transiting traffic?
  • How does RON's forwarding compare to a TCP split connection (split at RON forwarder)?
  • The user experience does contradict the statement that 1/4 of the routes experience outages of half an hour in three days. Does this mean that the Internet is really partitioned into clients and servers, with reliable routes between clients and servers but less reliable communication between clients and clients? Or is the user experience due to the fact that websites deploy servers across multiple different networks, so our queries in fact do not go through the "Inter-"net?