Thursday, October 30, 2008

DHT versus DNS

Today we had a great class discussion about Chord, DNS, and company. Here is my list of questions and thoughts that I never got to ask in class; some were at least partially answered during the discussion.

Pro DHT:
  • Google does not use Chord and relies on more traditional hierarchical systems, but should we trust all of our data to single companies, as hierarchical systems require?
  • Authority: Can we trust the data if we replaced hierarchical DNS by Chord? Yes, if we have a hierarchical system of digital signatures (data and public keys signed by the root, country registrars, and domain/subdomain owners), bootstrapped by root signatures. This corresponds to us needing to know the IP addresses of the root servers today.
  • DHTs like Chord do (theoretically) a beautiful job at load balancing and scalability (compare BGP, where each router needs roughly 200,000 prefix entries of routing information, with Chord, which needs only 32 fingers for 2^32 IDs)
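To make the authority argument above concrete, here is a toy sketch of a hierarchical signature chain: root signs the registrar's key, the registrar signs the domain owner's key, the owner signs the record. Real deployments would use public-key signatures; the `toy_sign` scheme below (a hash over a shared secret) is only a stand-in so the example stays self-contained, and all keys and names are made up.

```python
import hashlib

def toy_sign(secret: str, data: str) -> str:
    # Stand-in for a real public-key signature: hash over secret + data.
    return hashlib.sha256((secret + "|" + data).encode()).hexdigest()

def toy_verify(secret: str, data: str, sig: str) -> bool:
    return toy_sign(secret, data) == sig

# Chain of trust: root -> country registrar -> domain owner -> DNS record.
root_key, registrar_key, owner_key = "root-secret", "de-secret", "example-secret"

chain = [
    (root_key,      "registrar:de:" + registrar_key),
    (registrar_key, "domain:example.de:" + owner_key),
    (owner_key,     "A example.de 192.0.2.1"),
]
signatures = [toy_sign(k, d) for k, d in chain]

# A client bootstrapped with only the root key can verify the chain link by link.
assert all(toy_verify(k, d, s) for (k, d), s in zip(chain, signatures))
# A tampered record no longer verifies.
assert not toy_verify(owner_key, "A example.de 203.0.113.9", signatures[2])
```

The point is that the trust anchor shrinks to one root key, so it no longer matters which DHT node happens to store the record.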
Pro DNS:
  • Malicious nodes can still spoof the non-existence of a certain domain in DHT.
  • DHTs provide no geographic locality and don't allow for techniques like anycast
  • DNS recursive resolvers can be placed at strategic points: e.g., my provider can put a resolver between its clients and the DNS name servers, so DNS queries follow the path packets would take instead of detouring, which would add latency
  • DNS has designated servers
  • DNS caching is very efficient because 99% of all queries are probably for the same 1000 domains. This also underlines the importance of putting caches at good strategic places, i.e. each ISP should put its own cache close to its edge routers
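The caching argument can be sketched in a few lines: a DNS-style cache is just a map from name to value where each entry expires after its TTL, and every miss or expiry would trigger a recursive lookup. This is a minimal illustration, not how any particular resolver is implemented.

```python
import time

class TTLCache:
    """Minimal DNS-style cache: entries expire after their TTL (seconds)."""
    def __init__(self):
        self._store = {}  # name -> (expiry_timestamp, value)

    def put(self, name, value, ttl):
        self._store[name] = (time.monotonic() + ttl, value)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None          # miss: would recurse to authoritative servers
        expiry, value = entry
        if time.monotonic() > expiry:
            del self._store[name]
            return None          # expired: treated as a miss
        return value

cache = TTLCache()
cache.put("example.com", "192.0.2.1", ttl=300)
assert cache.get("example.com") == "192.0.2.1"   # hit: no upstream query needed
assert cache.get("unknown.org") is None           # miss: upstream query needed
```

With a heavily skewed query distribution, almost all lookups are hits, which is exactly why placing such caches at the ISP edge pays off.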

Some more words on Active Networks

  • It seems to me that Active Networks can only be a compromise between security and usability.
  • In the paper, the API was so restricted that the only thing an active node could really do was forward packets (and maybe modify the header a little). Using code for this decision is superfluous; source routing would give the client the same control.
  • This would be more interesting if active nodes could actually process data in some way, like aggregating information, so maybe Active Networks are interesting for sensor-network-like applications.

Wednesday, October 29, 2008

Chord and tree-like diagrams

I made a picture to show how tree-like DHTs forward queries



and how Chord forwards queries

The Beauty of Chord

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications"

Chapter 3 motivates applications.

Underlying geometry
  • ID space is a ring
  • each ID in ID space is mapped to a node. A node corresponds to a block of consecutive IDs. For redundancy, these blocks can overlap.
  • a node maintains a "routing" table (called fingers) which has an entry to reach the node which is responsible for the ID with distance 1, 2, 4, 8, 16, 32, 64, 128, ...
  • for handling node failures, each node also maintains a table of its k immediate successors in the ID space
  • this really is a small-world network: the entries for its immediate successors make it highly clustered, while the long-distance routing table entries keep the path length short (logarithmic in the size of the ID space)
Mechanisms
  • Node join (say A): it is assigned a block in ID space. It computes the IDs at distance 1, 2, 4, 8, 16, 32, ... from its own block, uses the existing routing tables of other nodes to query which nodes are responsible for these IDs, and constructs its routing table from the answers. It also traces back which nodes' routing tables have to change (at most log n of them, querying each costs log n) and updates those nodes. Finally, data has to be moved, which is left to the application.
  • every node periodically runs stabilization to correct routing tables to account for new nodes
The routing table could list several nodes per entry n whose distances in ID space are less than but close to 2^n. That way, lookup failures could be reduced when nodes fail.
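The geometry and lookup mechanism described above can be sketched end to end. This is a toy Chord on an 8-bit ring with a handful of made-up node IDs: each finger i points to the first node at clockwise distance at least 2^i, and a lookup repeatedly jumps to the closest preceding finger of the key.

```python
# Toy Chord: 8-bit ID ring, fingers at distances 1, 2, 4, ..., 128.
M = 8
RING = 2 ** M                                 # 256 IDs
nodes = sorted([5, 30, 77, 120, 200, 250])    # hypothetical node IDs

def successor(ident):
    """First node at or after ident, wrapping around the ring."""
    ident %= RING
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def in_interval(x, a, b):
    """True if x lies in the clockwise interval (a, b] on the ring."""
    return a < x <= b if a < b else x > a or x <= b

def between(x, a, b):
    """True if x lies strictly in the clockwise interval (a, b)."""
    return a < x < b if a < b else x > a or x < b

def fingers(n):
    """Finger i of node n is successor(n + 2^i), for i = 0..M-1."""
    return [successor(n + 2 ** i) for i in range(M)]

def lookup(n, key):
    """Route from node n toward successor(key); returns (owner, hop count)."""
    hops = 0
    while not in_interval(key, n, successor(n + 1)):
        candidates = [f for f in fingers(n) if between(f, n, key)]
        if not candidates:
            break
        n = max(candidates, key=lambda f: (f - n) % RING)  # closest preceding finger
        hops += 1
    return successor(key), hops

owner, hops = lookup(5, 130)
assert owner == 200        # node 200 is responsible for key 130
assert hops <= M           # at most log2 of the ID space size
```

Each jump at least halves the remaining clockwise distance to the key, which is where the logarithmic hop count comes from.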

P2P DHT Intro

Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, "Looking up Data in P2P Systems"

General
  • A really nice introduction to P2P systems. It mostly focuses on how to look up data in a decentralized and scalable manner without discussing how to build a system like file-sharing on top of it.
  • P2P has other aspects than illegal music sharing to it
  • Some more words on resilience and a definition of correctness would be nice: the network will fail; in case of a failing or malicious node, can any key be manipulated, or is only that node's share of keys lost or manipulated?
  • the paper mixes desired features of DHTs (like scalability) with mechanisms (like the distance function)
Desired Features of DHT
  • hash function: name -> hash (in "ID space"), DHT: hash -> node, node stores value for key, application decides what value means (value could be reference where file is stored)
  • no centralized servers, hierarchy
  • load-balanced key lookup
  • "routing" information at each hop scales logarithmically
  • "routing" hops scale logarithmically
  • adaptive to nodes concurrently leaving and joining
  • low overhead
  • correctness while "routing" tables are still converging, but performance penalty allowed
  • fault tolerance
  • advanced: proximity routing
  • advanced: isolating the impact of malicious nodes
  • advanced: queries like keyword search
How are DHTs different from or similar to normal routing?
  • routing forwards packets until they reach the destination; DHTs redirect key lookups to a different node until the right node is found
  • the table of a DHT maps ID space to nodes, the routing table at A maps space of nodes to nodes connected to A
  • the table of a DHT can map an ID to an arbitrary node; a routing table is constrained to connected nodes
Mechanisms
  • All DHT algorithms are inspired by putting structure on the ID space and equipping it with a distance function
  • chord: one-dimensional ring, distance is numeric difference (not absolute value, i.e. unidirectional), "routing" table points to nodes at distance 1, 2, 4, 8, 16, 32, 64, 128, ...
  • tree-like routing: nodes are leaves of (binary) tree, distance is prefix-length with numerical difference as tie-breaker, "routing" table at node has an entry for each prefix-length pointing to a random node who is not in that prefix
  • n-dimensional space: each node covers a hypercube of IDs, the "routing" table contains neighboring hypercubes; the distance is probably Euclidean distance in n-space (not explained in the paper)
  • algorithms maintaining keys, routing information for node joins, leaves, load-balancing. Algorithms maintain invariants for correctness, converge to efficient routing tables for performance
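The two distance functions named above differ in a way that is easy to miss in prose, so here is a small sketch: Chord's distance is unidirectional (it wraps clockwise and is not symmetric), while tree-like schemes compare shared prefix length first and break ties numerically. The bit widths and IDs are arbitrary.

```python
M = 8  # 8-bit IDs, as a small example

def chord_distance(a, b):
    """Chord: clockwise numeric distance from a to b on the ring (unidirectional)."""
    return (b - a) % (2 ** M)

def prefix_distance(a, b):
    """Tree-like: fewer shared leading bits = farther; numeric gap breaks ties.
    Returned as a tuple so comparisons are lexicographic."""
    shared = 0
    for i in range(M - 1, -1, -1):
        if (a >> i) & 1 == (b >> i) & 1:
            shared += 1
        else:
            break
    return (M - shared, abs(a - b))

assert chord_distance(250, 5) == 11      # wraps around the ring
assert chord_distance(5, 250) == 245     # not symmetric: clockwise only
assert prefix_distance(0b10100000, 0b10101111) < prefix_distance(0b10100000, 0b00000001)
```

The routing table structure falls out of the distance function: Chord fingers halve the remaining ring distance, while prefix routing fixes one more leading bit (or digit) per hop.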

Monday, October 27, 2008

Are Active Networks just Computer Viruses?

David Wetherall, "Active network vision and reality: lessons from a capsule-based system"

If I think about it: the way capsules propagate through a network is by injecting code into the router software to spread to the next router, which is really how a worm/virus spreads through a network. Several active networks have already been implemented: the Code Red worm, Melissa, happy99.exe, Chernobyl. Some of this research eventually led to imprisonment; apparently the FBI doesn't like research on Active Networks. There are even organizations fighting Active Networks, for example Symantec. And unfortunately, Windows is a little ahead of Linux in terms of Active Network implementations.

Active Network basics
  • a packet is called a capsule and carries code (perhaps by reference) executed on the router (active node) to process and eventually forward it
  • benefits: flexible network layer, decoupling services from the underlying infrastructure, enables fast deployment of new features across WANs
  • concerns: mobile code raises security and performance issues
Characterization of Active networks
  • capsule model of programmability
  • accessibility of that model to all users
  • applications that can be constructed
Mechanisms in the paper
  • code is carried by reference: an MD5 hash of the code serves as the "type of service" field, which keeps the capsule header comparable in size to an IP header
  • code also given a human readable name, distributed through DNS-like directory service
  • code cached locally
  • capsules are forwarded as plain IP packets between conventional routers and processed as capsules only by active routers: this allows gradual deployment, deployment on heterogeneous networks, and routers split into a fast hardware path for some packets and a slow software path for others
  • code is written in type-safe language (e.g. Java ) or with proof
  • code is executed in isolation from capsules of different type, in sandbox
  • code's runtime and resources are limited
  • code may not change type of service when routing or creating new packets
  • code must lower TTL
  • code can only create limited number of new capsules (e.g. limited by total TTL)
  • but still: code is certified by trusted authority (IETF)
  • API for capsule's code: query node environment, soft-state storage, routing, possibly loss information
  • tests run on ANTA kit, in Java, as overlay network on UDP
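The code-by-reference and TTL rules above can be sketched together. This is a toy model, not the ANTS implementation: the "type of service" field is the MD5 digest of the capsule's code, a node fetches and caches unknown code on first use (here from a stand-in `KNOWN_CODE` table playing the role of the previous hop), verifies it against the digest, and enforces the TTL rule.

```python
import hashlib

CODE_CACHE = {}  # digest -> code string, per active node

def publish(source_code: str) -> str:
    """The capsule's 'type of service' field is the MD5 digest of its code."""
    return hashlib.md5(source_code.encode()).hexdigest()

def fetch_from_previous_hop(digest):
    # Hypothetical stand-in for requesting the code from the previous node.
    return KNOWN_CODE[digest]

def process_capsule(capsule):
    digest = capsule["type"]
    if digest not in CODE_CACHE:
        code = fetch_from_previous_hop(digest)
        # Self-certifying reference: spoofed code cannot match its digest.
        if hashlib.md5(code.encode()).hexdigest() != digest:
            raise ValueError("code does not match its digest")
        CODE_CACHE[digest] = code
    if capsule["ttl"] <= 0:
        return None          # resource rule: capsules must die at TTL zero
    capsule["ttl"] -= 1      # code must lower TTL at every active node
    return capsule           # real code would now decide where to forward it

forward_src = "def forward(capsule): pass"
KNOWN_CODE = {publish(forward_src): forward_src}

capsule = {"type": publish(forward_src), "ttl": 2, "payload": b"hi"}
assert process_capsule(capsule)["ttl"] == 1   # first hop: cache miss, then fetch
assert process_capsule(capsule)["ttl"] == 0   # second hop: cache hit
assert process_capsule(capsule) is None       # TTL exhausted: dropped
```

Naming code by its own hash is what makes the reference self-certifying: a node never has to trust the peer it fetched the code from, only the digest in the capsule.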
Findings
  • code distribution provides insignificant overhead
  • analysis of protection attempts to answer the following questions: 1. Can the node runtime be corrupted by service code? 2. Can distributed service code be corrupted or spoofed? 3. Can the cached state be inadvertently manipulated by another service?
  • analysis of resource management: 1. Can a capsule consume a large, possibly unbounded amount of resources at a single node? 2. Across a large number of nodes? 3. Can an application running on an end-system be used to inject a large, possibly unbounded number of capsules into the network?
Possible Applications
  • Characterization of services: expressible, compact, fast, incrementally deployable
  • new services based on network layer state
  • multicast, reliable multicast support, ECN, PIP, anycast
Comments
  • The idea of active networks is very interesting, particularly regarding the many algorithms and techniques like fair queueing, ECN, XCP, RED, multicast, QoS, and anycast which exist but are difficult to deploy on existing infrastructure.
  • But unfortunately, the security and resource concerns restrict the abilities of the capsule's code (e.g. isolation from other flows) too much to implement some of these algorithms. The list of remaining techniques did not convince me. Anycast works reasonably well implemented in BGP and through DNS. Maybe it is the revival of multicast?

RON

David Andersen, Hari Balakrishnan, Frans Kaashoek, Robert Morris, "Resilient Overlay Networks"

Shortcomings of today's routing structure
  • A quarter of randomly picked connections between two hosts suffer outages of more than half an hour over a three-day period
  • BGP filters and summarizes routing information in the interest of scalability and policy enforcement, delays information to prevent oscillations
  • BGP deems a connection alive as long as the BGP session is still alive; however, persistent congestion on a link might degrade E2E performance
  • BGP policies often do not use private peering links to provide failover capabilities
Proposal
  • Overlay Network of nodes routing packets between them based on RON header
  • Size: a couple of dozen nodes (cluster maintained through RON membership manager)
  • Virtual link between each pair of distinct nodes
  • optimizes different application-specific metrics: API provides explicit way for application to prioritize among throughput, delay, packet loss rate and to define failures
  • fast failure detection and recovery (<20sec) through constant active and passive monitoring of virtual links
  • expressive policy routing
  • Data path: application --conduit API--> entry node --> RON routers --> exit node
  • link state, hop-by-hop routing
  • RON routers and RON membership managers communicate through RON network (versus IP)
  • RON headers have tag identifying flow, keeps flow to a chosen path
  • hysteresis to avoid route flapping
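The routing proposal above boils down to: probe every virtual link, pick the direct or one-intermediate-hop path that minimizes the application's metric, and use hysteresis to avoid flapping. A minimal sketch, with made-up probe data and latency as the only metric:

```python
# RON-style path selection sketch: direct path vs. one intermediate RON node.
latency = {  # measured virtual-link latencies in ms (hypothetical probe data)
    ("A", "B"): 80, ("A", "C"): 20, ("C", "B"): 30,
    ("A", "D"): 25, ("D", "B"): 70,
}

def path_cost(path):
    """Application-specific metric; here simply summed latency."""
    return sum(latency[(a, b)] for a, b in zip(path, path[1:]))

def best_path(src, dst, others):
    """Consider the direct virtual link and every one-hop detour."""
    candidates = [[src, dst]] + [
        [src, n, dst] for n in others
        if (src, n) in latency and (n, dst) in latency
    ]
    return min(candidates, key=path_cost)

def maybe_switch(current, candidate, margin=0.95):
    """Hysteresis: only switch if the candidate is at least 5% cheaper."""
    return candidate if path_cost(candidate) < margin * path_cost(current) else current

route = best_path("A", "B", ["C", "D"])
assert route == ["A", "C", "B"]           # 50 ms detour beats the 80 ms direct link
route = maybe_switch(route, ["A", "B"])   # direct link is worse; keep current route
assert route == ["A", "C", "B"]
```

Restricting candidates to a single intermediate hop matches the observation in the comments below that most chosen routes use only one RON hop anyway, and it keeps probing overhead quadratic rather than worse.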
Results
  • outages are routed around within 20 seconds
  • packet loss rate of at most 30%
  • in most cases RON reduces latency
  • in some cases, RON can significantly improve TCP performance

Comments
  • The exact routing mechanism is unclear; it looks like hop-by-hop routing, but wouldn't source routing be better for optimizing the application-specific metric, optimizing it only once at the source? (It doesn't really matter, because most of the time the chosen route has only one RON hop between the two ends.)
  • What would large-scale deployment of RON mean for the Internet? Would RON further inflate data paths? Would RON nodes deployed in two peering ASes turn these into ASes de facto transiting traffic?
  • How does RON's forwarding compare to a TCP split connection (split at RON forwarder)?
  • The user experience does contradict the statement that 1/4 of the routes experience outages of half an hour in three days. Does this mean that the Internet is really partitioned into clients and servers, with reliable routes between clients and servers but less reliable communication between clients and clients? Or is the user experience due to the fact that websites deploy servers across multiple different networks, so our queries in fact do not go through the "Inter-"net?