CS 268 Computer Networks

Tuesday, December 2, 2008

Power Laws again

I just read that wars are also scale-free: the distribution of number of deaths in a conflict follows a power law with coefficient 2.5.

This result was found by Neil Johnson (Miami). http://www.dradio.de/dlf/sendungen/forschak/885102/

Monday, November 24, 2008

MapReduce Again!

Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica, "Improving MapReduce Performance in Hereogenous Environments"

Intro

Hadoop is an open-source implementation of Map Reduce (and distributed filesystem?)
new deployment: virtualized environments, datacenters with several generations of hardware display heterogenity
want to optimize scheduler to finish fastest without wasting unnecceary resources

Scheduling the next empty task slot

Priority 1: failed tasks
Priority 2: non-running tasks
Priority 3: Speculative execution

Speculative execution, the native Hadoop way

Speculative execution depends on progress score [0,1] devided equally between copy, sort and reduce phase.
process is marked straggler if its progress <>
a task of a straggler is given to another machine (but at most one speculative execution) based on data locality

(underlying assumptions see 2.2)

New speculative execution: LATE=Longest Approximate Time to End

Idea: speculatively execute the task that will finish fathest into the future (greatest opportunity to save time)
different methods for estimating time left: basic is linear extrapolation from progress score
speculative tasks scheduled on fast nodes (through threshold how much work a node already has performed)
rank tasks by time finished, schedule speculative task if it is likely to finish before other node
refinements to account for costs of speculative execution: SpeculativeCap on number of concurrent speculative tasks and SlowTaskThreshold

Result: LATE reduces running time by a quarter to a half

Policy-aware Switching Layer for Data Centers

Dilip Antony Joseph, Arsalan Tavakoli, Ion Stoica, "A Policy-aware Switching Layer for Data Centers"

Ideas

shares some ideas with "Middleboxes no longer considered harmful": traffic can be directed to traverse middleboxes without the middlebox being on the physical path
but: differs because it is not introducing new properties to IP, but instead replaces Layer II by PLayer
usual approach: put middleboxes on the physical path, hard to configure, possibly multiple physical paths between two nodes, different traffic supposed to go through different middleboxes
new approach: among Layer II switches, add (policy-aware) pswitches which route the packets through the middleboxes on Layer II based on policy specification

Goals

Correctness: Traffic should traverse middleboxes in the sequence specified by the nework administrator
Flexibility: sequence of middleboxes should be easily reconfigured
Efficiency: traffic should not traverse unnecessary middleboxes

Principles

Separating policy from (physical) reachability
taking middleboxes off the physical network data path
policies: [Start,(5-tuple traffic selector)] -> sequence
translated into rules ([Previous Hop, Traffic Selector]: Next Hop) which is used by pswitch
policy schemas (versioned) are created by administrator and pushed onto the pswitches

Mechanisms

decoupling switch core (can use traditional core, usual techniques to learn MAX addresses and build spanning trees) from policy core (where the new rules are applied) with additional failure detection
encapsule Ethernet frames in Ethernet-II frames (which contain policy version): the algorithm to build the spanning tree needs another Source-MAC (a more physical one) than the FireWall (a more logical one)
SrcMacRewrite to decapsulate Ethernet-II frames before entering (legacy) middlebox
in some situations pswitches require per-flow state

Other

implementation in Click, results
detailed discussions on various topologies
formal analysis

I like this more than the "Middleboxes no longer considered harmful" because

it integrates better into exisiting layerII structures instead of adding yet another identifier
in "Middleboxes no longer considered harmful" nodes had to be configured to use the off the physical path FireWall, it was unclear how an IP packet destined for a computer behind a off-the physical path Firewall will actually be hindered from reaching the computer. In this paper this is clear.

I got a bit confused because MAC addresses are used in two different ways, onc address is used for forwarding between pswitches, but another can be used when the packet is handed to a middlebox. I think the paper should have given these MACs different names, if possible.

Monday, November 17, 2008

Internet Data Transfer

Nirja Tolia, Michael Kaminsky, David G. Andersen, Swapnil Patil, "An Architecture for Internet Data Transfer"

Principles

separating content negotiation from data transfer (motivation: 70% of Internet traffic is bulk transfer)
common interface between various applications and transfer service architecture, application developer does not need to reinvent transfer service for each new application
Figure 3 explains how the application works in the DOT framework. The application calls the GTC for the data transfer

Data transfer methods can even include physically moving hard disks. One benefit is that the same data can be cached across different applications like Web and electronic email.

Delay Tolerant Networks

Kevin Fall, "A Delay-Tolerant Network Architecture for Challenged Internets"

This paper is abstract. It suggests a overlay network which is running on networks different from the usual Internet and hence providing a very different service model. But there is no implementation (except one mentioned in one sentence in the conclusion) and no results section, no new algorithms.
I wonder whether the picture in Figure 1 really is a typical example: are there really applications where challenged networks are used for transit, or aren't most applications (e.g. sensor networks, satellite connections, ...) just having challenged networks at one or both ends?
And regarding the US Postal System as model for challenged Internets: looking back at my experiences so far, I kind of doubt whether this is a good idea...

Targeted networks

Terrestial Mobile Networks
Excotic Media Networks (e.g. deep space, acoustic links)
military Ad-Hoc Networks
Sensor Networks

Violated Assumptions

E2E path exist, no disconnection
max RTT not excessive, reasonable queuing time
drop probability <<>

Possibilities to deal with the violated assumptions

fix link-layer
attach challenged networks at the edge of Internet through proxy, e.g. sensor networks, protocol boosters (e.g. HTTP proxy for satellite connections)
electronic mail (mostly reliable, likely failure notification)
DTN

New principles of DTN

Message-switching (instead of packets)
Regions with DTN gateways inbetween
name includes region and entity name within that region
several services (motivated by USPS)
path selection and scheduling, trying to predict when the next contact to a certain region takes place
custody transfer and reliability, e.g. if a message is sent from one node to another node, does the old node need to keep a copy until confirmation the message was delivered
convergence layers
time synchronization
congestion and flow control

Internet Measurements

Vern Paxson, "End-to-End Internet Packet Dynamics"

I found this paper hard to read, not because it is badly written, but because it discusses the measurements and results in extraordinary depth, discussing the measured effect, coming up with potential explanations, possible effects on higher layers and even suggestions for implementations. The lecture on Thursday luckily turned my frustration when reading the paper in admiration for the pioneering role Paxson played in introducing internet measurement methodology and the enormous body of results he produced.

Methodology

large-scale experiment: 37 sites, N to N bulk transfer, done at Dec 1994 and 1995
TCP bulk transfer (versus ICMP packet probing): 1. this is how most Internet traffic really looks like, 2. TCP adapts to avoid unduly loading the network
TCP measurements intertwine network and transport, tcpanaly separates these behaviors, TCP implementation specific (Vegas versus Reno, Windows NT versus Linux), tcpanaly can not recover time series analysis

Measurements

Pathologies: Reordering (on certain paths as high as 36%, due to route flaps), Replication (rare) and Corruption (1 in 5000 packets)
Bottleneck bandwidth (difficult to estimate, basic idea: send two consecutive packets, at bottleneck bandwidth, the second packet has to wait for the transmission of the first packet, limited by time resolution, refined through bunches)
packet loss (distinguishes between data and ack loss, loss at loaded and unloaded links) and effect on TCP (causes of a redundant retransmission: unavoidable, coarse feedback (no SACK), bad RTO)
packet delay (due to queing/bandwidth, but even compression observed)

Conclusion

Justification of measurement (TCP) methodology over other measurement techniques
wide variety of path properties
underlying assumptions about the network often broken
robust TCP implementations can work non the less

Thursday, November 13, 2008

X-Trace

Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica, "X-Trace: A pervasive Network Tracing Framework"

Motivation

current diagnostic tools limited to one particular proctocol, e.g. traceroute
need for comprehensive view of the system's behavior
complex systems: e.g. wikipedia has different sites, web caches, DNS round-robin, load balancers, web servers, database servers (and memcached)
tracing across different administrative domains needed

Ideas and design principles

integrated tracing framework
network protocols modified to propagate X-Trace metadata
works inter-layer
works inter-Administrative Domains
decouples client of application and recipient of tracing data (Design principle 3), destination part of the X-Trace metadata
trace initiated by inserting X-Trace metadata by user application or network operator
trace identified by task identifier
X-Trace data send to report server (can be client application or delegated server)
X-Trace constructs task tree offline, two axis: one across "layers" (an event causes another event in lower layer), one across "time" (an event causes another in the same layer), each node in the task tree has an ID, children link to their parents
Design principle 1: trace request are sent in-band
Design principle 2: trace data are sent out-of-band
ASCII report format
report library, report collection thorugh e.g. Postgres
visualization of task tree

Deployment

API for application has pushNext() and pushDown() to propagate X-trace MetaData across the two axis, device reports information accessible at its own layer, can include additional information like load
gradual deployment: for legacy clients, devices in the network can add X-Trace metadata
retrofitting X-Trace into exisiting applications faces difficulties: change to various protocols (IP options, TCP, HTTP headers, SQL), partial deployment impairs ability to trace parts of the network, lost trace reports can be interpreted as false positives
certain request topologies cannot be captured, e.g. requests spreads through the network and rendezvous at a node
unique() function returning identifier for task tree not specified in paper

Uses and Experiences

low performance overhead
Web request and recursive DNS queries
Web hosting site (LAMP), user could intiate traces through JavaScript/PHP library
overlay network
Tunnels, ISP connectivity

I really liked the framework this paper suggests. I think it is very useful. Though there has been a lot of experience, scalabe websites are still non-trivial to setup, still require some manual work to tune and integrate caches, and work across a lot of different layers as mentioned in the introduction. Some difficulties not mentioned in the introduction are: there are even more caching layers like SQL query cache and memcache, relational databases don't scale and huge websites shard the data across multiple machines, Brewer's theme on performance versus consistency (a website is not truely transactional, just enough so that the users don't preceive it as bad, but when a user sees it, it is hard to track down). This paper introduces the debugging tool I am aware of which is addressing all these things together.
The only thing I would add to this framework is the ability to send encrypted X-Trace data.