CS 268 Computer Networks: X-Trace

Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica, "X-Trace: A Pervasive Network Tracing Framework"

Motivation

current diagnostic tools are each limited to one particular protocol, e.g. traceroute
need for a comprehensive view of the system's behavior
complex systems: e.g. Wikipedia has different sites, web caches, DNS round-robin, load balancers, web servers, database servers (and memcached)
tracing across different administrative domains needed

Ideas and design principles

integrated tracing framework
network protocols modified to propagate X-Trace metadata
works inter-layer
works across administrative domains
decouples the party that requests a trace from the party that receives the trace data (Design principle 3); the report destination is part of the X-Trace metadata
trace initiated by inserting X-Trace metadata by user application or network operator
trace identified by task identifier
X-Trace data is sent to a report server (can be the client application or a delegated server)
X-Trace constructs the task tree offline along two axes: one across "layers" (an event causes another event in a lower layer), one across "time" (an event causes another in the same layer); each node in the task tree has an ID, and children link to their parents
Design principle 1: trace requests are sent in-band
Design principle 2: trace data are sent out-of-band
ASCII report format
report library, report collection through e.g. Postgres
visualization of task tree
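
The offline tree construction described above can be sketched as follows. This is only an illustration under my own assumptions: the report fields (task_id, op_id, parent_id, edge, label) and the function names are hypothetical and do not reproduce the paper's ASCII report format. It groups reports by task identifier and links each node to its parent.

```python
from collections import defaultdict

def build_task_tree(reports, task_id):
    """Group reports of one task and link children to their parents.

    Field names are assumptions for this sketch, not the real report format.
    """
    nodes = {r["op_id"]: r for r in reports if r["task_id"] == task_id}
    children = defaultdict(list)
    roots = []
    for op_id, report in nodes.items():
        parent = report.get("parent_id")
        if parent in nodes:
            children[parent].append(op_id)
        else:
            # Either the true root, or a node whose parent report was lost.
            roots.append(op_id)
    return nodes, children, roots

def render(nodes, children, op_id, depth=0):
    """Return indented text lines for the subtree rooted at op_id."""
    report = nodes[op_id]
    edge = report.get("edge") or "root"
    lines = [f"{'  ' * depth}{report['label']} [{edge}]"]
    for child in children[op_id]:
        lines.extend(render(nodes, children, child, depth + 1))
    return lines

if __name__ == "__main__":
    reports = [
        {"task_id": "t1", "op_id": "a", "parent_id": None, "edge": None, "label": "HTTP client"},
        {"task_id": "t1", "op_id": "b", "parent_id": "a", "edge": "next", "label": "HTTP proxy"},
        {"task_id": "t1", "op_id": "c", "parent_id": "a", "edge": "down", "label": "TCP connection"},
    ]
    nodes, children, roots = build_task_tree(reports, "t1")
    for root in roots:
        print("\n".join(render(nodes, children, root)))
```

A node whose parent report never arrived shows up as an extra root, which is one way a lost report becomes visible in the reconstructed tree.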

Deployment

API for applications has pushNext() and pushDown() to propagate X-Trace metadata along the two axes; each device reports the information accessible at its own layer and can include additional information such as load
gradual deployment: for legacy clients, devices in the network can add X-Trace metadata
retrofitting X-Trace into existing applications faces difficulties: changes to various protocols (IP options, TCP, HTTP headers, SQL), partial deployment impairs the ability to trace parts of the network, lost trace reports can be misread as faults (false positives)
certain request topologies cannot be captured, e.g. a request that spreads through the network and rendezvouses at a node
unique() function returning identifier for task tree not specified in paper
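
To make the two propagation calls concrete, here is a minimal sketch of how pushNext() and pushDown() might behave. The Metadata class, its field names, and the random ID generator are my own assumptions for illustration, not the paper's wire format or its unique() function. Both calls keep the task identifier, mint a new operation ID, and record the previous operation as the parent; they differ only in the edge type.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Metadata:
    """Hypothetical stand-in for X-Trace metadata carried in-band."""
    task_id: str
    op_id: str
    parent_id: Optional[str] = None
    edge: Optional[str] = None  # "next" (same layer) or "down" (lower layer)

def new_id() -> str:
    # Assumption: a random 4-byte ID; the paper leaves unique() unspecified.
    return secrets.token_hex(4)

def start_task() -> Metadata:
    """Begin a trace: fresh task ID, root operation with no parent."""
    return Metadata(task_id=new_id(), op_id=new_id())

def push_next(md: Metadata) -> Metadata:
    """Propagate to the next hop in the same layer."""
    return Metadata(md.task_id, new_id(), parent_id=md.op_id, edge="next")

def push_down(md: Metadata) -> Metadata:
    """Propagate to the layer below (e.g. HTTP to TCP)."""
    return Metadata(md.task_id, new_id(), parent_id=md.op_id, edge="down")

if __name__ == "__main__":
    http_client = start_task()
    http_proxy = push_next(http_client)  # same layer, later in time
    tcp_conn = push_down(http_client)    # one layer below
    print(http_proxy)
    print(tcp_conn)
```

Because every derived operation records its parent's ID, the reports can later be stitched into a tree without any coordination between devices.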

Uses and Experiences

low performance overhead
Web request and recursive DNS queries
Web hosting site (LAMP), where users could initiate traces through a JavaScript/PHP library
overlay network
Tunnels, ISP connectivity

I really liked the framework this paper suggests. I think it is very useful. Though there has been a lot of experience, scalable websites are still non-trivial to set up, still require some manual work to tune and integrate caches, and span many different layers, as mentioned in the introduction. Some difficulties not mentioned in the introduction: there are even more caching layers, such as the SQL query cache and memcached; relational databases don't scale, so huge websites shard their data across multiple machines; and Brewer's theme of performance versus consistency applies (a website is not truly transactional, just consistent enough that users don't perceive it as bad, but when a user does see an inconsistency, it is hard to track down). This paper introduces the only debugging tool I am aware of that addresses all these things together.
The only thing I would add to this framework is the ability to send encrypted X-Trace data.

Thursday, November 13, 2008
