Saturday, November 1, 2008

DNS more contemporary

Jaeyeon Jung, Emil Sit, Hari Balakrishnan, Robert Morris, "DNS Performance and the Effectiveness of Caching"

The paper presents a lot of interesting measurements and results. There are two prime investigations:
  • overall DNS performance (from a user-perceived latency perspective and from a traffic-volume perspective)
  • the impact of caches and TTLs. The motivation comes from a dichotomy: load-balancing applications answer queries with short TTLs, which limits the usefulness of caching, BUT the scalability of DNS is said to arise (besides from the hierarchical organization of the name space) from queries being answered out of caches
Methodology
  • collect Internet traces: DNS packets and TCP SYN/FIN/RST packets
  • trace, within a 60-second window, the process of iterative lookups until the answer is found
  • track the TCP connections associated with a DNS query
  • group clients by IP address and simulate a common DNS cache shared among them (a sketch of such a simulation follows this list)
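
Here is a minimal sketch of how I picture that shared-cache simulation (my own reconstruction in Python, not the authors' code): replay a trace of (timestamp, client, name, TTL) tuples against one TTL-bounded cache per client group and count hits and misses.

```python
from collections import defaultdict

def simulate_shared_cache(trace, group_of):
    """Replay a DNS trace against one shared cache per client group.

    trace    -- iterable of (timestamp, client_ip, name, ttl) tuples
    group_of -- maps a client IP to its (hypothetical) cache group
    Returns [hits, misses] per group.
    """
    caches = defaultdict(dict)            # group -> {name: expiry_time}
    stats = defaultdict(lambda: [0, 0])   # group -> [hits, misses]

    for ts, client, name, ttl in trace:
        group = group_of(client)
        cache = caches[group]
        expiry = cache.get(name)
        if expiry is not None and expiry > ts:
            stats[group][0] += 1          # answered out of the shared cache
        else:
            stats[group][1] += 1          # miss: the lookup leaves the group
            cache[name] = ts + ttl        # cache the answer for its TTL
    return stats

# Made-up example: pretend the first two octets identify a client group.
trace = [
    (0.0,   "18.26.1.5", "example.com", 300),
    (1.0,   "18.26.2.9", "example.com", 300),  # hit if both clients share a cache
    (400.0, "18.26.1.5", "example.com", 300),  # miss again: the TTL has expired
]
print(dict(simulate_shared_cache(trace, lambda ip: ".".join(ip.split(".")[:2]))))
```
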
Performance results
  • distribution of the types of DNS lookups (mostly A records, hostname -> IP address)
  • half of the DNS lookups are associated with a TCP connection
  • DNS query latency has a median of about a tenth of a second, but a significant portion takes up to tens of seconds; distribution of the number of referrals (a small client-side latency probe follows this list)
  • 70% of the queries do not hit a root/gTLD server (i.e. cached NS records improve performance and greatly reduce the load on root servers)
  • a successful DNS query needs on average ~1.3 packets
  • unanswered queries (due to, e.g., NS records pointing to servers that no longer exist) can cause substantially more traffic per query because of loops and retransmissions
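
The paper gets its latency numbers passively from traces; just to get a feel for the orders of magnitude on one's own machine, here is a hedged client-side probe. It assumes the dnspython package (version 2 or later); this is my own toy, not the paper's methodology.

```python
import time
import dns.exception
import dns.resolver  # dnspython >= 2 (pip install dnspython)

def probe_latencies(names):
    """Measure wall-clock lookup latency for a few A-record queries."""
    resolver = dns.resolver.Resolver()
    results = {}
    for name in names:
        start = time.monotonic()
        try:
            resolver.resolve(name, "A")
            ok = True
        except dns.exception.DNSException:
            ok = False                      # unanswered/failed lookups count too
        results[name] = (time.monotonic() - start, ok)
    return results

if __name__ == "__main__":
    for name, (latency, ok) in probe_latencies(["example.com", "example.org"]).items():
        print(f"{name}: {latency * 1000:.1f} ms ({'ok' if ok else 'failed'})")
```
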
Effect of Caching
  • name popularity follows a Zipf distribution: 10% of names account for 68% of answers, plus a long tail
  • current TTL distributions
  • most caching benefit is achieved with 10-20 clients per cache
  • most of the caching benefit is achieved with a TTL of several minutes for A records of web servers (I think though that in Figure 13 they should plot the cache miss rate: hit rates of 97% and 99% sound the same, but the latter means a three times lower miss rate, which implies we can serve three times as many clients from the same, e.g., gTLD server; see the toy simulation after this list)
  • effect of eliminating A-record caching, per-client caching...
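
To make the hit-rate vs. miss-rate point concrete, here is a toy simulation (my own sketch, not the paper's trace-driven analysis): draw query names from a Zipf-like popularity distribution and replay them against a single shared cache with a fixed TTL.

```python
import random

def zipf_names(n_queries, n_names, s=1.0, seed=0):
    """Draw query names from a Zipf-like popularity distribution."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_names + 1)]
    return rng.choices([f"name{r}" for r in range(n_names)], weights, k=n_queries)

def miss_rate(names, ttl, mean_interarrival=1.0, seed=0):
    """Replay queries against one shared TTL-bounded cache; return the miss rate."""
    rng = random.Random(seed)
    cache, misses, t = {}, 0, 0.0
    for name in names:
        t += rng.expovariate(1.0 / mean_interarrival)  # Poisson-ish arrivals
        if cache.get(name, -1.0) <= t:                 # absent or expired entry
            misses += 1
            cache[name] = t + ttl
    return misses / len(names)

names = zipf_names(50_000, 5_000)
for ttl in (0, 60, 300, 900):
    m = miss_rate(names, ttl)
    print(f"TTL={ttl:4d}s  hit rate={1 - m:5.1%}  miss rate={m:5.1%}")
```

Going from a 97% to a 99% hit rate sounds marginal, but it cuts the misses, and hence the load pushed upstream to, e.g., a gTLD server, by a factor of three.
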
Concluding Remark: "The reasons for the scalability of DNS are due less to the hierarchical design of its name space or good A-record caching (as seems to be widely believed); rather, the cacheability of NS-records efficiently partition the name space and avoid overloading any single name server in the Internet." I think that is a rather provocative statement. If no IP address at all were cached, every query would have to start at a root server, which is not scalable, so how should I understand their remark?
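
My attempt to make sense of that remark, as a toy model (all zone data and server names below are made up for illustration): if a resolver caches the NS delegations it learns along the way, only the first lookup in a zone walks down from the root; later lookups for other names in the same zone go straight to the cached authoritative server, even when no A-record answer is cached.

```python
# Toy model of iterative resolution with NS-record caching (my own sketch,
# not the paper's code). Zone data and server names are invented.
DELEGATIONS = {                      # zone -> its authoritative name server
    ".":            "root-server",
    "com.":         "gtld-server",
    "example.com.": "ns.example.com",
}

def enclosing_zones(name):
    """Zones enclosing `name`, from the root down: '.', 'com.', 'example.com.', ..."""
    labels = name.rstrip(".").split(".")
    return ["."] + [".".join(labels[i:]) + "." for i in range(len(labels) - 1, 0, -1)]

def resolve(name, ns_cache, load):
    """Iteratively resolve `name`, counting one query per server contacted."""
    zones = [z for z in enclosing_zones(name) if z in DELEGATIONS]
    # Skip every zone whose NS record is already cached: start below it.
    start = max((i for i, z in enumerate(zones) if z in ns_cache), default=0)
    for zone in zones[start:]:
        server = DELEGATIONS[zone]
        load[server] = load.get(server, 0) + 1   # one referral/answer query
        ns_cache[zone] = server                  # remember the delegation
    return DELEGATIONS[zones[-1]]                # server that answers the A query

load, cache = {}, {}
for q in ["www.example.com.", "mail.example.com.", "ftp.example.com."]:
    resolve(q, cache, load)
print(load)  # only the first lookup touches the root and gTLD servers
```

In this model the cached NS records alone keep repeat lookups off the root and gTLD servers, which is perhaps how their remark is meant; still, with no caching whatsoever every lookup would start at the root.
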

I think the numbers in the abstract are fairly useless unless one has read the paper, because only later in the paper does one find the answers to the following questions:
  • the paper mentions what fraction of lookups are unsuccessful, but are these network failures due to overload or dropped packets (which would be bad), or just cases where a user typed a domain name wrong (for which we expect the lookup to return no answer)?
  • is the cache miss rate meant for queries destined for a root, gTLD, or domain server? E.g. is the conclusion we can draw from the measurements that, if we no longer had caches, the 13 root servers would suddenly be horribly underprovisioned, or merely that the name servers of some popular sites would see a little more load?
  • The browser also has a cache for DNS entries, so is the cache miss rate measured with respect to each TCP connection the browser makes, or only for the DNS lookups the browser cannot answer itself?
One thing that doesn't make sense to me is that root servers get more lookups than gTLD servers (Table VI).
