CS 268 Computer Networks: MapReduce Again!

Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica, "Improving MapReduce Performance in Hereogenous Environments"

Intro

Hadoop is an open-source implementation of Map Reduce (and distributed filesystem?)
new deployment: virtualized environments, datacenters with several generations of hardware display heterogenity
want to optimize scheduler to finish fastest without wasting unnecceary resources

Scheduling the next empty task slot

Speculative execution, the native Hadoop way

Speculative execution depends on progress score [0,1] devided equally between copy, sort and reduce phase.
process is marked straggler if its progress <>
a task of a straggler is given to another machine (but at most one speculative execution) based on data locality

(underlying assumptions see 2.2)

New speculative execution: LATE=Longest Approximate Time to End

Idea: speculatively execute the task that will finish fathest into the future (greatest opportunity to save time)
different methods for estimating time left: basic is linear extrapolation from progress score
speculative tasks scheduled on fast nodes (through threshold how much work a node already has performed)
rank tasks by time finished, schedule speculative task if it is likely to finish before other node
refinements to account for costs of speculative execution: SpeculativeCap on number of concurrent speculative tasks and SlowTaskThreshold

Result: LATE reduces running time by a quarter to a half

CS 268 Computer Networks