Intro
- Hadoop is an open-source implementation of Map Reduce (and distributed filesystem?)
- new deployment: virtualized environments, datacenters with several generations of hardware display heterogenity
- want to optimize scheduler to finish fastest without wasting unnecceary resources
- Priority 1: failed tasks
- Priority 2: non-running tasks
- Priority 3: Speculative execution
- Speculative execution depends on progress score [0,1] devided equally between copy, sort and reduce phase.
- process is marked straggler if its progress <>
- a task of a straggler is given to another machine (but at most one speculative execution) based on data locality
New speculative execution: LATE=Longest Approximate Time to End
- Idea: speculatively execute the task that will finish fathest into the future (greatest opportunity to save time)
- different methods for estimating time left: basic is linear extrapolation from progress score
- speculative tasks scheduled on fast nodes (through threshold how much work a node already has performed)
- rank tasks by time finished, schedule speculative task if it is likely to finish before other node
- refinements to account for costs of speculative execution: SpeculativeCap on number of concurrent speculative tasks and SlowTaskThreshold