Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud
Omid Mashayekhi, Chinmayee Shah, Hang Qu, Andrew Lim, Philip Levis
In ACM Transactions on Graphics (TOG), 37(2), Article 24, June 2018. Presented at SIGGRAPH 2018
Abstract
Distributing a simulation across many machines can drastically speed
up computations and increase detail. The computing cloud provides
tremendous computing resources, but weak service guarantees
force programs to manage significant system complexity: nodes,
networks, and storage occasionally perform poorly or fail.
We describe Nimbus, a system that automatically distributes
grid-based and hybrid simulations across cloud computing
nodes. The main simulation loop runs as sequential code and launches
distributed computations across many cores. The simulation on each
core runs as if it were stand-alone: Nimbus automatically
stitches these simulations into a single, larger
one. To do this efficiently, Nimbus introduces a four-layer
data model that translates between the contiguous, geometric objects
used by simulation libraries and the replicated, fine-grained objects
managed by its underlying cloud computing runtime.
Using PhysBAM particle level set fluid simulations, we
demonstrate that Nimbus can run higher detail simulations faster,
distribute simulations on up to 512 cores, and run enormous
simulations (1024³ cells). Nimbus automatically manages
these distributed simulations, balancing load across nodes and recovering
from failures. Implementations of PhysBAM water and smoke simulations
as well as an open source heat-diffusion simulation show that Nimbus is
general and can support complex simulations.
Nimbus can be downloaded from https://nimbus.stanford.edu.
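The abstract's idea of stitching per-core simulations into one larger simulation can be illustrated with a minimal ghost-cell sketch. This is not Nimbus's actual data model or API; it is a toy 1-D example (function names and layout are assumptions) showing how each worker's chunk is padded with ghost cells that a runtime refreshes from neighbors between solver steps:

```python
def split_with_ghosts(grid, parts, ghost=1):
    """Partition a 1-D grid into per-worker chunks, each padded with
    ghost cells copied from neighboring chunks (clamped at boundaries)."""
    n = len(grid)
    size = n // parts
    chunks = []
    for p in range(parts):
        lo, hi = p * size, (p + 1) * size
        chunks.append(list(grid[max(0, lo - ghost):min(n, hi + ghost)]))
    return chunks

def exchange_ghosts(chunks, ghost=1):
    """Refresh each chunk's ghost cells from its neighbor's owned cells,
    as a distributed runtime would do between time steps."""
    for p in range(1, len(chunks)):
        left, right = chunks[p - 1], chunks[p]
        right[:ghost] = left[-2 * ghost:-ghost]  # neighbor's last owned cells
        left[-ghost:] = right[ghost:2 * ghost]   # neighbor's first owned cells
```

Each chunk can then be stepped by an unmodified, stand-alone solver; only the ghost exchange needs to know about the decomposition.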
Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics
Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis
In Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC '17)
Abstract
Control planes of cloud frameworks trade off between scheduling
granularity and performance. Centralized systems schedule at task
granularity, but can schedule only a few thousand tasks per second.
Distributed systems schedule hundreds of thousands of tasks per
second, but changing the schedule is costly.
We present execution templates, a control plane abstraction that can
schedule hundreds of thousands of tasks per second while supporting
fine-grained, per-task scheduling decisions. Execution templates
leverage a program’s repetitive control flow to cache blocks of
frequently-executed tasks. Executing a task in a template requires
sending a single message. Large-scale scheduling changes install new
templates, while small changes apply edits to existing templates.
Evaluations of execution templates in Nimbus, a data analytics
framework, find that they provide the fine-grained scheduling
flexibility of centralized control planes while matching the strong
scaling of distributed ones. Execution templates support complex,
real-world applications, such as a fluid simulation with a triply
nested loop and data-dependent branches.
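The core mechanism, caching a block of frequently executed tasks so that later loop iterations need only a single instantiation message, can be sketched in a few lines. The class and method names below are illustrative assumptions, not the Nimbus control-plane interface:

```python
class TemplateControlPlane:
    """Toy model of template scheduling: the first run of a loop body
    schedules task by task and records the block as a template; later
    iterations instantiate the cached template with one message."""

    def __init__(self):
        self.templates = {}     # loop id -> cached list of tasks
        self.messages_sent = 0  # proxy for control-plane overhead

    def run_iteration(self, loop_id, tasks, params):
        if loop_id not in self.templates:
            for task in tasks:
                self.messages_sent += 1  # one message per task
                task(params)
            self.templates[loop_id] = list(tasks)
        else:
            self.messages_sent += 1      # single instantiate message
            for task in self.templates[loop_id]:
                task(params)
```

After the first iteration installs the template, per-iteration control-plane cost drops from one message per task to one message per block, which is what lets the cached path keep up with distributed schedulers.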
Presentations
Scalable, Fast Cloud Computing with Execution Templates
Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis
Poster presentation at the 2016 USENIX Symposium on Operating Systems Design and Implementation (OSDI '16)
Technical Reports
Distributed Graphical Simulation in the Cloud
Omid Mashayekhi, Chinmayee Shah, Hang Qu, Andrew Lim, Philip Levis
arXiv:1606.01966 [cs.DC], 2016
Abstract
Graphical simulations are a cornerstone of modern media and
films. But existing software packages are designed to run on HPC
nodes and perform poorly in the computing cloud. These
simulations have complex access patterns over intricate data
structures and mutate data arbitrarily, making them a poor fit for
existing cloud computing systems. We describe a software
architecture for running graphical simulations in the cloud that
decouples control logic, computations, and data exchanges. This
allows a central controller to balance load by redistributing
computations and to recover from failures. Evaluations show that the
architecture can run existing, state-of-the-art simulations in the
presence of stragglers and failures, thereby enabling this large
class of applications to use the computing cloud for the first
time.
Scalable, Fast Cloud Computing with Execution Templates
Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis
arXiv:1606.01972 [cs.DC], 2016
Abstract
Large-scale cloud data analytics applications are often CPU
bound. Most of these cycles are wasted: benchmarks written in C++ run
10-51 times faster than frameworks such as Naiad and Spark. However,
calling faster implementations from those frameworks yields only
moderate (4-5x) speedups because their control planes cannot schedule
work fast enough.
This paper presents execution templates, a control plane abstraction
for CPU-bound cloud applications, such as machine learning. Execution
templates leverage highly repetitive control flow to cache scheduling
decisions as templates. Rather than reschedule hundreds of
thousands of tasks on every loop execution, nodes instantiate these
templates. A controller's template specifies the execution across all
worker nodes, which it partitions into per-worker templates. To ensure
that templates execute correctly, controllers dynamically patch
templates to match program control flow. We have implemented
execution templates in Nimbus, a C++ cloud computing framework.
Running in Nimbus, analytics benchmarks can run 16-43 times faster
than in Naiad and Spark. Nimbus's control plane can scale out to run
these faster benchmarks on up to 100 nodes (800 cores).
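The abstract's dynamic patching, where a controller edits a cached template to match the program's actual control flow instead of rebuilding it, can be sketched as a small set of edit operations over a task-to-worker mapping. The operation names and the mapping representation are assumptions for illustration, not the actual Nimbus template format:

```python
def patch_template(template, edits):
    """Apply small scheduling edits to a cached template: 'assign'
    moves a task to another worker, 'drop' removes a task (e.g. for a
    branch not taken), 'add' appends a new one. Returns a patched copy
    so the original template stays installed and reusable."""
    patched = dict(template)  # task name -> assigned worker
    for op, task, worker in edits:
        if op in ("assign", "add"):
            patched[task] = worker
        elif op == "drop":
            patched.pop(task, None)
    return patched
```

A large scheduling change would instead install a fresh template; the point of patching is that a handful of edits is far cheaper than resending hundreds of thousands of task descriptions.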