Nimbus

Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud: Omid Mashayekhi, Chinmayee Shah, Hang Qu, Andrew Lim, Philip Levis; In ACM Transactions on Graphics (TOG), 37, 2, Article 24, June 2018, Presented at SIGGRAPH 2018

Abstract: Distributing a simulation across many machines can drastically speed up computations and increase detail. The computing cloud provides tremendous computing resources, but weak service guarantees force programs to manage significant system complexity: nodes, networks, and storage occasionally perform poorly or fail. We describe Nimbus, a system that automatically distributes grid-based and hybrid simulations across cloud computing nodes. The main simulation loop is sequential code and launches distributed computations across many cores. The simulation on each core runs as if it is stand-alone: Nimbus automatically stitches these simulations into a single, larger one. To do this efficiently, Nimbus introduces a four-layer data model that translates between the contiguous, geometric objects used by simulation libraries and the replicated, fine-grained objects managed by its underlying cloud computing runtime. Using PhysBAM particle level set fluid simulations, we demonstrate that Nimbus can run higher detail simulations faster, distribute simulations on up to 512 cores, and run enormous simulations (1024³ cells). Nimbus automatically manages these distributed simulations, balancing load across nodes and recovering from failures. Implementations of PhysBAM water and smoke simulations as well as an open source heat-diffusion simulation show that Nimbus is general and can support complex simulations. Nimbus can be downloaded from https://nimbus.stanford.edu.

Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics: Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis; In proceedings of 2017 USENIX Annual Technical Conference (USENIX ATC '17)

Abstract: Control planes of cloud frameworks trade off between scheduling granularity and performance. Centralized systems schedule at task granularity, but only schedule a few thousand tasks per second. Distributed systems schedule hundreds of thousands of tasks per second but changing the schedule is costly. We present execution templates, a control plane abstraction that can schedule hundreds of thousands of tasks per second while supporting fine-grained, per-task scheduling decisions. Execution templates leverage a program’s repetitive control flow to cache blocks of frequently-executed tasks. Executing a task in a template requires sending a single message. Large-scale scheduling changes install new templates, while small changes apply edits to existing templates. Evaluations of execution templates in Nimbus, a data analytics framework, find that they provide the fine-grained scheduling flexibility of centralized control planes while matching the strong scaling of distributed ones. Execution templates support complex, real-world applications, such as a fluid simulation with a triply nested loop and data dependent branches.

Scalable, Fast Cloud Computing with Execution Templates: Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis; Poster presentation at 2016 USENIX Symposium on Operating Systems Design and Implementation (OSDI'16)

Distributed Graphical Simulation in the Cloud: Omid Mashayekhi, Chinmayee Shah, Hang Qu, Andrew Lim, Philip Levis; arXiv:1606.01966 [cs.DC], 2016

Abstract: Graphical simulations are a cornerstone of modern media and films. But existing software packages are designed to run on HPC nodes, and perform poorly in the computing cloud. These simulations have complex data access patterns over complex data structures, and mutate data arbitrarily, and so are a poor fit for existing cloud computing systems. We describe a software architecture for running graphical simulations in the cloud that decouples control logic, computations and data exchanges. This allows a central controller to balance load by redistributing computations, and recover from failures. Evaluations show that the architecture can run existing, state-of-the-art simulations in the presence of stragglers and failures, thereby enabling this large class of applications to use the computing cloud for the first time.

Scalable, Fast Cloud Computing with Execution Templates: Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis; arXiv:1606.01972 [cs.DC], 2016

Abstract: Large scale cloud data analytics applications are often CPU bound. Most of these cycles are wasted: benchmarks written in C++ run 10-51 times faster than frameworks such as Naiad and Spark. However, calling faster implementations from those frameworks only sees moderate (4-5x) speedups because their control planes cannot schedule work fast enough. This paper presents execution templates, a control plane abstraction for CPU-bound cloud applications, such as machine learning. Execution templates leverage highly repetitive control flow to cache scheduling decisions as templates. Rather than reschedule hundreds of thousands of tasks on every loop execution, nodes instantiate these templates. A controller's template specifies the execution across all worker nodes, which it partitions into per-worker templates. To ensure that templates execute correctly, controllers dynamically patch templates to match program control flow. We have implemented execution templates in Nimbus, a C++ cloud computing framework. Running in Nimbus, analytics benchmarks can run 16-43 times faster than in Naiad and Spark. Nimbus's control plane can scale out to run these faster benchmarks on up to 100 nodes (800 cores).

Papers