Nimbus is a scalable cloud computing framework for computations with sub-millisecond tasks.
Nimbus's overhead is negligible even at high scales.
Nimbus has a novel control plane abstraction, called Execution Templates, which allows the runtime system handle orders of magnitude higher task rates compared to the best available centralized (Spark) and distributed (Naiad) cloud computing frameworks. At a high level, Execution Templates are dynamically generated execution plans that are installed by the controller at the workers. The key idea behind execution template is that long running applications are iterative in nature (What is an example for iterative patterns within an application?). Instead of scheduling the computations for each iteration from scratch, controller installs the execution plan on each worker once, and instantiates it with new parameters for later iterations. If the load changes or stragglers appear, controller generates new templates to adopt to the dynamic behaviour of the cloud, on the fly. In many ways, execution templates resemble JIT Compilers (How similar are JIT compilers and execution templates?).
Nimbus runs up to 40x faster than Spark.
Nimbus's speedup comes from two major factors: 1) implementing efficient tasks in C++, and 2) optimized control plane enabled by Execution Template. Nimbus's core system is implemented in C++, and there are handy APIs for machine learning and physical simulations applications. Implementing tasks in C++, in some cases, brings a factor of 50x speedup compared to Scala implementation in Spark (Why is Scala implementation 50x slower than C++?). Short, efficient tasks, however, pressure the runtime system to deliver higher task throughput with lower latency. Available frameworks cannot keep up with the task rate demands of the optimized applications. So, even if the tasks are optimized, the overall completion time does not see the same speedup, since the control plane becomes a bottleneck (What if other frameworks run tasks at the speed of C++). Execution Templates allow Nimbus to deliver orders of magnitude higher task rate at even higher scales (How does Nimbus's task throughput scale?). This enables Nimbus to schedule optimized tasks with negligible overhead and realize the actual speedup coming from efficient tasks.
Nimbus enables running HPC applications in the cloud.
Physical simulations are traditionally classified under HPC applications. They usually consists of a series of dense computations followed by data exchanges. The computations are in the order of hundreds of microseconds to few milliseconds (What is the task length distribution of physical simulations?), which induces high task throughputs on the runtime systems. Nimbus, for the first time, enables running HPC applications within a cloud framework. We have ported PhysBAM, which is a physics based simulations library in to Nimbus. Nimbus performance is within 15% of the hand tuned MPI code for simulations with billions of cells. Nimbus provides fault tolerance and load balancing required for the cloud environments while keeping the control plane overhead low. Without Execution Templates, the control plane overhead dominates the runtime and slows down computation by 520%.