An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware

Tao Chen, Shreesha Srinath, Christopher Batten, and G. Edward Suh
MICRO 2018

Abstract

In this paper, we propose ParallelXL, an architectural framework for building application-specific parallel accelerators with low manual effort. The framework introduces a task-based computation model with explicit continuation passing to support dynamic parallelism in addition to static parallelism. In contrast, today’s high-level design frameworks for accelerators focus on static data-level or thread-level parallelism that can be identified and scheduled at design time. To realize the new computation model, we develop an accelerator architecture that efficiently handles dynamic task generation and scheduling as well as load balancing through work stealing. The architecture is general enough to support many dynamic parallel constructs such as fork-join, data-dependent task spawning, and arbitrary nesting and recursion of tasks, as well as static parallel patterns. We also introduce a design methodology that includes an architectural template for easily creating parallel accelerators from high-level descriptions. The proposed framework is studied through an FPGA prototype as well as detailed simulations. Evaluation results show that the framework can generate high-performance accelerators targeting FPGAs for a wide range of parallel algorithms and achieve an average of 4.0x speedup over an eight-core out-of-order processor (24.1x over a single core), while being 11.8x more energy efficient.
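
To make the computation model in the abstract more concrete, the following is a minimal software sketch of task-based execution with explicit continuation passing and work stealing, using recursive Fibonacci as a fork-join example. It is an illustration only, not the ParallelXL architecture or its programming interface: the names `Pending`, `run_fib`, `send`, and `execute` are hypothetical, the "workers" are simulated round-robin in a single thread, and the scheduling policy (pop from the tail of the worker's own queue, steal from the head of a victim's queue) is the conventional work-stealing policy, assumed here rather than taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pending:
    """A successor task waiting at a join point for its arguments (hypothetical name)."""
    remaining: int            # number of arguments still missing
    cont: Optional[tuple]     # (Pending, slot) that receives the result; None for the root
    args: list = field(default_factory=lambda: [0, 0])


def run_fib(n, num_workers=4):
    """Compute fib(n) with simulated workers; returns (value, number_of_steals)."""
    queues = [deque() for _ in range(num_workers)]  # one task queue per worker
    result = []
    steals = 0

    def send(cont, value, worker):
        # Deliver a value to a continuation. When the last missing argument
        # arrives, the successor task becomes ready and is queued locally.
        if cont is None:
            result.append(value)
            return
        pending, slot = cont
        pending.args[slot] = value
        pending.remaining -= 1
        if pending.remaining == 0:
            queues[worker].append(("sum", pending))

    def execute(task, worker):
        kind, payload = task
        if kind == "fib":
            k, cont = payload
            if k < 2:
                send(cont, k, worker)  # base case: return k to the continuation
            else:
                # Fork: create the join point, then spawn two child tasks that
                # each carry an explicit continuation (join point, argument slot).
                join = Pending(remaining=2, cont=cont)
                queues[worker].append(("fib", (k - 1, (join, 0))))
                queues[worker].append(("fib", (k - 2, (join, 1))))
        else:  # "sum": both arguments have arrived, pass the sum upward
            send(payload.cont, payload.args[0] + payload.args[1], worker)

    queues[0].append(("fib", (n, None)))  # root task starts on worker 0
    while not result:
        for w in range(num_workers):
            if queues[w]:
                task = queues[w].pop()  # own work: newest task first
            else:
                victim = next((v for v in range(num_workers) if queues[v]), None)
                if victim is None:
                    continue  # nothing to steal in this round
                task = queues[victim].popleft()  # steal the oldest task
                steals += 1
            execute(task, w)
    return result[0], steals


value, steals = run_fib(10)  # value is 55
```

Because each child task carries the address of the join point it reports to, no worker ever blocks waiting for a child to finish; an idle worker simply takes work from another queue, which is the load-balancing behavior the abstract attributes to work stealing.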