My research group aims to develop new algorithms, architectures, methodologies, and tools to extend the frontiers of design automation for high-performance and energy-efficient computer systems. We investigate cross-cutting research topics at the intersection of computer-aided design, compilers, and computer architecture at multiple scales: from circuit-level building blocks, to chip-level processor and co-processor cores, to system-level heterogeneous compute nodes that integrate CPUs, GPUs, and reconfigurable logic. In particular, we are currently tackling the following important and challenging problems:
- Highly Intelligent High-Level Synthesis
- Software-Defined Reconfigurable Computing
- Principled Hardware Specialization
Specialized hardware created manually through traditional register-transfer-level (RTL) design can yield high performance, but RTL design is also among the least productive design methodologies. As specialized accelerators become more integral to achieving the performance and energy goals of future hardware, there is a crucial need for above-RTL design automation to enable productive modeling, rapid exploration, and automatic generation of customized hardware from high-level languages. Along these lines, there has been increasing use of high-level synthesis (HLS) tools to compile algorithmic descriptions (e.g., C/C++, Python) to RTL designs for quick ASIC or FPGA implementation of hardware accelerators [J5][J4].
While the latest HLS tools have made encouraging progress with much improved quality-of-results (QoR), they still rely heavily on designers to manually restructure source code and insert vendor-specific directives that guide the synthesis tool through a vast and complex solution space. The lack of any guarantee of meeting QoR goals out of the box presents a major barrier to non-expert users. To this end, we are developing a new generation of HLS techniques that feature scalable cross-layer synthesis [C26][C25], complexity-effective runtime optimization [C28][J7][C33], and trace-based analysis [C36] to enable a radically accelerated and greatly simplified hardware design experience, while retaining QoR on par with that of "ninja" designers.
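To make the manual effort concrete, the following is a minimal sketch of the kind of source-level guidance an HLS user must provide today. The pragmas follow the Xilinx Vivado HLS directive style (the specific pragmas shown are illustrative of the general practice, not drawn from our publications); a standard C compiler simply ignores them, while an HLS tool would use them to pipeline the loop and partition the arrays for parallel memory access.

```c
#define N 128

// Illustrative HLS kernel: a fixed-size dot product.
// The pragmas below are vendor-specific synthesis directives; without them,
// the tool typically generates a sequential datapath with one memory port.
int dot_product(const int a[N], const int b[N]) {
// Split each array across multiple banks so several elements
// can be read per cycle.
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4
    int sum = 0;
dot_loop:
    for (int i = 0; i < N; i++) {
// Ask the tool to start a new loop iteration every clock cycle.
#pragma HLS PIPELINE II=1
        sum += a[i] * b[i];
    }
    return sum;
}
```

The point is that these directives must be chosen, placed, and tuned by hand for each kernel and each target device, which is precisely the burden our automated synthesis techniques aim to remove.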
Recent advances in modern FPGAs have made reconfigurable computing platforms attractive for accelerating many compute-intensive applications, such as bioinformatics, data compression, image and video processing, financial analytics, and machine learning. While heterogeneous CPU+FPGA platforms are becoming commercially available to a wide user base, they remain very difficult to program. As a result, the use of such platforms has been limited to a small subset of programmers with specialized knowledge of low-level hardware details.
To tackle this challenge, we are exploring several important aspects of FPGA-based computing in the context of both small-scale private clusters and public cloud systems. In particular, we are developing a number of novel FPGA-based accelerators using C-based design entries [C32][C35]. We are also investigating domain-specific languages and the associated compilation infrastructure [C34] that have the potential to make FPGAs easily accessible to application programmers.
Power and energy efficiency are now first-order design constraints across the entire computing spectrum. The recent trend toward multicore scaling is not a long-term solution, as it is already infeasible to aggressively activate all transistors on a general-purpose die due to power constraints, a phenomenon commonly known as dark silicon. Hardware specialization is a promising approach to improving both performance and energy efficiency. Notably, modern SoCs and even chip multiprocessors have moved toward inclusion of many specialized accelerators built with custom architectures, largely to reduce power compared to using multiple general-purpose processors. On the flip side, the increasing use of hardware specialization further complicates the programming effort and reduces software portability.
The central theme of our research is to resolve the tension between software flexibility and hardware efficiency to enable productive design specialization for mainstream computing. We are employing a co-design approach that involves architectures, compilers, and applications to explore a variety of specialization options, including instruction set specialization, loop specialization [C24], and polymorphic specialization for algorithms and data structures [C31]. We are closely collaborating with Prof. Christopher Batten on many of these topics.
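As a software-level caricature of what loop specialization entails, the sketch below recasts a generic loop as an instance of an explicit reduction pattern. In a loop-specialized architecture, making the cross-iteration pattern explicit is what allows the loop to be steered onto a dedicated execution unit; the names and interface here are purely illustrative assumptions, not an actual interface from [C24] or [C31].

```c
#include <stddef.h>

// Hypothetical "ordered reduction" pattern: acc = f(acc, a[i]) for i in [0, n).
// Exposing the loop as a named pattern (rather than an opaque for-loop) is the
// kind of information a loop-specialized architecture could exploit.
typedef int (*reduce_fn)(int acc, int x);

static int loop_reduce(const int *a, size_t n, int init, reduce_fn f) {
    int acc = init;
    for (size_t i = 0; i < n; i++)
        acc = f(acc, a[i]);
    return acc;
}

static int add(int acc, int x) { return acc + x; }

// The original generic loop, rewritten as an instance of the pattern.
int sum_array(const int *a, size_t n) {
    return loop_reduce(a, n, 0, add);
}
```

In real systems the pattern would be conveyed to hardware through ISA extensions or compiler annotations rather than function pointers; the sketch only illustrates the separation between a loop's pattern and its body.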
Research conducted by my group is currently sponsored by Defense Advanced Research Projects Agency (DARPA), National Science Foundation (NSF), Semiconductor Research Corporation (SRC), Intel Corporation, and Xilinx, Inc. Their support is greatly appreciated.