I am applying for faculty positions in electrical engineering and computer science for the 2018-2019 hiring season.
office: 471-B Rhodes Hall, Ithaca, NY 14853
email: clt67 at cornell edu
I am a final-year PhD student in electrical and computer engineering working under Professor Christopher Batten at Cornell University. I am a computer architect, but my research approach emphasizes cross-stack co-design across software, architecture, and VLSI to unify emerging applications with emerging technologies.
Throughout my PhD, I have been involved with six research test chips, and I was the project lead or university student lead for three of them: BRGTC2 (2018), Celerity (2017), and BRGTC1 (2016). My work has been recognized with a selection as a Rising Star in Computer Architecture (Georgia Tech, 2018) and an IEEE Micro Top Pick from Hot Chips (2018).
In the future, I plan to co-design across software, architecture, and VLSI (1) to build accelerator-centric SoCs that enable new applications based on intelligence at the edge, (2) to explore accelerator-centric SoCs that are easy to build using novel methodologies supporting a tile-based abstraction, and (3) to build SoCs that can be embedded into cyber-physical systems.
Efficient Task-Based Parallel Runtimes
Task-based parallel runtimes underpin the parallelization of frameworks for machine learning, graph analytics, and other domains. State-of-the-art graph analytics frameworks like GraphIt and Ligra are built on top of these runtimes, which distribute tasks efficiently using dynamic work-stealing algorithms. A cross-stack research approach can expose runtime-level information to hardware so that it informs both architecture-level and VLSI-level decisions, improving the performance and energy efficiency of the runtime (ISCA'16). However, layers of abstraction often make it challenging to pass information across the computing stack. I worked on a systematic approach to convey the abstraction of a "task" from the runtime directly to the underlying hardware (MICRO'17). I also designed and fabricated BRGTC2, a 6.7M-transistor chip in TSMC 28nm, to collect performance, area, and energy numbers in an advanced technology node and support future research on hardware acceleration for task-based parallel runtimes (RISCV'18).
Integrated Voltage Regulation
Voltage regulators are responsible for efficiently converting one voltage level into another (e.g., board-level to chip-level). Recent technology trends are making it feasible to replace discrete voltage regulators with integrated voltage regulators, which can significantly reduce system cost by eliminating expensive board-level components. The enabling trends include energy storage elements with better energy densities as well as faster on-chip switches with lower parasitic losses. However, integrated voltage regulators are very large (e.g., comparable in area to the core they supply). Together with my colleagues in the circuits field, I applied a cross-stack research approach to explore a novel technique that dynamically shares capacitance across multiple loads, reducing regulator area by 40% while still enabling fine-grain DVFS (MICRO'14). I also contributed to the fabrication of a switched-capacitor-based prototype in 65nm CMOS, resulting in a publication in a top-tier circuits journal (TCASI'18).
Rapid ASIC Design
Rising SoC design costs have created a formidable barrier to hardware design when using traditional design tools and methodologies. It is exceedingly difficult for small teams with a limited workforce to build meaningfully complex chips for business ventures (e.g., chip-based startups in machine learning), in academia (e.g., research groups), and even for government goals (e.g., U.S. Department of Defense). I have been involved in a range of efforts to reduce the costs and challenges of ASIC design for small teams based on productive toolflows and open-source hardware. I was the Cornell University student lead on the Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric, resulting in top-tier publications in chip-design venues (HOTCHIPS'17), architecture venues (IEEE-MICRO'18), and various workshops. I was also the project lead for BRGTC1 and BRGTC2, silicon prototypes in IBM 130nm and TSMC 28nm, respectively, designed and implemented using PyMTL, a new open-source Python-based hardware modeling framework developed by my research group. Finally, I contributed to an effort at NVIDIA Research on a modular digital VLSI flow for high-productivity SoC design based on high-level synthesis tools (DAC'18).
Top-tier architecture venues in this list
IEEE MICRO, ISCA, MICRO
Top-tier chip / design automation venues in this list
Top-tier circuits venues in this list
IEEE TCAS I
An Open-Source Python-Based Hardware Generation, Simulation, and
Shunning Jiang, Christopher Torng, and Christopher Batten
WOSET 2018 – First Workshop on Open-Source EDA Technology held in conjunction with ICCAD-37. San Diego, CA. November 2018.
A New Era of Silicon Prototyping in Computer Architecture
Christopher Torng, Shunning Jiang, Khalid Al-Hawaj, Ivan Bukreyev, Berkin Ilbeyi, Tuan Ta, Lin Cheng, Julian Puscar, Ian Galton, and Christopher Batten
RISC-V Day 2018 – RISC-V Day Workshop held in conjunction with MICRO-51. Fukuoka, Japan. October 2018.
Four Monolithically Integrated Switched-Capacitor DC-DC Converters with Dynamic Capacitance Sharing in 65-nm CMOS
Ivan Bukreyev, Christopher Torng, Waclaw Godycki, Christopher Batten, and Alyssa Apsel
IEEE TCAS I 2018 – IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS I), vol. 65, no. 6, pp. 2035-2047, June 2018.
A Modular Digital VLSI Flow for High-Productivity SoC Design
Brucek Khailany, Evgeni Krimer, Rangharajan Venkatesan, Jason Clemons, Joel Emer, Matthew Fojtik, Alicia Klinefelter, Michael Pellauer, Nathaniel Pinckney, Yakun Sophia Shao, Shreesha Srinath, Christopher Torng, Sam (Likun) Xi, Yanqing Zhang, and Brian Zimmer
DAC 2018 – 55th ACM/IEEE Design Automation Conference. San Francisco, CA. June 2018.
The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips
Scott Davidson, Shaolin Xie, Christopher Torng, Khalid Al-Hawaj, Austin Rovinski, Tutu Ajayi, Luis Vega, Chun Zhao, Ritchie Zhao, Steve Dai, Aporva Amarnath, Bandhav Veluri, Paul Gao, Anuj Rao, Gai Liu, Rajesh K. Gupta, Zhiru Zhang, Ronald G. Dreslinski, Christopher Batten, and Michael B. Taylor
IEEE Micro 2018 – Volume 38(2):30–41, Mar/Apr. 2018. Special issue for top picks from Hot Chips 29.
Using Intra-Core Loop-Task Accelerators to Improve the Productivity and Performance of Task-Based Parallel Programs
Ji Kim, Shunning Jiang, Christopher Torng, Moyang Wang, Shreesha Srinath, Berkin Ilbeyi, Khalid Al-Hawaj, and Christopher Batten
MICRO 2017 – 50th IEEE/ACM Int'l Symposium on Microarchitecture. Boston, MA. October 2017.
Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm
Tutu Ajayi, Khalid Al-Hawaj, Aporva Amarnath, Steve Dai, Scott Davidson, Paul Gao, Gai Liu, Anuj Rao, Austin Rovinski, Ningxiao Sun, Christopher Torng, Luis Vega, Bandhav Veluri, Shaolin Xie, Chun Zhao, Ritchie Zhao, Christopher Batten, Ronald G. Dreslinski, Rajesh K. Gupta, Michael B. Taylor, and Zhiru Zhang
CARRV 2017 – First Workshop on Computer Architecture Research with RISC-V. Boston, MA. October 2017.
Celerity: An Open Source RISC-V Tiered Accelerator Fabric
Tutu Ajayi, Khalid Al-Hawaj, Aporva Amarnath, Steve Dai, Scott Davidson, Paul Gao, Gai Liu, Atieh Lotfi, Julian Puscar, Anuj Rao, Austin Rovinski, Loai Salem, Ningxiao Sun, Christopher Torng, Luis Vega, Bandhav Veluri, Xiaoyang Wang, Shaolin Xie, Chun Zhao, Ritchie Zhao, Christopher Batten, Ronald G. Dreslinski, Ian Galton, Rajesh K. Gupta, Patrick P. Mercier, Mani Srivastava, Michael B. Taylor, and Zhiru Zhang
Hot Chips 2017 – 29th Symposium on High Performance Chips. Cupertino, CA. August 2017.
Experiences Using A Novel Python-Based Hardware Modeling Framework For Computer Architecture Test Chips
Christopher Torng, Moyang Wang, Bharath Sudheendra, Nagaraj Murali, Suren Jayasuriya, Shreesha Srinath, Taylor Pritchard, Robin Ying, and Christopher Batten
Hot Chips 2016 (Poster) – 28th Symposium on High Performance Chips. Cupertino, CA. August 2016.
Asymmetry-Aware Work-Stealing Runtimes
Christopher Torng, Moyang Wang, and Christopher Batten
ISCA 2016 – 43rd ACM/IEEE Int'l Symposium on Computer Architecture. Seoul, Korea. June 2016.
Enabling Realistic Fine-Grain Voltage Scaling with Reconfigurable Power Distribution Networks
Waclaw Godycki*, Christopher Torng*, Ivan Bukreyev, Alyssa Apsel, and Christopher Batten (* = equally contributing co-first authors)
MICRO 2014 – 47th IEEE/ACM Int'l Symposium on Microarchitecture. Cambridge, UK. December 2014.
Microarchitectural Mechanisms to Exploit Value Structure in SIMT
Ji Kim, Christopher Torng, Shreesha Srinath, Derek Lockhart, and Christopher Batten
ISCA 2013 – 40th ACM/IEEE Int'l Symposium on Computer Architecture. Tel Aviv, Israel. June 2013.
Additional Talks and Research Presentations
- Software, Architecture, and VLSI Co-Design for Task-Based Parallel Runtimes
Rising Stars in Computer Architecture (RISC-A) Workshop 2018 – Georgia Tech
Atlanta, GA. October 2018.
- Towards Rapid Chip Development with Celerity and BRGTC1
CMU Computer Architecture Lab (CALCM)
Pittsburgh, PA. April 2018.
- Celerity: An Open Source RISC-V Tiered Accelerator Fabric
Cornell Electron Devices Society
Ithaca, NY. September 2017. Presented with Khalid Al-Hawaj and Ritchie Zhao.
- On-Chip Reconfigurable Power Distribution Networks
Cornell STEM Graduate Student Summer Colloquium
Ithaca, NY. July 2013. Presented to an audience of about fifty non-experts.
- Reconfigurable Power Distribution Networks for Embedded Multicore
Qualcomm Innovation Fellowship Finals (QInF 2013)
Bridgewater, NJ. March 2013. Presented with Waclaw Godycki.
- 2018 – Rising Stars in Computer Architecture (RISC-A)
- 2018 – IEEE Micro Top Pick from Hot Chips (for Hot Chips '17 paper)
- 2014 – NSF GRFP Honorable Mention
- 2013 – Finalist for Qualcomm Innovation Fellowship (QInF)
- 2012 – H.C. Torng Fellowship (Cornell graduate fellowship) (no familial relation)
Test Chips and Prototyping
Project Lead for the BRGTC2 test chip (2018)
BRGTC2 is the BRG research group's second computer architecture test chip. It is a 1x1.25mm 6.7M-transistor chip in TSMC 28nm designed and implemented using our new PyMTL hardware modeling framework. The chip includes four RISC-V RV32IMAF cores that share a 32KB instruction cache, a 32KB data cache, and a single-precision floating-point unit, along with microarchitectural mechanisms to mitigate the performance impact of resource sharing. The chip also includes a fully synthesizable high-performance PLL originally designed for the DARPA CRAFT project by Ian Galton and Julian Puscar from UC San Diego. The project was led by Christopher Torng with contributions from Shunning Jiang (core RTL design, verification), Khalid Al-Hawaj (cache RTL design, verification), Ivan Bukreyev (PLL porting), Berkin Ilbeyi (Bloom filter and FPU design), Tuan Ta (CL simulation, arbiter RTL design), and Lin Cheng (microbenchmark development).
Digital ASIC Lead for the PCOSYNC test chip
PCOSYNC is a 1.1x2.1mm test chip in TSMC 180nm implementing a low-power and scalable baseband synchronizer aimed at enabling low-power, long-range P2P communication for IoT nodes. A key application feature of this chip is low-power synchronization of N nodes so that, once synchronized, they continue to "tick" at the same time. This digital test chip is a follow-on to recent work by my colleagues on pulse-coupled oscillators in the analog domain (in which I was not previously involved). The project was led by Ivan Bukreyev from Professor Alyssa Apsel's research group, and I led the digital ASIC physical design.
Cornell Lead for the Celerity system-on-chip
Celerity is a 5x5mm 385M-transistor chip in TSMC 16nm designed and implemented by a large team of over 20 students and faculty from UC San Diego, University of Michigan, and Cornell as part of the DARPA Circuit Realization At Faster Timescales (CRAFT) program. The chip includes a fully synthesizable PLL, a digital LDO, five modified Chisel-generated RISC-V Rocket cores, a 496-core RISC-V tiled manycore processor, tightly integrated Rocket-to-manycore communication channels, a complex HLS-generated binarized neural network (BNN) accelerator, manycore-to-BNN high-speed links, a sleep-mode 10-core manycore, a top-level bus interconnect, high-speed source-synchronous off-chip I/O, and a custom flip-chip package. Cornell led the Rocket+BNN accelerator logical/physical design and also made key contributions to the top-level logical/physical integration and the design/verification methodology.
Project Lead for the BRGTC1 test chip
Poster Abstract at HOTCHIPS'16
BRGTC1 is the BRG research group's first computer architecture test chip. It is a 2x2mm 1.3M-transistor chip in IBM 130nm designed and implemented using our new PyMTL hardware modeling framework. The chip includes a simple pipelined 32-bit RISC processor, custom LVDS clock receiver, 16KB of on-chip SRAM, and application-specific accelerators generated using commercial C-to-RTL high-level synthesis tools. Other students who worked on this project: Moyang Wang (co-lead), Bharath Sudheendra and Nagaraj Murali (physical design), Suren Jayasuriya and Robin Ying (full-custom design), Shreesha Srinath (accelerator design), Mark Buckler (toolflow), and Taylor Pritchard (FPGA emulation).
Support Designer for the DCS analog test chip
DCS stands for dynamic capacitance sharing, a novel circuits technique for dynamically sharing small units of capacitance across multiple on-chip switched-capacitor voltage regulators for significantly reduced on-chip area and order-of-magnitude faster voltage transition times. The DCS analog test chip features four monolithically integrated switched-capacitor DC-DC converters in 65-nm CMOS. As a young PhD student, I hand-designed the digital configuration components in Cadence Virtuoso while adhering to a traditional track-based organization, and I also supported the post-silicon validation. The project was a collaboration between Professor Christopher Batten and Professor Alyssa Apsel. The chip design was led by Waclaw Godycki and Ivan Bukreyev, resulting in a circuits journal paper (TCASI'18). I co-led an architecture conference paper (MICRO'14) exploring the architectural applications of the technique.
- (Lead) Teaching Assistant - ECE 2400 / ENGRD 2140 Computer Systems Programming - Fall 2017
- (Lead) Teaching Assistant - ECE 4750 / CS 4420 Computer Architecture - Fall 2014
- (Lead) Teaching Assistant - CURIE Academy - Summer 2014 - Educational outreach program for high school girls focusing on exploring STEM fields and taking a deep dive into computer engineering
- Teaching Assistant - ENGRG 1060 Exploration in Engineering Seminar - Summer 2013 - Educational outreach targeted at introducing high school students to STEM fields through an Arduino-based robotics lab
- Undergraduate Teaching Assistant - ECE 4750 / CS 4420 Computer Architecture - Fall 2011
- Graduate Research Intern - NVIDIA ASIC/VLSI Research Group - Austin, TX, USA - Summer 2017
- Graduate Technical Intern - Intel Many Integrated Core (MIC) - Hillsboro, OR, USA - Summer 2012
- Undergraduate Technical Intern - Intel Many Integrated Core (MIC) - Hillsboro, OR, USA - Summer 2011
- Conference Shadow PC Member: ASPLOS 2018
- Journal Reviewer: IEEE TCAS I 2016
My Open-Source Projects
- The Modular VLSI Build System [github] – An open-source, modular approach to ASIC flow organization that encourages greater reuse and productivity when using ASIC flows for computer architecture research, VLSI research, or chip building.
Other Light Contributions to Open-Source Projects
- gem5 [small feature] – Enabled fast-forwarding for MIPS in-order and out-of-order cores in The gem5 Simulator System - Spring 2014
- gem5 [small feature] – Added support for dynamic frequency scaling in single-core and multicore architectures in The gem5 Simulator System - Fall 2013
- gem5 [bug fix] – Fixed a floating-point convert instruction signedness bug for MIPS architectures in The gem5 Simulator System - Fall 2013
- Figure Skating – I am an avid figure skater, and I can perform six single jumps as well as spins. The axel is my next jump; professionals make it look easy, but it is technically very difficult to land.
- Music – I was a music director of a Cornell University a cappella singing group for two years. I arranged our music on the command line, compiling text-based music notation into sheet-music PDFs with open-source rendering software and a makefile.