Research Description (David H. Albonesi)

The primary focus of our research is power-efficient computer systems. Recent work has also applied these ideas to smart buildings. More detail can be found in the principal publications listed under each project below; a more complete list of publications is also available.

Research Projects

Accelerator Architectures
Power-Efficient Adaptive and Reconfigurable Architectures and Algorithms
Interconnect Architectures Exploiting Silicon Nanophotonics
Smart Buildings
GALS Microarchitectures
Power- and Reliability-Aware Computing
Clustered Multi-Threaded Microprocessors
Dynamic Data Dependence Tracking

Accelerator Architectures

Our work in accelerator architectures spans specialized processor architectures, memory formats, and configurable multipliers, among others.

MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product, N. Srivastava et al., 53rd International Symposium on Microarchitecture, October 2020.

Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations, N. Srivastava et al., 26th International Symposium on High-Performance Computer Architecture, February 2020.

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations, N. Srivastava et al., 27th International Symposium on Field-Programmable Custom Computing Machines, April 2019.

DeepRecon: Dynamically Reconfigurable Architecture for Accelerating Deep Neural Networks, T. Rzayev, S. Moradi, D.H. Albonesi, and R. Manohar, International Joint Conference on Neural Networks, May 2017.

Fractured Arithmetic Accelerator for Training Deep Neural Networks, T. Rzayev, S. Moradi, D.H. Albonesi, and R. Manohar, Workshop on Hardware and Algorithms for On-chip Learning, held at the International Conference on Computer-Aided Design, November 2016.

Power-Efficient Adaptive and Reconfigurable Architectures and Algorithms

Applications go through phases of execution in which their fundamental characteristics may vary widely. Conventional microprocessors are fixed at design time and therefore are inevitably a compromise: a particular design may be best overall for some given workload, but for any given application, or even a phase of an application, a different microarchitecture is often preferable in terms of performance and power dissipation. Our group investigates both adaptive and reconfigurable approaches to address this phase-level application variation. Adaptive architectures dynamically tune major microprocessor resources during execution to better match varying phase behavior. Reconfigurable architectures embed programmable logic among clusters of cores on a multi-core die and dynamically manage these shared resources during execution. Both approaches require implementing efficient control algorithms that rapidly find the optimal solution.
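
As a rough illustration of the kind of control decision described above, the sketch below picks a hardware configuration for the next execution phase from a handful of trial measurements. It is a minimal sketch under stated assumptions, not the control algorithm from any of the papers listed here: the `Sample` fields, the configuration labels, the energy-delay-product metric, and the 5% slowdown bound are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Measurements for one configuration over one trial interval (hypothetical units)."""
    config: str       # label of the hardware configuration that was tried
    cycles: int       # cycles needed to retire a fixed number of instructions
    energy_nj: float  # energy consumed over the same interval


def energy_delay_product(sample: Sample) -> float:
    """Lower is better: penalizes both slow and power-hungry configurations."""
    return sample.energy_nj * sample.cycles


def pick_configuration(samples: list[Sample], max_slowdown: float = 1.05) -> str:
    """Return the lowest-EDP configuration that stays within max_slowdown of the fastest one."""
    if not samples:
        raise ValueError("need at least one sample")
    fastest = min(s.cycles for s in samples)
    eligible = [s for s in samples if s.cycles <= fastest * max_slowdown]
    return min(eligible, key=energy_delay_product).config


# Hypothetical trial results for three cache configurations in one phase.
trial = [
    Sample("4-way", 1000, 500.0),
    Sample("2-way", 1030, 380.0),
    Sample("1-way", 1400, 300.0),
]
print(pick_configuration(trial))
```

A real controller would also have to detect phase changes and bound the cost of the trial intervals themselves; both are left out of this sketch.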

CuttleSys: Data-Driven Resource Management for Interactive Services on Reconfigurable Multicores, N. Kulkarni et al., 53rd International Symposium on Microarchitecture, October 2020.

Dynamic GPGPU Power Management Using Adaptive Model Predictive Control, A. Majumdar, L. Piga, I. Paul, J.L. Greathouse, W. Huang, and D.H. Albonesi, 23rd International Symposium on High Performance Computer Architecture, February 2017.

Flicker: A Dynamically Adaptive Architecture for Power Limited Multicore Systems, P. Petrica, A.M. Izraelevitz, D.H. Albonesi, and C.A. Shoemaker, 40th International Symposium on Computer Architecture, June 2013.

A Phase Adaptive Cache Hierarchy for SMT Processors, S. López, O. Garnica, D.H. Albonesi, S. Dropsho, J. Lanchares, and J.I. Hidalgo, Microprocessors & Microsystems, Vol. 35, No. 8, pp. 683-694, November 2011.

ReMAP: A Reconfigurable Architecture for Chip Multiprocessors, M.A. Watkins and D.H. Albonesi, IEEE Micro, Special Issue on the Top Picks from the Computer Architecture Conferences, January/February 2011.

ReMAP: A Reconfigurable Heterogeneous Multicore Architecture, M.A. Watkins and D.H. Albonesi, 43rd International Symposium on Microarchitecture, December 2010.

Dynamically Managed Multithreaded Reconfigurable Architectures for Chip Multiprocessors, M.A. Watkins and D.H. Albonesi, 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 41-52, September 2010.

Scalable Thread Scheduling and Global Power Management for Heterogeneous Many-Core Architectures, J.A. Winter, D.H. Albonesi, and C.A. Shoemaker, 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 29-39, September 2010.

Adaptive Cache Memories for SMT Processors, S. López, O. Garnica, D.H. Albonesi, S. Dropsho, J. Lanchares, and J.I. Hidalgo, 13th Euromicro Conference on Digital System Design, September 2010.

Enabling Parallelization via a Reconfigurable Chip Multiprocessor, M.A. Watkins and D.H. Albonesi, Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures, held at the 37th International Symposium on Computer Architecture, June 2010.

Shared Reconfigurable Architectures for CMPs, M.A. Watkins, M.J. Cianchetti, and D.H. Albonesi, 18th IEEE International Conference on Field Programmable Logic and Applications, September 2008. (Best Paper Award nomination)

Dynamic Capacity-Speed Tradeoffs in SMT Processor Caches, S. López, S. Dropsho, D.H. Albonesi, O. Garnica, and J. Lanchares, International Conference on High Performance Embedded Architectures and Compilers, January 2007.

Dynamically Tuning Processor Resources with Adaptive Processing, D.H. Albonesi, R. Balasubramonian, S.G. Dropsho, S. Dwarkadas, E.G. Friedman, M.C. Huang, V. Kursun, G. Magklis, M.L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P.W. Cook, and S.E. Schuster, IEEE Computer, Special Issue on Power-Aware Computing, Vol. 36, No. 12, pp. 49-58, December 2003.

A Dynamically Tunable Memory Hierarchy, R. Balasubramonian, D.H. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, IEEE Transactions on Computers, pp. 1243-1258, October 2003.

Energy Efficient Co-Adaptive Instruction Fetch and Issue, A. Buyuktosunoglu, T. Karkhanis, D.H. Albonesi, and P. Bose, 30th International Symposium on Computer Architecture, pp. 147-156, June 2003.

Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power, S. Dropsho, A. Buyuktosunoglu, R. Balasubramonian, D.H. Albonesi, S. Dwarkadas, G. Semeraro, G. Magklis, and M.L. Scott, 11th International Conference on Parallel Architectures and Compilation Techniques, pp. 141-152, September 2002.

Dynamically Allocating Processor Resources Between Nearby and Distant ILP, R. Balasubramonian, S. Dwarkadas, and D.H. Albonesi, 28th International Symposium on Computer Architecture, pp. 26-37, June 2001.

A Circuit Level Implementation of an Adaptive Issue Queue for Power-Aware Microprocessors, A. Buyuktosunoglu, S. Schuster, D. Brooks, P. Bose, P. Cook, and D.H. Albonesi, 11th Great Lakes Symposium on VLSI, pp. 73-78, March 2001.

Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures, R. Balasubramonian, D.H. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, 33rd International Symposium on Microarchitecture, pp. 245-257, December 2000.

An Adaptive Issue Queue for Reduced Power at High Performance, A. Buyuktosunoglu, S. Schuster, D. Brooks, P. Bose, P. Cook, and D.H. Albonesi, Workshop on Power-Aware Computer Systems, held at the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, November 2000. Also appears in Springer-Verlag Lecture Notes in Computer Science, Volume 2008.

Selective Cache Ways: On-Demand Cache Resource Allocation, D.H. Albonesi, 32nd International Symposium on Microarchitecture, pp. 248-259, November 1999.

A Methodology for the Analysis of Dynamic Application Parallelism and Its Application to Reconfigurable Computing, B. Xu and D.H. Albonesi, SPIE International Conference on Reconfigurable Technology: FPGAs for Computing and Applications, pp. 78-86, September 1999.

Dynamic IPC/Clock Rate Optimization, D.H. Albonesi, 25th International Symposium on Computer Architecture, pp. 282-292, June 1998.

The Inherent Energy Efficiency of Complexity-Adaptive Processors, D.H. Albonesi, 1998 Power-Driven Microarchitecture Workshop, held at the 25th International Symposium on Computer Architecture, pp. 107-112, June 1998.

Interconnect Architectures Exploiting Silicon Nanophotonics

Although photonic devices have long been touted as potentially superior to traditional electrical interconnects in computer systems, the lack of CMOS-compatible devices has impeded progress in the area. However, there has been considerable device-level innovation in the past several years that is bringing a CMOS-compatible photonic interconnect system closer to reality. While inter-chip interconnects will be the near term application of this technology, our focus is on silicon photonics for on-chip interconnects in microprocessors and memories. Our project takes an integrated approach that spans devices, integrated circuits, and microarchitectures.

A Low Latency, High Throughput On-Chip Optical Router Architecture for Future Chip Multiprocessors, M.J. Cianchetti and D.H. Albonesi, ACM Journal on Emerging Technologies in Computing Systems, Special Issue on Nanophotonic Communication Technology Integration, Vol. 7, No. 2, June 2011.

Phastlane: A Rapid Transit Optical Routing Network, M.J. Cianchetti, J.C. Kerekes, and D.H. Albonesi, 36th International Symposium on Computer Architecture, June 2009.

On-Chip Optical Interconnects: Challenges and Critical Directions, G. Chen, H. Chen, M. Haurylau, N.A. Nelson, D.H. Albonesi, P.M. Fauchet, and E.G. Friedman, Proceedings of the European Optical Society Topical Meeting on Optical Microsystems, p. 97, October 2007.

On-Chip Optical Interconnect for Reduced Delay Uncertainty, G. Chen, H. Chen, M. Haurylau, N.A. Nelson, D.H. Albonesi, P.M. Fauchet, and E.G. Friedman, Proceedings of Nano-Net, September 2007.

On-chip Optical Technology in Future Bus-based Multicore Designs: Opportunities and Challenges, N. Kırman, M. Kırman, R.K. Dokania, J. Martínez, A.B. Apsel, M.A. Watkins, and D.H. Albonesi, IEEE Micro, Special Issue on the Top Picks from Microarchitecture Conferences, Vol. 27, No. 1, January/February 2007.

On-Chip Optical Interconnect Roadmap: Challenges and Critical Directions, M. Haurylau, G. Chen, H. Chen, J. Zhang, N.A. Nelson, D.H. Albonesi, E.G. Friedman, and P.M. Fauchet, IEEE Journal of Selected Topics in Quantum Electronics, Special Issue on Silicon Photonics, Vol. 12, No. 6, pp. 1699-1705, November/December 2006.

Leveraging Optical Technology in Future Bus-based Chip Multiprocessors, N. Kırman, M. Kırman, R.K. Dokania, J. Martínez, A.B. Apsel, M.A. Watkins, and D.H. Albonesi, 39th International Symposium on Microarchitecture, December 2006.

On-Chip Copper-Based vs. Optical Interconnects: Delay Uncertainty, Latency, Power, and Bandwidth Density Comparative Predictions, G. Chen, H. Chen, M. Haurylau, N.A. Nelson, D.H. Albonesi, P.M. Fauchet, and E.G. Friedman, IEEE International Interconnect Technology Conference, pp. 39-41, June 2006.

On-chip Optical Interconnect Roadmap: Challenges and Critical Directions, M. Haurylau, H. Chen, J. Zhang, G. Chen, N.A. Nelson, D.H. Albonesi, E.G. Friedman, and P.M. Fauchet, 2nd International Group IV Photonics Conference, pp. 17-19, September 2005.

Electrical and Optical On-Chip Interconnects in Scaled Microprocessors, G. Chen, H. Chen, M. Haurylau, N. Nelson, D.H. Albonesi, P.M. Fauchet, and E.G. Friedman, International Symposium on Circuits and Systems, pp. 2514-2517, May 2005.

Predictions of CMOS Compatible On-Chip Optical Interconnect, G. Chen, H. Chen, M. Haurylau, N. Nelson, P.M. Fauchet, E.G. Friedman, and D.H. Albonesi, 7th International Workshop on System Level Interconnect Prediction, pp. 13-20, April 2005.

Alleviating Thermal Constraints while Maintaining Performance Via Silicon-Based On-Chip Optical Interconnects, N. Nelson, G. Briggs, M. Haurylau, G. Chen, H. Chen, D.H. Albonesi, E.G. Friedman, and P.M. Fauchet, Workshop on Unique Chips and Systems, March 2005.

Smart Buildings

Our research in smart buildings is inspired by our work on dynamic power management in computer systems. We focus on proactive energy-saving techniques that draw on occupant behavior and on building metadata such as meeting schedules.

Characterizing the Benefits and Limitations of Smart Building Meeting Room Scheduling, A. Majumdar, Z. Zhang, and D.H. Albonesi, 7th International Conference on Cyber-Physical Systems, April 2016.

Energy-Comfort Optimization using Discomfort History and Probabilistic Occupancy Prediction, A. Majumdar, J.L. Setter, J.R. Dobbs, B.M. Hencey, and D.H. Albonesi, 5th International Green Computing Conference, November 2014.

Energy-Aware Meeting Scheduling Algorithms for Smart Buildings, A. Majumdar, D.H. Albonesi, and P. Bose, 4th ACM Workshop on Embedded Systems for Energy-Efficiency in Buildings, November 2012.

GALS Microarchitectures

In a Globally Asynchronous, Locally Synchronous (GALS) system, the design is divided into several different domains, each with its own independent clock generation and distribution system. The potential benefits of GALS include reduced clock skew and overhead, and the potential to better tolerate process variations, a growing concern in the nanoscale regime.

Our research explores the application of a GALS design methodology to each core of a multi-core microprocessor, and within each processor core itself. In terms of the latter, we have devised algorithms for general purpose Dynamic Voltage Scaling (DVS) within our Multiple Clock Domain (MCD) processor design. In MCD, domains that are off the critical execution path can be slowed down (either under hardware or software control) to save energy without undue performance loss. This localized DVS approach applies to a wide range of applications. We have also explored the use of loop fusion within MCD for energy savings, and devised a more complexity-effective version of MCD that achieves better energy efficiency with simplified hardware.
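
The idea of slowing down domains that are off the critical path can be sketched in a few lines. This is a simplified illustration, not the MCD control algorithm: the frequency steps in `LEVELS`, the use of queue utilization as the control input, the 10% safety margin, and the assumption that supply voltage scales linearly with frequency (so dynamic energy per unit of work scales with the square of the relative frequency) are all assumptions made for the example.

```python
LEVELS = (0.25, 0.5, 0.75, 1.0)  # hypothetical relative frequency steps per domain


def choose_frequency(utilization: float, margin: float = 0.1) -> float:
    """Return the lowest relative frequency that covers the domain's utilization plus a margin."""
    target = min(1.0, utilization + margin)
    for level in LEVELS:
        if level >= target:
            return level
    return 1.0


def relative_dynamic_energy(rel_freq: float) -> float:
    """Dynamic energy per unit of work relative to full speed.

    Assumes voltage scales linearly with frequency, so energy scales as V^2.
    """
    return rel_freq ** 2


def plan(domains: dict[str, float]) -> dict[str, float]:
    """Map each domain's measured utilization to a relative frequency setting."""
    return {name: choose_frequency(util) for name, util in domains.items()}


# Hypothetical utilizations for an integer-heavy program phase.
settings = plan({
    "front_end": 0.95,
    "integer": 0.90,
    "floating_point": 0.20,
    "load_store": 0.60,
})
for name, freq in settings.items():
    print(name, freq, relative_dynamic_energy(freq))
```

In this example the lightly used floating-point domain drops to half speed, which under the stated voltage assumption cuts its dynamic energy per unit of work to a quarter, while the busy front-end and integer domains stay at full speed.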

Synergistic Temperature and Energy Management in GALS Processor Architectures, Y. Zhu and D.H. Albonesi, International Symposium on Low Power Electronics and Design, pp. 55-60, October 2006.

Localized Microarchitecture-Level Voltage Management, Y. Zhu and D.H. Albonesi, International Symposium on Circuits and Systems, pp. 37-40, May 2006.

A High Performance, Energy Efficient, GALS Processor Microarchitecture with Reduced Implementation Complexity, Y. Zhu, D.H. Albonesi, and A. Buyuktosunoglu, International Symposium on Performance Analysis of Systems and Software, pp. 42-53, March 2005.

Dynamically Trading Frequency for Complexity in a GALS Microprocessor, S. Dropsho, G. Semeraro, D.H. Albonesi, G. Magklis, and M.L. Scott, 37th International Symposium on Microarchitecture, pp. 157-168, December 2004.

The Energy Impact of Aggressive Loop Fusion, Y. Zhu, G. Magklis, M.L. Scott, C. Ding, and D.H. Albonesi, 13th International Conference on Parallel Architectures and Compilation Techniques, pp. 153-164, September 2004.

Hiding Synchronization Delays in a GALS Processor Microarchitecture, G. Semeraro, D.H. Albonesi, G. Magklis, M.L. Scott, S.G. Dropsho, and S. Dwarkadas, 10th International Symposium on Asynchronous Circuits and Systems, pp. 159-169, April 2004.

Dynamic Frequency and Voltage Scaling for a Multiple-Clock-Domain Microprocessor, G. Magklis, G. Semeraro, D.H. Albonesi, S.G. Dropsho, S. Dwarkadas, and M.L. Scott, IEEE Micro, Special Issue on the Top Picks from Microarchitecture Conferences, Vol. 23, No. 6, pp. 62-68, November/December 2003.

Profile-based Dynamic Voltage and Frequency Scaling for a Multiple Clock Domain Microprocessor, G. Magklis, M.L. Scott, G. Semeraro, D.H. Albonesi, and S. Dropsho, 30th International Symposium on Computer Architecture, pp. 14-25, June 2003.

Dynamic Frequency and Voltage Control for a Multiple Clock Domain Microarchitecture, G. Semeraro, D.H. Albonesi, S.G. Dropsho, G. Magklis, S. Dwarkadas, and M.L. Scott, 35th International Symposium on Microarchitecture, pp. 356-367, November 2002.

Energy Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling, G. Semeraro, G. Magklis, R. Balasubramonian, D.H. Albonesi, S. Dwarkadas, and M.L. Scott, 8th International Symposium on High-Performance Computer Architecture, pp. 29-40, February 2002.

Power- and Reliability-Aware Computing

Our research broadly addresses the problems of computer systems' power and reliability, including thermal constraints, energy consumption, inductive noise, soft errors, aging defects, and variations. We have devised a variety of approaches at the processor core and multi-core architectural levels to address these problems.

Dynamic Power Redistribution in Failure Prone CMPs, P. Petrica, J.A. Winter, and D.H. Albonesi, Workshop on Energy Efficient Design, held at the 37th International Symposium on Computer Architecture, June 2010.

The Scalability of Scheduling Algorithms for Unpredictably Heterogeneous CMP Architectures, J.A. Winter and D.H. Albonesi, Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures, held at the 35th International Symposium on Computer Architecture, June 2008.

Scheduling Algorithms for Unpredictably Heterogeneous CMP Architectures, J.A. Winter and D.H. Albonesi, 38th International Conference on Dependable Systems and Networks, June 2008.

Addressing Thermal Non-Uniformity in SMT Workloads, J.A. Winter and D.H. Albonesi, ACM Transactions on Architecture and Code Optimization, 2008.

Synergistic Temperature and Energy Management in GALS Processor Architectures, Y. Zhu and D.H. Albonesi, International Symposium on Low Power Electronics and Design, pp. 55-60, October 2006.

Localized Microarchitecture-Level Voltage Management, Y. Zhu and D.H. Albonesi, International Symposium on Circuits and Systems, pp. 37-40, May 2006.

Power Efficient Error Tolerance in Chip Multi-Processors, M.W. Rashid, E.J. Tan, M.C. Huang, and D.H. Albonesi, IEEE Micro, Special Issue on Reliability-Aware Microarchitectures, Vol. 25, No. 6, pp. 60-70, November/December 2005.

Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance, M.W. Rashid, E.J. Tan, M.C. Huang, and D.H. Albonesi, 14th International Conference on Parallel Architectures and Compilation Techniques, pp. 315-325, September 2005.

An Evaluation of a Configurable VLIW Microarchitecture for Embedded DSP Applications, W. Liu, D.H. Albonesi, J. Gostomski, L. Palum, D. Hinterberger, R. Wanzenried, and M. Indovina, Journal of Circuits, Systems, and Computers, Special Issue on VLSI Architectures for Multimedia Applications, Vol. 13, No. 6, pp. 1321-1345, December 2004.

Mitigating Inductive Noise in SMT Processors, W. El-Essawy and D.H. Albonesi, International Symposium on Low Power Electronics and Design, pp. 332-337, August 2004.

Front-End Policies for Improved Issue Efficiency in SMT Processors, A. El-Moursy and D.H. Albonesi, 9th International Symposium on High-Performance Computer Architecture, pp. 31-40, February 2003.

Managing Static Leakage Energy in Microprocessor Functional Units, S. Dropsho, V. Kursun, D.H. Albonesi, S. Dwarkadas, and E.G. Friedman, 35th International Symposium on Microarchitecture, pp. 321-332, November 2002.

A Microarchitectural-Level Step-Power Analysis Tool, W. El-Essawy, D.H. Albonesi, and B. Sinharoy, International Symposium on Low Power Electronics and Design, pp. 263-266, August 2002.

Clustered Multi-Threaded Microprocessors

In order to extract both instruction-level parallelism (ILP) and thread-level parallelism (TLP) in a multi-threaded processor core, complex hardware resources are required. The potential ramifications of this increased complexity are reduced clock frequency, reduced throughput (due to the need to overpipeline to maintain frequency), increased power dissipation, and difficulty in scaling the design to a new process technology. In a Clustered Multi-Threaded (CMT) microarchitecture, the core is divided into smaller, more scalable clusters, with communication paths introduced between them. Instructions from different threads are assigned to clusters according to a steering algorithm implemented in the front-end of the machine.

Our early work in clustered microarchitectures for single-threaded machines explored dynamically trading off communication and parallelism on an application phase basis. More recently, we have demonstrated that CMT processors with efficient multi-threaded steering mechanisms can achieve almost all of the cycle-level performance of very complex monolithic multi-threaded cores, with a significant reduction in power consumption. We have also shown that a multi-core design built from CMT processor cores is an attractive option for future designs. Our most recent work exploits the built-in steering mechanisms of CMTs for thermal management.
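
To make the notion of a steering algorithm concrete, the sketch below shows one common textbook heuristic: send an instruction to the cluster that already holds most of its source operands, and break ties by picking the least occupied cluster. This is an illustrative heuristic only, not the steering mechanism from the papers below; it ignores thread identity, and the function name, its arguments, and the fixed per-cluster capacity are hypothetical.

```python
def steer(sources, producer_cluster, occupancy, capacity):
    """Choose a cluster index for one instruction, or None if every cluster is full.

    sources:          names of the instruction's source registers
    producer_cluster: maps a register name to the cluster producing its value
    occupancy:        current number of queued instructions in each cluster
    capacity:         maximum number of queued instructions per cluster
    """
    candidates = [c for c in range(len(occupancy)) if occupancy[c] < capacity]
    if not candidates:
        return None  # the front-end would have to stall

    def local_operands(cluster):
        # Operands produced in this cluster need no inter-cluster communication.
        return sum(1 for reg in sources if producer_cluster.get(reg) == cluster)

    # Prefer more local operands, then lower occupancy, then the lower index.
    return min(candidates, key=lambda c: (-local_operands(c), occupancy[c], c))


# Both operands live in cluster 0, which still has room.
print(steer(["r1", "r2"], {"r1": 0, "r2": 0}, occupancy=[3, 1], capacity=4))
```

The heuristic trades communication against load balance in the simplest possible way: locality wins whenever a cluster has room, and occupancy only decides between clusters that are equally local.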

Addressing Thermal Non-Uniformity in SMT Workloads, J.A. Winter and D.H. Albonesi, ACM Transactions on Architecture and Code Optimization, 2008.

Compatible Phase Co-Scheduling on a CMP of Multi-Threaded Processors, A. El-Moursy, R. Garg, D.H. Albonesi, and S. Dwarkadas, 20th International Parallel and Distributed Processing Symposium, April 2006.

Partitioning Multi-Threaded Processors with a Large Number of Threads, A. El-Moursy, R. Garg, D.H. Albonesi, and S. Dwarkadas, International Symposium on Performance Analysis of Systems and Software, pp. 112-123, March 2005.

Dynamically Matching ILP Characteristics Via a Heterogeneous Clustered Microarchitecture, L. Chen, D.H. Albonesi, and S. Dropsho, IBM Watson Conference on the Interaction Between Architecture, Circuits, and Compilers, pp. 136-143, October 2004.

Dynamically Managing the Communication-Parallelism Trade-off in Future Clustered Processors, R. Balasubramonian, S. Dwarkadas, and D.H. Albonesi, 30th International Symposium on Computer Architecture, pp. 275-286, June 2003.

Dynamic Data Dependence Tracking

Many microprocessor optimizations rely on precise information regarding dependences among instructions in the pipeline, yet this information is not readily available. We have developed an efficient mechanism for dynamic data dependence tracking among all the in-flight instructions in the machine, and shown how this information can be used both to significantly improve branch prediction accuracy and to guide the steering mechanism in a heterogeneous clustered microarchitecture.
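
One way to picture dependence tracking among in-flight instructions is a table that keeps, for each register, a bit vector of the in-flight instructions its current value depends on. The sketch below is a software illustration of that idea under our own simplifying assumptions, not a description of the hardware mechanism in the paper below: the class and method names are hypothetical, instruction ids are assumed to be small integers, and bit vectors are modeled as Python integers.

```python
class DependenceTracker:
    """Tracks which in-flight instructions each register value transitively depends on."""

    def __init__(self):
        # register name -> bitmask of in-flight instruction ids (bit i set = depends on i)
        self.reg_deps = {}

    @staticmethod
    def _ids(mask):
        """Convert a bitmask into the set of instruction ids it encodes."""
        return {i for i in range(mask.bit_length()) if (mask >> i) & 1}

    def issue(self, inst_id, dest, sources):
        """Record a new instruction and return the in-flight ids it depends on.

        dest may be None for instructions such as branches that write no register.
        """
        deps = 0
        for reg in sources:
            deps |= self.reg_deps.get(reg, 0)  # union of the sources' dependence chains
        if dest is not None:
            # The new value depends on this instruction and on everything it read.
            self.reg_deps[dest] = deps | (1 << inst_id)
        return self._ids(deps)

    def retire(self, inst_id):
        """Clear a retired instruction from every register's dependence vector."""
        keep = ~(1 << inst_id)
        for reg in self.reg_deps:
            self.reg_deps[reg] &= keep


tracker = DependenceTracker()
tracker.issue(0, "r1", [])              # r1 <- constant
tracker.issue(1, "r2", ["r1"])          # r2 <- f(r1)
print(tracker.issue(2, None, ["r2"]))   # a branch reading r2
```

A consumer such as a branch predictor could then ask, at issue time, which in-flight instructions feed a given branch, which is the kind of information the paragraph above refers to.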

Dynamic Data Dependence Tracking and its Application to Branch Prediction, L. Chen, S. Dropsho, and D.H. Albonesi, 9th International Symposium on High-Performance Computer Architecture, pp. 65-76, February 2003.