Daniel Lo

Photo credit Crystal Lu.

Daniel Lo, Ph.D.
Electrical and Computer Engineering

368 Upson Hall
Ithaca, NY 14853 USA
d-l--5---7----5 at cornell...edu! (erase dashes, extra periods, !)

I graduated with a Ph.D. in 2015. While at Cornell, I worked with Prof. Edward Suh in the Suh Research Group. I am now at Microsoft Research.

CV available upon request.

Research Interests

Computer architecture, embedded real-time systems, hardware security and reliability

Education

Cornell University
M.S./Ph.D. in Electrical & Computer Engineering, 2015

Thesis: Hardware Architectures for Secure, Reliable, and Energy-Efficient Real-Time Systems
Fall 2009: Jacobs Scholar Fellowship

California Institute of Technology
B.S. with Honors in Electrical Engineering, 2009

2008 - 2009: Alcott Scholarship
2007 - 2008: Carnation Scholarship

Publications

DBLP Bibliography

Conference Papers

Daniel Lo, Taejoon Song, and G. Edward Suh.
Prediction-Guided Performance-Energy Trade-off for Interactive Applications. MICRO 2015.

Many modern mobile and desktop applications involve real-time interactions with users. For these interactive applications, tasks must complete in a reasonable amount of time in order to provide a responsive user experience. Conversely, completing a task faster than the limits of human perception does not improve the user experience. Thus, for energy efficiency, tasks should be run just fast enough to meet the response-time requirement instead of wasting energy by running faster. In this paper, we present a predictive DVFS controller that predicts the execution time of a job before it executes in order to appropriately set the DVFS level to just meet user response-time deadlines. Our results show 56% energy savings compared to running tasks at the maximum frequency with almost no deadline misses. This is 27% more energy savings than the default Linux interactive power governor, which also shows 2% deadline misses on average.
Mohamed Ismail, Daniel Lo, and G. Edward Suh.
Improving Worst-Case Cache Performance through Selective Bypassing and Register-Indexed Cache. DAC 2015.

Worst-case execution time (WCET) analysis is a critical part of designing real-time systems that require strict timing guarantees. Data caches have traditionally been challenging to analyze in the context of WCET due to the unpredictability of memory access patterns. In this paper, we present a novel register-indexed cache structure that is designed to be amenable to static analysis. This is based on the idea that absolute addresses may not be known, but by using relative addresses, analysis may be able to guarantee a number of hits in the cache. In addition, we observe that keeping unpredictable memory accesses in caches can increase or decrease WCET depending on the application. Thus, we explore selectively bypassing caches in order to provide lower WCET. Our experimental results show reductions in WCET of up to 35% over the state-of-the-art static analysis.
Daniel Lo, Tao Chen, Mohamed Ismail, and G. Edward Suh.
Run-Time Monitoring with Adjustable Overhead Using Dataflow-Guided Filtering. HPCA 2015.

Recent studies have proposed various parallel run-time monitoring techniques to improve the reliability, security, and debugging capabilities of computer systems. However, these run-time monitors can introduce large performance and energy overheads, especially for flexible systems that support a range of monitors. In this paper, we introduce a hardware dataflow tracking engine that enables adjustable overhead through partial monitoring. This allows a trade-off to be made between monitoring coverage and overhead. This dataflow engine can also be extended to filter out monitoring operations associated with null metadata in order to reduce overhead. Given this architecture, we investigate how the dropping decisions should be made for partial monitoring and show that there exist interesting policy decisions depending on the target application of partial monitoring. Our experimental results show that overhead can be reduced significantly by trading off coverage. For example, for monitoring techniques with average overheads of 2-6x, the proposed architecture is able to reduce overhead to 1.5x while still achieving 14-85% average coverage.
Daniel Lo, Mohamed Ismail, Tao Chen, and G. Edward Suh.
Slack-Aware Opportunistic Monitoring for Real-Time Systems. RTAS 2014.

Recent studies have shown that run-time monitoring is a promising approach for improving the security and reliability of computer systems. In this paper, we present a framework and architecture for applying run-time monitoring to hard real-time systems. In this framework, monitoring is only performed when enough dynamic slack exists in order to ensure that the monitoring does not impact the timing guarantees of tasks. If the slack is insufficient, a dropping operation is run which minimizes the timing impact on the task while ensuring that no false positives occur. We present a novel hardware architecture that can perform this dropping operation in a single cycle, matching the throughput of the task being monitored. Thus, run-time monitoring is able to be applied opportunistically, with no impact on the worst-case execution time of tasks. Our experimental results for three different monitoring techniques verify that timing is never violated and that false positives never occur. In addition, on average, 15-66% of monitoring coverage is achieved with no impact on the worst-case execution times of tasks depending on the monitoring technique. With an FPGA-based monitor, this average coverage of monitoring ranged from 62-86% depending on the monitoring technique.
Daniel Lo and G. Edward Suh.
Worst-Case Execution Time Analysis for Parallel Run-Time Monitoring. DAC 2012.

The increasing safety-critical role of real-time systems requires increased attention to their security and reliability. Several recent studies have shown that parallel run-time monitoring of programs can significantly improve the security and reliability of computing systems. However, these techniques cannot be applied to real-time systems without first estimating their impact on worst-case execution time (WCET). In this paper, we present a method for determining the impact of parallel monitoring on WCET using a mixed integer linear programming (MILP) formulation. We use our method to estimate the WCET for seven benchmark programs and two possible monitoring techniques. This estimate is compared against observed execution times from simulation and an upper bound based on sequential monitoring. The results show that our method estimates a WCET within 71% of worst-case observed execution times and up to 74% lower than the sequential bound.
Daniel Lo, Greg Malysa, and G. Edward Suh.
FlexCache: Field Extensible Cache Controller Architecture Using On-Chip Reconfigurable Fabric. FPL 2011.

In today's microprocessors, the cache architecture is highly optimized for one particular design and cannot be changed after fabrication. While allowing efficient implementations in dedicated logic, this inflexibility also implies that new techniques cannot be deployed in the field. This paper presents FlexCache, a flexible cache architecture that uses on-chip reconfigurable fabric to enable new extensions to be added in the field after fabrication. We evaluate the flexibility and efficiency of the architecture through an RTL prototype implementation of the cache along with example extensions such as cache performance counters, side-channel protection, prefetching, various replacement policies and computation acceleration. The results show that various types of extensions can be realized on FlexCache with minimal impact on performance, power, and area.
Daniel Y. Deng, Daniel Lo, Greg Malysa, Skyler Schneider, and G. Edward Suh.
Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric. MICRO 2010.

This paper proposes FlexCore, a hybrid processor architecture where an on-chip reconfigurable fabric (FPGA) is tightly coupled with the main processing core. FlexCore provides an efficient platform that can support a broad range of run-time monitoring and bookkeeping techniques. Unlike using custom hardware, which is more efficient but often extremely difficult and expensive to incorporate into a modern microprocessor, the FlexCore architecture allows parallel monitoring and bookkeeping functions to be dynamically added to the processing core and adapt to application needs even after the chip has been fabricated. At the same time, FlexCore is far more efficient than software implementations because its fine-grained reconfigurable architecture closely matches bit-level operations of typical monitoring schemes and allows monitoring schemes to operate in parallel to the monitored core. In fact, our experimental results show that monitoring on FlexCore can almost match the performance of full ASIC implementations. To evaluate the FlexCore architecture, we implemented an RTL prototype along with several extensions including uninitialized memory read checking, dynamic information flow tracking, array bound checking, and soft error checking. The prototypes demonstrate that the architecture can support a range of monitoring extensions with different characteristics in an efficient manner. FlexCore takes moderate silicon area and results in far better performance and energy efficiency than software.

Posters

Skyler Schneider, Daniel Y. Deng, Daniel Lo, Greg Malysa, and G. Edward Suh.
Implementing Dynamic Information Flow Tracking on Microprocessors with Integrated FPGA Fabric. FPGA 2010.

Undergraduate research in the Lester Lab, where I wrote image-processing software for a microscope then under development.

Lawrence A. Wade, Daniel Lo, Scott Fraser, and Henry Lester.
Imaging the Microorganization of Synaptic Receptors. Biophysical Society 51st Annual Meeting, 2007.

The documents listed above are posted as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
IEEE Copyright: © Copyright 2010-2015 by IEEE
ACM Copyright: © Copyright 2010-2015 by ACM, Inc.

Professional Experience

Intel Corporation, Security Research Lab, Graduate Technical Intern, May-August 2012, Hillsboro, OR
Applied Minds Inc., Intern, March-June 2009, Glendale, CA
Applied Minds Inc., Intern, June-September 2008, Glendale, CA
Jet Propulsion Laboratory, DARTS Lab, Summer Undergraduate Researcher, June-August 2007, Pasadena, CA

Teaching Experience

ECE 2300 - Introduction to Digital Logic Design, Course Assistant, Fall 2014, Cornell
ECE 5750 - Advanced Computer Architecture, Course Assistant, Spring 2013, Cornell
ECE 5750 - Advanced Computer Architecture, Course Assistant, Spring 2012, Cornell
ECE 2300 - Introduction to Digital Logic Design, Head Teaching Assistant, Fall 2010, Cornell
EE/CS 52 - Principles of Microprocessor Systems, Teaching Assistant, Spring 2008, Caltech

Other

External Reviewer for DAC 2013 - 2015

Computer Systems Laboratory