Photo credit Crystal Lu.
368 Upson Hall
Ithaca, NY 14853 USA
d-l--5---7----5 at cornell...edu! (erase dashes, extra periods, !)
I graduated with a Ph.D. in 2015. While at Cornell, I worked with Prof. Edward Suh in the Suh Research Group.
I am now at Microsoft Research.
CV available upon request.
Research Interests
Computer architecture, embedded real-time systems, hardware security and reliability
Education
Cornell University
M.S./Ph.D. in Electrical & Computer Engineering, 2015
California Institute of Technology
B.S. with Honors in Electrical Engineering, 2009
- 2008 - 2009: Alcott Scholarship
- 2007 - 2008: Carnation Scholarship
Publications
DBLP Bibliography
Conference Papers
- Daniel Lo, Taejoon Song, and G. Edward Suh.
Prediction-Guided Performance-Energy Trade-off for Interactive Applications.
MICRO 2015.
[ACM]
[PDF]
[Slides]
[Poster]
[Abstract]
Many modern mobile and desktop applications involve real-time
interactions with users. For these interactive applications, tasks
must complete in a reasonable amount of time in order to provide a
responsive user experience. Conversely, completing a task faster than
the limits of human perception does not improve the user experience.
Thus, for energy efficiency, tasks should be run just fast enough to
meet the response-time requirement instead of wasting energy by
running faster. In this paper, we present a predictive DVFS
controller that predicts the execution time of a job before it
executes in order to appropriately set the DVFS level to just meet
user response-time deadlines. Our results show 56% energy savings
compared to running tasks at the maximum frequency with almost no
deadline misses. This is 27% more energy savings than the default
Linux interactive power governor, which also shows 2% deadline misses
on average.
- Mohamed Ismail, Daniel Lo, and G. Edward Suh.
Improving Worst-Case Cache Performance through Selective Bypassing and Register-Indexed Cache.
DAC 2015.
[ACM]
[PDF]
[Abstract]
Worst-case execution time (WCET) analysis is a critical part of
designing real-time systems that require strict timing guarantees. Data
caches have traditionally been challenging to analyze in the context of
WCET due to the unpredictability of memory access patterns. In this
paper, we present a novel register-indexed cache structure that is
designed to be amenable to static analysis. This is based on the idea
that absolute addresses may not be known, but by using relative
addresses, analysis may be able to guarantee a number of hits in the
cache. In addition, we observe that keeping unpredictable memory
accesses in caches can increase or decrease WCET depending on the
application. Thus, we explore selectively bypassing caches in order to
provide lower WCET. Our experimental results show reductions in WCET of
up to 35% over the state-of-the-art static analysis.
- Daniel Lo, Tao Chen, Mohamed Ismail, and G. Edward Suh.
Run-Time Monitoring with Adjustable Overhead Using Dataflow-Guided Filtering.
HPCA 2015.
[IEEE]
[PDF]
[Abstract]
Recent studies have proposed various parallel run-time monitoring
techniques to improve the reliability, security, and debugging
capabilities of computer systems. However, these run-time monitors can
introduce large performance and energy overheads, especially for
flexible systems that support a range of monitors. In this paper, we
introduce a hardware dataflow tracking engine that enables adjustable
overhead through partial monitoring. This allows a trade-off to be made
between monitoring coverage and overhead. This dataflow engine can also
be extended to filter out monitoring operations associated with null
metadata in order to reduce overhead. Given this architecture, we
investigate how the dropping decisions should be made for partial
monitoring and show that there exist interesting policy decisions
depending on the target application of partial monitoring. Our
experimental results show that overhead can be reduced significantly by
trading off coverage. For example, for monitoring techniques with
average overheads of 2-6x, the proposed architecture is able to reduce
overhead to 1.5x while still achieving 14-85% average coverage.
- Daniel Lo, Mohamed Ismail, Tao Chen, and G. Edward Suh.
Slack-Aware Opportunistic Monitoring for Real-Time Systems.
RTAS 2014.
[IEEE]
[PDF]
[Abstract]
Recent studies have shown that run-time monitoring is a promising
approach for improving the security and reliability of computer systems.
In this paper, we present a framework and architecture for applying
run-time monitoring to hard real-time systems. In this framework,
monitoring is only performed when enough dynamic slack exists in order to
ensure that the monitoring does not impact the timing guarantees of
tasks. If the slack is insufficient, a dropping operation is run which
minimizes the timing impact on the task while ensuring that no false
positives occur. We present a novel hardware architecture that can
perform this dropping operation in a single cycle, matching the
throughput of the task being monitored. Thus, run-time monitoring is able
to be applied opportunistically, with no impact on the worst-case
execution time of tasks. Our experimental results for three different
monitoring techniques verify that timing is never violated and that false
positives never occur. In addition, on average, 15-66% of monitoring
coverage is achieved with no impact on the worst-case execution times of
tasks depending on the monitoring technique. With an FPGA-based monitor,
this average coverage of monitoring ranged from 62-86% depending on the
monitoring technique.
- Daniel Lo and G. Edward Suh.
Worst-Case Execution Time Analysis for Parallel Run-Time Monitoring.
DAC 2012.
[IEEE]
[ACM]
[PDF]
[Abstract]
The increasing safety-critical role of real-time systems requires
increased attention to their security and reliability. Several recent
studies have shown that parallel run-time monitoring of programs can
significantly improve the security and reliability of computing systems.
However, these techniques cannot be applied to real-time systems without
first estimating their impact on worst-case execution time (WCET). In
this paper, we present a method for determining the impact of parallel
monitoring on WCET using a mixed integer linear programming (MILP)
formulation. We use our method to estimate the WCET for seven benchmark
programs and two possible monitoring techniques. This estimate is
compared against observed execution times from simulation and an upper
bound based on sequential monitoring. The results show that our method
estimates a WCET within 71% of worst-case observed execution times and
up to 74% lower than the sequential bound.
- Daniel Lo, Greg Malysa, and G. Edward Suh.
FlexCache: Field Extensible Cache Controller Architecture Using On-Chip Reconfigurable Fabric.
FPL 2011.
[IEEE]
[ACM]
[PDF]
[Abstract]
In today's microprocessors, the cache architecture is highly optimized
for one particular design and cannot be changed after fabrication. While
allowing efficient implementations in dedicated logic, this inflexibility
also implies that new techniques cannot be deployed in the field. This
paper presents FlexCache, a flexible cache architecture that uses on-chip
reconfigurable fabric to enable new extensions to be added in the field
after fabrication. We evaluate the flexibility and efficiency of the
architecture through an RTL prototype implementation of the cache along
with example extensions such as cache performance counters, side-channel
protection, prefetching, various replacement policies and computation
acceleration. The results show that various types of extensions can be
realized on FlexCache with minimal impact on performance, power, and
area.
- Daniel Y. Deng, Daniel Lo, Greg Malysa, Skyler Schneider, and G. Edward Suh.
Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric.
MICRO 2010.
[IEEE]
[ACM]
[PDF]
[Abstract]
This paper proposes FlexCore, a hybrid processor architecture where an
on-chip reconfigurable fabric (FPGA) is tightly coupled with the main
processing core. FlexCore provides an efficient platform that can
support a broad range of run-time monitoring and bookkeeping
techniques. Unlike using custom hardware, which is more efficient but
often extremely difficult and expensive to incorporate into a modern
microprocessor, the FlexCore architecture allows parallel monitoring
and bookkeeping functions to be dynamically added to the processing
core and adapt to application needs even after the chip has been
fabricated. At the same time, FlexCore is far more efficient than
software implementations because its fine-grained reconfigurable
architecture closely matches bit-level operations of typical monitoring
schemes and allows monitoring schemes to operate in parallel to the
monitored core. In fact, our experimental results show that monitoring
on FlexCore can almost match the performance of full ASIC
implementations. To evaluate the FlexCore architecture, we implemented
an RTL prototype along with several extensions including uninitialized
memory read checking, dynamic information flow tracking, array bound
checking, and soft error checking. The prototypes demonstrate that the
architecture can support a range of monitoring extensions with
different characteristics in an efficient manner. FlexCore takes
moderate silicon area and results in far better performance and energy
efficiency than software.
Posters
- Skyler Schneider, Daniel Y. Deng, Daniel Lo, Greg Malysa, and G. Edward Suh.
Implementing Dynamic Information Flow Tracking on Microprocessors with Integrated FPGA Fabric.
FPGA 2010.
Undergraduate research experience in the Lester Lab. I wrote image processing software for a new microscope that was being developed.
- Lawrence A. Wade, Daniel Lo, Scott Fraser, and Henry Lester.
Imaging the Microorganization of Synaptic Receptors.
Biophysical Society 51st Annual Meeting, 2007.
The documents listed above are posted as a means to ensure timely dissemination
of scholarly and technical work on a non-commercial basis. Copyright and all
rights therein are maintained by the authors or by other copyright holders,
notwithstanding that they have offered their works here electronically. It is
understood that all persons copying this information will adhere to the terms
and constraints invoked by each author's copyright. These works may not be
reposted without the explicit permission of the copyright holder.
IEEE Copyright: © Copyright 2010-2015 by IEEE
ACM Copyright: © Copyright 2010-2015 by ACM, Inc.
Professional Experience
- Intel Corporation, Security Research Lab, Graduate Technical Intern, May-August 2012, Hillsboro, OR
- Applied Minds Inc., Intern, March-June 2009, Glendale, CA
- Applied Minds Inc., Intern, June-September 2008, Glendale, CA
- Jet Propulsion Laboratory, DARTS Lab, Summer Undergraduate Researcher, June-August 2007, Pasadena, CA
Teaching Experience
- ECE 2300 - Introduction to Digital Logic Design, Course Assistant, Fall 2014, Cornell
- ECE 5750 - Advanced Computer Architecture, Course Assistant, Spring 2013, Cornell
- ECE 5750 - Advanced Computer Architecture, Course Assistant, Spring 2012, Cornell
- ECE 2300 - Introduction to Digital Logic Design, Head Teaching Assistant, Fall 2010, Cornell
- EE/CS 52 - Principles of Microprocessor Systems, Teaching Assistant, Spring 2008, Caltech
Other
- External Reviewer for DAC 2013 - 2015