Cornell University

Yao Wang
王瑶

Ph.D. Candidate, supervised by Prof. G. Edward Suh
Computer Systems Laboratory (CSL)
Electrical and Computer Engineering, Cornell University

Email:	y .a .o at csl...cornell...edu! (erase space, extra dot, !)
Phone:	+1 (607) 379-4431
Office address:	365C Upson Hall, Cornell University

About Me

I received my Ph.D. in October 2016 and now work at a startup called Waltz Networks. During my Ph.D. studies, I worked with Prof. G. Edward Suh. Before joining Cornell, I received my bachelor's degree in Electronic Engineering from Tsinghua University, Beijing, China, in 2010.

Google scholar website

Thesis

Efficient and Verifiable Timing Channel Protection for Multi-Core Processors

Research Interests

Hardware designs for timing channel protection

On-Chip Networks (NOCS12')
Cache (DAC16')
Main Memory (HPCA14', HPCA16', HPCA17')

Information flow control

SecVerilog(ASPLOS15')

Publications

Secure Dynamic Memory Scheduling against Timing Channel Attacks
[PDF][Slides]
Yao Wang, Benjamin Wu, G. Edward Suh.
HPCA17', Austin, Texas, Feb 2017

SecDCP: Secure Dynamic Cache Partitioning for Efficient Timing Channel Protection
[PDF][Slides][Poster]
Yao Wang, Andrew Ferraiuolo, Danfeng Zhang, Andrew C. Myers, G. Edward Suh.
DAC16', Austin, Texas, June 2016

Lattice Priority Scheduling: Low-Overhead Timing Channel Protection for a Shared Memory Controller
[PDF]
Andrew Ferraiuolo, Yao Wang, Danfeng Zhang, Andrew C. Myers, G. Edward Suh.
HPCA16', Barcelona, Spain, March 2016

A Hardware Design Language for Timing-Sensitive Information-Flow Security
[PDF]
Danfeng Zhang, Yao Wang, G. Edward Suh, Andrew C. Myers.
ASPLOS15', Istanbul, Turkey, March 2015

Timing Channel Protection for a Shared Memory Controller
[PDF][Slides]
Yao Wang, Andrew Ferraiuolo, G. Edward Suh.
HPCA14', Orlando, Florida, February 2014

Quadrisection-Based Task Mapping on Many-Core Processors for Energy-Efficient On-Chip Communication
[PDF][Slides]
Nithin Michael, Yao Wang, G. Edward Suh, Ao Tang.
NOCS13', Tempe, Arizona, April 2013

Efficient Timing Channel Protection for On-chip Networks
[PDF][Slides]
Yao Wang, G. Edward Suh.
NOCS12', Lyngby, Denmark, May 2012

Performance Evaluation of On-Chip Sensor Network (SENoC)
[PDF][Slides]
Yao Wang, Yu Wang, Jiang Xu, Huazhong Yang.
ICGCS10', Shanghai, China, June 2010

Projects

Full System Timing Channel Protection

With the emergence of new computing platforms such as cloud computing, hardware resources are extensively shared among processes, programs, and users. This dynamic sharing introduces interference between different security domains, which becomes a vector for timing channels. Any hardware resource that is shared among different security domains is at risk of timing channel attacks. These resources include, but are not limited to, core pipelines, branch predictors, caches, on-chip networks, and memory.

In this project, we are working on an efficient scheme to provide full-system protection against timing channels in shared hardware resources throughout the system...

A Hardware Design Language for Timing-Sensitive Information-Flow Security (ASPLOS15')

Information flow control is a promising method for building future applications and computing systems with strong security. However, information flows, especially timing channel leakage, are extremely difficult to control using pure software approaches, because the timing behavior of a program relies on the underlying hardware.

In this project, we combine software and hardware approaches to provide strong information flow control. At the software level, we add annotations to the program to track information flow. These annotations are then communicated to the hardware, which enforces the information flow control through its implementation. To verify the security of the implemented hardware, we extended Verilog with annotations that support comprehensive, precise reasoning about information flows at the hardware level...

Timing Channel Protection for a Shared Memory Controller (HPCA14')

Modern computing systems are increasingly vulnerable to timing channel attacks due to interference caused by extensive hardware resource sharing between programs, threads, and virtual machines. Previous research has proposed hardware techniques that address timing channels through shared resources such as caches and on-chip networks. However, no prior hardware technique has examined the timing channel through a shared memory controller.

In this project, we demonstrated the existence of timing channels through a shared memory controller. Our solution, Temporal Partitioning, addresses this problem using three mechanisms:

Per Security Domain (SD) based queuing structure
Time Division Multiplexing (TDM) based scheduling algorithm
Dead time to drain in-flight transactions
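
The three mechanisms above can be illustrated with a small Python sketch. This is a simplified toy model, not the design from the paper: the class name, the parameters `turn_length` and `dead_time`, and the one-request-per-cycle model are all assumptions made for illustration. Each security domain has its own queue, the controller serves domains in fixed round-robin turns, and no new request is issued during the last `dead_time` cycles of a turn so that in-flight transactions can drain before the next domain takes over.

```python
from collections import deque


class TemporalPartitioningScheduler:
    """Toy TDM memory scheduler (hypothetical model, not the paper's RTL)."""

    def __init__(self, num_domains, turn_length, dead_time):
        if not 0 <= dead_time < turn_length:
            raise ValueError("dead_time must be in [0, turn_length)")
        # One private queue per security domain (SD).
        self.queues = [deque() for _ in range(num_domains)]
        self.turn_length = turn_length
        self.dead_time = dead_time

    def enqueue(self, domain, request):
        self.queues[domain].append(request)

    def owner(self, cycle):
        """Domain that owns the turn containing `cycle` (fixed TDM order)."""
        return (cycle // self.turn_length) % len(self.queues)

    def issue(self, cycle):
        """Return the request issued at `cycle`, or None if nothing is issued."""
        offset = cycle % self.turn_length
        if offset >= self.turn_length - self.dead_time:
            # Dead time: issue nothing so in-flight transactions can drain.
            return None
        queue = self.queues[self.owner(cycle)]
        # An idle slot is never given to another domain.
        return queue.popleft() if queue else None
```

In this model, the cycle at which a domain's request is issued depends only on the cycle count and on that domain's own queue, so the traffic of other domains cannot change it. The cost is that unused slots are wasted.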

Energy-Efficient Task Mapping Algorithm for Many-Core Processors (NOCS13')

Many-core platforms with an on-chip interconnect network, sometimes referred to as a Network-on-Chip (NoC), have emerged as a likely candidate for future microprocessor architectures. To use the large number of cores efficiently, however, application tasks need to be carefully mapped to processing cores (network nodes). In this project, we propose applying Quadrisection, a heuristic approach for VLSI layout, to the task mapping problem for two-dimensional on-chip networks such as meshes.

Efficient Timing Channel Protection for On-Chip Networks (NOCS12')

On-chip networks are often dynamically shared among applications that run concurrently on a chip multiprocessor (CMP). In general, such sharing implies that applications can affect each other’s timing characteristics through interference in the shared resources. For example, in on-chip networks, multiple flows can compete for links and buffers. We show that this interference is an attack vector through which a malicious application may be able to infer data-dependent information about other applications (side channel attacks), or through which two applications can exchange information covertly when direct communication is prohibited (covert channel attacks).

To prevent these timing channel attacks, approaches such as Spatial Network Partitioning (SNP) or Temporal Network Partitioning (TNP) can be used, but they use bandwidth very inefficiently. We propose an efficient scheme, called Reversed Priority with Static Limits (RPSL), which exploits the fact that protection is only needed in one direction, from the high security level to the low security level, by:

Giving high priority to packets from low security level in router arbitration
Setting a static limit on bandwidth utilization for low security level
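
The two rules above can be sketched as a toy two-level arbiter in Python. This is an illustrative model only: the class name, the `window` and `low_limit` parameters, and the per-window quota used to enforce the static limit are assumptions, not details taken from the paper.

```python
class RpslArbiter:
    """Toy router arbiter with reversed priority and a static low-level limit."""

    def __init__(self, window, low_limit):
        if not 0 <= low_limit <= window:
            raise ValueError("low_limit must be in [0, window]")
        self.window = window        # length of the accounting window in cycles
        self.low_limit = low_limit  # max low-security grants per window (static)
        self.current_window = None
        self.low_used = 0

    def grant(self, cycle, low_waiting, high_waiting):
        """Return "low", "high", or None for the output port at `cycle`."""
        window_index = cycle // self.window
        if window_index != self.current_window:
            # New window: reset the low-security quota.
            self.current_window = window_index
            self.low_used = 0
        # Reversed priority: low-security packets win while quota remains.
        # This decision never looks at high_waiting.
        if low_waiting and self.low_used < self.low_limit:
            self.low_used += 1
            return "low"
        # High-security packets use whatever bandwidth is left over.
        if high_waiting:
            return "high"
        return None
```

Because the decision to grant a low-security packet never reads `high_waiting`, the timing seen by the low level is independent of high-level traffic in this model. The static limit keeps the low level from starving the high level, and bandwidth that the low level leaves unused remains available to high-level packets rather than being wasted.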

Routing Algorithm to Reduce Static Power of On-Chip Networks

Technology scaling indicates that future microprocessors will be power limited, so that only a portion of the silicon can be turned on at any time. Recent studies have therefore proposed specialized and adaptive cores for future multi-core systems. However, the on-chip networks for these systems have been designed assuming that all nodes are simultaneously active.

In this project, we investigate activating only a subset of the routers in the network. Specifically, we propose an oblivious routing algorithm, called "BackTrack", that maximizes the number of unused routers, which can then be turned off to save static power. The basic idea is to have the flows in both directions between each pair of network nodes use the same set of routers. We prove that this routing algorithm is deadlock-free and show that it reduces power usage with little to no performance degradation on the network.
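
The basic idea of sharing routers between the two directions of a flow can be illustrated with a short Python sketch. This is not the BackTrack algorithm and says nothing about deadlock freedom; it only compares, under assumed dimension-ordered (XY) routing on a 2D mesh, how many routers stay active when the return flow uses its own XY route versus when it retraces the forward route. The function names are hypothetical.

```python
def xy_route(src, dst):
    """Routers visited by X-then-Y routing from src to dst on a 2D mesh."""
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def active_routers(pairs, symmetric):
    """Set of routers used by bidirectional traffic between node pairs.

    symmetric=True: the return flow retraces the forward route.
    symmetric=False: the return flow uses its own XY route.
    """
    used = set()
    for a, b in pairs:
        forward = xy_route(a, b)
        backward = list(reversed(forward)) if symmetric else xy_route(b, a)
        used.update(forward)
        used.update(backward)
    return used
```

For a single pair of nodes at opposite corners of a 3x3 mesh, `active_routers([((0, 0), (2, 2))], symmetric=False)` touches routers along two different edges of the mesh, while `symmetric=True` touches only the routers on the forward path, leaving more routers idle and available to be powered off.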

Performance Evaluation of On-Chip Sensor Network (SENoC) in MPSoC (ICGCS10')

As technology scales, more processing units (PUs) are integrated into multiprocessor systems-on-chip (MPSoCs) to achieve higher performance. Because of the greater variation that results from shrinking feature sizes, and the need for lower power consumption, on-chip monitoring of environmental information such as temperature, voltage, and frequency is becoming increasingly important. To address this need, sensors are integrated into the network-on-chip (NoC) to perform system monitoring. However, sensors that transfer their data through the NoC compete with PUs for the limited bandwidth, which delays communication between PUs.

To evaluate the sensors' overhead on regular data traffic, we implement a VC-based NoC in which the sensor data are transferred together with the regular data. We measure the average delay of regular data and of sensor data separately, and compare the results with those of a NoC without sensors. The results show that the overhead of the sensors is negligible.

Academic Awards

Cornell University Fellowship (Sep 2010)
Outstanding Graduate Award, Tsinghua University (May 2010)
Guanghua Scholarship, Tsinghua University (May 2009)
HuangqianXiang Scholarship, Tsinghua University (May 2008)
National Scholarship, Tsinghua University (May 2007)

Professional Activities

External reviewer for HPCA2014, CCS2014
Teaching Assistant for ENGRD 2300 (Fall 2012)
Registration assistant for ISCA2012

Courses

Fall 2015	CS5110	Programming Languages and Logics
Spring 2014	CS5412	Cloud Computing
Fall 2013	CS5120	Introduction to Compilers
Spring 2013	ECE5430	System Security
Spring 2013	ECE5750	Advanced Microprocessor Architecture
Fall 2011	ECE5720	Parallel Computer Architecture
Fall 2011	CS4410	Operating Systems
Spring 2011	ECE5730	Memory Systems
Spring 2011	ECE4740	Digital VLSI
Fall 2010	ECE4750	Computer Architecture
Fall 2010	CS5722	Heuristic Methods for Optimization
