Cornell University
School of Electrical and Computer Engineering
ECE 4750 Computer Architecture
Fall 2016
Prof. Christopher Batten
255 Olin Hall • Monday and Wednesday • 2:55–4:10pm
VideoNote
Professionally indexed videos of each lecture are available through the VideoNote service to those students enrolled in the course. The videos are usually posted within a day or two. You can find the complete collection of videos by going to the following URL:
Direct Video Links
Direct links to each video are also included below. The videos are often on VideoNote before I add them to this list, so be sure to check VideoNote if the link is not included below.
Wednesday, August 24
- Course Overview
Monday, August 29
- T01: Fundamental Processor Concepts
- 1. Instruction Set Architecture
- 1.1. IBM 360 Instruction Set Architecture
- 1.2. MIPS32 Instruction Set Architecture
- 1.3. Tiny RISC-V Instruction Set Architecture
- 2. Processor Functional-Level Model
- 2.1. Transactions and Steps
Wednesday, August 31
- T01: Fundamental Processor Concepts
- 2. Processor Functional-Level Model
- 2.2. Simple Assembly Example
- 2.3. TinyRV1 Vector-Vector Assembly and C Program
- 2.4. TinyRV1 Mystery Assembly and C Program
- 3. Processor/Laundry Analogy
- 3.1. Architecture vs. Microarchitecture vs. VLSI Implementation
- 3.2. Processor Microarchitectural Design Patterns
- 3.3. Transaction Diagrams
- 4. Analyzing Processor Performance
Wednesday, September 7
- T02: Fundamental Processor Microarchitecture
- 1. Processor Microarchitectural Design Patterns
- 1.1. Transactions and Steps
- 1.2. Microarchitecture: Control/Datapath Split
- 2. TinyRV1 Single-Cycle Processors
- 2.1. High-Level Idea for Single-Cycle Processors
- 2.2. Single-Cycle Processor Datapath
- 2.3. Single-Cycle Processor Control Unit
- 2.4. Analyzing Performance
Monday, September 12
- T02: Fundamental Processor Microarchitecture
- 3. TinyRV1 FSM Processor
- 3.1. High-Level Idea for FSM Processors
- 3.2. FSM Processor Datapath
- 3.3. FSM Processor Control Unit
- 3.4. Analyzing Performance
Wednesday, September 14
- T02: Fundamental Processor Microarchitecture
- 4. TinyRV1 Pipelined Processor
- 4.1. High-Level Idea for Pipelined Processors
- 4.2. Pipelined Processor Datapath and Control Unit
- 5. Pipeline Hazards: RAW Data Hazards
- 5.1. Software Scheduling
- 5.2. Hardware Stalling
Monday, September 19
- T02: Fundamental Processor Microarchitecture
- 5. Pipeline Hazards: RAW Data Hazards
- 5.2. Hardware Stalling
- 5.3. Hardware Bypassing
- 5.4. RAW Data Hazards Through Memory
- 6. Pipeline Hazards: Control Hazards
- 6.1. Software Scheduling
- 6.2. Hardware Speculation
Wednesday, September 21
- T02: Fundamental Processor Microarchitecture
- 6. Pipeline Hazards: Control Hazards
- 6.1. Software Scheduling
- 6.2. Hardware Speculation
- 7. Pipeline Hazards: Structural Hazards
- 7.1. Software Scheduling
- 7.2. Hardware Stalling
- 7.3. Hardware Duplication
Monday, September 26
- T02: Fundamental Processor Microarchitecture
- 8. Pipeline Hazards: WAR and WAW Name Hazards
- 8.1. Software Renaming
- 8.2. Hardware Stalling
- 9. Summary of Processor Performance
- 10. Case Study: Transition from CISC to RISC
- T03: Fundamental Memory Concepts
- 1. Memory/Library Analogy
- 1.1. Three Example Scenarios
Tuesday, September 27
- T03: Fundamental Memory Concepts
- 1. Memory/Library Analogy
- 1.2. Memory Technology
- 1.3. Cache Memories in Computer Architecture
- 2. Cache Memory Concepts
- 2.1. Single-Line Caches
- 2.2. Multi-Line Caches
- 2.3. Replacement Policies
Monday, October 3
- T03: Fundamental Memory Concepts
- 2. Cache Memory Concepts
- 2.4. Write Policies
- 2.5. Categorizing Misses
- 3. Memory Translation, Protection, and Virtualization
- 3.1. Memory Translation
- 3.2. Memory Protection
- 3.3. Memory Virtualization
Wednesday, October 5
- T03: Fundamental Memory Concepts
- 4. Analyzing Memory Performance
- T04: Fundamental Memory Microarchitecture
- 1. Memory Microarchitectural Design Patterns
- 1.1. Transactions and Steps
- 1.2. Microarchitecture Overview
- 2. FSM Cache
- 2.1. High-Level Idea for FSM Cache
- 2.2. FSM Cache Datapath
- 2.3. FSM Cache Control Unit
- 2.4. Analyzing Performance
Wednesday, October 12
- T04: Fundamental Memory Microarchitecture
- 3.2. Pipelined Cache Datapath and Control Unit
- 3.3. Analyzing Performance
- 3.4. Pipelined Cache with TLB
- 4. Cache Microarchitecture Optimizations
- 4.1. Reduce Hit Latency
- 4.2. Reduce Miss Rate
- 4.3. Reduce Miss Penalty
- 4.4. Cache Optimization Summary
- 5. Case Study: ARM Cortex A8 and Intel Core i7
- 5.1. ARM Cortex A8
- 5.2. Intel Core i7
Monday, October 17
- T05: Integrating Processors and Memories
- 1. Processor and L1 Cache Interface
- 2. Analyzing Processor + Cache Performance
- 3. Case Study: MIPS R4000
- T06: Fundamental Network Concepts
- 1. Network/Roadway Analogy
- 1.1. Running Errands
- 1.2. Network Technology
- 1.3. Networks in Computer Architecture
- 2. Network Topology
- 2.1. Single-Stage Bus Topology
- 2.2. Single-Stage Crossbar Topology
- 2.3. Multi-Stage Butterfly Topology
Wednesday, October 19
- T06: Fundamental Network Concepts
- 2. Network Topology
- 2.4. Multi-Stage Torus Topology
- 3. Network Routing
- 3.1. Oblivious Deterministic Routing
- 3.2. Oblivious Non-Deterministic Routing
- 3.3. Adaptive Routing
- 3.4. Deadlock
- 4. Analyzing Network Performance
- 4.1. Traffic Patterns
- 4.2. Ideal Throughput
Monday, October 24
- T06: Fundamental Network Concepts
- 4.3. Zero-Load Latency
- 4.4. Comparing Topologies
- 4.5. Comparing Routing Algorithms
Wednesday, October 26
- T07: Fundamental Network Microarchitecture
- 1. Buffer Microarchitecture
- 1.1. Normal Queues
- 1.2. Pipe Queues
- 1.3. Bypass Queues
- 1.4. Composing Queues
- 2. Channel Microarchitecture
- 2.1. On-Off Flow-Control
Monday, October 31
- T07: Fundamental Network Microarchitecture
- 2.2. Elastic Buffer Flow-Control
- 2.3. Store-and-Forward Flow-Control
- 2.4. Virtual-Cut-Through Flow-Control
- 3. Router Microarchitecture
- 3.1. Pipelined Router
- 3.2. Arbitration
- T09: Advanced Processors – Superscalar Execution
- 1. In-Order Dual-Issue Superscalar TinyRV1 Processor
- 2. Superscalar Pipeline Hazards
- 2.1. RAW Hazards
Wednesday, November 2
- T09: Advanced Processors – Superscalar Execution
- 2.2. Control Hazards
- 2.3. Structural Hazards
- 2.4. WAW and WAR Name Hazards
- 3. Analyzing Performance of Superscalar Processors
- T10: Advanced Processors – Out-of-Order Execution
- 1. Incremental Approach to Exploring OOO Execution
- 2. I3L: IO Front-End/Issue/Completion, Late Commit
- 3. I2OE: IO Front-End/Issue, OOO Completion, Early Commit
Monday, November 7
- T10: Advanced Processors – Out-of-Order Execution
- 4. I2OL: IO Front-End/Issue, OOO Completion, Late Commit
- 5. IO2E: IO Front-End, OOO Issue/Completion, Early Commit
Wednesday, November 9
- T10: Advanced Processors – Out-of-Order Execution
- 6. IO2L: IO Front-End, OOO Issue/Completion, Late Commit
- T11: Advanced Processors – Register Renaming
- 1. WAW and WAR Hazards
- 2. IO2L Pointer-Based Register Renaming Scheme
Monday, November 14
- T11: Advanced Processors – Register Renaming
- 2. IO2L Value-Based Register Renaming Scheme
- T12: Advanced Processors – Memory Disambiguation
- 1. Adding Memory Instructions to an OOO Processor
- 2. In-Order Load/Store Issue with Unified Stores
Wednesday, November 16
- T12: Advanced Processors – Memory Disambiguation
- 3. In-Order Load/Store Issue with Split Store
- 4. Out-of-Order Load/Store Issue
- T13: Advanced Processors – Branch Prediction
- 1. Branch Prediction Overview
- 3. Hardware-Based Branch Prediction
- 3.1. Fixed Prediction
- 3.2. Branch History Table (BHT) Predictor
Monday, November 21
- T13: Advanced Processors – Branch Prediction
- 3. Hardware-Based Branch Prediction
- 3.2. Branch History Table (BHT) Predictor
- 3.3. Two-Level Predictor for Temporal Correlation
- 3.4. Two-Level Predictor for Spatial Correlation
- 3.5. Generalized Two-Level Predictors
- 3.6. Tournament Predictors
- 3.7. Branch Target Buffer (BTB) Predictor
- T14: Advanced Processors – Speculative Execution
- 1. Speculative Execution with Late Recovery
- 2. Speculative Execution with Early Recovery
- 2.1. Adding Speculative Bits
Monday, November 28
- T14: Advanced Processors – Speculative Execution
- 1. Speculative Execution with Late Recovery
- 2. Speculative Execution with Early Recovery
- 2.1. Adding Speculative Bits
- 2.2. Adding Rename-Table Snapshots
- 3. Complete Out-of-Order Superscalar TinyRV2 Processor
- T13: Advanced Processors – Branch Prediction
- 2. Software-Based Branch Prediction
- 2.1. Static Software Hints
- 2.2. Branch Delay Slots
- 2.3. Predication
- T15: Advanced Processors – VLIW Processors
- 1. Motivating VLIW Processors
Wednesday, November 30
- T15: Advanced Processors – VLIW Processors
- 2. TinyRV1 VLIW Processor
- 3. VLIW Compilation Techniques
- 3.1. Loop Unrolling
- 3.2. Software Pipelining
- 3.3. Loop Unrolling and Software Pipelining
- 3.4. Other Compiler Techniques
Addendum
- T08: Integrating Processor, Memories, and Networks
- 1. Mem+Net: Banked Memory Systems
- 2. Proc+Net: Message-Passing Systems
- 3. Proc+Mem+Net: Shared-Memory Systems
- 4. Memory Synchronization, Consistency, and Coherence
- 4.1. Memory Synchronization
- 4.2. Memory Consistency
- 4.3. Memory Coherence