ECE 4750: Computer Architecture

VideoNote

Professionally indexed videos of each lecture are available through the VideoNote service to students enrolled in the course. The videos are usually posted within a day or two of each lecture. You can find the complete collection of videos at the following URL:

http://cornell.videonote.com/channels/1005/videos

Direct Video Links

Direct links to each video are also included below. The videos are often on VideoNote before I add them to this list, so be sure to check VideoNote if the link is not included below.

Wednesday, August 24

Course Overview

Monday, August 29

T01: Fundamental Processor Concepts

1. Instruction Set Architecture
1.1. IBM 360 Instruction Set Architecture
1.2. MIPS32 Instruction Set Architecture
1.3. Tiny RISC-V Instruction Set Architecture
2. Processor Functional-Level Model
2.1. Transactions and Steps

Wednesday, August 31

T01: Fundamental Processor Concepts

2. Processor Functional-Level Model
2.2. Simple Assembly Example
2.3. TinyRV1 Vector-Vector Assembly and C Program
2.4. TinyRV1 Mystery Assembly and C Program
3. Processor/Laundry Analogy
3.1. Architecture vs. Microarchitecture vs. VLSI Implementation
3.2. Processor Microarchitectural Design Patterns
3.3. Transaction Diagrams
4. Analyzing Processor Performance

Wednesday, September 7

T02: Fundamental Processor Microarchitecture

1. Processor Microarchitectural Design Patterns
1.1. Transactions and Steps
1.2. Microarchitecture: Control/Datapath Split
2. TinyRV1 Single-Cycle Processors
2.1. High-Level Idea for Single-Cycle Processors
2.2. Single-Cycle Processor Datapath
2.3. Single-Cycle Processor Control Unit
2.4. Analyzing Performance

Monday, September 12

T02: Fundamental Processor Microarchitecture

3. TinyRV1 FSM Processor
3.1. High-Level Idea for FSM Processors
3.2. FSM Processor Datapath
3.3. FSM Processor Control Unit
3.4. Analyzing Performance

Wednesday, September 14

T02: Fundamental Processor Microarchitecture

4. TinyRV1 Pipelined Processor
4.1. High-Level Idea for Pipelined Processors
4.2. Pipelined Processor Datapath and Control Unit
5. Pipeline Hazards: RAW Data Hazards
5.1. Software Scheduling
5.2. Hardware Stalling

Monday, September 19

T02: Fundamental Processor Microarchitecture

5. Pipeline Hazards: RAW Data Hazards
5.2. Hardware Stalling
5.3. Hardware Bypassing
5.4. RAW Data Hazards Through Memory
6. Pipeline Hazards: Control Hazards
6.1. Software Scheduling
6.2. Hardware Speculation

Wednesday, September 21

T02: Fundamental Processor Microarchitecture

6. Pipeline Hazards: Control Hazards
6.1. Software Scheduling
6.2. Hardware Speculation
7. Pipeline Hazards: Structural Hazards
7.1. Software Scheduling
7.2. Hardware Stalling
7.3. Hardware Duplication

Monday, September 26

T02: Fundamental Processor Microarchitecture

8. Pipeline Hazards: WAR and WAW Name Hazards
8.1. Software Renaming
8.2. Hardware Stalling
9. Summary of Processor Performance
10. Case Study: Transition from CISC to RISC

T03: Fundamental Memory Concepts

1. Memory/Library Analogy
1.1. Three Example Scenarios

Tuesday, September 27

T03: Fundamental Memory Concepts

1. Memory/Library Analogy
1.2. Memory Technology
1.3. Cache Memories in Computer Architecture
2. Cache Memory Concepts
2.1. Single-Line Caches
2.2. Multi-Line Caches
2.3. Replacement Policies

Monday, October 3

T03: Fundamental Memory Concepts

2. Cache Memory Concepts
2.4. Write Policies
2.5. Categorizing Misses
3. Memory Translation, Protection, and Virtualization
3.1. Memory Translation
3.2. Memory Protection
3.3. Memory Virtualization

Wednesday, October 5

T03: Fundamental Memory Concepts

4. Analyzing Memory Performance

T04: Fundamental Memory Microarchitecture

1. Memory Microarchitectural Design Patterns
1.1. Transactions and Steps
1.2. Microarchitecture Overview
2. FSM Cache
2.1. High-Level Idea for FSM Cache
2.2. FSM Cache Datapath
2.3. FSM Cache Control Unit
2.4. Analyzing Performance

Wednesday, October 12

T04: Fundamental Memory Microarchitecture

3.2. Pipelined Cache Datapath and Control Unit
3.3. Analyzing Performance
3.4. Pipelined Cache with TLB
4. Cache Microarchitecture Optimizations
4.1. Reduce Hit Latency
4.2. Reduce Miss Rate
4.3. Reduce Miss Penalty
4.4. Cache Optimization Summary
5. Case Study: ARM Cortex A8 and Intel Core i7
5.1. ARM Cortex A8
5.2. Intel Core i7

Monday, October 17

T05: Integrating Processors and Memories

1. Processor and L1 Cache Interface
2. Analyzing Processor + Cache Performance
3. Case Study: MIPS R4000

T06: Fundamental Network Concepts

1. Network/Roadway Analogy
1.1. Running Errands
1.2. Network Technology
1.3. Networks in Computer Architecture
2. Network Topology
2.1. Single-Stage Bus Topology
2.2. Single-Stage Crossbar Topology
2.3. Multi-Stage Butterfly Topology

Wednesday, October 19

T06: Fundamental Network Concepts

2. Network Topology
2.4. Multi-Stage Torus Topology
3. Network Routing
3.1. Oblivious Deterministic Routing
3.2. Oblivious Non-Deterministic Routing
3.3. Adaptive Routing
3.4. Deadlock
4. Analyzing Network Performance
4.1. Traffic Patterns
4.2. Ideal Throughput

Monday, October 24

T06: Fundamental Network Concepts

4.3. Zero-Load Latency
4.4. Comparing Topologies
4.5. Comparing Routing Algorithms

Wednesday, October 26

T07: Fundamental Network Microarchitecture

1. Buffer Microarchitecture
1.1. Normal Queues
1.2. Pipe Queues
1.3. Bypass Queues
1.4. Composing Queues
2. Channel Microarchitecture
2.1. On-Off Flow-Control

Monday, October 31

T07: Fundamental Network Microarchitecture

2.2. Elastic Buffer Flow-Control
2.3. Store-and-Forward Flow-Control
2.4. Virtual-Cut-Through Flow-Control
3. Router Microarchitecture
3.1. Pipelined Router
3.2. Arbitration

T09: Advanced Processors – Superscalar Execution

1. In-Order Dual-Issue Superscalar TinyRV1 Processor
2. Superscalar Pipeline Hazards
2.1. RAW Hazards

Wednesday, November 2

T09: Advanced Processors – Superscalar Execution

2.2. Control Hazards
2.3. Structural Hazards
2.4. WAW and WAR Name Hazards
3. Analyzing Performance of Superscalar Processors

T10: Advanced Processors – Out-of-Order Execution

1. Incremental Approach to Exploring OOO Execution
2. I3L: IO Front-End/Issue/Completion, Late Commit
3. I2OE: IO Front-End/Issue, OOO Completion, Early Commit

Monday, November 7

T10: Advanced Processors – Out-of-Order Execution

4. I2OL: IO Front-End/Issue, OOO Completion, Late Commit
5. IO2E: IO Front-End, OOO Issue/Completion, Early Commit

Wednesday, November 9

T10: Advanced Processors – Out-of-Order Execution

6. IO2L: IO Front-End, OOO Issue/Completion, Late Commit

T11: Advanced Processors – Register Renaming

1. WAW and WAR Hazards
2. IO2L Pointer-Based Register Renaming Scheme

Monday, November 14

T11: Advanced Processors – Register Renaming

3. IO2L Value-Based Register Renaming Scheme

T12: Advanced Processors – Memory Disambiguation

1. Adding Memory Instructions to an OOO Processor
2. In-Order Load/Store Issue with Unified Stores

Wednesday, November 16

T12: Advanced Processors – Memory Disambiguation

3. In-Order Load/Store Issue with Split Stores
4. Out-of-Order Load/Store Issue

T13: Advanced Processors – Branch Prediction

1. Branch Prediction Overview
3. Hardware-Based Branch Prediction
3.1. Fixed Prediction
3.2. Branch History Table (BHT) Predictor

Monday, November 21

T13: Advanced Processors – Branch Prediction

3. Hardware-Based Branch Prediction
3.2. Branch History Table (BHT) Predictor
3.3. Two-Level Predictor for Temporal Correlation
3.4. Two-Level Predictor for Spatial Correlation
3.5. Generalized Two-Level Predictors
3.6. Tournament Predictors
3.7. Branch Target Buffer (BTB) Predictor

T14: Advanced Processors – Speculative Execution

1. Speculative Execution with Late Recovery
2. Speculative Execution with Early Recovery
2.1. Adding Speculative Bits

Monday, November 28

T14: Advanced Processors – Speculative Execution

1. Speculative Execution with Late Recovery
2. Speculative Execution with Early Recovery
2.1. Adding Speculative Bits
2.2. Adding Rename-Table Snapshots
3. Complete Out-of-Order Superscalar TinyRV2 Processor

T13: Advanced Processors – Branch Prediction

2. Software-Based Branch Prediction
2.1. Static Software Hints
2.2. Branch Delay Slots
2.3. Predication

T15: Advanced Processors – VLIW Processors

1. Motivating VLIW Processors

Wednesday, November 30

T15: Advanced Processors – VLIW Processors

2. TinyRV1 VLIW Processor
3. VLIW Compilation Techniques
3.1. Loop Unrolling
3.2. Software Pipelining
3.3. Loop Unrolling and Software Pipelining
3.4. Other Compiler Techniques

Addendum

T08: Integrating Processors, Memories, and Networks

1. Mem+Net: Banked Memory Systems
2. Proc+Net: Message-Passing Systems
3. Proc+Mem+Net: Shared-Memory Systems
4. Memory Synchronization, Consistency, and Coherence
4.1. Memory Synchronization
4.2. Memory Consistency
4.3. Memory Coherence
