This course aims to give students a strong foundation for understanding modern computer system architecture and for applying its insights and principles to future computer designs. The course is structured around the three primary building blocks of general-purpose computing systems: processors, memories, and networks.

The first half of the course focuses on the fundamentals of each building block. Topics include instruction set architecture; single-cycle, FSM, and pipelined processor microarchitecture; direct-mapped vs.~set-associative cache memories; memory protection, translation, and virtualization; FSM and pipelined cache microarchitecture; cache optimizations; network topology and routing; buffer, channel, and router microarchitecture; and integrating processors, memories, and networks. The second half of the course delves into more advanced techniques and will enable students to understand how these three building blocks can be integrated to build a modern shared-memory multicore system. Topics include superscalar execution, out-of-order execution, register renaming, memory disambiguation, branch prediction, and speculative execution; multithreaded, VLIW, and SIMD processors; non-blocking cache memories; and memory synchronization, consistency, and coherence. Students will learn how to evaluate design decisions in the context of past, current, and future application requirements and technology constraints.

A significant project is decomposed into four lab assignments. Throughout the semester, students will gradually design, implement, test, and evaluate, at the register-transfer level, a complete multicore system capable of running simple parallel applications.