# ECE 4750 Computer Architecture, Fall 2024 Topic 8: Advanced Processors Register Renaming

School of Electrical and Computer Engineering Cornell University

revision: 2024-12-02-13-40

Copyright © 2024 Anne Bracy. All rights reserved. This handout was prepared by Prof. Anne Bracy at Cornell University for ECE 4750 Computer Architecture (derived from previous handouts prepared and copyrighted by Prof. Christopher Batten). Download and use of this handout is permitted for individual educational non-commercial purposes only. Redistribution either in part or in whole via both commercial or non-commercial means requires written permission.

# 1. WAW and WAR Hazards

| Label | Instruction     |
|-------|-----------------|
| a:    | mul  x1, x2, x3 |
| b:    | mul  x4, x1, x5 |
| c:    | addi x6, x4, 1  |
| d:    | addi x4, x7, 1  |

- RAW data hazards vs. WAW/WAR name hazards
  - RAW dependencies are "true" data dependencies because we actually pass data from the writer to the reader
  - WAW/WAR dependencies are not "true" data dependencies
  - WAW/WAR dependencies exist because of limited "names"
  - Can always avoid WAW/WAR hazards by renaming registers in software, but eventually we will run out of register names
  - Key Idea: Provide more "physical registers" and rename architectural to physical registers in hardware
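The distinction between true and name dependencies can be checked mechanically for the four-instruction sequence above. The following is a minimal Python sketch, not part of the original handout; the `Instr` tuple and `find_dependencies` helper are hypothetical names, and immediates are ignored because they never create register dependencies.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    label: str
    dest: str
    srcs: Tuple[str, ...]  # register sources only; immediates are omitted

# The four-instruction sequence from this section.
PROGRAM = [
    Instr("a", "x1", ("x2", "x3")),
    Instr("b", "x4", ("x1", "x5")),
    Instr("c", "x6", ("x4",)),
    Instr("d", "x4", ("x7",)),
]

def find_dependencies(program):
    """Return (kind, older, younger, register) tuples in program order."""
    deps = []
    for i, older in enumerate(program):
        dest_live = True  # becomes False once a younger instruction overwrites older.dest
        for younger in program[i + 1:]:
            # RAW: younger reads the value that older produces (true dependency).
            if dest_live and older.dest in younger.srcs:
                deps.append(("RAW", older.label, younger.label, older.dest))
            # WAW: younger reuses older's destination name.
            if dest_live and older.dest == younger.dest:
                deps.append(("WAW", older.label, younger.label, older.dest))
                dest_live = False
            # WAR: younger overwrites a name that older still has to read.
            if younger.dest in older.srcs:
                deps.append(("WAR", older.label, younger.label, younger.dest))
    return deps

for dep in find_dependencies(PROGRAM):
    print(dep)
```

For this sequence the sketch reports two RAW dependencies (a to b on x1, b to c on x4) and two name dependencies on x4 (WAW from b to d, WAR from c to d). Only the name dependencies disappear if d is given a different destination name.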

## WAW/WAR name hazards in IO2L microarchitecture

IO2L pipeline stages: F, D, I, X (or Y0 to Y3), W, C. Each data structure is accessed in the following stages:

| Structure | D     | I            | W     | C            |
|-----------|-------|--------------|-------|--------------|
| ARF       |       | read         |       | write        |
| PRF       |       | read         | write | read         |
| SB        |       | read/write   |       |              |
| IQ        | alloc | read/dealloc |       |              |
| ROB       | alloc |              | write | read/dealloc |

Pipeline diagram for the example sequence (left blank to fill in):

| Instruction        | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|--------------------|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| a: mul x1, x2, x3  |   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |
| b: mul x4, x1, x5  |   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |
| c: addi x6, x4, 1  |   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |
| d: addi x4, x7, 1  |   |   |   |   |   |   |   |   |   |   |    |    |    |    |    |

- Explore two different schemes
  - Store pointers in the IQ and ROB
  - Store values in the IQ and ROB
- For each scheme
  - overall pipeline structure
  - required hardware data-structures
  - example instruction sequence executing on microarchitecture
- Several simplifications
  - all designs are single issue
  - only support add, addi, mul

# 2. IO2L Pointer-Based Register Renaming Scheme



- Increase the size of the PRF to provide more "names"
- Add free list (FL) in D stage
  - FL holds list of unallocated physical registers
  - Physical registers allocated in D and deallocated in C
- Add rename table (RT) in D stage
  - RT maps architectural registers to physical registers
  - Sometimes called the "map table"
  - Destination register renamed in D stage
  - Look up renamed source registers in D, and write these physical register specifiers into the IQ
- Modify SB and ROB
  - Scoreboard indexed by physical reg instead of architectural reg
- NOTE: Values can only be bypassed or read from the PRF
- I/X/Y/W stages only manipulate physical registers
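The D-stage renaming steps listed above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the handout's implementation: the function names are hypothetical, the initial mapping (x1..x7 to p0..p6, with p7..p10 free) is assumed for the example, the free list is modeled as a simple queue, and pending bits are left out.

```python
from collections import deque

def initial_state():
    """Assumed starting state: x1..x7 map to p0..p6 and p7..p10 are free."""
    rename_table = {f"x{i}": f"p{i - 1}" for i in range(1, 8)}
    free_list = deque(["p7", "p8", "p9", "p10"])
    return rename_table, free_list

def rename(dest, srcs, rename_table, free_list):
    """Rename one instruction in D; return its (IQ entry, ROB entry)."""
    # Look up sources before touching the destination mapping, so that an
    # instruction such as "addi x4, x4, 1" reads the old mapping of x4.
    psrcs = tuple(rename_table[s] for s in srcs)
    if not free_list:
        raise RuntimeError("no free physical register: D stage must stall")
    preg = free_list.popleft()      # allocate a new name for the destination
    ppreg = rename_table[dest]      # previous mapping, remembered in the ROB
    rename_table[dest] = preg
    iq_entry = (preg,) + psrcs      # IQ holds physical register specifiers
    rob_entry = (preg, dest, ppreg) # ROB holds preg / areg / ppreg
    return iq_entry, rob_entry

rt, fl = initial_state()
program = [
    ("x1", ("x2", "x3")),  # a: mul  x1, x2, x3
    ("x4", ("x1", "x5")),  # b: mul  x4, x1, x5
    ("x6", ("x4",)),       # c: addi x6, x4, 1
    ("x4", ("x7",)),       # d: addi x4, x7, 1
]
results = [rename(dest, srcs, rt, fl) for dest, srcs in program]
for iq_entry, rob_entry in results:
    print(iq_entry, rob_entry)
```

After renaming, b and d write different physical registers (p8 and p10), so the WAW and WAR name dependencies on x4 no longer constrain issue order. Only the RAW chain through p7 and p8 remains.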

## Data Structures: FL, RT, Modified ROB



Reorder Buffer (example contents):

| v | p | v | preg | areg | ppreg |
|---|---|---|------|------|-------|
| 1 | 1 | 1 | p7   | x8   | p10   |
| 1 | 1 | 1 | p8   | x4   | p3    |
| 1 | 1 | 1 | p9   | x6   | p5    |
| 0 |   |   |      |      |       |
|   |   |   |      |      |       |

- Free List (FL)
  - free: one if corresponding preg is free
  - Use priority encoder to allocate first free preg
- Rename Table (RT)
  - **p**: pending bit, is a write to this areg in flight?
  - preg: what preg the corresponding areg maps to
  - Entries in RT are always valid
- Modified Reorder Buffer (ROB)
  - Include three fields with pointers to PRF and ARF
  - preg: pointer to register in PRF that holds result value
  - areg: pointer to register in ARF to copy value into
  - ppreg: pointer to previous register in PRF for this areg

Can only free a physical register when we can guarantee no reads of that physical register are still in flight!
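One way to see why freeing at commit is safe: every instruction that could still read the previous physical register (ppreg) is older than the instruction that overwrote the mapping, and in-order commit means all of those older instructions have already left the pipeline. The sketch below models the free list as a bit vector with a priority-encoder style allocator and frees ppreg in the C stage. The class and function names are hypothetical, and registers are plain integers for brevity.

```python
class FreeList:
    """Bit-vector free list: free[p] is 1 if physical register p is free."""

    def __init__(self, num_pregs, initially_free):
        self.free = [0] * num_pregs
        for p in initially_free:
            self.free[p] = 1

    def allocate(self):
        """Priority encoder: return the lowest-numbered free preg, or None."""
        for p, bit in enumerate(self.free):
            if bit:
                self.free[p] = 0
                return p
        return None  # nothing free: the D stage would have to stall

    def release(self, p):
        self.free[p] = 1

    def free_pregs(self):
        return [p for p, bit in enumerate(self.free) if bit]

def commit(rob_entry, prf, arf, free_list):
    """C stage: copy the result into the ARF, then free the previous preg."""
    preg, areg, ppreg = rob_entry
    arf[areg] = prf[preg]
    free_list.release(ppreg)

fl = FreeList(num_pregs=11, initially_free=[7, 8, 9, 10])
allocated = [fl.allocate() for _ in range(4)]   # pregs handed out in D
prf = {7: 70, 8: 80, 9: 90, 10: 100}             # made-up result values
arf = {}
# ROB entries (preg, areg, ppreg) for instructions a, b, c, d, committed in order.
for entry in [(7, 1, 0), (8, 4, 3), (9, 6, 5), (10, 4, 8)]:
    commit(entry, prf, arf, fl)
print(fl.free_pregs())
```

Note that committing an instruction frees the register holding the previous value of its destination, not the register it just wrote. The new register stays allocated until a later writer of the same architectural register commits.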

## Example Execution Diagrams




| Cycle | D | I | W | C | RT x1 | RT x2 | RT x3 | RT x4 | RT x5 | RT x6 | RT x7 | Free List       | IQ 0     | IQ 1      | IQ 2   | IQ 3   | ROB 0     | ROB 1     | ROB 2     | ROB 3      |
|-------|---|---|---|---|-------|-------|-------|-------|-------|-------|-------|-----------------|----------|-----------|--------|--------|-----------|-----------|-----------|------------|
| 0     |   |   |   |   | p0    | p1    | p2    | p3    | p4    | p5    | p6    | p7, p8, p9, p10 |          |           |        |        |           |           |           |            |
| 1     | a |   |   |   |       |       |       |       |       |       |       | p7, p8, p9, p10 |          |           |        |        |           |           |           |            |
| 2     | b | a |   |   | p7*   |       |       |       |       |       |       | p8, p9, p10     | p7/p1/p2 |           |        |        | p7*/x1/p0 |           |           |            |
| 3     | c |   |   |   |       |       |       | p8*   |       |       |       | p9, p10         |          | p8/p7*/p4 |        |        |           | p8*/x4/p3 |           |            |
| 4     | d |   |   |   |       |       |       |       |       | p9*   |       | p10             |          |           | p9/p8* |        |           |           | p9*/x6/p5 |            |
| 5     |   |   |   |   |       |       |       | p10*  |       |       |       |                 |          |           |        | p10/p6 |           |           |           | p10*/x4/p8 |
| 6     |   | b |   |   |       |       |       |       |       |       |       |                 |          | •         |        |        |           |           |           |            |
| 7     |   | d | a |   |       |       |       |       |       |       |       |                 |          |           |        | •      |           |           |           |            |
| 8     |   |   |   | a | p7    |       |       |       |       |       |       |                 |          |           |        |        | p7/x1/p0  |           |           |            |
| 9     |   |   | d |   |       |       |       |       |       |       |       | p0              |          |           |        |        |           |           |           |            |
| 10    |   | c |   |   |       |       |       | p10   |       |       |       | p0              |          |           | •      |        |           |           |           | p10/x4/p8  |
| 11    |   |   | b |   |       |       |       |       |       |       |       | p0              |          |           |        |        |           |           |           |            |
| 12    |   |   | c | b |       |       |       |       |       |       |       | p0              |          |           |        |        |           | p8/x4/p3  |           |            |
| 13    |   |   |   | c |       |       |       |       |       | p9    |       | p0, p3          |          |           |        |        |           |           | p9/x6/p5  |            |
| 14    |   |   |   | d |       |       |       |       |       |       |       | p0, p3, p5      |          |           |        |        |           |           |           | •          |
| 15    |   |   |   |   |       |       |       |       |       |       |       | p0, p3, p5, p8  |          |           |        |        |           |           |           |            |

## Freeing Physical Registers



## Unified Physical/Architectural Register File



- Combine the PRF and ARF into one large unified register file (URF)
- Replace ARF with an architectural rename table (ART)
- Instead of copying *values*, C stage simply copies the preg pointer into the appropriate entry of the ART
- URF can be smaller than area for separate PRF/ARF
- Sometimes in the literature URF is just called PRF (and there is no "real" ARF, just the ART)
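With a unified register file, commit becomes a pointer update rather than a value copy. The sketch below illustrates this with the ROB entries from the example sequence. The function name is hypothetical, and it assumes free-list handling is unchanged from the separate PRF/ARF design (the previous physical register is still freed at commit).

```python
def commit_unified(rob_entry, art, free_list):
    """C stage with a URF: no value moves, only the ART pointer changes."""
    preg, areg, ppreg = rob_entry
    art[areg] = preg          # the committed value of areg now lives in preg
    free_list.append(ppreg)   # assumed: ppreg is freed exactly as before

# Assumed initial architectural mapping: x1..x7 live in p0..p6.
art = {f"x{i}": f"p{i - 1}" for i in range(1, 8)}
free_list = []
# ROB entries (preg, areg, ppreg) for a, b, c, d, committed in program order.
for entry in [("p7", "x1", "p0"), ("p8", "x4", "p3"),
              ("p9", "x6", "p5"), ("p10", "x4", "p8")]:
    commit_unified(entry, art, free_list)
print(art)
print(free_list)
```

After all four commits, x4 points at p10, and p8 (which briefly held the committed value of x4 after b) is back on the free list.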

# 3. IO2L Value-Based Register Renaming Scheme



- Instead of storing future values in a separate PRF, we store these future values in the actual ROB
- No need for FL, since "physical registers" are now really ROB entry IDs and managed naturally through ROB allocation/deallocation
- Add rename table (RT) in D stage
  - RT maps architectural registers to physical registers
  - Registers renamed in D stage, entries cleared in C
  - Destination register renamed in D stage
  - Look up renamed source registers in D, and write these physical register specifiers into the IQ
- Modify scoreboard, IQ, ROB
  - Scoreboard indexed by preg instead of areg
- NOTE: Values can be bypassed or read from either the ROB or ARF
- I/X/Y/W stages only manipulate physical registers

## Data Structures: RT, Modified IQ, ROB



- Rename Table (RT)
  - v: valid bit
  - **p**: pending bit, is a write to this areg in flight?
  - preg: what preg the corresponding areg maps to
  - Entries are only valid if instruction is in-flight
  - Valid bit is cleared after instruction has committed
- Modified Issue Queue (IQ)
  - src0/src1: when pending bit is set, source fields contain the preg specifier (i.e., ROB entry ID) that we are waiting on; when pending bit is clear, source fields contain the *values*
- Modified Reorder Buffer (ROB)
  - Replace single rdest field with two new fields
  - value: actual result value
  - areg: pointer to register in ARF to copy value into

## Example Execution Diagrams



We can use a table to compactly illustrate how IO2L value-based register renaming works. We show the state of the RT, IQ, and ROB at the beginning of every cycle.

| Cycle | D | I | W | C | RT x1 | RT x2 | RT x3 | RT x4 | RT x5 | RT x6 | RT x7 | IQ 0     | IQ 1      | IQ 2   | IQ 3  | ROB 0  | ROB 1  | ROB 2  | ROB 3  |
|-------|---|---|---|---|-------|-------|-------|-------|-------|-------|-------|----------|-----------|--------|-------|--------|--------|--------|--------|
| 0     |   |   |   |   |       |       |       |       |       |       |       |          |           |        |       |        |        |        |        |
| 1     | a |   |   |   |       |       |       |       |       |       |       |          |           |        |       |        |        |        |        |
| 2     | b | a |   |   | p0*   |       |       |       |       |       |       | p0/x2/x3 |           |        |       | p0*/x1 |        |        |        |
| 3     | c |   |   |   |       |       |       | p1*   |       |       |       |          | p1/p0*/x5 |        |       |        | p1*/x4 |        |        |
| 4     | d |   |   |   |       |       |       |       |       | p2*   |       |          |           | p2/p1* |       |        |        | p2*/x6 |        |
| 5     |   |   |   |   |       |       |       | p3*   |       |       |       |          |           |        | p3/x7 |        |        |        | p3*/x4 |
| 6     |   | b |   |   |       |       |       |       |       |       |       |          | •         |        |       |        |        |        |        |
| 7     |   |   | a |   |       |       |       |       |       |       |       |          |           |        |       |        |        |        |        |
| 8     |   | d |   | a | •     |       |       |       |       |       |       |          |           |        | •     | p0/x1  |        |        |        |
| 9     |   |   | d |   |       |       |       |       |       |       |       |          |           |        |       |        |        |        |        |
| 10    |   | c |   |   |       |       |       | p3    |       |       |       |          |           | •      |       |        |        |        | p3/x4  |
| 11    |   |   | b |   |       |       |       |       |       |       |       |          |           |        |       |        |        |        |        |
| 12    |   |   | c | b |       |       |       |       |       |       |       |          |           |        |       |        | p1/x4  |        |        |
| 13    |   |   |   | c |       |       |       |       |       | •     |       |          |           |        |       |        |        | p2/x6  |        |
| 14    |   |   |   | d |       |       |       | •     |       |       |       |          |           |        |       |        |        |        | •      |
| 15    |   |   |   |   |       |       |       |       |       |       |       |          |           |        |       |        |        |        |        |