# ECE 4750 Computer Architecture, Fall 2016 T11 Advanced Processors: Register Renaming

School of Electrical and Computer Engineering Cornell University

revision: 2016-11-18-15-42

| 1 | WAW and WAR Hazards                         | 2  |
|---|---------------------------------------------|----|
| 2 | IO2L Pointer-Based Register Renaming Scheme | 4  |
| 3 | IO2L Value-Based Register Renaming Scheme   | 10 |

# 1. WAW and WAR Hazards

| a: | mul  | x1, | x2, | xЗ |
|----|------|-----|-----|----|
| b: | mul  | x4, | x1, | x5 |
| c: | addi | x6, | x4, | 1  |
| d: | addi | x4, | x7, | 1  |

- RAW data hazards vs. WAW/WAR name hazards
  - RAW dependencies are "true" data dependencies because we actually pass data from the writer to the reader
  - WAW/WAR dependencies are not "true" data dependencies
  - WAW/WAR dependencies exist because of limited "names"
  - Can always avoid WAW/WAR hazards by renaming registers in software, but eventually we will run out of register names
  - Key Idea: Provide more "physical registers" and rename architectural to physical registers in hardware

### WAW/WAR name hazards in IO2L microarchitecture

| F        | ARF read write |     |                |      |   |   |   |   |   |   |   |   |    |       |    | ARF<br>C |        |
|----------|----------------|-----|----------------|------|---|---|---|---|---|---|---|---|----|-------|----|----------|--------|
| ARF      |                |     | read           | 1    |   |   |   |   |   |   |   |   |    |       |    | v        | vrite  |
| PRF      |                |     | read           |      |   |   |   |   |   |   |   |   |    | write |    | 1        | read   |
| SB<br>IQ | alloc          |     | ead/w<br>ad/de |      |   |   |   |   |   |   |   |   |    |       |    |          |        |
| ROB      | alloc          | rea | au/ue          | anoc |   |   |   |   |   |   |   |   |    | write | re | ad/de    | ealloc |
| ROD      | anoc           |     |                |      |   |   |   |   |   |   |   |   |    | write | ic | aa / aa  | anoc   |
|          |                |     | 0              | 1    | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11    | 12 | 13       | 14     |
| a:mul    | x1, x2,        | xЗ  |                |      |   |   |   |   |   |   |   |   |    |       |    |          |        |
| b:mul    | x4, x1,        | x5  |                |      |   |   |   |   |   |   |   |   |    |       |    |          |        |
| c:addi   | x6, x4,        | 1   |                |      |   |   |   |   |   |   |   |   |    |       |    |          |        |
| d : addi | x4, x7,        | 1   |                |      |   |   |   |   |   |   |   |   |    |       |    |          |        |

- Explore two different schemes
  - Store pointers in the IQ and ROB
  - Store values in the IQ and ROB
- For each scheme
  - overall pipeline structure
  - required hardware data-structures
  - example instruction sequence executing on microarchitecture
- Several simplifications
  - all designs are single issue
  - only support add, addi, mul

# 2. IO2L Pointer-Based Register Renaming Scheme



write

RT read/write

- Increase the size of the PRF to provide more "names"
- Add free list (FL) in D stage
  - FL holds list of unallocated physical registers
  - Physical registers allocated in D and deallocated in C
- Add rename table (RT) in D stage
  - RT maps architectural registers to physical registers
  - Sometimes called the "map table"
  - Destination register renamed in D stage
  - Look up renamed source registers in D, and write these physical register specifiers into the IQ
- Modify SB and ROB
  - Scoreboard indexed by physical reg instead of architectural reg
- NOTE: Values can only be bypassed or read from the PRF
- I/X/Y/W stages only manipulate physical registers

## Data Structures: FL, RT, Modified ROB



| Reorder Buffer |       |   |    |    |    |  |  |  |  |  |  |  |  |  |
|----------------|-------|---|----|----|----|--|--|--|--|--|--|--|--|--|
| v              | ppreg |   |    |    |    |  |  |  |  |  |  |  |  |  |
| 1              | 1     | 1 | p7 | p8 | p9 |  |  |  |  |  |  |  |  |  |
| 1              | 1     | 1 | p8 | p4 | p3 |  |  |  |  |  |  |  |  |  |
| 1              | 1     | 1 | p9 | p6 | p5 |  |  |  |  |  |  |  |  |  |
| 0              |       |   |    |    |    |  |  |  |  |  |  |  |  |  |

- Free List (FL)
  - free: one if corresponding preg is free
  - Use priority encoder to allocate first free preg
- Rename Table (RT)
  - **p**: pending bit, is a write to this areg in flight?
  - preg: what preg the corresponding areg maps to
  - Entries in RT are always valid
- Modified Reorder Buffer (ROB)
  - Include three fields with pointers to PRF and ARF
  - preg: pointer to register in PRF that holds result value
  - areg: pointer to register in ARF to copy value into
  - ppreg: pointer to previous register in PRF for this areg

Can only free a physical register when we can guarantee no reads of that physical register are still in flight!

### **Example Execution Diagrams**



| 2. IO2L Pointer-Based | Register l | Renaming Scheme |
|-----------------------|------------|-----------------|
|                       |            |                 |

|                       |              |                    |                 |             |           |           | œ          |   |   |          |    | 60        |    |          |          |            |                |
|-----------------------|--------------|--------------------|-----------------|-------------|-----------|-----------|------------|---|---|----------|----|-----------|----|----------|----------|------------|----------------|
|                       | ŝ            |                    |                 |             |           |           | p10*/x4/p8 | _ | _ | _        | -  | p10/x4/p8 | _  | _        | -        | •          |                |
| Buffer                | 2            |                    |                 |             |           | p9*/x6/p5 | _          | _ | _ | _        | _  | _         | _  | _        | p9/x6/p5 |            |                |
| <b>Reorder Buffer</b> | 1            |                    |                 |             | p8*/x4/p3 | _         | _          | _ | _ | _        | _  | _         | _  | p8/x4/p3 |          |            |                |
|                       | 0            |                    |                 | p7*/x1/p0   | _         | _         | _          | _ | _ | p7/x1/p0 |    |           |    |          |          |            |                |
|                       | 33           |                    |                 |             |           |           | p10/p6     | _ | • |          |    |           |    |          |          |            |                |
| Jueue                 | 2            |                    |                 |             |           | p9/p8*    | _          | _ |   | _        | _  | •         |    |          |          |            |                |
| Issue Queue           | 1            |                    |                 |             | p8/p7*/p4 | _         | _          | • |   |          |    |           |    |          |          |            |                |
|                       | 0            |                    |                 | p7/p1/p2    |           |           |            |   |   |          |    |           |    |          |          |            |                |
|                       | x7 Free List | p6 p7, p8, p9, p10 | p7, p8, p9, p10 | p8, p9, p10 | p9, p10   | p10       |            |   |   |          | p0 | p0        | p0 | p0       | p0, p3   | p0, p3, p5 | p0, p3, p5, p8 |
|                       | ×7           | p6                 |                 |             | _         | _         | _          | _ | _ | _        | _  |           | _  | _        |          | _          |                |
|                       | <b>x</b> 6   | p5                 | _               | _           | _         | *6d       | _          | _ | _ | _        | _  | _         | _  | _        | p9       | _          | _              |
| ble                   | ŝ            | p4                 | _               | _           | _         | _         | _          | _ | _ | _        | _  | _         | _  | _        | _        | _          | _              |
| <b>Rename Table</b>   | x4           | p3                 | _               | _           | p8*       |           | p10*       | _ | _ | _        | _  | p10       | _  | _        | _        | _          | _              |
| tenan                 | x3 x4        | 02                 | _               | _           | -         | _         | d          | _ | _ | _        | _  | -         | _  | _        | _        | _          | _              |
| H                     | x2           | p1 p2              | _               | _           | _         | _         | _          | _ | _ | _        | _  | _         | _  | _        | _        | _          | _              |
|                       | X            | l 0d               | _               | p7*         |           |           | _          | _ | _ | p7       | _  | _         |    |          | _        | _          | _              |
| -                     | ີ<br>ບ       |                    |                 | H           |           |           |            | - |   | a        |    |           |    | q        | υ        | q          |                |
|                       | A            |                    |                 |             |           |           |            |   | a |          | q  |           | q  | υ        |          |            |                |
|                       | I            |                    |                 | a           |           |           |            | q | q |          |    | υ         |    |          |          |            |                |
|                       | D            |                    | a               | q           | υ         | q         |            |   |   |          |    |           |    |          |          |            |                |
|                       | Cycle D      | 0                  |                 | 2           | e         | 4         | 5          | 9 | 2 | ∞        | 6  | 10        | 11 | 12       | 13       | 14         | 15             |

#### **Freeing Physical Registers**



# Unified Physical/Architectural Register File



- Combine the PRF and ARF into one large unified register file (URF)
- Replace ARF with an architectural rename table (ART)
- Instead of copying *values*, C stage simply copies the preg pointer into the appropriate entry of the ART
- URF can be smaller than area for separate PRF/ARF
- Sometimes in the literature URF is just called PRF (and there is no "real" ARF, just the ART)

# 3. IO2L Value-Based Register Renaming Scheme



RT read/write

- Instead of storing future values in a separate PRF, we store them these future values in the actual ROB
- No need for FL, since "physical registers" are now really ROB entry IDs and managed naturally through ROB allocation/deallocation
- Add rename table (RT) in D stage
  - RT maps architectural registers to physical registers
  - Registers renamed in D stage, entries cleared in C
  - Destination register renamed in D stage
  - Look up renamed source registers in D, and write these physical register specifiers into the IQ
- Modify scoreboard, IQ, ROB
  - Scoreboard indexed by preg instead of areg
- NOTE: Values can be bypassed or read from either the ROB or ARF
- I/X/Y/W stages only manipulate physical registers

# Data Structures: RT, Modified IQ, ROB



- Rename Table (RT)
  - v: valid bit
  - **p**: pending bit, is a write to this areg in flight?
  - preg: what preg the corresponding areg maps to
  - Entries are only valid if instruction is in-flight
  - Valid bit is cleared after instruction has committed
- Modified Issue Queue (IQ)
  - src0/src1: when pending bit is set, source fields contain the preg specifier (i.e., ROB entry ID) that we are waiting on; when pending bit is clear, source fields contain the *values*
- Modified Reorder Buffer (ROB)
  - Replace single rdest field with two new fields
  - value: actual result value
  - areg: pointer to register in ARF to copy value into

### **Example Execution Diagrams**



We can use a table to compactly illustrate how IO2L value-based register renaming works. We show the state of the RT and ROB at the beginning of every cycle.

| Cycle |   |   |   |   |     |    | Ren | ame T | able |     |            |          | Issue (   | Queue  |       | Reorder Buffer |        |        |        |
|-------|---|---|---|---|-----|----|-----|-------|------|-----|------------|----------|-----------|--------|-------|----------------|--------|--------|--------|
|       | D | Ι | w | С | x1  | x2 | x3  | x4    | x5   | x6  | <b>x</b> 7 | 0        | 1         | 2      | 3     | 0              | 1      | 2      | 3      |
| 0     |   |   |   |   |     |    |     |       |      |     |            |          |           |        |       |                |        |        |        |
| 1     | а |   |   |   |     |    |     |       |      |     |            |          |           |        |       |                |        |        |        |
| 2     | b | а |   |   | p0* |    |     |       |      |     |            | p0/x2/x3 |           |        |       | p0*/x1         |        |        |        |
| 3     | с |   |   |   |     |    |     | p1*   |      |     |            |          | p1/p0*/x5 |        |       |                | p1*/x4 |        |        |
| 4     | d |   |   |   |     |    |     |       |      | p2* |            |          |           | p2/p1* |       |                |        | p2*/x6 |        |
| 5     |   |   |   |   |     |    |     | p3*   |      |     |            |          |           | 1      | p3/r7 |                |        |        | p3*/x4 |
| 6     |   | b |   |   |     |    |     |       |      |     |            |          | •         | 1      |       |                |        |        |        |
| 7     |   |   | а |   |     |    |     |       |      |     |            |          |           | 1      |       |                |        |        |        |
| 8     |   | d |   | а | •   |    |     |       |      |     |            |          |           |        | •     | p0/x1          |        |        |        |
| 9     |   |   | d |   |     |    |     |       |      |     |            |          |           |        |       |                |        |        |        |
| 10    |   | с |   |   |     |    |     | p3    |      |     |            |          |           | •      |       |                |        |        | p3/x4  |
| 11    |   |   | b |   |     |    |     |       |      |     |            |          |           |        |       |                |        |        |        |
| 12    |   |   | с | b |     |    |     |       |      |     |            |          |           |        |       |                | p1/x4  |        |        |
| 13    |   |   |   | с |     |    |     |       |      | •   |            |          |           |        |       |                |        | p2/x6  |        |
| 14    |   |   |   | d |     |    |     | •     |      |     |            |          |           |        |       |                |        |        | •      |
| 15    |   |   |   |   |     |    |     |       |      |     |            |          |           |        |       |                |        |        |        |