# ECE 4750 Computer Architecture, Fall 2016

# **T17 Advanced Processors:** Multithreaded Processors

## School of Electrical and Computer Engineering, Cornell University

revision: 2016-12-02-09-26

| # | Topic                       |
|---|-----------------------------|
| 1 | Multithreading Overview     |
| 2 | Vertical Multithreading     |
| 3 | Simultaneous Multithreading |





### FINE-GRAIN MULTITHREADING

Hardware support to enable interleaving multiple threads on a single core at a very fine granularity.

We will discuss two variants of fine-grain multithreading:

- Vertical multithreading
- Simultaneous multithreading (SMT)

## VERTICAL MULTITHREADING



#### MT CODE EXAMPLE

```
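int j = thread_id;                // this thread's index, 0 .. nthreads-1
int start = j * (n / nthreads);   // first element of this thread's chunk

// Each thread scales its own contiguous chunk of A into B
for (int i = start; i < start + n / nthreads; i++)
  B[i] = A[i] * C;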
```
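The partitioning in the code example can be checked with a small sketch. The Python below is only an illustration: it assumes `n` is a multiple of `nthreads`, runs the per-thread loops one after another instead of on real hardware threads, and uses a hypothetical helper name, `thread_work`.

```python
def thread_work(thread_id, nthreads, A, B, C):
    """Compute this thread's contiguous slice of B = A * C."""
    n = len(A)
    chunk = n // nthreads          # assumes n is a multiple of nthreads
    start = thread_id * chunk
    for i in range(start, start + chunk):
        B[i] = A[i] * C


A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [0] * len(A)
for tid in range(4):               # each call stands in for one hardware thread
    thread_work(tid, 4, A, B, 3)
print(B)
```

Every element of `B` is written by exactly one thread, so the threads need no synchronization inside the loop.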

#### VERTICAL MULTITHREADING MICROARCHITECTURE



*Pipeline diagram (stage-by-stage detail not recoverable): instructions from two threads, T0 and T1, alternate cycle by cycle through the pipeline while each thread executes the load, multiply, store loop (`lw`, `mul`, `sw`, three add instructions that update the pointers and the loop counter, then the loop branch).*

Annotations on the diagram:

- Interleaving helps hide the load-use delay latency
- Interleaving helps hide the multiplier RAW delay
- Interleaving helps hide the branch resolution delay latency
- Could potentially remove a bypass path
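Interleaving hides pipeline latencies because consecutive instructions from the same thread are spaced further apart in time. As a rough sketch, assume fixed round-robin interleaving with one instruction issued per cycle; the function name `stall_cycles` and the example delay of two cycles are illustrative assumptions, not values taken from the lecture.

```python
def stall_cycles(dependency_delay, nthreads):
    """Bubbles still needed between two dependent instructions of one thread.

    dependency_delay: bubbles needed when the two instructions are back to back
    nthreads: threads interleaved round-robin, one instruction per cycle
    """
    # The other threads fill nthreads - 1 slots between the two instructions.
    return max(0, dependency_delay - (nthreads - 1))


for n in (1, 2, 4):
    print(n, "threads ->", stall_cycles(2, n), "stall cycles")
```

Once `nthreads - 1` reaches the dependency delay, the dependent instruction never waits, which is why interlocking and bypassing can potentially be removed.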

#### SCHEDULING POLICIES

- 1. Static fixed interleaving
  - Each of N threads executes one instruction every N cycles
  - If a thread is not ready to go, can either:
    - Stall the entire front-end
    - Insert a bubble, but do not stall the front-end
  - Can potentially eliminate interlocking + bypass network
- 2. Dynamic interleaving
  - Hardware keeps track of which threads are ready
  - Picks next thread to execute based on a priority scheme
- 3. Coarse-grain hardware interleaving
  - Use threads to hide occasional cache miss latency

Example thread issue order (one thread ID per cycle) under each policy:

- 1. 012012012012012
- 2. 010100122001212
- 3. 000000111111111 (switch threads on a cache miss)
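The three policies can be compared with a toy model that only records which thread issues on each cycle. This is a sketch under stated assumptions: the function names, the rotating priority used for dynamic interleaving, the `ready` predicate, and the `miss_cycles` set are all hypothetical, and switch penalties are ignored.

```python
def static_interleave(nthreads, ncycles):
    """Policy 1: thread t issues on cycles t, t + nthreads, ..."""
    return [cycle % nthreads for cycle in range(ncycles)]


def dynamic_interleave(nthreads, ncycles, ready):
    """Policy 2: each cycle, issue from the next ready thread.

    ready(thread, cycle) -> bool is a hypothetical readiness predicate.
    Priority rotates, starting after the thread that issued last.
    None marks a cycle in which no thread is ready (a bubble).
    """
    order, last = [], nthreads - 1
    for cycle in range(ncycles):
        choice = None
        for k in range(1, nthreads + 1):
            t = (last + k) % nthreads
            if ready(t, cycle):
                choice = t
                break
        order.append(choice)
        if choice is not None:
            last = choice
    return order


def coarse_grain(nthreads, ncycles, miss_cycles):
    """Policy 3: stay on one thread, switch to the next on a cache miss."""
    order, current = [], 0
    for cycle in range(ncycles):
        order.append(current)
        if cycle in miss_cycles:   # the instruction issued this cycle misses
            current = (current + 1) % nthreads
    return order


print("".join(map(str, static_interleave(3, 15))))    # 012012012012012
print("".join(map(str, coarse_grain(3, 15, {5}))))    # 000000111111111
```

With every thread always ready, the dynamic policy in this model degenerates to the static order; the two differ only when some thread is stalled.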

# SIMULTANEOUS MULTITHREADING (SMT)



On this cycle we are issuing four instructions from three threads at the same time.

SMT uses the fine-grain control already present in an OOO superscalar processor to allow instructions from different threads to issue at the same time.

Add multiple fetch engines to enable fetching + decoding instructions from different threads.

The IQ does not know about threads. It simply finds instructions that are ready to issue; these instructions may or may not be from different threads.

## \* SMT ADAPTS TO PARALLELISM TYPE

- For applications with high ILP but no TLP,
  the app can use the entire width of the machine
- For applications with high TLP but less ILP,
  the width of the machine is shared across threads
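A thread-oblivious select stage is enough to get this adaptive behavior. The sketch below is a simplified model, not the actual issue logic: it assumes each ready instruction is tagged with a sequence number (smaller means older) and a thread ID, and that the select logic picks the oldest ready instructions. The name `smt_issue` is hypothetical.

```python
def smt_issue(ready_instructions, width):
    """Select up to `width` ready instructions, oldest first.

    ready_instructions: list of (seq_num, thread_id) pairs for instructions
    whose operands are available; a smaller seq_num means an older instruction.
    The selection looks only at seq_num, never at thread_id.
    Returns the thread IDs of the instructions issued this cycle.
    """
    chosen = sorted(ready_instructions, key=lambda inst: inst[0])[:width]
    return [thread_id for _, thread_id in chosen]


# High ILP, no TLP: one thread fills all four issue slots.
print(smt_issue([(i, 0) for i in range(6)], 4))

# High TLP, less ILP: the four slots are shared across three threads.
print(smt_issue([(0, 0), (1, 1), (2, 2), (3, 0)], 4))
```

The same selection rule covers both cases; only the mix of ready instructions changes.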



As with vertical MT, architectural state must be duplicated.

Microarchitectural state can either be

- Duplicated at design time
- Hard partitioned at boot time
- Dynamically shared at execution time

IQ, PRF, LSQ

THREAD SCHEDULING

Fetch from the thread with the fewest instructions in flight.
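This fetch policy amounts to a minimum over per-thread counters. A minimal sketch, assuming the hardware keeps one in-flight counter per thread; the name `pick_fetch_thread` and the tie-break toward the lowest thread ID are assumptions, since the notes do not specify one.

```python
def pick_fetch_thread(in_flight):
    """Return the thread ID with the fewest instructions in flight.

    in_flight[t] counts thread t's instructions that have been fetched
    but not yet committed. Ties go to the lowest thread ID (an assumption).
    """
    return min(range(len(in_flight)), key=lambda t: in_flight[t])


print(pick_fetch_thread([5, 2, 7]))
```

Favoring the thread with the fewest in-flight instructions keeps a stalled thread from filling the shared structures with instructions that cannot issue.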

Draw a pipeline diagram for the assembly loop to the right executing on a dual-issue IO2L microarchitecture with register renaming, memory disambiguation, perfect branch prediction, and *two SMT threads*. Draw the diagram to illustrate how both threads simultaneously execute the first iteration of the loop.

lw x1, 0(x2)
mul x3, x1, x4
sw x3, 0(x5)
addi x2, x2, 4
addi x5, x5, 4
addi x7, x7, -1
bgtz x7, loop

