STC1 and the Scale VT Processor Die Photos


STC1 is a small test chip with a simple RISC processor, 8 KB of RAM, a host interface, and a custom clock generator [CSAIL'07]. It was fabricated in a 180 nm technology using an ASIC-style design flow augmented with procedural fine-grain standard-cell placement of datapaths and small memory arrays. STC1 helped us prepare to fabricate the Scale Vector-Thread Processor shown on the right [TODAES'08].

Ronny Krashinsky and I designed and fabricated Scale in 2007, while we were both graduate students at MIT. Scale includes a RISC control processor and a four-lane vector-thread unit that can execute 16 operations per cycle and support up to 128 simultaneously active threads. It provides unit-stride and strided-segment vector loads and stores, and it implements cache refill/access decoupling. The Scale memory system includes a four-bank, non-blocking, 32-way set-associative, 32 KB cache. The chip has 7.1 million transistors and a core area of 16 sq mm, and it runs at 260 MHz while consuming 0.4-1.1 W across a range of kernels.