Low-Latency Router Microarchitecture for Intra-Chip Networks

R. Mullins, A. West, and S. Moore. "Low-Latency Virtual-Channel Routers for On-Chip Networks." International Symposium on Computer Architecture (ISCA), June 2004.

In lecture, we discussed various non-speculative and speculative techniques to reduce the latency of a virtual-channel router. This paper employs even more sophisticated speculative techniques to reduce the zero-load latency of a virtual-channel router to a single cycle with a short critical path. Students are strongly encouraged to read the earlier HPCA'01 paper by Peh and Dally, since it provides the foundation for the later ISCA'04 paper by Mullins et al. Although this paper is relatively sophisticated, students should now have the background to understand all of the key concepts. The free virtual-channel queue scheme is more restrictive than a general allocator, but why is this not a problem? Students will probably need to spend some time studying Section 3.3 to fully understand how the precomputed arbitration scheme is used in the various types of arbiters. For each of the four types of arbiters, carefully consider the authors' rationale for whether it is "safe" or "unsafe". Is there any issue with using uniform random traffic for this kind of study? What advantage are the authors able to show in their evaluation? The authors estimate a cycle time of 12 FO4. What cycle time is actually achieved by the final VLSI implementation reported in their later ASPDAC'06 paper? What explanation do the authors give for the discrepancy? After reading the primary ISCA'04 paper, students may want to revisit Section 2.1 of the ISCA'07 paper on express virtual channels.
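
When thinking about the free virtual-channel queue question, it may help to have a concrete picture of the restriction. The following is a minimal sketch, assuming the scheme behaves like a simple FIFO of free virtual-channel identifiers kept at each output port, so that a requester is only ever offered the virtual channel at the head of the queue instead of choosing among all free ones. The class name, method names, and the release-on-tail-flit behavior are illustrative assumptions and are not taken from the paper.

```python
from collections import deque


class FreeVCQueue:
    """Hypothetical model of one output port's queue of free virtual channels."""

    def __init__(self, num_vcs):
        # Assumption: all virtual channels start out free, in index order.
        self.free = deque(range(num_vcs))

    def allocate(self):
        """Grant the VC at the head of the queue, or None if none is free."""
        return self.free.popleft() if self.free else None

    def release(self, vc):
        """Return a VC to the tail of the queue once it is no longer in use."""
        self.free.append(vc)


# Usage: two packets are granted VCs in queue order, then the first VC is freed.
q = FreeVCQueue(num_vcs=4)
first = q.allocate()
second = q.allocate()
q.release(first)
```

A general allocator could match any requesting input virtual channel to any free output virtual channel. In this sketch the only candidate is the head of the queue, which is the restriction the question above asks students to reason about.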