Univ. Berkeley CS Dept.
TI Analysis of Multithreaded Microprocessors under Multiprogramming
OR UCB LT CSD 92687
AU David E. Culler
AU Michial Gunter
AU James C. Lee
AB Multithreading has been proposed as a means of tolerating long memory latencies in multiprocessor systems. Fundamentally, it allows multiple concurrent subsystems (CPU, network, and memory) to be utilized simultaneously. This is advantageous on uniprocessor systems as well, since the processor is utilized while the memory system services misses. We examine multithreading on high-performance uniprocessors as a way to achieve better cost-performance on multiple processes. Processor utilization and cache behavior are studied analytically and under simulation by interleaving reference traces to model timesharing and multithreading. Multithreading is advantageous with large on-chip caches (32 kilobytes), associativity of two, and a memory access cost of roughly 50 instruction times. At this point, a small number of threads (2-4) is sufficient, the thread switch need not be extremely fast, and the memory system need support only one or two outstanding misses. The increase in processor real estate to support multithreading is modest, given the size of the cache and floating-point units. A surprising observation is that miss ratios may be lower with multithreading than with timesharing under a steady-state load. This occurs because switch-on-miss multithreading introduces unfair thread scheduling, giving more CPU cycles to processes with better cache behavior.