CMT T1,T2 Utilisation

What is meant by an "idle" hardware thread on UltraSPARC T1: Conventionally, the kernel considers a processor idle when there is no runnable thread in the system that can be scheduled on it. On previous-generation SPARC processors, an idle processor meant that its pipeline remained unused. On a CMT processor such as the T1, if there are not enough runnable threads in the system, one or more hardware threads in a core remain idle.
The main differences in the behavior of an idle virtual processor (hardware thread) on the T1, compared with an idle CPU in a conventional SMP system, are:

What is meant by a "stalled" thread on UltraSPARC T1: When a thread on a T1 processor stalls on a long-latency instruction (such as a load that misses in the cache), it is taken out of the mix of schedulable threads, and the next ready-to-run thread from the same core uses its time slice. As on conventional processors, a stalled thread on the T1 is reported as busy by mpstat. On a conventional processor, a thread stalled on, for example, a cache miss occupies the pipeline and hence lowers the effective utilization of the system. On the T1, the core can still be used by other runnable threads that are not stalled.
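The effect of switching away from stalled threads can be illustrated with a deliberately simple model. The sketch below is not taken from any Sun tool: it assumes each hardware thread is stalled independently for the same fraction of cycles, and that the core issues an instruction whenever at least one of its threads is ready. The function name `pipeline_utilization` and its parameters are hypothetical.

```python
def pipeline_utilization(stall_fraction: float, threads_per_core: int) -> float:
    """Fraction of cycles in which at least one thread is ready to issue.

    Toy model (assumption, not a measurement): every thread is stalled
    independently with probability `stall_fraction`, and the core does
    useful work whenever any one of its threads is not stalled.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    # The pipeline is unused only when all threads are stalled at once.
    return 1.0 - stall_fraction ** threads_per_core


if __name__ == "__main__":
    stall = 0.75  # hypothetical: each thread waits on memory 75% of the time
    for n in (1, 2, 4):
        print(f"{n} thread(s) per core: {pipeline_utilization(stall, n):.2f}")
```

Under these assumptions, a single thread that stalls 75% of the time keeps the pipeline busy only 25% of the time, while four such threads sharing a core keep it busy roughly 68% of the time. In both cases mpstat would report every one of those threads as busy.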

Understanding processor utilization: On a T1 processor, an idle thread and an idle core are two different things and need to be understood separately. Here are some commonly asked questions in this regard:

As with any other system, on a Sun Fire T2000 more threads become busy and core utilization rises as the load increases. Because thread saturation (i.e. virtual CPU saturation) and core saturation are two different aspects of system utilization, both need to be monitored simultaneously. If an application saturates a core while using only some of its threads, applying additional load to the system will not deliver any more throughput. If, on the other hand, all the threads are saturated but core utilization still shows headroom, the application is stalling and has a high CPI (cycles per instruction). Application-level tuning, partitioning resources with processor sets (psrset(1M)), or binding LWPs (pbind(1M)) are techniques that may improve performance in such cases.
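The two cases above can be written down as a small decision rule. This is a minimal sketch under stated assumptions: `thread_busy_pct` is taken to be the average busy percentage of the virtual CPUs (for example 100 minus the idle column of mpstat), `core_util_pct` is a core utilization figure obtained separately from hardware counters, and the 90% threshold and the returned labels are hypothetical choices made for illustration.

```python
def classify_utilization(thread_busy_pct: float,
                         core_util_pct: float,
                         high: float = 90.0) -> str:
    """Classify a load point from thread and core utilization percentages.

    `high` is an illustrative saturation threshold, not a recommended value.
    """
    threads_saturated = thread_busy_pct >= high
    core_saturated = core_util_pct >= high

    if core_saturated and threads_saturated:
        # Nothing left to give on either axis.
        return "saturated"
    if core_saturated:
        # Core is full while some threads are still idle:
        # extra load is unlikely to add throughput.
        return "core-bound"
    if threads_saturated:
        # All threads busy but the core has headroom: the application
        # stalls a lot (high CPI); tuning, psrset or pbind may help.
        return "stall-bound"
    return "headroom"


if __name__ == "__main__":
    # Hypothetical readings: (thread busy %, core utilization %)
    for sample in [(50.0, 95.0), (100.0, 40.0), (30.0, 20.0)]:
        print(sample, "->", classify_utilization(*sample))
```

Keeping the rule as a pure function of two numbers makes it easy to apply to each sampling interval, so a drift from "headroom" toward "core-bound" or "stall-bound" can be spotted as the load increases.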




This document is Copyright (c) 2008 Stefan Parvu
Document License: PDL