I’ve been doing some work that involves counting CPU cycles, so I wrote a little test program to get baselines for how many cycles we should expect certain constructs to consume. It does the following:

read cycle counter
if (integer compare) {
	read counter again
	print difference
}
else {
	read counter again
	print difference
}
repeat the whole thing
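For anyone who wants to try something similar, here is a minimal C sketch of the outline above. It assumes GCC or Clang on x86, where `__rdtsc()` from `<x86intrin.h>` reads the time-stamp counter; on non-x86 builds it falls back to `clock_gettime`, which reports nanoseconds rather than cycles. The names `read_counter`, `time_one_pass`, and `run_trials` are hypothetical. Keep in mind that RDTSC is not a serializing instruction, and on many x86 parts the TSC ticks at a fixed rate rather than counting core clocks, so treat the numbers as approximate.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in the non-x86 fallback */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the x86 time-stamp counter (GCC/Clang intrinsic). */
static inline uint64_t read_counter(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumed fallback for non-x86 builds: nanoseconds, NOT cycles. */
static inline uint64_t read_counter(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* One pass of the outline: read the counter, branch on an integer compare,
   read the counter again inside whichever branch ran, and return the
   difference. *took_if records which branch that was. */
static uint64_t time_one_pass(int a, int b, int *took_if) {
    assert(took_if != NULL);
    volatile int x = a, y = b;  /* volatile so the compare is not folded away */
    uint64_t start = read_counter();
    uint64_t end;
    if (x == y) {
        end = read_counter();
        *took_if = 1;
    } else {
        end = read_counter();
        *took_if = 0;
    }
    return end - start;
}

/* Repeat the pass n times, store each delta, and print it, so the first
   iteration can be compared with the later ones. */
static void run_trials(int a, int b, uint64_t *deltas, int n) {
    for (int i = 0; i < n; i++) {
        int took_if;
        deltas[i] = time_one_pass(a, b, &took_if);
        printf("iteration %d (%s branch): %llu\n", i + 1,
               took_if ? "if" : "else", (unsigned long long)deltas[i]);
    }
}
```

Calling `run_trials(1, 1, deltas, 2)` exercises the `if` path and `run_trials(1, 2, deltas, 2)` the `else` path, printing the first and second iteration of each.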

The results are not quite what you might expect. Read on for the gory details if you’re into that sort of thing.

The less interesting of my two observations is that the results differ slightly from one Pentium 4 machine to another, which probably hints at differences between the cores. The more interesting one is that, while the first iteration always takes the same number of cycles, the second can vary quite a bit, especially when the integer compare fails and the else branch runs. For example, on one machine the first iteration takes 100 cycles, while the second usually takes anywhere from 92 to 120 cycles and occasionally as many as 500.
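To put numbers on that spread across many runs, one option is to collect the second-iteration delta from each run and summarize the batch. The sketch below uses a hypothetical `summarize_deltas` helper that reports the minimum, maximum, and median; the median is less sensitive than the mean to an occasional 500-cycle outlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of a batch of cycle deltas. */
typedef struct {
    uint64_t min, max, median;
} delta_summary;

/* qsort comparator for uint64_t values, written to avoid overflow. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place and returns min, max, and median
   (the lower median when n is even). */
static delta_summary summarize_deltas(uint64_t *samples, size_t n) {
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    delta_summary s = { samples[0], samples[n - 1], samples[(n - 1) / 2] };
    return s;
}
```

Sorting in place keeps the helper short; copy the array first if the original order of the samples matters.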

The differences are way too small to be anything like an interrupt. It happens even with hyperthreading turned off, so that’s clearly not it either. Most other explanations, such as cache misses or memory-bus contention (on an otherwise idle system?), fail to account for the fact that the variation always shows up in the second iteration while the first remains rock solid. For now, I’m at a loss to explain this anomaly.