Saturday, October 10, 2009

Quantifying the NetWatch performance hit

I was going through the Makefiles in preparation for adding 440BX support (for QEMU and Crashbox), and decided to skim the README, since I think I made Jacob write it, and I'd never read it in much detail! In the README, he said:

Because NetWatch is invisible to the OS, its CPU usage is difficult to monitor; we do so by comparing the MD5 throughput of the system with NetWatch running versus without. The only way that the OS could detect this performance drain is by spinning tightly and watching for a sudden jump in the CPU's time stamp counters.
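To make that last sentence concrete, here is a rough sketch of what the OS-side detection might look like. It is not from the NetWatch tree. It assumes the OS has already filled a buffer with back-to-back TSC readings from a tight loop (on x86 with GCC or Clang, `__rdtsc()` from `<x86intrin.h>` would be one way to get them), and the name `largest_gap` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan a buffer of consecutive TSC readings and return
 * the largest jump between two neighbouring samples. In a tight spin loop the
 * gaps are normally tiny and uniform, so one gap that is orders of magnitude
 * larger suggests the CPU was taken away (for example, by an SMI).
 */
uint64_t largest_gap(const uint64_t *ticks, size_t n)
{
    uint64_t worst = 0;

    for (size_t i = 1; i < n; i++) {
        /* Unsigned subtraction stays correct even across a counter wrap. */
        uint64_t gap = ticks[i] - ticks[i - 1];
        if (gap > worst)
            worst = gap;
    }
    return worst;
}
```

Recording first and analysing afterwards keeps the sampling loop as tight as possible, which matters when the whole point is to spot an unusually long gap.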

I had considered this to be a problem before (indeed, when the system is actively doing a lot of VNC work, not only does processing power noticeably diminish, but the machine "feels" laggy!), but I never really had a mechanism for quantifying it. Rereading Jacob's notes, though (in particular, the bit about the OS reading the TSC), I realized that the OS isn't the only one that could use the TSC to quantify NetWatch's performance hit. If we were clever, we could read the TSC ourselves each time we enter and leave SMM. Since we come in every 64 ms or so, there's no realistic chance of the TSC wrapping between samples and giving an inaccurate result (it's 64 bits; to roll over in that timeframe would require a clock of 2^64 ticks / 0.064 s, or about 288,230 petahertz!).

The procedure, then, would be to count the ticks from the moment we last left SMM to the moment we next enter it, and call that the time spent by the OS. Then we count the ticks from entering SMM to leaving it, and call that the time spent by NetWatch. This should be highly accurate (although all I really care about is a percentage with two significant figures).
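As a minimal sketch of that bookkeeping (nothing here exists in NetWatch yet; the struct and function names are invented for illustration, and the TSC value is passed in as a parameter so the arithmetic can be checked without actually being in SMM):

```c
#include <stdint.h>

/* Hypothetical accounting state; none of these names come from NetWatch. */
struct smm_accounting {
    uint64_t last_exit;  /* TSC value when we last left SMM */
    uint64_t entered;    /* TSC value at the current SMM entry */
    uint64_t os_ticks;   /* running total of ticks spent outside SMM */
    uint64_t smm_ticks;  /* running total of ticks spent inside SMM */
    int primed;          /* nonzero once last_exit holds a real sample */
};

/* Call at the top of the SMI handler with a fresh TSC reading. */
void smm_account_enter(struct smm_accounting *a, uint64_t tsc_now)
{
    /* On the very first entry there is no previous exit to measure from. */
    if (a->primed)
        a->os_ticks += tsc_now - a->last_exit;
    a->entered = tsc_now;
}

/* Call just before leaving SMM with a fresh TSC reading. */
void smm_account_exit(struct smm_accounting *a, uint64_t tsc_now)
{
    a->smm_ticks += tsc_now - a->entered;
    a->last_exit = tsc_now;
    a->primed = 1;
}

/* Share of all counted ticks spent in SMM, in hundredths of a percent. */
uint32_t smm_share_bp(const struct smm_accounting *a)
{
    uint64_t total = a->os_ticks + a->smm_ticks;

    if (total == 0)
        return 0;
    /* Multiply first while that cannot overflow; otherwise scale the divisor. */
    if (total <= UINT64_MAX / 10000)
        return (uint32_t)(a->smm_ticks * 10000 / total);
    return (uint32_t)(a->smm_ticks / (total / 10000));
}
```

In the real handler, `tsc_now` would presumably come from an `rdtsc` as early as possible on entry and as late as possible on exit, so that as little of NetWatch's own time as possible gets billed to the OS. Integer hundredths of a percent are plenty for two significant figures and avoid needing floating point in SMM.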

I should note, by the way, that this isn't a generalized solution for measuring this sort of performance problem on "real systems". It only works in NetWatch because the CPU clock is guaranteed not to change, for two reasons: for one, the system is too old to have SpeedStep, and for the other, NetWatch clobbers the existing ACPI implementation, which prevents Linux's ACPI support from coming up and changing the CPU clock even if that were possible on this system.

On newer systems, the right answer would probably be to use the HPET, but on ICH2, the HPET simply isn't present! Also, I believe the HPET requires setup and board-specific probing, which real OSes do through ACPI.

Hopefully I'll get a chance to play with this tomorrow and come up with some numbers.
