Saturday, October 10, 2009

Quantifying the NetWatch performance hit

I was going through the Makefiles in preparation for adding 440BX support (for QEMU and Crashbox), and decided to skim the README, since I think I made Jacob write it, and I'd never read it in much detail! In the README, he said:

Because NetWatch is invisible to the OS, its CPU usage is difficult to monitor; we do so by comparing the MD5 throughput of the system with NetWatch running versus without. The only way that the OS could detect this performance drain is by spinning tightly and watching for a sudden jump in the CPU's time stamp counters.
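
For the curious, the "spinning tightly" detection Jacob describes would look something like the sketch below -- it's purely illustrative and not anything in the NetWatch tree; the 100,000-cycle threshold is arbitrary, and on a real OS you'd also catch scheduler preemptions and ordinary interrupts, not just SMIs.

```c
/* Illustrative only: an OS-level spin loop that looks for sudden TSC jumps.
 * A gap of hundreds of thousands of cycles between back-to-back RDTSCs means
 * we lost the CPU to something -- the scheduler, an interrupt, or an SMI. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint64_t last = rdtsc();
	for (;;) {
		uint64_t now = rdtsc();
		if (now - last > 100000)	/* arbitrary threshold */
			printf("lost %llu cycles\n",
			       (unsigned long long)(now - last));
		last = now;
	}
}
```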

I had considered this to be a problem before (indeed, when the system is actively doing a lot of VNC work, not only does processing power notably diminish, but the machine "feels" laggy!), but I never really had a mechanism by which to quantify the issue. Rereading Jacob's notes -- in particular, the bit about the OS reading the TSCs -- I realized that the OS isn't the only one that could use the TSCs to quantify NetWatch's performance impact. If we were clever, we could read the TSC each time we enter and leave SMM. Since we come in every 64 msec or so, it's unlikely that the TSC will overflow and give an inaccurate result (it's 64 bits; rolling over in that timeframe would require a clock of 2^64 / 0.064 s, or roughly 288,230 petahertz!).

The procedure, then, would be to measure the number of ticks from when we last left SMM to when we next enter it, and call that the amount of time spent by the OS. Then, the number of ticks from entering to leaving SMM is the amount of time spent by NetWatch. This should be highly accurate (although the number I'd really care about is a percentage with two significant figures).
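
A rough sketch of what I have in mind follows -- this isn't code from the NetWatch tree, the perfcount_* names are made up, and it assumes the TSC frequency stays constant (more on that below) and that the SMI dispatcher has convenient entry and exit hook points. The percentage is kept as an integer in hundredths of a percent to avoid doing floating point inside SMM.

```c
/* Sketch of SMM-side accounting with the TSC; the perfcount_* names are invented. */
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

static uint64_t last_exit_tsc;		/* TSC value when we last left SMM */
static uint64_t smi_entry_tsc;		/* TSC value when we entered this SMI */
static uint64_t ticks_os;		/* total ticks spent outside SMM */
static uint64_t ticks_netwatch;		/* total ticks spent inside SMM */

/* Call this first thing in the SMI handler. */
void perfcount_smi_enter(void)
{
	smi_entry_tsc = rdtsc();
	if (last_exit_tsc)		/* skip accounting on the very first SMI */
		ticks_os += smi_entry_tsc - last_exit_tsc;
}

/* Call this just before RSM. */
void perfcount_smi_exit(void)
{
	last_exit_tsc = rdtsc();
	ticks_netwatch += last_exit_tsc - smi_entry_tsc;
}

/* NetWatch overhead in hundredths of a percent (e.g. 342 means 3.42%),
 * so we never need floating point inside SMM. */
uint32_t perfcount_overhead(void)
{
	uint64_t total = ticks_os + ticks_netwatch;
	return total ? (uint32_t)((ticks_netwatch * 10000) / total) : 0;
}
```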

I should note, by the way, that this isn't a generalized solution for other such performance problems on "real systems". It only works in NetWatch because the CPU clock is guaranteed not to change, for two reasons -- for one, the system is too old to have SpeedStep, and for the other, NetWatch clobbers the existing ACPI implementation, preventing Linux's ACPI support from coming up and changing the CPU clock, even if that were possible on this system.

On newer systems, the right answer would probably be to use the HPET, but on ICH2, the HPET simply isn't present! Also, I believe the HPET requires setup and board-specific probing; for real OSes, that discovery is done through ACPI.
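
For reference, on a chipset that does have an HPET, reading it would look roughly like this -- a purely hypothetical sketch, assuming the MMIO base has already been found (typically from the ACPI "HPET" table, often at 0xFED00000) and mapped; the register offsets come from the HPET specification.

```c
/* Hypothetical sketch for a chipset that actually has an HPET; NetWatch's
 * ICH2 target does not.  Offsets are from the IA-PC HPET specification.
 * Some implementations only allow 32-bit accesses, in which case the
 * 64-bit reads below would need to be split. */
#include <stdint.h>

#define HPET_GENERAL_CAPS	0x000	/* bits 63:32 = tick period, femtoseconds */
#define HPET_GENERAL_CONFIG	0x010	/* bit 0 = ENABLE_CNF */
#define HPET_MAIN_COUNTER	0x0F0

static volatile uint8_t *hpet_base;	/* mapped MMIO base, e.g. 0xFED00000 */

static uint64_t hpet_read(unsigned int off)
{
	return *(volatile uint64_t *)(hpet_base + off);
}

static void hpet_write(unsigned int off, uint64_t val)
{
	*(volatile uint64_t *)(hpet_base + off) = val;
}

/* Main counter tick period, in femtoseconds. */
uint32_t hpet_period_fs(void)
{
	return (uint32_t)(hpet_read(HPET_GENERAL_CAPS) >> 32);
}

/* Make sure the main counter is running, then read it. */
uint64_t hpet_now(void)
{
	hpet_write(HPET_GENERAL_CONFIG, hpet_read(HPET_GENERAL_CONFIG) | 1);
	return hpet_read(HPET_MAIN_COUNTER);
}
```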

Hopefully I'll get a chance to play with this tomorrow and come up with some numbers.

Saturday, October 3, 2009

Reanimation

Over the summer, Jacob did a lot of important work to make NetWatch run on AMD64 and to add a GDB slave to it. Back at CMU, I've spent the last few hours reanimating NetWatch on ICH2. I have now pushed changes to the Git repository; it should build again! Of note, NetWatch can now display all of the registers that SMM provides, and the backtrace is quite a bit more robust. I have not yet integrated the GDB support into the ICH2 build.

Also on an exciting note, NetWatch had its first application as a real debugger recently. Sully, in the course of his 15-412 project (a network driver for Pebbles), found a case in which the driver worked in QEMU but instantly grenaded on hardware. Luckily, his 412 test system is the same as the NetWatch test system, so we booted his kernel under NetWatch and got a backtrace when the fireworks came. Sure enough, the backtrace pointed exactly to where the system blew up! Sully shook my hand for producing a useful tool, and punched me for doing it with System Management Mode.

On a final note, I am considering writing a paper on NetWatch. Stay tuned.