NetWatch Development
Blog by Joshua (http://www.blogger.com/profile/15663697470213886945); feed last updated 2024-03-08.

Quantifying the NetWatch performance hit (2009-10-10 04:45 -04:00)
I was going through the Makefiles in preparation for adding 440BX support (for QEMU and Crashbox), and decided to skim the README, since I think I made Jacob write it, and I'd never read it in much detail! In the README, he said:<br /><br /><tt>Because NetWatch is invisible to the OS, its CPU usage is difficult to monitor; we do so by comparing the MD5 throughput of the system with NetWatch running versus without. The only way that the OS could detect this performance drain is by spinning tightly and watching for a sudden jump in the CPU's time stamp counters.</tt><br /><br />I had considered this to be a problem before (indeed, when the system is actively doing a lot of VNC work, not only does processing power notably diminish, but the machine "feels" laggy!), but never really had a mechanism by which to quantify the issue. But rereading Jacob's notes -- in particular, the OS reading the TSCs -- I realized that the OS isn't the only one that could use the TSCs to quantify NetWatch performance. If we were clever, we could read the TSC when we enter and leave SMM each time. Since we come in every 64msec or so, it's unlikely that the TSC will overflow and provide an inaccurate result (it's 64 bits; to roll over in that timeframe would require a 288230 petahertz clock!).<br /><br />The procedure, then, would be to measure the number of ticks from when we last left SMM to when we next enter SMM, and call that the amount of time spent by the OS. Then, we use the number of ticks from entering to leaving, and that's the amount of time spent by NetWatch.
This should be highly accurate (although the number I'd really care about is a percentage with two significant figures).<br /><br />I should note, by the way, that this isn't a generalized solution for other such performance problems on "real systems". This only works in NetWatch because the CPU clock is guaranteed not to change, for two reasons -- for one, the system is too old to have SpeedStep, and for the other, NetWatch clobbers the existing ACPI implementation, preventing Linux ACPI from coming up and changing the CPU clock, even if it were possible on this system.<br /><br />On newer systems, probably the right answer would be to use the HPET, but on ICH2, the HPET simply isn't present! Also, I believe the HPET requires setup and board-specific probing; for real OSes, this is done through ACPI.<br /><br />Hopefully I'll get a chance to play with this tomorrow and come up with some numbers.
Posted by Joshua

Reanimation (2009-10-03 02:59 -04:00)
Over the summer, Jacob did a lot of important work to make NetWatch run on AMD64 and to add a GDB slave to NetWatch. Back at CMU, I've spent the last few hours reanimating NetWatch on ICH2. I have now pushed changes to the Git repository; it should build again! Of note, NetWatch can now display all of the registers that SMM provides, and the backtrace is now quite a bit more robust. I have not integrated the GDB support into the ICH2 build yet.<br /><br />Also on an exciting note, NetWatch had its first application as a real debugger recently. Sully, in the course of his 15-412 project (a network driver for Pebbles), found a case in which the driver worked in QEMU but instantly grenaded on hardware.
Luckily, his 412 test system is the same as the NetWatch test system, so we booted his kernel in NetWatch, and got a backtrace when the fireworks came. Sure enough, the backtrace pointed exactly to where the system blew up! Sully shook my hand for producing a useful tool, and punched me for doing it with System Management Mode.<br /><br />On a final note, I am considering writing a paper on NetWatch. Stay tuned.
Posted by Joshua

text rendering (2008-12-14 20:53 -05:00)
<img src="http://nyus.joshuawise.com/netwatch-bx.png">
Posted by Joshua

Generally, the wisdom is... (2008-12-13 19:46 -05:00)
Generally, the wisdom when you say, "I need a <span style="font-style:italic;">foo</span>" for some common value of <span style="font-style:italic;">foo</span> is to <span style="font-weight:bold;">not</span> write your own, but instead use a preexisting one.<br /><br />But we would like to offer some new wisdom. The first step, we believe, when you want an IP stack is to <span style="font-weight:bold;">burn lwIP to the fucking ground.</span>
Posted by Joshua

Keyboard (2008-12-12 20:47 -05:00)
Jacob and I had a mini-sprint this afternoon, and figured out some things about keyboards, magic numbers, i8042s, UNGET commands (which make life easier), keyboard performance, linking VNC into the keyboard routines, and grub crashes.
Major progress has been made in almost all of those areas, except for the magic numbers area, in which more magic is there.<br /><br />Tomorrow: differential updates for better user performance while running X, and text console framebuffer emulation. ... and P3 style files.<br /><br />Tonight: sleep.<br /><br />Code, as always, is in git.
Posted by Joshua

Big changes (2008-12-07 04:51 -05:00)
Big news recently. A week or so ago, Jacob got the VNC/RFB server talking to a client. Performance wasn't great, but we had output. Now, we've done two things that make a huge difference in terms of performance:<br /><br /><ul><li>The network driver no longer waits for packets to finish getting sent or received, and queues up packets as needed, like modern network drivers do. We actually use the full capabilities of the card's bus-master DMA interface, and boy howdy does it pay off; we actually get ping performance in the 'as expected' region, where all pings are handled essentially under 64msec.</li><br /><li>The big win, though, came from something very silly -- turning on the cache! You're supposed to keep SMRAM cacheless when you're outside of SMM, so that the user cannot get in your way and fill your cache with bogons (and hence have you crash in SMM). But if you want performance, you need to turn it on when you get in SMM. Finally, we can actually get the blazing fast speed that our 1GHz P3 promises. This brought us from about 3 minutes per frame to 15 seconds per frame; and the network driver performance improvements are incrementally helping that.</li></ul><br /><br />This is very exciting. VNCviewer now claims that we get good enough throughput to request hextile encoding (instead of ZRLE encoding) -- it thinks we're getting 4.5MByte/sec from the machine.
Of course, we haven't implemented hextile (or ZRLE), but... It may be time to declare this performance fun to have come to an end, at least until we get mouse and keyboard support implemented.
Posted by Joshua

Framebuffer access (2008-11-25 04:04 -05:00)
OK, we now have unaccelerated access to the framebuffer. Maybe DMA soon if I have time. j4cbo is making good progress on RFB. In the interim, here's a picture for you, showing the venerable test phrase given to bootstrap all graphic demos that I've produced to date:<br /><br /><a href="http://nyus.joshuawise.com/netwatch-ass.scale.jpg"><img src="http://nyus.joshuawise.com/netwatch-ass.480.jpg"></a><br /><br />As usual, code is in <a href="http://git.joshuawise.com/?p=netwatch.git;a=summary">Git</a>.
Posted by Joshua

(untitled post, 2008-11-22 02:45 -05:00)
<pre><tt>/****************************************************************************\<br />* *<br />* The video arbitration routines calculate some "magic" numbers. Fixes *<br />* the snow seen when accessing the framebuffer without it. *<br />* It just works (I hope). *<br />* *<br />\****************************************************************************/</tt></pre><br /><br />-- linux/drivers/video/nvidia/nv_hw.c
Posted by Joshua

Update (2008-11-03 05:01 -05:00)
We've started work on getting lwIP integrated in, but discovered that our old putting-everything-in-ASEG scheme doesn't really work with the amount of code and data we have now.
So of course, we're going to use paging in SMM to spread the code across both ASEG and TSEG. As of tonight, the basic procedure of enabling paging works, but trying to read or write video RAM causes the system to hang. We're getting closer, though...
Posted by Jacob Potter

packets, huh? (2008-10-06 00:23 -04:00)
<a href="http://nyus.joshuawise.com/tmp/netwatch-3.scale.jpg"><img src="http://nyus.joshuawise.com/tmp/netwatch-3.blogscale.jpg"></a>
Posted by Joshua

3c905: transmitting! (2008-10-05 22:43 -04:00)
It turns out that we'd overlooked something about the northbridge's SMM handling: the memory controller blocks PCI DMA access to the SMM segments just like it prevents the CPU from reaching them. This explains why sending packets was failing before; at some point shortly after sending the first packet, the network card would attempt to access the no-longer-open frame descriptor, and be told "no!"<div><br /></div><div>Now that we've set TSEG to be open all the time, packet sending works reliably under both Linux and Pebbles, and the PCI bothering code still does what it's supposed to. The next step is to start sending out screen dumps...</div><div><br /></div><div>This brought up an interesting race condition. Suppose we have some device that we set up to repeatedly attempt to DMA our code into SMM. Most of the time, this will fail.
However, at some point, an SMI will occur (and if not, we can enable some more traps); if the DMA happens to occur while the northbridge is in SMM, then shazam...</div>
Posted by Jacob Potter

This driver would receive 0 style points if I were grading it (2008-10-05 02:23 -04:00)
Magic constants abound ... even when they're #defined previously. Comments like:<br /><pre><tt> /** failed after RETRY attempts **/<br /> outputf("Failed to send after %d retries", retries);<br /></tt></pre> Not just bad indentation (i.e., not what I want), but <b>inconsistent</b> indentation. Committing code, too, that you know to be wrong:<br /><pre><tt> /** I don't know what MII MAC only mode is!!! **/</tt></pre><br /><br />The Linux driver, mind you, isn't much better:<br /><pre><tt>/* Update statistics.<br /> Unlike with the EL3 we need not worry about interrupts changing<br /> the window setting from underneath us, but we must still guard<br /> against a race condition with a StatsUpdate interrupt updating the<br /> table. This is done by checking that the ASM (!) code generated uses<br /> atomic updates with '+='.<br /> */<br /></tt></pre><br /><br />By the end of this, this driver will be mostly completely rewritten.
And that's fine with me.
Posted by Joshua

3c905 status (2008-10-05 01:08 -04:00)
After a few embarrassing bugs, I have the network card initializing for me, and giving me its MAC address:<br /><tt><pre>returned 0x2a<br />NetWatch running<br />Probing PCI device: 3c905c-tpo<br />3c90x: Picked I/O address ec00<br />EEPROM adr 00, data 00b0 d097 bcac 9200<br />EEPROM adr 04, data 00ca 0000 0000 6d50<br />EEPROM adr 08, data 2940 0000 00b0 d097<br />EEPROM adr 0c, data bcac 0020 0000 00aa<br />MAC Address = 00:b0:d0:97:bc:ac<br />Connectors present:<br />10Base-T / 100Base-TX<br />.<br />found 3c90x, hopefully!<br /></pre></tt><br /><br />I also have it transmitting packets; it manages to squeeze two packets down the wire before it hangs up and never talks to me again (and worse, sits in an infinite loop in system management mode). These sorts of hangs were pretty damn hard to diagnose last time I did this sort of stuff; hopefully I'll not be in too big trouble here.<br /><br />It occurs to me that I might want better logging for my outputs; a serial logger might be nice.
Maybe we should get a USB serial dongle to go with the 412mac, which is currently acting as a nice companion box that runs tcpdump, etc.
Posted by Joshua

Good news! (2008-10-02 17:44 -04:00)
<a href="http://www.youtube.com/watch?v=0oV1ZsJZhOc">I just got new pictures of the Dacia Sandero!</a><br /><br />The even better news is that we found a 3c905 driver that we can pretty much just fully grab, since it has a couple of interesting features -- in particular, it's BSD-compatible (yes!), it's polling-driven (yes!), it's not Linux (yes!), and it doesn't use <tt>malloc()</tt> (yes!).<br /><br />So, uh, Thanks Etherboot. <a href="http://en.wikipedia.org/wiki/Look_Around_You">Thetherboot</a>. <a href="http://www.virtualbox.de/browser/trunk/src/VBox/Devices/PC/Etherboot-src/drivers/net/3c90x.c?rev=1">Here is the driver that we plan to grab</a>, anyway; we'll see what happens tonight.
Posted by Joshua

Presentation uploaded (2008-09-26 02:31 -04:00)
The presentation given in lecture is now accessible from the web:<br /><br /><a href="http://b.j4cbo.com/temp/netwatch.pdf">NetWatch lecture slides</a>
Posted by Joshua

Many changes (2008-09-26 00:33 -04:00)
Many changes were made tonight, with quite a few exciting new bits of functionality.
But, before I go into detail on any of those, let me start off with this:<br /><br /><pre><tt>[ 3.193058] e100: eth0: e100_probe: addr 0xfbfff000, irq 10, MAC addr 00:02:b3:36:7d:72<br />[ 3.193229] 3c59x 0000:02:0c.0: enabling device (0000 -> 0003)<br />[ 3.193300] PCI: Found IRQ 11 for device 0000:02:0c.0<br />[ 3.193386] 3c59x: Donald Becker and others.<br />[ 3.193451] 0000:02:0c.0: 3Com PCI 3c905C Tornado at e0814000.<br />[ 3.314350] *** EEPROM MAC address is invalid.<br />[ 3.314411] 3c59x: vortex_probe1 fails. Returns -22<br />[ 3.314479] 3c59x: probe of 0000:02:0c.0 failed with error -22<br /></tt></pre><br /><br />We have, indeed, successfully bothered the 3c905 network card to cause it to go away and get off our bus. It wasn't the cleanest solution, but it did work. It turns out that if you set the BARs on a card to garbage, Linux will set them to non-garbage on card probe, and since we can't trap on 0xCF8/0xCFC accesses, those were *two* of the methods that we planned to use to bother this card gone. But, Linux will surely listen to you if you ask often enough; so, we asked the card every 64msec to stop responding to requests. This creates a race; it had better be the case that Linux does PCI enumeration more than 64msec before it loads the 3c905 driver, or else we will be hosed. <br /><br />I'm not sure if there <span style="font-weight:bold;">is</span> a clean way to do this; suggestions would be appreciated. This works, at least, for the time being.<br /><br />There were many more things that happened tonight, though. In particular: <ul><br /><li>SMI handlers now have enable routines -- instead of poking the hardware directly, you can <tt>smi_register_handler(...); smi_enable_event(...);</tt>. Much cleaner code, much more portable.</li><li>Structured packet support was added. If you put some magic values in the registers, then poke the GBL_RLS (BIOS request) bit on the southbridge like an ACPI handler would do, then you can make requests of NetWatch. 
You can even give it a logical address, and it'll do lookups just as if it were the kernel! Blame Jacob for that... ;)</li><li>We can now scan and probe devices on the PCI bus, with a relatively abstract method.</li><li>Output routines were unified -- now instead of deciding whether to call <tt>puts</tt> or <tt>dolog</tt>, you can call the function pointers <tt>output</tt> and its formatted friend, <tt>outputf</tt>, and you'll get the Right Thing no matter whether you are in the loader or the aseg code itself.</li></ul><br /><br />The code is now much cleaner, and more extensible. Look forward to lwIP coming this weekend, or at least a small network driver.
Posted by Joshua

quick thought (2008-09-21 04:56 -04:00)
Also, we need to unify logging. Having dolog() from some contexts, and sprintf()/puts() from others is not so cool. Also, we need to figure out a way to split the build environments out, and put object files in the right places for the right build environments -- right now, there are three build environments (Linux, grubload, and in-SMM, with grubload being very similar to in-SMM, but for the bigger console). A full unified build system that would allow you to 'make' from the root would be cool too.
Posted by Joshua

Much-needed dirty work (2008-09-21 04:33 -04:00)
Okay, some much needed dirty work has been done. I moved a bunch of logic out to smi.c, and put together a lot of defines in reg-82801b.h as created from Intel's documentation.
There are a few subregisters that need to be trawled, and I need to work out a system to register interest in a type of event, and have a callback for it.<br /><br />Of note also is that we now have 'printf' and 'dologf', which should eliminate disgusting blocks of code like this:<br /><pre><tt> strcpy(s, "READxxxxxxxxxxxxxxxx");<br /> tohex(s+4, cts);<br /> b = inb(cts & 0xFFFF);<br /> tohex(s+12, b);<br /> dolog(s);</tt></pre><br />To be replaced with:<br /><pre><tt> b = inb(cts & 0xFFFF);<br /> dologf("READ: %08x (%02x)", cts, b);</tt></pre><br />This gives much cleaner code, and much cleaner output, too!<br /><br />I'm undecided on this whole 'no magic number' business again. On one hand, things like this: <tt>return pci_read32(0, 31, 0, 0x40) & 0xFF80;</tt> do look pretty magic and hard to maintain; indeed, it has even been the source of a bug, since '31' was originally typoed as '21'! But on the other hand, replacing with named constants is pretty ugly too; my RSI is acting up just thinking about this: <tt>return pci_read32(ICH2_LPC_BUS, ICH2_LPC_DEV, ICH2_LPC_FN, ICH2_LPC_PCI_PMBASE) & ICH2_PMBASE_MASK;</tt>. Yikes.<br /><br />So I guess the next step after this stuff gets cleaned up is to start trying to hide the NIC. Since PCI interception doesn't work, I guess we get to turn off the BARs, and hope that Linux does not turn them back on for us. We shall see...
Posted by Joshua

Good news and bad news (2008-09-13 20:50 -04:00)
The good news is that we now know how to trap on arbitrary I/O reads/writes. We successfully have it wired up such that when you press escape, and when the OS goes to read the escape key, the system reboots, which is great fun for annoying <tt>vi</tt> users.
We have both the 'special' traps (keyboard/mouse) and normal traps (arbitrary I/O ports) working.<br /><br />Now, the bad news is that 32-bit accesses to the PCI config I/O ports 0xCF8/0xCFC are caught by the memory controller hub (MCH), which comes before the ICH, and that means that we can't trap on those accesses. So, we can't get in the way there, which kind of sucks. I was trying to get in the way by trapping on the IO ports allocated to the 3c905, but it turns out that's also bound to memory mapped regions; I think that if we intercept PCI before Linux starts and deconfigure the BARs for the MMIO registers on the 3c905, then we can force it to use the I/O ports (and subsequently force Linux not to see them). Perhaps better, we could just deconfigure both BARs and only configure them when we're in the SMM handler and we want to talk to the card... hmmm...
Posted by Joshua

Pre-sleep updates (2008-09-11 00:23 -04:00)
<ul><li>System now boots with Jacob's ELF loader, and returns with my real mode code to reboot into grub, and can do full Linux boot process with SMM monitor.</li><br /><li>SMM monitor currently breaks ACPI, because it is used for APM ports (0xB2/B3) -- can we fake it well enough, or do we have to chain the old one? (ewww)</li><br /><li>We have a log console for debug messages that shows up at the top right of the screen</li><br /></ul><br /><br />So, both "immediate todos" are complete. Now need new immediate todos... to be discussed tomorrow if we have time.
Refactoring is imminent.<br /><br />The git repository is now publicly accessible: <a href="http://git.joshuawise.com/?p=netwatch.git;a=summary">NetWatch git</a><br /><br />Photo of the day:<br /><br /><a href="http://nyus.joshuawise.com/tmp/netwatch-1.jpg"><img src="http://nyus.joshuawise.com/tmp/netwatch-1.scale.jpg" alt="Linux running"></a>
Posted by Joshua

Status update (2008-09-10 16:42 -04:00)
Current status as of first update:<br /><br /><ul><li>We can open up aseg and write to it on our test machine, ainfvec (ICH2).</li><br /><li>We can trigger and retrigger SMIs on ainfvec.</li><br /><li>We have working aseg code that writes to video memory from C space.</li><br /><li>We have PCI write skeletons for Linux and raw I/O.</li><br /><li>We have a working ELF loader that 'raw loads' ELFs into memory from inside Linux.</li><br /><li>We have a multiboot kernel that can be loaded from GRUB, using Joshua's p1 console.c. Does nothing more than print parameters and spin.</li><br /><li>Build system surely needs work.</li><br /><li>Modularity needs work.</li><br /></ul><br /><br />Immediate todos:<br /><ul><li>Get grubload to return to GRUB later.</li><br /><li>Load ELF/SMM code from grubload.</li></ul><br /><br />Later todos relating to current code:<br /><ul><li>Refactor machine specific/northbridge specific code into users of a PCI module.</li><br /><li>Have PCI code scan bus and enumerate cards.</li></ul><br /><br />Code available:<br /><tt>git clone http://nyus.joshuawise.com/storage/git/netwatch.git</tt>
Posted by Joshua