       The Spidernet Device Driver
       ===========================

Written by Linas Vepstas <linas@austin.ibm.com>

Version of 7 June 2007

Abstract
========
This document sketches the structure of portions of the spidernet
device driver in the Linux kernel tree. The spidernet is a gigabit
ethernet device built into the Toshiba southbridge commonly used
in the SONY Playstation 3 and the IBM QS20 Cell blade.

The Structure of the RX Ring.
=============================
The receive (RX) ring is a circular linked list of RX descriptors,
together with three pointers into the ring that are used to manage its
contents.

The elements of the ring are called "descriptors" or "descrs"; they
describe the received data. This includes a pointer to a buffer
containing the received data, the buffer size, and various status bits.

There are three primary states that a descriptor can be in: "empty",
"full" and "not-in-use". An "empty" or "ready" descriptor is ready
to receive data from the hardware. A "full" descriptor has data in it,
and is waiting to be emptied and processed by the OS. A "not-in-use"
descriptor is neither empty nor full; it is simply not ready. It may
not even have a data buffer in it, or may be otherwise unusable.

On device startup, the OS (specifically, the spidernet device driver)
allocates a set of RX descriptors and RX buffers. These are all marked
"empty", ready to receive data. This ring is handed off to the
hardware, which sequentially fills in the buffers, and marks them
"full". The OS follows up, taking the full buffers, processing them,
and re-marking them empty.

This filling and emptying is managed by three pointers: the "head"
and "tail" pointers, managed by the OS, and a hardware current
descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
currently being filled.
When this descr is filled, the hardware
marks it full, and advances the GDACTDPA by one. Thus, when there is
flowing RX traffic, every descr behind it should be marked "full",
and everything in front of it should be "empty". If the hardware
discovers that the current descr is not empty, it will signal an
interrupt, and halt processing.

The tail pointer tails or trails the hardware pointer. When the
hardware is ahead, the tail pointer will be pointing at a "full"
descr. The OS will process this descr, and then mark it "not-in-use",
and advance the tail pointer. Thus, when there is flowing RX traffic,
all of the descrs in front of the tail pointer should be "full", and
all of those behind it should be "not-in-use". When RX traffic is not
flowing, then the tail pointer can catch up to the hardware pointer.
The OS will then note that the current tail is "empty", and halt
processing.

The head pointer (somewhat mis-named) follows after the tail pointer.
When traffic is flowing, then the head pointer will be pointing at
a "not-in-use" descr. The OS will perform various housekeeping duties
on this descr. This includes allocating a new data buffer and
dma-mapping it so as to make it visible to the hardware. The OS will
then mark the descr as "empty", ready to receive data. Thus, when there
is flowing RX traffic, everything in front of the head pointer should
be "not-in-use", and everything behind it should be "empty". If no
RX traffic is flowing, then the head pointer can catch up to the tail
pointer, at which point the OS will notice that the head descr is
"empty", and it will halt processing.

Thus, in an idle system, the GDACTDPA, tail and head pointers will
all be pointing at the same descr, which should be "empty".
All of the
other descrs in the ring should be "empty" as well.

The show_rx_chain() routine will print out the locations of the
GDACTDPA, tail and head pointers. It will also summarize the contents
of the ring, starting at the tail pointer, and listing the status
of the descrs that follow.

A typical example of the output, for a nearly idle system, might be

net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=20
net eth1: Chain head is at 20
net eth1: HW curr desc (GDACTDPA) is at 21
net eth1: Have 1 descrs with stat=x40800101
net eth1: HW next desc (GDACNEXTDA) is at 22
net eth1: Last 255 descrs with stat=xa0800000

In the above, the hardware has filled in one descr, number 20. Both
head and tail are pointing at 20, because it has not yet been emptied.
Meanwhile, hw is pointing at 21, which is free.

The "Have nnn descrs" refers to the descr starting at the tail: in this
case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
to all of the rest of the descrs, from the last status change. The "nnn"
is a count of how many descrs have exactly the same status.

The status x4... corresponds to "full" and status xa... corresponds
to "empty". The actual value printed is RXCOMST_A.

In the device driver source code, a different set of names is
used for these same concepts, so that

"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf


The RX RAM full bug/feature
===========================

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem.
If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local RAM fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS). When the RX RAM full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem.
A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked x4... or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full" (127 + 1 + 126 = 254 descrs), while descrs 254 and 255 are
empty. (The "Last 1 descrs" is descr 254, since tail was at 255.)
Thus, the system is deadlocked, and there can be no forward progress;
the OS thinks there's nothing to do, and the hardware has nowhere to
put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions).
The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there. Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.

As of this writing, the spider_net_resync() strategy seems to work very
well, even under heavy network loads.


The TX ring
===========
The TX ring uses a low-watermark interrupt scheme to make sure that
the TX queue is appropriately serviced for large packet sizes.

For packet sizes greater than about 1 KByte, the kernel can fill
the TX ring quicker than the device can drain it. Once the ring
is full, the netdev is stopped. When there is room in the ring,
the netdev needs to be reawakened, so that more TX packets are placed
in the ring. The hardware can empty the ring about four times per jiffy,
so it's not appropriate to wait for the poll routine to refill, since
the poll routine runs only once per jiffy. The low-watermark mechanism
marks a descr about one quarter of the way from the bottom of the queue,
so that an interrupt is generated when the descr is processed. This
interrupt wakes up the netdev, which can then refill the queue.
For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec. For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.


 ======= END OF DOCUMENT ========
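
Supplementary sketch (not part of the driver): the hung ring from the
RX RAM full section can be reproduced with a small toy model. The code
below is only an illustration under stated assumptions. It tracks just
the top status nibble of each descr (so x40800101 and x40800001 both
count as "full"), and the helper names summarize(), next_full() and
hung_ring() are hypothetical. It does not reproduce the logic of
spider_net_resync_head_ptr() or spider_net_resync_tail_ptr(); it only
shows why a tail stuck on an "empty" descr has to search forward for
the next "full" one.

```python
# Toy model of the RX ring; every name below is hypothetical except the three
# status values, which are the ones quoted in the text above.
EMPTY = 0xA        # SPIDER_NET_DESCR_CARDOWNED
FULL = 0x4         # SPIDER_NET_DESCR_FRAME_END
NOT_IN_USE = 0xF   # SPIDER_NET_DESCR_NOT_IN_USE


def summarize(ring, tail):
    """Group descr states into runs, walking once around the ring from tail.

    Returns a list of (count, state) pairs, similar in spirit to the
    "Have nnn descrs with stat=..." lines printed by show_rx_chain().
    """
    runs = []
    for i in range(len(ring)):
        state = ring[(tail + i) % len(ring)]
        if runs and runs[-1][1] == state:
            runs[-1] = (runs[-1][0] + 1, state)
        else:
            runs.append((1, state))
    return runs


def next_full(ring, start):
    """Return the index of the first FULL descr at or after start, or None."""
    for i in range(len(ring)):
        idx = (start + i) % len(ring)
        if ring[idx] == FULL:
            return idx
    return None


def hung_ring():
    """Rebuild the hung 256-descr ring from the example print above."""
    ring = [FULL] * 256   # descrs 0..253 are full
    ring[254] = EMPTY     # the "Last 1 descrs" entry
    ring[255] = EMPTY     # head and tail both point here
    return ring


if __name__ == "__main__":
    ring = hung_ring()
    tail = 255
    for count, state in summarize(ring, tail):
        print("Have %d descrs with state=0x%x" % (count, state))
    print("tail is stuck on an empty descr; next full descr is at",
          next_full(ring, tail))
```

In this model the tail at descr 255 sees "empty" and would wait forever,
while a forward search from 255 wraps around and finds the first full
descr at index 0, which is where the hardware pointer halted in the
example print.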