Soft timers

Department of Computer Science
Rice University

Introduction

This code implements Soft timers, a new operating system facility that allows the efficient scheduling of software events at a granularity down to tens of microseconds. Soft timers can be used to avoid interrupts and reduce context switches associated with network processing without sacrificing low communication delays. This code also implements two applications of Soft timers, rate-based pacing of packet transmissions and network polling for packet receptions.

This code was developed as part of the ScalaServer project under the direction of Prof. Peter Druschel at the Department of Computer Science, Rice University.

Along with Soft timers, this code also contains implementations of two hardware interrupt based mechanisms for obtaining fine-grained events. The first uses the external 8253 programmable interrupt timer (PIT) chip to generate interrupts at the required frequency. The second uses the on-chip timers available on APICs (asynchronous programmable interrupt controller), which only appeared first in the Intel Pentium III family of the x86 chips.

This document briefly describes the overall design and layout of the code as well as how to compile, install and run it. It does not comprehensively describe all the code details and is only intended to be a guide to help give a quick start to a knowledgeable Systems person. Implementation details have to be gathered by reading the code. This document further assumes that the reader is familiar with the papers given in the bibliography.

Design

The code uses a timing-wheel based data structure for the scheduling of software events. This data structure presents an interface to the rest of the system for scheduling/canceling software events and to check and fire any scheduled events.

For fine-grained specification of time, the time is maintained as a 64-bit variable maintained in cycles of the processor's clock speed. Time measurement is done using the 64-bit cycle counter available on modern x86 hardware. This counter can be read with very little overhead (almost as low as reading a register).

The principal challenge in efficiently supporting event scheduling at a fine-granularity is to check the timing wheel data structure for pending events at a high frequency and with low overhead. This code provides four different methods of checking for pending events.

  1. 8253 PIT based periodic hardware interrupts: This method programs the external 8253 chip to deliver interrupts at the required frequency. While the 8253 can support high interrupt frequencies, the overhead of each interrupt is prohibitively high. As explained in [1,2], interrupts also result in bad cache locality.
  2. APIC timer based periodic hardware interrupts: This method programs the on-chip APIC to deliver interrupts at the desired frequency. These interrupts are less expensive than the 8253 based hardware interrupts [1]. However, as with 8253 interrupts, APIC interrupts also adversely affect cache locality. As APICs only appeared first on the Pentium III, this method is not available on processors prior to Pentium III in the x86 family.
  3. Soft timers: With Soft timers, any pending events are checked from strategic places in the kernel code using procedure calls. In most systems, these trigger states happen very frequently and therefore Soft timers support fine granularity with low overhead (overhead of procedure calls). As explained in [1,2], Soft timers also afford better cache locality than hardware interrupt based mechanisms. However, as there are no guarantees that trigger states shall be reached within any given amount of time, this method cannot be used in situations where real-time constraints are to be met.
  4. Soft timers backed by APIC timers: This approach combines the low overhead of Soft timers with the ability of interrupt based mechanisms to provide real-time guarantees [1]. This method requires two timing parameters for scheduling an event - the time T at which the event is supposed to fire, and a tolerance tol that specifies the maximum tolerable delay for the firing of the event. In other words, T+tol gives the deadline for the firing of the event. For this purpose, an APIC interrupt is scheduled to occur at time T+tol. If a Soft timer trigger state occurs between times T and T+tol , the event is fired using that and the APIC interrupt is canceled; otherwise the event is fired in the context of the APIC interrupt when the deadline is reached. Scheduling and canceling an APIC interrupt has low overhead (writing a register). If trigger states happen at sufficiently high frequency, the APIC interrupt would be canceled most of the time and the high overhead of actually fielding an APIC interrupt won't be incurred. For implementing this method, a second timing wheel is used to maintain the deadlines for events.
Also included in the code are two applications in networking that need fine-grained scheduling of events. These are rate-based pacing and network polling.

Rate-based pacing is applied in the code to TCP connections such that any two packets on a connection are separated by a minimum time interval. This can be used to (1) prevent packet bursts in TCP, and (2) fully utilize available bandwidth in high bandwidth-delay networks if the bandwidth is known a priori . Techniques for measuring bandwidth (such as packet-pair algorithm) are not implemented in this code.

Network polling is used to avoid interrupts generated when packets are received on a network interface. These interrupts have high overhead and adversely affect cache locality. With network polling, these interface interrupts are turned completely off and periodically the network interface is checked (or polled) to see if any packet has arrived. The code does a dynamic computation of the poll interval in order to receive a target number of packets on the interface at any poll. In order to poll the interface, the fine-grained event support in the code is used.

Loadable Module Implementation

In order to facilitate the development and debugging of the code, the kernel parts of the code are implemented as a loadable module. Few code changes are made to the actual FreeBSD-2.2.6 kernel - most of these changes are necessary only to hook the loadable module into the kernel.

Since this project required changes to the network stack for implementing rate-based pacing and network polling, the loadable module installs an additional network stack in the operating system rather than modifying the existing one. In other words, once the module is loaded, the operating system runs with two network stacks. The original stack is used for regular networking services like telnet, rlogin etc. Any bugs (unless they cause memory violations) in the new stack thus still keep regular system services running. All it takes to fix these bugs is to unload the module, recompile the fixed code and load the module back.

The network stack implemented by the loadable module was derived from the original FreeBSD-2.2.6 stack itself. The trick to installing additional network stacks is to install them as a different protocol family. Application programs (such as web servers) choose the network stack they want to use by specifying the protocol family as the first argument to the socket() system call. In FreeBSD, the protocol family for the regular TCP/IP stack is 2 (specified with macro PF_INET or AF_INET). For the alternate stack implemented in the loadable module, this protocol family is chosen to be 21. Therefore, in order to use a conventional web server (e.g., Apache) with this loadable module, the first argument of any socket() calls made in the server code should be changed to 21. (In Apache-1.3.3 source this required a change in the file src/main/http_main.c).

Once the kernel operates with both the stacks loaded, the next question that arises is which stack should service an IP packet that is received at a network interface. The solution to this is of necessity ad hoc, as both network stacks are equally capable of handling all IP packets. The packet is first presented to the stack in the loadable module. This stack quickly makes a decision on whether it wants the packet or not - if not, then the packet is sent up the original stack. Under this implementation, all packets containing a source IP address of the form 192.x.x.x and arriving on any of the fxp0-fxp4 interfaces are accepted by the stack in the loadable module. However, packets from source 192.168.2.75 and packets from all other sources are sent up the original stack (192.168.2.75 was the IP address of our development machine and we wanted to use the regular stack for this address for ftp'ing the compiled code over to the experimental machine). This part of the code is implemented in FreeBSD-stack/netinet/netsys.c and should be modified as necessary.

Code layout

This section provides a brief description of most of the files and directories included in this release.

Configuring

This section describes the various parameters used in the source for the loadable kernel module in order to configure it.

Compiling and Installing

Download the source from the URL given at the beginning of this document. Unzip and untar the tarball.

Running

Follow the following steps for testing the system.
  1. Reboot experimental machine : Reboot the experimental machine such that it runs the modified kernel from the Compiling and Installing section.

  2. Load kernel module : On the experimental machine, go to directory where the two files from FreeBSD-stack/distdir were copied. Execute 'make load' in this directory with super-user privileges. This loads the kernel module and installs the second stack in the kernel.

  3. Start the application : On the experimental machine, start the user application that uses TCP/IP networking (e.g., a web server). This application should have been compiled to use the network stack in the loadable module. If a web server is used as the application, it is now ready to receive HTTP requests from any clients.
For terminating the setup, execute the following steps:
  1. Stop the application running on the experimental machine.
  2. Unload the module by executing the command ' make unload '.

Caveats

Bibliography

  1. "Soft timers: efficient microsecond software timer support for network processing", Mohit Aron and Peter Druschel. In ACM Transactions on Computer Systems (TOCS) , August 2000.

  2. "Soft timers: efficient microsecond software timer support for network processing" , Mohit Aron and Peter Druschel. In Proceedings of the 17th Symposium on Operating Systems Principles (SOSP-17) , Kiawah Island, SC, December 1999.