ResearchMy research interests span the areas of network protocols, operating systems, and distributed systems, particularly in the interactions between these areas. Most of my research over the years has been in the two areas of distributed system fault tolerance and protocols for wireless and mobile networking.
Wireless and Mobile Networking ProtocolsMy current research focuses on the design and evaluation of protocols for wireless and mobile networking, and I am leading the Monarch (MObile Networking ARCHitectures) research group at Rice in this area. I founded the Monarch research group (the Monarch Project) in 1992 while on the faculty at Carnegie Mellon University and have continued it since moving to Rice. In addition to being an acronym for "MObile Networking ARCHitectures", the name "Monarch" derives from the migratory behavior of the monarch butterfly. Each autumn, millions of monarch butterflies migrate from central and eastern United States and Canada to overwintering roosts in central Mexico; with the coming of spring, the monarch population again migrates northward. During migration, they may cover 80 miles or more per day.
Our work in the Monarch research group includes design and extension of protocols for wireless and mobile networks; detailed simulation-based evaluation, testbed implementation, experiments, and measurement; and active contribution to Internet standards through the Internet Engineering Task Force (IETF), the principal protocol standards development organization for the Internet.
Originally, my research in this area centered on the problem of Mobile IP. With the original IP protocol, each time a user with a mobile device (e.g., a wireless laptop or PDA) moves to a new point of attachment to the Internet, all active network connections on the user's device must be restarted and the device may also need to be rebooted. Mobile IP instead allows a mobile user to move about transparently while continuing to use the same IP address (the user's "home address"), avoiding these problems and enabling new mobile applications. In this context, I was one of the main designers of the IETF's official standard Mobile IP protocol for IPv4 (the current version of IP in the Internet today) and was the primary designer of IETF's official standard Mobile IP for IPv6 (the new version of IP that is replacing IPv4). The Internet standard for Mobile IPv4 (for which I was a majority contributor in authorship to this edited document) was originally approved and published as RFC 2002 in 1996 and revised as RFC 3220 in 2002. The Internet standard for Mobile IPv6 (for which I am the first and primary author) was approved in 2004 and published as RFC 3775. Today, Mobile IP has become commercially very important, as it is the basis for mobility management between domains for CDMA2000 and GPRS/UMTS 2.5G and 3G cellular data services, and most cell phones now include an implementation of Mobile IP. Other commercial implementations are also available from Cisco, HP, and Sun, and others.
I have also worked significantly on the problem of routing and other protocols in multihop wireless ad hoc networks, and most of my current work is related to this area. An ad hoc network is a collection of wireless mobile devices (nodes), cooperating to form a network without centralized administration or existing infrastructure such as access points or base stations. Nodes in the network cooperate to forward packets for other mobile nodes that may not be within direct wireless transmission range of each other. This forwarding forms routes over multiple "hops" between neighbor nodes, but these routes may change at any time as nodes move or as radio propagation changes. Some examples of the possible uses of ad hoc networking include sharing information among soldiers on the battlefield or among emergency relief personnel after a hurricane or earthquake; providing inexpensive community-based wireless Internet access; or networking between vehicles to improve safety and traffic management. Our Dynamic Source Routing protocol (DSR) first proposed a protocol for on-demand (or reactive) routing in ad hoc networks in 1994, and similar on-demand mechanisms have subsequently been used in many other routing protocols for ad hoc networks, including ABR, AODV, LAR, and DYMO. On-demand (reactive) routing reacts much more quickly as nodes move and routes must change, yet in general has extremely low (and often, no) overhead. Within the IETF, our DSR protocol has been officially published in 2007 as RFC 4728 as an Experimental protocol for the Internet.
In recent work, I have also studied the problem of security in ad hoc network routing. Most prior research in ad hoc networking has assumed a non-adversarial network setting, without attackers who may attempt to disrupt the routing, such as by forging routing packets to break valid routes in use or to create routes that do not lead to the destination, or who may attempt to exploit the routing protocol or security mechanisms to consume all network (e.g., bandwidth) or node (e.g., CPU capacity or battery power) resources. Unlike traditional wired networks, security in ad hoc network routing cannot be achieved by preconfigured trust relationships with known neighbor routers, since nodes move and neighbor relationships may change frequently. We have designed two new secure ad hoc network routing protocols, Ariadne and SEAD, that are at the same time more secure and more efficient than prior protocols; Ariadne and SEAD are based on very different routing protocols (Ariadne is based on DSR, which is reactive, and SEAD is based on DSDV, which is proactive), requiring different solutions for security. We have also identified the two new attacks the "wormhole" attack, a serious attack on many wireless protocols, and the "rushing" attack, an attack on on-demand routing protocols, and have proposed the first protocols to solve these attacks. We have also designed a number of inexpensive new authentication techniques (e.g., skipchains) that can serve as building blocks for other secure network protocols.
In current research, I am continuing work on protocols for different types of multihop wireless networking, including sensor networking (in the COMPASS project), mesh networking (in the TAPs project), and scalable ad hoc networking (in the Safari project):
Distributed System Fault ToleranceIn previous research, I worked in the area of distributed system fault tolerance, focusing primarily on the recovery of distributed application programs using the mechanisms of message logging and checkpointing. My Ph.D. thesis at Rice University in 1990 was in this area, and I continued to work in the area for many years after that; I initiated the fault-tolerance group at Rice when I independently decided to pursue this area for my thesis as Willy Zwaenepoel's first Ph.D. student. This research was motivated by the development of high-speed local area networks and commodity PCs and workstations, making it possible to harness a network of such machines for executing distributed application programs by utilizing the power of the many individual processors for a single application program. This approach can greatly speed the program's execution, at a cost far less than traditional multiprocessors. However, long-running distributed application programs run the risk of failing completely if even one of the processors on which they are executing should fail. Thus, to fully realize the potential of this approach, processor failures during a program's execution must be tolerated and recovered.
In my research in this area, I developed transparent, low-overhead fault-tolerance support systems for distributed application programs, in order to allow the survival of the program's execution in spite of such failures. Such support must have low overhead, or programmers and users will not use it. Such support should also be transparent, in order to allow the programmer to concentrate on the application problems to be solved, relieving the programmer from the need to also consider the issue of fault tolerance when writing application programs.
In my Ph.D. thesis research in this area at Rice, I designed, implemented, and evaluated a new recovery method based on pessimistic message logging and replay and a separate method based on optimistic message logging and replay; and developed a lattice theory model for reasoning about and proving the correctness of these systems. This pessimistic message logging system, known as sender-based message logging (SBML), was far more efficient than previous pessimistic message logging systems. The optimistic message logging system I developed was the first implementation of any recovery method using optimistic message logging; in addition, in contrast to earlier work in this area, I proved that the resulting recovered system state is always consistent and is the maximum recoverable system state that can be recreated, based on the message logs and checkpoints of individual machines recorded before the failure and recovery.
In later work in this area, we performed the first complete performance evaluation of consistent checkpointing rollback recovery, including the effect of techniques such as incremental checkpointing and copy-on-write checkpointing. I also designed an efficient new recovery technique, based on optimistic message logging and replay, to support low-overhead, low-latency commit of output messages from the system to the outside world. The outside world consists of everything with which processes can communicate that does not participate in the system's rollback-recovery protocols, such as the user's workstation display. Messages sent to the outside world must be delayed until the system can guarantee, based on the state information that has been recorded on stable storage, that the message will never be "unsent" as a result of processes rolling back to recover from any possible future failure. We also developed the first completely asynchronous protocol for optimistic rollback recovery, which avoids all synchronization between processes, both in recording message logging or checkpointing information before a failure, and in recovering after a failure. Our protocol is completely asynchronous and, in contrast to other attempts at reducing such synchronization, may need to roll back each process at most once in response to any failure.
Other ResearchIn addition to my research in wireless and mobile networking and in distributed system fault tolerance, I have also made significant research contributions in a number of other areas.
For example, in the area of high-speed, low-latency networking, in 1992 I designed a number of new techniques for optimizing network remote procedure call (RPC) performance, and implemented an RPC system, called Peregrine, using these techniques. This work was done at Rice after I completed my Ph.D. thesis work. Peregrine's measured performance was faster than previous RPC systems by nearly a factor of two, with performance very close to the minimum possible hardware latency, while still supporting the full generality and functionality of the RPC model. The Peregrine RPC system exploited the differences in requirements between general-purpose message passing and native support for RPC, together with the advanced features of modern computer architectures such as "gather" DMA and virtual memory page remapping.
Also at Rice, between 1981 and 1983 I designed and implemented a complete emulation of the 4.1BSD Unix operating system kernel running on top of Digital Equipment Corporation's VMS operating system on VAXes. This system, called Phoenix, was binary compatible with standard Unix on the VAX at the object code level and allowed virtually all Unix programs to run without changes. Phoenix was one of the first emulations of one operating system on top of a different native "host" operating system. The Phoenix kernel emulation included large portions of the Unix kernel source code in order to ensure precise Unix semantics, but it did not run in kernel privilege mode on the hardware. Phoenix was in daily use on a number of VAXes at Rice for many years, including the primary Computer Science departmental machine at the time, and was distributed to a large number of other universities.