Paper 10
Operating System Support for Safely and Efficiently Programmable Routers
Prashant Pradhan and Tzi-cker Chiueh
State University of New York at Stony Brook
Placing computation inside the network can yield significant performance benefits to network-based applications. These benefits come from placing computation at topologically strategic points and from the ability of such computation to exploit global network context. For example, congestion control state can be shared among flows passing through an intranet's portal router if the router hosts a function that aggregates congestion control state on a path-by-path basis [1].
To support placement of computation in the network, a router operating system should provide appropriate abstractions for composing computation on network flows, together with efficient implementations of these abstractions. More importantly, the integrity and performance of the core router must remain unaffected by such a composable computation framework. This isolation requires both memory protection and performance protection of the router kernel from dynamically added functions. With these goals in mind, we have developed a router operating system that allows safe and efficient composition of computation on network flows. We present the essential features of this OS and describe an example application, Aggregate TCP (ATCP) [1], that can be implemented using these features.
Computation is composed in terms of the following entities:
- Extension functions: These are preemptible functions that can carry state across invocations. Every invocation of an extension function is made in the context of some flow (execution context). An extension function may have multiple pending invocations, issued in different flow contexts.
- Flows: Flows are abstract execution contexts. A flow is a unit of scheduling and resource allocation. Flows may allow other flows
to share their resources through a simple access control mechanism.
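As a rough illustration of these two entities, the following Python sketch models an extension function whose `state` persists across invocations, and a flow that queues pending invocations and can grant other flows access to its resources. All class, field, and method names (`ExtensionFunction`, `Flow`, `cpu_share`, `grant`, `allows`, `count_packets`) are hypothetical. Preemption is not modeled. This is not the router OS's actual interface.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, Set, Tuple


@dataclass
class ExtensionFunction:
    """A function whose private state survives across invocations."""
    name: str
    body: Callable[["ExtensionFunction", "Flow", Any], None]
    state: Dict[str, Any] = field(default_factory=dict)

    def run(self, flow: "Flow", arg: Any) -> None:
        self.body(self, flow, arg)


@dataclass
class Flow:
    """An execution context: the unit of scheduling and resource allocation."""
    flow_id: int
    cpu_share: int  # hypothetical resource-allocation weight
    sharers: Set[int] = field(default_factory=set)  # flows granted our resources
    # Pending (function, argument) invocations issued in this flow's context.
    pending: Deque[Tuple[ExtensionFunction, Any]] = field(default_factory=deque)

    def grant(self, other: "Flow") -> None:
        """Simple access control: let `other` share this flow's resources."""
        self.sharers.add(other.flow_id)

    def allows(self, other: "Flow") -> bool:
        """True if `other` may use this flow's resources."""
        return other.flow_id == self.flow_id or other.flow_id in self.sharers


def count_packets(fn: ExtensionFunction, flow: Flow, arg: Any) -> None:
    """Example body: a per-flow counter kept in the function's own state."""
    counts = fn.state.setdefault("count", {})
    counts[flow.flow_id] = counts.get(flow.flow_id, 0) + 1


# One function with invocations pending in two different flow contexts.
counter = ExtensionFunction("counter", count_packets)
f1, f2 = Flow(1, cpu_share=10), Flow(2, cpu_share=5)
for flow in (f1, f1, f2):
    flow.pending.append((counter, None))
for flow in (f1, f2):
    while flow.pending:
        fn, arg = flow.pending.popleft()
        fn.run(flow, arg)
f1.grant(f2)  # f2 may now consume f1's resources
```

After the loop, `counter.state["count"]` holds counts for both flows. This shows that the function's state is shared across invocations made in different flow contexts.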
Given the above entities, there are three key mechanisms to compose computation and determine control flow:
- Asynchronous Invocation: A function, invoked in a given flow context, may pass control to another function by posting an invocation to it. The invocation is asynchronous: the CPU scheduler decides when the invocations pending in the various flow contexts are executed.
- Static Binding: Flows may statically bind themselves to a stream of packets by specifying a packet filtering rule, and an extension
function that should get control when a packet matches the rule.
- Dynamic Binding: To pass control to a target function, a function may refer to it by name, where names are strings with semantic meaning. The router OS provides a mechanism to register and query these names.
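The three mechanisms can be sketched in the same spirit. In this hypothetical Python model:

- `post` queues an invocation without running it.
- `bind` attaches a function to a packet filter rule expressed as field/value pairs.
- `register` and `lookup` form the name registry used for dynamic binding.

The rule format, the name `"congestion/aggregate"`, and the round-robin `run_pending` loop are assumptions made for illustration only.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Tuple

# A handler receives the router, the flow it runs in, and one argument.
Handler = Callable[["Router", "Flow", Any], None]


class Flow:
    """Execution context holding invocations that are posted but not yet run."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending: deque = deque()


class Router:
    def __init__(self) -> None:
        self.names: Dict[str, Handler] = {}  # name registry for dynamic binding
        self.bindings: List[Tuple[Dict[str, Any], Flow, Handler]] = []  # static bindings
        self.flows: List[Flow] = []

    def register(self, name: str, fn: Handler) -> None:
        self.names[name] = fn

    def lookup(self, name: str) -> Handler:
        return self.names[name]

    def post(self, flow: Flow, fn: Handler, arg: Any) -> None:
        """Asynchronous invocation: queue it; the scheduler decides when it runs."""
        if flow not in self.flows:
            self.flows.append(flow)
        flow.pending.append((fn, arg))

    def bind(self, flow: Flow, rule: Dict[str, Any], fn: Handler) -> None:
        """Static binding: run `fn` in `flow` for every packet matching `rule`."""
        self.bindings.append((rule, flow, fn))

    def receive(self, pkt: Dict[str, Any]) -> None:
        """Post an invocation in every flow whose filter rule matches the packet."""
        for rule, flow, fn in self.bindings:
            if all(pkt.get(k) == v for k, v in rule.items()):
                self.post(flow, fn, pkt)

    def run_pending(self) -> None:
        """Toy scheduler: round-robin, one invocation per flow per turn."""
        while any(f.pending for f in self.flows):
            for flow in list(self.flows):
                if flow.pending:
                    fn, arg = flow.pending.popleft()
                    fn(self, flow, arg)


# Usage: a statically bound classifier hands off to a stage found by name.
log: List[Tuple[str, str]] = []


def classify(router: Router, flow: Flow, pkt: Dict[str, Any]) -> None:
    log.append(("classify", pkt["dst"]))
    router.post(flow, router.lookup("congestion/aggregate"), pkt["dst"])


def aggregate(router: Router, flow: Flow, dst: str) -> None:
    log.append(("aggregate", dst))


r = Router()
r.register("congestion/aggregate", aggregate)
web = Flow("web")
r.bind(web, {"proto": "tcp", "dst_port": 80}, classify)
r.receive({"proto": "tcp", "dst_port": 80, "dst": "10.1.0.0/16"})
r.receive({"proto": "udp", "dst_port": 53, "dst": "10.2.0.0/16"})  # no match
log_before_scheduling = list(log)  # still empty: posting does not execute
r.run_pending()
```

Because `receive` only posts invocations, no extension code runs on the packet-arrival path. All execution is deferred to the scheduler, which is what makes control transfer asynchronous in this model.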
To implement this framework with good invocation performance, extension functions can be co-located with the router kernel in the same address space, avoiding expensive context-switching and TLB-flushing overheads. The router kernel's safety is nevertheless not compromised, owing to the use of intra-address-space protection [2]. Extension functions are placed in a less privileged subset of the kernel address space. This protects kernel memory while incurring only the overhead of a protected function call on each extension function invocation. Performance protection is provided by a preemptive CPU scheduler.
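A toy simulation shows why preemption yields performance protection. The sketch below rests on these assumptions:

- The scheduler uses a round-robin policy with a fixed quantum. The text specifies only that the scheduler is preemptive.
- Each flow's pending invocations are modeled as CPU costs in abstract ticks.
- Intra-address-space memory protection is not modeled.

The function name `simulate` and the workload are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def simulate(flows: Dict[str, List[int]], quantum: int) -> Dict[str, int]:
    """Preemptive round-robin over flows; return the tick at which each flow finishes.

    `flows` maps a flow name to the CPU cost, in ticks, of each queued invocation.
    """
    ready: Deque[Tuple[str, Deque[int]]] = deque(
        (name, deque(costs)) for name, costs in flows.items() if costs
    )
    clock = 0
    finished: Dict[str, int] = {}
    while ready:
        name, work = ready.popleft()
        ran = min(quantum, work[0])  # preempt the invocation when the quantum expires
        clock += ran
        work[0] -= ran
        if work[0] == 0:
            work.popleft()  # the current invocation has completed
        if work:
            ready.append((name, work))  # requeue behind the other flows
        else:
            finished[name] = clock
    return finished


workload = {"runaway": [1000], "atcp": [3, 3]}
with_preemption = simulate(workload, quantum=5)
without_preemption = simulate(workload, quantum=10**9)  # effectively run-to-completion
```

With a 5-tick quantum, the short `atcp` flow finishes at tick 16 instead of tick 1006. The runaway extension is delayed only slightly (tick 1006 instead of 1000). In this model, preemption bounds how long one misbehaving extension can delay other flows.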
Aggregate TCP congestion control (ATCP) [1] is an ideal candidate for placement inside the network. It can exploit global information about congestion status on various network paths and lets TCP flows avoid the cold-start phase of congestion estimation. In ATCP, a router placed at the edge of the network maintains congestion control state for the flows passing through it, grouped by the destination subnet of these flows. Upon receiving a TCP connection request, an ATCP router splits the connection into a local subconnection (L) and a remote subconnection (R). R starts with a congestion window equal to the warm estimate derived from the state kept for the destination subnet. On L, an available credit is maintained that depends on the congestion window of R and its growth mode (linear or exponential). Since the RTT on L is much smaller than that of the whole L-R path, the congestion window for L can grow to the warm estimate much faster.
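Simple slow-start arithmetic illustrates the benefit of splitting the connection. The sketch below rests on these assumptions:

- Destination subnets are /24 prefixes.
- A window doubles once per RTT during slow start.
- L has an RTT of 5 ms and the whole path has an RTT of 100 ms. These values are illustrative only.

The `SubnetTable` class and its methods are hypothetical. The credit computation on L is not modeled.

```python
import ipaddress
import math
from typing import Dict


class SubnetTable:
    """Congestion state an edge router keeps per destination subnet."""

    def __init__(self, prefix_len: int = 24, default_cwnd: int = 1) -> None:
        self.prefix_len = prefix_len      # assumed subnet granularity
        self.default_cwnd = default_cwnd  # cold-start window, in segments
        self.cwnd: Dict[ipaddress.IPv4Network, int] = {}

    def subnet_of(self, addr: str) -> ipaddress.IPv4Network:
        return ipaddress.IPv4Network(f"{addr}/{self.prefix_len}", strict=False)

    def warm_estimate(self, dst: str) -> int:
        """Initial window for a new R subconnection towards `dst`."""
        return self.cwnd.get(self.subnet_of(dst), self.default_cwnd)

    def update(self, dst: str, cwnd: int) -> None:
        self.cwnd[self.subnet_of(dst)] = cwnd


def slow_start_time(target: int, rtt_ms: float, initial: int = 1) -> float:
    """Time for a window that doubles every RTT to grow from `initial` to `target`."""
    if target <= initial:
        return 0.0
    return math.ceil(math.log2(target / initial)) * rtt_ms


table = SubnetTable()
table.update("10.1.2.3", 32)                     # learned from earlier flows to 10.1.2.0/24
r_initial = table.warm_estimate("10.1.2.77")     # R starts warm: 32 segments
cold = table.warm_estimate("10.9.9.9")           # unknown subnet: cold start at 1
t_end_to_end = slow_start_time(32, rtt_ms=100)   # unsplit connection
t_local = slow_start_time(32, rtt_ms=5)          # L, short RTT
```

Reaching a 32-segment window takes 5 doublings. That is about 500 ms over the 100 ms end-to-end path, but only about 25 ms over L.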
ATCP does not require any changes to end-system TCP implementations. Its evaluation using a real web server trace shows a potential improvement of up to a factor of 2 in normalized HTTP transaction latency. ATCP can be implemented naturally and efficiently using the API exposed by the proposed operating system.
References
[1] P. Pradhan, T. Chiueh, and A. Neogi, "Aggregate TCP Congestion Control Using Multiple Network Probing," Proc. ICDCS-2000.
[2] T. Chiueh, G. Venkitachalam, and P. Pradhan, "Integrating Segmentation and Paging Protection for Safe, Efficient and Transparent Software Extensions," Proc. ACM SOSP-99.