Paper 8
GulfStream - Dynamic Topology Management in Multi-domain
Server Farms
Sameh A. Fakhouri
IBM T.J. Watson Research Center
A multi-domain server farm is a collection of servers
divided into a number of distinct domains. While the resources of the server farm are shared, each domain
is isolated from the others. Server farm administrative software allocates resources and enforces isolation
policies. This paper describes GulfStream, a distributed software system that addresses the problem of managing
the network topology of such a server farm. In particular, it addresses the following core problems: topology discovery
and verification, and failure detection.
Unlike most topology discovery and failure detection systems, which focus on the nodes in a cluster, GulfStream logically
organizes all of the network adapters of the server farm into groups. Each group contains those adapters that can directly
exchange layer 2 messages. GulfStream dynamically establishes a hierarchy for reporting network topology and availability of network adapters.
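The adapter grouping described above can be illustrated with a minimal sketch. It is not GulfStream's actual protocol: it assumes that pairwise layer 2 reachability between adapters has already been observed, and it treats reachability as transitive (as within a single broadcast domain). The function name, adapter names, and link list are hypothetical.

```python
from collections import defaultdict

def group_adapters(adapters, l2_links):
    """Partition adapters into groups whose members can reach each other at layer 2.

    adapters: iterable of adapter identifiers (hypothetical names).
    l2_links: pairs of adapters observed to exchange layer 2 messages (assumed input).
    """
    parent = {a: a for a in adapters}

    def find(a):
        # Follow parent pointers to the group representative, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Merge the groups of every pair that can exchange layer 2 messages.
    for a, b in l2_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(set)
    for a in adapters:
        groups[find(a)].add(a)
    # Return a deterministic, sorted list of sorted groups.
    return sorted(sorted(g) for g in groups.values())

# Hypothetical farm: three nodes, two of which have a second adapter on another segment.
adapters = ["n1.eth0", "n1.eth1", "n2.eth0", "n2.eth1", "n3.eth0"]
links = [("n1.eth0", "n2.eth0"), ("n2.eth0", "n3.eth0"), ("n1.eth1", "n2.eth1")]
groups = group_adapters(adapters, links)
```

Note that the unit of grouping is the adapter rather than the node, so a node with two adapters can appear in two different groups.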
We describe a prototype implementation of GulfStream on a 55-node heterogeneous server farm interconnected using switched
Fast Ethernet. This work was done in the context of the Oceano project, which provides for dynamic resource allocation in response to
workload variations. We discuss scaling GulfStream to larger environments; we envision
its use in server farms containing thousands of nodes.