The Mercury Suite

Power densities have been increasing rapidly at all levels of server systems. To counter the high temperatures resulting from these densities, systems researchers have recently started work on software-based thermal management. Unfortunately, research in this new area has been hindered by the limitations imposed by simulators and real experiments. In particular, the environment where real experiments take place needs to be isolated from unrelated computations or even trivial "external thermal disruptions", such as somebody opening the door and walking into the machine room. Under these conditions, it is very difficult to produce repeatable experiments. Worst of all, real experiments are inappropriate for studying thermal emergencies, as repeatedly inducing emergencies to exercise some piece of thermal management code may significantly decrease the reliability of the hardware.

In contrast, commercial temperature simulators do not require instrumentation or environment isolation. However, they are typically expensive and may take several hours to days to simulate a realistic system. Worst of all, these simulators are not capable of executing applications or any type of systems software; they typically compute steady-state temperatures based on a fixed power consumption for each hardware component. Other simulators, such as HotSpot, do execute applications (bypassing the systems software) but only model the processor, rather than the entire system.

Mercury is a software suite that avoids these limitations by accurately emulating temperatures based on simple layout, hardware, and component utilization data. Most importantly, Mercury runs the entire software stack natively, enables repeatable experiments, and allows the study of thermal emergencies without harming hardware reliability.

  + A Brief Description of the Mercury Suite
  + Download Mercury 1.1
  + User Guidelines
  + Installing Mercury
  + Calibrating Mercury
  + Running Mercury
  + Experimental Results on a Dell PowerEdge 2850
  + Acknowledgments

Please cite our ASPLOS 2006 paper (full reference below) if you use Mercury in experiments that will be reported in the computing literature.

T. Heath, A. P. Centeno, P. George, L. Ramos, Y. Jaluria, and R. Bianchini. "Mercury and Freon: Temperature Emulation and Management for Server Systems". Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2006. Earlier version published as Technical Report DCS-TR-596, Department of Computer Science, Rutgers University, January 2006.

A Brief Description of the Mercury Suite

The main components of Mercury are:

Solver. The solver is the part of the suite that actually computes temperatures using finite-element analysis. It receives component utilization data from a trace file or from monitoring daemons. If the utilization data comes from a file, i.e., the system is run off-line, the end result is another file containing all the usage and temperature information for each component in the system over time. If the utilization data comes from the monitoring daemons (each of which runs on a different machine than the solver), the applications or system software can query the solver for temperatures. Regardless of where the utilization data comes from, the solver computes temperatures at regular intervals, by default one iteration per second. All objects and air regions start the emulation at a user-defined initial air temperature.
[Man Page]
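
To make the idea of "one temperature update per iteration" concrete, here is a minimal sketch of a time-stepped thermal update for a single component. It is not Mercury's solver: it assumes one lumped component that exchanges heat with the surrounding air according to Newton's law of cooling, whereas the real solver works on the heat-flow and airflow graphs of the whole system. The names Component, step_component(), and emulate(), as well as every constant, are hypothetical.

```c
/* Hypothetical lumped model of one component. All names and constants
 * are illustrative assumptions, not Mercury parameters. */
typedef struct {
    double temperature;   /* current temperature, degrees Celsius */
    double heat_capacity; /* mass times specific heat, J/K */
    double conductance;   /* heat transfer to the air, W/K */
} Component;

/* Advance the component by dt seconds while it dissipates power_watts
 * and exchanges heat with air at air_temp (one explicit Euler step). */
void step_component(Component *c, double power_watts, double air_temp, double dt)
{
    double net_heat_flow =
        power_watts - c->conductance * (c->temperature - air_temp);
    c->temperature += dt * net_heat_flow / c->heat_capacity;
}

/* Run `iterations` steps of dt seconds. As in the description above,
 * the component starts at the initial air temperature. */
double emulate(double power_watts, double air_temp, double dt, int iterations)
{
    Component c = { air_temp, 100.0, 0.5 };  /* assumed capacity and conductance */
    for (int i = 0; i < iterations; i++)
        step_component(&c, power_watts, air_temp, dt);
    return c.temperature;
}
```

With these assumed constants, a component dissipating 10 W in 20-degree air drifts toward air_temp + power / conductance, i.e. 40 degrees, which is the steady state that a fixed-power simulator would report directly.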

Monitoring daemon. The monitoring daemon, called monitord, periodically samples the utilization of the components of the machine on which it is running and reports that information to the solver. The components considered are the CPU(s), disk(s), and network interface(s), and their utilization information is computed from /proc. For the Intel Pentium 4 processor, we developed an alternative version of monitord that monitors the hardware performance counters of the processor. The observed performance events are translated into an estimated energy, which is linearly translated into a "low-level CPU utilization". The frequency of utilization updates sent to the solver is a tunable parameter set to 1 second by default. Our current implementation uses 128-byte UDP messages to update the solver.
[Man Page]
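
As an illustration of how CPU utilization can be derived from /proc, the sketch below samples the aggregate "cpu" line of /proc/stat and computes the busy fraction between two samples. This is a common Linux technique and only an assumption about what monitord does internally; the names CpuSample, parse_cpu_line(), cpu_utilization(), and sample_cpu() are hypothetical and do not come from the Mercury sources.

```c
#include <stdio.h>

/* Cumulative CPU time counters taken from the first line of /proc/stat. */
typedef struct {
    unsigned long long busy;   /* user + nice + system jiffies */
    unsigned long long total;  /* busy + idle jiffies */
} CpuSample;

/* Parse the aggregate line, e.g. "cpu  100 0 50 850 ...".
 * Returns 0 on success and -1 if the line does not match. */
int parse_cpu_line(const char *line, CpuSample *out)
{
    unsigned long long user_j, nice_j, sys_j, idle_j;
    if (sscanf(line, "cpu %llu %llu %llu %llu",
               &user_j, &nice_j, &sys_j, &idle_j) != 4)
        return -1;
    out->busy = user_j + nice_j + sys_j;
    out->total = out->busy + idle_j;
    return 0;
}

/* Fraction of time the CPU was busy between two samples, in [0, 1]. */
double cpu_utilization(const CpuSample *prev, const CpuSample *curr)
{
    unsigned long long total = curr->total - prev->total;
    if (total == 0)
        return 0.0;
    return (double)(curr->busy - prev->busy) / (double)total;
}

/* Read the first line of /proc/stat (Linux only). Returns 0 on success. */
int sample_cpu(CpuSample *out)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_cpu_line(line, out);
}
```

A daemon built this way would call sample_cpu() once per update interval (1 second by default, per the text above), pass consecutive samples to cpu_utilization(), and send the result to the solver.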

Sensor library. Applications and system software can use Mercury through a simple runtime library API that comprises three calls: opensensor(), readsensor(), and closesensor(). The call opensensor() takes as parameters the address of the machine running the solver, the port number at that machine, and the component of which we want the temperature. It returns a file descriptor that can then be read with readsensor(). The read call involves communication with the solver to get the emulated sensor reading. The call closesensor() closes the sensor. With this interface, the programmer can treat Mercury as a regular, local sensor device. The program code below illustrates the use of these three calls to read the temperature of the local disk.
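
As a minimal sketch of that usage pattern, the C fragment below opens a sensor for the disk, reads it once, and closes it. The exact prototypes are assumptions inferred from the description above (host, port, and component in; a descriptor out; a floating-point reading from readsensor()); consult the man page for the real ones. So that the fragment compiles on its own, it includes stand-in stubs that are NOT the Mercury library, and the helper read_disk_temperature(), the host name, the port number, and the component string "disk" are all hypothetical.

```c
#include <string.h>

/* --- Stand-in stubs, NOT the Mercury sensor library. They exist only so
 * this sketch compiles alone; the signatures are assumptions. --- */
static int stub_is_open = 0;

int opensensor(const char *solver_host, int port, const char *component)
{
    (void)solver_host;
    (void)port;
    if (strcmp(component, "disk") != 0)
        return -1;          /* the stub only knows one component */
    stub_is_open = 1;
    return 3;               /* arbitrary descriptor value */
}

float readsensor(int sd)
{
    /* The real call would contact the solver; the stub returns a constant. */
    return (stub_is_open && sd == 3) ? 35.5f : -1.0f;
}

void closesensor(int sd)
{
    if (sd == 3)
        stub_is_open = 0;
}

/* --- Usage pattern: open, read, close. --- */

/* Reads the emulated disk temperature into *temp_out.
 * Returns 0 on success and -1 if the sensor could not be opened. */
int read_disk_temperature(const char *solver_host, int port, float *temp_out)
{
    int sd = opensensor(solver_host, port, "disk");
    if (sd < 0)
        return -1;
    *temp_out = readsensor(sd);
    closesensor(sd);
    return 0;
}
```

A caller would write something like read_disk_temperature("solver.example.org", 9000, &temp), where both the host and the port are placeholders for wherever the solver is actually running.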
[Man Page]

Thermal emergency tool. To simulate temperature emergencies and other environmental changes, we created a tool called fiddle. Fiddle can force the solver to change any constant or temperature on-line. For example, the user can simulate the failure of an air conditioner or the accidental blocking of a machine's air inlet in a data center by explicitly setting high temperatures for some of the machine inlets. Fiddle can also be used to change airflow or power-consumption information dynamically, allowing us to emulate multi-speed fans and CPU-driven thermal management using voltage/frequency scaling or clock throttling.
[Man Page]

User Guidelines

To use Mercury properly, you have to go through three main steps: installing, calibrating, and running. Some of these steps may require kernel recompilation, and therefore we strongly suggest you to go through the guidelines before you start working with the suite.

The Mercury installation is fairly simple. After installing Mercury, you will have to calibrate it against real temperature measurements of your target machine or set of machines. Each new machine architecture or machine-room configuration will require a new calibration. The calibration corrects inaccuracies in all aspects of the thermal modeling. After the calibration phase, Mercury can be used without ever collecting more real temperature measurements, i.e. Mercury will consistently behave as the real system did during the calibration run. Thermal emergencies, as well as other environmental changes, can be introduced at run time with the thermal emergency tool.

Installing Mercury

a) To install Mercury, you will need the following packages: gcc; bison; flex. Make sure they are installed and then go to the Mercury root directory. Type:
   $ cd src/emulator
   $ sh ./ > macnames.c
The script tries to create a file called "macnames.c" under the directory 'src/emulator'. That file contains up to 256 hosts of your network, using the rightmost byte (lowest order octet) of the IP address as the host entry.

Check if "macnames.c" was successfully created. If not, you can still try to run (same directory) or create it manually. Insert at least the hosts you intend to use in your experiments. The resulting file should look like this:

   extern char *MachineName[256];
   void SetUpNames(void)
   Assuming that the three hosts that you intend to use are:
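
The host list and the body of SetUpNames() are not reproduced above. Purely as an illustration, assuming three hypothetical hosts at 192.168.1.11, 192.168.1.12, and 192.168.1.13 named node11, node12, and node13, a hand-written macnames.c might look roughly like the sketch below. The only facts taken from the text are the array declaration, the function name, and the use of the last octet of the IP address as the index; whether the entries should hold host names exactly in this form is an assumption.

```c
/* In Mercury the array is declared extern and defined elsewhere in the
 * emulator; it is defined here only so that this sketch compiles alone. */
char *MachineName[256];

/* Fill in one entry per host, indexed by the last octet of its IP address.
 * The hosts below are hypothetical placeholders. */
void SetUpNames(void)
{
    MachineName[11] = "node11";  /* 192.168.1.11 */
    MachineName[12] = "node12";  /* 192.168.1.12 */
    MachineName[13] = "node13";  /* 192.168.1.13 */
}
```

Entries for octets that are not assigned stay null, so only the hosts used in the experiments need to be listed.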

You may need to adapt some parameter names in the source code of the suite to match your own system. For example, the default disk that we used was a SCSI disk named /dev/sda. Here are the main source-code lines that should be changed if necessary:
   abuser/complex.c:#define WHICHDISK "/dev/sda"
   abuser/complex.c:#define HDPARM "hdparm -f /dev/sda"
   abuser/disk.c:#define WHICHDISK "/dev/sda"
   abuser/disk.c:#define HDPARM "hdparm -f /dev/sda"
   monitord/monitord.c:#define DISK "sda"
To finish up the Mercury installation procedure, go to 'src' and type 'make'. Then go back to the distribution's root directory and type './'. The binary files should be found under the 'bin' directory.

b) If you wish to use performance counters, you need to download and unzip the Linux kernel 2.6 source code, as well as the performance counters patch provided with the Mercury distribution. The patch, 'linux-2.6.15-energy-mercury.patch', should be applied to the kernel source like this:

    patch -p 1 < linux-2.6.15-energy-mercury.patch 
Then recompile the patched kernel, carefully enabling: (1) "Processor family (Pentium-4/Celeron(P4-based)/Pentium-4 M/Xeon)"; (2) "/dev/cpu/*/msr - Model-specific register support"; and (3) "CPU Energy Estimator". All of these features are found under the option "Processor type and features". Install the new kernel and reboot the computer.

You will also need to enable the performance counters in the source-code of Mercury (they are disabled by default). To do that, uncomment the following lines and go on with the installation procedure.

   src/monitord/monitord.c://#define ENABLE_MSR
   src/reader/cpu_energy.c://#define ENABLE_MSR

Calibrating Mercury

The goal of this step is to build a good model of your system using the dot language. The model encodes heatflow and airflow graphs that are passed to Mercury as input. See details of the file format in the manpages.

a) Micro-benchmarks. The suite provides two micro-benchmarks for calibration experiments. The first (cpu_simple) exercises the CPU, putting it through various levels of utilization interspersed with idle periods. The second (disk) does the same for the disk. There is also a more challenging benchmark (complex) that exercises the CPU and disk at the same time, generating widely different utilizations over time. This behavior is difficult to track, since utilization changes constantly and quickly. In our ASPLOS paper, we used it to validate our thermal model after calibration. Further results on modeling and validation can be found here.

The source code of all benchmarks can be found under the directory 'src/abuser'.
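
To show what "putting the CPU through a level of utilization" can mean in practice, here is a minimal duty-cycle sketch: each period is split into a busy-wait portion and a sleeping portion. It is not the code of cpu_simple; the names split_period(), now_seconds(), and load_cpu() are hypothetical, and the approach assumes a POSIX system with clock_gettime() and nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Split one period into busy and idle time for a target utilization,
 * which is clamped to the range [0, 1]. */
void split_period(double period_s, double utilization,
                  double *busy_s, double *idle_s)
{
    if (utilization < 0.0) utilization = 0.0;
    if (utilization > 1.0) utilization = 1.0;
    *busy_s = period_s * utilization;
    *idle_s = period_s - *busy_s;
}

/* Monotonic clock reading in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hold one CPU near `utilization` for `periods` periods of `period_s`
 * seconds each: spin during the busy part, sleep during the idle part. */
void load_cpu(double period_s, double utilization, int periods)
{
    for (int i = 0; i < periods; i++) {
        double busy_s, idle_s;
        split_period(period_s, utilization, &busy_s, &idle_s);

        double end = now_seconds() + busy_s;
        while (now_seconds() < end)
            ;  /* busy-wait to consume CPU time */

        struct timespec idle;
        idle.tv_sec = (time_t)idle_s;
        idle.tv_nsec = (long)((idle_s - (double)idle.tv_sec) * 1e9);
        nanosleep(&idle, NULL);
    }
}
```

Calling load_cpu() with a sequence of different utilization levels, separated by fully idle stretches, gives a stepped load of the kind described above, which makes the corresponding temperature steps easy to see in a calibration trace.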

b) Trace collection and analysis. Collecting a trace of each micro-benchmark execution can speed up calibration significantly, since the solver can run faster than real time when using traces. Each trace consists of real temperatures and hardware component utilizations. For example, the hardware infrastructure we used for trace collection was a set of temperature sensors, including strategically located thermocouples and intra-component thermal sensors. Our software infrastructure comprised some Linux drivers and software tools, which can be found under the directory 'src/reader' (their binaries are in the 'bin' directory). An exception is the package ipmitool, which can be used to read the temperature of diverse motherboard components and attachments. If your motherboard supports ipmitool, you can install it. Doing so requires enabling IPMI support in the kernel configuration before the kernel is recompiled: enable "IPMI top-level message handler", found under Device Drivers, Character Devices, IPMI.

To help with collecting and analyzing traces, we provide scripts for trace collection (get-trace) and data analysis (emulate-plot). Both require awk and sed; the latter also requires gnuplot and an installed copy of Mercury. These scripts can be found in the directory 'scripts'.

Running Mercury On-Line

  1. Start up the solver [Man Page]
    e.g.: sudo ./emulator -m -w 1000 -v -d
      or: sudo ./emulator -m -w 1000 -v -d -l /tmp/emulator.log
    (note that you must be root or a sudoer to run it).
  2. Start up the monitor on each machine that has to be monitored (one monitor per machine) [Man Page].
  3. While the solver is running, you can use the thermal emergency generator to change temperatures and constants in the emulation [Man Page].
  4. Likewise, you can always query specific emulated temperatures by using the sensor library [Man Page].

Running Mercury Off-Line

  1. Generate a file with inputs in the format of the solver's log file.
  2. Each entry in this file has the following fields: (1) timestamp, (2) machine number, (3) resource number (i.e., cpu=2, disk=5, net=6), (4) accumulated usage, (5) usage difference (compared to the last measurement), and (6) elapsed time.
  3. Then run the solver with that log file as an input [Man Page].
    e.g.: sudo ./emulator -m -v -d -f /tmp/emulator.log

  4. Note that the get-trace and emulate-plot scripts mentioned above already run Mercury off-line. The former script collects traces while running our micro-benchmarks. The latter generates a Mercury log file from the collected trace data, and then runs the emulator, creating plot-files automatically with gnuplot. The plot-files contain both real and emulated temperatures, extracted from the trace-file and from the output of the emulator.
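
The six-field entry described in step 2 can be sketched as a small C structure with a parser. The field order and the resource numbers come from the list above; everything else is an assumption, in particular that fields are separated by whitespace, that the numeric fields fit the types chosen here, and that "elapsed time" means the time since the previous measurement. LogEntry, parse_entry(), and next_entry() are hypothetical names, so check the actual log format in the man page before generating input files this way.

```c
#include <stdio.h>

/* Resource numbers as listed in the entry format above. */
enum { RES_CPU = 2, RES_DISK = 5, RES_NET = 6 };

/* One log entry; the field types are assumptions. */
typedef struct {
    double timestamp;          /* (1) */
    int    machine;            /* (2) */
    int    resource;           /* (3) RES_CPU, RES_DISK, or RES_NET */
    double accumulated_usage;  /* (4) */
    double usage_diff;         /* (5) change since the last measurement */
    double elapsed;            /* (6) */
} LogEntry;

/* Parse one whitespace-separated entry (assumed layout).
 * Returns 0 on success and -1 if fewer than six fields are found. */
int parse_entry(const char *line, LogEntry *e)
{
    int n = sscanf(line, "%lf %d %d %lf %lf %lf",
                   &e->timestamp, &e->machine, &e->resource,
                   &e->accumulated_usage, &e->usage_diff, &e->elapsed);
    return n == 6 ? 0 : -1;
}

/* Derive the next entry for the same machine and resource from the
 * previous one and a new accumulated-usage reading. */
LogEntry next_entry(const LogEntry *prev, double timestamp,
                    double accumulated_usage)
{
    LogEntry e = *prev;
    e.timestamp = timestamp;
    e.accumulated_usage = accumulated_usage;
    e.usage_diff = accumulated_usage - prev->accumulated_usage;
    e.elapsed = timestamp - prev->timestamp;
    return e;
}
```

A trace generator could keep the last LogEntry per machine and resource, call next_entry() for each new reading, and print the six fields in order to build the file passed to the solver with -f.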

Regardless of how you run it, Mercury will print out the emulated temperatures of all components defined in your dot file. You can redirect the output to a file to save those results.


Acknowledgments

We would like to thank Frank Bellosa, Andreas Merkel, and Simon Kellner for sharing their performance-counter infrastructure with us. We would also like to thank Enrique V. Carrera and Diego Nogueira, who contributed to the early stages of this work.

The Mercury Suite contains copyrighted software by Frank Bellosa, Andreas Merkel, Simon Kellner, and Martin Waitz. It also contains publicly available software from the Linux Smartsuite 2.1 package, as well as from Graphviz - Graph Visualization Software.