NUMA Integration

Starting with release 5, Corus supports integration with the NUMA hardware architecture. NUMA, which stands for "Non-Uniform Memory Access", is a memory design for multiprocessor computers in which memory slots are segmented into nodes colocated with CPU cores, so that each core has local access to part of the memory. This "local affinity" is meant to reduce memory access latency, and may provide a significant performance increase for memory-bound applications (although each application should be tested, and no wide-ranging assumptions should be made).

Here are the highlights of the NUMA integration:

  • Automatic discovery of NUMA nodes available on the running server;
  • automatic processor and memory assignment;
  • automatic load-balancing of process execution among available NUMA nodes;
  • possibility of restricting process execution to a NUMA node subset.

Note that NUMA integration is disabled for Docker-based processes: that is because in this case Corus delegates execution of processes to the Docker daemon under its control (it does not invoke the command-line starting up these processes).

Introduction

NUMA integration is provided in Corus via the numactl command on Linux, which is used to manage process policies. When this feature is enabled, Corus assigns each executed process to the next available NUMA node. The assignment algorithm follows a simple round-robin strategy in order to load-balance the processes over all the nodes available on the machine.

The current iteration can only assign a single NUMA node to each running process (that is: one process will be "pinned" to a single node).

Node Assignment

The main functionality of Corus' NUMA integration consists of the dynamic assignment of memory and CPU cores at process startup. Assignment follows a round-robin strategy in order to evenly distribute the processes over the available NUMA nodes: Corus keeps track of which nodes processes are currently assigned to, and is thus able to determine the node to which the next started process should be pinned.
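The round-robin strategy described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Corus' actual implementation: the RoundRobinNodeAssigner class, its assign method and the process names are hypothetical, and the sketch does not account for processes that terminate.

```python
from itertools import cycle


class RoundRobinNodeAssigner:
    """Hands out NUMA node ids in round-robin order (illustrative only)."""

    def __init__(self, node_ids):
        if not node_ids:
            raise ValueError("at least one NUMA node id is required")
        self._nodes = cycle(node_ids)
        self.assignments = {}  # process name -> NUMA node id

    def assign(self, process_name):
        """Pin the given process to the next node in the rotation."""
        node_id = next(self._nodes)
        self.assignments[process_name] = node_id
        return node_id


# Hypothetical host with two NUMA nodes (ids 0 and 1).
assigner = RoundRobinNodeAssigner([0, 1])
for name in ["p1", "p2", "p3"]:
    print(name, "-> node", assigner.assign(name))
```

With two nodes, the third process wraps around to the first node, which is what keeps the distribution even.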

Usage of the -XX:+UseNUMA JVM argument conflicts with the current Corus/NUMA integration, as the JVM overrides any policies applied by the numactl command. The JVM argument causes the creation of a memory segment that spreads across NUMA nodes. It also triggers special GC behavior to minimize cross-node communication. The numactl command, by contrast, allows for the assignment of processes to a single NUMA node, thus eliminating any cross-node communication and maximizing the performance obtained from the host machine. It is recommended that you test your application with each option separately, and use the one that works best for your use case, not both.

Corus supports two modes for defining the inventory of NUMA nodes available for assignment:

  • Automatic node discovery, for ease of use;
  • manual node definition, for specific needs.

Automatic Node Discovery

Corus provides an automatic discovery mechanism that detects, on startup, the available NUMA nodes on the host. This behavior is activated by the corus.server.numa.auto-detection.enabled configuration property. In this mode (which is the default) you have nothing more to do: Corus will automatically pin started processes over the available NUMA nodes.
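As an illustration, a corus.properties fragment for this mode might look as follows. Since corus.server.numa.enabled defaults to false (see the Configuration section), it is assumed here that the integration must be switched on explicitly; the second line only restates the default and could be omitted.

```properties
# Turn on the NUMA integration (disabled by default)
corus.server.numa.enabled=true

# Let Corus discover the NUMA nodes of the host at startup (default)
corus.server.numa.auto-detection.enabled=true
```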

Manual Node Definition

For advanced use cases, Corus supports restricting the NUMA nodes over which running processes are load-balanced. The corus.server.numa.first.node.id property defines the lower bound of the NUMA node range to use. Similarly, the corus.server.numa.node.count property defines the total number of NUMA nodes managed by Corus. Together, these properties dictate the range within which Corus will pin processes to NUMA nodes.

Note that any subset defined this way consists of a consecutive list of node ids.
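The fragment below sketches a manual definition in corus.properties. It assumes a hypothetical host with four NUMA nodes (ids 0 to 3) on which Corus should only use the first two; auto-detection is turned off because the node count is only taken into account in that case (see the Configuration section).

```properties
corus.server.numa.enabled=true

# Manual definition: auto-detection must be disabled
corus.server.numa.auto-detection.enabled=false

# Use two consecutive nodes, starting at node 0 (that is: nodes 0 and 1)
corus.server.numa.first.node.id=0
corus.server.numa.node.count=2
```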

Viewing Node Assignment

As described above, Corus assigns new processes to NUMA nodes. It offers two ways to view such assignments:

  • The ps command of the Corus CLI now provides a -numa option that will list the NUMA node identifier assigned to each process.
  • The same information is also available through the REST API using the get processes request.

Configuration

As partially explained previously, the behavior of the NUMA integration in Corus is determined by configuration properties in the corus.properties file (found under $CORUS_HOME/config). The following table lists these properties and provides a description for each:

Name | Description
corus.server.numa.enabled | Indicates if the NUMA integration is enabled or not (defaults to false). Attempting to activate NUMA on a host that does not support this architecture will result in errors when starting processes.
corus.server.numa.auto-detection.enabled | Enables or disables auto-discovery of the available NUMA nodes on the running host server (defaults to true). If this property is set to false, the available NUMA nodes must be explicitly defined with the appropriate properties (see below).
corus.server.numa.bind.cpu | Determines the processor assignment policy (defaults to true). When set to true, Corus starts new processes so that they execute only on the CPUs of the assigned NUMA node; otherwise, the default CPU assignment policy is applied to the process.
corus.server.numa.bind.memory | Determines the memory allocation policy (defaults to true). When set to true, Corus starts new processes with memory allocated only from the assigned NUMA node (note that allocation will fail when there is not enough memory available on the node). If set to false, the default memory allocation policy of the host is in effect.
corus.server.numa.first.node.id | Defines the first NUMA node to use for process binding (defaults to 0). This value is expected to be an integer; increasing it raises the lower bound of the node range, and thus restricts the first NUMA node from which Corus will start assigning processes. The value must be greater than or equal to 0 and lower than the highest NUMA node identifier on the host server (as implied by the "node count", configured with the next property, below).
corus.server.numa.node.count | Defines the total number of NUMA nodes over which Corus will perform process assignment. This property is only in effect when auto-detection is disabled. The value must be an integer greater than 0 and must not exceed the total number of NUMA nodes available on the host server.
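To make the effect of the two bind flags more concrete, the Python sketch below shows how they could map to a numactl invocation. The --cpunodebind and --membind options are standard numactl options, but the exact command line that Corus builds is not documented here: the numactl_prefix function and the app.jar file name are hypothetical and only meant as an illustration.

```python
def numactl_prefix(node_id, bind_cpu=True, bind_memory=True):
    """Return the numactl arguments to prepend to a process command line."""
    if node_id < 0:
        raise ValueError("NUMA node id must be >= 0")
    args = []
    if bind_cpu:
        # Run the process only on the CPUs of the given node.
        args.append(f"--cpunodebind={node_id}")
    if bind_memory:
        # Allocate memory only from the given node.
        args.append(f"--membind={node_id}")
    # With both flags off there is nothing to enforce, so numactl is skipped.
    return ["numactl", *args] if args else []


# Hypothetical process pinned to node 1 with both policies enabled.
command = numactl_prefix(1) + ["java", "-jar", "app.jar"]
print(" ".join(command))
```

Setting one of the flags to false simply drops the corresponding option, leaving that aspect to the default policy of the host.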

Disabling NUMA Support for Specific Processes

It is possible to disable NUMA support, in the Corus descriptor, for specific processes. Just set the numaEnabled attribute to false, either for the <java> or <magnet> element, as illustrated below:

<distribution
  xmlns="http://www.sapia-oss.org/xsd/corus/distribution-5.0.xsd" 
  name="grid" 
  version="1.0">
  <process name="compute-server">
    <port name="test" />
    <java mainClass="org.sapia.corus.examples.GridServer"
          profile="test"
          javaCmd="java"
          numaEnabled="false" />
    <java mainClass="org.sapia.corus.examples.GridServer"
          profile="prod"
          javaCmd="java" />
  </process>  
</distribution>