
Nor-Tech’s Numascale technology delivers mainframe SMP performance at commodity prices.


Numascale’s NumaConnect™ Technology is Plug-and-Play Scalable SMP from Commodity Servers 

Nor-Tech is using NumaConnect™ to connect commodity servers with an AMD HyperTransport bus into a scalable SMP server that presents large-scale, Big Iron-like SMP to your OS. The technology combines all the processors, memory, and I/O resources of the system in a fully virtualized environment controlled by standard operating systems.

Due to the limitations of shared memory technology, all large-scale SMP systems employ what is known as Non-Uniform Memory Access (NUMA) to achieve a global memory view. Memory, while globally accessible, is not always local. It may exist in another bank of memory on the same motherboard, or on a motherboard somewhere else in the system. Access times will vary depending upon location.
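To make the "access times vary by location" point concrete, here is a minimal sketch of how the relative cost of local versus remote memory can be inspected on a Linux system. It assumes the kernel's sysfs layout (`/sys/devices/system/node/node<N>/distance`), where each node reports a row of relative distances and the local node is conventionally 10. The function names and the sample row are illustrative only and are not measurements from a NumaConnect™ system.

```python
from pathlib import Path


def parse_distances(text):
    """Parse one sysfs 'distance' row such as '10 21' into a list of ints."""
    return [int(token) for token in text.split()]


def relative_costs(distances, local_index):
    """Express every node's distance as a multiple of the local node's distance."""
    local = distances[local_index]
    return [d / local for d in distances]


def read_node_distances(node, root="/sys/devices/system/node"):
    """Read the distance row for one node; return None if sysfs has no such node."""
    path = Path(root) / f"node{node}" / "distance"
    if not path.exists():
        return None
    return parse_distances(path.read_text())


# Illustrative row for node 0 of a hypothetical two-node machine.
sample = parse_distances("10 21")
print(relative_costs(sample, local_index=0))  # [1.0, 2.1]
```

In this made-up row, memory on the second node costs roughly twice as much to reach as local memory, which is the kind of difference NUMA-aware software tries to account for.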

Another feature of global memory is cache coherence. Data may exist in either main memory or in processor cache memory. Changes that occur in cache memory are not reflected in the main memory until the cache memory is synchronized with the main memory. To make efficient use of the processor caches, NUMA systems need to contain automatic cache coherence mechanisms. In order to fully support global memory access with full cache coherence, NumaConnect™ provides cache coherent NUMA (ccNUMA) capabilities to all connected systems.
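The stale-data problem described above can be illustrated with a toy model. This is a deliberately simplified sketch, not the NumaConnect™ coherence protocol: the class name, the invalidate-on-write rule, and the two-CPU setup are all assumptions chosen to show why an automatic mechanism is needed.

```python
class WriteBackCache:
    """Toy model: one main memory plus one private write-back cache per CPU."""

    def __init__(self, cpus, coherent):
        self.memory = {}
        self.caches = [dict() for _ in range(cpus)]
        self.coherent = coherent

    def write(self, cpu, addr, value):
        # The write lands only in the writer's cache (write-back policy).
        self.caches[cpu][addr] = value
        if self.coherent:
            # Invalidate any stale copies held by the other CPUs.
            for other, cache in enumerate(self.caches):
                if other != cpu:
                    cache.pop(addr, None)

    def read(self, cpu, addr):
        if addr in self.caches[cpu]:
            return self.caches[cpu][addr]
        if self.coherent:
            # Fetch the newest copy from whichever cache currently holds it.
            for cache in self.caches:
                if addr in cache:
                    return cache[addr]
        return self.memory.get(addr)

    def flush(self, cpu):
        # Synchronize one cache with main memory.
        self.memory.update(self.caches[cpu])
        self.caches[cpu].clear()


plain = WriteBackCache(cpus=2, coherent=False)
plain.memory["x"] = 1
plain.write(0, "x", 2)
print(plain.read(1, "x"))  # 1: CPU 1 still sees the old value
plain.flush(0)
print(plain.read(1, "x"))  # 2: visible only after synchronization

ccnuma = WriteBackCache(cpus=2, coherent=True)
ccnuma.memory["x"] = 1
ccnuma.write(0, "x", 2)
print(ccnuma.read(1, "x"))  # 2: coherence makes the update visible immediately
```

Without coherence, the second CPU reads a stale value until the first cache is flushed; with it, the update is visible without any explicit synchronization step in the program.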

Commodity Large Scale SMP

Traditionally, threaded (OpenMP) applications cannot be scaled beyond a single server node that exists within a cluster. Thus, users often choose MPI programming as a means to scale beyond a single server. On the other hand, while MPI codes can run in an SMP environment, they may not be optimal for a shared memory environment. Both MPI and SMP represent solutions that are directed by the underlying hardware.


Clusters allow MPI programs to run efficiently between server nodes, but there is no easy method for OpenMP programs to operate across a cluster. In an SMP system both MPI and OpenMP can run anywhere on the machine.
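The difference between the two programming models can be sketched in a few lines. The example below uses Python threads as a stand-in for both styles; it is not real OpenMP or MPI, and the function names are hypothetical. The first function mimics the threaded approach, where all workers read the same data and update one shared result. The second mimics message passing, where each worker receives its own copy of a slice and communicates only by sending a result.

```python
import queue
import threading


def shared_memory_sum(data, workers):
    """Threaded style: every worker reads the same list and adds into one total."""
    total = 0
    lock = threading.Lock()

    def work(start):
        nonlocal total
        partial = sum(data[start::workers])  # reads the shared list directly
        with lock:
            total += partial  # updates the shared result in place

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, workers):
    """Message style: each worker owns a private copy and sends back one message."""
    inbox = queue.Queue()

    def work(chunk):
        inbox.put(sum(chunk))  # the only communication is an explicit message

    threads = [
        threading.Thread(target=work, args=(list(data[i::workers]),))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in range(workers))


numbers = list(range(1000))
print(shared_memory_sum(numbers, 4))    # 499500
print(message_passing_sum(numbers, 4))  # 499500
```

The shared-memory version only works when every worker can address the same data, which is why it normally stops at the edge of one server. The message-passing version makes no such assumption, so it runs on a cluster or inside a single SMP machine alike.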

Until Numascale, the only hardware solution that supported both programming environments for large numbers of processing cores was expensive SMP mainframes. These systems are prohibitively expensive for most HPC users and do not offer the flexibility of commodity solutions.

The desire for SMP features has been the driving force for many cluster software projects as well. Indeed, CLOMP, Mosix, bproc, Kerrighed, and ScaleMP all attempt to bring many of the SMP features (i.e., unified memory and process view/migration) to the commodity cluster. These projects have seen some success, but still operate on top of the island-to-island transport mechanism mentioned above.

The optimal solution would be to connect commodity servers to form a true SMP machine. Such a system would address many of the issues mentioned above. Users and administrators could expect the following:

• Mainframe SMP performance at commodity prices

• Support for both OpenMP (threads) and MPI (messages) in the same environment

• Single memory and process space

• Large amounts of memory

• Single OS image and simple management

The NumaConnect™ technology uses a single chip that combines the ccNUMA control logic with an on-chip seven-way switch. The on-chip switch eliminates the need for a separate central switch and the long cables you would see with InfiniBand in a Beowulf cluster fat-tree topology.

NumaConnect™ SMP adapters have the following features:

• Attaches to coherent HyperTransport (HT) through a standard HTX connector found on commodity motherboards

• Provides up to 4 Gbytes of Remote Cache for each node

• Full 48-bit physical address space, providing a total of 256 Tbytes of memory

• Support for up to 4,096 nodes

• Sub-microsecond MPI latency between nodes (ping-pong/2)

• An on-chip distributed switch fabric
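The address-space figures in the list above follow from simple arithmetic, sketched here. The total comes directly from the 48-bit address width, with "Tbytes" read as binary units (2^40 bytes). The per-node figure is only an illustration that assumes the full address space is split evenly across the maximum node count; it is not a stated product limit.

```python
ADDRESS_BITS = 48   # physical address width from the feature list
MAX_NODES = 4096    # maximum node count from the feature list

TIB = 2 ** 40       # one Tbyte, in binary units
GIB = 2 ** 30       # one Gbyte, in binary units

total_bytes = 2 ** ADDRESS_BITS
print(total_bytes // TIB)  # 256 Tbytes addressable in total

# Assumption: the address space is divided evenly across every node.
per_node_bytes = total_bytes // MAX_NODES
print(per_node_bytes // GIB)  # 64 Gbytes per node under that assumption
```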

The NumaConnect™ SMP Adapter literally transforms AMD multi-core servers into a fully functioning SMP system. The solution does not require software to be modified or rely on any OS extensions.


Numascale has made the return of SMP to the HPC mainstream possible. The cost barrier to large-scale SMP deployments has been broken by an interconnect that provides true shared memory (ccNUMA) across commodity servers. Based on the industry-standard AMD HyperTransport, the Numascale NumaConnect™ adapter provides the capabilities and convenience of a mainframe SMP system at commodity prices.

The advantages of SMP systems are well known and do not preclude running existing cluster applications (i.e., MPI) on an SMP machine. The NumaConnect technology represents the best of both worlds (mainframe SMP and low cost clusters) and provides a “plug-and-play” SMP environment that does not require emulation software or specialized drivers. A single operating system image provides a unified memory, process, and I/O space in which both threaded (OpenMP) and message passing (MPI) applications can execute, eliminating the “either/or” scenario found on today’s clusters.

Try Out Your Code on an Abaqus HPC Cluster Today

Contact Nor-Tech for a no-cost opportunity to try out this scalable SMP technology for your applications. See how Nor-Tech’s more than ten years of HPC expertise can help you leverage advances from Numascale, AMD, Intel, and NVidia to get your products to market faster than your competition. Click here to get started on an evaluation today.