Our clients’ engineering teams already know all this, but chances are their management teams and even some people on their IT teams don’t. So here is a nutshell overview of HPC.
First of all, the term “HPC” is an acronym for high performance computing, not high performance computers. The term “HPC” alone refers to a capability, not a particular piece of technology. Most HPC technology falls somewhere between workstations and supercomputers: supercomputers are HPC technology, but not all HPC technology is a supercomputer. One thing that distinguishes HPC technology is the architecture. Rather than having a monolithic, single-box design, HPC technology uses clusters of servers that work in parallel to perform highly complex calculations.
An HPC Cluster
An HPC cluster is a set of linked computers that use parallel processing to achieve higher computing power than any single computer in the cluster could deliver on its own. The computers within a cluster are called nodes. A small cluster may use as few as four nodes, and a large cluster may have thousands. The nodes divide a workload among themselves and work on their pieces at the same time.
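To make the idea concrete, here is a minimal sketch of how a workload can be split across nodes using MPI, the message-passing standard widely used on HPC clusters. It is purely illustrative (not Nor-Tech code, and the numbers are arbitrary): each process sums its own slice of a range, and the partial results are combined at the end.

    /* parallel_sum.c: each MPI process (typically one or more per node)
       sums its own slice of 1..1,000,000, then the slices are combined. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes?  */

        long n = 1000000;
        long chunk = n / size;
        long start = (long)rank * chunk + 1;
        long end = (rank == size - 1) ? n : start + chunk - 1;

        double local = 0.0, total = 0.0;
        for (long i = start; i <= end; i++)
            local += (double)i;

        /* Combine every process's partial sum on process 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum computed by %d processes: %.0f\n", size, total);

        MPI_Finalize();
        return 0;
    }

On a typical cluster this would be built with an MPI wrapper compiler (e.g., mpicc parallel_sum.c -o parallel_sum) and launched across nodes with a command such as mpirun -np 16 ./parallel_sum; the cluster’s job scheduler decides which nodes actually run those 16 processes.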
An HPC Processor
The architecture of the nodes includes an HPC processor, which may use multi-core central processing units (CPUs) or general-purpose graphics processing units (GPGPUs). CPUs are the brain of a computer. GPGPUs are a general-purpose version of GPUs, which process graphics and images. Broadly speaking, the more cores a processor has, the more work it can do in parallel. Multi-core processors, like multi-node clusters, enable parallel processing; to share a workload across multiple cores, the cores must be able to communicate with each other.
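The same split-the-work pattern shows up inside a single node. A minimal sketch using OpenMP, a common shared-memory threading interface (again illustrative only): one compiler directive asks for a loop to be divided across however many cores are available, and the runtime handles the coordination between them.

    /* core_sum.c: split one loop across the cores of a single node. */
    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        long n = 1000000;
        double total = 0.0;

        /* Each core takes a share of the iterations; the "reduction"
           clause combines the per-core partial results safely. */
        #pragma omp parallel for reduction(+:total)
        for (long i = 1; i <= n; i++)
            total += (double)i;

        printf("Sum using up to %d cores: %.0f\n",
               omp_get_max_threads(), total);
        return 0;
    }

Built with an OpenMP-aware compiler (e.g., gcc -fopenmp core_sum.c -o core_sum), the same source runs on a four-core workstation or a 64-core HPC node; it simply uses whatever cores are present.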
Linux vs. Windows
HPC technology uses either a Linux or Windows operating system. Linux is a family of free and open-source operating systems, and it tends to be more popular for high performance computing than Windows. However, Linux can be difficult to integrate, deploy, and operate; this is one area where Nor-Tech’s engineers really shine.
Benefits of HPC
The main benefits are speed, fault tolerance, cost, a flexible deployment model, and total cost of ownership.
• Speed. The speed of HPC technology depends on its configuration; more nodes and cores enable faster parallel processing. Performance is also affected by the software that runs on the machine, including the operating system design and application design, as well as the complexity of the problem being solved (a rough speedup estimate appears after this list).
• Fault tolerance. If part of the system fails, the entire HPC system doesn’t fail. Given that HPC workloads are compute-heavy, fault tolerance ensures that the computations continue uninterrupted.
• Cost. HPC is generally less expensive than supercomputing because clusters are built from standard servers and can be sized to fit the workload rather than purchased as one monolithic system.
• Total cost of ownership. In addition to the hardware, TCO also includes software licensing, power, cooling, and maintenance. This, again, is an area where Nor-Tech’s engineering expertise comes into play: the hardware we build is smartly designed to bring these costs down without affecting performance.
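On the speed point above, a common back-of-the-envelope estimate (Amdahl’s law; the figures here are purely illustrative and not tied to any specific system) shows both why more cores help and why the software side matters:

    speedup on N cores ≈ 1 / ((1 − p) + p / N)

where p is the fraction of the work that can actually run in parallel. If p = 0.95, for example, 64 cores give roughly 1 / (0.05 + 0.95/64) ≈ 15x rather than 64x, which is why application and operating system design affect performance as much as the raw hardware configuration does.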
The concepts involved in HPC are actually fairly simple; it’s the designing, developing, and engineering for a specific client and their specific objectives that is extremely complicated.
A big thanks to Lisa Morgan; much of the above is from her excellent 4/5/19 blog post for Datamation, which may be available at: https://www.datamation.com/data-center/high-performance-computing.html