Modern application workloads in advanced scientific, engineering, and technology domains are complex and demand high performance, power, and accuracy to solve problems over large datasets at very high speed. Innovations for scientific, industrial, and societal advancement now leverage Artificial Intelligence, the Internet of Things, data analytics, and simulation technologies as de facto standards across a wide range of use cases. In areas such as weather forecasting, stock market trend analysis, animation graphics rendering, fraud prevention in financial transactions, and aircraft design simulation, one crucial commonality is the ability to process large volumes of data and deliver insights or outcomes in real time or near real time. Other fields, such as drug discovery, large-scale scientific simulations, astrophysics and space sciences, molecular modeling, quantum computing, climate research, and cryptography, involve a wide range of computationally intensive tasks that can take days or months to complete. Such requirements demand nothing less than High-Performance Computing (HPC), often referred to interchangeably as supercomputing. Because the cost of obtaining and operating a supercomputer and its custom software can easily run into millions of dollars, the technology remains far beyond the financial reach of most enterprises. Cluster-type HPC systems, built from relatively inexpensive interconnected computers running off-the-shelf software, are therefore making waves: they are easy to deploy and affordable, yet still provide supercomputing capabilities.
What is HPC?
HPC systems are designed differently from general-purpose servers. An HPC system consists of a cluster of compute servers working in tandem with a high-performance network (the interconnect) that connects these compute servers and the data storage seamlessly. The parallel file system, job scheduler services, and other software such as tools and libraries are essential components besides the hardware. All these components operate seamlessly to complete various interrelated tasks, collectively called 'jobs'. Multiple jobs may be grouped under 'projects' and run on the HPC cluster according to their respective schedules.
The servers where these tasks run are called nodes, and a cluster typically has hundreds of nodes working in parallel; HPC is therefore based on the principle of parallel processing. Each task is split into smaller 'threads', one per processor core; these threads communicate with each other over fast interconnects and can thus share the vast amounts of data they work on. The indexing, storage, and retrieval of data are handled by a parallel file system, which organizes and presents the data required by each thread of each job to all nodes. Additionally, while some data is used for computation, other datasets reside on the cluster for comparisons or as reference points for future projects, and can be archived to any external storage connected to the HPC system.
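The idea of splitting one task into per-core pieces that are computed in parallel and then combined can be sketched in a few lines of Python. This is a simplified, hypothetical illustration of the pattern, not a real HPC job; the function names are invented for the example.

```python
# Hypothetical sketch: split one job into per-worker chunks (one per
# core), compute the chunks in parallel, and combine the results.
from multiprocessing import Pool

def partial_sum(chunk):
    """Each worker computes its share of the overall result."""
    return sum(chunk)

def run_job(data, workers=4):
    # Split the dataset into one chunk per worker (per core).
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers)]
    chunks[-1].extend(data[workers * size:])  # any leftover elements
    with Pool(workers) as pool:
        # Combine the partial results from all workers.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(run_job(list(range(1_000_000))))  # prints 499999500000
```

Real HPC jobs follow the same split-compute-combine shape, but distribute the chunks across many nodes rather than the cores of a single machine.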
HPC clusters generally aim for maximum capability computing rather than capacity computing. Traditional capacity computing provides efficient, cost-effective computing to solve a small number of somewhat large problems or a large number of small problems, e.g. serving many user requests to a particular database or website. Capability computing, on the other hand, is about engineering the maximum computing power to solve a single large problem in the shortest possible time.
HPC architecture is more complex than simply selecting a set of components and putting them together, because all of the components must interact with one another to shape the actual performance your application achieves. Below is a simplified view of a typical HPC cluster.
A basic HPC cluster must have at least one management/head node and multiple worker nodes, sized according to the use cases the cluster is intended to serve. For larger clusters, additional management nodes are deployed to run cluster-wide services such as monitoring, workflow, and storage services. The user node provides user access and logins for submitting jobs, which are in turn assigned to worker nodes by a workload scheduler running on the management/head node. An HPC system may run many similar jobs with different parameters or datasets; it can queue up hundreds of jobs and let the workload scheduler manage the workflow. Depending on the available resources, all the jobs may run at the same time, or some may wait in the queue while other jobs finish.
Worker/compute nodes execute the jobs assigned to them by the management node. All clusters have worker nodes that do the bulk of the computing, and these nodes are almost always identical throughout the cluster, although cluster building blocks have evolved to include high-end worker nodes with varying amounts of cores and memory. Some nodes may even have accelerators in the form of FPGAs, GPGPUs, or specially augmented coprocessors. From a pure software standpoint, scheduled jobs on a multi-core node run in Symmetric Multi-Processing (SMP) mode, meaning that multiple programs (or processes) can run at the same time on different cores. SMP also allows a single program to use multiple cores by splitting itself into threads; this threaded programming is parallel programming designed for a single SMP node. A threaded program starts as a single program but soon breaks itself into separate but highly connected parts to take advantage of the multiple cores on the SMP node.
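The threaded SMP model described above can be sketched as follows. This is an illustrative toy, not production HPC code: in Python the global interpreter lock limits true CPU parallelism, and in practice this shared-memory model is usually expressed with OpenMP in C, C++, or Fortran. The point is the shape of the pattern: one program forks into threads that share the same memory.

```python
# Sketch of the threaded (shared-memory) model used on an SMP node:
# one program splits into threads that each work on a slice of shared
# data, then the partial results are combined.
import threading

NUM_THREADS = 4
results = [0] * NUM_THREADS  # shared memory visible to all threads

def worker(thread_id, data):
    # Each thread handles its own slice of the shared dataset.
    results[thread_id] = sum(x * x for x in data)

data = list(range(1000))
chunk = len(data) // NUM_THREADS
threads = [
    threading.Thread(target=worker,
                     args=(i, data[i * chunk:(i + 1) * chunk]))
    for i in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results))  # combined result from all threads
```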
A workload scheduler is a critical part of the cluster because it would be almost impossible to share resources without some form of load-balancing tool. As part of the job submission process, all users submit their jobs to a work queue along with the resources required for the job in terms of cores, memory, processing time, etc. Based on these requirements and site-wide policies, the workload scheduler determines the queue and priority of each job and runs it on the cluster as and when the resources become available.
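To make the queue-and-dispatch behaviour concrete, here is a deliberately tiny toy sketch of the idea, nothing like a real scheduler such as Slurm: jobs enter a priority queue with a core requirement, and the scheduler starts whichever queued jobs fit the currently free cores, holding the rest. The class and method names are invented for this example.

```python
# Toy sketch of a workload scheduler: jobs queue with a priority and a
# core requirement; dispatch() starts every job that currently fits.
import heapq

class MiniScheduler:
    def __init__(self, total_cores):
        self.free_cores = total_cores
        self.queue = []      # min-heap ordered by (priority, submit order)
        self.counter = 0     # tie-breaker preserving submission order

    def submit(self, name, cores, priority):
        # Lower priority value = more urgent, mimicking site-wide policy.
        heapq.heappush(self.queue, (priority, self.counter, name, cores))
        self.counter += 1

    def dispatch(self):
        """Start every queued job that fits; return the started names."""
        started, held = [], []
        while self.queue:
            prio, order, name, cores = heapq.heappop(self.queue)
            if cores <= self.free_cores:
                self.free_cores -= cores
                started.append(name)
            else:
                held.append((prio, order, name, cores))
        for job in held:     # jobs that must wait for resources
            heapq.heappush(self.queue, job)
        return started

sched = MiniScheduler(total_cores=16)
sched.submit("simulation", cores=12, priority=1)
sched.submit("render", cores=8, priority=2)
sched.submit("postprocess", cores=4, priority=3)
print(sched.dispatch())  # render must wait until cores free up
```

A real scheduler also weighs memory, wall-clock limits, fairness, and backfill policies, but the core loop is the same: match queued requests against free resources in priority order.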
At the interconnect level, all the nodes are connected with Gigabit Ethernet (GigE), often supplemented by InfiniBand (IB) to handle the high traffic volume. These ultra-responsive network interconnects make distributed memory and compute cores available between nodes. Most modern server nodes also offer a form of Intelligent Platform Management Interface (IPMI), an out-of-band network used for rudimentary monitoring and control of compute node hardware and for maintaining heartbeats to assess cluster health and the availability of worker nodes. Cluster traffic includes computation traffic between compute nodes, file system traffic, and administration traffic for node monitoring and job control across the cluster. Depending on the jobs, compute and/or file system traffic may dominate the cluster network, so 10 Gigabit Ethernet (10GigE) or InfiniBand is used to ensure additional paths are always available. High-performance interconnects are usually rated by latency, the fastest time in which a single byte can be sent (in nanoseconds or microseconds), and bandwidth, the maximum data rate (measured in Gbps).
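Latency and bandwidth combine into a simple first-order model of message transfer time: total time ≈ latency + message size / bandwidth. The sketch below uses illustrative figures, not measurements of any particular fabric.

```python
# Back-of-the-envelope model of interconnect performance:
#   transfer time = latency + (message size / bandwidth)
# The numbers in the example are illustrative, not vendor figures.

def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Estimated one-way transfer time in microseconds."""
    bits = size_bytes * 8
    return latency_us + bits / (bandwidth_gbps * 1e9) * 1e6

# Example: a 1 MiB message over a 100 Gbps link with 1 microsecond latency.
print(round(transfer_time_us(1 << 20, latency_us=1.0, bandwidth_gbps=100), 1))
```

The model shows why both ratings matter: small messages are dominated by latency, while large transfers are dominated by bandwidth.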
Storage subsystems providing high-speed parallel access to data are a critical part of any modern HPC cluster; over the GigE or IB fabrics, they give all worker/compute nodes access to large amounts of storage. Since HPC workloads create and process large datasets, the archiving system becomes crucial: it moves data from one storage service to another based on policies set by the user. The key requirement is to manage data seamlessly throughout its entire life cycle, from the source to the cluster to home directories to storage subsystems. HPC file systems are often called 'parallel file systems' because they allow multi-node input and output: instead of centralizing all storage on a single device, parallel file systems spread the load across multiple separate storage nodes. One popular and freely available option is Lustre, a vetted, high-performance parallel file system. Other options include PVFS2, which is designed to work with MPI.
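The core trick of a parallel file system can be illustrated with a toy striping function: a file's blocks are assigned round-robin across storage nodes, so different compute nodes can read different blocks simultaneously. This is a simplified sketch of the concept; real systems like Lustre add stripe counts, stripe sizes, and metadata servers on top.

```python
# Simplified sketch of parallel file system striping: map each block of
# a file round-robin onto the available storage nodes, so reads and
# writes are spread across many devices at once.

def stripe(file_size, block_size, num_storage_nodes):
    """Map each block index to the storage node that holds it."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {block: block % num_storage_nodes for block in range(num_blocks)}

layout = stripe(file_size=10_000, block_size=1_024, num_storage_nodes=4)
# Block 0 -> node 0, block 1 -> node 1, ..., block 4 -> node 0 again.
print(layout)
```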
On the software side, much of the cluster infrastructure is based on open-source software. In almost all HPC clusters, each worker node runs its own copy of a Linux OS variant that provides services to the applications on the node. User applications employ message-passing libraries (e.g., the Message Passing Interface, MPI) to collectively harness large numbers of computing cores across many server nodes. Nodes that include coprocessors or accelerators often require user applications to use specialized software or programming methods to achieve high performance. An essential part of the software infrastructure is the workload scheduler (such as Slurm, Moab, Univa Grid Engine, or Altair PBS Professional) that allows multiple users to share cluster resources according to scheduling policies that reflect the objectives of the business. In addition to MPI, users need compilers, debuggers, and profilers; the GNU toolchain includes very good compilers and other programming tools. Further tooling helps manage cluster installation/provisioning and monitor its operation.
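The message-passing style that MPI embodies, a root rank scatters work, every rank computes, and the root gathers the results, can be mimicked with Python's standard multiprocessing module as an analogy. This is only a single-machine sketch of the pattern; real HPC codes would use MPI itself (for instance via mpi4py or the C bindings) to span many nodes.

```python
# MPI-style scatter/compute/gather sketched with multiprocessing as a
# single-machine analogy. Each "rank" is a process with its own memory;
# all data exchange happens by explicit messages (queues here).
from multiprocessing import Process, Queue

def rank_worker(rank, inbox, outbox):
    chunk = inbox.get()                  # receive work from "rank 0"
    outbox.put((rank, sum(chunk)))       # send partial result back

def scatter_gather(data, ranks=3):
    inboxes = [Queue() for _ in range(ranks)]
    outbox = Queue()
    procs = [Process(target=rank_worker, args=(r, inboxes[r], outbox))
             for r in range(ranks)]
    for p in procs:
        p.start()
    size = len(data) // ranks
    for r in range(ranks):               # scatter: one message per rank
        hi = len(data) if r == ranks - 1 else (r + 1) * size
        inboxes[r].put(data[r * size:hi])
    total = sum(outbox.get()[1] for _ in range(ranks))  # gather
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(scatter_gather(list(range(100))))
```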
Most HPC applications are sensitive to memory throughput as well as the clock speed of the cores, so performance benchmarks become useful indicators: they help us understand what and how to test, and the results are used to fine-tune the cluster. Running a benchmark eliminates assumptions. SPEC ratings or the Top500 list can be a good start, yet your own applications and their associated workflows are the ultimate benchmark for any cluster.
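As a flavour of what a memory-throughput benchmark does, here is a hypothetical micro-benchmark in the spirit of the well-known STREAM test: time one full pass over a large buffer and report the sustained copy rate. Absolute numbers are illustrative and vary widely between machines; this is a sketch, not a substitute for a proper benchmark suite.

```python
# Toy memory-throughput micro-benchmark: time a large in-memory copy
# and estimate the sustained rate in MB/s. Results vary by machine.
import time

def copy_bandwidth_mb_s(n_bytes=64 * 1024 * 1024):
    src = bytearray(n_bytes)
    start = time.perf_counter()
    dst = bytes(src)                   # one full pass over the buffer
    elapsed = time.perf_counter() - start
    return (len(dst) / (1024 * 1024)) / elapsed

print(f"approx. copy bandwidth: {copy_bandwidth_mb_s():.0f} MB/s")
```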
Advantages of HPC
As discussed above, the main advantages of HPC systems are speed, cost, a flexible deployment model, fault tolerance, and a lower total cost of ownership. The actual benefits realized can vary across domains and across the particular HPC applications tailored to solve particular problems.
Today, digital transformation is driving demand for performance at scale. The power to model and manipulate our digital world in silicon, and its implications for physical wellbeing, has enabled vast changes in how we conduct science, business, and even our everyday lives. From the next scientific breakthrough to new and better products, to deep insights in areas as diverse as genomics, computational chemistry, financial risk modelling, seismic imaging, weather forecasting, and reducing harmful emissions towards a greener world, High-Performance Computing (HPC) plays a central role in all these efforts. We are at the beginning of the High-Performance Computing era: HPC has moved from an expensive endeavour to a cost-effective technology within reach of virtually every budget, and the transformation is becoming reality.