High-Performance Computing: Moving From Expensive to Commodity Computing

High-performance computing in many ways acts as a predictor of how the overall IT market will evolve. Forty years ago, HPC was characterized by extremely expensive machines such as the Cray-1 and systems from Thinking Machines, nCube, and others. These machines were purpose-built, with unique architectures designed for maximum performance. As x86-based servers became prevalent in the market, the concept of clustering thousands of these commodity servers to create supercomputers took off. Later (roughly a decade ago), supercomputer architects began adding general-purpose graphics processing units (GPGPUs) to these commodity server clusters. A handful of GPGPUs could add tens of thousands of low-power processing cores to each server, improving performance and reducing the overall footprint of the supercomputer. The tradeoff is that applications must be recoded in a GPGPU programming language such as CUDA or OpenCL.
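To make that "recoding" tradeoff concrete, here is a minimal sketch of what even a trivial loop looks like once ported to CUDA, one common GPGPU language. The kernel itself, the thread indexing, and the explicit host/device memory management are all work a plain CPU loop never needs; the names and sizes below are illustrative only, not drawn from any particular HPC code.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// CUDA kernel: each GPU thread computes one element of the sum.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;              // one million elements
    const size_t bytes = n * sizeof(float);

    // Host-side allocation and initialization.
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // Device-side allocation and host-to-device copies -- bookkeeping
    // that does not exist in the original serial version of the loop.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

The equivalent CPU code is a three-line `for` loop; everything else here is the cost of moving the computation onto the accelerator, which is exactly why porting real applications to GPGPUs is a substantial engineering effort.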

The latest trend in supercomputing is to add boards with field-programmable gate arrays (FPGAs) or even application-specific integrated circuits (ASICs) to supercomputer clusters to accelerate specific workloads. Both FPGAs and ASICs accelerate performance by implementing operations directly in hardware, which provides extremely powerful acceleration, albeit with the requirement to “code” the application in a hardware description language (typically Verilog or VHDL). While FPGAs can be reprogrammed in the field (ASICs cannot), recoding an application in Verilog or VHDL is even more challenging than recoding it in a GPGPU language. Because of this, the use of FPGAs or ASICs to accelerate supercomputing is limited to a few applications with an extremely high “payoff”.

Given these dynamics, one might ask whether there are alternatives to GPGPUs, FPGAs, or ASICs that can provide significant acceleration with little or no “work” recoding or redesigning applications. Historically, the discussion has centered on how supercomputer architects can bring more cores to bear on a workload. But core count is only one of the factors that determine application performance. Today, one of the greatest bottlenecks in high-performance computing for petabyte-scale workloads is the movement of data between the storage system and the supercomputer. This is an area where computational storage could improve the performance of HPC installations and workloads. If you want to find out more about computational storage, please check out the NGD Systems website – we have some great videos and white papers.



About the Author:

Scott has spent over 20 years in the semiconductor and storage industry in manufacturing, design, and marketing roles. His experience spans more than 15 years at Micron as well as time at STEC. His efforts have helped bring products to market with over $300M in revenue.