In many ways, the combination of high-performance computing (HPC) and artificial intelligence (AI) represents the pinnacle of application execution in the IT ecosystem. HPC machines are the largest systems available today, with the biggest among them having nearly 10,000 CPUs with over 200,000 cores, nearly 28,000 Tesla general-purpose graphics processing units (GPGPUs), and a total of over 600 TB of memory and 800 TB of non-volatile RAM (NVRAM). Such a system can run deep neural networks very efficiently, with a speed unmatched by conventional systems.
One of the greatest problems confronting computation systems that utilize petabyte-scale data sets is the time it takes to load them. Loading a petabyte of data over a 32-lane PCI Express (PCIe) 5.0 bus takes over two (2) hours. Even with the massive networks connecting HPC nodes, this is still a slow process – even with 20 GB/s node-to-node connectivity using EDR InfiniBand and 100 ingress paths, it would still take 500 seconds (nearly 10 minutes). For AI programs that perform extensive computations across a single contiguous dataset, this time is inconsequential. However, there are other AI HPC applications with massive datasets – such as biosciences, oil/gas exploration, and drug discovery – for which this is not the case. As Paul Martino of Bullpen Capital (an expert in using HPC for trading and other applications) stated, “Today, it is often the case that businesses are not bound by the compute resources of their GPUs but rather by the I/O backchannel to get the datasets from place to place (the data and the GPU need to be in proximity for the computing to be possible at scale) … Moving the data may be a bigger problem than performing the computation on it.”
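The transfer times above are straightforward back-of-the-envelope arithmetic. A minimal sketch (the ~3.94 GB/s effective per-lane figure for PCIe 5.0 after encoding overhead is our assumption; the 20 GB/s and 100-path figures come from the text):

```python
# Back-of-the-envelope transfer-time estimates for a 1 PB dataset.

PETABYTE = 1e15  # bytes

def transfer_time_s(total_bytes: float, bandwidth_bps: float) -> float:
    """Seconds to move total_bytes at bandwidth_bps (bytes per second)."""
    return total_bytes / bandwidth_bps

# PCIe 5.0: ~32 GT/s per lane, roughly 3.94 GB/s effective per lane.
pcie5_x32 = 32 * 3.94e9                      # ~126 GB/s across 32 lanes
hours = transfer_time_s(PETABYTE, pcie5_x32) / 3600
print(f"PCIe 5.0 x32: {hours:.1f} hours")    # a bit over 2 hours

# Cluster ingress: 100 paths at 20 GB/s each = 2 TB/s aggregate.
cluster_ingress = 100 * 20e9
seconds = transfer_time_s(PETABYTE, cluster_ingress)
print(f"Cluster ingress: {seconds:.0f} seconds")  # 500 seconds
```

Even the aggregate-cluster case leaves the compute elements idle for minutes before any work can begin, which is the crux of the I/O bottleneck.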
One option to minimize the cost of moving large data sets for problems combining AI and HPC is to use computational storage in the HPC nodes. Computational storage allows data to be operated on in situ, within the storage device itself: data can be searched, indexed, and acted upon without first being transferred to the host. SSDs with this capability can significantly accelerate AI applications by delivering only the specific data required to the HPC compute elements (GPGPUs, CPUs, etc.). This reduces data movement to only what is necessary, streamlining the compute process. In our next blog, we will examine some of the AI problems currently being run on HPC clusters, and how computational storage can help their performance.
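The data-movement savings can be illustrated with a minimal sketch. The class and method names below are hypothetical, not a real computational-storage API; the point is only that pushing the predicate into the device means just the matching records cross the bus:

```python
# Illustrative model of computational storage: the filter runs inside
# the device, so only matching records are transferred to the host.
# ConventionalDrive / ComputationalDrive are made-up names for this sketch.

class ConventionalDrive:
    def __init__(self, records):
        self._records = records

    def read_all(self):
        # Host must pull every record over the bus, then filter itself.
        return list(self._records)

class ComputationalDrive(ConventionalDrive):
    def query(self, predicate):
        # Filtering happens in situ; only matches leave the device.
        return [r for r in self._records if predicate(r)]

records = [{"id": i, "label": i % 100} for i in range(1_000_000)]
wanted = lambda r: r["label"] == 7

host_filtered = [r for r in ConventionalDrive(records).read_all() if wanted(r)]
in_situ = ComputationalDrive(records).query(wanted)

assert host_filtered == in_situ   # same result either way
# Conventional path moved 1,000,000 records; in-situ path moved 10,000.
```

The answer is identical in both cases; only the number of records crossing the host interface changes, which is exactly the savings computational storage offers at petabyte scale.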