BLASTing the Analysis of Protein Sequencing

One of the more important tools in biological computation today is the Basic Local Alignment Search Tool, or BLAST. BLAST, which was developed by the National Institutes of Health (NIH), takes a query sequence, such as a protein sequence, and compares it against a database of sequences. The output is a list of database sequences that are identical or similar to the query sequence. BLAST does this by looking at “short words” within the query sequence and seeing how many of them match short words in the sequences in the database. This approach strikes a balance between performance and accuracy.
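The "short words" idea can be illustrated with a minimal Python sketch. It covers only the word-lookup step: real BLAST also scores near-matching words, extends hits into alignments, and reports statistics, none of which is shown here. The function names, the word length of 3, and the toy sequences are assumptions made up for illustration and are not taken from the BLAST source code.

```python
from collections import defaultdict


def words(sequence, k=3):
    """Return every overlapping k-letter word in a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_index(database, k=3):
    """Map each k-letter word to the IDs of the sequences that contain it."""
    index = defaultdict(set)
    for seq_id, sequence in database.items():
        for word in words(sequence, k):
            index[word].add(seq_id)
    return index


def rank_by_shared_words(query, database, k=3):
    """Count distinct query words found in each sequence, best match first."""
    index = build_index(database, k)
    counts = defaultdict(int)
    for word in set(words(query, k)):
        for seq_id in index.get(word, ()):
            counts[seq_id] += 1
    # Sort by descending count, then by ID so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy sequences, invented purely for this example.
toy_database = {
    "seqA": "MKTAYIAKQR",
    "seqB": "GGSLLNPWHH",
    "seqC": "TAYIAKQRQI",
}
hits = rank_by_shared_words("KTAYIAKQ", toy_database)
print(hits)
```

Sequences that share no words with the query never appear in the result, which is where the performance side of the trade-off comes from: most of the database is skipped without a full comparison.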
To test how computational storage could accelerate biological computation tools, we recently ran BLAST in our lab in Irvine, California. We used a variety of test databases, including one 150TB database containing 88 files. To ensure realistic results, we used unbalanced files, each with a different number of sequences. The test system was a multi-CPU server with 256GB of DDR4 RAM and a number of NGD Systems Computational Storage solid-state drives (SSDs) installed. We varied the number of cores being used on the problem from one to 16. We also varied the number of Computational Storage SSDs being used as compute accelerators.
The results of this testing are shown in the graph below (“Speedup” in the graph shows the percentage of improvement gained). The tests showed that increasing the number of CPU cores used for BLAST from one to 64 can theoretically increase performance by 60 percent.
When we ‘turn on’ up to 64 of the Computational Storage SSDs in combination with the 64 cores, the result is an additional potential increase in performance of 100 percent. This shows that hardware compute can scale without scaling CapEx or forcing platforms to buy expensive RAM and CPU resources.
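For readers who want to relate the “Speedup” percentages to run times, here is a minimal Python sketch. It assumes that the percentage is the gain in throughput relative to a baseline run; the post does not spell out the exact formula, so treat that definition, the function name, and the run times used below as hypothetical.

```python
def speedup_percent(baseline_seconds, accelerated_seconds):
    """Percentage improvement of an accelerated run over a baseline run.

    Assumed definition: how much more work per unit time the accelerated
    configuration completes, expressed as a percentage of the baseline.
    """
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("run times must be positive")
    return 100.0 * (baseline_seconds - accelerated_seconds) / accelerated_seconds


# Hypothetical run times, chosen only to show the arithmetic.
cpu_only_gain = speedup_percent(160.0, 100.0)
doubled_gain = speedup_percent(200.0, 100.0)
print(cpu_only_gain, doubled_gain)
```

Under this definition, a 60 percent improvement means the job finishes in 1/1.6 of the baseline time, and a 100 percent improvement means it finishes in half the time.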
RESULTS: Achieve 32-core-like performance with 64 drives and one core
WITHOUT Computational Storage
32*$10,500(Platinum Xeon devices) = $672,000 + 64 (8TB NVMe SSD)
TOTAL: >$1 Million CapEx
WITH Computational Storage
1*$10,500 = $10,500 + 64 (8TB Computational Storage SSD)
TOTAL: CapEx Savings of over $600,000
We are continuing to develop and refine these results, and you can find out more about this ongoing testing on the NGD Systems website.