A formula for determining the real-world performance of AI and other chips when running deep learning workloads.
AI companies generally home in on one criterion: more tera operations per second (TOPS). Unfortunately, when silicon manufacturers promote their TOPS metrics, they are not providing accurate guidance. In most cases, the numbers being hyped aren’t real TOPS but peak TOPS. In other words, the TOPS number you think you’re getting from a card is actually the best-case scenario of how the chip would perform in an ideal world.
I will discuss the problems the industry has created by mislabeling performance metrics and explain how users can independently evaluate real-world TOPS.
Faux TOPS vs real TOPS
AI application developers generally start performing due diligence by gauging whether a chip manufacturer’s published TOPS performance data is adequate for powering their project.
Say you’re trying to remaster images in full HD on the U-Net neural network at 10 fps (frames per second). Since U-Net requires roughly 3 tera operations per image, simple math says you’ll need 30 TOPS to complete your project at the desired frame rate. So, when shopping for a chip, you would assume that cards claiming to run 50, 40, or even 32 TOPS would be safe for the project. In a perfect world, yes, but you’ll soon find out that the card rarely hits the advertised number. And we’re not talking about drops of just a couple of TOPS; compute efficiency can be as low as 10 percent.
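The arithmetic above can be sketched in a few lines. This is a minimal illustration using the U-Net figures from the example; the 10 percent efficiency is the worst case the article cites, not a measured number for any particular card.

```python
# Required throughput for the hypothetical U-Net remastering project.
OPS_PER_IMAGE_TERA = 3.0   # tera operations U-Net needs per full-HD image
TARGET_FPS = 10            # desired frames per second

required_tops = OPS_PER_IMAGE_TERA * TARGET_FPS
print(f"Required: {required_tops:.0f} TOPS")       # -> Required: 30 TOPS

# What a card advertising 50 peak TOPS actually delivers at 10% efficiency.
peak_tops = 50.0
efficiency = 0.10
real_tops = peak_tops * efficiency
print(f"Delivered: {real_tops:.0f} real TOPS")     # -> Delivered: 5 real TOPS
```

Even a card whose peak rating comfortably exceeds the 30 TOPS requirement falls far short once realistic efficiency is factored in.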
While it’s certainly possible to tweak a neural network to squeeze better performance out of a card, it is extremely unlikely you will ever get close to the peak TOPS listed by vendors. Attempting to reach even 60 or 70 percent compute efficiency will be a massive time sink. If the neural network changes, you will have to go back to square one and optimize everything again – and even then, it may not work for your application. The problem is particularly pronounced for small batch processing; you’d be lucky to get more than 15 percent of the peak TOPS.
At this point, you are probably wondering how it’s possible to calculate real TOPS. It’s simple!
To discover how many real TOPS a particular card will deliver, you first need to determine the compute efficiency of that card. Ideally, this can be done by simply running the neural network you need on the targeted card. However, you may not have the card in hand. You can still get an estimate by looking more closely at the details of the vendors’ marketing numbers. Performance data for neural networks like ResNet50 (or similar ones) are typically available. Assuming a typical ResNet50 implementation, you can find the number of giga operations (GOPs, not TOPS) needed to compute a single image. Then just multiply it by the number of images per second (IPS) the vendor advertises, and voilà – you have a more realistic number of TOPS, the “real” TOPS.
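Here is a sketch of that estimation. The ~7.7 giga operations per image is a commonly cited figure for a standard ResNet50 forward pass at 224×224; the IPS value is a made-up datasheet number, so substitute your vendor’s figure.

```python
# Estimate real TOPS from a vendor's advertised ResNet50 throughput.
RESNET50_GOPS_PER_IMAGE = 7.7   # giga operations per image (commonly cited figure)
vendor_ips = 2600               # images/second from the datasheet (illustrative)

real_gops = RESNET50_GOPS_PER_IMAGE * vendor_ips   # giga operations per second
real_tops = real_gops / 1000.0                     # convert GOPS to TOPS
print(f"Real throughput: {real_tops:.1f} TOPS")
```

With these illustrative inputs the card delivers about 20 real TOPS, regardless of what its peak rating claims.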
Compute efficiency is simply the ratio of real TOPS to peak TOPS, which rearranges to:
Peak TOPS x Compute Efficiency = Real TOPS
This formula lets users compare the true efficiency of cards running a neural network before buying anything. You can then apply that efficiency to your required TOPS and see whether a card fits your needs. While factors like power and batch size can affect the outcome, if you know the card’s efficiency, this formula provides a good estimate of its real performance in a real-world use case. Of course, the IPS figures published by vendors can still be questioned, but the estimate gives a far better idea than comparing the TOPS you actually need against a card’s peak TOPS.
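Putting the two steps together: derive a card’s efficiency from its estimated real TOPS and its peak rating, then apply that efficiency to a candidate card. All numbers here are illustrative assumptions.

```python
# Step 1: efficiency of a reference card, from the IPS-based estimate
# described earlier (numbers are illustrative).
peak_tops = 50.0   # vendor's advertised peak
real_tops = 20.0   # estimated from IPS x per-image GOPs
efficiency = real_tops / peak_tops
print(f"Efficiency: {efficiency:.0%}")   # -> Efficiency: 40%

# Step 2: check a candidate card with the same efficiency against your need.
required_real_tops = 30.0   # e.g., the U-Net project above
candidate_peak = 80.0
candidate_real = candidate_peak * efficiency
verdict = "enough" if candidate_real >= required_real_tops else "not enough"
print(f"Candidate delivers ~{candidate_real:.0f} real TOPS ({verdict})")
```

An 80-peak-TOPS card at 40 percent efficiency delivers about 32 real TOPS – just enough for a 30-TOPS workload, where its peak rating alone would have suggested a huge margin.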
It’s also worth noting that this isn’t just a GPU issue. Most specialized ASICs show very low actual efficiency, even when their marketing promotes the opposite. Take the advertised IPS and the known per-image GOPs for the network, and a simple multiplication will give you a realistic figure.
While both GPUs and ASICs struggle with efficiency and performance, there is an alternative solution that doesn’t involve either of these chips.
The October 2020 MLPerf results show that FPGAs combined with inference acceleration are dramatically more efficient than alternatives and can thus get closer to the peak TOPS numbers that other chip makers advertise.
FPGAs are more efficient not only in computation, but also in how much silicon they use for it. Essentially, these cards are “doing more with less,” which results in better-performing neural networks at a fraction of the cost.
It bears repeating: Buyers should not fall for peak TOPS marketing hype. It’s an exaggerated performance number that most neural networks will never see under real-world conditions. Instead, take advantage of this formula:
Peak TOPS x Compute Efficiency = Real TOPS
Doing so will help you quickly, easily and accurately compare your performance needs against the actual performance of a chip, rather than relying on exaggerated vendor claims.
This article was originally published on EE Times.
Ludovic Larzul is the founder and CEO of Mipsology.