TOKYO — Nvidia CEO Jensen Huang came to Beijing this week to put Nvidia’s new AI inference platform — TensorRT 3 — front and center at the company’s GPU Technology Conference (GTC).

Internet giants everywhere today are wrestling with the explosive growth of data generated by AI-enabled services — such as image and speech recognition, natural language processing, visual search and personalized recommendations, Nvidia explained. Every service provider is racing to find solutions for fast, accurate AI inferencing, while angling to dramatically cut the cost of data-center and cloud-service infrastructure.

Nvidia is pitching the combination of TensorRT 3 with Nvidia’s GPUs, which it says delivers “ultra-fast and efficient inferencing across all frameworks for AI-enabled services.”

At the GTC, Huang announced that China’s top Internet companies — Alibaba Cloud, Baidu and Tencent — are now upgrading their data centers and cloud-service infrastructure with Tesla V100 GPUs. 

Nvidia CEO Jensen Huang (photo: Nvidia)

Nvidia also announced that China’s leading OEMs — including Inspur, Lenovo and Huawei — are using the Nvidia HGX reference architecture to offer Volta-based accelerated systems for hyperscale data centers.

But as Nvidia sees it, hardware alone won’t help AI-based service providers cope with the explosion of data generated by AI inferencing. Hence, Nvidia is pushing its customers to embrace the TensorRT platform.

In Beijing, Nvidia announced that Alibaba, Baidu, Tencent, JD.com and Hikvision are adopting Nvidia TensorRT for programmable inference acceleration.

So far, JD.com is the only one of those companies already using TensorRT, according to Paresh Kharya, a group product marketing manager who runs accelerated computing marketing at Nvidia.

Why TensorRT?
Thus far, many AI-based service companies have done their own “hand optimization” when they take neural networks trained in frameworks such as TensorFlow or Caffe and run them on selected GPUs, said Kharya. “TensorRT can fill in that gap,” he told us.

(Source: Nvidia)

Kharya described TensorRT as “like a compiler.” TensorRT allows service providers to pick any trained deep-learning framework and select the destination GPUs they want to target.

Nvidia’s position is that no other company offers an off-the-shelf “high-performance optimizing compiler and runtime engine” for production deployment of AI applications.

As TensorRT takes description files from the neural network and compiles them to run on the targeted GPUs, it can rapidly “optimize, validate and deploy trained neural networks for inference to hyperscale data centers, embedded or automotive GPU platforms,” said Nvidia.
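Kharya’s “compiler” analogy can be made concrete. One of the documented optimizations an inference compiler like TensorRT applies is layer fusion — collapsing a sequence such as convolution, bias and ReLU into a single GPU kernel to cut memory round-trips. The sketch below is not the TensorRT API; the layer names and the `fuse_layers` helper are hypothetical, purely to illustrate the idea of transforming a trained network description before deployment.

```python
# Toy illustration of the kind of graph optimization an inference
# compiler performs. This is NOT the TensorRT API -- just a sketch of
# "layer fusion" on a network described layer by layer.

def fuse_layers(layers):
    """Collapse adjacent conv -> bias -> relu triples into one fused op,
    modeling the single fused GPU kernel an optimizer would emit."""
    fused, i = [], 0
    while i < len(layers):
        if layers[i:i + 3] == ["conv", "bias", "relu"]:
            fused.append("conv_bias_relu")  # one fused kernel launch
            i += 3
        else:
            fused.append(layers[i])  # leave other ops untouched
            i += 1
    return fused

# A trained network as a framework might export it, layer by layer:
network = ["conv", "bias", "relu", "conv", "bias", "relu", "pool", "fc"]
print(fuse_layers(network))
# -> ['conv_bias_relu', 'conv_bias_relu', 'pool', 'fc']
```

In a real deployment the compiler would also pick kernel implementations and numeric precision tuned to the specific target GPU — which is why the same trained model can be “compiled” differently for a data-center Tesla V100 than for an embedded or automotive part.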

In this scenario, does inferencing take place in data centers, or does it happen at the edge? Nvidia told us it can happen in either place.

Next page on EE Times US: AI-based services get cost prohibitive