Billions of dollars have been poured into AI startups, and the talking point has mostly been about performance. As training AI models demands more computing resources, the emphasis is shifting to the cost of computing on their chips.
The performance-per-dollar on AI chips has “become very important,” Naveen Rao, whose AI chip company Nervana Systems was acquired by Intel for $350m in 2016, told The Register. Rao previously ran the AI product group at Intel and quit last year.
“Lots of [dollars] have gone into chip companies, and I think a lot of that has come without proper analysis,” said Rao, who started an AI company this year which is still in stealth mode.
There are divergent approaches to AI chip design, and the debate is whether an integrated chip or a decoupled approach would be more economical. To chip makers, this is a familiar battle: it’s a retread of the question of whether components should be integrated into a single AI megachip or distributed across processing units linked on a board or over a network.
Popular AI systems today harness the power of hundreds of Nvidia’s GPUs distributed across computers. Rao is a proponent of this distributed approach, with AI processing split over a network of cheaper chips and components that include low-cost DDR memory and PCI-Express interconnects.
“The costs of building massive chips is much higher than the tiny chips and cables used to connect multiple chips together. The interconnect cables and chips benefit from economies of scale…these aren’t bespoke to AI compute, they are used in many applications,” he said.
Cerebras Systems CEO Andrew Feldman threw cold water on Rao’s arguments, saying that stringing together a chain of chips as an AI cluster can add to the hardware and electricity bills.
“Let’s look at what Tesla did. Did they use PCI links? No. Did they make a bigger chip? Yes,” Feldman told The Register, adding that “nonlinear scaling combined with all the other infrastructure necessary to tie them together is punishingly power inefficient.”
Cerebras’ own WSE-2 AI megachip, which shipped in August, is the largest processor ever built. It has 850,000 cores, twice as many as its predecessor, and raises interconnect bandwidth to 220 Pb/s, which the company claims is more than 45,000 times the bandwidth delivered between graphics processors.
“Our units are expensive, but so is buying 12 [Nvidia] DGX A100s. At every phase, we are less expensive than or the same as a comparable amount of GPUs and we use less power,” Feldman said.
There are other hidden costs, like buying 50 CPUs to connect 200 GPUs. “How do you put those GPUs together? You need giant Infiniband or Ethernet switches and each of those has optics pulling power,” Feldman said.
AI chip development is so diverse that you can’t take a one-size-fits-all approach, others said. The software may define the hardware, and some chips may facilitate processing on the edge before feeding relevant data into neural nets.
Hardware platforms are developed without much consideration of software and how it will scale among the platforms, said Rehan Hameed, chief technology officer at Deepvision, in a chat at last month’s AI Hardware Summit. The company develops a software development kit that maps AI models to various hardware.
AI chip design may also play out along the lines of Koomey’s Law, the observation that the electrical efficiency of computation doubled about every year and a half over six decades. It also factors in sensors, smartphones, and other devices, said Dan Hutcheson, industry analyst at VLSI Research.
The decades-long CPU battle shifted to energy efficiency after chip makers stopped cranking up the frequency of chips. AI applications are getting more complex, and there will be a limit to the amount of electricity that can be thrown at complex problems, Hutcheson said.
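To put that doubling rate in perspective, here is a back-of-the-envelope sketch in Python. It simply compounds the figures quoted above (a doubling every 1.5 years, sustained for 60 years); the function name `efficiency_gain` is our own illustrative choice, and the result is arithmetic on those round numbers rather than a measured figure.

```python
# Illustrative arithmetic only: compound a fixed doubling period over a span of years.
# The 1.5-year period and 60-year span are the round figures quoted in the article.

def efficiency_gain(years: float, doubling_period_years: float = 1.5) -> float:
    """Return the multiplicative gain in computation per unit of energy."""
    return 2.0 ** (years / doubling_period_years)


doublings = 60 / 1.5          # number of doublings in six decades
gain = efficiency_gain(60)    # 2 raised to that number of doublings

print(f"{doublings:.0f} doublings, roughly {gain:.2e}x more computation per joule")
# prints "40 doublings, roughly 1.10e+12x more computation per joule"
```

In other words, if the trend held exactly, the same amount of electricity would buy about a trillion times more computation at the end of the period than at the start.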
“The problem is with self driving cars and electric cars. The car’s AI system should not consume half power of electrical mileage,” Hutcheson said. ®