
Google Cloud chief reveals the long game: a decade of silicon and the energy battle behind the AI boom

While the world scrambles to adapt to the explosive demand for generative AI, Google Cloud CEO Thomas Kurian says his company isn’t reacting to a trend but executing a strategy set in motion 10 years ago. In a recent panel at Fortune Brainstorm AI, Kurian detailed how Google anticipated the two biggest bottlenecks facing the industry today: the need for specialized silicon and the looming scarcity of power.

According to Kurian, Google’s preparation began well before the current hype cycle. “We’ve worked on TPUs since 2014 … a long time before AI was fashionable,” Kurian said, referring to Google’s custom Tensor Processing Units. The decision to invest early was driven by a fundamental belief that chip architecture could be radically redesigned to accelerate machine learning.

The energy premonition

Perhaps more critical than the silicon itself was Google’s foresight regarding the physical constraints of computing. While much of the industry focused on speed, Google was calculating the electrical cost of that speed.

“We also knew that the most problematic thing that was going to happen was going to be energy because energy and data centers were going to become a bottleneck alongside chips,” Kurian said.

This prediction influenced the design of the company’s infrastructure. Kurian said Google designed its machines “to be super efficient in delivering the maximum number of flops per unit of energy.” That efficiency is now a critical competitive advantage as AI adoption surges, placing unprecedented strain on global power grids.

Kurian said the energy challenge is more complex than simply finding more power, noting that not all energy sources are compatible with the specific demands of AI training. “If you’re running a cluster for training … the spike that you have with that computation draws so much energy that you can’t handle that from some forms of energy production,” he said.

To combat this, Google is pursuing a three-pronged strategy: diversifying energy sources, using AI to manage thermodynamic exchanges within data centers, and developing fundamental technologies to create new forms of energy. In a moment of recursive innovation, Kurian noted that “the control systems that monitor the thermodynamics in our data centers are all governed by our AI platform.”

The ‘zero sum’ fallacy

Despite Google’s decade-long investment in its own silicon, Kurian pushed back against the narrative that the rise of custom chips threatens industry giants like Nvidia. He argued that the press often frames the chip market as a “zero sum game,” a view he considers incorrect.

“For those of us who have been working on AI infrastructure, there’s many different kinds of chips and systems that are optimized for many different kinds of models,” Kurian said.

He characterized the relationship with Nvidia as a partnership rather than a rivalry, noting that Google optimizes its Gemini models for Nvidia GPUs and recently collaborated to allow Gemini to run on Nvidia clusters while protecting Google’s intellectual property. “As the market grows,” he said, “we’re creating opportunity for everybody.”

The full stack advantage

Kurian attributed Google Cloud’s status as the “fastest growing” major cloud provider to its ability to offer a complete “stack” of technology. In his view, doing AI well requires owning every layer: “energy, chips or systems infrastructure, models, tools, and applications.” He said Google is the only player that offers all of them.

However, he said this vertical integration does not equate to a “closed” system. He argued that enterprises demand choice, citing how 95% of large companies use cloud technology from multiple providers. Consequently, Google’s strategy allows customers to mix and match—using Google’s TPUs or Nvidia’s GPUs, and Google’s Gemini models alongside those from other providers.

Despite the advanced infrastructure, Kurian offered a reality check for businesses rushing into AI. He identified three primary reasons why enterprise AI projects fail to launch: poor architectural design, “dirty” data, and a lack of testing for security and model compromise. Many organizations also fail simply because “they didn’t think about how to measure the return on investment on it.”

For this story, Fortune journalists used generative AI as a research tool. An editor verified the accuracy of the information before publishing.
