AI: Multi-trillion $$ AI Data Center 'Compute' Builds ahead. RTZ #529

The Bigger Picture, November 3, 2024

Longtime readers here know that I’ve said many times in these pages that without AI Compute that scales with the LLM AI models, ‘AI’ becomes just two letters in the alphabet. It’s the reason of course that this year in particular, scaling AI Data Centers has become a ‘crowded trade’, along with the required multiple Gigawatts of Power, Cooling and other AI Infrastructure. And it’s the reason that companies across industries are spending hundreds of billions so early in this AI Tech Wave.

And it’s why there’s a debate on how long investor patience will last to sustain said AI Compute Capex. But how much ‘Compute’ exactly will we need? Most experts are suggesting that it’s in the trillions over the coming decade. And the range is WIDE. From one to 9+ trillion wide. That’s the ‘Bigger Picture’ I’d like to unpack this Sunday.

As a convenience, I’m using the phrase ‘AI Compute’ to refer to this whole collection of AI Infrastructure investments. From AI GPUs/Networking/CUDA by companies like Nvidia, to all the infrastructure that goes into the AI Data Centers, to the Power, Cooling, Memory, Talent, and a whole growing range of inputs. That’s what I call ‘AI Compute’. It’s also broadly referred to in the industry as ‘AI Data Centers’. So ‘Compute’ is a useful ‘catchall’ term.

First, a snapshot of where we are. As I outlined in yesterday’s Weekly Summary:

“Big Tech Reiterates AI Capex Commitments in Earnings Calls: Earnings season kicked off in earnest, with Big Tech companies continuing ‘pedal to the metal’ on AI capex spending. More AI GPUs and data centers as far as the eye can see. With huge helpings of Power from all sources, including the previously taboo Nuclear option. Even the private equity industry is jumping in with both feet, as KKR announces a $50 billion commitment. Not to mention Blackstone and BlackRock earlier.”

“Softbank’s Masayoshi Son announced a $9 trillion figure for global AI data center and power spend over the next decade, with Nvidia being a key beneficiary. It was an obvious attempt to outdo OpenAI’s Sam Altman’s $7 trillion number from a few months ago, which has since been adjusted down to a few hundred billion for the industry. Nevertheless, investors for now are more or less taking the big public tech companies’ AI capex in stride. More here on the 100,000+ AI GPU Data Center race.”

Most of the rest of the Weekly Summary was about how companies ranging from OpenAI to Google to Meta to Apple to Amazon and beyond were rushing to invest aggressively in AI Compute. In fact, if you haven’t yet, I’d encourage you to read all five items in yesterday’s AI Weekly Summary to get a sense of the scope of the ‘AI Compute’ ramp up as of just last week.

Now for how big this ‘AI Compute’ amount could be over the next decade. Keep in mind that the traditional Data Center industry, built over the last couple of decades, is itself a half-trillion-dollar-plus industry globally.

What industry leaders like Nvidia’s founder/CEO Jensen Huang started to underline a year ago is that the entire data center industry would need an ‘AI makeover’ in terms of the chips, networking, memory, clusters, power, cooling, etc., which would run north of a trillion dollars over a decade.

That changed to an estimate of over $2 trillion earlier this year. This was the number also forecast by Goldman Sachs in their recent Cloud Computing report.

Around the same time, as I’ve outlined, OpenAI’s Sam Altman floated numbers as high as $7 trillion, later adjusted to a few hundred billion dollars for the industry in the near term.

Not to be outdone, eternal technology optimist, Softbank founder and CEO Masayoshi Son, last week floated a number as high as $9 trillion for AI Data Centers and Power. His video interview at the FII Conference in Saudi Arabia, describing how this number makes sense, is worth watching.

So the range of estimates of the AI Compute Capex is quite large. As Masa Son himself justifies it, a $9 trillion spend over a decade represents less than 5% of a quadrillion dollars in cumulative global GDP over that decade. Goldman Sachs economists separately estimate that the $120+ trillion global economy could see a 7% increase over a decade due to AI.
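To make those percentages concrete, here is a quick back-of-the-envelope sketch in Python. The GDP figures are rough, illustrative assumptions drawn from the estimates above, not precise forecasts:

```python
# Back-of-the-envelope math behind the capex-vs-GDP framing.
# All figures are rough, illustrative assumptions.

ai_capex_decade = 9e12            # Masa Son's ~$9 trillion over ten years
annual_global_gdp = 120e12        # assumed ~$120+ trillion global economy
cumulative_gdp_decade = annual_global_gdp * 10  # ~$1.2 quadrillion over a decade

capex_share = ai_capex_decade / cumulative_gdp_decade
print(f"AI capex as share of decade GDP: {capex_share:.1%}")  # ~0.8%, well under 5%

# Goldman Sachs' separately estimated ~7% GDP lift from AI over a decade
gdp_lift = 0.07 * annual_global_gdp
print(f"Implied GDP lift at 7%: ${gdp_lift / 1e12:.1f} trillion")
```

Even the most aggressive $9 trillion figure works out to a low-single-digit percentage of cumulative global output, which is the core of Son’s argument.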

So, a one to nine trillion dollar range for AI Compute, give or take, before we consider the additional amounts needed for other global priorities.

But of course there will be a wide range of market factors, political realities, financial imperatives, regulatory impediments and dozens of other issues that will prove to be formidable headwinds to Scaling AI Infrastructure and Power at a pace that can keep up with Scaling LLM AI models, large and small.

But there’s no question that the heads of the biggest Tech Companies, and the largest institutional and sovereign investors, are right now behind scaling AI capex as fast as possible. That is a trend we have not seen this early in any other Tech Wave, making this AI Tech Wave unique.

The estimates of the AI Compute needed ahead are directionally correct. The numbers required, be they hundreds of billions or trillions, are truly daunting given the current opacity of AI products, revenues, profits, and ultimately their trusted, safe utility to mainstream users in the billions. But we’ll get there, even if a bit later than desired. The pace of the buildouts may not scale as fast as the AI models themselves.

OpenAI’s Sam Altman reminded folks of these Compute requirements this week, in an AMA (Ask Me Anything) session on Reddit:

“OpenAI CEO Sam Altman said Thursday that his company’s next big AI model release likely won’t come this year as the company is “prioritizing shipping” of existing models that are focused on reasoning and difficult questions.”

“All of these models have gotten quite complex and we can’t ship as many things in parallel as we’d like to,” Altman wrote during a Reddit AMA. He said the company faces “limitations and hard decisions” when it comes to allocating compute resources “towards many great ideas.”

“After a questioner asked whether OpenAI’s video model Sora was being delayed “due to the amount of compute/time required for inference or due to safety,” OpenAI product chief Kevin Weil wrote, “Need to perfect the model, need to get safety/impersonation/other things right, and need to scale compute!”

What’s true for OpenAI is true for every company in the world, AI or not, all striving to innovate with countless AI applications and services to come (Box # 6 in the chart above). Again, what’s important to remember vs. most previous tech cycles is that AI is massively computation-driven with every user query, and with variable costs per query to boot.
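As a rough illustration of that variable-cost dynamic, here is a hedged sketch of how per-query inference costs add up at scale. Every number below is a hypothetical placeholder, not actual vendor pricing:

```python
# Illustrative only: why LLM serving carries a variable cost per query,
# unlike traditional software where the marginal cost per request is near zero.
# All numbers below are hypothetical assumptions, not real vendor figures.

cost_per_1k_tokens = 0.002      # assumed blended inference cost, $ per 1K tokens
tokens_per_query = 1_000        # assumed prompt + completion length
queries_per_day = 100_000_000   # assumed daily query volume at scale

daily_cost = (tokens_per_query / 1_000) * cost_per_1k_tokens * queries_per_day
print(f"Daily inference cost: ${daily_cost:,.0f}")        # ~$200,000/day
print(f"Annualized: ${daily_cost * 365 / 1e6:,.0f}M")      # ~$73M/year
```

The point is the multiplication itself: unlike traditional software, where serving one more request costs nearly nothing, every additional AI query consumes real compute and power.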

Building AI Compute at Scale, and then delivering that Compute to users at ever-declining prices and ever-improving operating efficiencies, in both computation and power, is the name of the game over the coming decade. Regardless of how many trillions it’ll take to get there. That is the directional AI Bigger Picture to keep in mind for now. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)




