
Jensen’s Trillion-Dollar Token Factory

I have been crowing about OpenClaw for a few months now. It is AI's democratization moment. What matters isn't the actual software but the possibility it represents: intelligence on the cheap. Then Nvidia CEO Jensen Huang spent two hours at his company's GPU Technology Conference (GTC) anointing it as the new Windows. And then the new Linux. And then HTML.

The man likes his hyperbole. And he likes to do a leather-jacket imitation of Steve Jobs. And like the great Mr. Jobs, he knows how to take you on tales of fancy, fantasy, and the future.

“OpenClaw gave us exactly what we needed at exactly the right time,” Huang said. “Just as Linux gave the industry exactly what it needed. Just as Kubernetes. Just as HTML.” Every CEO has the same homework assignment: what’s your OpenClaw strategy?

Boom, and we are off to Claw-a-thon.

This is what Jensen does. He takes something the industry is already doing and reframes it as destiny. Of course, Nvidia is at the center of the frame. I mean, how else would he add another trillion dollars to his company's market cap, especially when the war in the Middle East threatens to bring down the whole AI party? Nvidia is already the most valuable public company in the world, worth $4.2 trillion.

GTC was packed with announcements. Seven new chips. The Vera Rubin platform. A reference storage architecture. An orbital compute module for AI inference in space, because why not? NemoClaw. The Nemotron Coalition with Mistral and Perplexity. Over a hundred robots on the conference floor. Robotaxi deals with BYD, Hyundai, Nissan, and Uber. It was one of the most technically packed keynotes Jensen has ever delivered.

I am surprised he didn’t announce that he was buying Taiwan. I mean, only mere billionaires buy islands in Hawaii. Real trillionaires buy fabs.

All this talk aside, the event was about one thing and one thing only. The arrival of inference inflection.

Give me a moment to explain why this is important.

AI’s first phase, from Nvidia’s context, was training. You needed enormous GPUs to build the models. Nvidia owned that market at roughly 80% share.

Next comes running those models. And more importantly, agents running those models to create more agents. This turbocharges the whole stack. It has completely different dynamics. A hundred tokens per second is fast when you are reading a chatbot reply. For AI agents communicating with each other, it's like watching an old MS-DOS machine boot up.

Think about it this way. Training a model is a capital expense. You do it once, maybe update it periodically. Inference is the operating expense; it runs every time someone uses the model. 

Now introduce agents. A single user asking ChatGPT a question is one inference call. That same user running a NemoClaw agent that reasons in steps, checks external tools, spawns sub-agents to handle pieces of the problem, reviews their outputs, and iterates? That's potentially hundreds of inference calls. Per session. Running continuously. Without a human in the loop asking each time.

The multiplier isn’t linear. It’s the difference between selling a car and selling gasoline. Training was the car. Inference is the gas. Agentic AI is everyone leaving their engine running 24 hours a day. 
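To put rough numbers on that multiplier, here is a back-of-envelope sketch. Every figure in it is an invented assumption for illustration, not anyone's measured data.

```python
# Back-of-envelope sketch of the agentic-inference multiplier.
# Every number below is an illustrative assumption, not measured data.

chatbot_calls = 1                 # one user question = one inference call

# A hypothetical agent session, as described above:
reasoning_steps = 15              # multi-step reasoning passes
tool_checks_per_step = 2          # external tool calls per step
sub_agents = 6                    # sub-agents spawned for sub-problems
calls_per_sub_agent = 25          # each sub-agent's own inference calls
review_iterations = 5             # review-and-iterate passes

agent_calls = (reasoning_steps * (1 + tool_checks_per_step)
               + sub_agents * calls_per_sub_agent
               + review_iterations)

print(agent_calls)                   # 200 inference calls in one session
print(agent_calls // chatbot_calls)  # 200x the simple-chatbot baseline
```

Tweak the assumptions however you like; the point is that the total scales multiplicatively, not additively, once sub-agents start making their own calls.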

Jensen knows this. The hyperscalers know this. It's why every serious infrastructure bet right now is pointed at inference capacity, not training. And it's why his seemingly absurd projection that Nvidia's data center revenue would quadruple by the end of 2027, to in excess of $1 trillion, has actual math behind it.

Inference is why Nvidia paid $20 billion for Groq. Its chips are now part of the newfangled Vera Rubin platform. They give it the requisite oomph for the agent-driven future.

The architecture is simple. The Rubin GPU handles the compute-heavy prefill phase of inference. The Groq LPU handles decode, the latency-sensitive stage that determines how fast a response actually arrives.

Together, Nvidia claims, they deliver 35x more throughput per megawatt and 500x the memory bandwidth of the previous Hopper generation.

In the 1990s, Intel convinced the world that MHz was the universal measure of computing value. The whole industry was organized around that metric. Jensen is doing the same with tokens. Once tokens become the universal unit of AI value, Nvidia, the token factory’s factory, wins by default. Add some more zeros to that market cap.

Nvidia went from $27 billion in annual revenue in 2023 to $216 billion in 2026. That is one of the most extraordinary three-year runs in American business history. And Jensen says it’s not over. He projected $1 trillion in orders for Blackwell and Rubin chips through 2027, double what he said a year ago.
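As a quick sanity check on that run, using only the revenue figures quoted in this piece:

```python
# Nvidia's revenue run, using the figures cited in the piece.
start_revenue = 27e9     # 2023 annual revenue, $27 billion
end_revenue = 216e9      # 2026 annual revenue, $216 billion
years = 3

total_multiple = end_revenue / start_revenue      # 8x in three years
annual_growth = total_multiple ** (1 / years)     # 8 = 2**3, so 2x a year

print(total_multiple)              # 8.0
print(round(annual_growth, 2))     # 2.0 -> revenue doubled every year
```

Doubling every year for three years running is the kind of arithmetic that makes even a $1 trillion order book sound less like bravado.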

One framework that helps make sense of all this comes from a surprisingly non-AI company: Apple. Its recent gains in computing come from figuring out, and doubling down on, the idea that computers work better when hardware and software are simpatico.

Jensen is doing the same, but on a different kind of computer. He wants to own everything inside a multi-billion-dollar data center. Enterprise buyers historically resist this kind of single-vendor lock-in when they can avoid it. But look around. They don't have many choices. At least for now. Even Amazon, despite all its bluster about its own chips, is going to buy a million chips from Nvidia.

From his chips to his networking technology (Mellanox) to his CUDA to his newfound love for OpenClaw, the man is all about vertical integration.

NemoClaw is Nvidia’s enterprise wrapper around OpenClaw. Cisco, CrowdStrike, Google, and Microsoft have already signed on. The Nemotron Coalition pulls model builders like Mistral and Perplexity into the Nvidia orbit.

The clever part is that Nvidia has built the underpinnings. For example, it has created Dynamo, an inference operating system that routes work across GPUs. The 20-year-old CUDA is essentially two decades of developer investment, which in turn has created a flywheel.

More developers mean more algorithms; more algorithms mean wider adoption. The software lock-in becomes harder to escape than the hardware dependency.

Cleverly, Huang's phrase for this pursuit of vertical control is "horizontal openness." Nvidia will integrate with whatever platform you'd like. As long as the check clears.

The twist in the tale, or should I say the fly in the ointment, is that sixty percent of Nvidia's revenue comes from four hyperscalers and Apple. Those same five companies (Google, Amazon, Microsoft, Meta, Apple) are building their own AI chips as fast as they can.

Google has TPUs. Amazon has Trainium. They are Nvidia’s best customers and its most motivated competitors simultaneously. 

The whole-data-center ambition Jensen laid out at GTC requires those hyperscalers to keep buying Nvidia instead of building their own stack. As explained above, the Groq acquisition and the inference push are partly a response. Nvidia is trying to deliver something on inference that hyperscaler custom silicon can't yet match.

For now, he has the right story. And the leather jacket. And he is saying, very loudly, that this $216 billion (in revenue) company has its next trillion figured out.
