AI: Beyond LLM AIs to 'World Models'. RTZ #909

Regular readers here know I’ve been talking for a while now about the current AI Tech Wave branching out far beyond just Scaling Large Language Models (LLMs) and chatbots.

Not just in their Small Language Model (SLM) incarnations, but in forms built around other modalities like video and the physical world (AI robots here and in China, self-driving cars, and beyond), and even fundamentally different AI approaches beyond the research currently in vogue. The same applies to the core goals of AGI, AI Superintelligence and beyond. All of it is absorbing hundreds of billions in investment capital across the board.

This trend is now being discussed more widely as ‘World Models’, and it’s worth understanding in its coming shapes and scope.

Axios discusses this wider ‘World AI’ trend in “AI’s next act: World models that move beyond language”:

“Move over large language models — the new frontier in AI is world models that can understand and simulate reality.”

“Why it matters: Such models are key to creating useful AI for everything from robotics to video games.”

  • “For all the book smarts of LLMs, they currently have little sense for how the real world works.”

And there are a lot of new ideas getting funded and in the works:

“Driving the news: Some of the biggest names in AI are working on world models, including Fei-Fei Li whose World Labs announced Marble, its first commercial release.”

  • “Machine learning veteran Yann LeCun reportedly plans to launch a world model startup when he leaves Meta in the coming months.”

  • “Google and Meta are also developing world models, both for robotics and to make their video models more realistic.”

  • “Meanwhile, OpenAI has posited that building better video models could also be a pathway toward a world model.”

“As with the broader AI race, it’s also a global battle.”

  • “Chinese tech companies, including Tencent, are developing world models that include an understanding of both physics and three-dimensional data.”

  • “Last week, the United Arab Emirates-based Mohamed bin Zayed University of Artificial Intelligence, a growing player in AI, announced PAN, its first world model.”

“What they’re saying: “I’ve been not making friends in various corners of Silicon Valley, including at Meta, saying that within three to five years, this [world models, not LLMs] will be the dominant model for AI architectures, and nobody in their right mind would use LLMs of the type that we have today,” LeCun said last month at a symposium at the Massachusetts Institute of Technology, as noted in a Wall Street Journal profile.”

These new models will require Data in entirely different forms, a trend I’ve written about in the context of ‘Synthetic Data’ and ‘Synthetic Content’.

This is an exercise far different from the Data that fed the LLM AIs over the past few years of this AI Tech Wave.

“How they work: World models learn by watching video or digesting simulation data and other spatial inputs, building internal representations of objects, scenes and physical dynamics.”

  • “Instead of predicting the next word, as a language model does, they predict what will happen next in the world, modeling how things move, collide, fall, interact and persist over time.”

  • “The goal is to create models that understand concepts like gravity, occlusion, object permanence and cause-and-effect without having been explicitly programmed on those topics.”
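The contrast in the bullets above can be made concrete with a toy sketch. In the illustrative Python below, a "world model" predicts the next *state* of an environment rather than the next word: here the state is the height and velocity of a falling ball, and the dynamics are hard-coded physics. (This is a hypothetical teaching example only; a real world model would *learn* such dynamics from video or simulation data rather than having gravity programmed in.)

```python
# Toy illustration: next-STATE prediction (world model style) vs the
# next-TOKEN prediction a language model does. The "dynamics" here are
# hand-coded for clarity; real world models learn them from data.

from dataclasses import dataclass

GRAVITY = -9.8  # m/s^2, assumed constant downward acceleration
DT = 0.1        # simulation timestep, seconds

@dataclass
class WorldState:
    height: float    # meters above the ground
    velocity: float  # meters/second, positive = upward

def predict_next_state(state: WorldState) -> WorldState:
    """Roll the world forward one timestep (simple Euler integration)."""
    new_velocity = state.velocity + GRAVITY * DT
    new_height = max(0.0, state.height + new_velocity * DT)  # floor at ground
    return WorldState(new_height, new_velocity)

def rollout(state: WorldState, steps: int) -> list[WorldState]:
    """Predict a whole trajectory by chaining next-state predictions."""
    trajectory = [state]
    for _ in range(steps):
        state = predict_next_state(state)
        trajectory.append(state)
    return trajectory

if __name__ == "__main__":
    for t, s in enumerate(rollout(WorldState(height=10.0, velocity=0.0), steps=5)):
        print(f"t={t * DT:.1f}s  height={s.height:.2f}m  velocity={s.velocity:.2f}m/s")
```

The point of the sketch is the interface, not the physics: the model's job is "given the world now, what happens next," which is why concepts like gravity and object permanence have to emerge from the learned dynamics rather than from a vocabulary of words.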

In particular, I’ve discussed how AI Researchers like Jim Fan are leading work in this area of synthetic data for robots, at Nvidia as a case in point.

“Context: There’s a similar but related concept called a “digital twin” where companies create a digital version of a specific place or environment, often with a flow of real-time data from sensors allowing for remote monitoring or maintenance predictions.”
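The digital-twin idea described above can be sketched in a few lines: a software object mirrors a physical asset's state from streamed sensor readings and flags values drifting outside expected bounds, a crude stand-in for maintenance prediction. All names and thresholds below are hypothetical, invented purely for illustration.

```python
# Minimal illustrative "digital twin": mirror a physical asset's state
# from sensor readings and flag out-of-bounds values. The PumpTwin name
# and the 80C threshold are assumptions for this sketch, not a real API.

class PumpTwin:
    """Digital twin of a hypothetical industrial pump."""

    MAX_TEMP_C = 80.0  # assumed safe operating limit

    def __init__(self) -> None:
        self.temperature_c = None   # last mirrored sensor value
        self.alerts: list[str] = [] # maintenance warnings raised so far

    def ingest(self, reading: dict) -> None:
        """Update the twin from one real-time sensor reading."""
        self.temperature_c = reading["temperature_c"]
        if self.temperature_c > self.MAX_TEMP_C:
            self.alerts.append(
                f"temperature {self.temperature_c:.1f}C exceeds {self.MAX_TEMP_C}C"
            )

twin = PumpTwin()
for reading in [{"temperature_c": 72.5}, {"temperature_c": 85.1}]:
    twin.ingest(reading)

print(twin.alerts)
```

Where a digital twin mirrors one specific asset from live data, a world model aims at the general dynamics of environments it has never seen, which is what makes the data problem discussed next so much harder.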

And that Data for World Models is the new scarce input, not easily collected and usable at Scale today.

“Between the lines: Data is one of the key challenges. Those building large language models have been able to get most of what they need by scraping the breadth of the internet.”

  • “World models also need a massive amount of information, but from data that’s not consolidated or as readily available.”

  • “One of the biggest hurdles to developing world models has been the fact that they require high-quality multimodal data at massive scale in order to capture how agents perceive and interact with physical environments,” Encord president and co-founder Ulrik Stig Hansen said in an email interview.”

  • “Encord offers one of the largest open source datasets for world models, with 1 billion data pairs across images, videos, text, audio and 3D point clouds as well as a million human annotations assembled over months.”

  • “But even that is just a baseline,” Hansen said. “Production systems will likely need significantly more.”

And then there’s the issue of figuring out the right ‘product-market fit’ for these World AI Models once the Data has been figured out. So there’s a long way to go before these massive capital investments can generate returns.

“What we’re watching: While world models are clearly needed for a variety of uses, whether they can advance as rapidly as language models remains uncertain.”

  • “Though clearly they’re benefiting from a fresh wave of interest and investment.”

I highlight all this to underline how open-ended this AI Tech Wave is at this point vs prior tech waves like the PC, the Internet, and Mobile in particular. These ‘World Models’ represent wide-open pathways for AI’s next iterations, with years of work and investment to go.

Measured more likely in half and full decades, rather than just around the corner. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)




