AI: The need to 'under-promise and over-deliver'. RTZ #880

In its frenetic pace of AI deals and achievements, OpenAI publicly stubbed its toe on claims about GPT-5 achievements that in hindsight were overstated.

This happens at this acceleration stage of the AI Tech Wave, as the leading LLM AI companies and their key executives are under unprecedented pressure to show constantly leapfrogging achievements for their ever-improving AI models, applications, and AI scaling. A key takeaway here is that while this happens in every ferocious tech wave, it is generally better practice to under-promise and over-deliver.

This is a lesson that was drilled into me at my training class at Goldman Sachs in 1982, and is something that is baked into the DNA of most professionals at the Firm. Besides being ‘long-term greedy’ of course.

Coming back to the fast-moving AI industry, the episode involves a modest overstatement by OpenAI executives that appears unintentional.

The Decoder explains it well in “Leading OpenAI researcher announced a GPT-5 math breakthrough that never happened”:

Summary:

  • “OpenAI researchers claimed or suggested that GPT-5 had solved unsolved math problems, but in reality, the model only found known results that were unfamiliar to the operator of erdosproblems.com.”

  • “Mathematician Thomas Bloom and Deepmind CEO Demis Hassabis criticized the announcement as misleading, leading the researchers to retract or amend their original claims.”

  • “According to mathematician Terence Tao, AI models like GPT-5 are currently most helpful for speeding up basic research tasks such as literature review, rather than independently solving complex mathematical problems.”

The detailed narrative provides more useful color and nuance:

“OpenAI researchers recently claimed a major math breakthrough on X, but quickly walked it back after criticism from the community, including Deepmind CEO Demis Hassabis, who called out the sloppy communication.”

“It started with a now-deleted tweet from OpenAI manager Kevin Weil, who wrote that GPT-5 had “found solutions to 10 (!) previously unsolved Erdős problems” and made progress on eleven more. He described these problems as “open for decades.” Other OpenAI researchers echoed the claim.”

“The wording made it sound like GPT-5 had independently produced mathematical proofs for tough number theory questions – a potential scientific breakthrough and a sign that generative AI could uncover unknown solutions, showing its ability to drive novel research and open the door to major advances.”

Weil claimed GPT-5 solved classic Erdős problems, but the claim quickly unraveled. | Image: via Stefan Schubert

“Mathematician Thomas Bloom, who runs erdosproblems.com, pushed back right away. He called the statements “a dramatic misinterpretation,” clarifying that “open” on his site just means he personally doesn’t know the solution – not that the problem is actually unsolved. GPT-5 had only surfaced existing research that Bloom had missed.”

“Deepmind CEO Demis Hassabis called the episode “embarrassing”, and Meta AI chief Yann LeCun pointed out that OpenAI had basically bought into its own hype (“Hoisted by their own GPTards”).”

The feedback from leading AI experts, while scathing, is exacting in its real-time ‘peer-review’ contributions. And the OpenAI reactions were appropriate and made with alacrity.

“The original tweets were mostly deleted, and the researchers admitted their mistake. Still, the incident adds to the perception that OpenAI is an organization under pressure and careless in its approach. It raises questions about why leading AI researchers would share such dramatic claims without verifying the facts, especially in a field already awash in hype, with billions at stake. Bubeck knew what GPT-5 actually contributed, but still used the ambiguous phrase “found solutions.”

Ironically, there was a ‘glass half full’ aspect to the OpenAI capability:

“The real story here is getting overshadowed: GPT-5 actually proved useful as a research tool for tracking down relevant academic papers. This is especially valuable for problems where the literature is scattered or the terminology isn’t consistent.”

“Mathematician Terence Tao sees this as the most immediate potential for AI in math—not solving the toughest open problems, but speeding up tedious tasks like literature searches. While there have been some “isolated examples of progress” on difficult questions, Tao says AI is most valuable as a time-saving assistant. He has also said that generative AI could help “industrialize” mathematics and accelerate progress in the field. Still, human expertise is crucial for reviewing, classifying, and safely integrating AI-generated results into real research.”

The overall takeaway from this modest AI research kerfuffle is again the ongoing need to under-promise while over-delivering. This could pay above-average dividends in AI research especially, where the underlying technologies are scaling at generally dramatic rates, particularly as the leading LLM AI companies scale their AI compute.

While we can expect more of these types of incidents in this AI Tech Wave, the general direction of these capabilities is still more up and to the right than not. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)




