
AI: OpenAI's Candid New AI Hallucination Research. RTZ #838
Warren Buffett’s long-time investment partner Charlie Munger famously said: “Show me the incentive and I’ll show you the outcome.” Turns out it applies to LLM AI models too in this AI Tech Wave.
As I’ve discussed many times in these pages, a perennial problem with LLM AIs is that they hallucinate. And scaling them with better LLMs, AI technologies, and AI Compute Infrastructure doesn’t reduce hallucinations to zero.
Hallucination rates are still high depending on the application, anywhere from a third of the time down to a non-trivial single-digit percentage. It’s a ‘Forever’ problem in this AI Tech Wave for all the LLM AI companies.
OpenAI did some research on the core aspects of the problem and came away with some notable insights. And as usual with these things, the issue has less to do with the technologies themselves and more with how they’re built to do what they do.
In a piece titled “Why language models hallucinate”, they lay it out in detail along with a new AI research paper:
“At OpenAI, we’re working hard to make AI systems more useful and reliable. Even as language models become more capable, one challenge remains stubbornly hard to fully solve: hallucinations. By this we mean instances where a model confidently generates an answer that isn’t true. Our new research paper argues that language models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty.”
So a matter of incentives:
“ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations especially when reasoning, but they still occur. Hallucinations remain a fundamental challenge for all large language models, but we are working hard to further reduce them.”
A step back for clearer definitions with timely candor:
“Hallucinations are plausible, but false statements generated by language models. They can show up in surprising ways, even for seemingly straightforward questions. For example, when we asked a widely used chatbot for the title of the PhD dissertation by Adam Tauman Kalai (an author of this paper), it confidently produced three different answers—none of them correct. When we asked for his birthday, it gave three different dates, likewise all wrong.”
And some steps towards the ‘WHY’:
“Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.”
Then a practical example we’re all familiar with:
“Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.””
“As another example, suppose a language model is asked for someone’s birthday but doesn’t know. If it guesses “September 10,” it has a 1-in-365 chance of being right. Saying “I don’t know” guarantees zero points. Over thousands of test questions, the guessing model ends up looking better on scoreboards than a careful model that admits uncertainty.”
“For questions where there is a single “right answer,” one can consider three categories of responses: accurate responses, errors, and abstentions where the model does not hazard a guess. Abstaining is part of humility, one of OpenAI’s core values. Most scoreboards prioritize and rank models based on accuracy, but errors are worse than abstentions. Our Model Spec states that it is better to indicate uncertainty or ask for clarification than provide confident information that may be incorrect.”
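To make that scoreboard arithmetic concrete, here’s a minimal sketch (my own illustration in Python, not OpenAI’s code) that simulates the birthday example above: a ‘guesser’ that always names a random date versus a ‘careful’ model that abstains, both graded the way most leaderboards grade today, on accuracy alone.

```python
import random

random.seed(42)

N_QUESTIONS = 10_000   # hypothetical eval of "what's this person's birthday?" questions
DAYS_IN_YEAR = 365

def run_eval(model):
    """Score a model on N_QUESTIONS where the true answer is a random day.

    Returns (accuracy, error_rate, abstention_rate) as fractions.
    """
    correct = errors = abstentions = 0
    for _ in range(N_QUESTIONS):
        truth = random.randrange(DAYS_IN_YEAR)
        answer = model()
        if answer is None:
            abstentions += 1
        elif answer == truth:
            correct += 1
        else:
            errors += 1
    return correct / N_QUESTIONS, errors / N_QUESTIONS, abstentions / N_QUESTIONS

# "Guessing" model: never admits uncertainty, always picks a day at random.
guesser = lambda: random.randrange(DAYS_IN_YEAR)

# "Careful" model: abstains (says "I don't know") when it doesn't know.
careful = lambda: None

for name, model in [("guesser", guesser), ("careful", careful)]:
    acc, err, abstain = run_eval(model)
    print(f"{name:8s} accuracy={acc:.3%}  confident errors={err:.3%}  abstentions={abstain:.1%}")

# Typical output: the guesser "wins" on accuracy (~0.3% vs 0%) while producing
# confident errors ~99.7% of the time; the careful model never hallucinates
# but scores zero on an accuracy-only leaderboard.
```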
The whole piece then goes into some more examples worth reviewing. But the broader point and conclusions are worth understanding.
“Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.”
“Conclusions”
“We hope that the statistical lens in our paper clarifies the nature of hallucinations and pushes back on common misconceptions:”
“Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates.
Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.”

“Claim: Hallucinations are inevitable.
Finding: They are not, because language models can abstain when uncertain.”

“Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models.
Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.”

“Claim: Hallucinations are a mysterious glitch in modern language models.
Finding: We understand the statistical mechanisms through which hallucinations arise and are rewarded in evaluations.”

“Claim: To measure hallucinations, we just need a good hallucination eval.
Finding: Hallucination evals have been published. However, a good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.”

“Our latest models have lower hallucination rates, and we continue to work hard to further decrease the rates of confident errors output by our language models.”
Again, the whole piece is worth a full read, but the key takeaway is that we’re well down a road of evaluating LLM AIs with incentives and methodologies that don’t allow them to say ‘I Don’t Know’ when they aren’t sure. They’re rewarded for guessing answers based on the probabilities generated by the underlying matrix math, and they go for it because that’s how they’re tested and evaluated.
To repeat, they don’t have any built-in incentive to say ‘I don’t know’.
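To illustrate how reworked incentives could flip that behavior, here’s another small sketch (again my own illustration with a hypothetical scoring rule, not OpenAI’s specific proposal): give +1 for a correct answer, 0 for ‘I don’t know’, and a penalty for a confident error. With a penalty of zero (accuracy-only grading), guessing always has non-negative expected value; with any real penalty, a low-confidence guess becomes a losing bet and abstaining is the rational choice.

```python
def expected_score(p_correct: float, penalty: float) -> float:
    """Expected score for answering with confidence p_correct, under a rule that
    gives +1 for a correct answer, -penalty for a wrong one, 0 for abstaining."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

def should_answer(p_correct: float, penalty: float) -> bool:
    """Answer only when the expected score beats abstaining (which scores 0)."""
    return expected_score(p_correct, penalty) > 0.0

# Accuracy-only grading is the special case penalty = 0: any guess, however
# wild, has non-negative expected value, so the model should always guess.
print(should_answer(1 / 365, penalty=0.0))   # True  -> guess the birthday

# With even a modest penalty for confident errors, a 1-in-365 guess is a
# losing bet and "I don't know" becomes the rational response.
print(should_answer(1 / 365, penalty=1.0))   # False -> abstain
print(should_answer(0.90, penalty=1.0))      # True  -> answer when confident
```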
This was a key lesson drilled into my training class at Goldman Sachs in 1982-83, when they trained over two dozen of us to go out and interact with the world’s top equity investors: never ‘guess’ when not sure of an answer; say ‘I Don’t Know’ and come back promptly with the researched answer. It’s a lesson I learned at 22 and still use daily today.
Our current AI models are not set up to do that.
Put another way, the models are not incentivized to acknowledge their limitations. A movie metaphor might help.
To put it in the words of Clint Eastwood’s ‘Dirty Harry’ character in the 1973 movie Magnum Force: ‘A man’s got to know his limitations.’ (Here’s a spoiler-filled, fuller clip of that famous scene.)
That seems to apply to LLM AIs too in this AI Tech Wave. Perhaps we need to paraphrase Dirty Harry and say ‘A Model’s Got to Know its Limitations.’
It’s useful to see OpenAI research these core problems and explain them so candidly. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)