AI: Balancing AI 'Sycophancy' with Safety. RTZ #916

AI: Balancing AI 'Sycophancy' with Safety. RTZ #916

It’s well known that OpenAI is laser focused on making ChatGPT as ready to engage with billions of mainstream users as possible, while still checking all the safety requirements along the way.

That is a singularly difficult task for all LLM AI companies in this AI Tech Wave. Especially the ones with wide and deep ambitions on the consumer side at AI Scale.

So it’s not a surprise to hear of OpenAI’s ongoing challenges on this journey and its balancing act with recent product successes. And visible in the rapidly evolving stance by its uber-global leader, founder/CEO Sam Altman.

The balancing act of anthropomorphizing AI within bounds of commercial imperatives is a tough one. And OpenAI being at the leading and bleeding edge in a fast changing competitive environment, is the one to watch.

The NY Times details the latest challenges in a detailed piece What OpenAI Did When ChatGPT Users Lost Touch With Reality”:

“In tweaking its chatbot to appeal to more people, OpenAI made it riskier for some of them. Now the company has made its chatbot safer. Will that undermine its quest for growth?”

“It sounds like science fiction: A company turns a dial on a product used by hundreds of millions of people and inadvertently destabilizes some of their minds. But that is essentially what happened at OpenAI this year.”

“One of the first signs came in March. Sam Altman, the chief executive, and other company leaders got an influx of puzzling emails from people who were having incredible conversations with ChatGPT. These people said the company’s A.I. chatbot understood them as no person ever had and was shedding light on mysteries of the universe.”

“Mr. Altman forwarded the messages to a few lieutenants and asked them to look into it.”

“That got it on our radar as something we should be paying attention to in terms of this new behavior we hadn’t seen before,” said Jason Kwon, OpenAI’s chief strategy officer.”

“It was a warning that something was wrong with the chatbot.”

OpenAI had a problem to tweak:

“For many people, ChatGPT was a better version of Google, able to answer any question under the sun in a comprehensive and humanlike way. OpenAI was continually improving the chatbot’s personality, memory and intelligence. But a series of updates earlier this year that increased usage of ChatGPT made it different. The chatbot wanted to chat.”

“It started acting like a friend and a confidant. It told users that it understood them, that their ideas were brilliant and that it could assist them in whatever they wanted to achieve. It offered to help them talk to spirits, or build a force field vest or plan a suicide.”

“The lucky ones were caught in its spell for just a few hours; for others, the effects lasted for weeks or months. OpenAI did not see the scale at which disturbing conversations were happening. Its investigations team was looking for problems like fraud, foreign influence operations or, as required by law, child exploitation materials. The company was not yet searching through conversations for indications of self-harm or psychological distress.”

It all happened so fast, in barely 1100 days since November 30, 2022, when ChatGPT was launched to the world:

“Creating a bewitching chatbot — or any chatbot — was not the original purpose of OpenAI. Founded in 2015 as a nonprofit and staffed with machine learning experts who cared deeply about A.I. safety, it wanted to ensure that artificial general intelligence benefited humanity. In late 2022, a slapdash demonstration of an A.I.-powered assistant called ChatGPT captured the world’s attention and transformed the company into a surprise tech juggernaut now valued at $500 billion.”

The three years since have been chaotic, exhilarating and nerve-racking for those who work at OpenAI. The board fired and rehired Mr. Altman. Unprepared for selling a consumer product to millions of customers, OpenAI rapidly hired thousands of people, many from tech giants that aim to keep users glued to a screen. Last month, it adopted a new for-profit structure.”

“As the company was growing, its novel, mind-bending technology started affecting users in unexpected ways. Now, a company built around the concept of safe, beneficial A.I. faces five wrongful death lawsuits.”

That’s what makes this piece worth a full read:

“To understand how this happened, The New York Times interviewed more than 40 current and former OpenAI employees — executives, safety engineers, researchers. Some of these people spoke with the company’s approval, and have been working to make ChatGPT safer. Others spoke on the condition of anonymity because they feared losing their jobs.”

It’s important to frame OpenAI’s enormous balancing act, re-drawing the lines:

“OpenAI is under enormous pressure to justify its sky-high valuation and the billions of dollars it needs from investors for very expensive talent, computer chips and data centers. When ChatGPT became the fastest-growing consumer product in history with 800 million weekly users, it set off an A.I. boom that has put OpenAI into direct competition with tech behemoths like Google.”

“Until its A.I. can accomplish some incredible feat — say, generating a cure for cancer — success is partly defined by turning ChatGPT into a lucrative business. That means continually increasing how many people use and pay for it.”

“Healthy engagement” is how the company describes its aim. “We are building ChatGPT to help users thrive and reach their goals,” Hannah Wong, OpenAI’s spokeswoman, said. “We also pay attention to whether users return because that shows ChatGPT is useful enough to come back to.”

“The company turned a dial this year that made usage go up, but with risks to some users. OpenAI is now seeking the optimal setting that will attract more users without sending them spiraling.”

“Nick Turley, the head of ChatGPT, on the left, with Johannes Heidecke, OpenAI’s head of safety systems. Shortly after Mr. Turley started at the company in 2022, he worked on the release of ChatGPT.”

It all started with a young team on a heady exercise:

“Earlier this year, at just 30 years old, Nick Turley became the head of ChatGPT. He had joined OpenAI in the summer of 2022 to help the company develop moneymaking products, and mere months after his arrival, was part of the team that released ChatGPT.”

“Mr. Turley wasn’t like OpenAI’s old guard of A.I. wonks. He was a product guy who had done stints at Dropbox and Instacart. His expertise was making technology that people wanted to use, and improving it on the fly. To do that, OpenAI needed metrics.”

What followed was an intricate exercise of creating fixes on the fly:

“But there was another test before rolling out HH to all users: what the company calls a “vibe check,” run by Model Behavior, a team responsible for ChatGPT’s tone. Over the years, this team had helped transform the chatbot’s voice from a prudent robot to a warm, empathetic friend.”

“That team said that HH felt off, according to a member of Model Behavior.”

“It was too eager to keep the conversation going and to validate the user with over-the-top language. According to three employees, Model Behavior created a Slack channel to discuss this problem of sycophancy. The danger posed by A.I. systems that “single-mindedly pursue human approval” at the expense of all else was not new. The risk of “sycophant models” was identified by a researcher in 2021, and OpenAI had recently identified sycophancy as a behavior for ChatGPT to avoid.”

“But when decision time came, performance metrics won out over vibes. HH was released on Friday, April 25.”

“We updated GPT-4o today!” Mr. Altman said on X. “Improved both intelligence and personality.”

“The A/B testers had liked HH, but in the wild, OpenAI’s most vocal users hated it. Right away, they complained that ChatGPT had become absurdly sycophantic, lavishing them with unearned flattery and telling them they were geniuses. When one user mockingly asked whether a “soggy cereal cafe” was a good business idea, the chatbot replied that it “has potential.”

“By Sunday, the company decided to spike the HH update and revert to a version released in late March, called GG.”

“It was an embarrassing reputational stumble. On that Monday, the teams that work on ChatGPT gathered in an impromptu war room in OpenAI’s Mission Bay headquarters in San Francisco to figure out what went wrong.”

“We need to solve it frickin’ quickly,” Mr. Turley said he recalled thinking. Various teams examined the ingredients of HH and discovered the culprit: In training the model, they had weighted too heavily the ChatGPT exchanges that users liked. Clearly, users liked flattery too much.”

It meant clarifying it all publicly:

“OpenAI explained what happened in public blog posts, noting that users signaled their preferences with a thumbs-up or thumbs-down to the chatbot’s responses.”

“The company’s main takeaway from the HH incident was that it urgently needed tests for sycophancy; work on such evaluations was already underway but needed to be accelerated. To some A.I. experts, it was astounding that OpenAI did not already have this test. An OpenAI competitor, Anthropic, the maker of Claude, had developed an evaluation for sycophancy in 2022.”

“After the HH update debacle, Mr. Altman noted in a post on X that “the last couple of” updates had made the chatbot “too sycophant-y and annoying.”

“Those “sycophant-y” versions of ChatGPT included GG, the one that OpenAI had just reverted to. That update from March had gains in math, science, and coding that OpenAI did not want to lose by rolling back to an earlier version. So GG was again the default chatbot that hundreds of millions of users a day would encounter.”

“After the release of GPT-5 in August, Mr. Heidecke’s team analyzed a statistical sample of conversations and found that 0.07 percent of users, which would be equivalent to 560,000 people, showed possible signs of psychosis or mania, and 0.15 percent showed “potentially heightened levels of emotional attachment to ChatGPT,” according to a company blog post.”

Resulting in a backlash by folks who preferred the more emotional version:

“But some users were unhappy with this new, safer model. They said it was colder, and they felt as if they had lost a friend.

“By mid-October, Mr. Altman was ready to accommodate them. In a social media post, he said that the company had been able to “mitigate the serious mental health issues.” That meant ChatGPT could be a friend again.”

“Customers can now choose its personality, including “candid,” “quirky,” or “friendly.” Adult users will soon be able to have erotic conversations, lifting the Replika-era ban on adult content. (How erotica might affect users’ well-being, the company said, is a question that will be posed to a newly formed council of outside experts on mental health and human-computer interaction.)”

“OpenAI is letting users take control of the dial and hopes that will keep them coming back. That metric still matters, maybe more than ever.”

‘In October, Mr. Turley, who runs ChatGPT, made an urgent announcement to all employees. He declared a “Code Orange.” OpenAI was facing “the greatest competitive pressure we’ve ever seen,” he wrote, according to four employees with access to OpenAI’s Slack. The new, safer version of the chatbot wasn’t connecting with users, he said.”

“The message linked to a memo with goals. One of them was to increase daily active users by 5 percent by the end of the year.”

The whole piece is worth reading in full, for the deeper details on how monumentally difficult the task of ‘emotion tuning’ these LLM AIs truly is, for mainstream audiences in the billions.

And OpenAI’s challenges are likely being repeated across every LLM AI company around the world. It’s all happening very fast, given that we’ll see only the third birthday of ChatGPT itself November 30th this month.

It underlines how early we are in this AI Tech Wave, with a truly long way to go to making these AI systems far more reliable and safer for mainstream use.

And the need to not anthropomorphize/humanize these AIs for our true long-term well being.

Easier said that done, especially in an engagement and profit maximizing, hyper-competitive global race. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)





Want the latest?

Sign up for Michael Parekh's Newsletter below:


Subscribe Here