
AI: OpenAI's new 'ChatGPT Agent' goes to work. RTZ #785
OpenAI today crossed the AI Tech Wave threshold into AI Agents with ‘ChatGPT Agent’, bridging, as it were, ‘research and action.’ This is a relatively full-featured push into agentic capabilities that just might tickle the fancy of more mainstream users. Yes, the same ones using OpenAI ChatGPT in the hundreds of millions a week today. Remember, this is ‘Level 3’ on OpenAI’s roadmap to AGI, and a critical step towards AI Superintelligence, however that might be defined in time.
And it of course incorporates gobs of AI Reasoning capabilities, which are ‘Level 2’ on that roadmap, right after chatbots.
It’s important to understand these capabilities now because, of course, it’s OpenAI, and the product will be watched closely, if not leap-frogged, by competitors both here and over there in China.
The Verge lays out today’s announcement well in “OpenAI’s new ChatGPT Agent can control an entire computer and do tasks for you”:
“OpenAI is going all-in on the most-hyped trend in AI right now: AI agents, or tools that go a step beyond chatbots to complete complex, multi-step tasks on a user’s behalf. The company on Thursday debuted ChatGPT Agent, which it bills as a tool that can complete work on your behalf using its own “virtual computer.”
“In a briefing and demo with The Verge, Yash Kumar and Isa Fulford — product lead and research lead on ChatGPT Agent, respectively — said it’s powered by a new model that OpenAI developed specifically for the product. The company said the new tool can perform tasks like looking at a user’s calendar to brief them on upcoming client meetings, planning and purchasing ingredients to make a family breakfast, and creating a slide deck based on its analysis of competing companies.”
The product fuses these capabilities with productivity apps like Excel spreadsheets, presentations, and Word documents. It also uses several types of ‘virtual browsers’ to access the web the way humans would to complete various tasks, which takes browsing into very different areas.
“The model behind ChatGPT Agent, which has no specific name, was trained on complex tasks that require multiple tools — like a text browser, visual browser, and terminal where users can import their own data — via reinforcement learning, the same technique used for all of OpenAI’s reasoning models. OpenAI said that ChatGPT Agent combines the capabilities of both Operator and Deep Research, two of its existing AI tools.”
“To develop the new tool, the company combined the teams behind both Operator and Deep Research into one unified team. Kumar and Fulford told The Verge that the new team is made up of between 20 and 35 people across product and research.”
The product to date feels like bleeding edge AI research capabilities shoehorned into ‘products’ barely out of the oven:
“In the demo, Kumar and Fulford demonstrated potential use cases for ChatGPT Agent, like asking it to plan a date night by connecting to Google Calendar to see when the user has a free evening, and then cross-referencing OpenTable to find openings at certain types of restaurants. They also showed how a user could interrupt the process by adding, say, another restaurant category to search for. Another demonstration showed how ChatGPT Agent could generate a research report on the rise of Labubus versus Beanie Babies.”
Of course shopping with payment capabilities is always a popular demo, and a pathway for future monetization models that could rival incumbents like Google, Meta, Amazon and more.
“Fulford said she enjoyed using it for online shopping because the combination of tech behind Deep Research and Operator worked better and was more thorough than trying the process solely using Operator. And Kumar said he had begun using ChatGPT Agent to automate small parts of his life, like requesting new office parking at OpenAI every Thursday instead of showing up Monday having forgotten to request it with nowhere to park.”
“Kumar said that since ChatGPT Agent has access to “an entire computer” instead of just a browser, they’ve “enhanced the toolset quite a bit.”
Because it incorporates AI reasoning, the product can feel slow at times while it’s chewing up precious, variable-cost AI compute to come up with the best ways to execute the tasks via AI Agents.
“According to the demo, though, the tool can be a bit slow. When asked about latency, Kumar said their team is more focused on “optimizing for hard tasks” and that users aren’t meant to sit and watch ChatGPT Agent work.”
And given that there are of course risks that come with AI working on your behalf, there are ‘interruption’ steps built in for humans to take over as needed:
“Before ChatGPT Agent does anything “irreversible,” like sending an email or making a booking, it asks for permission first, Fulford said.”
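For readers who think in code, the ‘ask before anything irreversible’ idea can be pictured as a simple confirmation gate. The sketch below is purely illustrative and is not OpenAI’s implementation: the `Action` class, the `irreversible` flag, and the `confirm` callback are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A hypothetical unit of agent work (not an OpenAI type)."""
    name: str
    irreversible: bool          # e.g. sending an email, making a booking
    run: Callable[[], str]      # performs the step and returns a log line


def execute(actions: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Run actions in order, asking a human before any irreversible one."""
    log: List[str] = []
    for action in actions:
        # Reversible steps run freely; irreversible ones need approval.
        if action.irreversible and not confirm(action):
            log.append(f"skipped: {action.name}")
            continue
        log.append(action.run())
    return log
```

In a real product the `confirm` callback would be a prompt shown to the user; here it is just a function that returns True or False, which keeps the human decision separate from the agent’s own plan.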
“Since the model behind the tool has increased capabilities, OpenAI said it has activated the safeguards it created for “high biological and chemical capabilities,” even though the company said it does not have “direct evidence that the model could meaningfully help a novice create severe biological or chemical harm” in the form of weapons. Anthropic in May activated similar safeguards for its launch of one of its Claude models, Opus 4.”
This will likely be the biggest new attack surface for malicious actors looking to take advantage of these new systems, and OpenAI is very much aware of that new threat area:
“When asked about whether the tool is permitted to perform financial transactions, Kumar said those actions have been restricted “for now,” and that there’s an additional protection called Watch Mode, wherein if a user navigates to a certain category of webpages, like financial sites, they must not navigate away from the tab ChatGPT Agent is operating in or the tool will stop working.”
The rollout is fairly aggressive across various pricing tiers:
“OpenAI will start rolling out the tool today to Pro, Plus, and Team users — pick “agent mode” in the tools menu or type “/agent” to access it — and the company said it will make it available to ChatGPT Enterprise and Education users later this summer. There’s no rollout timeline yet for the European Economic Area and Switzerland.”
Agents are of course a long-sought goal for the industry, spurred on by sci-fi examples galore:
“The concept of AI agents has been a buzzworthy trend in the industry for years. The ideal developers are working toward is something like Iron Man’s J.A.R.V.I.S., a tool that can perform specific job functions, check people’s calendars for the best time to schedule an event, purchase a gift based on a friend’s preferences, and more, but at the moment, they’re somewhat limited to assisting with coding and compiling research reports.”
“The term “AI agent” became more common to investors and tech executives in 2023 and quickly picked up speed, especially after fintech company Klarna announced in February 2024 that in just one month of operation, its own AI agent had handled two-thirds of its customer service chats — the equivalent of 700 full-time human workers.”
“From there, executives at Amazon, Meta, Google, and more started mentioning their AI agent goals on earnings call after earnings call. And since then, AI companies have been strategically hiring to reach those goals: Google, for instance, last week hired Windsurf’s CEO, co-founder and some R&D team members to help further its agentic AI projects.”
For OpenAI, the latest release here is a fast iteration on prior work, and signals a host of other AI applications to come:
“OpenAI’s debut of ChatGPT Agent follows its January release of Operator, which the company billed as “an agent that can go to the web to perform tasks for you” since it was trained to be able to handle the internet’s buttons, text fields and more. It’s also part of a larger trend in AI, as companies large and small chase AI agents that will capture the attention of consumers and ideally become habits.”
The competition of course is not far behind:
“Last October, Anthropic, the Amazon-backed AI startup behind Claude, released a similar tool called “Computer Use,” which it billed as a tool that could use a computer the same way a human can in order to complete tasks on a user’s behalf. Multiple AI companies, including OpenAI, Google and Perplexity, also offer an AI tool that all three have dubbed Deep Research, denoting an AI agent that can write sizable analyses and research reports on anything a user wants.”
There are other takes on today’s rollout also worth perusing here and here. This also presages OpenAI’s entry into the AI Browser arena, joining Google, Perplexity and others in the field.
This latest OpenAI ChatGPT Agent is barely the beginning of how this area will change the web as we know it in this AI Tech Wave. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here.)