AI: Fragility of today's Claude Cowork type AI Agent Apps. RTZ 1061

The Bigger Picture, Sunday, April 19, 2026

The Bigger Picture I’d like to unpack today is how memory fragility is a key bumpy road for today’s AI Agents, and my take on how long our best AI companies will take to research and commercialize solutions for it all. Likely done in stages across several years, and taking longer than we’d like. Let’s get started.

In last Sunday’s Bigger Picture, “Working out daily with AI and AI Agents,” AI-RTZ #1054, I outlined ten takeaways on where we are in practically using AI Agents for everyday work.

The main AI I use every day, Anthropic’s Claude Cowork, is likely one of the fastest-growing computer applications in recent memory, going from zero to billions in revenue in a handful of months.

It’s one of the best AI Agent productivity systems for non-coders, and the one I use most intensively. It runs on a generously loaded Mac Mini — the kind now stretched to 12+ week waits, per the WSJ, ahead of a possible M5 Apple Silicon upgrade.

For me, daily work with the best of today’s AI apps and systems underlines how similar these systems are to the early PCs. Those nascent days of fragile, MS-DOS-based personal computers in the early 1980s come back immediately.

Especially as one works through the first sessions with Anthropic’s Claude Cowork, running its latest Opus 4.7 version at the $200/month tier.

I remember those early PC days vividly, having jumped in feet first back then, just like with today’s AI.

In particular, the multi-year cognitive effort humans had to put in to work around the limited, fragile 640KB RAM ceilings of those machines. Not to mention the reliance on ‘floppy disks’ before the advent of true mechanical hard drives, which eventually led to today’s extraordinary, almost magical solid-state drives. It feels like just yesterday.

From the beginning days of AI-RTZ, I’ve talked about how exquisitely dependent the coming AI systems would be on fragile AI memory.

From AI chatbots to AI reasoning to AI Agents today, AI memory systems are going through substantial innovation, even amid the physical-memory supply chain constraints I’ve discussed of late. The fragility of today’s AI Agents starts with memory.

First a story from one of my favorite movies where memory is a key actor.

Famed director Christopher Nolan’s second movie, Memento (2000), follows Leonard Shelby, who has severe anterograde amnesia and cannot retain new memories for more than a few minutes as he searches for his wife’s killer.

He navigates with Polaroid photos, handwritten notes, and tattoos on his own skin — a prosthetic memory stitched onto his body. The film’s non-linear structure mirrors his condition; scenes “reset” as his memory does. For Nolan fans, his second film set the foundation for later masterworks like Inception, Tenet and beyond.

But its fundamental plot device, a protagonist whose memory keeps resetting, captures what it’s like working with AI Agents today. Even at the highest $200/month tiers of top products like Anthropic’s Claude Cowork, OpenAI’s ChatGPT, Google Gemini Ultra, or Perplexity Computer.

The AI agent apps built on top of frontier AI models — Anthropic’s Claude Cowork, OpenAI’s ChatGPT Projects, Google’s Gemini workspaces — behave like Leonard Shelby with a bigger stack of Polaroids. Inside a single session, they reason coherently. Between sessions, they are lossy. The only durable memory is the current context window, and once that window fills or closes, earlier details are dropped or summarized into brittle text.

So AI Agent App users today do what Leonard does in Memento. They externalize continuity. Markdown logs, handoff files, “what we decided yesterday” recap documents — all fed back in at the start of every new session. The user becomes the persistence layer. That’s the fragility I’m pointing at. Not that the models can’t reason. That the apps wrapped around them still rely on the human to carry state from one day to the next.

To give an ‘in the weeds’ example of what it’s like, let me describe a currently dreaded part of sustained interaction with Claude Cowork in a given project ‘session’. Every few minutes, Claude Cowork goes into ‘compaction’ cycles: a progress bar slowly crawls to 100%, often taking five minutes or more. Time to get a refill on the coffee.

These are processes in which Claude Cowork makes compressed notes on the conversation with the user, so it can free up more of the ‘context’ window to continue the interaction. Claude Cowork visibly interrupts the user’s work while compacting, whereas other AIs typically run compaction in the background. Interrupt the compaction process with a queued query, and the user runs the risk of a Cowork session that forgets much of the priorities and details of the interaction.
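Conceptually, compaction looks something like the following sketch. This is a hypothetical illustration, not Anthropic’s actual implementation: once the transcript nears a token budget, older turns are collapsed into a lossy summary, and anything the summary omits is simply gone.

```python
# Hypothetical sketch of context-window "compaction" (NOT Anthropic's real code).
# When the transcript nears its token budget, older turns are collapsed into
# a lossy summary so new turns can fit -- details the summary omits are lost.

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())

def compact(turns: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Replace older turns with a lossy summary once the budget is exceeded."""
    total = sum(count_tokens(t) for t in turns)
    if total <= budget:
        return turns  # everything still fits; nothing to do
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # A real system would ask an LLM to summarize; we just truncate heavily.
    summary = "SUMMARY: " + " ".join(" ".join(old).split()[: budget // 4])
    return [summary] + recent

history = [f"turn {i}: " + "word " * 50 for i in range(10)]
compacted = compact(history, budget=200)
print(len(compacted))  # → 3 (one summary plus the two most recent turns)
```

The design point the sketch makes: compaction trades recall for room. Whatever the summarizer judged unimportant cannot be recovered later, which is exactly why a post-compaction session can feel like a different agent.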

That happens because it fails to load the earlier relevant ‘memories’ that would remind it what the main interaction with the user was about. It feels like working with an intern who finally knows the daily workflow and process after a substantial amount of time spent teaching them how you like things done. Then you come back from a break, and the intern has forgotten much, if not most, of what it learned about the project. So there’s a disconcerting period the first time, diagnosing WHY the AI is behaving so differently. And then the heavy task of teaching it all the basics over again.

It leaves users like me pleading, in the closing prompts of every Cowork session, for the agent to take diligent notes on everything done and agreed upon, with precise instructions saved in markdown (md) files in specific locations, so that ‘future’ Cowork session agents can ingest them quickly and ‘BECOME’ the capable version of the agent I’m working with now. Then hoping that future agent-intern has the exact memories and capabilities of the current one, which will disappear when the current productive Cowork session ends.
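The ritual reads something like this sketch. Everything here is hypothetical (the file name, the section headings, the helper function); it’s just an illustration of the “user as persistence layer” pattern, not any vendor’s feature.

```python
# Hypothetical sketch of the end-of-session handoff ritual: serialize today's
# state into a markdown note that a future session can ingest. The file name
# and structure are illustrative only, not any product's actual convention.
from datetime import date
from pathlib import Path

def write_handoff(path: Path, decisions: list[str], next_steps: list[str]) -> None:
    """Write a dated markdown handoff note listing decisions and next steps."""
    lines = [f"# Session handoff: {date.today().isoformat()}", "", "## Decisions"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "## Next steps"]
    lines += [f"- {s}" for s in next_steps]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")

handoff = Path("HANDOFF.md")
write_handoff(
    handoff,
    decisions=["publish the Sunday piece at 9am", "keep the Memento framing"],
    next_steps=["draft next week's outline"],
)
print(handoff.read_text(encoding="utf-8").splitlines()[0])  # the dated heading
```

The point of the sketch is who is doing the work: the human (or a script the human maintains) carries state across sessions, because the app itself does not.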

These granular issues are some aspects of what AI luminaries like Andrej Karpathy describe as the true state of AI Agents and systems today.

The processes I described above are a necessary patchwork of chewing gum and string.

Why today’s fix feels bolted on. What most AI products currently call “memory” is vector-similarity retrieval over past text, plus a lossy summary. Human memory is structured — causal chains, schemas, narratives about why a decision was made. The mismatch is why bigger context windows and project containers still feel like operational amnesia rather than a colleague who recalls last month’s work.
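The pattern described above can be sketched in a few lines. This is a toy illustration of vector-similarity retrieval under stated assumptions (a bag-of-words stand-in for a real embedding model, cosine similarity, top-k selection), not any vendor’s actual memory implementation.

```python
# Toy sketch of "memory" as vector-similarity retrieval -- the pattern the
# text describes, not any vendor's implementation. Past snippets are embedded,
# and the most similar ones are stuffed back into the prompt. Note what is
# missing: no causal chains, no record of WHY a decision was made.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k stored snippets most similar to the query."""
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)[:k]

memories = [
    "we decided to ship the newsletter on Sundays",
    "the Mac Mini order is delayed twelve weeks",
    "draft titles go in the shared markdown file",
]
print(retrieve(memories, "when do we ship the newsletter", k=1))
# → ['we decided to ship the newsletter on Sundays']
```

Notice what retrieval surfaces: the snippet that was said, ranked here by word overlap. Nothing in the store captures why the decision was made, which is exactly the structural gap between this kind of lookup and human episodic memory.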

First, a lot of credit where due: Claude Cowork’s project spaces, ChatGPT’s per-user preference memory, and Gemini’s workspace containers are REAL, impressive progress over a year ago. But they remain bolt-ons. There is SO MUCH work to be done to improve them, and to make them truly useful without the extra daily effort and cognitive load on the human user.

The honest read is that no major platform has yet built native, cross-session episodic memory — the kind where the assistant can reliably recall WHY you decided X three weeks ago, not just that you SAID X.

Who’s pushing hardest? The serious work breaks into several clusters, much of it still in the AI research stage and far from ready for commercial AI rollout.

Frontier labs — OpenAI, Anthropic, Google — are pushing memory architectures alongside longer context windows. Google in particular has published on newer designs (Titans, MIRAS-style architectures) aimed at dynamic long-term recall that goes beyond stuffing everything into a single context.

And there are dedicated memory-layer startups, treating this as the product, not a feature. Mem0 markets itself as “the memory layer for AI agents,” building extraction, consolidation, storage, and retrieval pipelines for semantic, episodic, and procedural memory across sessions. MemChain targets enterprise-grade shared memory infrastructure, so multiple agents and copilots can reason over the same long-horizon state. And other companies emerging.

Here’s my take on how this evolves: at least three phases, stretching into the early 2030s. My current read on the timelines and the ‘waiting’:

Phase 1, now through ~2027 — usable but flaky. Better engineering patterns, standardized memory layers, hosted services. A careful user can hand-build a stack — frontier model plus external memory layer plus a Notion-style knowledge base — that feels non-Memento for the specific workflows they invest in. Still not a default, still breaks in surprising ways. Lots of constant daily attention and cognitive load.

Phase 2, roughly 2028–2030 — non-Memento for serious users. Memory research matures into integrated episodic and semantic memory inside serious agent frameworks. A power user can reasonably expect a computer assistant that reliably recalls projects, decisions, and style across months. In a non-humanized way. This is the earliest window where the Memento feel potentially goes away for sophisticated users.

Phase 3, early 2030s — non-Memento as default. In the best-case scenario, persistent memory becomes invisible infrastructure for the mainstream user, the way cloud sync is today. Hard as it is to remember, that took years to happen. I know; I invested in a number of startups along the way. Ultimately it all becomes doable: cheap enough, safe enough, reliable enough to stop being something the user actively manages.

The nuance worth stating: these timelines are likely on the optimistic side. They are bounded by real-world constraints that don’t show up in the demos: hardware memory shortages (the same ones showing up in Microsoft Surface laptop pricing and every cloud capex conversation), safety questions about what should persist across sessions.

And the unglamorous fact that much of what vendors today call “solved memory” is still glorified text dumping into a vector database.

Then there’s the issue of VERTICAL SILOs. It’s a fork-in-the-road question every AI agent app user will eventually face: would you rather trust a single vendor’s tightly integrated memory stack and accept the lock-in, or wait for an open, portable memory layer you can wire into whichever model you’re using that week? Today’s engineering patterns favor the former. Tomorrow probably has to be the latter.

As a guinea pig trying to use these products today, at premium-tier prices, I would very much prefer the non-silo option.

The fragility of AI Agents today is a fact of life in the near-term AI roadmaps of OpenAI and others.

And I haven’t even described in this post, the daily reality of the human users actively tracking, monitoring, and CONSTANTLY ADJUSTING the DOZENS of scheduled AI Agent tasks and sessions. ALL demanding sustained cognitive attention from human users.

Truly draining by the end of the day. For most people, these are ADDITIONAL tasks on top of the daily things they have to do independent of AI in the first place. At least for a while, in these early years.

That often raises daily questions for a lot of regular folks: is it truly worth mucking around with AI Agents in the first place, rather than just buckling down and doing the work the traditional way?

How much work really lies ahead in building the AI Reasoning and Agent systems of our over-active imaginations?

This issue alone is an indicator of how far off AGI, or AI Superintelligence, or GOD AI, truly are. A clear, visceral illustration of the hype vs. the reality.

And of how far we are from the user’s AI Agent doing the remembering itself. These systems are a LONG, LONG way from posing the dystopian, Skynet-type existential risks to humanity currently feared by AI Researchers, Regulators, and Regular folks.

But ironically, I’d still maintain, that despite all the above issues, they are STILL useful computing tools today. At least for the early adopters. Just like all those fragile early PCs were in the distant 1980s.

Until then, the Polaroids and tattoos aren’t going anywhere as a metaphor for today’s latest AI Agents. We’ll have to live with Anthropic’s Claude Cowork’s current memory jitteriness. Until, eventually, we don’t.

That is the Bigger Picture to truly keep in mind at this earliest of days for the AI Tech Wave. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)




