Why Your AI Forgets Everything You Say (And What Context Has to Do With It)

Ever wondered why your AI assistant forgets what you just said? Let’s talk about context length: the hidden limit behind LLM memory, and what makes some models feel like Dory from Finding Nemo.


The other day I was chatting in the SysArmy group when someone dropped a classic complaint:

“My AI assistant forgets everything. I swear I told it the same thing three times already!”

They didn’t put it quite like that, but I like a little drama.

It made me laugh because… yeah, we’ve all been there. You write out this detailed prompt, hit enter, get a decent response, then follow up with a second question, and the AI stares at you like it has amnesia.

But it’s not being rude. Or lazy. It’s just… forgetful.

And that forgetfulness? It comes down to something called context length.


The Memory of a Goldfish?

You know Dory from Finding Nemo, right?

Adorable. Enthusiastic. Totally unable to remember anything for more than a few seconds.

That’s how a lot of language models feel when their context window is too small. You’ll explain the whole situation, your code, your problem, the structure, the requirements, and just two prompts later it’s asking you to repeat everything, or quietly ignoring the changes and instructions you already gave it.

That’s because these models aren’t really “aware” of your full conversation history unless it fits in their context window, like a box of short-term memory they carry around. Once that box is full, they start dropping stuff to make room for new info.
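To make that “box of short-term memory” concrete, here’s a minimal sketch in Python. The `ContextWindow` class and the whitespace-based token counter are made up for illustration, not any real model’s API; real tokenizers and real eviction strategies are more sophisticated, but the core idea is the same: once the budget is full, the oldest stuff gets dropped.

```python
from collections import deque

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per
    # whitespace-separated word. Real tokenizers split differently.
    return len(text.split())

class ContextWindow:
    """Toy context window: keeps only the newest messages that fit the budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()

    def add(self, message: str):
        self.messages.append(message)
        # Drop the oldest messages until everything fits again.
        while sum(count_tokens(m) for m in self.messages) > self.max_tokens:
            self.messages.popleft()

ctx = ContextWindow(max_tokens=8)
ctx.add("my name is Ana")        # 4 tokens, fits
ctx.add("I use Python and Go")   # 5 more tokens -> over budget, oldest dropped
print(list(ctx.messages))        # your name is already gone
```

Notice that nothing “decides” to forget your name; it simply no longer fits in the box.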


So What Is Context Length?

Context length is how many tokens (basically: chunks of words) the model can handle at once.

Think of it like scrolling back in a WhatsApp chat. Some models can scroll up through 300 messages. Others? Maybe 10 before they get lost.

You might assume the person in SysArmy was using a model with a 4K or 16K token limit. (According to him he was on Claude Max, so the real problem may have been something else entirely.) Either way, even a 16K window might sound like a lot, but throw in some JSON, API logs, and a few code blocks and you’ll hit the limit fast. Once you do, the model stops seeing the full picture.
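If you want a feel for how fast that happens, a common rule of thumb (an approximation, not an official conversion) is that English text averages around four characters per token, with code, JSON, and logs usually tokenizing less efficiently. A quick back-of-the-envelope estimator:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Code, JSON, and logs usually cost more tokens per character.
    return max(1, len(text) // 4)

# One pasted log file can eat a huge slice of the window.
log_chunk = '{"level": "error", "msg": "connection refused"}\n' * 2000
print(estimate_tokens(log_chunk))  # ~24,000 tokens from a single paste
```

Paste a couple of files like that into a 16K-token model and the start of your conversation is already out of view.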


Not All Models Are Created Equal

Here’s where things get interesting.

Modern models like GPT-4o or Claude 3 Opus are pushing that memory way further.

GPT-4o gives you a 128,000-token context window, roughly a full novel’s worth of text. Claude 3 Opus? 200,000 tokens, practically a whole filing cabinet.

Suddenly, your agent can read through an entire user manual, hold a two-hour conversation, analyze your full backend logs, and still remember that joke you told in the first prompt.

It changes the game.


Why It Matters (Beyond Just Being Annoying)

If you’re just asking about the weather, a short memory is fine. But if you’re:

  • debugging complex infra
  • reviewing legal contracts
  • doing research with long PDFs
  • or talking to a customer for more than 5 minutes…

…context length makes or breaks the experience.

When someone says “this model is so much smarter,” often what they’re actually feeling is:

“This model remembers what I said.”

And that’s powerful.


So Next Time…

Next time your AI assistant acts like it’s never met you before, don’t get mad. Just ask:

How big is your brain, buddy?

If it’s running on a model with a tiny context window, maybe it’s not forgetful, just overwhelmed.

And maybe it’s time to give Dory a break and upgrade to something with a little more memory.


Got stories of your AI forgetting your life story mid-chat? Drop them in the comments. Let’s trade notes.

By the way, you can join SysArmy on Discord through the following link:

Join the sysarmy Discord Server!
A systems community that has brought together professionals in the field for over 10 years to encourage contact and exchange | 9,460 members