Reclaiming LLM context through summarization

Have you ever exclaimed after a few hours engaged in some fruitless activity, "Well, that's X hours of my life I'll never get back"? Or, when in a reflective mood, "I wish I knew this when I was 16!"?

Well, if you're an LLM, this is totally possible.

On one of our projects last year the agent we were building was facing a task involving examining a significant number of documents, only some of which would turn out to be relevant. In our first attempt the agent would select a document based on some initial estimate of probable relevance.

Here is what that looks like:

Since these documents could be big, we quickly were running out of usable space in the LLMs context window.

But some of the documents provided nothing useful at all, others only a small amount of useful info. What if the tokens spent on ingesting the documents could be returned back into the agent’s context, to be used for further work? That is what we did: when we provided the document to the agent, we asked it to give us a summary of the relevant information the agent was able to extract from the document. Then we replaced the previous message with the summary, and then had the agent continue the conversation.

As the lifetime of the agent is measured in tokens, this was like going back in time: the agent proceeded as if it hadn’t spent all those tokens on processing the input.

This allowed us to keep the context relatively clean, while still getting useful information from the related documents.

This approach can be viewed as an alternative to dispatching a separate sub-agent to handle the document. One advantage of doing it via time travel is that there is no need to generate an explanation for a sub-agent. This can help avoid misunderstandings, where either the main agent fails to convey some important detail to the sub-agent, or the sub-agent fails to understand or follow the main agent's instructions.

As another example, when developing using Claude Code or other similar tools, the time travel approach is available to you as well, through the "Rewind" feature. Activated by double press of Escape key in Claude Code, it allows you to rewind the conversation to a previous state. (You can also choose to restore the workspace to an earlier state at the same time, but here we're not doing that.) So, if in the middle of a development session the agent suddenly goes off on a tangent and spends half its context on either a wild goose chase, or on debugging some unrelated blocking issue, you can rewind to a previous state and continue from there. Before rewinding, you can ask the agent to summarize what it has learned, and copy/paste that into a new message to the agent after the rewind, like a message in a bottle sent from the future.

Sometimes time travel can be your friend.

Some related reading:

Zhang et al, 2026 explores recursive exploration of large contexts.

The split/gather feature of DocETL system developed in UC Berkeley uses a similar trick, accumulating rolling summaries of a large array of documents.

If I could turn back time

Comments

More from this blog

The Prompt That Writes Itself

Applied AI Digest: Volume 3

I wrote a profiling tool for agents

Applied AI Digest: Volume 2

Stop designing chatbots

Command Palette

Comments

More from this blog