I wrote a profiling tool for agents

While building an agent for a client, we hit a scaling problem: a coordinator agent had 4-5 subagents, each making 15-20 tool calls. Every tool result was appended to the context, so what started as a reasonable prompt frequently ballooned to hundreds of thousands of tokens by the end of a run. The oversized context was degrading both the agent's performance and our eval runtimes.

Which parts of our system could be pared down was not obvious from the logs. We could see the output of each tool call, but we couldn't tell how many tokens it was taking up in the agent's context.

So, inspired by tools like cProfile, I built a profiler that shows which parts of the agent system consume the most tokens. For example, on the form updater:

You can see that right now for this agent

Input dominates token usage here, making up roughly two-thirds of the total cost. The breakdown is less obvious than it looks because a large portion of the input tokens are cache hits, which are billed at a much lower rate:

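| Token Type       | Tokens  | Rate (per million) | Cost   |
|------------------|---------|--------------------|--------|
| Output           | 17,000  | $15.00             | ~$0.25 |
| Input (cached)   | 230,000 | $0.25              | ~$0.06 |
| Input (uncached) | 160,000 | $2.50              | ~$0.40 |
| Total            |         |                    | ~$0.71 |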
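
To make the arithmetic behind the table explicit, here is a minimal sketch that recomputes each cost from the token counts and rates listed above. The names `USAGE` and `cost_usd` are mine, introduced only for illustration; they are not part of the profiler.

```python
# Token counts and per-million rates copied from the table above.
USAGE = {
    "output": (17_000, 15.00),
    "input_cached": (230_000, 0.25),
    "input_uncached": (160_000, 2.50),
}


def cost_usd(tokens: int, rate_per_million: float) -> float:
    """Cost in dollars for `tokens` billed at `rate_per_million` dollars."""
    return tokens / 1_000_000 * rate_per_million


costs = {name: cost_usd(tokens, rate) for name, (tokens, rate) in USAGE.items()}
total = sum(costs.values())

# Share of the total cost that comes from input (cached + uncached).
input_share = (costs["input_cached"] + costs["input_uncached"]) / total

print(f"total: ${total:.2f}")                     # total: $0.71
print(f"input share of cost: {input_share:.0%}")  # input share of cost: 64%
```

Cached input accounts for more tokens than uncached input (230,000 vs. 160,000) but only about a seventh of the cost, which is why the token counts alone are misleading.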

Looking at the input breakdown, user messages account for 78.4% of input tokens, with get_form_outline a distant second at 8.5%. If we want to reduce token usage, we need to find a way to give the agent smaller prompts.
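
A per-source breakdown like this one can be sketched in a few lines. This is not the profiler's actual implementation, only an illustration of the idea: the message shape (a `source` and a `content` key), the `estimate_tokens` heuristic (about 4 characters per token, a crude stand-in for a real tokenizer), and the sample data are all assumptions.

```python
from collections import Counter


def estimate_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: about 4 characters per token."""
    return max(1, len(text) // 4)


def input_breakdown(messages: list[dict[str, str]]) -> list[tuple[str, int, float]]:
    """Return (source, tokens, percent of total), largest consumer first."""
    totals = Counter()
    for message in messages:
        totals[message["source"]] += estimate_tokens(message["content"])
    grand_total = sum(totals.values())
    return [
        (source, tokens, 100 * tokens / grand_total)
        for source, tokens in totals.most_common()
    ]


# Hypothetical transcript; the sources and sizes are made up for illustration.
messages = [
    {"source": "user", "content": "x" * 4000},
    {"source": "get_form_outline", "content": "x" * 800},
    {"source": "user", "content": "x" * 2000},
    {"source": "system", "content": "x" * 1200},
]

for source, tokens, percent in input_breakdown(messages):
    print(f"{source:<20} {tokens:>6} {percent:5.1f}%")
```

Sorting by token count puts the biggest consumer at the top, the same way a CPU profiler lists the hottest functions first.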

Here are the results for a second agent we fixed using the profiler:

The get_fields tool consumed almost all of this agent's tokens. To fix that, we updated the tool so it can filter by name or description. Instead of dumping every field into the context, the agent can now request only the fields it needs.

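from typing import Any, Literal

# RiskCloudClient and RiskCloudId come from the project's own client code.
async def get_fields(
    client: RiskCloudClient,
    scopes: list[Literal["global"] | RiskCloudId],
    string_to_match: str | None = None,  # optional filter on name or description
) -> dict[str, Any]:
    ...  # implementation omitted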

These two examples needed very different fixes for the same problem, which is a good reason to use a profiler!