
Your AI Agent is Burning Money.

Here's Why — and the Fix.

by Shubhanshu Singh K · exemplar.dev

We've been building on Google's Agent Development Kit (ADK), and we ran into a problem most developers don't notice until it's too late: token bloat.

The "Mega-Prompt" Trap

When you first build an AI agent, the natural instinct is to cram everything into the system prompt — persona, rules, procedures, tool usage, edge cases. It works. Until it doesn't.

Every time a user sends a message, your agent re-sends all of those instructions to the model, even if the user just typed "What's the status of my deployment?"

If your agent has 10 capabilities at ~1,000 tokens each:

  • 10,000+ tokens per call
  • Over a 20-turn conversation: 200,000+ tokens consumed
  • At scale, across thousands of users: you've lit your budget on fire
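To make the arithmetic concrete, here is a minimal Python sketch. It assumes each capability contributes a fixed ~1,000-token block, that the full prompt is billed again on every turn, and it counts only instruction tokens (user messages and model replies are ignored). The function names are illustrative, not part of ADK.

```python
def monolithic_tokens_per_call(capabilities: int, tokens_per_capability: int = 1_000) -> int:
    """Instruction tokens sent with every call: all capabilities, every time."""
    return capabilities * tokens_per_capability


def monolithic_tokens_per_conversation(capabilities: int, turns: int,
                                       tokens_per_capability: int = 1_000) -> int:
    """Instruction tokens over a conversation; the full prompt is re-sent each turn."""
    return turns * monolithic_tokens_per_call(capabilities, tokens_per_capability)


if __name__ == "__main__":
    # 10 capabilities at ~1,000 tokens each, over a 20-turn conversation.
    print(f"{monolithic_tokens_per_call(10):,} tokens per call")  # 10,000 tokens per call
    print(f"{monolithic_tokens_per_conversation(10, turns=20):,} tokens per conversation")  # 200,000 tokens per conversation
```

Multiply the per-conversation figure by your daily conversation count and the "budget on fire" line stops being a figure of speech.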
Figure: Why your agent is burning tokens. A monolithic system prompt (~10,500 tokens per call) vs. ADK Skills progressive disclosure (L1 metadata scan, then skill selection: ~7,000 tokens per call), about 33% fewer tokens per call at 10 skills.

What Google ADK Skills Actually Do

ADK Skills solve this with progressive disclosure — a three-tier architecture that loads context only when needed. Think of it like a restaurant menu vs. reading out every recipe in full before every order.

The three tiers:

  • L1 — Metadata (~100 tokens/skill): Just the skill name + description. Always loaded. Acts as a menu the agent scans.
  • L2 — Instructions (~5,000 tokens): The full how-to. Only fetched when the agent decides a skill is relevant.
  • L3 — Resources (as needed): External docs, style guides, API specs — pulled only when L2 references them.
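One way to picture the three tiers is as a data structure in which only the L1 fields reach the prompt by default. This is a hypothetical Python sketch, not ADK's internal representation: the `Skill` class, the `l1_menu` helper, and the `k8s-health` skill are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A hypothetical container for one skill, split by disclosure tier."""

    # L1 metadata: always in the agent's context, so keep it short (~100 tokens).
    name: str
    description: str
    # L2 instructions: the full how-to, loaded only when the skill is selected.
    instructions: str = ""
    # L3 resources: named documents, fetched only when the L2 text references them.
    resources: dict[str, str] = field(default_factory=dict)


def l1_menu(skills: list[Skill]) -> str:
    """Render the always-loaded 'menu': one name + description line per skill.

    Neither the instructions (L2) nor the resources (L3) appear here.
    """
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


if __name__ == "__main__":
    skills = [
        Skill("alert-triage",
              "Triages PagerDuty alerts, assesses severity, and suggests initial remediation steps.",
              instructions="## Steps\n1. Parse alert payload ...",
              resources={"runbook-index": "# Runbook index ..."}),
        Skill("k8s-health", "Checks Kubernetes pod, node, and deployment health."),  # hypothetical
    ]
    print(l1_menu(skills))
```

The point of the split is that the menu grows by one short line per skill, while the expensive L2 and L3 text stays out of context until something asks for it.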

ADK auto-generates three tools: list_skills, load_skill, and load_skill_resource.
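The post doesn't show the tool signatures, so the following is a hypothetical, framework-agnostic sketch of what the three tools do, backed by an in-memory registry. Only the tool names come from ADK; the arguments, return types, and error handling are assumptions, and this is not how ADK implements them.

```python
# Hypothetical in-memory registry; in practice the skill content lives in SKILL.md files.
SKILLS = {
    "alert-triage": {
        "description": "Triages PagerDuty alerts, assesses severity, and suggests initial remediation steps.",
        "instructions": "## Steps\n1. Parse alert payload ...",
        "resources": {
            "runbook-index": "# Runbook index ...",
            "escalation-matrix": "# Escalation matrix ...",
        },
    },
}


def list_skills() -> list[dict[str, str]]:
    """L1: name + description for every skill, nothing more."""
    return [{"name": name, "description": s["description"]} for name, s in SKILLS.items()]


def load_skill(skill_name: str) -> str:
    """L2: the full instructions for one skill."""
    if skill_name not in SKILLS:
        raise KeyError(f"unknown skill: {skill_name}")
    return SKILLS[skill_name]["instructions"]


def load_skill_resource(skill_name: str, resource_name: str) -> str:
    """L3: one resource that the skill's instructions point at."""
    resources = SKILLS.get(skill_name, {}).get("resources", {})
    if resource_name not in resources:
        raise KeyError(f"skill {skill_name!r} has no resource {resource_name!r}")
    return resources[resource_name]
```

The agent sees the output of `list_skills()` on every turn; the other two tools only cost tokens when the model decides to call them.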

Figure: ADK Skills three-tier progressive disclosure. L1 metadata is always in context (~100 tokens per skill), L2 instructions are loaded on demand (~5,000 tokens), and L3 resources are pulled only when L2 references them.

A Real Example: The On-Call SRE Agent

Say you're building an SRE on-call agent with 10 capabilities: PagerDuty alert triage, runbook execution, Slack incident channels, log analysis, escalation protocols, post-incident reports, Kubernetes health checks, database diagnostics, cost anomaly alerts, and on-call handoff summaries.

With a monolithic system prompt: Every query carries ~10,500 tokens, including when someone just asks "who's on-call right now?"

With ADK Skills: The agent starts with ~1,000 tokens of L1 metadata (10 skills × ~100 tokens). A user asks about a PagerDuty alert → the agent scans the menu, identifies alert-triage, calls load_skill, and fetches ~5,000 tokens of instructions. Total, including the extra round-trip: ~7,000 tokens. Not 10,500.
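In ADK, the model itself reads the L1 menu and decides which skill to load. The sketch below swaps in a crude keyword-overlap matcher purely to show that selection needs nothing beyond L1 metadata. Only `alert-triage` comes from the example; `k8s-health`, `oncall-handoff`, the stopword list, and `pick_skill` are hypothetical.

```python
import re
from typing import Optional

# L1 metadata only: name -> description. "alert-triage" comes from the example;
# the other two skills are hypothetical.
L1_MENU = {
    "alert-triage": "Triages PagerDuty alerts, assesses severity, and suggests initial remediation steps.",
    "k8s-health": "Checks Kubernetes pod, node, and deployment health.",
    "oncall-handoff": "Summarizes open incidents for the on-call handoff.",
}

# Small, hypothetical stopword list so filler words don't count as matches.
STOPWORDS = {"a", "an", "and", "are", "for", "i", "in", "is", "my", "of", "the", "to", "what"}


def words(text: str) -> set[str]:
    """Lowercase alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def pick_skill(query: str) -> Optional[str]:
    """Stand-in for the model's choice: the skill whose L1 text overlaps the query most.

    Returns None when nothing overlaps, i.e. answer without loading any skill.
    """
    query_words = words(query)
    best, best_score = None, 0
    for name, description in L1_MENU.items():
        score = len(query_words & words(f"{name} {description}"))
        if score > best_score:
            best, best_score = name, score
    return best


if __name__ == "__main__":
    print(pick_skill("I got a PagerDuty alert for checkout-api"))  # alert-triage
```

Once a name comes back, the agent would call load_skill with it; only then do the ~5,000 L2 tokens enter the context.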

Skill file structure (SKILL.md):

---
name: alert-triage
description: Triages PagerDuty alerts, assesses severity, and suggests initial remediation steps.
---
## When to use this skill
Use when the user mentions a PagerDuty alert or incident ID.
## Steps
1. Parse alert payload: service name, severity, trigger condition
2. Check runbook index for a matching procedure
3. If P1/P2: suggest creating a Slack incident channel
4. If P3/P4: log and suggest async review
## Resources
- [runbook-index](./runbooks/index.md)
- [escalation-matrix](./escalation.md)
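Because the file separates frontmatter from body, splitting it into tiers is mechanical: the frontmatter is L1, the body is L2, and the linked files are L3 candidates. A minimal Python sketch of that split follows. It assumes simple one-line `key: value` frontmatter (a real loader would use a proper YAML parser) and uses an abbreviated copy of the file; `parse_skill_md` is a hypothetical helper, not an ADK function.

```python
import re

# Abbreviated copy of the alert-triage SKILL.md (the Steps section is omitted).
SKILL_MD = """\
---
name: alert-triage
description: Triages PagerDuty alerts, assesses severity, and suggests initial remediation steps.
---
## When to use this skill
Use when the user mentions a PagerDuty alert or incident ID.
## Resources
- [runbook-index](./runbooks/index.md)
- [escalation-matrix](./escalation.md)
"""

# Markdown links of the form [label](path).
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_skill_md(text: str) -> tuple[dict[str, str], str, dict[str, str]]:
    """Split a SKILL.md into (L1 metadata, L2 body, L3 resource links).

    Only handles single-line `key: value` frontmatter.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = lines.index("---", 1)  # closing frontmatter delimiter
    except ValueError:
        raise ValueError("unterminated frontmatter") from None
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unsupported frontmatter line: {line!r}")
        metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    resources = dict(LINK.findall(body))  # label -> relative path
    return metadata, body, resources
```

Only `metadata` needs to be read at startup; `body` and the files in `resources` can stay on disk until the skill is actually chosen.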

The Numbers (Brutally Honest)

Skills always add one extra LLM call per request (the round-trip that loads the skill). Here's the full trade-off, in tokens per request:

| Agent size | Monolithic (1 LLM call) | ADK Skills (2 LLM calls) | Token savings vs. monolithic |
|---|---|---|---|
| 3 skills | 3,500 tokens | 6,300 tokens | −80% (Skills LOSE) |
| 5 skills | 5,500 tokens | 6,500 tokens | −18% (near break-even) |
| 10 skills | 10,500 tokens | 7,000 tokens | 33% (Skills SAVE) |
| 20 skills | 20,500 tokens | 8,000 tokens | 61% (Skills SAVE) |
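The four rows fit a simple linear model exactly: monolithic ≈ 500 + 1,000 × skills, and Skills ≈ 6,000 + 100 × skills. Those constants are inferred from the table, not stated in the post (the 6,000 presumably covers the ~5,000-token L2 load plus overhead from the second call). A Python sketch under those assumptions:

```python
# Constants inferred by fitting the table's four rows (assumptions, not from ADK):
BASE_PROMPT = 500          # monolithic: instructions shared by all capabilities
FULL_PER_SKILL = 1_000     # monolithic: full instructions per capability
SKILLS_FIXED = 6_000       # Skills: ~5,000-token L2 load plus second-call overhead
L1_PER_SKILL = 100         # Skills: name + description per skill


def monolithic_tokens(n_skills: int) -> int:
    """Tokens per request when every instruction lives in the system prompt."""
    return BASE_PROMPT + FULL_PER_SKILL * n_skills


def skills_tokens(n_skills: int) -> int:
    """Tokens per request with progressive disclosure (two LLM calls, one skill loaded)."""
    return SKILLS_FIXED + L1_PER_SKILL * n_skills


def savings(n_skills: int) -> float:
    """Fraction of tokens saved vs. monolithic; negative means Skills cost more."""
    mono = monolithic_tokens(n_skills)
    return (mono - skills_tokens(n_skills)) / mono


def break_even_skills() -> float:
    """Skill count at which both approaches cost the same per request."""
    return (SKILLS_FIXED - BASE_PROMPT) / (FULL_PER_SKILL - L1_PER_SKILL)


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n:>2} skills: {monolithic_tokens(n):>6,} vs {skills_tokens(n):>6,}"
              f"  ({savings(n):+.0%})")
    print(f"break-even at about {break_even_skills():.1f} skills")
```

Setting the two formulas equal gives 5,500 / 900 ≈ 6.1 skills as the per-request break-even under this model, so the "≤ 4 capabilities" rule of thumb below leaves some margin.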

Multi-turn is where the savings add up. At 20 skills over a 20-turn conversation:

  • Monolithic: 20,500 tokens × 20 turns = 410,000 tokens
  • Skills: ~8,000 tokens × 20 turns (160,000) + ~5,000 tokens of one-time loads = ~165,000 tokens

That's a ~60% reduction sustained across the full session, or roughly 245,000 tokens saved per conversation. The extra round-trip becomes a rounding error.
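The session totals can be reproduced in a few lines of Python. The ~5,000-token one-time load is inferred from the gap between 8,000 × 20 = 160,000 and the quoted ~165,000; like the arithmetic above, this sketch treats the per-turn cost as constant rather than modeling a growing conversation history. `session_tokens` is an illustrative helper.

```python
def session_tokens(per_turn: int, turns: int, one_time: int = 0) -> int:
    """Tokens for a whole conversation: a constant per-turn cost plus one-time loads."""
    return per_turn * turns + one_time


# 20 skills, 20 turns, using the per-request figures from the table.
monolithic = session_tokens(20_500, turns=20)
# Assumption: ~5,000 tokens of one-time skill loads on top of the per-turn cost.
skills = session_tokens(8_000, turns=20, one_time=5_000)
reduction = 1 - skills / monolithic

print(f"{monolithic:,} vs {skills:,}: {reduction:.0%} fewer tokens, "
      f"{monolithic - skills:,} saved")
# prints: 410,000 vs 165,000: 60% fewer tokens, 245,000 saved
```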

The Mental Model That Sticks

System prompt = whiteboard that's always visible. Every participant in every conversation sees everything, every time.

Skills = a filing cabinet with a well-organized index. The agent knows what's in there, pulls only what it needs, and does the work.

At small scale, the whiteboard is fine. At production scale — dozens of capabilities, thousands of conversations, multi-turn sessions — the filing cabinet wins every time.

Figure: ADK Skills architecture, whiteboard vs. filing cabinet. The monolithic whiteboard loads 10,000+ tokens on every call; the filing cabinet keeps an L1 index in context and fetches L2 on demand.

When NOT to Use Skills

Use a plain system prompt when:

  • Your agent has ≤ 4 capabilities — the 2-call overhead isn't worth it
  • You're building latency-critical, single-turn workflows
  • Your instructions are deeply interdependent and can't be cleanly isolated

Building agentic infrastructure at exemplar.dev. If you're working on developer tooling, on-call automation, or AI-native platforms — let's connect.

#AI #AgentDevelopmentKit #ADK #LLM #DevTools #GoogleAI #AIEngineering #BuildInPublic

Editorial—general discussion only.