Improving agent quality to optimize AI usage

Introduction

When agents are well-scoped, well-instructed, and operating within clear guardrails, token efficiency improves as a natural outcome. High-quality agents complete tasks in fewer attempts, follow clearer workflows with less rework, and avoid expensive debugging and correction cycles.

Follow the strategies outlined in this article to improve both agent quality and AI credit efficiency.

1. Choose the right model for the right task

Model choice is one of the fastest ways to improve both agent quality and cost efficiency, but it is often overlooked. A common pattern is to default to the most capable model for every task, which tends to increase token usage without improving the outcome. In some execution-heavy scenarios, overusing reasoning models can even reduce quality because the model may overthink the task or introduce unnecessary changes.

Choose the model based on the work involved:

Reasoning models: Best for architecture decisions, complex debugging, system design, and tasks that require deeper analysis.
Mid-tier models: Best when the plan is already clear and the agent needs to execute efficiently.
Lighter models: Best for refactoring, formatting, documentation updates, and other routine, well-scoped changes.

Use as much capability as the task requires, and as little as necessary. Matching capability to task improves outcomes and directly controls costs at scale.
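The matching rule above can be sketched as a small lookup. This is a minimal illustration, not a Copilot API: the three tiers and the task categories come from the list above, while the names Tier, TIER_FOR_TASK, and pick_tier are hypothetical, and falling back to the mid-tier for unlisted tasks is an assumption of this sketch.

```python
from enum import Enum


class Tier(Enum):
    REASONING = "reasoning"
    MID = "mid-tier"
    LIGHT = "lighter"


# Task categories taken from the list above; the string keys are hypothetical labels.
TIER_FOR_TASK = {
    "architecture": Tier.REASONING,
    "complex_debugging": Tier.REASONING,
    "system_design": Tier.REASONING,
    "execute_plan": Tier.MID,
    "refactoring": Tier.LIGHT,
    "formatting": Tier.LIGHT,
    "documentation": Tier.LIGHT,
}


def pick_tier(task_kind: str) -> Tier:
    """Return the tier that the list above recommends for this kind of task.

    Unlisted task kinds fall back to the mid-tier (an assumption of this sketch).
    """
    return TIER_FOR_TASK.get(task_kind, Tier.MID)
```

Keeping the mapping in one place makes the default explicit, so reaching for a reasoning model becomes a deliberate choice rather than a habit.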

For a breakdown by model and task type, see Comparing AI models using different tasks.

Configure the reasoning level of the model

Some models also support configurable reasoning levels, which control how much the model reasons before it responds. A higher level can improve answers to complex problems, but it consumes more tokens and therefore more credits. Use the regular level by default and raise it only for harder tasks. Configurable reasoning is available in Visual Studio Code and Copilot CLI for supported models.

See Supported AI models in GitHub Copilot.

Use Copilot auto model selection

Copilot auto model selection chooses a capable model for you, based on the intent of your task.

See About Copilot auto model selection.

2. Provide clear guidance in your prompts

Your prompt sets the direction for everything the agent does. When a prompt is vague, the agent has to infer intent, explore more context, and make judgment calls. That often leads to retries, scope drift, and unnecessary token usage.

Well-structured prompts have three qualities:

A clear task definition. Instead of "fix this issue," explain what the issue is, where it occurs, and what the expected outcome looks like.
Relevant context provided upfront. If you already know which files, services, logs, errors, or inputs matter, include them. This helps the agent avoid unnecessary exploration.
A clear stopping condition. Tell the agent what "done" looks like. Without a stopping point, agents can continue beyond the goal by adding extra commits, refactoring unrelated code, or expanding scope.

This added guidance doesn't meaningfully increase token usage, but it can significantly reduce the number of agent runs needed to reach the right outcome.
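One way to make the three qualities habitual is to treat them as required fields. The sketch below is a hypothetical helper, not part of Copilot: the class name AgentPrompt, its field names, and the rendered layout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPrompt:
    task: str                                         # clear task definition
    context: list[str] = field(default_factory=list)  # files, logs, errors that matter
    done_when: str = ""                               # stopping condition

    def render(self) -> str:
        """Build the prompt text, refusing to omit the task or the stopping condition."""
        if not self.task.strip():
            raise ValueError("a task definition is required")
        if not self.done_when.strip():
            raise ValueError("a stopping condition is required")
        lines = [f"Task: {self.task.strip()}"]
        if self.context:
            lines.append("Context:")
            lines.extend(f"- {item}" for item in self.context)
        lines.append(f"Done when: {self.done_when.strip()}")
        return "\n".join(lines)


# Example with made-up file names and a made-up bug.
prompt = AgentPrompt(
    task="Fix the 500 error returned by POST /login when the password is empty",
    context=["src/auth/login.py", "tests/test_login.py"],
    done_when="tests/test_login.py passes and no other files are changed",
)
text = prompt.render()
```

Raising an error when the stopping condition is missing is a design choice of this sketch: it forces you to define what "done" looks like before the agent starts.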

For prompt engineering best practices, see Prompt engineering for GitHub Copilot Chat.

3. Keep your context lean

Copilot sends the context it has access to as input tokens, and that context adds up: open editor tabs, attached files, and the full back-and-forth of a long conversation all count as context.

To keep context under control, consider doing the following:

Start a new conversation when you switch problems

A long thread carries its entire history into every new request. When you move on to an unrelated task, start a new conversation. For example:

In Copilot CLI, use /new (or /clear).
In Copilot Chat, start a new chat session.
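To see why a long thread gets expensive, consider a rough model in which every request resends the whole history. The sketch below is illustrative arithmetic only: the per-turn token counts are made-up numbers, the function name is hypothetical, and the model ignores caching and compaction, so it does not reflect how Copilot actually bills.

```python
def cumulative_input_tokens(turn_sizes: list[int]) -> int:
    """Total input tokens sent if each request carries all earlier turns.

    turn_sizes[i] is the number of tokens that turn i adds to the history.
    """
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the thread grows by this turn
        total += history  # the whole thread so far is sent as input
    return total


# Ten turns of 1,000 tokens each in a single thread (hypothetical sizes).
one_long_thread = cumulative_input_tokens([1000] * 10)

# The same ten turns split into two unrelated five-turn conversations.
two_short_threads = 2 * cumulative_input_tokens([1000] * 5)
```

Under these assumptions, the single thread sends 55,000 input tokens in total, while the two shorter conversations send 30,000 between them. The gap widens as threads get longer, because total input grows roughly with the square of the number of turns.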

Compact long Copilot CLI sessions that you want to continue

When you need the thread to keep going but it has grown large, run /compact in Copilot CLI to summarize the history and shrink the context window, optionally focusing the summary (for example, /compact focus on the auth module).

In addition, you can use /context to check current usage at any time.

See Managing context in GitHub Copilot CLI.

Give Copilot a map of your project

A well-maintained custom instructions file, such as an AGENTS.md or .github/copilot-instructions.md file, gives agents a structural overview of your repository so they don't have to read large numbers of files just to orient themselves. See Support for different types of custom instructions.

Bring in only the tools you need

Large tool sets (for example, a full MCP server's worth of tools) add to the context on every request. Where it fits your workflow, enable only the toolsets relevant to the task.

See Configuring toolsets for the GitHub MCP Server.

Take advantage of context caching

Copilot reuses context you've already sent through caching, which lowers the cost of follow-up turns. However, cached context expires after a period of inactivity and isn't reused when you switch models mid-session. In both cases, the context is re-sent and billed again as fresh input tokens. To get the most from caching, keep related work in one continuous session and avoid switching models partway through.

4. Reduce repeated errors with a `copilot-instructions.md` file

Persistent instructions improve consistency across agent interactions, but their value depends entirely on how they are written. A copilot-instructions.md file at the repository level is the most direct way to encode this guidance. Personal and organization-level instructions can layer on top for broader consistency.

The best instructions are short, specific, and grounded in real observed agent behavior, not generic best practices that sound good but don't apply to your system.

What to include:

Required frameworks, libraries, or design patterns
Known pitfalls the agent tends to repeat
Output expectations such as "be concise" or "only return code"
Team-specific conventions the agent must follow
Build, test, and lint commands

What to avoid:

Long, generic documentation
AI-generated guidance that doesn't reflect your actual system
One-off preferences or rarely used details
Overloaded instructions that make the context noisy

Keep instructions updated as your codebase, architecture, standards, and workflows evolve. Because these instructions are included in the agent's context on every run, even small improvements can reduce repeated errors and lower wasted token usage over time.

For more information, see Adding repository custom instructions for GitHub Copilot.

5. Research, plan, then implement

One of the biggest shifts in working effectively with agents is moving away from doing everything in a single session. When research, planning, and implementation all happen together, context grows quickly, irrelevant information accumulates, and agent quality degrades over time.

Break work into clear phases:

Research: Use the agent to explore the codebase, identify relevant files, and understand dependencies.
Plan: Create a detailed, structured plan or specification before making changes. This is where reasoning models are most valuable.
- In Copilot CLI, use /plan.
- In Copilot Chat in Visual Studio Code, select "Plan" from the agent dropdown, or type plan in the context window.
Implement: Execute against the plan using focused context and a model suited for execution.

Start a new session between phases. Carrying context forward from earlier phases can increase token usage, introduce bias, and reduce clarity for the agent, so each phase should operate with only what it needs. For guidance on scoping sessions effectively, see Best practices for using GitHub Copilot to work on tasks.

6. Add deterministic guardrails

Agents are non-deterministic and won't be correct every time, especially in multi-step workflows. Without guardrails, small errors can compound quickly: agents build on incorrect outputs, drift further from the goal, and make debugging more expensive and time-consuming.

Deterministic controls introduce clear pass/fail signals:

Unit tests verify the agent's changes produced the expected behavior.
Linters enforce structure and consistency, preventing formatting issues, style drift, and avoidable cleanup work.
Security scans catch risky patterns early, before they are harder to unwind.

Together, these controls create a tight feedback loop: the agent makes a change, a test, rule, or scan evaluates it, and the agent adjusts before moving forward. This prevents long chains of incorrect changes, which are one of the biggest drivers of token waste.
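The feedback loop above can be sketched as a small runner that turns each check into a pass/fail signal. This is a minimal sketch under stated assumptions: run_guardrails and all_passed are hypothetical names, each check is assumed to be a command that exits with code 0 on success, and the commands themselves (your test runner, linter, or scanner) are whatever your project already uses.

```python
import subprocess
import sys


def run_guardrails(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each check command and record True if it exits with code 0."""
    results = {}
    for name, command in checks.items():
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
            results[name] = completed.returncode == 0
        except FileNotFoundError:
            # The tool is not installed; treat that as a failed check.
            results[name] = False
    return results


def all_passed(results: dict[str, bool]) -> bool:
    """True only when every check passed (vacuously true if there are none)."""
    return all(results.values())


# Stand-in commands so the sketch runs anywhere; swap in your real tools.
demo_checks = {
    "passing check": [sys.executable, "-c", "import sys; sys.exit(0)"],
    "failing check": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
demo_results = run_guardrails(demo_checks)
```

An agent workflow would call something like this after each change and continue only when all_passed returns True, so that a failure is caught at the step that introduced it rather than several steps later.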

Teams that invest in these guardrails see fewer retries, faster task completion, and more predictable agent behavior. They often reduce total token consumption even if individual steps use slightly more tokens upfront.

Next steps

In addition to improving agent efficiency, you can also monitor and manage your spending to get the most out of your AI credits:

Use your dashboard and budget controls. The "AI usage" page, under https://github.com/settings/billing, breaks down consumption across every feature and model, so you can see where your credits are actually going and adjust accordingly.
Identify expensive patterns before they add up. Within a Copilot CLI session, use /usage to see session-level metrics and to spot expensive patterns as you work. In addition, /chronicle tips analyzes your recent session history and surfaces opportunities to use Copilot more efficiently.
Upgrade for a larger allowance. If you regularly approach your monthly limit, a higher plan may be more economical than paying for additional usage, because higher plans include a larger AI credit allowance. See About individual GitHub Copilot plans and benefits and Viewing and changing your GitHub Copilot plan.
