Resource budgets

Three independent budgets per session, all on SessionContext:

| Total field | Used field | Unit |
| --- | --- | --- |
| budget_total_tokens | budget_used_tokens | LLM tokens |
| budget_total_api_calls | budget_used_api_calls | Tool / API invocations |
| budget_total_cost_cents | budget_used_cost_cents | Currency (integer cents) |

When any budget reaches used >= total, agent_budget.rego denies the next governed event.

A budget is "set" when the corresponding budget_total_* field is non-null. When all three budget_total_* fields are null, no budget is enforced.
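The two rules above can be mirrored in plain Python, which may be handy for unit tests or a pre-flight check in your runtime. This is a minimal sketch, not the bundled Rego policy: BudgetFields is a hypothetical stand-in that only copies the six field names from SessionContext, exhausted_budgets is an illustrative helper, and treating a null used counter as 0 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetFields:
    """Hypothetical stand-in carrying the same budget field names as SessionContext."""
    budget_total_tokens: Optional[int] = None
    budget_used_tokens: Optional[int] = None
    budget_total_api_calls: Optional[int] = None
    budget_used_api_calls: Optional[int] = None
    budget_total_cost_cents: Optional[int] = None
    budget_used_cost_cents: Optional[int] = None


def exhausted_budgets(ctx: BudgetFields) -> list[str]:
    """Return the names of budgets that are set and have used >= total."""
    exhausted = []
    for name in ("tokens", "api_calls", "cost_cents"):
        total = getattr(ctx, f"budget_total_{name}")
        used = getattr(ctx, f"budget_used_{name}") or 0  # assumption: null counter counts as 0
        if total is not None and used >= total:
            exhausted.append(name)
    return exhausted


ctx = BudgetFields(budget_total_tokens=100_000, budget_used_tokens=100_000,
                   budget_total_api_calls=200, budget_used_api_calls=12)
print(exhausted_budgets(ctx))  # tokens is at its limit; no cost budget is set
```

A non-empty result corresponds to the situation in which the next governed event would be denied.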

Set a budget

python
from kitelogik import SessionContext

context = SessionContext(
    session_id="sess_001",
    user_role="support_agent",
    session_scopes=["read_customer", "approve_refund"],
    budget_total_tokens=100_000,
    budget_used_tokens=0,
    budget_total_api_calls=200,
    budget_used_api_calls=0,
    # cost cents omitted — no cost budget for this session
)

You can mix and match — only set the budgets you want enforced. The others stay null and short-circuit to allow.

Update counters between gate calls

SessionContext is treated as immutable, so do not mutate the counters in place. Create an updated copy with model_copy(update=...) and use that copy from then on:

python
# After a model interaction reports usage
context = context.model_copy(update={
    "budget_used_tokens":    (context.budget_used_tokens or 0) + tokens_this_call,
    "budget_used_api_calls": (context.budget_used_api_calls or 0) + 1,
})

# The next gate evaluation must see the updated context. If the toolbox
# captured the old context at construction, rebuild it before this call.
result = await toolbox.call("approve_refund", {...})

If your runtime stores SessionContext once and passes it everywhere (common in framework adapters), update the stored reference after every call that consumes budget, such as a model call or tool invocation.
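One way to keep a single stored reference current is to wrap it in a small holder that swaps in an updated copy on every usage report. The sketch below is only an illustration under stated assumptions: SessionStub is a hypothetical frozen dataclass standing in for SessionContext (with dataclasses.replace playing the role of model_copy(update=...)), and ContextHolder is not part of kitelogik.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SessionStub:
    """Hypothetical stand-in for SessionContext; only two counters are modelled."""
    budget_used_tokens: Optional[int] = None
    budget_used_api_calls: Optional[int] = None


class ContextHolder:
    """Owns the single stored reference and swaps it for an updated copy."""

    def __init__(self, context: SessionStub) -> None:
        self.context = context

    def record_usage(self, tokens: int, api_calls: int = 1) -> SessionStub:
        # Build a new immutable copy, then replace the stored reference.
        self.context = replace(
            self.context,
            budget_used_tokens=(self.context.budget_used_tokens or 0) + tokens,
            budget_used_api_calls=(self.context.budget_used_api_calls or 0) + api_calls,
        )
        return self.context


holder = ContextHolder(SessionStub())
holder.record_usage(tokens=1_250)
holder.record_usage(tokens=800)
print(holder.context)
```

Anything that needs the context reads holder.context at call time, so it always sees the latest counters rather than a stale copy.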

Where the gate fires

Per agent_budget.rego, budget rules deny on two event types:

rego
# Explicit agent.budget events
deny if { input.event_type == "agent.budget"; _token_budget_exhausted }

# Opportunistic enforcement on every tool call
deny if { input.event_type == "tool_call"; _token_budget_exhausted }

The second rule is the important one — you don't have to fire explicit agent.budget events to get enforcement. As long as SessionContext carries up-to-date budget_used_* values, every governed tool_call checks the limits.

Pattern: token budget from an OpenAI response

The cleanest pattern is to keep SessionContext on the same object the gate evaluates against — typically a GovernedToolbox, where the context is evaluated per-call:

python
from openai import AsyncOpenAI
from kitelogik import GovernedToolbox

client  = AsyncOpenAI()
toolbox = GovernedToolbox(gate=gate, context=context)
# ... register tools ...

while True:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        # ... tools, etc. ...
    )

    # Update the budget counters from the usage block
    if response.usage:
        context = context.model_copy(update={
            "budget_used_tokens":
                (context.budget_used_tokens or 0) + response.usage.total_tokens,
            "budget_used_api_calls":
                (context.budget_used_api_calls or 0) + 1,
        })
        toolbox = GovernedToolbox(gate=gate, context=context)   # rebuild with updated context
        # re-register tools on the rebuilt toolbox here

    if response.choices[0].finish_reason != "tool_calls":
        break

    # ... dispatch tool calls through `toolbox.call(...)`, append results ...

When budget_used_tokens >= budget_total_tokens, the next toolbox.call(...) will be denied — agent_budget.rego checks the budget on every tool_call event regardless of which tool is being called.

TIP

Framework adapters (OpenAIAdapter, the LangChain helpers, etc.) take the context at construction time. To apply an updated SessionContext to a running adapter, the supported path is to construct a new adapter and re-register the tools — the adapter's internal context is not part of its public API.

Pattern: cost from a token-based price model

If you know the model's per-token price, derive the used-cost counter from the reported token usage:

python
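import math

PRICE_INPUT_CENTS_PER_1K  = 0.250
PRICE_OUTPUT_CENTS_PER_1K = 1.000

def call_cost_cents(usage):
    # Round up to whole cents: int() would truncate sub-cent calls to 0,
    # so many small calls would never consume any of the cost budget.
    return math.ceil(
        usage.prompt_tokens     * PRICE_INPUT_CENTS_PER_1K  / 1000 +
        usage.completion_tokens * PRICE_OUTPUT_CENTS_PER_1K / 1000
    )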

context = context.model_copy(update={
    "budget_used_cost_cents":
        (context.budget_used_cost_cents or 0) + call_cost_cents(response.usage),
})

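Storing the running total as integer cents avoids floating-point drift over long sessions. Be deliberate about how each call is rounded to whole cents: truncating drops sub-cent calls entirely, while rounding up slightly overstates spend.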
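Because a single call often costs less than one cent, rounding every call to whole cents can distort the total. One possible alternative, sketched here as an assumption rather than anything kitelogik prescribes, is to accumulate spend in integer micro-cents and convert to cents only when writing budget_used_cost_cents. The per-token prices are the example prices above rescaled (0.25 cents per 1K tokens is 250 micro-cents per token), and the helper names are hypothetical.

```python
MICROCENTS_PER_CENT = 1_000_000

# Example prices from above, rescaled to integer micro-cents per token.
PRICE_INPUT_MICROCENTS_PER_TOKEN = 250     # 0.250 cents per 1K tokens
PRICE_OUTPUT_MICROCENTS_PER_TOKEN = 1_000  # 1.000 cents per 1K tokens


def call_cost_microcents(prompt_tokens: int, completion_tokens: int) -> int:
    """Exact integer cost of one call, in micro-cents."""
    return (prompt_tokens * PRICE_INPUT_MICROCENTS_PER_TOKEN
            + completion_tokens * PRICE_OUTPUT_MICROCENTS_PER_TOKEN)


def to_cents_rounded_up(microcents: int) -> int:
    """Convert to whole cents, rounding up so spend is never under-reported."""
    return -(-microcents // MICROCENTS_PER_CENT)


# Five calls of 1,200 prompt + 300 completion tokens cost 0.6 cents each.
spent_microcents = sum(call_cost_microcents(1_200, 300) for _ in range(5))
print(to_cents_rounded_up(spent_microcents))  # 3
```

Truncating each of those five calls separately would report 0 cents, and rounding each one up would report 5; keeping the sub-cent remainder in the accumulator yields 3.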

Pattern: per-role budget caps

Combine agent_budget.rego with a custom rule that checks role:

rego
package kitelogik.agent_budget

import future.keywords.if
import future.keywords.in

# Cap guest role at $5 cost regardless of explicit budget
deny if {
    input.context.user_role == "guest"
    input.context.budget_used_cost_cents != null
    input.context.budget_used_cost_cents > 500
}

# Cap untrusted-tier sessions at 50K tokens
deny if {
    input.context.user_role in {"guest", "anonymous"}
    input.context.budget_used_tokens != null
    input.context.budget_used_tokens > 50000
}

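Because these rules live in the same package as the bundled policy, their deny rules combine across files: the event is denied if any one of them fires, so the stricter cap effectively wins.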

Pattern: report budget exhaustion to the user

Catch the deny and surface a friendly message rather than a generic GovernanceError:

python
from kitelogik import GovernanceError

try:
    result = await toolbox.call("approve_refund", {...})
except GovernanceError as exc:
    if "budget" in exc.decision.reason.lower():
        return f"Session budget exhausted ({context.budget_used_tokens} tokens used)."
    raise

The decision.reason from main.rego's deny propagation carries the sub-policy's reason, so a budget deny shows up identifiably.

What is NOT enforced by the gate

  • Incrementing the budget_used_* counters. Your runtime owns this. The gate trusts what you put in SessionContext.
  • Cross-agent / org-wide budgets. Budgets in the bundled policy are per session. Aggregating spend across agent sessions for an org-wide cap is your orchestrator's job — pass the aggregated counter into SessionContext per session and let agent_budget.rego enforce.
  • Soft warnings before hard exhaustion. The gate is allow-or-deny. Implement warnings in your runtime by checking the ratio yourself.
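For the soft-warning case, a runtime-side ratio check might look like the sketch below. The 80% threshold and the budget_ratio and budget_warnings helpers are assumptions for illustration; they take used and total values as plain integers, so nothing here depends on a kitelogik API.

```python
from typing import Optional


def budget_ratio(used: Optional[int], total: Optional[int]) -> Optional[float]:
    """Fraction of a budget consumed, or None when no budget is set."""
    if total is None:
        return None
    if total <= 0:
        return 1.0  # a zero budget is already exhausted
    return (used or 0) / total


def budget_warnings(budgets: dict[str, tuple[Optional[int], Optional[int]]],
                    threshold: float = 0.8) -> list[str]:
    """Return a warning for each set budget at or above the threshold."""
    warnings = []
    for name, (used, total) in budgets.items():
        ratio = budget_ratio(used, total)
        if ratio is not None and ratio >= threshold:
            warnings.append(f"{name}: {ratio:.0%} of budget used")
    return warnings


print(budget_warnings({
    "tokens": (85_000, 100_000),
    "api_calls": (40, 200),
    "cost_cents": (None, None),  # no cost budget set
}))
```

In practice each pair would come from the session, for example (context.budget_used_tokens, context.budget_total_tokens), and the check would run right after the counters are updated.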

Released under the Apache 2.0 License.