OpenTelemetry

The gate, the adapter base, and the agent session loop all emit OpenTelemetry spans. Every span carries kitelogik.* semantic attributes describing the policy decision, the risk tier, the session, and (where relevant) the delegation lineage — so a trace view shows the full why of any allow / deny / HITL outcome, not just timing.

Spans the runtime emits

Span name	Where	Carries
`kitelogik.adapter.tool_call`	every adapter (`OpenAIAdapter`, `LangChainAdapter`, …)	`kitelogik.tool.name`, `kitelogik.tool.action`, `kitelogik.session_id`, `kitelogik.allowed`
Gate evaluation spans	`PolicyGate.evaluate_tool_call` / `evaluate_event`	`kitelogik.tool_name`, `kitelogik.action`, `kitelogik.session_id`, `kitelogik.user_role`, `kitelogik.policy.allow`, `kitelogik.policy.deny`, `kitelogik.policy.risk_tier`, `kitelogik.policy.requires_hitl`, `kitelogik.policy.reason`, `kitelogik.event_type`
Sanitiser span	tool-output scrubbing	`kitelogik.sanitizer.was_modified`
Agent session spans	`agents/session.py`	`kitelogik.session_id`, `kitelogik.user_role`, `kitelogik.token_id`, `kitelogik.delegation_depth`, `kitelogik.iteration`

Spans nest naturally: the adapter span wraps the gate span, which wraps any downstream sanitiser span. A multi-agent run produces a tree where the parent session's span contains the delegated children's spans, joined by trace-id propagation through delegation_depth / parent_token_id.
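Every flattened span from get_finished_spans() carries span_id and parent_span_id, so the nesting can be rebuilt in-process. The sketch below assumes those keys. It treats a span as a root when its parent is absent from the snapshot: None, empty, or already evicted from the bounded buffer. build_span_tree and print_tree are hypothetical helpers and are not part of the runtime.

```python
from typing import Any, Dict, List

Span = Dict[str, Any]


def build_span_tree(spans: List[Span]) -> List[Span]:
    """Return root spans, each copied with a 'children' list, ordered by start_ms."""
    # Copy each span so the caller's snapshot is not mutated.
    by_id = {s["span_id"]: {**s, "children": []} for s in spans}
    roots: List[Span] = []
    for node in by_id.values():
        parent = by_id.get(node.get("parent_span_id") or "")
        if parent is None:
            # No parent in this snapshot: either a true root or a parent that
            # was evicted from the bounded buffer. Show it at top level.
            roots.append(node)
        else:
            parent["children"].append(node)
    for node in by_id.values():
        node["children"].sort(key=lambda s: s["start_ms"])
    return sorted(roots, key=lambda s: s["start_ms"])


def print_tree(nodes: List[Span], depth: int = 0) -> None:
    """Print span names indented by nesting depth."""
    for node in nodes:
        print("  " * depth + node["name"])
        print_tree(node["children"], depth + 1)


if __name__ == "__main__":
    snapshot = [
        {"span_id": "a", "parent_span_id": None,
         "name": "kitelogik.adapter.tool_call", "start_ms": 0},
        # The gate span's real name is not documented here; placeholder only.
        {"span_id": "b", "parent_span_id": "a",
         "name": "gate-evaluation (placeholder)", "start_ms": 1},
    ]
    print_tree(build_span_tree(snapshot))
```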

Setup

python

from kitelogik.observability.tracer import setup_tracer

setup_tracer(service_name="my-agent")

That's the whole wire-up. With no further arguments, the runtime:

creates a TracerProvider resource tagged service.name="my-agent"
installs an always-on in-memory exporter (capacity 500 spans, most-recent kept)
installs a BatchSpanProcessor(ConsoleSpanExporter) writing to stderr so it doesn't clobber demo stdout

Subsequent setup_tracer(...) calls re-initialise — the in-memory buffer clears and any previous file handle closes cleanly.
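The in-memory exporter is described as "capacity 500 spans, most-recent kept" and as clearing on re-init. That is the behaviour of a bounded FIFO. The sketch below reproduces those semantics with collections.deque(maxlen=...). InMemorySpanBuffer is a hypothetical illustration, not the runtime's exporter class.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class InMemorySpanBuffer:
    """Hypothetical bounded buffer that keeps only the newest `capacity` spans."""

    def __init__(self, capacity: int = 500) -> None:
        # A deque with maxlen drops items from the left (oldest) when full.
        self._spans: Deque[Dict[str, Any]] = deque(maxlen=capacity)

    def add(self, span: Dict[str, Any]) -> None:
        self._spans.append(span)

    def snapshot(self) -> List[Dict[str, Any]]:
        # Oldest first, newest last.
        return list(self._spans)

    def clear(self) -> None:
        # Mirrors the documented reset on each setup_tracer(...) call.
        self._spans.clear()
```

Appending to a full deque silently discards its oldest entry. That is the "older spans are silently dropped" behaviour noted for get_finished_spans().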

Send to an OTLP collector

For production observability stacks (Tempo, Jaeger, Honeycomb, Grafana Cloud, Datadog, …), pass an OTLP HTTP endpoint:

python

setup_tracer(
    service_name="my-agent",
    otlp_endpoint="http://tempo.observability.svc:4318",
)

The runtime appends /v1/traces and adds an OTLPSpanExporter through a BatchSpanProcessor. The in-memory exporter stays on, so the in-process dashboard panel keeps working alongside the external collector.
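The runtime appends /v1/traces for you, so otlp_endpoint should be the collector's base URL. Port 4318 is the default OTLP/HTTP port. If the runtime does a plain string append (an assumption), then passing a URL that already ends in /v1/traces would produce .../v1/traces/v1/traces. Below is a hypothetical pre-flight helper that normalises the value before it reaches setup_tracer.

```python
from urllib.parse import urlsplit

TRACES_PATH = "/v1/traces"


def otlp_base_endpoint(url: str) -> str:
    """Normalise a collector URL for otlp_endpoint (hypothetical helper).

    Drops trailing slashes and a trailing /v1/traces, so that appending
    /v1/traces afterwards yields exactly one traces path.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL with a host, got {url!r}")
    base = url.rstrip("/")
    if base.endswith(TRACES_PATH):
        base = base[: -len(TRACES_PATH)]
    return base


def traces_url(base: str) -> str:
    """The URL the exporter ends up posting to, per the documented append."""
    return base + TRACES_PATH
```

You would then call setup_tracer(service_name="my-agent", otlp_endpoint=otlp_base_endpoint(url)), where url comes from your own configuration.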

OTLP HTTP, not gRPC

The shipped exporter is opentelemetry.exporter.otlp.proto.http. If your backend only accepts OTLP over gRPC, point otlp_endpoint at an OpenTelemetry Collector (or Tempo) sidecar that receives OTLP/HTTP and re-exports over gRPC.

Send to a local file

For replayable local traces (or shipping to log-based observability):

python

setup_tracer(service_name="my-agent", trace_file="traces.jsonl")

Spans are written as JSON Lines, one object per finished span, by a ConsoleSpanExporter (behind a BatchSpanProcessor) that writes to the file handle. Re-calling setup_tracer cleanly closes the previous handle.
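Reading the file back is a line-by-line JSON parse. The sketch below assumes each line is one compact JSON object shaped like the OpenTelemetry SDK's ReadableSpan.to_json() output: a "name" string and an "attributes" mapping keyed by the kitelogik.* names in the table above. iter_spans and denied_calls are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def iter_spans(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one span dict per non-blank line of a JSON Lines trace file."""
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report where it broke; a truncated last line usually means
                # the process exited before the batch processor flushed.
                raise ValueError(f"{path}:{lineno}: invalid JSON") from exc


def denied_calls(path: str) -> List[Tuple[str, str]]:
    """(span name, reason) for every span whose kitelogik.policy.deny is true."""
    result = []
    for span in iter_spans(path):
        attrs = span.get("attributes") or {}
        if attrs.get("kitelogik.policy.deny") is True:
            result.append((span.get("name", "?"),
                           attrs.get("kitelogik.policy.reason", "")))
    return result
```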

Test mode

pytest runs that import the runtime should pass testing=True to suppress all file/network export — only the in-memory buffer stays active:

python

setup_tracer(service_name="kitelogik-tests", testing=True)

This is what the bundled tests use. The buffer clears on every setup_tracer call, so tests don't leak spans across cases.
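A test then drives one scenario and asserts on get_finished_spans(). The matching itself is plain dict filtering. The sketch below uses two hypothetical helpers, find_spans and assert_denied. It relies on the flattened keys shown under "Read finished spans in-process" (tool_name, policy_deny, risk_tier), and the test runs against hand-written span dicts so it works without the runtime.

```python
from typing import Any, Dict, List, Optional

Span = Dict[str, Any]


def find_spans(spans: List[Span], name: Optional[str] = None, **fields: Any) -> List[Span]:
    """Spans whose name matches (if given) and whose keys equal every `fields` value."""
    return [
        s for s in spans
        if (name is None or s.get("name") == name)
        and all(s.get(key) == value for key, value in fields.items())
    ]


def assert_denied(spans: List[Span], tool_name: str) -> List[Span]:
    """Fail unless at least one span records a denial for `tool_name`."""
    hits = find_spans(spans, tool_name=tool_name, policy_deny=True)
    if not hits:
        # Raise explicitly so the check still works under `python -O`.
        raise AssertionError(f"no denied call recorded for {tool_name!r}")
    return hits


# Inside a real pytest case (runtime calls as documented on this page):
#   setup_tracer(service_name="kitelogik-tests", testing=True)
#   ...drive the agent or gate...
#   [span, *_] = assert_denied(get_finished_spans(), "approve_refund")
#   assert span["risk_tier"] == "TRANSACTIONAL_HIGH"
```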

Read finished spans in-process

The in-memory exporter exposes a JSON-friendly snapshot:

python

from kitelogik.observability.tracer import get_finished_spans

spans = get_finished_spans()
# [
#   {
#     "trace_id":      "00000000000000000000000000000001",
#     "span_id":       "0000000000000001",
#     "parent_span_id": "...",
#     "name":          "kitelogik.adapter.tool_call",
#     "start_ms":      1714291200000,
#     "end_ms":        1714291200042,
#     "duration_ms":   42,
#     "status":        "OK",
#     "session_id":    "sess_001",
#     "user_role":     "support_agent",
#     "tool_name":     "approve_refund",
#     "policy_allow":  False,
#     "policy_deny":   True,
#     "policy_hitl":   False,
#     "risk_tier":     "TRANSACTIONAL_HIGH",
#     "reason":        "Refunds over $100 require manager approval",
#     ...
#   },
#   ...
# ]

It's plain OpenTelemetry span data flattened into dicts: fine to render in your own UI, emit to a metrics pipeline, or assert against in tests. The buffer holds the most recent 500 finished spans; older spans are silently dropped.
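For the metrics-pipeline case, the snapshot can be folded into counters keyed by tool, risk tier, and outcome. The sketch below assumes the flattened keys from the example above. The precedence hitl > deny > allow is an assumption about how a HITL decision is encoded. decision_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional, Tuple

Span = Dict[str, Any]


def outcome(span: Span) -> str:
    """Classify a flattened span as 'hitl', 'deny', 'allow' or 'none'.

    HITL is checked first on the assumption that a HITL span may also carry
    allow/deny flags; adjust the order to match your runtime.
    """
    if span.get("policy_hitl"):
        return "hitl"
    if span.get("policy_deny"):
        return "deny"
    if span.get("policy_allow"):
        return "allow"
    return "none"


def decision_counts(spans: Iterable[Span],
                    span_name: Optional[str] = None) -> "Counter[Tuple[str, str, str]]":
    """Count (tool_name, risk_tier, outcome) over spans that carry a tool_name.

    Pass `span_name` to count a single span type, in case several span types
    carry the same decision fields and would otherwise double-count.
    """
    counts: "Counter[Tuple[str, str, str]]" = Counter()
    for s in spans:
        if "tool_name" not in s:
            continue  # e.g. session spans, which carry no decision
        if span_name is not None and s.get("name") != span_name:
            continue
        counts[(s["tool_name"], s.get("risk_tier", "unknown"), outcome(s))] += 1
    return counts
```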

Pattern: alert on deny rate

Every gate-evaluation span sets kitelogik.policy.deny. With a collector in front of your trace store, span-metrics or trace-to-metrics rules turn that boolean into a counter:

yaml

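# OTel Collector: spanmetrics connector (excerpt; receivers/exporters defined elsewhere)
connectors:
  spanmetrics:
    dimensions:
      - name: kitelogik.tool_name
      - name: kitelogik.policy.risk_tier
      - name: kitelogik.policy.deny

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]   # add your trace-store exporter here too
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]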

Then a Prometheus query like:

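sum by (kitelogik_tool_name, kitelogik_policy_risk_tier) (
  rate(traces_spanmetrics_calls_total{kitelogik_policy_deny="true"}[5m])
)

…surfaces a denial rate per tool, broken down by risk tier. Dotted attribute names become underscored Prometheus labels. The metric name itself varies with the generator (Tempo's metrics-generator, the Collector's spanmetrics connector) and its namespace setting, so check what your Prometheus actually receives. A sudden spike on kitelogik_policy_deny="true" for a previously quiet tool signals one of two things: an agent is misbehaving, or a policy change landed and the agent loop is now hitting it constantly.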
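You can approximate the same signal in-process from get_finished_spans(), for example in a demo without Prometheus. The sketch below assumes the flattened tool_name, policy_deny and start_ms keys. deny_rate_per_tool and spiking_tools are hypothetical helpers. Because the buffer holds only the latest 500 spans, this covers a short horizon and does not replace the PromQL query.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def deny_rate_per_tool(spans: Iterable[Dict[str, Any]], now_ms: int,
                       window_ms: int = 300_000) -> Dict[str, float]:
    """Denials per second for each tool with start_ms in (now_ms - window_ms, now_ms]."""
    counts: Dict[str, int] = defaultdict(int)
    for s in spans:
        if s.get("policy_deny") and now_ms - window_ms < s["start_ms"] <= now_ms:
            counts[s.get("tool_name", "?")] += 1
    seconds = window_ms / 1000
    return {tool: n / seconds for tool, n in counts.items()}


def spiking_tools(current: Dict[str, float], baseline: Dict[str, float],
                  factor: float = 5.0, floor: float = 0.01) -> List[str]:
    """Tools whose current deny rate is at least `factor` times their baseline.

    The baseline is floored at `floor` denials/s, so a previously quiet tool
    (baseline 0) still triggers once it crosses factor * floor.
    """
    return sorted(
        tool for tool, rate in current.items()
        if rate >= factor * max(baseline.get(tool, 0.0), floor)
    )
```

To compare the last five minutes with the five before them, call spiking_tools(deny_rate_per_tool(spans, now), deny_rate_per_tool(spans, now - 300_000)).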

Pattern: trace-link an audit record

Every audit record carries the session_id; every span carries kitelogik.session_id. Put the session ID and the active trace ID on the same log record in your log-shipping pipeline, and a click on a denied call in the audit log opens the full trace:

python

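import logging

from opentelemetry import trace

log = logging.getLogger(__name__)


def log_denial(record) -> None:
    """Log a denied audit record together with the active trace ID.

    `record` is an audit record exposing session_id, tool_name and policy_version.
    Call this inside the span you want to link to; outside any span the trace ID is 0.
    """
    ctx = trace.get_current_span().get_span_context()
    log.warning(
        "denied",
        extra={
            "session_id":     record.session_id,
            "tool_name":      record.tool_name,
            # SpanContext.trace_id is an int; render the 32-hex-digit form trace UIs use.
            "trace_id":       format(ctx.trace_id, "032x"),
            "policy_version": record.policy_version,
        },
    )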

In Grafana / Loki this is the standard derived-fields trick: regex out trace_id from the log line and link it into the Traces UI.
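The derived field only works if the trace ID appears in the emitted line. Fields passed through extra= are attached to the LogRecord but are written out only when the formatter references them. The stdlib-only sketch below assumes a key=value line format. The logger name and the trace_id= pattern are illustrative choices, and the same regex would go into the Loki derived field.

```python
import logging
import re
from typing import IO, Optional

# Matches the 32-hex-digit trace ID this format writes; reuse it in Loki.
TRACE_ID_RE = re.compile(r"trace_id=([0-9a-f]{32})")


def make_audit_logger(stream: IO[str]) -> logging.Logger:
    """Logger whose lines expose session_id and trace_id as key=value pairs.

    Every call must pass both fields via extra=, or formatting fails.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s %(message)s session_id=%(session_id)s trace_id=%(trace_id)s"
    ))
    logger = logging.getLogger("audit.example")  # illustrative name
    logger.handlers[:] = [handler]  # replace, so repeated setup doesn't duplicate lines
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def trace_id_hex(trace_id: int) -> str:
    """Render an OpenTelemetry integer trace ID as 32 lowercase hex digits."""
    return format(trace_id, "032x")


def extract_trace_id(line: str) -> Optional[str]:
    """What a derived-field regex does: pull the trace ID out of a log line."""
    match = TRACE_ID_RE.search(line)
    return match.group(1) if match else None
```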

Rendering a dashboard

A trace dashboard on top of this is a poll loop. Read get_finished_spans(), group by session_id, and render a waterfall per session, coloured by allow / deny / HITL. The operator then sees the full agent turn (model → adapter span → gate span → sanitiser span) in one view. The data is plain dicts; the dashboard is just a renderer.
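A text-mode version of that renderer fits in a few lines. The sketch below assumes the flattened session_id, name, start_ms, end_ms and policy_* keys. render_waterfall is a hypothetical helper. Bars are scaled to each session's total duration, and the marks stand in for colours: + allow, x deny, ? HITL, = no decision.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

MARKS = {"allow": "+", "deny": "x", "hitl": "?"}


def _outcome(span: Dict[str, Any]) -> str:
    # Assumed precedence: HITL, then deny, then allow.
    for key, label in (("policy_hitl", "hitl"), ("policy_deny", "deny"),
                       ("policy_allow", "allow")):
        if span.get(key):
            return label
    return "none"


def render_waterfall(spans: Iterable[Dict[str, Any]], width: int = 40) -> str:
    """One text waterfall per session_id, bars scaled to that session's duration."""
    by_session: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for s in spans:
        by_session[str(s.get("session_id", "?"))].append(s)
    lines: List[str] = []
    for session_id in sorted(by_session):
        group = sorted(by_session[session_id], key=lambda s: s["start_ms"])
        t0 = group[0]["start_ms"]
        total = max(s["end_ms"] for s in group) - t0 or 1  # avoid dividing by zero
        lines.append(f"session {session_id}")
        for s in group:
            start = min((s["start_ms"] - t0) * width // total, width - 1)
            length = max(1, (s["end_ms"] - s["start_ms"]) * width // total)
            length = min(length, width - start)
            bar = " " * start + MARKS.get(_outcome(s), "=") * length
            duration = s["end_ms"] - s["start_ms"]
            lines.append(f"  {s['name']:<32} |{bar:<{width}}| {duration}ms")
    return "\n".join(lines)
```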

Related

Performance benchmarks: measure per-call gate latency; the spans give you the same number in production
Audit trail export: durable record of every decision; pair with traces for full forensics
Architecture: where the tracer sits alongside the gate and the audit store
