Identity · Observability · AI Agents

Trust, but Verify:
Identity & Observability
for AI Agents

AI agents that take real actions need more than a system prompt to be safe.

Aydrian Howard
Senior Developer Advocate · Auth0 / Okta · @itsaydrian
Identity Observability
The new reality

AI Agents Don't Just Answer.
They Act.

✉️
Read your inbox
Searching, summarizing, and extracting action items from email
📅
Manage your calendar
Finding, creating, and rescheduling events on your behalf
📄
Query private docs
Searching knowledge bases with access to sensitive information
🛒
Make purchases
Completing transactions with real money on your account

This is genuinely useful. It's also genuinely risky.

Two Questions Every Agent
Must Answer

1
Pillar One

"Are you allowed to do that?"

Identity

Delegated authorization, scoped tokens, user-confirmation flows — ensuring the agent acts within sanctioned boundaries.

2
Pillar Two

"What did you actually do?"

Observability

Structured traces of every LLM call, tool invocation, and retrieval — so you can audit, debug, and improve.

Without both, you have a powerful agent you can't trust or debug.

The demo app

Meet Assistant0

A personal AI assistant built on Vercel AI SDK + Next.js. Secured with Auth0 for identity management. Traced end-to-end with Arize AX.

Gmail Google Calendar Google Tasks GitHub Web Search Document RAG Online Shopping
👤 User
Chat UI
🤖 LLM
Decision Loop
🔧 Tools
7 integrations
🌐 APIs
External services
Identity via Auth0 Observability via Arize AX
Pillar 1

Identity

  • Agents need access to third-party APIs on behalf of a user
  • Naive approach: hardcoded keys, shared credentials → over-permissioned
  • Better: delegated authorization — scoped, revocable tokens
  • User-confirmation flows for consequential actions (purchases, sends)
  • Key insight: the agent acts as the user, not as a superuser
🔑
Scoped tokens
Read-only Gmail access. Can't send. Can't delete.
⏱️
Short-lived & revocable
Tokens expire. Users can revoke at any time.
🔔
Human-in-the-loop
Purchases and sensitive writes require approval.
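The scoped-token model above can be sketched in a few lines of TypeScript. This is illustrative only: `ScopedToken` and `canPerform` are hypothetical names for the talk, not Auth0 SDK APIs.

```typescript
// A delegated token carries only the scopes the user granted,
// plus an expiry, so access is short-lived and revocable.
interface ScopedToken {
  subject: string;   // the user the agent acts on behalf of
  scopes: string[];  // e.g. ["gmail.readonly"]
  expiresAt: number; // epoch millis
  revoked: boolean;
}

// The agent may act only if the token is live and the required scope
// was explicitly granted -- the agent acts as the user, not a superuser.
function canPerform(token: ScopedToken, requiredScope: string, now = Date.now()): boolean {
  if (token.revoked) return false;
  if (now >= token.expiresAt) return false;
  return token.scopes.includes(requiredScope);
}

const token: ScopedToken = {
  subject: "user@example.com",
  scopes: ["gmail.readonly"],
  expiresAt: Date.now() + 15 * 60 * 1000, // 15 minutes
  revoked: false,
};

canPerform(token, "gmail.readonly"); // read is in scope
canPerform(token, "gmail.send");     // send was never granted
```

Note the design choice: denial is the default. A scope that was never granted, an expired token, and a revoked token all fail the same check.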

Identity in Action

Live Demo
Scoped Authorization
"What are my most recent emails about AI?"
  • Agent requests a read-only Gmail token
  • Calls Gmail API — can read, cannot send
  • Token is scoped, time-limited, revocable
CIBA — Human-in-the-Loop
"Buy me 1 pair of wireless headphones under $150"
1 Agent calls shopOnlineTool
2 CIBA request sent — push notification to your device
3 User approves on phone → agent resumes
4 Purchase completes — only after explicit approval
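The CIBA flow above amounts to an approval gate around the tool call. A minimal sketch, assuming a `requestApproval` callback that stands in for the real backchannel push notification (the names here are illustrative, not the Auth0 API):

```typescript
type Decision = "approved" | "denied";

// Suspend a consequential action until the user explicitly approves
// on their device; the action runs only after approval.
async function withApproval<T>(
  description: string,
  requestApproval: (description: string) => Promise<Decision>,
  action: () => Promise<T>,
): Promise<T> {
  const decision = await requestApproval(description); // push + wait
  if (decision !== "approved") {
    throw new Error(`User denied: ${description}`);
  }
  return action(); // runs only after explicit approval
}

// Simulated flow: the "user" approves the purchase.
async function demo(): Promise<string> {
  return withApproval(
    "Buy wireless headphones under $150",
    async () => "approved",          // stand-in for the CIBA push round-trip
    async () => "order placed",
  );
}
```

The wait between the push and the user's tap is exactly the auth-interrupt span that shows up in the trace.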

In Arize AX: trace shows the auth interrupt span with timing, wait period, and execution after approval.

Pillar 2

Observability

  • A correctly authorized agent is still a black box by default
  • What model was called? What prompt was sent? Which tool ran?
  • Observability = structured traces of every decision in the agent loop
  • Enables: debugging, latency analysis, cost tracking, quality evaluation
🔍
Every LLM call
Model, prompt, token count, latency, cost
🔧
Every tool invocation
Exact JSON inputs, full outputs, duration
📚
Every retrieval
Chunks fetched, similarity scores, auth filters
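What those traces capture can be sketched with a hand-rolled span recorder. Arize AX ingests real OpenTelemetry-style spans; this stripped-down version only shows the shape of the data:

```typescript
// One recorded step of the agent loop: name, exact I/O, and latency.
interface Span {
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
}

const trace: Span[] = [];

// Wrap any LLM call, tool invocation, or retrieval so its inputs,
// outputs, and duration are recorded alongside the result.
async function traced<I, O>(name: string, input: I, fn: (input: I) => Promise<O>): Promise<O> {
  const start = Date.now();
  const output = await fn(input);
  trace.push({ name, input, output, durationMs: Date.now() - start });
  return output;
}

// Example: a (fake) tool call whose exact I/O lands in the trace.
async function run() {
  const emails = await traced("gmailSearchTool", { query: "AI emails" }, async () => ["a", "b", "c"]);
  return emails.length;
}
```

Because the wrapper returns the tool's result unchanged, instrumentation never alters agent behavior; it only makes it visible.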

Observability in Action

Live Demo
Gmail trace
"What are my most recent emails about AI?"
LLM Call (gpt-4o)
842ms
gmailSearchTool
312ms
input: {"query": "AI emails"}
output: 3 emails found
tokens: 1,247 prompt / 89 completion
Shopping / CIBA trace
"Buy me 1 pair of wireless headphones under $150"
LLM Call
4.2s
⏸ Auth Interrupt
3.6s
CIBA push sent — awaiting user approval
shopOnlineTool
580ms
✓ approved → purchase complete

Arize AX: Traces → click trace → expand child spans → inspect I/O tab on each tool span

RAG + Authorization

Live Demo
"What does my knowledge base say about authorization policies?"
🧠
Embed Query
vector
🔎
Cosine Search
top-k chunks
🔐
FGA Filter
can_view check
🤖
LLM Context
authorized only
LLM Call
1.1s
getContextDocumentsTool
480ms
chunks: 8 retrieved · top scores: Sec 2 Scoped Auth 0.94 · Sec 3 FGA Data Access 0.91 · 0.88…
FGA: 5 passed can_view · 3 filtered out
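The retrieve-then-filter pipeline above, in miniature: rank chunks by cosine similarity, take the top-k, then drop anything the user cannot view before it reaches the LLM context. `canView` stands in for a real FGA `can_view` check; all names are illustrative.

```typescript
interface Chunk { id: string; vector: number[]; text: string }

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k by similarity first, authorization filter second -- unauthorized
// chunks never make it into the LLM context, no matter how well they score.
function retrieve(
  query: number[],
  chunks: Chunk[],
  topK: number,
  canView: (chunkId: string) => boolean,
): Chunk[] {
  return chunks
    .map(c => ({ c, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .filter(({ c }) => canView(c.id))
    .map(({ c }) => c);
}

// Toy corpus: the second-best match is one the user is not allowed to see.
const chunks: Chunk[] = [
  { id: "public-1", vector: [1, 0],     text: "scoped auth" },
  { id: "secret-1", vector: [0.9, 0.1], text: "internal policy" },
  { id: "public-2", vector: [0, 1],     text: "unrelated" },
];
const visible = retrieve([1, 0], chunks, 2, id => id.startsWith("public"));
```

Tracing both counts (retrieved vs. passed the filter) is what lets the span report "5 passed · 3 filtered out" instead of silently shrinking the context.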

Key: observability makes retrieval quality visible — not just "it answered," but "what did it find and who was allowed to see it?"

Multi-Tool Chaining

Live Demo
"Search my emails for anything about next week's meeting, then create a task to follow up"
Span tree in Arize AX
LLM Call (root)
1,240ms
gmailSearchTool
312ms
query: "next week meeting" → 2 results
createTasksTool
156ms
title: "Follow up on product sync" → created
More chaining examples
  • getCalendarEventsTool → listRepositories
  • Each tool span shows exact inputs, outputs, and latency
  • The full reasoning chain is auditable — not just the final answer
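An auditable chain like the one above can be sketched as a loop that threads each tool's output into the next tool's input while recording every step. The tool stubs here are illustrative stand-ins, not the real integrations:

```typescript
type Tool = { name: string; run: (input: string) => Promise<string> };

// One audit record per step: which tool ran, with what, and how long it took.
interface StepRecord { tool: string; input: string; output: string; durationMs: number }

// Run tools in sequence, feeding each output forward, and keep the
// full chain -- not just the final answer -- for later inspection.
async function runChain(tools: Tool[], initialInput: string): Promise<StepRecord[]> {
  const records: StepRecord[] = [];
  let input = initialInput;
  for (const tool of tools) {
    const start = Date.now();
    const output = await tool.run(input);
    records.push({ tool: tool.name, input, output, durationMs: Date.now() - start });
    input = output; // each tool's output becomes the next tool's input
  }
  return records;
}

// Toy two-step chain mirroring the demo: search email, then create a task.
const chain: Tool[] = [
  { name: "gmailSearchTool", run: async q => `2 emails about "${q}"` },
  { name: "createTasksTool", run: async ctx => `task created from: ${ctx}` },
];
```

In the real app the LLM decides which tool to call next; the audit record is the same either way, which is what makes the span tree in Arize AX readable.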

Together

Identity

What the agent
should do

  • Scoped, delegated tokens
  • User-confirmed actions
  • Fine-grained authorization
Observability

What the agent
actually did

  • Full trace of every call
  • Tool inputs & outputs
  • Latency, cost, quality

"Did my agent do the right thing,
and can I prove it?"

This is the baseline for production AI agents.

Key Takeaways

Questions?

Let's talk identity, observability, and making AI agents worth trusting.

Aydrian Howard
@itsaydrian · itsaydrian.dev
Identity Observability #TrustButVerify