Identity · Observability · AI Agents

Trust, but Verify:
Identity & Observability
for AI Agents

AI agents that take real actions need more than a system prompt to be safe.

Aydrian Howard
Senior Developer Advocate · Auth0 / Okta · @itsaydrian
Identity Observability
The new reality

AI Agents Don't Just Answer.
They Act.

✉️
Read your inbox
Searching, summarizing, and extracting action items from email
📅
Manage your calendar
Finding, creating, and rescheduling events on your behalf
📄
Query private docs
Searching knowledge bases with access to sensitive information
🛒
Make purchases
Completing transactions with real money on your account

This is genuinely useful. It's also genuinely risky.

Two Questions Every Agent
Must Answer

1
Pillar One

"Are you allowed to do that?"

Identity

Delegated authorization, scoped tokens, user-confirmation flows — ensuring the agent acts within sanctioned boundaries.

2
Pillar Two

"What did you actually do?"

Observability

Structured traces of every LLM call, tool invocation, and retrieval — so you can audit, debug, and improve.

Without both, you have a powerful agent you can't trust or debug.

The demo app

Meet Assistant0

A personal AI assistant built on Vercel AI SDK + Next.js. Secured with Auth0 for identity management. Traced end-to-end with Arize AX.

Gmail Google Calendar Google Tasks GitHub Web Search Document RAG Online Shopping
👤 User
Chat UI
🤖 LLM
Decision Loop
🔧 Tools
7 integrations
🌐 APIs
External services
Identity via Auth0 Observability via Arize AX
Pillar 1

Identity

  • Agents need access to third-party APIs on behalf of a user
  • Naive approach: hardcoded keys, shared credentials → over-permissioned
  • Better: delegated authorization — scoped, revocable tokens
  • User-confirmation flows for consequential actions (purchases, sends)
  • Key insight: the agent acts as the user, not as a superuser
🔑
Scoped tokens
Read-only Gmail access. Can't send. Can't delete.
⏱️
Short-lived & revocable
Tokens expire. Users can revoke at any time.
🔔
Human-in-the-loop
Purchases and sensitive writes require approval.
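The scoped-token model above can be sketched in a few lines of TypeScript. This is illustrative only: `ScopedToken` and `canPerform` are hypothetical names for the talk, not Auth0 SDK APIs.

```typescript
// A delegated token carries only the scopes the user granted,
// plus an expiry, so access is short-lived and revocable.
interface ScopedToken {
  subject: string;   // the user the agent acts on behalf of
  scopes: string[];  // e.g. ["gmail.readonly"]
  expiresAt: number; // epoch millis
  revoked: boolean;
}

// The agent may act only if the token is live and the required scope
// was explicitly granted -- the agent acts as the user, not a superuser.
function canPerform(token: ScopedToken, requiredScope: string, now = Date.now()): boolean {
  if (token.revoked) return false;
  if (now >= token.expiresAt) return false;
  return token.scopes.includes(requiredScope);
}

const token: ScopedToken = {
  subject: "user@example.com",
  scopes: ["gmail.readonly"],
  expiresAt: Date.now() + 15 * 60 * 1000, // 15 minutes
  revoked: false,
};

canPerform(token, "gmail.readonly"); // read is in scope
canPerform(token, "gmail.send");     // send was never granted
```

Note the design choice: denial is the default. A scope that was never granted, an expired token, and a revoked token all fail the same check.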

Identity in Action

Live Demo
Scoped Authorization
"What are my most recent emails about AI?"
  • Agent requests a read-only Gmail token
  • Calls Gmail API — can read, cannot send
  • Token is scoped, time-limited, revocable
CIBA — Human-in-the-Loop
"Buy me 1 pair of wireless headphones under $150"
1 Agent calls shopOnlineTool
2 CIBA request sent — push notification to your device
3 User approves on phone → agent resumes
4 Purchase completes — only after explicit approval
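The CIBA flow above amounts to an approval gate around the tool call. A minimal sketch, assuming a `requestApproval` callback that stands in for the real backchannel push notification (the names here are illustrative, not the Auth0 API):

```typescript
type Decision = "approved" | "denied";

// Suspend a consequential action until the user explicitly approves
// on their device; the action runs only after approval.
async function withApproval<T>(
  description: string,
  requestApproval: (description: string) => Promise<Decision>,
  action: () => Promise<T>,
): Promise<T> {
  const decision = await requestApproval(description); // push + wait
  if (decision !== "approved") {
    throw new Error(`User denied: ${description}`);
  }
  return action(); // runs only after explicit approval
}

// Simulated flow: the "user" approves the purchase.
async function demo(): Promise<string> {
  return withApproval(
    "Buy wireless headphones under $150",
    async () => "approved",          // stand-in for the CIBA push round-trip
    async () => "order placed",
  );
}
```

The wait between the push and the user's tap is exactly the auth-interrupt span that shows up in the trace.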

In Arize AX: trace shows the auth interrupt span with timing, wait period, and execution after approval.

Pillar 2

Observability

  • A correctly authorized agent is still a black box by default
  • What model was called? What prompt was sent? Which tool ran?
  • Observability = structured traces of every decision in the agent loop
  • Enables: debugging, latency analysis, cost tracking, quality evaluation
🔍
Every LLM call
Model, prompt, token count, latency, cost
🔧
Every tool invocation
Exact JSON inputs, full outputs, duration
📚
Every retrieval
Chunks fetched, similarity scores, auth filters
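What those traces capture can be sketched with a hand-rolled span recorder. Arize AX ingests real OpenTelemetry-style spans; this stripped-down version only shows the shape of the data:

```typescript
// One recorded step of the agent loop: name, exact I/O, and latency.
interface Span {
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
}

const trace: Span[] = [];

// Wrap any LLM call, tool invocation, or retrieval so its inputs,
// outputs, and duration are recorded alongside the result.
async function traced<I, O>(name: string, input: I, fn: (input: I) => Promise<O>): Promise<O> {
  const start = Date.now();
  const output = await fn(input);
  trace.push({ name, input, output, durationMs: Date.now() - start });
  return output;
}

// Example: a (fake) tool call whose exact I/O lands in the trace.
async function run() {
  const emails = await traced("gmailSearchTool", { query: "AI emails" }, async () => ["a", "b", "c"]);
  return emails.length;
}
```

Because the wrapper returns the tool's result unchanged, instrumentation never alters agent behavior; it only makes it visible.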

Observability in Action

Live Demo
Gmail trace
"What are my most recent emails about AI?"
LLM Call (gpt-4o)
842ms
gmailSearchTool
312ms
input: {"query": "AI emails"}
output: 3 emails found
tokens: 1,247 prompt / 89 completion
Shopping / CIBA trace
"Buy me 1 pair of wireless headphones under $150"
LLM Call
4.2s
⏸ Auth Interrupt
3.6s
CIBA push sent — awaiting user approval
shopOnlineTool
580ms
✓ approved → purchase complete

Arize AX: Traces → click trace → expand child spans → inspect I/O tab on each tool span

RAG + Authorization

Live Demo
"What does my knowledge base say about authorization policies?"
🧠
Embed Query
vector
🔎
Cosine Search
top-k chunks
🔐
FGA Filter
can_view check
🤖
LLM Context
authorized only
LLM Call
1.1s
getContextDocumentsTool
480ms
chunks: 8 retrieved · top scores: Sec 2 Scoped Auth 0.94 · Sec 3 FGA Data Access 0.91 · 0.88…
FGA: 5 passed can_view · 3 filtered out
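The retrieve-then-filter pipeline above, in miniature: rank chunks by cosine similarity, take the top-k, then drop anything the user cannot view before it reaches the LLM context. `canView` stands in for a real FGA `can_view` check; all names are illustrative.

```typescript
interface Chunk { id: string; vector: number[]; text: string }

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k by similarity first, authorization filter second -- unauthorized
// chunks never make it into the LLM context, no matter how well they score.
function retrieve(
  query: number[],
  chunks: Chunk[],
  topK: number,
  canView: (chunkId: string) => boolean,
): Chunk[] {
  return chunks
    .map(c => ({ c, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .filter(({ c }) => canView(c.id))
    .map(({ c }) => c);
}

// Toy corpus: the second-best match is one the user is not allowed to see.
const chunks: Chunk[] = [
  { id: "public-1", vector: [1, 0],     text: "scoped auth" },
  { id: "secret-1", vector: [0.9, 0.1], text: "internal policy" },
  { id: "public-2", vector: [0, 1],     text: "unrelated" },
];
const visible = retrieve([1, 0], chunks, 2, id => id.startsWith("public"));
```

Tracing both counts (retrieved vs. passed the filter) is what lets the span report "5 passed · 3 filtered out" instead of silently shrinking the context.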

Key: observability makes retrieval quality visible — not just "it answered," but "what did it find and who was allowed to see it?"

Multi-Tool Chaining

Live Demo
"Search my emails for anything about next week's meeting, then create a task to follow up"
Span tree in Arize AX
LLM Call (root)
1,240ms
gmailSearchTool
312ms
query: "next week meeting" → 2 results
createTasksTool
156ms
title: "Follow up on product sync" → created
More chaining examples
  • getCalendarEventsTool → listRepositories
  • Each tool span shows exact inputs, outputs, and latency
  • The full reasoning chain is auditable — not just the final answer
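An auditable chain like the one above can be sketched as a loop that threads each tool's output into the next tool's input while recording every step. The tool stubs here are illustrative stand-ins, not the real integrations:

```typescript
type Tool = { name: string; run: (input: string) => Promise<string> };

// One audit record per step: which tool ran, with what, and how long it took.
interface StepRecord { tool: string; input: string; output: string; durationMs: number }

// Run tools in sequence, feeding each output forward, and keep the
// full chain -- not just the final answer -- for later inspection.
async function runChain(tools: Tool[], initialInput: string): Promise<StepRecord[]> {
  const records: StepRecord[] = [];
  let input = initialInput;
  for (const tool of tools) {
    const start = Date.now();
    const output = await tool.run(input);
    records.push({ tool: tool.name, input, output, durationMs: Date.now() - start });
    input = output; // each tool's output becomes the next tool's input
  }
  return records;
}

// Toy two-step chain mirroring the demo: search email, then create a task.
const chain: Tool[] = [
  { name: "gmailSearchTool", run: async q => `2 emails about "${q}"` },
  { name: "createTasksTool", run: async ctx => `task created from: ${ctx}` },
];
```

In the real app the LLM decides which tool to call next; the audit record is the same either way, which is what makes the span tree in Arize AX readable.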

Together

Identity

What the agent
should do

  • Scoped, delegated tokens
  • User-confirmed actions
  • Fine-grained authorization
Observability

What the agent
actually did

  • Full trace of every call
  • Tool inputs & outputs
  • Latency, cost, quality

"Did my agent do the right thing,
and can I prove it?"

This is the baseline for production AI agents.

Key Takeaways

Questions?

Let's talk identity, observability, and making AI agents worth trusting.

Aydrian Howard
@itsaydrian · itsaydrian.dev
Identity Observability #TrustButVerify