Observer Dashboard
The Observer dashboard provides real-time visibility into how Navigator and your A2A agents are performing in production — including safety metrics, usage patterns, costs, latency, and user feedback.
Accessing the Dashboard
The Observer is available to workspace admins from the workspace sidebar:
Workspace view: /workspaces/<your-org>/observer
The Observer requires admin permissions for your workspace. If you don't see it in the sidebar, contact your workspace owner.
Overview
The dashboard is built around widgets — individual metric panels that you can add, remove, and rearrange. Each widget pulls data from your production traces and refreshes automatically.
When you first open the Observer, the system can suggest relevant widgets based on the data available in your workspace. You can accept the suggestions, browse the full widget catalog, or build your dashboard manually.
Time Period
All dashboard widgets respect a global time period selector. Choose from:
7 days — recent activity
30 days — monthly trends
90 days — quarterly view
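As an illustration of what the selector implies for the underlying data, the sketch below turns a period choice into a start/end window that could be used to filter traces. The `PERIOD_DAYS` mapping and `period_window` function are hypothetical names chosen for this example; how the Observer computes its windows internally is not documented here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical keys for the three options offered by the selector.
PERIOD_DAYS = {"7d": 7, "30d": 30, "90d": 90}


def period_window(period: str, now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) window covered by a period option."""
    if period not in PERIOD_DAYS:
        raise ValueError(f"unsupported period: {period!r}")
    return now - timedelta(days=PERIOD_DAYS[period]), now


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
start, end = period_window("30d", now)
print(start.date(), end.date())  # 2025-01-01 2025-01-31
```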
Widget Categories
The Observer provides 27+ widget types organized into categories:
Usage
Track how your agents are being used:
Trace Count — total number of agent executions
Unique Users — distinct users interacting with agents
Unique Sessions — distinct conversation threads
Trace Activity — chart of daily agent executions over time
Daily Active Users — DAU trend
Top Users — most active users by trace count
Top Trace Names — most frequently executed agent operations
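The usage widgets listed above are all counts and groupings over trace records. A minimal sketch of how they relate to each other, assuming a hypothetical `Trace` record with `name`, `user_id`, `session_id`, and `day` fields (the Observer's actual trace schema may differ):

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trace:
    # Hypothetical trace shape, used only for this illustration.
    name: str
    user_id: str
    session_id: str
    day: date


def usage_summary(traces: list[Trace], top_n: int = 3) -> dict:
    """Compute the usage metrics described above from a list of traces."""
    daily_users: dict[date, set[str]] = {}
    for t in traces:
        daily_users.setdefault(t.day, set()).add(t.user_id)
    return {
        "trace_count": len(traces),
        "unique_users": len({t.user_id for t in traces}),
        "unique_sessions": len({t.session_id for t in traces}),
        "trace_activity": dict(sorted(Counter(t.day for t in traces).items())),
        "daily_active_users": {d: len(u) for d, u in sorted(daily_users.items())},
        "top_users": Counter(t.user_id for t in traces).most_common(top_n),
        "top_trace_names": Counter(t.name for t in traces).most_common(top_n),
    }


traces = [
    Trace("triage", "alice", "s1", date(2025, 1, 1)),
    Trace("triage", "alice", "s1", date(2025, 1, 1)),
    Trace("summarize", "bob", "s2", date(2025, 1, 1)),
    Trace("triage", "alice", "s3", date(2025, 1, 2)),
]
summary = usage_summary(traces)
print(summary["trace_count"], summary["unique_users"], summary["unique_sessions"])  # 4 2 3
```

Note that a user who is active on several days counts once toward Unique Users but once per day toward Daily Active Users.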
Cost
Monitor LLM API spending:
Total Cost — aggregate LLM cost for the period
Cost Trend — cost over time chart
Model Cost Breakdown — spending split by model (e.g. GPT-4o, Claude)
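Total Cost and Model Cost Breakdown are two views of the same sum. The sketch below aggregates hypothetical per-trace costs by model; the cost figures and the `cost_breakdown` helper are invented for illustration, and `Decimal` is used here only to avoid floating-point drift when adding currency amounts.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical (model, cost in USD) pairs, one per trace.
trace_costs = [
    ("GPT-4o", Decimal("0.012")),
    ("Claude", Decimal("0.020")),
    ("GPT-4o", Decimal("0.008")),
]


def cost_breakdown(costs):
    """Return (total cost, cost per model) for a list of (model, cost) pairs."""
    per_model = defaultdict(Decimal)  # missing models start at Decimal("0")
    for model, cost in costs:
        per_model[model] += cost
    total = sum(per_model.values(), Decimal(0))
    return total, dict(per_model)


total, by_model = cost_breakdown(trace_costs)
print(total, by_model["GPT-4o"], by_model["Claude"])
```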
Performance
Understand latency and throughput:
Average Latency — mean execution time across traces
Latency Trend — latency over time
Tokens per Trace — average token consumption
Output Speed — tokens per second
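A rough sketch of how these performance figures could be derived from per-trace measurements. The `TraceTiming` fields are hypothetical, and the definition of output speed used here (total output tokens divided by total execution time) is an assumption; the Observer may measure generation time differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TraceTiming:
    # Hypothetical per-trace measurements, used only for this illustration.
    latency_s: float    # end-to-end execution time in seconds
    total_tokens: int   # input + output tokens
    output_tokens: int  # generated tokens only


def performance_summary(traces: list[TraceTiming]) -> dict:
    """Average latency, tokens per trace, and aggregate output speed."""
    total_output = sum(t.output_tokens for t in traces)
    total_time = sum(t.latency_s for t in traces)
    return {
        "average_latency_s": mean(t.latency_s for t in traces),
        "tokens_per_trace": mean(t.total_tokens for t in traces),
        "output_speed_tps": total_output / total_time,
    }


timings = [TraceTiming(2.0, 500, 100), TraceTiming(3.0, 700, 150)]
perf = performance_summary(timings)
print(perf["average_latency_s"], perf["tokens_per_trace"], perf["output_speed_tps"])
```

Dividing the totals gives a speed weighted by execution time. Averaging each trace's own tokens-per-second figure is an equally valid choice and will generally produce a different number.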
Safety
Monitor AI safety scores:
Score Card — displays a safety metric (e.g. hallucination, toxicity) with average value, evaluation count, and 7-day trend
Score Trend — safety score over time
Incident Counter — traces flagged with high safety scores
Flagged Traces Table — detailed view of problematic traces for investigation
Safety scores use an inverted scale where lower values are better (e.g. a low hallucination score means fewer hallucinations). The exception is user feedback, where higher values are positive.
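To make the inverted scale concrete, the sketch below computes a Score Card style summary and flags traces whose safety score is high. The `Score` record, the assumed 0 to 1 score range, and the `FLAG_THRESHOLD` value are all hypothetical; the threshold the Observer actually uses to flag a trace is not stated here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Score:
    # Hypothetical score record, used only for this illustration.
    trace_id: str
    metric: str   # e.g. "hallucination", "toxicity", "user_feedback"
    value: float  # assumed to lie in [0, 1]


FLAG_THRESHOLD = 0.7  # hypothetical cut-off; high values are bad for safety metrics


def score_card(scores: list[Score], metric: str) -> dict:
    """Average value and evaluation count for one metric."""
    values = [s.value for s in scores if s.metric == metric]
    return {"metric": metric, "average": mean(values), "evaluations": len(values)}


def flagged(scores: list[Score], threshold: float = FLAG_THRESHOLD) -> list[Score]:
    """Safety scores at or above the threshold; user feedback is excluded
    because it uses the direct scale, where high values are good."""
    return [s for s in scores if s.metric != "user_feedback" and s.value >= threshold]


scores = [
    Score("t1", "hallucination", 0.25),
    Score("t2", "hallucination", 0.75),
    Score("t2", "toxicity", 0.5),
    Score("t3", "user_feedback", 1.0),
]
card = score_card(scores, "hallucination")
incidents = flagged(scores)
print(card["average"], card["evaluations"], [s.trace_id for s in incidents])  # 0.5 2 ['t2']
```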
Quality
Score Distribution — histogram showing how scores are distributed across traces
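A histogram of this kind can be sketched as equal-width binning. The example assumes scores fall in the range 0 to 1 and uses five bins; both are assumptions made for illustration, not documented Observer settings.

```python
def score_histogram(values: list[float], bins: int = 5) -> list[int]:
    """Count scores per equal-width bin over [0, 1]; 1.0 falls in the last bin."""
    counts = [0] * bins
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"score out of range: {v}")
        index = min(int(v * bins), bins - 1)
        counts[index] += 1
    return counts


histogram = score_histogram([0.05, 0.1, 0.35, 0.5, 0.95, 1.0])
print(histogram)  # [2, 1, 1, 0, 2]
```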
Benchmarks
Benchmark Comparison — Navigator's accuracy on medical benchmark datasets compared to baseline models, with exact-match and LLM-jury evaluation methods
Workspace (Admin)
Workspace-level widgets for administrators:
App Count — total deployed applications
App Executions — usage across all apps
App Health — status overview of all deployed apps
Top Apps — most-used applications
Member Engagement — workspace member activity
Customizing Your Dashboard
Adding Widgets
Click the catalog button to browse all available widget types
Widgets are organized by category — click a category to filter
Click Add on any widget to add it to your dashboard
Use Add All to add every widget in a category at once
AI-Suggested Widgets
When setting up a new dashboard, click Suggestions to have the system analyze your available data and recommend 6–12 relevant widgets. This is the fastest way to get started.
Removing Widgets
Click the remove button on any widget to remove it from your dashboard. Your dashboard configuration is saved automatically.
Resetting to Defaults
Use the reset option to clear your dashboard and start fresh with suggested widgets.
User Feedback
The Observer tracks thumbs-up and thumbs-down feedback from end users interacting with Navigator. This feedback appears in:
User Feedback Feed — chronological list of feedback events
Score Card widgets configured for the user_feedback metric
User feedback uses a direct scale (1 = positive, 0 = negative), unlike safety scores, which are inverted.
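Because feedback is recorded as 1 or 0, its average can be read directly as the share of positive responses. A minimal sketch, with a hypothetical `positive_rate` helper and invented sample data:

```python
def positive_rate(feedback: list[int]) -> float:
    """Share of thumbs-up events, where 1 = thumbs-up and 0 = thumbs-down."""
    if not feedback:
        raise ValueError("no feedback events")
    if any(value not in (0, 1) for value in feedback):
        raise ValueError("feedback values must be 0 or 1")
    return sum(feedback) / len(feedback)


rate = positive_rate([1, 1, 0, 1])
print(rate)  # 0.75
```

An average of 0.75 on this metric means three out of four responses were positive, whereas an average of 0.75 on a safety metric would indicate a problem.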