Observer Dashboard

The Observer dashboard provides real-time visibility into how Navigator and your A2A agents are performing in production — including safety metrics, usage patterns, costs, latency, and user feedback.

Accessing the Dashboard

The Observer is available to workspace admins from the workspace sidebar:

Workspace view: /workspaces/<your-org>/observer

If you don't see the Observer in the sidebar, you may not have admin permissions for the workspace; contact your workspace owner.

Overview

The dashboard is built around widgets — individual metric panels that you can add, remove, and rearrange. Each widget pulls data from your production traces and refreshes automatically.

When you first open the Observer, the system can suggest relevant widgets based on the data available in your workspace. You can accept the suggestions, browse the full widget catalog, or build your dashboard manually.

Time Period

All dashboard widgets respect a global time period selector. Choose from:

  • 7 days — recent activity

  • 30 days — monthly trends

  • 90 days — quarterly view
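To make the selector concrete, here is a minimal Python sketch of how a rolling window could be derived from the chosen period. The helper name `period_window` and the choice of a window ending at the current UTC time are assumptions for illustration, not documented Observer behavior.

```python
from datetime import datetime, timedelta, timezone

# The three periods offered by the selector, in days.
ALLOWED_PERIODS = (7, 30, 90)

def period_window(days, now=None):
    """Return (start, end) of a rolling window ending at `now` (hypothetical helper)."""
    if days not in ALLOWED_PERIODS:
        raise ValueError(f"unsupported period: {days}")
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=days), end

start, end = period_window(30, now=datetime(2024, 5, 31, tzinfo=timezone.utc))
print(start.date(), end.date())  # 2024-05-01 2024-05-31
```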

Widget Categories

The Observer provides 27+ widget types organized into categories:

Usage

Track how your agents are being used:

  • Trace Count — total number of agent executions

  • Unique Users — distinct users interacting with agents

  • Unique Sessions — distinct conversation threads

  • Trace Activity — chart of daily agent executions over time

  • Daily Active Users — DAU trend

  • Top Users — most active users by trace count

  • Top Trace Names — most frequently executed agent operations
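As a rough illustration of what these widgets summarize, the sketch below computes the same figures from a handful of trace records. The record shape (`name`, `user_id`, `session_id`, `day`) is a hypothetical one chosen for the example; the Observer's actual trace schema may differ.

```python
from collections import Counter

# Hypothetical trace records; field names are assumptions for illustration.
traces = [
    {"name": "triage", "user_id": "u1", "session_id": "s1", "day": "2024-05-01"},
    {"name": "triage", "user_id": "u1", "session_id": "s1", "day": "2024-05-01"},
    {"name": "summarize", "user_id": "u2", "session_id": "s2", "day": "2024-05-01"},
    {"name": "triage", "user_id": "u1", "session_id": "s3", "day": "2024-05-02"},
]

trace_count = len(traces)                                   # Trace Count
unique_users = len({t["user_id"] for t in traces})          # Unique Users
unique_sessions = len({t["session_id"] for t in traces})    # Unique Sessions
trace_activity = Counter(t["day"] for t in traces)          # Trace Activity (per day)
top_users = Counter(t["user_id"] for t in traces).most_common(3)     # Top Users
top_trace_names = Counter(t["name"] for t in traces).most_common(3)  # Top Trace Names

# Daily Active Users: distinct users seen on each day.
users_by_day = {}
for t in traces:
    users_by_day.setdefault(t["day"], set()).add(t["user_id"])
dau = {day: len(users) for day, users in users_by_day.items()}

print(trace_count, unique_users, unique_sessions)  # 4 2 3
```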

Cost

Monitor LLM API spending:

  • Total Cost — aggregate LLM cost for the period

  • Cost Trend — cost over time chart

  • Model Cost Breakdown — spending split by model (e.g. GPT-4o, Claude)
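The relationship between Total Cost and Model Cost Breakdown can be sketched as a simple aggregation. The record fields (`model`, `cost_usd`), the placeholder model names, and the amounts are made up for the example and are not taken from the Observer.

```python
from collections import defaultdict

# Hypothetical per-trace cost records; names and amounts are illustrative only.
traces = [
    {"model": "model-a", "cost_usd": 0.25},
    {"model": "model-b", "cost_usd": 0.50},
    {"model": "model-a", "cost_usd": 0.25},
]

# Model Cost Breakdown: sum cost per model.
cost_by_model = defaultdict(float)
for t in traces:
    cost_by_model[t["model"]] += t["cost_usd"]

# Total Cost: sum over all models for the period.
total_cost = sum(cost_by_model.values())

# Each model's share of the total, as a fraction between 0 and 1.
share_by_model = {m: c / total_cost for m, c in cost_by_model.items()}

print(total_cost)           # 1.0
print(dict(cost_by_model))  # {'model-a': 0.5, 'model-b': 0.5}
```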

Performance

Understand latency and throughput:

  • Average Latency — mean execution time across traces

  • Latency Trend — latency over time

  • Tokens per Trace — average token consumption

  • Output Speed — tokens per second
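For intuition, the following sketch derives the three headline figures from two sample traces. The field names are hypothetical, and computing Output Speed as total output tokens divided by total latency is an assumption; the Observer may measure it differently (for example, over generation time only).

```python
# Hypothetical trace records; field names and values are illustrative only.
traces = [
    {"latency_s": 2.0, "total_tokens": 500, "output_tokens": 100},
    {"latency_s": 6.0, "total_tokens": 700, "output_tokens": 300},
]

n = len(traces)

# Average Latency: mean execution time across traces, in seconds.
average_latency = sum(t["latency_s"] for t in traces) / n

# Tokens per Trace: mean token consumption per trace.
tokens_per_trace = sum(t["total_tokens"] for t in traces) / n

# Output Speed (assumed definition): output tokens per second of latency.
output_speed = sum(t["output_tokens"] for t in traces) / sum(
    t["latency_s"] for t in traces
)

print(average_latency, tokens_per_trace, output_speed)  # 4.0 600.0 50.0
```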

Safety

Monitor AI safety scores:

  • Score Card — displays a safety metric (e.g. hallucination, toxicity) with average value, evaluation count, and 7-day trend

  • Score Trend — safety score over time

  • Incident Counter — number of traces flagged for a high (that is, poor) safety score

  • Flagged Traces Table — detailed view of problematic traces for investigation

Safety scores use an inverted scale where lower values are better (e.g. a low hallucination score means fewer hallucinations). The exception is user feedback, where higher values are positive.
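The difference in scale direction matters when deciding which traces count as incidents. The sketch below assumes scores fall between 0 and 1 and uses a made-up threshold of 0.7; both are assumptions for illustration, since the actual flagging rule is not specified here.

```python
# Assumed threshold on an assumed 0-1 scale; not a documented Observer value.
FLAG_THRESHOLD = 0.7

# Hypothetical score records; field names are illustrative only.
scores = [
    {"trace_id": "t1", "metric": "hallucination", "value": 0.1},
    {"trace_id": "t2", "metric": "hallucination", "value": 0.9},
    {"trace_id": "t3", "metric": "toxicity", "value": 0.75},
    {"trace_id": "t4", "metric": "user_feedback", "value": 1},
]

def is_flagged(score, threshold=FLAG_THRESHOLD):
    """Flag safety scores at or above the threshold (lower is better)."""
    # user_feedback uses a direct scale (higher is better), so a high
    # value there is good news and is never treated as an incident.
    if score["metric"] == "user_feedback":
        return False
    return score["value"] >= threshold

flagged = [s["trace_id"] for s in scores if is_flagged(s)]
print(len(flagged), flagged)  # 2 ['t2', 't3']
```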

Quality

  • Score Distribution — histogram showing how scores are distributed across traces
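A histogram of this kind can be sketched in a few lines. The example assumes scores lie in the range 0 to 1 and uses five equal-width bins; the bin count and the helper name `score_histogram` are illustrative choices, not the widget's actual settings.

```python
def score_histogram(values, bins=5):
    """Count scores in [0, 1] into equal-width bins (assumed score range)."""
    counts = [0] * bins
    for v in values:
        # A score of exactly 1.0 belongs in the last bin rather than overflowing.
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    return counts

print(score_histogram([0.05, 0.1, 0.45, 0.5, 0.95, 1.0]))  # [2, 0, 2, 0, 2]
```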

Benchmarks

  • Benchmark Comparison — Navigator's accuracy on medical benchmark datasets compared with baseline models, scored using exact-match and LLM-jury evaluation methods

Workspace (Admin)

Workspace-level widgets for administrators:

  • App Count — total deployed applications

  • App Executions — usage across all apps

  • App Health — status overview of all deployed apps

  • Top Apps — most-used applications

  • Member Engagement — workspace member activity

Customizing Your Dashboard

Adding Widgets

  1. Click the catalog button to browse all available widget types

  2. Widgets are organized by category — click a category to filter

  3. Click Add on any widget to add it to your dashboard

  4. Use Add All to add every widget in a category at once

AI-Suggested Widgets

When setting up a new dashboard, click Suggestions to have the system analyze your available data and recommend 6–12 relevant widgets. This is the fastest way to get started.

Removing Widgets

Click the remove button on any widget to remove it from your dashboard. Your dashboard configuration is saved automatically.

Resetting to Defaults

Use the reset option to clear your dashboard and start fresh with suggested widgets.

User Feedback

The Observer tracks thumbs-up and thumbs-down feedback from end users interacting with Navigator. This feedback appears in:

  • User Feedback Feed — chronological list of feedback events

  • Score Card widgets configured for the user_feedback metric

User feedback uses a direct scale (1 = positive, 0 = negative), unlike safety scores, which are inverted.
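Because feedback is recorded as 1 or 0, its average can be read directly as the share of positive responses. The helper below is a hypothetical illustration of that arithmetic, not the Observer's implementation.

```python
def positive_rate(feedback):
    """Share of thumbs-up events (1 = positive, 0 = negative); None if empty."""
    if not feedback:
        return None
    return sum(feedback) / len(feedback)

print(positive_rate([1, 1, 0, 1]))  # 0.75
```

An average of 0.75 here means three of four users gave a thumbs-up, so a higher value is better.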
