
Data gen UI improvements #427


Merged
scosman merged 3 commits into main from sdg_ui_improvements on Jul 14, 2025

Conversation

scosman (Collaborator) commented Jul 14, 2025

Summary by CodeRabbit

  • New Features

    • Advanced options in the evaluation configuration form are now grouped within a collapsible panel for improved organization and user experience.
  • Style

    • Updated the appearance and sizing of collapsible UI elements for better consistency.
  • Documentation

    • Updated user-facing terminology throughout the evaluation interface:
      • "Evaluation Method" and "Evaluator" are now referred to as "Judge" or "Judge Algorithm."
      • "Eval Method Dataset" is now "Golden Dataset."
      • All related labels, tooltips, warnings, and descriptions have been revised for clarity and consistency.

coderabbitai bot (Contributor) commented Jul 14, 2025

Walkthrough

Terminology across the evaluation UI has been updated, replacing "evaluation method" and "evaluator" with "judge" or "judge algorithm." Labels, tooltips, section headers, and messages reflect this change. Additionally, advanced evaluation options in the config creation page are now grouped in a collapsible panel. No logic, data flow, or exported entity signatures were changed. The Collapse component’s CSS class logic was also refined.

Changes

File(s) and change summaries:

  • app/web_ui/src/lib/ui/collapse.svelte: Adjusted conditional CSS class logic for size and text classes in the Collapse component.
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte: Replaced "evaluation method"/"evaluator" terminology with "judge" in labels, tooltips, and UI messages.
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/[eval_config_id]/[run_config_id]/run_result/+page.svelte: Updated evaluation property keys, subtitles, and section labels to use "Judge" instead of "Eval" or "Evaluation Method."
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/compare_run_methods/+page.svelte: Renamed all references from "evaluation method" to "judge" in UI text, labels, and tooltips.
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/create_eval_config/+page.svelte: Changed terminology to "Judge," grouped advanced options in a new collapsible panel, and updated related UI text.
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/eval_configs/+page.svelte: Updated all UI text and tooltips from "Eval Method" to "Judge" and from "eval method dataset" to "golden dataset."
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/create_evaluator/+page.svelte: Updated dataset selection section titles and descriptions to use "Golden Dataset" and refer to judges instead of evaluation methods.
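Most of these changes are copy renames in Svelte markup. As a purely illustrative before/after (not the PR's actual markup):

```svelte
<!-- Before: old terminology -->
<label class="label">Eval Method Dataset</label>

<!-- After: renamed per this PR -->
<label class="label">Golden Dataset</label>
```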

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant UI
    participant CollapsePanel

    User->>UI: Open evaluation config creation page
    UI->>CollapsePanel: Render advanced options (hidden)
    User->>CollapsePanel: Click "Advanced Options"
    CollapsePanel->>UI: Show prompts and instructions fields
    User->>UI: Fill out judge algorithm and advanced options
    UI->>UI: Update all labels and tooltips to use "Judge"
```

Poem

In the warren of code, a new word appears—
"Judge" now presides, as the old name clears.
Collapsible panels, golden sets in view,
Rabbits hop forward, with labels anew.
With every rename, the UI feels right—
A carrot for clarity, and code shining bright!
🥕✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bb55ea and 57bb1de.

📒 Files selected for processing (1)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (2)

230-235: Use “judge” instead of “evaluator” in tooltip 3 for consistency

All surrounding UI copy has switched to the new terminology, but step 3’s tooltip still says “evaluator”.
This is minor, yet it jumps out in the wizard flow.

```diff
-    3: "A 'golden' dataset is a dataset of items that are rated by humans. Rating a 'golden' dataset lets us determine if the evaluator is working by checking how well it aligns to human preferences. ",
+    3: "A 'golden' dataset is a dataset of items that are rated by humans. Rating a 'golden' dataset lets us determine if the judge is working by checking how well it aligns to human preferences.",
```

570-578: Rename panel header to “Judge Properties”

The left-hand process now speaks almost exclusively about judges, but the side panel header still says “Evaluator Properties”.
Renaming avoids cognitive friction for users skimming the page.

```diff
-  <PropertyList
-    ...
-    title="Evaluator Properties"
+  <PropertyList
+    ...
+    title="Judge Properties"
```
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/eval_configs/+page.svelte (2)

313-341: Update warning copy to the new terminology

The incomplete-run warning still reads “You evals are incomplete”.
Since the table, dialogs and buttons now all speak of judges, adjust the wording for clarity.

```diff
-      warnings.push(
-        "You evals are incomplete. Click 'Run All Eval' to generate scores for the missing items.",
-      )
+      warnings.push(
+        "Some judge evaluations are incomplete. Click 'Run All Eval' to generate scores for the missing items.",
+      )
```

446-452: Header still says “Evaluator Properties”

The page title is “Compare Judges”, but the properties block retains the old header.
Consider renaming for coherence.

```diff
-  <div class="text-xl font-bold mb-4">Evaluator Properties</div>
+  <div class="text-xl font-bold mb-4">Judge Properties</div>
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1f8c9d and 2bb55ea.

📒 Files selected for processing (7)
  • app/web_ui/src/lib/ui/collapse.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (6 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/[eval_config_id]/[run_config_id]/run_result/+page.svelte (2 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/compare_run_methods/+page.svelte (8 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/create_eval_config/+page.svelte (5 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/eval_configs/+page.svelte (14 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/create_evaluator/+page.svelte (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
🔇 Additional comments (23)
app/web_ui/src/lib/ui/collapse.svelte (2)

7-9: LGTM! Good improvement to the collapse component sizing.

The conditional class assignment properly handles both small and medium collapse variants, providing better flexibility for different use cases.


14-16: LGTM! Consistent text sizing with the component's size prop.

The conditional text sizing aligns well with the collapse size variants - applying "text-sm" only when the component is in small mode.
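For reference, a minimal sketch of the kind of conditional class logic described above, assuming a DaisyUI-style collapse with a `size` prop; the prop name and the specific Tailwind classes are illustrative, not the file's actual code:

```svelte
<script lang="ts">
  // Illustrative API: the real collapse.svelte props may differ.
  export let title: string
  export let size: "small" | "medium" = "small"
</script>

<!-- DaisyUI collapse; Tailwind classes switch on the size prop -->
<div class="collapse collapse-arrow bg-base-200 {size === 'small' ? 'rounded-md' : 'rounded-lg'}">
  <input type="checkbox" />
  <!-- "text-sm" applies only in small mode, matching the review note -->
  <div class="collapse-title font-medium {size === 'small' ? 'text-sm' : ''}">
    {title}
  </div>
  <div class="collapse-content {size === 'small' ? 'text-sm' : ''}">
    <slot />
  </div>
</div>
```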

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/[eval_config_id]/[run_config_id]/run_result/+page.svelte (3)

110-112: LGTM! Consistent terminology update to "Judge".

The function return keys properly reflect the new terminology, changing from evaluation-focused terms to judge-focused terms while maintaining the same data structure.


120-120: LGTM! Clear and consistent subtitle update.

The subtitle now consistently uses "judge" terminology, improving clarity and consistency across the evaluation UI.


163-165: LGTM! Consistent section header and description updates.

Both the section header and description now use "judge" terminology, maintaining consistency with the overall UI terminology updates.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/create_evaluator/+page.svelte (3)

253-257: LGTM! Simplified and clearer dataset section header.

The updated section header "Evaluation Dataset" is more concise than "Task Evaluation Dataset", and the description is clearer about the dataset's purpose.


303-307: LGTM! Better terminology with "Golden Dataset".

"Golden Dataset" is more intuitive and widely understood terminology than "Dataset to Compare Evaluation Methods". The description also clearly explains the purpose of finding the best eval judge.


311-312: LGTM! Consistent form element labeling.

The form element now uses the consistent "Golden Dataset" terminology, and the info description clearly explains the purpose for finding the judge that best matches human preferences.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/create_eval_config/+page.svelte (6)

17-17: LGTM! Appropriate import for the collapsible UI enhancement.

The Collapse component import is needed for the new collapsible advanced options panel.


200-201: LGTM! Consistent terminology in page title and subtitle.

The page title and subtitle now consistently use "judge" terminology, aligning with the overall UI terminology updates.


221-221: LGTM! Consistent button labeling.

The submit button now uses "Create Judge" which is consistent with the page's purpose and terminology.


227-227: LGTM! Clear step title using judge terminology.

The step title "Select Judge Algorithm" is clear and consistent with the new terminology.


267-271: LGTM! Consistent step title and description.

The step title and description now consistently use "judge" terminology, making the interface more coherent.


286-351: LGTM! Great UX improvement with collapsible advanced options.

The collapsible panel effectively groups advanced options (task description and evaluation instructions) that are not needed by all users. The structure is well-organized with:

  • Clear title "Advanced Options: Prompts and Instructions"
  • Helpful warning about customizing prompts
  • Logical grouping of related advanced settings
  • Proper use of the Collapse component with medium size

This improves the user experience by simplifying the initial view while keeping advanced options accessible.
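A rough sketch of the pattern, assuming the `Collapse` component takes `title` and `size` props as above (field names, bindings, and copy are illustrative, not the page's actual code):

```svelte
<script lang="ts">
  import Collapse from "$lib/ui/collapse.svelte"

  // Illustrative form state; the real page binds these into the config form.
  let task_description = ""
  let eval_instructions = ""
</script>

<Collapse title="Advanced Options: Prompts and Instructions" size="medium">
  <!-- Warning shown ahead of the advanced fields -->
  <p class="text-sm">Customizing these prompts is optional; the defaults work for most judges.</p>

  <label class="label" for="task_description">Task Description</label>
  <textarea id="task_description" class="textarea textarea-bordered w-full" bind:value={task_description}></textarea>

  <label class="label" for="eval_instructions">Evaluation Instructions</label>
  <textarea id="eval_instructions" class="textarea textarea-bordered w-full" bind:value={eval_instructions}></textarea>
</Collapse>
```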

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/compare_run_methods/+page.svelte (9)

190-190: LGTM! Consistent error message terminology.

The error message now uses "judge" terminology, maintaining consistency across the application.


320-324: LGTM! Consistent property naming.

The property names now use "Judge Algorithm" and "Judge Model" which aligns with the terminology updates throughout the evaluation UI.


346-348: LGTM! Consistent dropdown option grouping.

The dropdown option groups now use "Select Judge" and "Manage Judges" which are clear and consistent with the new terminology.


413-413: LGTM! Consistent action button labeling.

The action button now uses "Compare Judges" which is consistent with the page's functionality and terminology.


436-438: LGTM! Consistent section header and description.

The section header and description now use "judge" terminology, maintaining consistency throughout the interface.


444-444: LGTM! Consistent form element labeling.

The form element label now uses "Judge" which is consistent with the terminology updates.


453-459: LGTM! Consistent warning message terminology.

Both warning messages now use "judge" terminology, maintaining consistency in user-facing messages.


474-474: LGTM! Consistent property labeling.

The property label now uses "Judge Quality" which is consistent with the terminology updates.


495-495: LGTM! Consistent tooltip terminology.

The tooltip text now uses "judge" terminology, maintaining consistency in user-facing help text.

scosman merged commit 6ca4abe into main on Jul 14, 2025
14 checks passed
scosman deleted the sdg_ui_improvements branch on July 14, 2025 at 18:49