Data gen UI improvements #427
Walkthrough
Terminology across the evaluation UI has been updated, replacing "evaluation method" and "evaluator" with "judge" or "judge algorithm." Labels, tooltips, section headers, and messages reflect this change. Additionally, advanced evaluation options in the config creation page are now grouped in a collapsible panel. No logic, data flow, or exported entity signatures were changed. The Collapse component’s CSS class logic was also refined.
Sequence Diagram(s)
sequenceDiagram
participant User
participant UI
participant CollapsePanel
User->>UI: Open evaluation config creation page
UI->>CollapsePanel: Render advanced options (hidden)
User->>CollapsePanel: Click "Advanced Options"
CollapsePanel->>UI: Show prompts and instructions fields
User->>UI: Fill out judge algorithm and advanced options
UI->>UI: Update all labels and tooltips to use "Judge"
Actionable comments posted: 0
🧹 Nitpick comments (4)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (2)
230-235: Use “judge” instead of “evaluator” in tooltip 3 for consistency
All surrounding UI copy has switched to the new terminology, but step 3’s tooltip still says “evaluator”. This is minor, yet it jumps out in the wizard flow.
- 3: "A 'golden' dataset is a dataset of items that are rated by humans. Rating a 'golden' dataset lets us determine if the evaluator is working by checking how well it aligns to human preferences. ",
+ 3: "A 'golden' dataset is a dataset of items that are rated by humans. Rating a 'golden' dataset lets us determine if the judge is working by checking how well it aligns to human preferences.",
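To make the tooltip's idea concrete, here is a minimal sketch of one way alignment between a judge and human ratings on a golden dataset could be measured. The `RatedItem` type, the `meanAbsoluteError` function, and the choice of mean absolute error as the metric are assumptions for illustration only; the project may score alignment differently.

```typescript
// Hypothetical shape: one golden-dataset item carrying a human rating
// and the score the judge produced for the same item.
interface RatedItem {
  humanRating: number
  judgeScore: number
}

// Mean absolute error between judge scores and human ratings.
// Lower values mean the judge tracks human preferences more closely.
function meanAbsoluteError(items: RatedItem[]): number {
  if (items.length === 0) {
    throw new Error("golden dataset is empty")
  }
  const total = items.reduce(
    (sum, item) => sum + Math.abs(item.humanRating - item.judgeScore),
    0,
  )
  return total / items.length
}

// Illustrative data only, not taken from the project.
const golden: RatedItem[] = [
  { humanRating: 5, judgeScore: 4 },
  { humanRating: 3, judgeScore: 3 },
  { humanRating: 1, judgeScore: 2 },
]
console.log(meanAbsoluteError(golden))
```

Comparing this number across several judges on the same golden dataset would be one way to pick the judge that best matches human preferences.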
570-578: Rename panel header to “Judge Properties”
The left-hand process now speaks almost exclusively about judges, but the side panel header still says “Evaluator Properties”. Renaming avoids cognitive friction for users skimming the page.
- <PropertyList
-   ...
-   title="Evaluator Properties"
+ <PropertyList
+   ...
+   title="Judge Properties"
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/eval_configs/+page.svelte (2)
313-341: Update warning copy to the new terminology
The incomplete-run warning still reads “You evals are incomplete”. Since the table, dialogs and buttons now all speak of judges, adjust the wording for clarity.
- warnings.push(
-   "You evals are incomplete. Click 'Run All Eval' to generate scores for the missing items.",
- )
+ warnings.push(
+   "Some judge evaluations are incomplete. Click 'Run All Eval' to generate scores for the missing items.",
+ )
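For context, the pattern behind this warning can be sketched as a small helper that collects messages into an array. This is a hedged illustration, not the page's actual code: the function name `incompleteRunWarnings` and its two count parameters are hypothetical, and only the warning string comes from the suggestion above.

```typescript
// Hypothetical helper: the parameters are assumptions about what the page
// might track, not the real data model of eval_configs/+page.svelte.
function incompleteRunWarnings(
  expectedScores: number,
  actualScores: number,
): string[] {
  const warnings: string[] = []
  // Warn only when some items are still missing a score.
  if (actualScores < expectedScores) {
    warnings.push(
      "Some judge evaluations are incomplete. Click 'Run All Eval' to generate scores for the missing items.",
    )
  }
  return warnings
}

console.log(incompleteRunWarnings(10, 7))
```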
446-452: Header still says “Evaluator Properties”
The page title is “Compare Judges”, but the properties block retains the old header. Consider renaming for coherence.
- <div class="text-xl font-bold mb-4">Evaluator Properties</div>
+ <div class="text-xl font-bold mb-4">Judge Properties</div>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
app/web_ui/src/lib/ui/collapse.svelte (1 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (6 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/[eval_config_id]/[run_config_id]/run_result/+page.svelte (2 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/compare_run_methods/+page.svelte (8 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/create_eval_config/+page.svelte (5 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/eval_configs/+page.svelte (14 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/create_evaluator/+page.svelte (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
🔇 Additional comments (23)
app/web_ui/src/lib/ui/collapse.svelte (2)
7-9: LGTM! Good improvement to the collapse component sizing.
The conditional class assignment properly handles both small and medium collapse variants, providing better flexibility for different use cases.
14-16: LGTM! Consistent text sizing with the component's size prop.
The conditional text sizing aligns well with the collapse size variants, applying "text-sm" only when the component is in small mode.
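The size-dependent class logic described in these two comments can be sketched roughly as follows. This is an illustration under assumptions, not the contents of collapse.svelte: the `CollapseSize` type, both function names, and the `panel-small` / `panel-medium` class names are placeholders. Only `text-sm` is taken from the review text.

```typescript
// Assumed size variants, based on the "small" and "medium" modes the review mentions.
type CollapseSize = "small" | "medium"

// Placeholder container classes; the real component's class names are not shown in this review.
function containerClass(size: CollapseSize): string {
  return size === "small" ? "panel-small" : "panel-medium"
}

// Apply "text-sm" to the title only in small mode, as the review describes.
function titleClass(size: CollapseSize): string {
  return size === "small" ? "text-sm" : ""
}

console.log(containerClass("medium"), titleClass("small"))
```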
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/[eval_config_id]/[run_config_id]/run_result/+page.svelte (3)
110-112: LGTM! Consistent terminology update to "Judge".
The function return keys properly reflect the new terminology, changing from evaluation-focused terms to judge-focused terms while maintaining the same data structure.
120-120: LGTM! Clear and consistent subtitle update.
The subtitle now consistently uses "judge" terminology, improving clarity and consistency across the evaluation UI.
163-165: LGTM! Consistent section header and description updates.
Both the section header and description now use "judge" terminology, maintaining consistency with the overall UI terminology updates.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/create_evaluator/+page.svelte (3)
253-257: LGTM! Simplified and clearer dataset section header.
The updated section header "Evaluation Dataset" is more concise than "Task Evaluation Dataset", and the description is clearer about the dataset's purpose.
303-307: LGTM! Better terminology with "Golden Dataset".
"Golden Dataset" is more intuitive and widely understood terminology than "Dataset to Compare Evaluation Methods". The description also clearly explains the purpose of finding the best eval judge.
311-312: LGTM! Consistent form element labeling.
The form element now uses the consistent "Golden Dataset" terminology, and the info description clearly explains the purpose for finding the judge that best matches human preferences.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/create_eval_config/+page.svelte (6)
17-17: LGTM! Appropriate import for the collapsible UI enhancement.
The Collapse component import is needed for the new collapsible advanced options panel.
200-201: LGTM! Consistent terminology in page title and subtitle.
The page title and subtitle now consistently use "judge" terminology, aligning with the overall UI terminology updates.
221-221: LGTM! Consistent button labeling.
The submit button now uses "Create Judge" which is consistent with the page's purpose and terminology.
227-227: LGTM! Clear step title using judge terminology.
The step title "Select Judge Algorithm" is clear and consistent with the new terminology.
267-271: LGTM! Consistent step title and description.
The step title and description now consistently use "judge" terminology, making the interface more coherent.
286-351: LGTM! Great UX improvement with collapsible advanced options.
The collapsible panel effectively groups advanced options (task description and evaluation instructions) that are not needed by all users. The structure is well-organized with:
- Clear title "Advanced Options: Prompts and Instructions"
- Helpful warning about customizing prompts
- Logical grouping of related advanced settings
- Proper use of the Collapse component with medium size
This improves the user experience by simplifying the initial view while keeping advanced options accessible.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/compare_run_methods/+page.svelte (9)
190-190: LGTM! Consistent error message terminology.
The error message now uses "judge" terminology, maintaining consistency across the application.
320-324: LGTM! Consistent property naming.
The property names now use "Judge Algorithm" and "Judge Model" which aligns with the terminology updates throughout the evaluation UI.
346-348: LGTM! Consistent dropdown option grouping.
The dropdown option groups now use "Select Judge" and "Manage Judges" which are clear and consistent with the new terminology.
413-413: LGTM! Consistent action button labeling.
The action button now uses "Compare Judges" which is consistent with the page's functionality and terminology.
436-438: LGTM! Consistent section header and description.
The section header and description now use "judge" terminology, maintaining consistency throughout the interface.
444-444: LGTM! Consistent form element labeling.
The form element label now uses "Judge" which is consistent with the terminology updates.
453-459: LGTM! Consistent warning message terminology.
Both warning messages now use "judge" terminology, maintaining consistency in user-facing messages.
474-474: LGTM! Consistent property labeling.
The property label now uses "Judge Quality" which is consistent with the terminology updates.
495-495: LGTM! Consistent tooltip terminology.
The tooltip text now uses "judge" terminology, maintaining consistency in user-facing help text.