Update our topic tree evals and eval gen #409
Conversation
- Topic tree prompt is more aware of the goal (eval vs. training)
- Topic tree prompt guidance moved into the system prompt
- UI layer: detect that we're targeting a kiln_requirements eval, and generate proper guidance

Note: WIP. This still needs to be repeated for input gen and output gen.
Walkthrough
The changes introduce a new required field, `gen_type`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant WebUI
    participant API
    participant DataGenTask
    User->>WebUI: Navigate to data generation page (with reason param)
    WebUI->>WebUI: Set gen_type = "eval" or "training" based on URL
    WebUI->>API: POST /generate_categories (includes gen_type, guidance)
    API->>DataGenTask: Instantiate DataGenCategoriesTask(gen_type, guidance)
    DataGenTask->>DataGenTask: Generate prompt with gen_type and guidance
    DataGenTask-->>API: Return generated categories
    API-->>WebUI: Respond with categories
    WebUI-->>User: Display generated categories
```
Actionable comments posted: 1
🧹 Nitpick comments (2)
app/desktop/studio_server/test_data_gen_api.py (1)
94-121: Consider adding test coverage for both gen_type values.
The current test only covers `gen_type="eval"`, but the PR objectives mention distinguishing between "eval" and "training" contexts. Consider adding a separate test case for `gen_type="training"` to ensure comprehensive coverage of both generation types.

```python
def test_generate_categories_success_training(
    mock_task_from_id,
    mock_langchain_adapter,
    client,
    mock_task_run,
):
    # Arrange
    input_data = DataGenCategoriesApiInput(
        node_path=["parent", "child"],
        num_subtopics=4,
        human_guidance="Generate tech categories",
        gen_type="training",
        model_name="gpt-4",
        provider="openai",
    )
    # Act & Assert (same as existing test)
    # ...
```

libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)
84-84: Remove debug print statement before production.
The debug print statement should be removed, as it could expose sensitive information in production logs and adds unnecessary output.

```diff
- print("GENERATE TOPIC TREE PROMPT", instruction)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)

- app/desktop/studio_server/data_gen_api.py (3 hunks)
- app/desktop/studio_server/test_data_gen_api.py (1 hunks)
- app/web_ui/src/lib/api_schema.d.ts (1 hunks)
- app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/add_data/+page.svelte (1 hunks)
- app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (1 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (3 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (3 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (7 hunks)
- app/web_ui/src/routes/(fullscreen)/setup/(setup)/subscribe/+page.svelte (0 hunks)
- libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2 hunks)
- libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3 hunks)
- libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (5 hunks)
💤 Files with no reviewable changes (1)
- app/web_ui/src/routes/(fullscreen)/setup/(setup)/subscribe/+page.svelte
🧰 Additional context used
🧬 Code Graph Analysis (2)
- app/desktop/studio_server/data_gen_api.py (1)
  - libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1): DataGenCategoriesTask (68-97)
- libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)
  - libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (1): generate_topic_tree_prompt (9-85)
🔇 Additional comments (27)
app/desktop/studio_server/test_data_gen_api.py (1)

105-105: LGTM! Test updated to match new API contract.
The addition of `gen_type="eval"` correctly aligns with the new required field in the API.

app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/add_data/+page.svelte (1)

83-84: LGTM! Parameter standardization improves consistency.
The change from `issue_eval_id` to `eval_id` aligns with the broader effort to standardize eval parameter handling across the application.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (1)

340-340: LGTM – composite `eval_id` parsing verified.

- In `synth_data_guidance.svelte`, `full_eval_id.split("::")` correctly yields `[project_id, task_id, eval_id]`.
- Query-param colons are percent-encoded via `URLSearchParams` and decoded by `searchParams.get()`, so the separator survives the round trip.
- Other pages (e.g. `add_data`) forward the composite ID without parsing, relying on route segments for context.

No further changes needed here.
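The percent-encoding round trip described above can be sketched in Python with the standard library (the composite ID value here is hypothetical, used only to illustrate that the `::` separator survives encoding and decoding):

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical composite ID using "::" as the separator.
full_eval_id = "project_1::task_2::eval_3"

# Building a query string percent-encodes the colons...
query = urlencode({"eval_id": full_eval_id})

# ...and parsing it back decodes them, so the separator survives.
decoded = parse_qs(query)["eval_id"][0]
parts = decoded.split("::")
print(parts)
```

The same property holds for `URLSearchParams` in the browser, which is why splitting on `::` after `searchParams.get()` is safe.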
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (3)

18-18: LGTM: Proper type definition for the new export property.
The `gen_type` property is correctly typed and follows TypeScript best practices.

145-145: LGTM: Correctly includes gen_type in the API payload.
The `gen_type` parameter is properly passed to the backend API call, aligning with the API schema changes.

323-323: LGTM: Proper prop propagation in the recursive component.
The `gen_type` property is correctly passed down to child component instances, ensuring a consistent generation context throughout the component tree.

app/desktop/studio_server/data_gen_api.py (3)

1-1: LGTM: Proper import for type annotation.
The `Literal` import is correctly added to support the new `gen_type` field typing.

28-30: LGTM: Well-defined required field with proper typing.
The `gen_type` field is correctly defined as required, with proper `Literal` typing and descriptive documentation.

93-95: LGTM: Correct task instantiation with new parameters.
The `DataGenCategoriesTask` is properly instantiated with the required `gen_type` and optional `specific_guidance` parameters, aligning with the task constructor changes.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (3)
39-41: LGTM: Proper reactive derivation from URL parameters.
The `gen_type` is correctly derived from the URL search parameter "reason", with a sensible default of "training" and proper reactive updates when the URL changes.

395-395: LGTM: Correct binding to child component.
The `gen_type` is properly bound to the `GeneratedDataNode` component, ensuring the generation context is passed down correctly.

587-587: LGTM: Consistent binding pattern.
The `task` binding to the `SynthDataGuidance` component follows the same pattern as other bindings in the component.

libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (4)
6-6: LGTM: Proper import for new functionality.
The import of `generate_topic_tree_prompt` is correctly added to test the new prompt generation functionality.

63-63: LGTM: Updated task constructors reflect API changes.
The task constructors are correctly updated to include the required `gen_type` parameter and optional `specific_guidance`, aligning with the new API.
Also applies to: 78-78, 111-111

72-74: LGTM: Appropriate assertions for dynamic instruction content.
The assertions correctly validate that the generated instructions contain the expected content based on the `gen_type` and `specific_guidance` parameters.
Also applies to: 80-82

339-470: LGTM: Comprehensive test coverage for prompt generation.
The test suite provides excellent coverage of the `generate_topic_tree_prompt` function, including:

- Different `gen_type` values ("training", "eval")
- With and without specific guidance
- Edge cases like empty guidance
- Structural validation of prompt content
- Consistency checks between variants

This thorough testing ensures the prompt generation behaves correctly across all scenarios.
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3)

2-2: LGTM: Proper imports for new functionality.
The imports for `Literal` typing and the `generate_topic_tree_prompt` function are correctly added.
Also applies to: 11-11

75-77: LGTM: Well-designed constructor signature.
The constructor properly accepts `gen_type` as a required parameter and `specific_guidance` as optional, with appropriate type annotations.

81-83: LGTM: Correct implementation of dynamic prompt generation.
The instruction is properly generated using the `generate_topic_tree_prompt` function with the provided parameters, replacing the previous static approach.
Also applies to: 90-90

libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (3)

6-11: LGTM! Well-typed function signature.
Good use of the `Literal` type to constrain the `gen_type` parameter to valid values.

16-21: Consider adding defensive validation for gen_type.
While the `Literal` type annotation should catch invalid values during type checking, consider adding runtime validation or an else clause to handle unexpected values defensively.

58-71: LGTM! Clear conditional guidance handling.
The logic properly handles both specific and default guidance cases with clear formatting.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (5)

33-41: LGTM! Improved naming conventions.
The parameter and function names are now more generic and descriptive, better supporting multiple eval types.

62-66: LGTM! Clear template type mapping.
The conditional logic properly maps the eval template types to their corresponding UI templates.

223-244: LGTM! Well-structured dynamic option building.
The function properly builds eval-specific options based on the template type, and only adds the group when options exist.

295-320: LGTM! Well-structured requirements template generation.
The function creates clear, structured guidance with proper handling of requirements and systematic XML-like tagging.

322-334: LGTM! Proper handling of async loading.
The reactive function correctly handles the async loading order between task and evaluator, ensuring the requirements template is updated when both are available.
app/web_ui/src/lib/api_schema.d.ts (outdated)

```ts
/**
 * Gen Type
 * @description The type of task to generate topics for
 * @enum {string}
 */
gen_type: "eval" | "training";
```
💡 Verification agent
🧩 Analysis chain
Well-structured addition to support context-aware topic generation.

The new `gen_type` field properly distinguishes between evaluation and training contexts, with clear documentation and appropriate type constraints. The string-literal union type ensures type safety and prevents invalid values.

Note that this is a breaking change, since it adds a required field to the existing interface. Ensure all frontend components using `DataGenCategoriesApiInput` have been updated to provide this field.
🏁 Script executed:

```shell
#!/bin/bash
# Verify that all usages of DataGenCategoriesApiInput have been updated to include gen_type
ast-grep --pattern 'DataGenCategoriesApiInput'
```

Length of output: 428
Update backend model and tests for the new required `gen_type` field

Since `gen_type` was added as a required field in the TypeScript API schema, the corresponding Python Pydantic model and its tests must be updated to avoid runtime errors:

- app/desktop/studio_server/data_gen_api.py
  - In the `DataGenCategoriesApiInput(BaseModel)` class definition, add a `gen_type: Literal["eval", "training"]` field with a matching docstring.
- app/desktop/studio_server/test_data_gen_api.py
  - Update the `input_data = DataGenCategoriesApiInput(...)` call (around line 101) to include a valid `gen_type` value.
  - Add test cases covering both `"eval"` and `"training"` scenarios, if not already present.
🤖 Prompt for AI Agents
In app/desktop/studio_server/data_gen_api.py, update the
DataGenCategoriesApiInput class to include a new required field gen_type with
type Literal["eval", "training"] and add a matching docstring. Then, in
app/desktop/studio_server/test_data_gen_api.py around line 101, modify the
input_data initialization to include a valid gen_type value and add test cases
to cover both "eval" and "training" values to ensure proper validation and avoid
runtime errors.
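A minimal sketch of the updated Pydantic model described above. The field set mirrors the test fixture shown earlier in this review; exact types, defaults, and docstrings in the real model may differ:

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class DataGenCategoriesApiInput(BaseModel):
    # Field set mirrors the test fixture in this review; details are assumptions.
    node_path: list[str]
    num_subtopics: int
    human_guidance: Optional[str] = None
    gen_type: Literal["eval", "training"]
    """The type of task to generate topics for."""
    model_name: str
    provider: str


# A valid payload passes validation...
ok = DataGenCategoriesApiInput(
    node_path=["parent", "child"],
    num_subtopics=4,
    gen_type="training",
    model_name="gpt-4",
    provider="openai",
)

# ...while an invalid gen_type is rejected at runtime by the Literal constraint.
try:
    DataGenCategoriesApiInput(
        node_path=[],
        num_subtopics=1,
        gen_type="bogus",  # type: ignore[arg-type]
        model_name="m",
        provider="p",
    )
    rejected = False
except ValidationError:
    rejected = True
```

Because the field has no default, omitting `gen_type` also raises `ValidationError`, which is why the existing test fixture must be updated.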
…mmit for topic gen.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (1)

32-32: Fix prop naming inconsistency between component interface and API.

The component still accepts `human_guidance` as a prop (line 32) but sends it as `guidance` to the API (line 111). This inconsistency suggests incomplete refactoring.

Either rename the prop to match the API:

```diff
-export let human_guidance: string | null = null
+export let guidance: string | null = null
```

And update line 111:

```diff
-guidance: human_guidance ? human_guidance : null, // clear empty string
+guidance: guidance ? guidance : null, // clear empty string
```

Or keep the prop name and update the API field. The first option seems more aligned with the PR's goal of replacing `human_guidance` with `guidance`.

Also applies to: 111-111
🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (1)
9-17: Add defensive else clause for robustness.

While the `Literal` type hint should prevent invalid values, consider adding an else clause for defensive programming:

```diff
 def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
     """
     Generate a goal description for the given generation type.
     """
     if gen_type == "training":
         return "I want to train a large language model and you should help me generate training data for it."
     elif gen_type == "eval":
         return "I want to evaluate a large language model and you should help me generate eval data for it."
+    else:
+        raise ValueError(f"Invalid gen_type: {gen_type}. Must be 'training' or 'eval'.")
```
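As a runnable sketch, the suggested defensive branch behaves like this (the function body is taken from the diff above; only the demonstration calls are added):

```python
from typing import Literal


def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
    """Generate a goal description for the given generation type."""
    if gen_type == "training":
        return "I want to train a large language model and you should help me generate training data for it."
    elif gen_type == "eval":
        return "I want to evaluate a large language model and you should help me generate eval data for it."
    else:
        # Defensive branch suggested above: fail loudly on unexpected values.
        raise ValueError(f"Invalid gen_type: {gen_type}. Must be 'training' or 'eval'.")


# Valid values return the corresponding goal description...
print(generate_goal_description("training"))

# ...while anything else raises, instead of silently returning None.
try:
    generate_goal_description("bogus")  # type: ignore[arg-type]
except ValueError as e:
    print(e)
```

Without the `else`, an unexpected value would fall through and return `None`, which would only surface later when the prompt is assembled.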
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
app/desktop/studio_server/data_gen_api.py
(7 hunks)app/desktop/studio_server/test_data_gen_api.py
(2 hunks)app/web_ui/src/lib/api_schema.d.ts
(3 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
(2 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
(4 hunks)libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py
(3 hunks)libs/core/kiln_ai/adapters/data_gen/data_gen_task.py
(3 hunks)libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- app/desktop/studio_server/test_data_gen_api.py
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
- app/desktop/studio_server/data_gen_api.py
- app/web_ui/src/lib/api_schema.d.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build, Typecheck, and Test Python (3.13)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build, Typecheck, and Test Python (3.12)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (macos-13)
🔇 Additional comments (2)
libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (1)
342-610: Excellent test coverage for the new prompt generation functions!
The comprehensive tests cover all scenarios, including:
- Both generation types ("training" and "eval")
- With and without guidance
- Empty and None guidance cases
- Validation of prompt structure and content
- Consistency checks between prompt types
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)
195-210: Ignore removal suggestion – `wrap_task_with_guidance` is in use.

The `wrap_task_with_guidance` function is imported and invoked in the DataGen API, so it should not be removed:

- Location of usage: app/desktop/studio_server/data_gen_api.py
  - Imported alongside `DataGenSampleTask`
  - Called when `sample.guidance` is provided:

```python
if sample.guidance and sample.guidance.strip():
    task.instruction = wrap_task_with_guidance(
        task.instruction, sample.guidance
    )
```

The function is actively used to wrap task instructions with human guidance. Please disregard the suggestion to remove it.
Likely an incorrect or invalid review comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (1)
365-377: Consider adding input validation for custom topics.

The custom topic input accepts comma-separated values but doesn't validate the format. Consider adding validation to handle edge cases like trailing commas or empty values between commas.

You could enhance the `add_custom_topics` function:

```diff
 function add_custom_topics() {
   if (!custom_topics_string) {
     return
   }
-  const topics = custom_topics_string.split(",").map((t) => t.trim())
+  const topics = custom_topics_string
+    .split(",")
+    .map((t) => t.trim())
+    .filter((t) => t.length > 0)
   add_subtopics(topics)
```

Also applies to: 117-125
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
app/web_ui/src/lib/utils/form_element.svelte
(1 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
(9 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
(4 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
(7 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte
(2 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
🔇 Additional comments (7)
app/web_ui/src/lib/utils/form_element.svelte (1)

11-11: LGTM! Clean addition of header-only input type.
The addition of `"header_only"` to the input type union is well-implemented. When this type is used, the component will render only the label and description sections (lines 90-128) without any input element, which is the expected behavior for a header-only display.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (3)

18-19: Good architectural improvement with centralized data model.
The refactoring from individual props to a single `guidance_data` object improves encapsulation and reduces prop drilling throughout the component hierarchy.

16-17: Clean implementation of dual-mode topic entry.
The custom topic mode toggle provides flexibility while defaulting to the normal generation mode when reopening the modal.
Also applies to: 67-68

141-142: Proper handling of guidance data with null safety.
Good practice converting empty strings to null to maintain clean API payloads.
Also applies to: 151-152

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (2)

1-26: Clean refactoring to remove local state management.
The component now properly delegates all state management to the centralized data model, which improves consistency across the application.

89-99: Good conditional warning display implementation.
The warning system properly uses the data model's `custom_warning` method to display context-appropriate messages.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)

363-493: Excellent template definitions with comprehensive guidance.
The static templates are well-structured, with clear instructions for each evaluation type. The `suggest_uncensored` flag is appropriately set for sensitive content generation.
…cs, inputs and outputs are no longer the same. This lets us be better at giving examples. Improved our base templates (topic gen, sample gen) as well as specific use cases (toxicity, bias, etc.). Still WIP; need to do issue/requirements templates next.
Actionable comments posted: 0
♻️ Duplicate comments (3)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (3)
34-49: Ensure destroy() is called in parent components to prevent memory leaks.

The destroy method properly cleans up subscriptions, but it must be called when the parent component unmounts.

```shell
#!/bin/bash
# Description: Verify that SynthDataGuidanceDataModel.destroy() is called in parent components

# Search for instantiation of SynthDataGuidanceDataModel
echo "=== Searching for SynthDataGuidanceDataModel instantiation ==="
rg -A 20 "new SynthDataGuidanceDataModel"

# Search for destroy() calls on guidance_data
echo -e "\n=== Searching for destroy() calls ==="
rg "guidance_data.*destroy|guidanceModel.*destroy" --type svelte --type ts

# Search for onDestroy usage in files that use SynthDataGuidanceDataModel
echo -e "\n=== Searching for onDestroy in parent components ==="
fd -e svelte -x grep -l "SynthDataGuidanceDataModel" {} \; | xargs grep -n "onDestroy"
```

76-115: Add user-facing error handling for eval loading failures.

Currently, errors are only logged to the console. Consider adding an error store to inform users when eval loading fails:

```diff
 export class SynthDataGuidanceDataModel {
   // ... existing properties ...
+  public loadError: Writable<string | null> = writable(null)

   // In load_eval method:
   } catch (error) {
-    // TODO console.error(error)
+    this.loadError.set("Failed to load evaluation data. Please try again.")
   } finally {
     this.loading.set(false)
   }
 }
```

Then display the error in the UI component when `$loadError` is not null.

149-188: Address TODO comments for unique guidance per type.

The method currently sets identical guidance for all three types, with multiple TODO comments. Consider implementing type-specific guidance:

```diff
 private apply_selected_template(template: string) {
   if (template == "custom") {
-    // TODO make each unique
     this.topic_guidance.set(null)
     this.input_guidance.set(null)
     this.output_guidance.set(null)
   }
   if (template == "issue_eval_template" && this.evaluator) {
-    // TODO make each unique
-    const issue_eval_template = this.issue_eval_template(this.evaluator)
-    this.topic_guidance.set(issue_eval_template)
-    this.input_guidance.set(issue_eval_template)
-    this.output_guidance.set(issue_eval_template)
+    this.topic_guidance.set(this.issue_eval_template_topics(this.evaluator))
+    this.input_guidance.set(this.issue_eval_template_inputs(this.evaluator))
+    this.output_guidance.set(this.issue_eval_template_outputs(this.evaluator))
   }
```
🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (1)
9-17: Add explicit error handling for invalid gen_type.

While the `Literal` type hint helps during development, consider adding runtime validation:

```diff
 def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
     """
     Generate a goal description for the given generation type.
     """
     if gen_type == "training":
         return "I want to train a large language model and you should help me generate training data for it."
     elif gen_type == "eval":
         return "I want to evaluate a large language model and you should help me generate eval data for it."
+    else:
+        raise ValueError(f"Invalid gen_type: {gen_type}. Must be 'training' or 'eval'.")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
(5 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte
(2 hunks)app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts
(1 hunks)libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py
(2 hunks)libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
- app/web_ui/src/lib/ui/fancy_select_types.ts (2): OptionGroup (1-4), Option (5-12)
🔇 Additional comments (10)
libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (3)

6-9: LGTM!
The imports for the new prompt generation functions are correctly added.

64-77: Test updates correctly reflect the new API structure.
The tests have been properly updated to use the new `gen_type` and `guidance` parameters, and the assertions verify that the generated instructions contain the expected text based on these parameters.
Also applies to: 79-85, 114-114, 165-176

342-610: Excellent test coverage for the new prompt generation functions!
The new tests comprehensively cover all combinations of `gen_type` and `guidance` parameters, including edge cases like empty strings and None values. The tests properly verify prompt content, structure, and consistency.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (3)

5-7: Appropriate imports for the refactored component.
The imports correctly support the new architecture with an external data model and lifecycle management.

15-23: Good lifecycle management with proper cleanup.
The `onDestroy` handler correctly prevents the bind directives from setting undefined values when the component unmounts, by disconnecting the local store references.

35-51: Clean refactoring to a pure UI component.
The component now properly delegates all state management and logic to the external `SynthDataGuidanceDataModel`, making it a clean UI layer that binds to the model's reactive stores.
Also applies to: 53-109

libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2)

19-94: Well-structured prompt generation with proper guidance handling.
The function correctly generates different prompt sections based on whether guidance is provided, maintaining clear structure and helpful examples throughout.

96-159: Excellent implementation with comprehensive guidance examples.
The function provides clear instructions, and particularly good examples in the custom guidance section showing how to properly incorporate guidance into generated samples.
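As a rough illustration of the flow these comments describe — a goal description chosen by `gen_type`, plus a guidance section that differs depending on whether specific guidance was provided — here is a hypothetical sketch. The real `generate_topic_tree_prompt` in `data_gen_prompts.py` is more elaborate; the section wording and the `<guidance>` tagging here are assumptions:

```python
from typing import Literal, Optional


def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
    # Goal wording taken from the diffs earlier in this review.
    if gen_type == "training":
        return "I want to train a large language model and you should help me generate training data for it."
    return "I want to evaluate a large language model and you should help me generate eval data for it."


def generate_topic_tree_prompt(
    gen_type: Literal["training", "eval"],
    guidance: Optional[str] = None,
) -> str:
    # Start from the goal description, then append a guidance section.
    sections = [generate_goal_description(gen_type)]
    if guidance and guidance.strip():
        # Specific guidance is quoted back to the model verbatim (tag name is illustrative).
        sections.append(f"<guidance>\n{guidance.strip()}\n</guidance>")
    else:
        # Default guidance when none was provided (wording is illustrative).
        sections.append("Generate a diverse, broad set of topics.")
    return "\n\n".join(sections)


print(generate_topic_tree_prompt("eval", "Focus on toxic inputs"))
```

This mirrors the branching the tests above exercise: both `gen_type` values, with guidance, without it, and with empty/whitespace guidance falling back to the default section.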
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)

1-33: Well-structured data model with proper TypeScript typing.
The class provides a clean abstraction for managing synthetic data guidance, with appropriate use of Svelte stores for reactivity.

360-658: Comprehensive and well-documented static templates.
The static templates provide excellent coverage of various evaluation types, with appropriate warnings about potentially harmful content. Each template properly differentiates guidance for topics, inputs, and outputs.
Actionable comments posted: 0
♻️ Duplicate comments (2)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)
149-192: TODO comments still present – guidance types remain identical for static templates.

The `apply_selected_template` method still has the TODO comment about making each guidance type unique, and static templates continue to set identical values for all three guidance types. While the issue and requirements eval templates now generate unique content per type, the static templates still duplicate the same content.

Consider implementing template-specific guidance for static templates:

```diff
 const static_template = static_templates.find((t) => t.id == template)
 if (static_template) {
-  // Set each guidance type with its specific template
-  this.topic_guidance.set(static_template.topic_template)
-  this.input_guidance.set(static_template.input_template)
-  this.output_guidance.set(static_template.output_template)
+  // Set each guidance type with its specific template
+  this.topic_guidance.set(static_template.topic_template)
+  this.input_guidance.set(static_template.input_template)
+  this.output_guidance.set(static_template.output_template)
 }
```

The static templates already have separate `topic_template`, `input_template`, and `output_template` fields, so this should provide the desired uniqueness.

109-114: Add user-facing error handling for eval loading failures.

The error handling still only logs to the console, without providing user feedback when evaluation data fails to load.

Consider adding an error state to inform users:

```diff
+public error_message: Writable<string | null> = writable(null)

 } catch (error) {
   console.error(error)
+  this.error_message.set("Failed to load evaluation data. Please try again.")
 } finally {
   this.loading.set(false)
 }
```

This would allow the UI to display meaningful error messages to users.
🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
194-200: Extend uncensored suggestion logic to eval templates.

The TODO comment indicates that uncensored suggestions should also apply to issue and requirements eval templates, but the implementation only checks static templates.

Consider extending the logic to handle eval templates:

```diff
 public suggest_uncensored(selected_template: string): boolean {
-  // TODO: also for issue and requirements??
+  // Handle eval templates
+  if (
+    selected_template === "issue_eval_template" ||
+    selected_template === "requirements_eval_template"
+  ) {
+    return true // These templates typically involve sensitive content
+  }
+
   const static_template = static_templates.find(
     (t) => t.id == selected_template,
   )
   return static_template?.suggest_uncensored ?? false
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (3)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (3)
528-812: Well-structured static templates with appropriate content warnings.

The static templates are comprehensive and well-documented with clear content warnings. The harmful content is appropriately contextualized for evaluation purposes.
The editor's note at line 526 appropriately explains the presence of harmful content and its purpose for building robust evaluation systems.
239-247: Good practice: Dynamic requirements handling with overall score addition.

The requirements template method correctly maps task requirements and dynamically adds the "Overall Score" requirement, which demonstrates good understanding of the eval structure.
The implementation properly handles the case where task requirements may be undefined and adds the necessary overall score requirement.
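The mapping described above can be sketched in plain TypeScript. This is a minimal illustration, not the app's actual code; the `Requirement` interface is a hypothetical simplified shape:

```typescript
// Hypothetical simplified shape of a task requirement.
interface Requirement {
  name: string
  instruction: string
}

// Map task requirements (which may be undefined) into the eval's
// requirement list, then append the synthetic "Overall Score" entry.
function buildEvalRequirements(
  taskRequirements?: Requirement[],
): Requirement[] {
  const requirements = (taskRequirements ?? []).map((r) => ({
    name: r.name,
    instruction: r.instruction,
  }))
  requirements.push({
    name: "Overall Score",
    instruction: "An overall score from 1 to 5 stars.",
  })
  return requirements
}
```

The `?? []` fallback is what keeps the undefined-requirements case safe, and the push guarantees the overall score is always present.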
34-49: Proper subscription management with cleanup.

The constructor sets up the template subscription correctly, and the destroy method properly cleans up to prevent memory leaks.
From the past review comments, it appears the parent components should ensure this `destroy()` method is called during component teardown to prevent memory leaks.
…ays missing) which improves instruction following in cases where input can't capture it. Also actually pass the guidance (was null + TODO). Rename samples to inputs in API for clarity.
Actionable comments posted: 0
♻️ Duplicate comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
109-114: Error handling still needs improvement.

The error handling only logs to console without informing users of failures. This was identified in previous reviews but remains unaddressed.
🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
234-454: Consider refactoring large template methods for better maintainability.

The `requirements_eval_template` and `issue_eval_template` methods are quite long and complex. While the logic is correct, consider extracting common template building patterns into helper methods to improve readability and maintainability.

For example, you could extract common patterns like:

```diff
+private buildRequirementsSection(requirements: any[]): string {
+  let section = "\n## All Requirements\nThe requirements of the eval are listed below...\n"
+  for (const [index, requirement] of requirements.entries()) {
+    section += `\n<requirement_${index + 1}>\nName: ${requirement.name}\nDescription: ${requirement.instruction}\n</requirement_${index + 1}>\n`
+  }
+  return section
+}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- app/desktop/studio_server/data_gen_api.py (7 hunks)
- app/desktop/studio_server/test_data_gen_api.py (2 hunks)
- app/web_ui/src/lib/api_schema.d.ts (6 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- app/desktop/studio_server/test_data_gen_api.py
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
- app/desktop/studio_server/data_gen_api.py
- app/web_ui/src/lib/api_schema.d.ts
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build, Typecheck, and Test Python (3.13)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
- GitHub Check: Build, Typecheck, and Test Python (3.12)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (4)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (4)
34-49: Good lifecycle management implementation.

The constructor properly sets up the reactive subscription and the `destroy()` method correctly cleans up resources. This addresses the memory leak concerns from previous reviews.
149-192: Excellent refactoring that addresses previous TODO comments.

The method now properly generates distinct guidance for each type (topics, inputs, outputs) based on the template, eliminating the code duplication and addressing the TODO comments from previous reviews. The logic flow is clear and each template type gets appropriate treatment.
524-577: Appropriate handling of sensitive content for evaluation purposes.

The static templates contain harmful content with a clear editor's note explaining their purpose for building evaluation systems. This is a legitimate approach for testing model safety - you need to generate harmful content to test against it. The warning is comprehensive and the use case is well-justified.
15-33: Well-structured reactive data model with proper TypeScript practices.

The class design effectively uses Svelte stores for reactivity, maintains good separation of concerns, and provides a comprehensive API for managing synthetic data guidance. The implementation properly addresses the complex requirements of dynamic template generation while maintaining type safety.
Actionable comments posted: 0
♻️ Duplicate comments (3)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (3)
34-49: Destroy method properly handles cleanup, but ensure it's called.

This relates to the past review comment about ensuring `destroy()` is invoked on component teardown to prevent memory leaks.
76-115: Error handling needs user-facing feedback.

This relates to the past review comment about adding user-facing error handling for eval loading failures instead of only console logging.
149-192: Address TODO comments and reduce code duplication.

This relates to the past review comment about the TODO comments indicating duplicate guidance values and the need to generate unique guidance for each type.
🧹 Nitpick comments (2)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)
233-330: Consider extracting template generation to separate utility functions.

The `requirements_eval_template` method is quite lengthy (97 lines) and handles multiple template types. Consider breaking this into smaller, focused functions for better maintainability:

```diff
 private requirements_eval_template(
   evaluator: Eval,
   task: Task,
   task_type: "topics" | "inputs" | "outputs"
 ): string {
+  const baseTemplate = this.buildRequirementsBaseTemplate()
+  const taskSpecificTemplate = this.buildTaskSpecificTemplate(task_type)
+  const requirementsSection = this.buildRequirementsSection(task.requirements)
+
+  return baseTemplate + taskSpecificTemplate + requirementsSection
-  // Current 97-line implementation
 }
```
332-453: Consider extracting issue template generation to separate utility functions.

Similar to the requirements template, the `issue_eval_template` method is lengthy (121 lines) and could benefit from being broken into smaller, focused functions for better readability and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- app/desktop/studio_server/data_gen_api.py (7 hunks)
- app/desktop/studio_server/test_data_gen_api.py (3 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (7 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
- libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- libs/core/kiln_ai/adapters/data_gen/data_gen_task.py
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
- app/desktop/studio_server/data_gen_api.py
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Build, Typecheck, and Test Python (3.12)
- GitHub Check: Build, Typecheck, and Test Python (3.13)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
🔇 Additional comments (8)
app/desktop/studio_server/test_data_gen_api.py (3)
105-105: LGTM: API parameter update aligns with refactoring.

The addition of `gen_type="eval"` correctly reflects the new required parameter in the API schema.
132-133: LGTM: Consistent API changes for guidance refactoring.

The parameter changes from `num_samples`/`human_guidance` to `gen_type`/`guidance` and the endpoint update to `/generate_inputs` are consistent with the broader API refactoring.

Also applies to: 140-140
316-432: LGTM: Comprehensive test coverage for guidance generation.

The parameterized test provides excellent coverage of different guidance and topic path scenarios, including edge cases like empty guidance strings and combinations of guidance with topic paths. The test properly mocks the guidance wrapping function and verifies both the call patterns and final instruction state.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4)
9-14: LGTM: Good refactoring to centralized guidance model.

The consolidation from individual props to the `SynthDataGuidanceDataModel` represents better state management and aligns with the broader refactoring effort.
101-118: LGTM: API integration properly updated.

The changes to use `guidance_data.guidance_for_type("inputs")`, the new endpoint `/generate_inputs`, and the `gen_type` parameter are all consistent with the backend API refactoring.
254-256: LGTM: Nice UI improvement for topic path display.

The styled span with `font-mono text-xs bg-gray-100` provides better visual distinction for the topic path.
269-271: LGTM: Good integration of centralized guidance component.

The `SynthDataGuidance` component integration with `guidance_type="inputs"` follows the established pattern for consistent guidance management across the UI.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
526-810: LGTM: Comprehensive static template definitions.

The static templates provide thorough coverage of different evaluation scenarios (toxicity, bias, maliciousness, jailbreak, factual correctness) with appropriate warnings and uncensored flags. The editor's note appropriately explains the purpose of potentially harmful content for evaluation purposes.
Actionable comments posted: 0
🧹 Nitpick comments (2)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)
234-331: Consider breaking down this large template generation method.

The `requirements_eval_template` method is quite long (97 lines) and handles multiple responsibilities. Consider extracting the template building logic into smaller, focused methods:

```diff
+  private build_requirements_topic_template(requirements: any[]): string {
+    let template = `## Top Level Topic Generation
+For the top level topics, we want to use the exact requirement name. Here is the list of topics to select from when generating top-level topics.
+`
+    // ... existing topic logic
+    return template
+  }
+
+  private build_requirements_input_template(): string {
+    // ... existing input logic
+  }
+
+  private build_requirements_output_template(): string {
+    // ... existing output logic
+  }

   private requirements_eval_template(
     evaluator: Eval,
     task: Task,
     task_type: "topics" | "inputs" | "outputs",
   ): string {
     const requirements = (task?.requirements || []).map((r) => ({
       name: r.name,
       instruction: r.instruction,
     }))

     // Add overall score requirement
     requirements.push({
       name: "Overall Score",
       instruction: "An overall score from 1 to 5 stars.",
     })

+    switch (task_type) {
+      case "topics":
+        return this.build_requirements_topic_template(requirements)
+      case "inputs":
+        return this.build_requirements_input_template()
+      case "outputs":
+        return this.build_requirements_output_template()
+    }
```
333-454: Similar refactoring opportunity for issue template method.

This method also handles multiple template types and could benefit from the same modularization approach suggested for the requirements template method.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte (1 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Build, Typecheck, and Test Python (3.13)
- GitHub Check: Build, Typecheck, and Test Python (3.12)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
🔇 Additional comments (6)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)
150-193: Good resolution of TODO comments from previous reviews.

The `apply_selected_template` method now properly generates unique guidance for each type (topics, inputs, outputs) instead of setting identical values. This addresses the code duplication concern raised in previous reviews.
527-811: Appropriate handling of sensitive content for eval purposes.

The static templates correctly include an editor's note explaining that the harmful content is intentional for building safety evaluations. The disclaimer clearly states the purpose and context.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (4)
29-31: Excellent memory leak prevention.

The addition of `onDestroy` to clean up the guidance data model subscriptions properly addresses the concern raised in previous reviews about potential memory leaks.
85-114: Good async loading flow with proper error handling.

The refactored `onMount` properly sequences the task loading before guidance data initialization, and includes appropriate error handling for both operations.
278-278: Clean integration with the new guidance model.

The transition from the `human_guidance` string to `guidance_data.guidance_for_type("outputs")` maintains the same functionality while leveraging the more structured approach.
534-535: Consistent usage of guidance data model throughout the UI.

The `SynthDataGuidance` component properly receives the `guidance_data` instance, maintaining consistency with the new architecture.
Actionable comments posted: 3
🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1)
127-127: Consider using more type-safe mock objects.

Multiple instances of `as unknown as Type` casting bypass TypeScript's type checking. While sometimes necessary for tests, consider using partial types or mock builders for better type safety.

For example, instead of:

```typescript
const mockEval = {
  id: "eval1",
  name: "Test Eval",
  template: "kiln_issue",
  template_properties: {},
} as unknown as Eval
```

consider using TypeScript's `Partial` type or a mock builder:

```typescript
const mockEval: Partial<Eval> = {
  id: "eval1",
  name: "Test Eval",
  template: "kiln_issue",
  template_properties: {},
}
// Then cast only when necessary at the point of use
```

Also applies to: 390-390, 408-408, 417-417, 457-457, 462-462, 486-486, 542-542, 560-560
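As a self-contained illustration of the `Partial`-typed mock approach: the `Eval` interface below is a trimmed-down hypothetical shape (not the app's actual type), and `evalLabel` is an invented consumer used only to show that the mock stays type-checked on the fields it does provide:

```typescript
// Hypothetical trimmed-down Eval shape for illustration.
interface Eval {
  id: string
  name: string
  template: string
  template_properties: Record<string, unknown>
  created_at: string
}

// Partial<Eval> keeps the compiler checking the fields we provide,
// while letting us omit fields the test does not exercise.
const mockEval: Partial<Eval> = {
  id: "eval1",
  name: "Test Eval",
  template: "kiln_issue",
  template_properties: {},
}

// A consumer that tolerates missing fields explicitly, rather than
// relying on an unchecked `as unknown as Eval` cast.
function evalLabel(e: Partial<Eval>): string {
  return `${e.name ?? "unknown"} (${e.template ?? "none"})`
}
```

A typo in a field name (e.g. `templte`) would now be a compile error, which `as unknown as Eval` silently swallows.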
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte (2 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1 hunks)
- libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2 hunks)
- libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte
- libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py
- libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
`SynthDataGuidanceDataModel` (16-511)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-13)
🔇 Additional comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1)
15-643: Comprehensive test coverage!

The test suite provides excellent coverage of the `SynthDataGuidanceDataModel` class, including:
- Initialization and lifecycle management
- Loading with various configurations
- Error handling
- Template application logic
- Reactive store behavior
- Edge cases like empty requirements and missing properties
The tests are well-structured and follow good testing practices with proper setup/teardown.
```typescript
model.input_guidance.set("existing")
model.output_guidance.set("existing")

model["apply_selected_template"]("custom")
```
🛠️ Refactor suggestion
Consider refactoring to avoid testing private methods directly.

The test suite extensively tests private methods using bracket notation (e.g., `model["apply_selected_template"]`). This is generally considered an anti-pattern because:
- It breaks encapsulation
- Makes tests brittle to implementation changes
- Suggests these methods might need to be part of the public API
Consider either:
- Testing these behaviors through the public interface
- Making these methods protected/public if they represent important contracts
- Extracting them to separate utility functions that can be tested independently
Also applies to: 373-373, 392-393, 420-421, 444-448, 464-468, 488-488, 504-504, 544-544, 562-562, 569-569, 578-578
🤖 Prompt for AI Agents
In app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts around line 365 and other specified lines, the test suite is directly accessing
private methods using bracket notation, which breaks encapsulation and makes
tests brittle. Refactor the tests to avoid calling private methods directly by
either testing the related functionality through the public API, changing these
methods to protected or public if they represent important behaviors, or
extracting the logic into separate utility functions that can be tested
independently.
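The "extract to a utility function" option can be illustrated with a pure helper mirroring the requirements-section pattern discussed in this review; the exact formatting and function name here are illustrative, not the app's actual implementation:

```typescript
interface Requirement {
  name: string
  instruction: string
}

// A pure function extracted from a class is trivially testable
// without reaching into the class's private methods.
function buildRequirementsSection(requirements: Requirement[]): string {
  let section =
    "\n## All Requirements\nThe requirements of the eval are listed below.\n"
  for (const [index, r] of requirements.entries()) {
    section += `\n<requirement_${index + 1}>\nName: ${r.name}\nDescription: ${r.instruction}\n</requirement_${index + 1}>\n`
  }
  return section
}
```

The class method then delegates to this helper, and the tests assert on the helper's string output directly instead of using `model["private_method"]` bracket access.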
```typescript
requirements: [
  { instruction: "Be helpful and harmless" },
  { instruction: "Avoid bias" },
],
```
Fix incomplete mock data structure in this test as well.

Same issue as above - the `requirements` array needs the `name` property.
Apply this diff:
```diff
 requirements: [
-  { instruction: "Be helpful and harmless" },
-  { instruction: "Avoid bias" },
+  { name: "Be helpful and harmless", instruction: "Be helpful and harmless" },
+  { name: "Avoid bias", instruction: "Avoid bias" },
 ],
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
requirements: [
  { name: "Be helpful and harmless", instruction: "Be helpful and harmless" },
  { name: "Avoid bias", instruction: "Avoid bias" },
],
```
🤖 Prompt for AI Agents
In app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts around lines 413 to 416, the mock data for the requirements array is missing the
required 'name' property on each object. Update each object in the requirements
array to include the 'name' property alongside the 'instruction' property to
complete the mock data structure as expected by the test.
```typescript
requirements: [
  { instruction: "Be helpful and harmless" },
  { instruction: "Avoid bias" },
],
```
Fix incomplete mock data structure for requirements.

The `requirements` array is missing the `name` property that is used in the actual implementation. Based on the `requirements_eval_template` method in the implementation, requirements should have both `name` and `instruction` properties.
Apply this diff to fix the mock data:
```diff
 requirements: [
-  { instruction: "Be helpful and harmless" },
-  { instruction: "Avoid bias" },
+  { name: "Be helpful and harmless", instruction: "Be helpful and harmless" },
+  { name: "Avoid bias", instruction: "Avoid bias" },
 ],
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
requirements: [
  { name: "Be helpful and harmless", instruction: "Be helpful and harmless" },
  { name: "Avoid bias", instruction: "Avoid bias" },
],
```
🤖 Prompt for AI Agents
In app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts around lines 69 to 72, the mock data for the requirements array is missing the
required name property. Update each object in the requirements array to include
both name and instruction properties to match the actual implementation's
expected structure.
Note: WIP. Need to repeat this for input gen as well, and output gen.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests