
Update our topic tree evals and eval gen #409


Merged (15 commits, Jul 8, 2025)

Conversation

scosman
Collaborator

@scosman scosman commented Jul 7, 2025

  • Topic tree prompt more aware of goal (eval v training)
  • Topic tree prompt moved guidance into system prompt
  • UI layer: detect when we're targeting a kiln_requirements eval, and generate proper guidance

Note: WIP. Need to repeat this for input gen as well, and output gen.

Summary by CodeRabbit

  • New Features

    • Added support for specifying generation type ("training" or "eval") in data category and sample generation workflows.
    • Introduced enhanced and dynamic guidance templates, including new "requirements eval" templates that generate instructions based on task requirements.
    • Enabled passing generation type and customized guidance through the UI components for data generation.
    • Added a new synthetic data guidance management model to improve guidance handling and UI integration.
    • Added a toggleable custom topic mode for manual topic entry during synthetic data generation.
  • Improvements

    • Standardized use of "eval_id" as the URL parameter for evaluation context across the app.
    • Refined synthetic data generation prompts to dynamically adapt based on generation type and user guidance.
    • Simplified URL parameter logic for evaluation pages by removing special cases.
    • Replaced previous manual guidance editing UI with structured guidance components bound to reactive data models.
    • Extended form input types with a new "header_only" option.
  • Bug Fixes

    • Removed obsolete debug log statements for a cleaner user experience.
  • Tests

    • Expanded test coverage to verify prompt generation logic and task initialization with generation type and guidance parameters.
    • Added comprehensive tests for synthetic data guidance management model, including template application and reactive updates.


coderabbitai bot commented Jul 7, 2025

Walkthrough

The changes introduce a new required field, gen_type ("eval" or "training"), throughout the data generation API, backend logic, and frontend components. Prompt generation for topic trees and samples is now dynamic, adapting to the gen_type and any specific guidance provided. Related tests, schemas, and UI flows are updated to support these changes. Additionally, eval/template handling in guidance components is refactored with support for a new requirements eval template.
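The dynamic prompt generation described in the walkthrough can be sketched in Python. This is an illustrative reconstruction, not the actual data_gen_prompts.py: the goal-description strings are quoted from the generate_goal_description excerpt later in this review, while the assembly logic, the guidance tag, and the topic-tree wording are assumptions:

```python
from typing import Literal, Optional


def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
    # Strings match the generate_goal_description excerpt quoted later in this review.
    if gen_type == "training":
        return (
            "I want to train a large language model and you should help me "
            "generate training data for it."
        )
    elif gen_type == "eval":
        return (
            "I want to evaluate a large language model and you should help me "
            "generate eval data for it."
        )
    raise ValueError(f"Invalid gen_type: {gen_type}")


def generate_topic_tree_prompt(
    gen_type: Literal["training", "eval"],
    guidance: Optional[str] = None,
) -> str:
    # Hypothetical assembly: state the goal first, then append caller guidance
    # in a tagged block only when it is non-empty.
    parts = [
        generate_goal_description(gen_type),
        "Generate a tree of topics to cover the task's input space.",
    ]
    if guidance and guidance.strip():
        parts.append(f"<guidance>\n{guidance.strip()}\n</guidance>")
    return "\n\n".join(parts)
```

The key property the PR's tests exercise is that the same function yields different prompts per gen_type, and that empty guidance leaves the prompt unchanged.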

Changes

File(s) Change Summary
app/desktop/studio_server/data_gen_api.py Replaced human_guidance with required gen_type and optional guidance fields; updated endpoint handlers to pass these.
app/desktop/studio_server/test_data_gen_api.py Updated tests to include gen_type and replace human_guidance with guidance in input data; updated endpoint path.
app/web_ui/src/lib/api_schema.d.ts Added `gen_type: "eval" | "training"` to the generated API schema types.
app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/add_data/+page.svelte Changed query parameter from "issue_eval_id" to "eval_id" for URL navigation.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte Removed special case for "kiln_issue" template; always set "eval_id" parameter uniformly.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte Added reactive gen_type variable based on URL "reason" parameter; replaced human_guidance string with guidance_data model; removed related UI; passed gen_type and guidance_data to child components.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte Added exported guidance_data prop; included gen_type and guidance in API payload and recursive rendering; replaced human_guidance. Added custom topic mode UI.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte Added exported guidance_data prop; included gen_type and renamed guidance in API request payload; added <SynthDataGuidance> component.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte Refactored guidance management to use external SynthDataGuidanceDataModel; removed internal state and eval fetching logic; added support for new "requirements_eval_template".
app/web_ui/src/routes/(fullscreen)/setup/(setup)/subscribe/+page.svelte Removed a debug log statement.
libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py Removed static prompt constants; added functions generating prompts dynamically based on gen_type and optional guidance.
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py Removed human_guidance from inputs; constructors now require gen_type and guidance; prompts generated dynamically.
libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py Updated tests to remove human_guidance; added tests for new prompt generation functions and task initialization parameters.
app/web_ui/src/lib/utils/form_element.svelte Extended inputType export union type with new "header_only" option.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts Added new SynthDataGuidanceDataModel class managing guidance templates, loading eval data, and reactive state for UI binding.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte Added text-center CSS class to heading divs for centered text display.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant WebUI
    participant API
    participant DataGenTask

    User->>WebUI: Navigate to data generation page (with reason param)
    WebUI->>WebUI: Set gen_type = "eval" or "training" based on URL
    WebUI->>API: POST /generate_categories (includes gen_type, guidance)
    API->>DataGenTask: Instantiate DataGenCategoriesTask(gen_type, guidance)
    DataGenTask->>DataGenTask: Generate prompt with gen_type and guidance
    DataGenTask-->>API: Return generated categories
    API-->>WebUI: Respond with categories
    WebUI-->>User: Display generated categories

Possibly related PRs

  • Add guidance templates in app instead of docs! Automatically select t… #404: Refactors synthetic data guidance handling by replacing the single human_guidance string with a structured guidance_data model and adds explicit gen_type parameters; introduces built-in guidance templates and automatic template selection for evaluations, closely related to guidance template management and application.

Poem

Oh, what a change for the data tree,
Now "eval" or "training" is clear as can be!
Prompts grow dynamic, guidance anew,
From backend to frontend, the purpose shines through.
Rabbits rejoice with a hop and a cheer,
For clarity and context are finally here! 🐇🌱



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
app/desktop/studio_server/test_data_gen_api.py (1)

94-121: Consider adding test coverage for both gen_type values.

The current test only covers gen_type="eval", but the PR objectives mention distinguishing between "eval" and "training" contexts. Consider adding a separate test case for gen_type="training" to ensure comprehensive coverage of both generation types.

def test_generate_categories_success_training(
    mock_task_from_id,
    mock_langchain_adapter,
    client,
    mock_task_run,
):
    # Arrange
    input_data = DataGenCategoriesApiInput(
        node_path=["parent", "child"],
        num_subtopics=4,
        guidance="Generate tech categories",
        gen_type="training",
        model_name="gpt-4",
        provider="openai",
    )
    
    # Act & Assert (same as existing test)
    # ...
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)

84-84: Remove debug print statement before production.

The debug print statement should be removed as it could expose sensitive information in production logs and adds unnecessary output.

-        print("GENERATE TOPIC TREE PROMPT", instruction)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0050406 and d588766.

📒 Files selected for processing (12)
  • app/desktop/studio_server/data_gen_api.py (3 hunks)
  • app/desktop/studio_server/test_data_gen_api.py (1 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (1 hunks)
  • app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/add_data/+page.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (3 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (3 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (7 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/subscribe/+page.svelte (0 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3 hunks)
  • libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (5 hunks)
💤 Files with no reviewable changes (1)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/subscribe/+page.svelte
🧰 Additional context used
🧬 Code Graph Analysis (2)
app/desktop/studio_server/data_gen_api.py (1)
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)
  • DataGenCategoriesTask (68-97)
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)
libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (1)
  • generate_topic_tree_prompt (9-85)
🔇 Additional comments (27)
app/desktop/studio_server/test_data_gen_api.py (1)

105-105: LGTM! Test updated to match new API contract.

The addition of gen_type="eval" correctly aligns with the new required field in the API.

app/web_ui/src/routes/(app)/dataset/[project_id]/[task_id]/add_data/+page.svelte (1)

83-84: LGTM! Parameter standardization improves consistency.

The change from issue_eval_id to eval_id aligns with the broader effort to standardize eval parameter handling across the application.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/[eval_id]/+page.svelte (1)

340-340: LGTM – Composite eval_id Parsing Verified

  • In synth_data_guidance.svelte, full_eval_id.split("::") correctly yields [project_id, task_id, eval_id].
  • Query-param colons are percent-encoded via URLSearchParams and decoded by searchParams.get(), so the separator survives round-trip.
  • Other pages (e.g. add_data) forward the composite ID without parsing, relying on route segments for context.

No further changes needed here.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (3)

18-18: LGTM: Proper type definition for the new export property.

The gen_type property is correctly typed and follows TypeScript best practices.


145-145: LGTM: Correctly includes gen_type in API payload.

The gen_type parameter is properly passed to the backend API call, aligning with the API schema changes.


323-323: LGTM: Proper prop propagation in recursive component.

The gen_type property is correctly passed down to child component instances, ensuring consistent generation context throughout the component tree.

app/desktop/studio_server/data_gen_api.py (3)

1-1: LGTM: Proper import for type annotation.

The Literal import is correctly added to support the new gen_type field typing.


28-30: LGTM: Well-defined required field with proper typing.

The gen_type field is correctly defined as required with proper Literal typing and descriptive documentation.


93-95: LGTM: Correct task instantiation with new parameters.

The DataGenCategoriesTask is properly instantiated with the required gen_type and optional specific_guidance parameters, aligning with the task constructor changes.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (3)

39-41: LGTM: Proper reactive derivation from URL parameters.

The gen_type is correctly derived from the URL search parameter "reason", with a sensible default of "training" and proper reactive updates when the URL changes.


395-395: LGTM: Correct binding to child component.

The gen_type is properly bound to the GeneratedDataNode component, ensuring the generation context is passed down correctly.


587-587: LGTM: Consistent binding pattern.

The task binding to SynthDataGuidance component follows the same pattern as other bindings in the component.

libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (4)

6-6: LGTM: Proper import for new functionality.

The import of generate_topic_tree_prompt is correctly added to test the new prompt generation functionality.


63-63: LGTM: Updated task constructors reflect API changes.

The task constructors are correctly updated to include the required gen_type parameter and optional specific_guidance, aligning with the new API.

Also applies to: 78-78, 111-111


72-74: LGTM: Appropriate assertions for dynamic instruction content.

The assertions correctly validate that the generated instructions contain the expected content based on the gen_type and specific_guidance parameters.

Also applies to: 80-82


339-470: LGTM: Comprehensive test coverage for prompt generation.

The test suite provides excellent coverage of the generate_topic_tree_prompt function, including:

  • Different gen_type values ("training", "eval")
  • With and without specific guidance
  • Edge cases like empty guidance
  • Structural validation of prompt content
  • Consistency checks between variants

This thorough testing ensures the prompt generation behaves correctly across all scenarios.

libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3)

2-2: LGTM: Proper imports for new functionality.

The imports for Literal typing and generate_topic_tree_prompt function are correctly added.

Also applies to: 11-11


75-77: LGTM: Well-designed constructor signature.

The constructor properly accepts gen_type as a required parameter and specific_guidance as optional, with appropriate type annotations.


81-83: LGTM: Correct implementation of dynamic prompt generation.

The instruction is properly generated using the generate_topic_tree_prompt function with the provided parameters, replacing the previous static approach.

Also applies to: 90-90

libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (3)

6-11: LGTM! Well-typed function signature.

Good use of Literal type to constrain the gen_type parameter to valid values.


16-21: Consider adding defensive validation for gen_type.

While the Literal type annotation should catch invalid values during type checking, consider adding runtime validation or an else clause to handle unexpected values defensively.


58-71: LGTM! Clear conditional guidance handling.

The logic properly handles both specific and default guidance cases with clear formatting.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (5)

33-41: LGTM! Improved naming conventions.

The parameter and function names are now more generic and descriptive, better supporting multiple eval types.


62-66: LGTM! Clear template type mapping.

The conditional logic properly maps the eval template types to their corresponding UI templates.


223-244: LGTM! Well-structured dynamic option building.

The function properly builds eval-specific options based on the template type and only adds the group when options exist.


295-320: LGTM! Well-structured requirements template generation.

The function creates clear, structured guidance with proper handling of requirements and systematic XML-like tagging.


322-334: LGTM! Proper handling of async loading.

The reactive function correctly handles the async loading order between task and evaluator, ensuring the requirements template is updated when both are available.

Comment on lines 1285 to 1290
/**
* Gen Type
* @description The type of task to generate topics for
* @enum {string}
*/
gen_type: "eval" | "training";

💡 Verification agent

🧩 Analysis chain

Well-structured addition to support context-aware topic generation.

The new gen_type field properly distinguishes between evaluation and training contexts with clear documentation and appropriate type constraints. The string literal union type ensures type safety and prevents invalid values.

Note that this is a breaking change since it adds a required field to the existing interface. Ensure all frontend components using DataGenCategoriesApiInput have been updated to provide this field.


🏁 Script executed:

#!/bin/bash
# Verify that all usages of DataGenCategoriesApiInput have been updated to include gen_type
ast-grep --pattern 'DataGenCategoriesApiInput'

Length of output: 428


Update backend model and tests for new required gen_type field

Since gen_type was added as a required field in the TypeScript API schema, the corresponding Python Pydantic model and its tests must be updated to avoid runtime errors:

• app/desktop/studio_server/data_gen_api.py
– In the DataGenCategoriesApiInput(BaseModel) class definition, add a gen_type: Literal["eval", "training"] field with matching docstring.
• app/desktop/studio_server/test_data_gen_api.py
– Update the input_data = DataGenCategoriesApiInput(...) call (around line 101) to include a valid gen_type value.
– Add test cases covering both "eval" and "training" scenarios if not already present.

🤖 Prompt for AI Agents
In app/desktop/studio_server/data_gen_api.py, update the
DataGenCategoriesApiInput class to include a new required field gen_type with
type Literal["eval", "training"] and add a matching docstring. Then, in
app/desktop/studio_server/test_data_gen_api.py around line 101, modify the
input_data initialization to include a valid gen_type value and add test cases
to cover both "eval" and "training" values to ensure proper validation and avoid
runtime errors.
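A minimal Pydantic sketch of what the prompt above asks for. Only the fields discussed in this review are included; node_path and num_subtopics and their types are assumptions taken from the test snippet earlier in the thread, and the real model carries more fields:

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class DataGenCategoriesApiInput(BaseModel):
    """Abbreviated sketch; the real model has additional fields."""

    node_path: list[str]
    num_subtopics: int
    # The type of task to generate topics for. Required, so older callers
    # that omit it now fail validation (the breaking change noted above).
    gen_type: Literal["eval", "training"]
    # Optional free-text guidance; replaces the old human_guidance field.
    guidance: Optional[str] = None


# Omitting gen_type is rejected, which is why the existing tests had to be updated.
try:
    DataGenCategoriesApiInput(node_path=["parent", "child"], num_subtopics=4)
except ValidationError:
    print("missing gen_type rejected")
```

With the Literal annotation, invalid strings such as `gen_type="prod"` are also rejected at validation time rather than surfacing later in prompt generation.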


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (1)

32-32: Fix prop naming inconsistency between component interface and API.

The component still accepts human_guidance as a prop (line 32) but sends it as guidance to the API (line 111). This inconsistency suggests incomplete refactoring.

Either rename the prop to match the API:

-export let human_guidance: string | null = null
+export let guidance: string | null = null

And update line 111:

-guidance: human_guidance ? human_guidance : null, // clear empty string
+guidance: guidance ? guidance : null, // clear empty string

Or keep the prop name and update the API field. The first option seems more aligned with the PR's goal of replacing human_guidance with guidance.

Also applies to: 111-111

🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (1)

9-17: Add defensive else clause for robustness.

While the Literal type hint should prevent invalid values, consider adding an else clause for defensive programming:

 def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
     """
     Generate a goal description for the given generation type.
     """
     if gen_type == "training":
         return "I want to train a large language model and you should help me generate training data for it."
     elif gen_type == "eval":
         return "I want to evaluate a large language model and you should help me generate eval data for it."
+    else:
+        raise ValueError(f"Invalid gen_type: {gen_type}. Must be 'training' or 'eval'.")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d588766 and 384596c.

📒 Files selected for processing (8)
  • app/desktop/studio_server/data_gen_api.py (7 hunks)
  • app/desktop/studio_server/test_data_gen_api.py (2 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (3 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (2 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (4 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (3 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3 hunks)
  • libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/desktop/studio_server/test_data_gen_api.py
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/desktop/studio_server/data_gen_api.py
  • app/web_ui/src/lib/api_schema.d.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-13)
🔇 Additional comments (2)
libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (1)

342-610: Excellent test coverage for the new prompt generation functions!

The comprehensive tests cover all scenarios including:

  • Both generation types ("training" and "eval")
  • With and without guidance
  • Empty and None guidance cases
  • Validation of prompt structure and content
  • Consistency checks between prompt types
libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (1)

195-210: Ignore removal suggestion – wrap_task_with_guidance is in use

The wrap_task_with_guidance function is imported and invoked in the DataGen API, so it should not be removed:

Location of usage
app/desktop/studio_server/data_gen_api.py

  • Imported alongside DataGenSampleTask
  • Called when sample.guidance is provided:
    if sample.guidance and sample.guidance.strip():
        task.instruction = wrap_task_with_guidance(
            task.instruction, sample.guidance
        )

The function is actively used to wrap task instructions with human guidance. Please disregard the suggestion to remove it.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (1)

365-377: Consider adding input validation for custom topics.

The custom topic input accepts comma-separated values but doesn't validate the format. Consider adding validation to handle edge cases like trailing commas or empty values between commas.

You could enhance the add_custom_topics function:

 function add_custom_topics() {
   if (!custom_topics_string) {
     return
   }
-  const topics = custom_topics_string.split(",").map((t) => t.trim())
+  const topics = custom_topics_string
+    .split(",")
+    .map((t) => t.trim())
+    .filter((t) => t.length > 0)
   add_subtopics(topics)

Also applies to: 117-125

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 384596c and 3b71c39.

📒 Files selected for processing (6)
  • app/web_ui/src/lib/utils/form_element.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (7 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (2 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
🔇 Additional comments (7)
app/web_ui/src/lib/utils/form_element.svelte (1)

11-11: LGTM! Clean addition of header-only input type.

The addition of "header_only" to the input type union is well-implemented. When this type is used, the component will render only the label and description sections (lines 90-128) without any input element, which is the expected behavior for a header-only display.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (3)

18-19: Good architectural improvement with centralized data model.

The refactoring from individual props to a single guidance_data object improves encapsulation and reduces prop drilling throughout the component hierarchy.


16-17: Clean implementation of dual-mode topic entry.

The custom topic mode toggle provides flexibility while defaulting to the normal generation mode when reopening the modal.

Also applies to: 67-68


141-142: Proper handling of guidance data with null safety.

Good practice converting empty strings to null to maintain clean API payloads.

Also applies to: 151-152

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (2)

1-26: Clean refactoring to remove local state management.

The component now properly delegates all state management to the centralized data model, which improves consistency across the application.


89-99: Good conditional warning display implementation.

The warning system properly uses the data model's custom_warning method to display context-appropriate messages.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)

363-493: Excellent template definitions with comprehensive guidance.

The static templates are well-structured with clear instructions for each evaluation type. The suggest_uncensored flag is appropriately set for sensitive content generation.

…cs, inputs and outputs are no longer the same. This lets us be better at giving examples.

Improved our base templates (topic gen, sample gen) as well as specific use case (toxicity, bias, etc).

Still WIP, need to do issue/requirements templates next.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (3)

34-49: Ensure destroy() is called in parent components to prevent memory leaks.

The destroy method properly cleans up subscriptions, but it must be called when the parent component unmounts.

#!/bin/bash
# Description: Verify that SynthDataGuidanceDataModel.destroy() is called in parent components

# Search for instantiation of SynthDataGuidanceDataModel
echo "=== Searching for SynthDataGuidanceDataModel instantiation ==="
rg -A 20 "new SynthDataGuidanceDataModel"

# Search for destroy() calls on guidance_data
echo -e "\n=== Searching for destroy() calls ==="
rg "guidance_data.*destroy|guidanceModel.*destroy" -g "*.svelte" -g "*.ts"

# Search for onDestroy usage in files that use SynthDataGuidanceDataModel
echo -e "\n=== Searching for onDestroy in parent components ==="
rg -l "SynthDataGuidanceDataModel" -g "*.svelte" | xargs grep -n "onDestroy"
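For reference, the cleanup contract being verified can be sketched like this; the tiny store below stands in for svelte/store's writable, and all names are illustrative rather than taken from the PR:

```typescript
type Subscriber<T> = (value: T) => void

// Minimal stand-in for svelte/store's writable.
function writable<T>(initial: T) {
  let value = initial
  const subs = new Set<Subscriber<T>>()
  return {
    set(v: T) {
      value = v
      subs.forEach((fn) => fn(v))
    },
    subscribe(fn: Subscriber<T>): () => void {
      subs.add(fn)
      fn(value)
      return () => subs.delete(fn) // the unsubscriber that destroy() must call
    },
  }
}

class GuidanceModel {
  selectedTemplate = writable("custom")
  applied: string[] = []
  private unsubscribe: () => void

  constructor() {
    // Subscription lives as long as the model; it leaks if never released.
    this.unsubscribe = this.selectedTemplate.subscribe((t) =>
      this.applied.push(t),
    )
  }

  // Parent components must call this on teardown.
  destroy() {
    this.unsubscribe()
  }
}
```

In a Svelte parent this would typically be wired as `onDestroy(() => guidance_data.destroy())`, which is what the script above greps for.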

76-115: Add user-facing error handling for eval loading failures.

Currently, errors are only logged to console. Consider adding an error store to inform users when eval loading fails:

 export class SynthDataGuidanceDataModel {
   // ... existing properties ...
+  public loadError: Writable<string | null> = writable(null)
   
   // In load_eval method:
   } catch (error) {
-    // TODO
     console.error(error)
+    this.loadError.set("Failed to load evaluation data. Please try again.")
   } finally {
     this.loading.set(false)
   }

Then display the error in the UI component when $loadError is not null.


149-188: Address TODO comments for unique guidance per type.

The method currently sets identical guidance for all three types with multiple TODO comments. Consider implementing type-specific guidance:

 private apply_selected_template(template: string) {
   if (template == "custom") {
-    // TODO make each unique
     this.topic_guidance.set(null)
     this.input_guidance.set(null)
     this.output_guidance.set(null)
   }
 
   if (template == "issue_eval_template" && this.evaluator) {
-    // TODO make each unique
-    const issue_eval_template = this.issue_eval_template(this.evaluator)
-    this.topic_guidance.set(issue_eval_template)
-    this.input_guidance.set(issue_eval_template)
-    this.output_guidance.set(issue_eval_template)
+    this.topic_guidance.set(this.issue_eval_template_topics(this.evaluator))
+    this.input_guidance.set(this.issue_eval_template_inputs(this.evaluator))
+    this.output_guidance.set(this.issue_eval_template_outputs(this.evaluator))
   }
🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (1)

9-17: Add explicit error handling for invalid gen_type.

While the Literal type hint helps during development, consider adding runtime validation:

 def generate_goal_description(gen_type: Literal["training", "eval"]) -> str:
     """
     Generate a goal description for the given generation type.
     """
     if gen_type == "training":
         return "I want to train a large language model and you should help me generate training data for it."
     elif gen_type == "eval":
         return "I want to evaluate a large language model and you should help me generate eval data for it."
+    else:
+        raise ValueError(f"Invalid gen_type: {gen_type}. Must be 'training' or 'eval'.")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b71c39 and 0e6a965.

📒 Files selected for processing (5)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (5 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (2 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2 hunks)
  • libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
app/web_ui/src/lib/ui/fancy_select_types.ts (2)
  • OptionGroup (1-4)
  • Option (5-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
🔇 Additional comments (10)
libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (3)

6-9: LGTM!

The imports for the new prompt generation functions are correctly added.


64-77: Test updates correctly reflect the new API structure.

The tests have been properly updated to use the new gen_type and guidance parameters, and the assertions verify that the generated instructions contain the expected text based on these parameters.

Also applies to: 79-85, 114-114, 165-176


342-610: Excellent test coverage for the new prompt generation functions!

The new tests comprehensively cover all combinations of gen_type and guidance parameters, including edge cases like empty strings and None values. The tests properly verify prompt content, structure, and consistency.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance.svelte (3)

5-7: Appropriate imports for the refactored component.

The imports correctly support the new architecture with external data model and lifecycle management.


15-23: Good lifecycle management with proper cleanup.

The onDestroy handler correctly prevents the bind directives from setting undefined values when the component unmounts by disconnecting the local store references.


35-51: Clean refactoring to pure UI component.

The component now properly delegates all state management and logic to the external SynthDataGuidanceDataModel, making it a clean UI layer that binds to the model's reactive stores.

Also applies to: 53-109

libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2)

19-94: Well-structured prompt generation with proper guidance handling.

The function correctly generates different prompt sections based on whether guidance is provided, maintaining clear structure and helpful examples throughout.


96-159: Excellent implementation with comprehensive guidance examples.

The function provides clear instructions and particularly good examples in the custom guidance section showing how to properly incorporate guidance into generated samples.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)

1-33: Well-structured data model with proper TypeScript typing.

The class provides a clean abstraction for managing synthetic data guidance with appropriate use of Svelte stores for reactivity.


360-658: Comprehensive and well-documented static templates.

The static templates provide excellent coverage of various evaluation types with appropriate warnings about potentially harmful content. Each template properly differentiates guidance for topics, inputs, and outputs.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)

149-192: TODO comments still present - guidance types remain identical for static templates.

The apply_selected_template method still has the TODO comment about making each guidance type unique, and static templates continue to set identical values for all three guidance types. While the issue and requirements eval templates now generate unique content per type, the static templates still duplicate the same content.

Consider implementing template-specific guidance for static templates:

 const static_template = static_templates.find((t) => t.id == template)
 if (static_template) {
   // Set each guidance type with its specific template
   this.topic_guidance.set(static_template.topic_template)
   this.input_guidance.set(static_template.input_template)
   this.output_guidance.set(static_template.output_template)
 }

The static templates already have separate topic_template, input_template, and output_template fields, so this should provide the desired uniqueness.
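Under that assumption (per-type fields already present on each template), the selection logic reduces to something like the following sketch; the field names follow the review comment, but the template content is invented for illustration:

```typescript
// Illustrative shape only; field names follow the review comment.
interface StaticTemplate {
  id: string
  topic_template: string
  input_template: string
  output_template: string
  suggest_uncensored: boolean
}

const static_templates: StaticTemplate[] = [
  {
    id: "toxicity",
    topic_template: "Generate topics likely to elicit toxic responses.",
    input_template: "Generate inputs that probe for toxic output.",
    output_template: "Generate outputs that exhibit the toxic failure mode.",
    suggest_uncensored: true,
  },
]

// Returns one distinct guidance string per type, or null for unknown templates.
function guidanceForTemplate(
  id: string,
): { topics: string; inputs: string; outputs: string } | null {
  const t = static_templates.find((st) => st.id === id)
  if (!t) return null
  return {
    topics: t.topic_template,
    inputs: t.input_template,
    outputs: t.output_template,
  }
}
```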


109-114: Add user-facing error handling for eval loading failures.

The error handling still only logs to console without providing user feedback when evaluation data fails to load.

Consider adding an error state to inform users:

+public error_message: Writable<string | null> = writable(null)

 } catch (error) {
   console.error(error)
+  this.error_message.set("Failed to load evaluation data. Please try again.")
 } finally {
   this.loading.set(false)
 }

This would allow the UI to display meaningful error messages to users.

🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)

194-200: Extend uncensored suggestion logic to eval templates.

The TODO comment indicates that uncensored suggestions should also apply to issue and requirements eval templates, but the implementation only checks static templates.

Consider extending the logic to handle eval templates:

 public suggest_uncensored(selected_template: string): boolean {
-  // TODO: also for issue and requirements??
+  // Handle eval templates
+  if (selected_template === "issue_eval_template" || 
+      selected_template === "requirements_eval_template") {
+    return true // These templates typically involve sensitive content
+  }
+  
   const static_template = static_templates.find(
     (t) => t.id == selected_template,
   )
   return static_template?.suggest_uncensored ?? false
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e6a965 and e881698.

📒 Files selected for processing (1)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (3)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (3)

528-812: Well-structured static templates with appropriate content warnings.

The static templates are comprehensive and well-documented with clear content warnings. The harmful content is appropriately contextualized for evaluation purposes.

The editor's note at line 526 appropriately explains the presence of harmful content and its purpose for building robust evaluation systems.


239-247: Good practice: Dynamic requirements handling with overall score addition.

The requirements template method correctly maps task requirements and dynamically adds the "Overall Score" requirement, which demonstrates good understanding of the eval structure.

The implementation properly handles the case where task requirements may be undefined and adds the necessary overall score requirement.


34-49: Proper subscription management with cleanup.

The constructor sets up template subscription correctly, and the destroy method properly cleans up to prevent memory leaks.

From the past review comments, it appears the parent components should ensure this destroy() method is called during component teardown to prevent memory leaks.

…ays missing) which improves instruction following in cases where input can't capture it. Also actually pass the guidance (was null + TODO)

rename samples to inputs in API for clarity
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)

109-114: Error handling still needs improvement.

The error handling only logs to console without informing users of failures. This was identified in previous reviews but remains unaddressed.

🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)

234-454: Consider refactoring large template methods for better maintainability.

The requirements_eval_template and issue_eval_template methods are quite long and complex. While the logic is correct, consider extracting common template building patterns into helper methods to improve readability and maintainability.

For example, you could extract common patterns like:

+private buildRequirementsSection(requirements: any[]): string {
+  let section = "\n## All Requirements\nThe requirements of the eval are listed below...\n"
+  for (const [index, requirement] of requirements.entries()) {
+    section += `\n<requirement_${index + 1}>\nName: ${requirement.name}\nDescription: ${requirement.instruction}\n</requirement_${index + 1}>\n`
+  }
+  return section
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e881698 and 5da14a6.

📒 Files selected for processing (6)
  • app/desktop/studio_server/data_gen_api.py (7 hunks)
  • app/desktop/studio_server/test_data_gen_api.py (2 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (6 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • app/desktop/studio_server/test_data_gen_api.py
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte
  • app/desktop/studio_server/data_gen_api.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (4)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (4)

34-49: Good lifecycle management implementation.

The constructor properly sets up the reactive subscription and the destroy() method correctly cleans up resources. This addresses the memory leak concerns from previous reviews.


149-192: Excellent refactoring that addresses previous TODO comments.

The method now properly generates distinct guidance for each type (topics, inputs, outputs) based on the template, eliminating the code duplication and addressing the TODO comments from previous reviews. The logic flow is clear and each template type gets appropriate treatment.


524-577: Appropriate handling of sensitive content for evaluation purposes.

The static templates contain harmful content with a clear editor's note explaining their purpose for building evaluation systems. This is a legitimate approach for testing model safety - you need to generate harmful content to test against it. The warning is comprehensive and the use case is well-justified.


15-33: Well-structured reactive data model with proper TypeScript practices.

The class design effectively uses Svelte stores for reactivity, maintains good separation of concerns, and provides a comprehensive API for managing synthetic data guidance. The implementation properly addresses the complex requirements of dynamic template generation while maintaining type safety.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (3)

34-49: Destroy method properly handles cleanup but ensure it's called.

This relates to the past review comment about ensuring destroy() is invoked on component teardown to prevent memory leaks.


76-115: Error handling needs user-facing feedback.

This relates to the past review comment about adding user-facing error handling for eval loading failures instead of only console logging.


149-192: Address TODO comments and reduce code duplication.

This relates to the past review comment about the TODO comments indicating duplicate guidance values and the need to generate unique guidance for each type.

🧹 Nitpick comments (2)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)

233-330: Consider extracting template generation to separate utility functions.

The requirements_eval_template method is quite lengthy (97 lines) and handles multiple template types. Consider breaking this into smaller, focused functions for better maintainability:

private requirements_eval_template(
  evaluator: Eval,
  task: Task, 
  task_type: "topics" | "inputs" | "outputs"
): string {
+  const baseTemplate = this.buildRequirementsBaseTemplate()
+  const taskSpecificTemplate = this.buildTaskSpecificTemplate(task_type)
+  const requirementsSection = this.buildRequirementsSection(task.requirements)
+  
+  return baseTemplate + taskSpecificTemplate + requirementsSection
-  // Current 97-line implementation
}

332-453: Consider extracting issue template generation to separate utility functions.

Similar to the requirements template, the issue_eval_template method is lengthy (121 lines) and could benefit from being broken into smaller, focused functions for better readability and maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5da14a6 and 4e8d127.

📒 Files selected for processing (7)
  • app/desktop/studio_server/data_gen_api.py (7 hunks)
  • app/desktop/studio_server/test_data_gen_api.py (3 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (7 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_task.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_task.py
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/desktop/studio_server/data_gen_api.py
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
🔇 Additional comments (8)
app/desktop/studio_server/test_data_gen_api.py (3)

105-105: LGTM: API parameter update aligns with refactoring.

The addition of gen_type="eval" correctly reflects the new required parameter in the API schema.


132-133: LGTM: Consistent API changes for guidance refactoring.

The parameter changes from num_samples/human_guidance to gen_type/guidance and the endpoint update to /generate_inputs are consistent with the broader API refactoring.

Also applies to: 140-140


316-432: LGTM: Comprehensive test coverage for guidance generation.

The parameterized test provides excellent coverage of different guidance and topic path scenarios, including edge cases like empty guidance strings and combinations of guidance with topic paths. The test properly mocks the guidance wrapping function and verifies both the call patterns and final instruction state.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte (4)

9-14: LGTM: Good refactoring to centralized guidance model.

The consolidation from individual props to the SynthDataGuidanceDataModel represents better state management and aligns with the broader refactoring effort.


101-118: LGTM: API integration properly updated.

The changes to use guidance_data.guidance_for_type("inputs"), the new endpoint /generate_inputs, and the gen_type parameter are all consistent with the backend API refactoring.


254-256: LGTM: Nice UI improvement for topic path display.

The styled span with font-mono text-xs bg-gray-100 provides better visual distinction for the topic path.


269-271: LGTM: Good integration of centralized guidance component.

The SynthDataGuidance component integration with guidance_type="inputs" follows the established pattern for consistent guidance management across the UI.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)

526-810: LGTM: Comprehensive static template definitions.

The static templates provide thorough coverage of different evaluation scenarios (toxicity, bias, maliciousness, jailbreak, factual correctness) with appropriate warnings and uncensored flags. The editor's note appropriately explains the purpose of potentially harmful content for evaluation purposes.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)

234-331: Consider breaking down this large template generation method.

The requirements_eval_template method is quite long (97 lines) and handles multiple responsibilities. Consider extracting the template building logic into smaller, focused methods.

+  private build_requirements_topic_template(requirements: any[]): string {
+    let template = `## Top Level Topic Generation
+For the top level topics, we want to use the exact requirement name. Here is the list of topics to select from when generating top-level topics. 
+`
+    // ... existing topic logic
+    return template
+  }
+
+  private build_requirements_input_template(): string {
+    // ... existing input logic
+  }
+
+  private build_requirements_output_template(): string {
+    // ... existing output logic
+  }

   private requirements_eval_template(
     evaluator: Eval,
     task: Task,
     task_type: "topics" | "inputs" | "outputs",
   ): string {
     const requirements = (task?.requirements || []).map((r) => ({
       name: r.name,
       instruction: r.instruction,
     }))
     // Add overall score requirement
     requirements.push({
       name: "Overall Score",
       instruction: "An overall score from 1 to 5 stars.",
     })

+    switch (task_type) {
+      case "topics":
+        return this.build_requirements_topic_template(requirements)
+      case "inputs":
+        return this.build_requirements_input_template()
+      case "outputs":
+        return this.build_requirements_output_template()
+    }

333-454: Similar refactoring opportunity for issue template method.

This method also handles multiple template types and could benefit from the same modularization approach suggested for the requirements template method.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e8d127 and 47ffd51.

📒 Files selected for processing (3)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (9 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
🔇 Additional comments (6)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (2)

150-193: Good resolution of TODO comments from previous reviews.

The apply_selected_template method now properly generates unique guidance for each type (topics, inputs, outputs) instead of setting identical values. This addresses the code duplication concern raised in previous reviews.


527-811: Appropriate handling of sensitive content for eval purposes.

The static templates correctly include an editor's note explaining that the harmful content is intentional for building safety evaluations. The disclaimer clearly states the purpose and context.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (4)

29-31: Excellent memory leak prevention.

The addition of onDestroy to clean up the guidance data model subscriptions properly addresses the concern raised in previous reviews about potential memory leaks.


85-114: Good async loading flow with proper error handling.

The refactored onMount properly sequences the task loading before guidance data initialization, and includes appropriate error handling for both operations.


278-278: Clean integration with the new guidance model.

The transition from human_guidance string to guidance_data.guidance_for_type("outputs") maintains the same functionality while leveraging the more structured approach.


534-535: Consistent usage of guidance data model throughout the UI.

The SynthDataGuidance component properly receives the guidance_data instance, maintaining consistency with the new architecture.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1)

127-127: Consider using more type-safe mock objects.

Multiple instances of as unknown as Type casting bypass TypeScript's type checking. While sometimes necessary for tests, consider using partial types or mock builders for better type safety.

For example, instead of:

const mockEval = {
  id: "eval1",
  name: "Test Eval",
  template: "kiln_issue",
  template_properties: {},
} as unknown as Eval

Consider using TypeScript's Partial type or a mock builder:

const mockEval: Partial<Eval> = {
  id: "eval1",
  name: "Test Eval",
  template: "kiln_issue",
  template_properties: {},
}
// Then cast only when necessary at the point of use

Also applies to: 390-390, 408-408, 417-417, 457-457, 462-462, 486-486, 542-542, 560-560
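A sketch of the suggestion, with an `Eval` stand-in type (the real one is generated from the API schema), showing the single cast at the point of use:

```typescript
// Stand-in for the generated Eval type; fields are illustrative.
interface Eval {
  id: string
  name: string
  template: string
  template_properties: Record<string, unknown>
  description?: string
}

// Partial<Eval> keeps the compiler checking the fields we do provide.
const mockEval: Partial<Eval> = {
  id: "eval1",
  name: "Test Eval",
  template: "kiln_issue",
  template_properties: {},
}

function evalLabel(e: Eval): string {
  return `${e.name} (${e.template})`
}

// Cast once, where the full type is actually required.
const label = evalLabel(mockEval as Eval)
```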

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47ffd51 and 099a302.

📒 Files selected for processing (4)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte (2 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1 hunks)
  • libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py (2 hunks)
  • libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/data_gen_intro.svelte
  • libs/core/kiln_ai/adapters/data_gen/test_data_gen_task.py
  • libs/core/kiln_ai/adapters/data_gen/data_gen_prompts.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.ts (1)
  • SynthDataGuidanceDataModel (16-511)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-13)
🔇 Additional comments (1)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts (1)

15-643: Comprehensive test coverage!

The test suite provides excellent coverage of the SynthDataGuidanceDataModel class, including:

  • Initialization and lifecycle management
  • Loading with various configurations
  • Error handling
  • Template application logic
  • Reactive store behavior
  • Edge cases like empty requirements and missing properties

The tests are well-structured and follow good testing practices with proper setup/teardown.

model.input_guidance.set("existing")
model.output_guidance.set("existing")

model["apply_selected_template"]("custom")

🛠️ Refactor suggestion

Consider refactoring to avoid testing private methods directly.

The test suite extensively tests private methods using bracket notation (e.g., model["apply_selected_template"]). This is generally considered an anti-pattern because:

  1. It breaks encapsulation
  2. Makes tests brittle to implementation changes
  3. Suggests these methods might need to be part of the public API

Consider either:

  • Testing these behaviors through the public interface
  • Making these methods protected/public if they represent important contracts
  • Extracting them to separate utility functions that can be tested independently

Also applies to: 373-373, 392-393, 420-421, 444-448, 464-468, 488-488, 504-504, 544-544, 562-562, 569-569, 578-578

🤖 Prompt for AI Agents
In
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts
around line 365 and other specified lines, the test suite is directly accessing
private methods using bracket notation, which breaks encapsulation and makes
tests brittle. Refactor the tests to avoid calling private methods directly by
either testing the related functionality through the public API, changing these
methods to protected or public if they represent important behaviors, or
extracting the logic into separate utility functions that can be tested
independently.
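One way to act on this suggestion — with hypothetical names, not the actual SynthDataGuidanceDataModel API — is to extract the template-application logic into a pure function that tests can import directly:

```typescript
// Hypothetical extraction of the private template-application logic into a
// pure function. Names here are illustrative, not the real model's API.
export type TemplateId = "custom" | "requirements_eval"

export function resolveGuidance(
  selected: TemplateId,
  existingGuidance: string,
  templateText: string,
): string {
  // "custom" keeps whatever the user already typed; any concrete template
  // replaces the guidance with the template body.
  return selected === "custom" ? existingGuidance : templateText
}
```

Tests would then assert on `resolveGuidance` directly, and the model's private method becomes a thin wrapper around it, so implementation changes inside the class no longer break the suite.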

Comment on lines +413 to +416
requirements: [
{ instruction: "Be helpful and harmless" },
{ instruction: "Avoid bias" },
],

⚠️ Potential issue

Fix incomplete mock data structure in this test as well.

Same issue as above - the requirements array needs the name property.

Apply this diff:

        requirements: [
-          { instruction: "Be helpful and harmless" },
-          { instruction: "Avoid bias" },
+          { name: "Be helpful and harmless", instruction: "Be helpful and harmless" },
+          { name: "Avoid bias", instruction: "Avoid bias" },
        ],
🤖 Prompt for AI Agents
In
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts
around lines 413 to 416, the mock data for the requirements array is missing the
required 'name' property on each object. Update each object in the requirements
array to include the 'name' property alongside the 'instruction' property to
complete the mock data structure as expected by the test.

Comment on lines +69 to +72
requirements: [
{ instruction: "Be helpful and harmless" },
{ instruction: "Avoid bias" },
],

⚠️ Potential issue

Fix incomplete mock data structure for requirements.

The requirements array is missing the name property that is used in the actual implementation. Based on the requirements_eval_template method in the implementation, requirements should have both name and instruction properties.

Apply this diff to fix the mock data:

      requirements: [
-        { instruction: "Be helpful and harmless" },
-        { instruction: "Avoid bias" },
+        { name: "Be helpful and harmless", instruction: "Be helpful and harmless" },
+        { name: "Avoid bias", instruction: "Avoid bias" },
      ],
🤖 Prompt for AI Agents
In
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth_data_guidance_datamodel.test.ts
around lines 69 to 72, the mock data for the requirements array is missing the
required name property. Update each object in the requirements array to include
both name and instruction properties to match the actual implementation's
expected structure.

@scosman scosman merged commit dadde68 into main Jul 8, 2025
14 checks passed
@scosman scosman deleted the guidance_templates branch July 8, 2025 03:01
@scosman scosman restored the guidance_templates branch July 8, 2025 03:22
@coderabbitai coderabbitai bot mentioned this pull request Jul 21, 2025