feat: implement unified search_events tool with natural language queries #375

dcramer · 2025-07-11T05:46:48Z

Summary

This PR implements a unified search_events tool that replaces the separate find_errors and find_transactions tools, providing a more flexible and powerful way to search Sentry data using natural language queries. Additionally, this PR modernizes the entire evaluation system to use vitest-evals 0.4.0 with tool-call focused testing.

Key Changes

New `search_events` Tool

Natural Language Processing: Uses OpenAI GPT-4o to translate user queries into Sentry API calls
Unified Interface: Single tool handles errors, logs, traces, and spans datasets
Smart Dataset Detection: Automatically determines the appropriate dataset based on query content
Parallel API Calls: Efficiently fetches custom attributes and search results concurrently

Tool Consolidation

Removes find_errors and find_transactions from tool exports
Updates find_errors_in_file prompt to use the new search_events tool
Streamlines the tool interface while maintaining all functionality

Evaluation System Overhaul

Migrated to vitest-evals 0.4.0: Moved from mock-based testing to tool-call focused evaluation
Added ToolPredictionScorer: AI-based scorer that predicts tool calls without execution for faster testing
Natural Language Prompts: Updated all 41 evaluation tests to use realistic user queries instead of explicit tool instructions
Improved Test Coverage: All tests now pass with ≥0.6 confidence thresholds using natural language
Enhanced Mock Infrastructure: Better test fixtures and data generation for realistic scenarios

Dependencies

Adds @ai-sdk/openai for natural language query processing
Integrates with existing Sentry API client infrastructure

Examples

"Show me Python exceptions from the last 24 hours" → errors dataset with time filter
"Find slow API calls over 1 second" → spans dataset with duration filter
"Search for authentication errors" → errors dataset with keyword filter

This change improves both the user experience by allowing more intuitive, natural language queries and the development experience with a modernized, more realistic evaluation system.

codecov · 2025-07-11T05:48:33Z

Codecov Report

Attention: Patch coverage is 84.89388% with 121 lines in your changes missing coverage. Please review.

Project coverage is 62.94%. Comparing base (a69d3a6) to head (903600d).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
packages/mcp-server/src/tools/search-events.ts	80.10%	110 Missing ⚠️
packages/mcp-server/src/prompts.ts	0.00%	5 Missing ⚠️
packages/mcp-server/src/api-client/client.ts	96.82%	4 Missing ⚠️
packages/mcp-server/src/tools/index.ts	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
+ Coverage   62.65%   62.94%   +0.28%     
==========================================
  Files          77       78       +1     
  Lines        6879     7491     +612     
  Branches      620      666      +46     
==========================================
+ Hits         4310     4715     +405     
- Misses       2569     2776     +207

Flag	Coverage Δ
evals	`77.14% <100.00%> (?)`
unittests	`62.74% <83.80%> (+0.08%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

packages/mcp-server-mocks/src/mock-data-generator.ts

dcramer · 2025-07-11T23:55:32Z

Turning into a large patch as I'm trying to clean up evals in addition as I had to expand them..

- Fix update-issue.eval.ts: remove unnecessary find_organizations call when issueUrl provided - Fix prompts.ts: correct template literal interpolation for search_events example - Fix package.json: replace hardcoded vitest path with portable command - Fix ToolPredictionScorer: better handle "org/project" format and team assignment description - Update tool descriptions to accurately reflect capabilities All critical functionality bugs resolved, 38/41 tests now passing. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

….4.0 - Changed all test data from 'expectedTools' to 'expected' field name - Fixed ToolPredictionScorer to follow expected sequences exactly - All 41 tests now passing with perfect scores - Resolved discrepancy between tests expecting/not expecting discovery calls The vitest-evals 0.4.0 requires 'expected' field name for test expectations. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Removed legacy TaskRunner (all tests use SimpleTaskRunner) - Removed unused scoring functions: Factuality, ToolUsage, SearchEventsScorer - Removed detectTool, detectToolWithConfidence, getRelatedTools - Removed TOOL_PATTERNS and all related pattern matching code - Removed captureToolCalls option (always returns TaskResult format) - Simplified TaskRunner options handling Reduced file from 908 lines to 150 lines (83% reduction) while maintaining all functionality. All 41 tests still pass. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Replace \S+ with [^\s]+ in regular expressions to prevent catastrophic backtracking. This addresses the polynomial regular expression security issue flagged in PR review.

Copilot

Pull Request Overview

This PR introduces a unified search_events tool to replace find_errors and find_transactions, updates dependencies (including vitest-evals to 0.4.0), and modernizes testing to use tool-call focused evals.

Add new search_events tool with natural language query support
Remove old find_errors and find_transactions exports
Upgrade to vitest-evals 0.4.0 and switch to ToolPredictionScorer

Reviewed Changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 1 comment.

File	Description
pnpm-workspace.yaml	Added `date-fns` and bumped `vitest-evals` to 0.4.0
packages/mcp-server/src/tools/search-events.ts	Implemented new `search_events` tool
packages/mcp-server/src/tools/search-events.test.ts	Added tests for `search_events`
packages/mcp-server/src/tools/index.ts	Swapped out `find_errors`/`find_transactions` for `search_events`

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

Comments suppressed due to low confidence (3)

packages/mcp-server/src/tools/search-events.ts:173

The rules string contains a stray '+-' prefix. Remove the leading '+' so each item starts with a single '-' for proper list formatting.

- Use level field for severity (error, warning, info, debug)

packages/mcp-server/src/tools/search-events.ts:188

The rules string for logs has an extra '+' before the dash. Remove the '+' so the list marker is just '-'.

- Use severity field for log levels (fatal, error, warning, info, debug, trace)

packages/mcp-server/src/tools/search-events.ts:178

In the examples string, remove the leading '+' before the '-' so the example list renders correctly.

- "unhandled errors in production" → error.handled:false AND environment:production

Copilot · 2025-07-13T16:08:02Z

packages/mcp-server/src/tools/search-events.ts

+      .default(false)
+      .describe("Include explanation of how the query was translated"),
+  },
+  async handler(params, context: ServerContext) {


[nitpick] The handler function spans hundreds of lines and mixes prompt construction, API interaction, and output formatting. Consider extracting parts into smaller helper functions to improve readability and maintainability.

- Extract formatErrorResults() for error-specific formatting (35 lines) - Extract formatLogResults() for log-specific formatting (68 lines) - Extract formatSpanResults() for span-specific formatting (36 lines) - Extract buildSystemPrompt() for AI prompt construction - Extract getProjectId() for project slug to ID conversion Reduces handler from ~320 lines to ~80 lines, improving readability and maintainability without changing functionality.

Instead of fetching all projects with listProjects() and filtering client-side, we now use a dedicated getProject() method that fetches a single project by slug or ID. This is more efficient and reduces API load. - Added getProject() method to SentryApiService - Updated search-events to use getProject() instead of listProjects() - Updated tests to mock the single project endpoint - Maintains the requirement that search API needs numeric project IDs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

packages/mcp-server-mocks/src/mock-data-generator.ts

+  const searchTerms = cleanQuery
+    .replace(/\w+\.\w+:[^\s]+/g, "") // Remove field.subfield:value pairs


cursor

Bug: Misleading Parameter Name Causes API Confusion

The searchEvents method's projectSlug parameter is misleadingly named, as it expects a numeric project ID (as a string) rather than a project slug. While current callers correctly pass stringified project IDs, this naming mismatch creates a confusing API interface that could lead to incorrect usage and future bugs if developers attempt to pass actual project slugs.

packages/mcp-server/src/api-client/client.ts#L1369-L1424

sentry-mcp/packages/mcp-server/src/api-client/client.ts

Lines 1369 to 1424 in 903600d

    
             /** 
        
              * Searches for events in Sentry using a general query. 
        
              * This method is used by the search_events tool for semantic search. 
        
              */ 
        
             async searchEvents( 
        
               { 
        
                 organizationSlug, 
        
                 query, 
        
                 fields, 
        
                 limit = 10, 
        
                 projectSlug, 
        
                 dataset = "spans", 
        
                 statsPeriod, 
        
                 sort = "-timestamp", 
        
               }: { 
        
                 organizationSlug: string; 
        
                 query: string; 
        
                 fields: string[]; 
        
                 limit?: number; 
        
                 projectSlug?: string; 
        
                 dataset?: "spans" | "errors" | "ourlogs"; 
        
                 statsPeriod?: string; 
        
                 sort?: string; 
        
               }, 
        
               opts?: RequestOptions, 
        
             ) { 
        
               const queryParams = new URLSearchParams(); 
        
               queryParams.set("per_page", limit.toString()); 
        
               queryParams.set("query", query); 
        
               queryParams.set("referrer", "sentry-mcp"); 
        
               queryParams.set("dataset", dataset); 
        
               if (statsPeriod) { 
        
                 queryParams.set("statsPeriod", statsPeriod); 
        
               } 
        
               queryParams.set("sort", sort); 
        
               // Add project filter if specified 
        
               if (projectSlug) { 
        
                 queryParams.set("project", projectSlug); 
        
               } 
        
               // Add dataset-specific parameters 
        
               if (dataset === "spans") { 
        
                 queryParams.set("allowAggregateConditions", "0"); 
        
                 queryParams.set("useRpc", "1"); 
        
               } 
        
               // Add each field as a separate parameter 
        
               for (const field of fields) { 
        
                 queryParams.append("field", field); 
        
               } 
        
               const apiUrl = `/organizations/${organizationSlug}/events/?${queryParams.toString()}`; 
        
               return await this.requestJSON(apiUrl, undefined, opts); 
        
             }

Fix in Cursor • Fix in Web

Was this report helpful? Give feedback by reacting with 👍 or 👎

dcramer added 5 commits July 10, 2025 19:15

Spec for new search_events

9581bbf

mvp implementation

f3822c8

various fixes

0bf7e94

pure codegen trace-items implementation

0a8f6e9

it works

12039a1

dcramer had a problem deploying to Actions July 11, 2025 05:46 — with GitHub Actions Failure

This comment was marked as outdated.

Sign in to view

clean up specs

8097a8f

dcramer had a problem deploying to Actions July 11, 2025 05:49 — with GitHub Actions Failure

This comment was marked as outdated.

Sign in to view

Improve mocks

df1b0a5

dcramer had a problem deploying to Actions July 11, 2025 06:03 — with GitHub Actions Failure

This comment was marked as outdated.

Sign in to view

remove debug logs

b36d3bc

dcramer had a problem deploying to Actions July 11, 2025 06:10 — with GitHub Actions Failure

This comment was marked as outdated.

Sign in to view

append fields, dont set

81c10f9

dcramer had a problem deploying to Actions July 11, 2025 15:26 — with GitHub Actions Failure

remove code

2710a2b

dcramer had a problem deploying to Actions July 11, 2025 15:36 — with GitHub Actions Failure

revert previous change

4cd1840

dcramer had a problem deploying to Actions July 11, 2025 15:46 — with GitHub Actions Failure

Clean up prompt

e7722a5

dcramer had a problem deploying to Actions July 11, 2025 16:10 — with GitHub Actions Failure

better errors

f54c42a

dcramer had a problem deploying to Actions July 11, 2025 16:48 — with GitHub Actions Failure

improvements

0cc872b

dcramer had a problem deploying to Actions July 11, 2025 17:45 — with GitHub Actions Failure

Unused import

8bbbe48

This comment was marked as outdated.

Sign in to view

optimize evals

03ccdc6

dcramer had a problem deploying to Actions July 11, 2025 23:49 — with GitHub Actions Failure

github-advanced-security bot found potential problems Jul 11, 2025

View reviewed changes

packages/mcp-server-mocks/src/mock-data-generator.ts Fixed Show fixed Hide fixed

first phase of optimization

7cf5e25

dcramer had a problem deploying to Actions July 12, 2025 00:41 — with GitHub Actions Failure

rescore tests

ccc86e1

dcramer had a problem deploying to Actions July 13, 2025 04:24 — with GitHub Actions Failure

This comment was marked as outdated.

Sign in to view

dcramer requested a review from Copilot July 13, 2025 04:55

This comment was marked as outdated.

Sign in to view

dcramer had a problem deploying to Actions July 13, 2025 15:07 — with GitHub Actions Failure

dcramer requested a review from Copilot July 13, 2025 15:09

This comment was marked as outdated.

Sign in to view

dcramer temporarily deployed to Actions July 13, 2025 15:21 — with GitHub Actions Inactive

dcramer temporarily deployed to Actions July 13, 2025 15:37 — with GitHub Actions Inactive

dcramer requested a review from Copilot July 13, 2025 15:47

This comment was marked as outdated.

Sign in to view

fix: resolve ReDoS vulnerability in mock-data-generator regex patterns

6038199

Replace \S+ with [^\s]+ in regular expressions to prevent catastrophic backtracking. This addresses the polynomial regular expression security issue flagged in PR review.

dcramer requested a review from Copilot July 13, 2025 16:06

Copilot AI reviewed Jul 13, 2025

View reviewed changes

dcramer and others added 2 commits July 13, 2025 09:30

dcramer deployed to Actions July 13, 2025 16:47 — with GitHub Actions Active

github-advanced-security bot found potential problems Jul 13, 2025

View reviewed changes

cursor bot reviewed Jul 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: implement unified search_events tool with natural language queries #375

feat: implement unified search_events tool with natural language queries #375

dcramer commented Jul 11, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jul 11, 2025 •

edited

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

dcramer commented Jul 11, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jul 13, 2025

Uh oh!

Check failure

cursor bot left a comment

Uh oh!

Uh oh!

		const searchTerms = cleanQuery
		.replace(/\w+\.\w+:[^\s]+/g, "") // Remove field.subfield:value pairs


	/**
	* Searches for events in Sentry using a general query.
	* This method is used by the search_events tool for semantic search.
	*/
	async searchEvents(
	{
	organizationSlug,
	query,
	fields,
	limit = 10,
	projectSlug,
	dataset = "spans",
	statsPeriod,
	sort = "-timestamp",
	}: {
	organizationSlug: string;
	query: string;
	fields: string[];
	limit?: number;
	projectSlug?: string;
	dataset?: "spans" \| "errors" \| "ourlogs";
	statsPeriod?: string;
	sort?: string;
	},
	opts?: RequestOptions,
	) {
	const queryParams = new URLSearchParams();
	queryParams.set("per_page", limit.toString());
	queryParams.set("query", query);
	queryParams.set("referrer", "sentry-mcp");
	queryParams.set("dataset", dataset);
	if (statsPeriod) {
	queryParams.set("statsPeriod", statsPeriod);
	}
	queryParams.set("sort", sort);

	// Add project filter if specified
	if (projectSlug) {
	queryParams.set("project", projectSlug);
	}

	// Add dataset-specific parameters
	if (dataset === "spans") {
	queryParams.set("allowAggregateConditions", "0");
	queryParams.set("useRpc", "1");
	}

	// Add each field as a separate parameter
	for (const field of fields) {
	queryParams.append("field", field);
	}

	const apiUrl = `/organizations/${organizationSlug}/events/?${queryParams.toString()}`;
	return await this.requestJSON(apiUrl, undefined, opts);
	}

feat: implement unified search_events tool with natural language queries #375

Are you sure you want to change the base?

feat: implement unified search_events tool with natural language queries #375

Conversation

dcramer commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

New search_events Tool

Tool Consolidation

Evaluation System Overhaul

Dependencies

Examples

Uh oh!

codecov bot commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

dcramer commented Jul 11, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Jul 13, 2025

Choose a reason for hiding this comment

Uh oh!

Check failure

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Bug: Misleading Parameter Name Causes API Confusion

Uh oh!

Uh oh!

dcramer commented Jul 11, 2025 •

edited

Loading

New `search_events` Tool

codecov bot commented Jul 11, 2025 •

edited

Loading