Commit f78b9f7

Add a new prompt & tool for diagnosing FPs/FNs from query runs.
Rather than relying on SARIF, give the MCP server a tool that lets agents read source code from the zip archive of a database. Create a prompt that instructs the agent to look for prior runs, rerun the query if necessary, and then use this tool to find FPs/FNs.
1 parent 532791a commit f78b9f7

11 files changed, with 789 additions and 17 deletions

docs/ql-mcp/prompts.md

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,7 @@
 
 ## Overview
 
-The server provides **10 prompts** that guide AI assistants through common CodeQL development workflows. Each prompt is backed by a `*.prompt.md` file containing structured instructions.
+The server provides **11 prompts** that guide AI assistants through common CodeQL development workflows. Each prompt is backed by a `*.prompt.md` file containing structured instructions.
 
 ## Prompt Reference
 
@@ -17,6 +17,7 @@ The server provides **10 prompts** that guide AI assistants through common CodeQ
 | `ql_tdd_basic` | Test-driven CodeQL query development checklist — write tests first, implement query, iterate until tests pass |
 | `sarif_rank_false_positives` | Analyze SARIF results to identify likely false positives in CodeQL query results |
 | `sarif_rank_true_positives` | Analyze SARIF results to identify likely true positives in CodeQL query results |
+| `run_query_and_summarize_false_positives` | Run a CodeQL query and summarize its false positives |
 | `test_driven_development` | Test-driven development workflow for CodeQL queries using MCP tools |
 | `tools_query_workflow` | Guide for using built-in tools queries (PrintAST, PrintCFG, CallGraphFrom, CallGraphTo) to understand code structure |
 | `workshop_creation_workflow` | Guide for creating CodeQL query development workshops from production-grade queries |

package-lock.json

Lines changed: 23 additions & 12 deletions
Some generated files are not rendered by default.

server/package.json

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@
5656
},
5757
"dependencies": {
5858
"@modelcontextprotocol/sdk": "^1.26.0",
59+
"adm-zip": "^0.5.16",
5960
"cors": "^2.8.6",
6061
"dotenv": "^17.3.1",
6162
"express": "^5.2.1",
@@ -65,6 +66,7 @@
6566
},
6667
"devDependencies": {
6768
"@eslint/js": "^10.0.1",
69+
"@types/adm-zip": "^0.5.7",
6870
"@types/cors": "^2.8.19",
6971
"@types/express": "^5.0.6",
7072
"@types/js-yaml": "^4.0.9",
Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+---
+agent: agent
+---
+
+# Run a query and describe its false positives
+
+## Task
+
+Help a developer discover what kinds of false positives are produced by their current CodeQL query, and which of those false positive cases are most common.
+
+### Task steps:
+
+1. Read the provided CodeQL query to understand what patterns it is designed to detect.
+2. Discover the results of this query on a real database, by:
+   - Running the tool `list_query_run_results` to find existing runs for this query
+   - If no existing runs are found, run the query on a relevant database using the `codeql_query_run` tool
+3. Analyze and group the results into what appear to be similar types of results. This may mean:
+   - Grouping results in the same file
+   - Grouping results that reference the same elements
+   - Grouping results with similar messages
+4. For each group, explore the actual code for a sample of alerts in that group, using the `read_database_source` tool to triage the results and determine which groups appear to be false positives
+5. For each false positive case discovered in this exploration, group them into categories of similar root causes. For example, a query might not properly account for unreachable code, or there may be a commonly used library that violates the query's assumptions but is actually safe.
+6. Explain these results to the user in order of most common to least common, so they can understand where their query may need improvement to reduce false positives.
+
+## Input Context
+
+You will be provided with:
+
+- **Query Path**: The path to the CodeQL query
+
+## Analysis Guidelines
+
+### Exploring code paths
+
+The tool `read_database_source` can be used to read the code of a particular finding. A good strategy to explore the code paths of a finding is:
+
+1. Read in the immediate context of the violation.
+   - Some queries may depend on later context (e.g., an "unused variable" may only be used after its declaration)
+   - Some queries may depend on earlier context (e.g., "mutex not initialized" requires observing code that ran before the violation)
+   - Some queries may rely on exploring other function calls.
+2. Read the interprocedural context of the violation.
+   - If understanding a violation requires checking other source locations, read the beginning of the file to find what it imports, and then read those imported files to find the relevant code.
+   - Selectively scan interprocedural paths. A false positive that requires tremendous code exploration to verify is less problematic than a false positive that only requires checking a small amount of code to verify.
+3. Stop early.
+   - Grouping the potential false positive cases is more important than exhaustively verifying every single finding.
+   - A common false positive pattern likely also produces some cases that are very hard to verify, so it is usually better to focus on simple cases first.
+   - Truly hard-to-verify false positive cases are often in code that users don't expect to be conducive to static analysis, and query authors often don't expect their queries to work well in those cases.
+   - Suggest a chainsaw approach rather than a scalpel - if a result may be a false positive, identify some simple heuristics to eliminate all such complex cases, even if such a heuristic could introduce false negatives.
+
+### What Makes a Result Likely to be a False Positive?
+
+1. **Pattern Mismatches**
+   - Query pattern doesn't accurately match the actual code behavior
+   - Missing context that would show the code is safe
+   - Overly broad query logic catching benign variations
+
+2. **Safe Code Patterns**
+   - Code includes proper validation before the flagged operation
+   - Results in test code, example code, or mock implementations
+   - Variable/function names suggesting test/example context (e.g., `testFunc`, `exampleVar`, `mockData`)
+   - Defensive programming patterns present but not recognized by query
+
+3. **Context Indicators**
+   - File paths suggesting test/example files (e.g., `test/`, `examples/`, `mock/`)
+   - Comments indicating intentional patterns or safe usage
+   - Framework-specific patterns that are safe in context
+
+4. **Technical Factors**
+   - Type mismatches that make the vulnerability impossible
+   - Control flow that prevents exploitation
+   - Data flow interrupted by sanitization not captured in query
+
+### Important Constraints
+
+- **Be objective**: Don't be influenced by variable/function naming suggesting importance (e.g., "prodOnly" vs "test")
+- **Require evidence**: Base conclusions on actual code patterns, not assumptions
+- **Mark uncertainty**: Use lower confidence scores when code snippets are missing
+- **Avoid false confidence**: If you cannot determine FP status, mark confidence as low
+
+
+## Output Format
+
+Return a JSON array of ranked results, ordered by FP likelihood (highest first):
+
+```json
+[
+  {
+    "ruleId": "query-id",
+    "message": { "text": "original result message" },
+    "locations": [...],
+    "confidence": 0.85,
+    "reasoning": "This appears to be a false positive because: (1) the file path 'test/examples/mock-data.js' indicates test code, (2) variable name 'testInput' suggests this is test data, (3) the query doesn't account for the validation on line X.",
+    "sourceFile": "results-1.sarif",
+    "resultIndex": 42
+  }
+]
+```
+
+### Confidence Score Guidelines
+
+- **0.8-1.0**: Strong evidence of FP (e.g., test code with clear safety patterns)
+- **0.6-0.8**: Good evidence of FP (e.g., defensive patterns present)
+- **0.4-0.6**: Moderate evidence of FP (e.g., context suggests safety but not conclusive)
+- **0.2-0.4**: Weak evidence of FP (e.g., minor indicators, missing code snippets)
+- **0.0-0.2**: Minimal evidence (uncertain, need more information)
+
+## Examples
+
+### High-Confidence FP Example
+
+```json
+{
+  "ruleId": "js/sql-injection",
+  "message": { "text": "Potential SQL injection" },
+  "confidence": 0.9,
+  "reasoning": "False positive: (1) File path 'test/unit/database-mock.test.js' indicates test code, (2) Query doesn't recognize that 'sanitizeInput()' function is called before database query, (3) Variable name 'mockUserInput' suggests test data. Missing code snippet limits certainty to 0.9 instead of 1.0."
+}
+```
+
+### Low-Confidence FP Example
+
+```json
+{
+  "ruleId": "js/path-injection",
+  "message": { "text": "Potential path traversal" },
+  "confidence": 0.3,
+  "reasoning": "Possibly a false positive: (1) No code snippets available in SARIF, (2) File path 'src/utils/fileHandler.js' doesn't indicate test code, (3) Cannot verify if proper path validation exists. Low confidence due to missing context."
+}
+```
+
+## Processing Instructions
+
+1. **Review each SARIF result** in the provided array
+2. **Analyze available context** (code snippets, file paths, messages)
+3. **Compare against query logic** to understand what pattern was detected
+4. **Identify FP indicators** based on guidelines above
+5. **Assign confidence score** reflecting evidence strength
+6. **Write clear reasoning** explaining your assessment
+7. **Sort results** by confidence score (descending)
+8. **Return top N results** as requested (or all if N not specified)
+
+## Important Notes
+
+- A result should never appear in both FP and TP rankings
+- When in doubt, prefer lower confidence scores
+- Missing code snippets should always reduce confidence
+- Test/example code is more likely to be FP but not always
+- Focus on technical evidence, not code naming conventions
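
Reviewer note: task steps 3, 5, and 6 of the prompt above (group results, assign root causes, report most common first) can be illustrated with a small sketch. It is not part of the commit. The `TriagedAlert` shape, `groupBy`, `rankRootCauses`, and the sample data are all hypothetical names chosen for illustration, and the root cause is assumed to have been assigned by the agent during triage.

```typescript
// Hypothetical shape for one triaged alert (an assumption, not taken from the commit).
interface TriagedAlert {
  file: string;
  message: string;
  // Root cause assigned during triage; null when the alert looks like a true positive.
  rootCause: string | null;
}

// Group items by a string key (step 3: same file, same message, and so on).
function groupBy<T>(items: T[], key: (item: T) => string): Record<string, T[]> {
  // A prototype-free object avoids collisions with keys such as "constructor".
  const groups: Record<string, T[]> = Object.create(null);
  for (const item of items) {
    const k = key(item);
    if (!groups[k]) {
      groups[k] = [];
    }
    groups[k].push(item);
  }
  return groups;
}

// Count false positives per root cause and sort most common first (steps 5 and 6).
function rankRootCauses(alerts: TriagedAlert[]): Array<{ rootCause: string; count: number }> {
  const causes: string[] = [];
  for (const alert of alerts) {
    if (alert.rootCause !== null) {
      causes.push(alert.rootCause);
    }
  }
  const byCause = groupBy(causes, (cause) => cause);
  return Object.keys(byCause)
    .map((rootCause) => ({ rootCause, count: byCause[rootCause].length }))
    .sort((a, b) => b.count - a.count || (a.rootCause < b.rootCause ? -1 : 1));
}

// Made-up sample data for illustration only.
const sampleAlerts: TriagedAlert[] = [
  { file: 'src/a.c', message: 'Unused variable x', rootCause: 'unreachable code' },
  { file: 'src/a.c', message: 'Unused variable y', rootCause: 'unreachable code' },
  { file: 'src/b.c', message: 'Unused variable z', rootCause: 'safe library wrapper' },
  { file: 'src/c.c', message: 'Unused variable w', rootCause: null },
];

const alertsByFile = groupBy(sampleAlerts, (alert) => alert.file);
const rankedCauses = rankRootCauses(sampleAlerts);
```

Grouping by file here stands in for any of the grouping keys the prompt lists; passing `(alert) => alert.message` instead would group by message.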

server/src/prompts/workflow-prompts.ts

Lines changed: 43 additions & 0 deletions
@@ -158,6 +158,19 @@ export const sarifRankSchema = z.object({
     .describe('Path to the SARIF file to analyze'),
 });
 
+/**
+ * Schema for run_query_and_summarize_false_positives prompt parameters.
+ *
+ * The only parameter, `queryPath`, is optional.
+ */
+export const describeFalsePositivesSchema = z.object({
+  queryPath: z
+    .string()
+    .optional()
+    .describe('Path to the CodeQL query file'),
+});
+
+
 /**
  * Schema for explain_codeql_query prompt parameters.
  *
@@ -222,6 +235,7 @@ export const WORKFLOW_PROMPT_NAMES = [
   'ql_lsp_iterative_development',
   'ql_tdd_advanced',
   'ql_tdd_basic',
+  'run_query_and_summarize_false_positives',
   'sarif_rank_false_positives',
   'sarif_rank_true_positives',
   'test_driven_development',
@@ -479,6 +493,35 @@ export function registerWorkflowPrompts(server: McpServer): void {
     }
   );
 
+  // Run a query and describe its false positives
+  server.prompt(
+    'run_query_and_summarize_false_positives',
+    'Help a user figure out where their query may need improvement to have a lower false positive rate',
+    describeFalsePositivesSchema.shape,
+    async ({ queryPath }) => {
+      const template = loadPromptTemplate('run-query-and-summarize-false-positives.prompt.md');
+
+      let contextSection = '## Analysis Context\n\n';
+      if (queryPath) {
+        contextSection += `- **Query Path**: ${queryPath}\n`;
+      }
+
+      contextSection += '\n';
+
+      return {
+        messages: [
+          {
+            role: 'user',
+            content: {
+              type: 'text',
+              text: contextSection + template
+            }
+          }
+        ]
+      };
+    }
+  );
+
   // Explain CodeQL Query Prompt (for workshop learning content)
   server.prompt(
     'explain_codeql_query',
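
Reviewer note: the context-building logic in the new handler can be isolated as a pure function, which makes its behavior with and without `queryPath` easy to check. The sketch below is not part of the commit. `buildContextSection` and `buildPromptText` are hypothetical helper names, and the template string is a placeholder rather than the real prompt file.

```typescript
// Hypothetical helper mirroring the handler's context-building logic.
function buildContextSection(queryPath?: string): string {
  let section = '## Analysis Context\n\n';
  if (queryPath) {
    // Only emitted when the caller supplied a query path.
    section += `- **Query Path**: ${queryPath}\n`;
  }
  return section + '\n';
}

// Hypothetical helper: prepend the context section to a prompt template.
function buildPromptText(template: string, queryPath?: string): string {
  return buildContextSection(queryPath) + template;
}

// Placeholder template text and a made-up query path, for illustration only.
const withPath = buildPromptText('# Template body', 'src/MyQuery.ql');
const withoutPath = buildPromptText('# Template body');
```

Under this reading, omitting `queryPath` still emits an `## Analysis Context` heading with no entries beneath it. Whether that is intended is a question for the author; skipping the heading when no context is supplied would be one alternative.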
