|
| 1 | +--- |
| 2 | +agent: agent |
| 3 | +--- |
| 4 | + |
| 5 | +# Run a query and describe its false positives |
| 6 | + |
| 7 | +## Task |
| 8 | + |
| 9 | +Help a developer discover what kinds of false positives are produced by their current CodeQL query, and which of those false positive cases are most common. |
| 10 | + |
| 11 | +### Task steps: |
| 12 | + |
| 13 | +1. Read the provided CodeQL query to understand what patterns it is designed to detect. |
| 14 | +2. Discover the results of this query on a real database, by: |
| 15 | + - Running the tool `list_query_run_results` to find existing runs for this query |
| 16 | + - If no existing runs are found, run the query on a relevant database using the `codeql_query_run` tool |
| 17 | +3. Analyze and group the results into what appear to be similar types of results. This may mean: |
| 18 | + - Grouping results in the same file |
| 19 | + - Grouping results that reference the same elements |
| 20 | + - Grouping results with similar messages |
| 21 | +4. For each group, explore the actual code for a sample of alerts in that group, using the `read_database_source` tool to triage the results and determine which groups appear to be false positives |
| 22 | +5. For each false positive case discovered in this exploration, group them into categories of similar root causes. For example, a query might not properly account for unreachable code, or there may be a commonly used library that violates the query's assumptions but is actually safe. |
| 23 | +6. Explain these results to the user in order of most common to least common, so they can understand where their query may need improvement to reduce false positives. |
| 24 | + |
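The grouping in step 3 can be sketched as follows. This is a minimal illustration, assuming the query results have already been parsed into dictionaries shaped like SARIF result objects (`ruleId`, `message.text`). The prompt does not specify the output format of `list_query_run_results`, so that shape and the helper name `group_results` are hypothetical.

```python
from collections import defaultdict


def group_results(results):
    """Group SARIF-style result dicts by (rule id, message text).

    Assumption: each result is a dict with optional "ruleId" and
    "message": {"text": ...} keys, as in the SARIF examples in this prompt.
    """
    groups = defaultdict(list)
    for result in results:
        key = (result.get("ruleId"), result.get("message", {}).get("text", ""))
        groups[key].append(result)
    # Largest groups first, so the most common result types are triaged first.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

Grouping by file or by referenced element would follow the same pattern with a different key.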
| 25 | +## Input Context |
| 26 | + |
| 27 | +You will be provided with: |
| 28 | + |
| 29 | +- **Query Path**: The path to the CodeQL query |
| 30 | + |
| 31 | +## Analysis Guidelines |
| 32 | + |
| 33 | +### Exploring code paths |
| 34 | + |
| 35 | +The tool `read_database_source` can be used to read the code of a particular finding. A good strategy to explore the code paths of a finding is: |
| 36 | + |
| 37 | +1. Read in the immediate context of the violation. |
| 38 | + - Some queries may depend on later context (e.g., an "unused variable" may only be used after its declaration) |
| 39 | + - Some queries may depend on earlier context (e.g., "mutex not initialized" requires observing code that ran before the violation) |
| 40 | + - Some queries may rely on exploring other function calls. |
| 41 | +2. Read the interprocedural context of the violation. |
| 42 | + - If understanding a violation requires checking other source locations, read the beginning of the file to find what it imports, and then read those imported files to find the relevant code. |
| 43 | + - Selectively scan interprocedural paths. A false positive that requires tremendous code exploration to verify is less problematic than a false positive that only requires checking a small amount of code to verify. |
| 44 | +3. Stop early. |
| 45 | + - Grouping the potential false positive cases is more important than exhaustively verifying every single finding. |
| 46 | + - A common false positive pattern likely also produces some cases that are very hard to verify, so it is usually better to focus on simple cases first. |
| 47 | + - Truly hard-to-verify false positive cases are often in code that users don't expect to be conducive to static analysis, and query authors often don't expect their queries to work well in those cases. |
| 48 | + - Suggest a chainsaw approach rather than a scalpel: if a result may be a false positive, identify some simple heuristics to eliminate all such complex cases, even if such a heuristic could introduce false negatives. |
| 49 | + |
| 50 | +### What Makes a Result Likely to be a False Positive? |
| 51 | + |
| 52 | +1. **Pattern Mismatches** |
| 53 | + - Query pattern doesn't accurately match the actual code behavior |
| 54 | + - Missing context that would show the code is safe |
| 55 | + - Overly broad query logic catching benign variations |
| 56 | + |
| 57 | +2. **Safe Code Patterns** |
| 58 | + - Code includes proper validation before the flagged operation |
| 59 | + - Results in test code, example code, or mock implementations |
| 60 | + - Variable/function names suggesting test/example context (e.g., `testFunc`, `exampleVar`, `mockData`) |
| 61 | + - Defensive programming patterns present but not recognized by query |
| 62 | + |
| 63 | +3. **Context Indicators** |
| 64 | + - File paths suggesting test/example files (e.g., `test/`, `examples/`, `mock/`) |
| 65 | + - Comments indicating intentional patterns or safe usage |
| 66 | + - Framework-specific patterns that are safe in context |
| 67 | + |
| 68 | +4. **Technical Factors** |
| 69 | + - Type mismatches that make the vulnerability impossible |
| 70 | + - Control flow that prevents exploitation |
| 71 | + - Data flow interrupted by sanitization not captured in query |
| 72 | + |
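As one illustration of the "Context Indicators" above, a file-path check might look like the sketch below. The directory names come from the examples in this prompt (`test/`, `examples/`, `mock/`); the function name and the exact-match rule are assumptions. A match is only a hint, not evidence on its own.

```python
from pathlib import PurePosixPath

# Directory names taken from the examples in this prompt; extend as needed.
TEST_CONTEXT_DIRS = {"test", "examples", "mock"}


def looks_like_test_path(path: str) -> bool:
    """Return True if any directory component suggests test/example code."""
    parts = PurePosixPath(path.replace("\\", "/")).parts
    # Only inspect directories, not the file name itself.
    return any(part.lower() in TEST_CONTEXT_DIRS for part in parts[:-1])
```

Exact matching on whole path components avoids flagging directories such as `latest/` that merely contain the substring `test`.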
| 73 | +### Important Constraints |
| 74 | + |
| 75 | +- **Be objective**: Don't be influenced by variable/function naming suggesting importance (e.g., "prodOnly" vs "test") |
| 76 | +- **Require evidence**: Base conclusions on actual code patterns, not assumptions |
| 77 | +- **Mark uncertainty**: Use lower confidence scores when code snippets are missing |
| 78 | +- **Avoid false confidence**: If you cannot determine FP status, mark confidence as low |
| 79 | + |
| 80 | + |
| 81 | +## Output Format |
| 82 | + |
| 83 | +Return a JSON array of ranked results, ordered by FP likelihood (highest first): |
| 84 | + |
| 85 | +```json |
| 86 | +[ |
| 87 | + { |
| 88 | + "ruleId": "query-id", |
| 89 | + "message": { "text": "original result message" }, |
| 90 | + "locations": [...], |
| 91 | + "confidence": 0.85, |
| 92 | + "reasoning": "This appears to be a false positive because: (1) the file path 'test/examples/mock-data.js' indicates test code, (2) variable name 'testInput' suggests this is test data, (3) the query doesn't account for the validation on line X.", |
| 93 | + "sourceFile": "results-1.sarif", |
| 94 | + "resultIndex": 42 |
| 95 | + } |
| 96 | +] |
| 97 | +``` |
| 98 | + |
| 99 | +### Confidence Score Guidelines |
| 100 | + |
| 101 | +- **0.8-1.0**: Strong evidence of FP (e.g., test code with clear safety patterns) |
| 102 | +- **0.6-0.8**: Good evidence of FP (e.g., defensive patterns present) |
| 103 | +- **0.4-0.6**: Moderate evidence of FP (e.g., context suggests safety but not conclusive) |
| 104 | +- **0.2-0.4**: Weak evidence of FP (e.g., minor indicators, missing code snippets) |
| 105 | +- **0.0-0.2**: Minimal evidence (uncertain, need more information) |
| 106 | + |
| 107 | +## Examples |
| 108 | + |
| 109 | +### High-Confidence FP Example |
| 110 | + |
| 111 | +```json |
| 112 | +{ |
| 113 | + "ruleId": "js/sql-injection", |
| 114 | + "message": { "text": "Potential SQL injection" }, |
| 115 | + "confidence": 0.9, |
| 116 | + "reasoning": "False positive: (1) File path 'test/unit/database-mock.test.js' indicates test code, (2) Query doesn't recognize that 'sanitizeInput()' function is called before database query, (3) Variable name 'mockUserInput' suggests test data. Missing code snippet limits certainty to 0.9 instead of 1.0." |
| 117 | +} |
| 118 | +``` |
| 119 | + |
| 120 | +### Low-Confidence FP Example |
| 121 | + |
| 122 | +```json |
| 123 | +{ |
| 124 | + "ruleId": "js/path-injection", |
| 125 | + "message": { "text": "Potential path traversal" }, |
| 126 | + "confidence": 0.3, |
| 127 | + "reasoning": "Possibly a false positive: (1) No code snippets available in SARIF, (2) File path 'src/utils/fileHandler.js' doesn't indicate test code, (3) Cannot verify if proper path validation exists. Low confidence due to missing context." |
| 128 | +} |
| 129 | +``` |
| 130 | + |
| 131 | +## Processing Instructions |
| 132 | + |
| 133 | +1. **Review each SARIF result** in the provided array |
| 134 | +2. **Analyze available context** (code snippets, file paths, messages) |
| 135 | +3. **Compare against query logic** to understand what pattern was detected |
| 136 | +4. **Identify FP indicators** based on guidelines above |
| 137 | +5. **Assign confidence score** reflecting evidence strength |
| 138 | +6. **Write clear reasoning** explaining your assessment |
| 139 | +7. **Sort results** by confidence score (descending) |
| 140 | +8. **Return top N results** as requested (or all if N not specified) |
| 141 | + |
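Steps 7 and 8 amount to a sort followed by an optional truncation. A minimal sketch, assuming each result is a dictionary carrying the `confidence` field from the output format above (the function name `rank_results` is hypothetical):

```python
def rank_results(results, top_n=None):
    """Sort results by confidence (highest first) and keep the top N, if given."""
    ranked = sorted(results, key=lambda r: r["confidence"], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```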
| 142 | +## Important Notes |
| 143 | + |
| 144 | +- A result should never appear in both FP and TP rankings |
| 145 | +- When in doubt, prefer lower confidence scores |
| 146 | +- Missing code snippets should always reduce confidence |
| 147 | +- Test/example code is more likely to be FP but not always |
| 148 | +- Focus on technical evidence, not code naming conventions |