| name | improve-codeql-query-detection-cpp |
|---|---|
| description | Improve detection capabilities of existing CodeQL queries for C++ by analyzing false positives and false negatives, refining query logic, and validating improvements through testing. Use this skill when you need to enhance the accuracy and coverage of C++ security or code quality queries using the CodeQL Development MCP Server tools. |
This skill guides you through improving the detection capabilities of existing CodeQL queries for C++ code by reducing false positives and catching more true positives (reducing false negatives).
- Improving detection accuracy of an existing C++ CodeQL query
- Reducing false positives (incorrect alerts) from a query
- Catching more true vulnerabilities (reducing false negatives)
- Refining query logic based on real-world code patterns
- Enhancing query precision through better data flow analysis
- Optimizing query performance while maintaining accuracy
Before improving a query, ensure you have:
- An existing CodeQL query (`.ql` file) for C++ that you want to improve
- Existing test cases for the query
- Understanding of what the query is intended to detect
- Access to CodeQL Development MCP Server tools
- Examples of false positives or false negatives to address
- A query pack directory where your query is organized
Review the existing query to understand:
- Query Purpose: What vulnerability or code pattern is it detecting?
- Current Logic: How does the query identify issues?
- Known Limitations: Are there documented false positives or false negatives?
Use MCP tool codeql_query_compile to ensure the query compiles:
{
  "queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "searchPath": ["<query-pack>"]
}
Examine current test files in your query pack's test directory:
- Positive cases: Code patterns that should be detected
- Negative cases: Code patterns that should NOT be detected
- Edge cases: Boundary conditions
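As a rough illustration of the three categories above, a baseline test file for a hypothetical null-dereference query might look like the sketch below. The function names and the `BAD`/`GOOD` comment convention are assumptions made for this example, not part of any existing query pack.

```cpp
// Hypothetical test file for a null-dereference query; all names are illustrative.

// Positive case: should be reported, because `p` may be null when dereferenced.
int readUnchecked(int* p) {
    return *p; // BAD: no null check before the dereference
}

// Negative case: should NOT be reported, because the dereference is guarded.
int readChecked(int* p) {
    if (p == nullptr) {
        return -1;
    }
    return *p; // GOOD: guarded by the null check above
}

// Edge case: the guard and the dereference share one short-circuit expression.
int readShortCircuit(int* p) {
    return (p != nullptr && *p > 0) ? *p : 0; // GOOD: `&&` evaluates the check first
}
```

Keeping one function per case makes it easier to map each row of the `.expected` file back to the pattern it covers.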
Run existing tests using codeql_test_run:
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
Analyze test results to establish baseline behavior.
Categorize issues into:
False Positives (Query alerts on safe code):
- Query is too broad or imprecise
- Missing sanitization or validation checks
- Not accounting for safe coding patterns
- Ignoring proper null checks or bounds checks
False Negatives (Query misses actual issues):
- Query is too narrow or specific
- Missing data flow paths
- Not considering all vulnerable patterns
- Ignoring indirect or complex call chains
Before modifying the query, create test cases that demonstrate the issues.
Create test code that is safe but currently triggers false alerts:
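// False positive: Query incorrectly flags this as unsafe
int* getUserInput();            // hypothetical source, declared so the test compiles
bool validatePointer(int* ptr); // hypothetical validator, declared so the test compiles

void safeFunction() {
    int* ptr = getUserInput();
    if (ptr != nullptr && validatePointer(ptr)) {
        *ptr = 42; // Should NOT be detected - pointer is validated
    }
}
Create test code that is unsafe but currently not detected: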
// False negative: Query should detect this but doesn't
int* getPointerFromComplexPath(); // hypothetical source, declared so the test compiles
void processPointer(int* p);      // forward declaration so the call below compiles

void unsafeFunction() {
    int* ptr = getPointerFromComplexPath();
    // Indirect null pointer dereference through helper
    processPointer(ptr); // Should be detected - no null check
}

void processPointer(int* p) {
    *p = 42; // Dereference without null check
}
- Add new test cases to existing `Example1.cpp` or create `Example2.cpp`
- Document each case with clear comments explaining the issue
- Update `.expected` file with expected results after fixes
Re-extract test database using codeql_test_extract:
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
Run tests to confirm new test cases fail as expected:
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
Use AST analysis to understand code structure better.
Use codeql_query_run to analyze AST structure:
{
  "query": "<query-pack>/src/PrintAST/PrintAST.ql",
  "database": "<query-pack>/test/{QueryName}/{QueryName}.testproj",
  "searchPath": ["<query-pack>"],
  "format": "text"
}
Use codeql_bqrs_decode to view human-readable AST:
{
  "format": "text",
  "bqrsFile": "<path-to-results.bqrs>",
  "outputPath": "scratch/ast-analysis.txt"
}
Identify AST patterns for:
- Safe code that should be excluded (for false positives)
- Unsafe code that should be included (for false negatives)
- Guard conditions and validation patterns
- Data flow paths through functions
Add predicates to recognize validation and sanitization:
- Check for guard conditions (null checks, bounds checks)
- Recognize validation function calls
- Add barriers to stop flow at validation points
- Exclude expressions that have been validated
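To make the guard patterns above concrete, the following sketch shows test code a refined query would be expected to treat as safe, plus one case where a validation call is present but does not actually guard the access. `isValidIndex` and the other names are hypothetical, chosen only for this example.

```cpp
// Hypothetical validation helper; a query could model calls to it as a guard.
bool isValidIndex(int index, int size) {
    return index >= 0 && index < size;
}

// Early-return guard: the access is only reachable when the check passed.
int firstOrDefault(const int* data, int size) {
    if (data == nullptr || size <= 0) {
        return 0;
    }
    return data[0]; // GOOD: null and bounds checks dominate the access
}

// Validation function used as a guard condition.
int elementAt(const int* data, int size, int index) {
    if (!isValidIndex(index, size)) {
        return -1;
    }
    return data[index]; // GOOD: index validated on every path to this line
}

// The validator is called, but its result is ignored, so nothing is guarded.
int elementAtUnchecked(const int* data, int size, int index) {
    isValidIndex(index, size); // result discarded
    return data[index];        // BAD: should still be reported
}
```

The last function is worth keeping next to the safe ones: a barrier that matches any call to the validator, rather than a check that controls the access, would wrongly suppress it.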
Expand detection to catch more cases:
- Include more source patterns (parameters, return values, global variables)
- Add custom flow steps for wrapper functions and field access
- Enhance data flow configuration with additional sources and sinks
- Consider indirect flows through helper functions
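The indirect paths listed above can be exercised with test code along these lines. This is a sketch under assumptions: `getInput` stands in for whatever the query treats as a source, the array index is the assumed sink, and every name is hypothetical.

```cpp
// Hypothetical data used as the sink target in this sketch.
int g_buffer[4] = {1, 2, 3, 4};
int g_lastIndex = 0; // global variable written from the source

// Hypothetical source; returns a fixed value so the file also runs safely.
int getInput() { return 2; }

// Wrapper function: the source is one call deeper than the query may expect.
int readIndex() { return getInput(); }

struct Request {
    int index;
};

// Stores the source value in a field.
Request makeRequest() {
    Request r;
    r.index = readIndex();
    return r;
}

int lookupViaWrapper() {
    int i = readIndex();
    return g_buffer[i]; // BAD: unchecked index reached through a wrapper
}

int lookupViaField(const Request& r) {
    return g_buffer[r.index]; // BAD: unchecked index reached through a field
}

int lookupViaGlobal() {
    g_lastIndex = getInput();
    return g_buffer[g_lastIndex]; // BAD: unchecked index reached through a global
}
```

If the query reports only some of these, the missing ones point at the kind of flow step or source that still needs to be added.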
For complex scenarios requiring validation tracking:
- Define flow states (e.g., validated vs unvalidated)
- Track state changes through validation functions
- Use `DataFlow::StateConfigSig` for context-sensitive analysis
- Apply barriers based on flow state
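A test file for state tracking usually needs at least one case where validation happens and is then invalidated. The sketch below assumes two states, validated and unvalidated, and uses hypothetical helpers `getIndex` (source) and `inRange` (validator).

```cpp
const int kSize = 8;
int g_table[kSize] = {0, 1, 4, 9, 16, 25, 36, 49};

// Hypothetical source; the value starts in the "unvalidated" state.
int getIndex() { return 3; }

// Hypothetical validator; a true result moves the value to "validated".
bool inRange(int i) { return i >= 0 && i < kSize; }

int lookupValidated() {
    int i = getIndex();
    if (!inRange(i)) {
        return -1;
    }
    return g_table[i]; // GOOD: validated before use
}

int lookupRevalidated(int offset) {
    int i = getIndex();
    if (!inRange(i)) {
        return -1;
    }
    i += offset; // arithmetic after the check: treat as unvalidated again
    if (!inRange(i)) {
        return -1;
    }
    return g_table[i]; // GOOD: validated again after the change
}

int lookupStaleCheck(int offset) {
    int i = getIndex();
    if (!inRange(i)) {
        return -1;
    }
    i += offset;       // the earlier check no longer covers this value
    return g_table[i]; // BAD: used in the "unvalidated" state
}
```

A plain barrier on `inRange` would likely hide `lookupStaleCheck`, which is the situation where tracking state separately becomes worthwhile.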
Use codeql_query_compile to check for errors:
{
  "queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "searchPath": ["<query-pack>"]
}
Fix any compilation errors before proceeding.
Use codeql_test_run to validate improvements:
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
Analyze Results:
✅ Success Indicators:
- False positives eliminated (safe code no longer flagged)
- False negatives caught (unsafe code now detected)
- All positive test cases still detected
- No new false positives introduced
❌ Need Further Refinement:
- Still missing some unsafe patterns
- Still flagging some safe patterns
- Query breaks existing valid detections
If tests don't pass:
- Analyze differences: Compare actual vs expected results
- Refine predicates: Adjust source/sink/barrier definitions
- Add debug output: Use `select` statements to inspect intermediate results
- Re-run tests: Repeat until all tests pass
Once core improvements work, add more test scenarios:
- STL containers and iterators
- Template instantiations
- Class hierarchies and inheritance
- Modern C++ features (lambdas, smart pointers, move semantics)
- Different compilation modes or standards
For each scenario:
- Add test code to existing or new test files
- Update `.expected` file with expected results
- Re-extract test database
- Run tests and verify results
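Some of the scenarios above, sketched as test code for a hypothetical null-dereference or bounds query. All names are illustrative, and the standard headers are an assumption: a test environment without system headers would need local stub declarations instead.

```cpp
#include <cstddef>
#include <vector>

// STL container: `operator[]` performs no bounds check of its own.
int elementOrDefault(const std::vector<int>& values, std::size_t index) {
    if (index >= values.size()) {
        return -1;
    }
    return values[index]; // GOOD: index checked against size()
}

// Template: each instantiation may appear as a separate element to the query.
template <typename T>
T firstElement(const T* data, std::size_t count, T fallback) {
    return (data != nullptr && count > 0) ? data[0] : fallback; // GOOD
}

// Class hierarchy: the callee is chosen by virtual dispatch.
struct Reader {
    virtual ~Reader() = default;
    virtual int read(const int* p) const = 0;
};

struct CheckedReader : Reader {
    int read(const int* p) const override { return p ? *p : 0; } // GOOD
};

struct UncheckedReader : Reader {
    int read(const int* p) const override { return *p; } // BAD: no null check
};

int readThrough(const Reader& reader, const int* p) {
    return reader.read(p); // either override may run here
}
```

The `readThrough` call site is a useful check on whether the query follows flow into every override or only into the statically visible one.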
Once tests pass, improve code quality:
- Extract Helper Predicates: Move complex logic into reusable predicates
- Add QLDoc Documentation: Update query metadata and documentation
- Format Query: Use `codeql_query_format` for consistent formatting
- Verify Integration: Run language-level tests to ensure no regressions
If query performance is an issue, consider:
- Materialization: Use `cached` for expensive predicates that are reused
- Limit Search Space: Focus on relevant functions or code patterns
- Choose Appropriate Flow: Use local flow for intra-procedural, global flow for inter-procedural analysis
When improving C++ queries, consider these language-specific aspects:
- Track `new`/`delete` pairs for memory leaks
- Detect use-after-free patterns
- Check buffer bounds and array access
- Analyze pointer aliasing and lifetime
- Flow through references (`&`, `&&`)
- Smart pointer operations (`unique_ptr`, `shared_ptr`)
- Virtual function dispatch and polymorphism
- Template instantiation tracking
- RAII patterns and resource management
- Move semantics and rvalue references
- Lambda expressions and captures
- STL containers and iterators
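A few of these language-specific aspects, sketched as safe baseline cases that a memory-safety query should leave unflagged. Function names are hypothetical, and the headers assume a test environment where the standard library is available.

```cpp
#include <memory>
#include <utility>

// Raw new[]/delete[] pair: the allocation is released on the only path.
int sumRaw(int n) {
    int* data = new int[n];
    int total = 0;
    for (int i = 0; i < n; ++i) {
        data[i] = i;
        total += data[i];
    }
    delete[] data; // GOOD: matches the new[] above
    return total;
}

// RAII: the smart pointer releases the array in its destructor.
int sumOwned(int n) {
    std::unique_ptr<int[]> data(new int[n]);
    int total = 0;
    for (int i = 0; i < n; ++i) {
        data[i] = i;
        total += data[i];
    }
    return total; // GOOD: no explicit delete needed
}

// Takes ownership of the pointer.
int takeValue(std::unique_ptr<int> p) { return p ? *p : -1; }

// Move semantics: a moved-from unique_ptr is null.
int moveThenCheck() {
    std::unique_ptr<int> p(new int(42));
    int v = takeValue(std::move(p));
    // Dereferencing `p` here without a check would be the BAD variant.
    return p ? -1 : v; // GOOD: `p` is tested, not dereferenced
}

// Lambda capture: by-value capture keeps its own copy of `base`.
int addViaLambda(int base) {
    auto add = [base](int x) { return base + x; };
    return add(1); // GOOD: no dangling reference
}
```

Each function has an obvious unsafe sibling (a missing `delete[]`, a dereference after `std::move`, a by-reference capture that outlives its variable), which can be added as the matching positive case.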
- `codeql_query_compile`: Compile queries and check for syntax errors
- `codeql_query_format`: Format query files for consistency
- `codeql_query_run`: Run queries (PrintAST, etc.) for analysis
- `codeql_test_extract`: Extract test databases from C++ code
- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_accept`: Accept new baseline (use with caution)
- `codeql_bqrs_decode`: Decode binary results to readable format
- `codeql_bqrs_interpret`: Interpret results in various formats
- `codeql_bqrs_info`: Get metadata about result files
Reduce false positives by recognizing sanitization and validation:
predicate isSanitized(Expr e) {
  exists(FunctionCall sanitize |
    sanitize.getTarget().hasName(["sanitize", "validate", "check"]) and
    DataFlow::localExprFlow(sanitize, e)
  )
}
Reduce false negatives by including more sources:
predicate isSource(DataFlow::Node source) {
  source.asExpr().(FunctionCall).getTarget().hasName(["getInput", "getUserData"])
  or
  source.asParameter().getFunction().hasName(["main", "handleRequest"])
}
Stop flow at validation points:
predicate isBarrier(DataFlow::Node node) {
  exists(FunctionCall sanitize |
    sanitize.getTarget().hasName(["sanitize", "escape"]) and
    sanitize.getAnArgument() = node.asExpr()
  )
}
When flow paths are missing or incorrect:
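- Use partial flow to debug missing edges: `MyFlow::FlowExplorationFwd<explorationLimit/0>::partialFlow(source, node, _)`, where `MyFlow` stands for your own `DataFlow::Global` module and `explorationLimit` is a predicate you define to bound the search depth
- Add debug selects to inspect intermediate results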
- Use PrintAST to understand code structure better
Before considering query improvements complete:
- False positives identified and addressed
- False negatives identified and addressed
- New test cases added for both false positives and false negatives
- Test database re-extracted with new test cases
- All tests pass (actual matches expected)
- Query compiles without errors or warnings
- Query formatted with `codeql_query_format`
- QLDoc documentation updated
- Helper predicates extracted for clarity
- Performance is acceptable
- Integration tests pass (language-level tests)
- No regressions in existing valid detections
❌ Don't:
- Over-generalize and introduce new false positives
- Remove valid detections while fixing false positives
- Make changes without corresponding test cases
- Accept test results without verification
- Ignore performance implications of changes
- Skip integration testing
✅ Do:
- Create test cases before modifying query
- Make incremental changes and test frequently
- Document why each change improves detection
- Balance precision and recall
- Consider real-world C++ code patterns
- Validate improvements with comprehensive tests
- Use AST analysis to understand code structure
- C++ Query Development Prompt - Comprehensive C++ query development guide
- Create CodeQL Query Unit Test for C++ - Testing guidance
- Test-Driven CodeQL Query Development - TDD workflow for queries
- CodeQL C++ Documentation - Official C++ AST reference
Your query improvement is successful when:
- ✅ False positives are eliminated or significantly reduced
- ✅ False negatives are caught (improved coverage)
- ✅ All existing valid detections still work
- ✅ Comprehensive test coverage for improvements
- ✅ All tests pass consistently
- ✅ Query is well-documented and maintainable
- ✅ Performance is acceptable for production use
- ✅ No regressions in integration tests