| name | update-codeql-query-dataflow-cpp |
|---|---|
| description | Upgrade CodeQL queries from legacy (v1) language-specific dataflow library to the new (v2) shared dataflow API for C/C++. Use this skill when migrating C/C++ queries to the modular dataflow API while ensuring query result equivalence through test-driven development. |
This skill guides you through migrating CodeQL queries for C/C++ from the legacy v1 (language-specific) dataflow library to the v2 (shared, modular) dataflow API.
- Migrating C/C++ queries from DataFlow::Configuration to DataFlow::ConfigSig
- Updating to shared dataflow API (v2)
- Ensuring migrated queries produce identical results
Most important: the migrated query must produce exactly the same results as the original. Use TDD with comprehensive unit tests to guarantee equivalence.
- Existing C/C++ query using v1 dataflow API
- Comprehensive unit tests (create-codeql-query-unit-test-cpp)
- Access to CodeQL Development MCP Server tools
v1 (Legacy) API:
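// isSanitizer and isAdditionalTaintStep are TaintTracking::Configuration members;
// DataFlow::Configuration uses isBarrier and isAdditionalFlowStep instead.
class MyConfig extends TaintTracking::Configuration {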
MyConfig() { this = "MyConfig" }
override predicate isSource(DataFlow::Node source) { ... }
override predicate isSink(DataFlow::Node sink) { ... }
override predicate isSanitizer(DataFlow::Node node) { ... }
override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink, source, sink, "Message"

v2 (Modular) API:
module MyConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { ... }
predicate isSink(DataFlow::Node sink) { ... }
predicate isBarrier(DataFlow::Node node) { ... }
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
module MyFlow = TaintTracking::Global<MyConfig>;
from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink, source, sink, "Message"

| v1 API | v2 API | Notes |
|---|---|---|
| DataFlow::Configuration class | DataFlow::ConfigSig signature module | Module-based instead of class-based |
| isSanitizer predicate | isBarrier predicate | Renamed for clarity |
| isAdditionalTaintStep | isAdditionalFlowStep | Renamed for consistency |
| config.hasFlowPath(source, sink) | MyFlow::flowPath(source, sink) | Module instantiation approach |
| DataFlow::PathNode | MyFlow::PathNode | Path nodes come from module |

v1 Approach:
class MyConfig extends TaintTracking::Configuration {
override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) {
n2.asIndirectExpr() = n1.asExpr().(PointerDereferenceExpr)
}
}

v2 Approach:
module MyConfig implements DataFlow::ConfigSig {
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
exists(PointerDereferenceExpr deref |
n1.asExpr() = deref and
n2.asIndirectExpr() = deref
)
}
}

v1 Approach:
override predicate isSource(DataFlow::Node source) {
exists(Parameter p |
p = source.asParameter() or
source.(DataFlow::IndirectParameterNode).getParameter() = p
)
}

v2 Approach (No Change Needed):
predicate isSource(DataFlow::Node source) {
exists(Parameter p |
p = source.asParameter() or
source.(DataFlow::IndirectParameterNode).getParameter() = p
)
}

v1 Approach:
override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) {
exists(FieldAccess fa |
n1.asExpr() = fa.getQualifier() and
n2.asExpr() = fa
)
}

v2 Approach:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
exists(FieldAccess fa |
n1.asExpr() = fa.getQualifier() and
n2.asExpr() = fa
)
}

v2 Enhanced Support:
module MyConfig implements DataFlow::ConfigSig {
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// Track std::move operations
exists(FunctionCall move |
move.getTarget().hasQualifiedName("std", "move") and
n1.asExpr() = move.getArgument(0) and
n2.asExpr() = move
)
}
}

Critical: Ensure comprehensive test coverage before changing query code.
Verify or create tests in your query pack's test directory:
Use codeql_test_run to establish baseline. If tests missing, follow create-codeql-query-unit-test-cpp.
Test coverage must include: All sources, sinks, barriers, C++-specific features (pointers, references, templates), and edge cases.
Backup original: cp {QueryName}.ql {QueryName}.ql.v1.backup
Document current behavior: sources, sinks, sanitizers, custom steps, limitations.
Before (v1):
import cpp
import semmle.code.cpp.dataflow.TaintTracking

After (v2):
import cpp
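import semmle.code.cpp.dataflow.TaintTracking

Note: Import paths remain the same; API usage changes. On recent CodeQL releases the IR-based C/C++ library is imported from semmle.code.cpp.dataflow.new.TaintTracking and the older path may be deprecated, so check which path your pack version expects.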
Before (v1):
class MyConfig extends TaintTracking::Configuration {
MyConfig() { this = "MyConfig" }
override predicate isSource(DataFlow::Node source) {
source.asExpr().(FunctionCall).getTarget().hasName("untrusted_input")
}
override predicate isSink(DataFlow::Node sink) {
exists(FunctionCall fc |
fc.getTarget().hasName("dangerous_operation") and
sink.asExpr() = fc.getAnArgument()
)
}
override predicate isSanitizer(DataFlow::Node node) {
exists(FunctionCall sanitize |
sanitize.getTarget().hasName("validate") and
DataFlow::localFlow(DataFlow::exprNode(sanitize), node)
)
}
}

After (v2):
module MyConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asExpr().(FunctionCall).getTarget().hasName("untrusted_input")
}
predicate isSink(DataFlow::Node sink) {
exists(FunctionCall fc |
fc.getTarget().hasName("dangerous_operation") and
sink.asExpr() = fc.getAnArgument()
)
}
predicate isBarrier(DataFlow::Node node) {
exists(FunctionCall sanitize |
sanitize.getTarget().hasName("validate") and
DataFlow::localFlow(DataFlow::exprNode(sanitize), node)
)
}
}

After conversion, add:
module MyFlow = TaintTracking::Global<MyConfig>;

Or for pure data flow (not taint):
module MyFlow = DataFlow::Global<MyConfig>;

Choose based on original query:
- Use TaintTracking::Global if original extended TaintTracking::Configuration
- Use DataFlow::Global if original extended DataFlow::Configuration
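If it is unclear which variant the original query relied on, one way to check is to instantiate both modules over the same configuration and list the source/sink pairs that only taint tracking reaches. The sketch below is a throwaway debugging query, not part of the migrated query. It reuses the `untrusted_input` and `dangerous_operation` names from the example above, and the `semmle.code.cpp.dataflow.new` import paths are an assumption about your pack version; substitute the imports your query already uses.

```ql
import cpp
// Assumed import paths for a recent cpp-all pack; adjust to match your query.
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.dataflow.new.TaintTracking

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().hasName("untrusted_input")
  }

  predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("dangerous_operation") and
      sink.asExpr() = fc.getAnArgument()
    )
  }
}

// Same configuration, two instantiations.
module MyValueFlow = DataFlow::Global<MyConfig>;

module MyTaintFlow = TaintTracking::Global<MyConfig>;

// Pairs that are connected only when taint steps are included.
from DataFlow::Node source, DataFlow::Node sink
where
  MyTaintFlow::flow(source, sink) and
  not MyValueFlow::flow(source, sink)
select source, sink
```

Any rows returned on your test database indicate results that depend on taint steps, which suggests keeping `TaintTracking::Global`.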
Before (v1):
from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe data flow from $@ to $@",
source.getNode(), "source", sink.getNode(), "sink"

After (v2):
from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe data flow from $@ to $@",
source.getNode(), "source", sink.getNode(), "sink"

- isSanitizer → isBarrier
- isAdditionalTaintStep → isAdditionalFlowStep
- Remove constructor MyConfig() { this = "..." }
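Path-problem queries also need a path graph in scope. In v1 this is `import DataFlow::PathGraph`; in v2 the graph comes from the instantiated flow module. The sketch below shows a complete v2 query with that import in place. The `@id` value and the `semmle.code.cpp.dataflow.new` import paths are illustrative assumptions; keep the metadata and imports of your original query.

```ql
/**
 * @name Unsafe data flow (illustrative)
 * @kind path-problem
 * @problem.severity warning
 * @id cpp/example/unsafe-flow
 */

import cpp
// Assumed import paths for a recent cpp-all pack; adjust to match your query.
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.dataflow.new.TaintTracking

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().hasName("untrusted_input")
  }

  predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("dangerous_operation") and
      sink.asExpr() = fc.getAnArgument()
    )
  }
}

module MyFlow = TaintTracking::Global<MyConfig>;

// Replaces the v1 `import DataFlow::PathGraph`.
import MyFlow::PathGraph

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe data flow from $@ to $@",
  source.getNode(), "source", sink.getNode(), "sink"
```

Without the path graph import the query may still compile, but the results will not carry path information.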
- Compile: Use codeql_query_compile. Fix errors before proceeding.
- Run tests: Use codeql_test_run
- Goal: Tests pass with identical results to baseline
- If tests fail: Compare actual vs expected, check sources/sinks/barriers/flow steps
Common issues: Missing results (check sources/sinks), extra results (check barriers), flow path differences.
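When results are missing or extra, it can help to first confirm what the configuration itself matches before looking at flow. The sketch below lists every node the v2 configuration treats as a source, sink, or barrier, so the output can be compared against the v1 query's behavior. The function names come from the example earlier in this guide; the simplified `validate` barrier and the `semmle.code.cpp.dataflow.new` import path are assumptions for illustration.

```ql
import cpp
// Assumed import path for a recent cpp-all pack; adjust to match your query.
import semmle.code.cpp.dataflow.new.DataFlow

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().hasName("untrusted_input")
  }

  predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("dangerous_operation") and
      sink.asExpr() = fc.getAnArgument()
    )
  }

  // Simplified barrier for illustration: the result of a call to validate().
  predicate isBarrier(DataFlow::Node node) {
    node.asExpr().(FunctionCall).getTarget().hasName("validate")
  }
}

// The configuration predicates can be called directly on the module.
from DataFlow::Node node, string kind
where
  MyConfig::isSource(node) and kind = "source"
  or
  MyConfig::isSink(node) and kind = "sink"
  or
  MyConfig::isBarrier(node) and kind = "barrier"
select node, kind
```

If a source or sink that the tests expect is absent from this listing, the problem lies in the predicate definitions rather than in flow steps or barriers.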
Debug tools:
- PrintAST: Analyze AST structure with codeql_query_run
- Partial flow: Instantiate MyFlow::FlowExplorationFwd<explorationLimit/0> and use its partialFlow(source, node, dist) predicate to see how far flow gets on missing paths
- Debug predicates: Add select statements to inspect intermediate results
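A minimal sketch of the partial-flow approach follows. It assumes the `FlowExplorationFwd` module exposed by the instantiated flow module, a hypothetical exploration limit of 5, and the `semmle.code.cpp.dataflow.new` import paths; check these against your CodeQL version before relying on the output.

```ql
import cpp
// Assumed import paths for a recent cpp-all pack; adjust to match your query.
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.dataflow.new.TaintTracking

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().hasName("untrusted_input")
  }

  predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("dangerous_operation") and
      sink.asExpr() = fc.getAnArgument()
    )
  }
}

module MyFlow = TaintTracking::Global<MyConfig>;

// Hypothetical limit on how far exploration proceeds from each source.
int explorationLimit() { result = 5 }

module MyPartialFlow = MyFlow::FlowExplorationFwd<explorationLimit/0>;

// Lists nodes reachable from a source, even when no sink is reached.
from MyPartialFlow::PartialPathNode source, MyPartialFlow::PartialPathNode node, int dist
where MyPartialFlow::partialFlow(source, node, dist)
select source.getNode(), node.getNode(), dist
```

The last node reached on an expected path usually points at the step that needs an additional flow step or a corrected barrier. Partial flow can be expensive, so it is best restricted to a single source while debugging.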
Once tests pass:
- Extract helper predicates for clarity
- Update QLDoc with migration note
- Format with codeql_query_format
- Run all language tests with codeql_test_run (ensure no regressions)
- Clean up: Remove backup file if successful

/**
* @name Null pointer dereference
* @kind path-problem
*/
import cpp
import semmle.code.cpp.dataflow.TaintTracking
// path-problem queries need the path graph in scope
import DataFlow::PathGraph
class NullFlowConfig extends TaintTracking::Configuration {
NullFlowConfig() { this = "NullFlowConfig" }
override predicate isSource(DataFlow::Node source) {
source.asExpr() instanceof NullLiteral
}
override predicate isSink(DataFlow::Node sink) {
exists(PointerDereferenceExpr deref |
sink.asExpr() = deref.getOperand()
)
}
override predicate isSanitizer(DataFlow::Node node) {
exists(IfStmt check |
check.getCondition().(ComparisonOperation).getAnOperand() = node.asExpr()
)
}
}
from NullFlowConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
"Potential null pointer dereference from $@", source.getNode(), "null value"

/**
* @name Null pointer dereference
* @kind path-problem
* @note Migrated to v2 modular dataflow API
*/
import cpp
import semmle.code.cpp.dataflow.TaintTracking
module NullFlowConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asExpr() instanceof NullLiteral
}
predicate isSink(DataFlow::Node sink) {
exists(PointerDereferenceExpr deref |
sink.asExpr() = deref.getOperand()
)
}
predicate isBarrier(DataFlow::Node node) {
exists(IfStmt check |
check.getCondition().(ComparisonOperation).getAnOperand() = node.asExpr()
)
}
}
module NullFlow = TaintTracking::Global<NullFlowConfig>;
// v2 replacement for the v1 DataFlow::PathGraph import
import NullFlow::PathGraph
from NullFlow::PathNode source, NullFlow::PathNode sink
where NullFlow::flowPath(source, sink)
select sink.getNode(), source, sink,
"Potential null pointer dereference from $@", source.getNode(), "null value"

This skill uses the following CodeQL Development MCP Server tools:
- codeql_test_extract: Extract test databases for baseline and validation
- codeql_test_run: Verify query results match expected baseline
- codeql_query_compile: Compile migrated query and check for errors
- codeql_query_format: Format query code for consistency
- codeql_query_run: Run PrintAST and debug queries for investigation
- codeql_bqrs_decode: Decode result files for analysis
- codeql_pack_install: Install dependencies for query and test packs

Before considering your migration complete:
- Comprehensive test coverage exists for original query
- All tests pass with v1 query (baseline established)
- Query backed up before migration
- v1 Configuration class converted to v2 ConfigSig module
- Flow module instantiated (TaintTracking::Global<Config> or DataFlow::Global<Config>)
- isSanitizer renamed to isBarrier
- isAdditionalTaintStep renamed to isAdditionalFlowStep
- Constructor removed from configuration
- Query body updated to use module-based flow
- Migrated query compiles without errors
- All tests pass with v2 query (results identical to v1)
- No new false positives introduced
- No new false negatives introduced
- Query formatted with codeql_query_format
- QLDoc documentation updated
- Language-level tests pass (no regressions)
❌ Don't:
- Migrate without establishing test baseline first
- Accept different results without investigation
- Skip testing comprehensive source/sink/barrier coverage
- Ignore C++-specific dataflow patterns (pointers, references, indirection)
- Forget to update query body to use module-based flow
- Leave constructor in the configuration module
✅ Do:
- Use TDD approach with comprehensive tests
- Verify result equivalence before and after migration
- Test all C++ language features used in original query
- Debug result differences systematically
- Use MCP tools for validation at each step
- Keep backup of original query until migration verified
- New dataflow API for writing custom CodeQL queries - Official v2 API announcement
- Analyzing data flow in C and C++ - C/C++ dataflow guide
- Advanced dataflow scenarios for C/C++ - Advanced C/C++ patterns
- Create CodeQL Query TDD Generic - Test-driven development workflow
- Create CodeQL Query Unit Test for C++ - C++ unit testing guidance
- Improve CodeQL Query Detection for C++ - Query improvement patterns
Your dataflow migration is successful when:
- ✅ Comprehensive tests exist covering all query behaviors
- ✅ Original query (v1) passes all tests (baseline)
- ✅ Migrated query (v2) passes all tests (same results)
- ✅ Query compiles without warnings
- ✅ No false positives introduced
- ✅ No false negatives introduced
- ✅ Code is clean and well-documented
- ✅ Performance is equivalent or better
- ✅ All language-level tests pass
- ✅ Query follows v2 API best practices