| name | update-codeql-query-dataflow-go |
|---|---|
| description | Update CodeQL queries for Go from legacy v1 dataflow API to modern v2 shared dataflow API. Use this skill when migrating Go queries to use DataFlow::ConfigSig modules, ensuring query results remain equivalent through TDD. |
This skill guides you through migrating Go CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query results remain equivalent.
- Migrating Go queries that use deprecated `DataFlow::Configuration` (or `TaintTracking::Configuration`) classes
- Updating queries to use `DataFlow::ConfigSig` modules
- Modernizing Go queries to use the shared dataflow library
- Ensuring query result equivalence during dataflow API migration
- Existing Go CodeQL query using v1 dataflow API that you want to migrate
- Existing unit tests for the query
- Understanding of the query's detection purpose
- Access to CodeQL Development MCP Server tools
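v1 (Legacy):

```ql
class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }
  override predicate isSource(DataFlow::Node source) { ... }
  override predicate isSink(DataFlow::Node sink) { ... }
  override predicate isSanitizer(DataFlow::Node node) { ... }
  override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
```

v2 (Modern):

```ql
module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { ... }
  predicate isSink(DataFlow::Node sink) { ... }
  predicate isBarrier(DataFlow::Node node) { ... }
  predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

// DataFlow::Global<MyConfig> for plain data flow, TaintTracking::Global<MyConfig> for taint tracking
module MyFlow = TaintTracking::Global<MyConfig>;
```

| v1 API | v2 API | Purpose |
|---|---|---|
| `class ... extends DataFlow::Configuration` / `TaintTracking::Configuration` | `module ... implements DataFlow::ConfigSig` | Configuration signature |
| Choice of base class | `DataFlow::Global<MyConfig>` / `TaintTracking::Global<MyConfig>` | Data flow vs. taint tracking |
| `isSanitizer` | `isBarrier` | Stop data flow propagation |
| `isAdditionalTaintStep` | `isAdditionalFlowStep` | Custom flow steps |
| `cfg.hasFlow(source, sink)` | `MyFlow::flow(source, sink)` | Query flow between nodes |
| `cfg.hasFlowPath(source, sink)` with `DataFlow::PathGraph` | `MyFlow::flowPath(source, sink)` with `MyFlow::PathGraph` | Path queries |

`isSanitizer` and `isAdditionalTaintStep` are members of the v1 `TaintTracking::Configuration`. A v1 `DataFlow::Configuration` already uses the names `isBarrier` and `isAdditionalFlowStep`, so for those only the class-to-module change applies.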
Go dataflow uses multiple node representations:

- `ExprNode`: AST expression nodes (e.g., function calls, literals)
- `ParameterNode`: function parameter nodes
- `InstructionNode`: IR (intermediate representation) instruction nodes

`RemoteFlowSource` is not a separate node representation. It is a predefined class of source nodes that model user-controllable input.
Critical: Before any code changes, capture current query behavior.
Use `codeql_test_run` to establish the baseline:

```json
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
```

Save the output; this is your reference for query result equivalence.
Create a reference file with the current results:

```bash
cp <query-pack>/test/{QueryName}/{QueryName}.expected \
   <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline
```

This ensures you can verify equivalence after migration.
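Review the query for v1 API usage:

- `class X extends DataFlow::Configuration` (or `TaintTracking::Configuration`) declarations
- `isSanitizer` predicates
- `isAdditionalTaintStep` predicates
- `cfg.hasFlow(source, sink)` (or `cfg.hasFlowPath(source, sink)`) calls in the query body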
Identify how the query uses Go dataflow constructs:

- Conversions from dataflow nodes to AST or IR elements (e.g., `asExpr()`, `asInstruction()`)
- `RemoteFlowSource` for user input
- Go-specific sources: `os.Args`, `os.Getenv`, HTTP request parameters
- Go-specific sinks: `os/exec.Command`, `database/sql` query methods, file operations
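Before touching the query, it helps to check that the test data actually exercises these sources and sinks. The following is a minimal Go test-case sketch.

- The file name, the function names and the `USER_CMD` variable are hypothetical.
- The comments describe what a command-injection query is expected to report. They are not verified output.
- The `os.Getenv` and `os.Args` flows only appear in `.expected` if `isSource` models them in addition to `RemoteFlowSource`.
- The commands are built but never run, so the file is safe to compile and execute.

```go
// Example.go: hypothetical test case for the command-injection query.
package main

import (
	"net/http"
	"os"
	"os/exec"
)

// fromEnv builds a command from an environment variable (os.Getenv source).
// Expected to be reported only if isSource models os.Getenv.
func fromEnv() *exec.Cmd {
	return exec.Command("sh", "-c", os.Getenv("USER_CMD"))
}

// fromArgs builds a command from the first command-line argument (os.Args source).
// Expected to be reported only if isSource models reads of os.Args.
func fromArgs(args []string) *exec.Cmd {
	if len(args) < 2 {
		return nil
	}
	return exec.Command(args[1])
}

// handler builds a command from an HTTP query parameter, assumed to be
// modeled as a RemoteFlowSource by the CodeQL Go library.
func handler(w http.ResponseWriter, r *http.Request) {
	cmd := exec.Command("ls", r.URL.Query().Get("dir"))
	_ = cmd // built but deliberately never run
}

func main() {
	_ = fromEnv()
	_ = fromArgs(os.Args)
	http.HandleFunc("/list", handler)
}
```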
Before:

```ql
import go

class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }

  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getTarget().hasQualifiedName("os/exec", "Command") and
      sink = call.getAnArgument()
    )
  }

  override predicate isSanitizer(DataFlow::Node node) {
    // SanitizationCall is a hypothetical class standing in for your sanitizer model
    node = any(SanitizationCall c).getResult()
  }
}

from MyConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "Untrusted data flows to command execution"
```

After:
```ql
import go

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getTarget().hasQualifiedName("os/exec", "Command") and
      sink = call.getAnArgument()
    )
  }

  predicate isBarrier(DataFlow::Node node) {
    // SanitizationCall is a hypothetical class standing in for your sanitizer model
    node = any(SanitizationCall c).getResult()
  }
}

// TaintTracking::Global matches the semantics of the v1 TaintTracking::Configuration
module MyFlow = TaintTracking::Global<MyConfig>;

from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select sink, "Untrusted data flows to command execution"
```

- `isSanitizer` → `isBarrier`: rename only; the logic is unchanged
- `isAdditionalTaintStep` → `isAdditionalFlowStep`: rename only
- Drop the `override` keywords and the characteristic predicate (`MyConfig() { this = "MyConfig" }`), since a module has neither
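A rename mistake in `isBarrier` only shows up if the tests contain both a sanitized and an unsanitized flow into the sink. The following is a minimal Go sketch of such a test case.

- It assumes a hypothetical `sanitizeDir` function that the hypothetical `SanitizationCall` class is taken to model.
- The allowlist is likewise illustrative.
- The comments state the intended query behavior, not verified results.

```go
// ExampleBarrier.go: hypothetical test case for the isBarrier predicate.
package main

import (
	"net/http"
	"os/exec"
)

// allowedDirs is a hypothetical allowlist backing the sanitizer.
var allowedDirs = map[string]bool{"/tmp": true, "/var/log": true}

// sanitizeDir is a hypothetical sanitizer, assumed to be what SanitizationCall models:
// it returns the input only when it is allowlisted, otherwise a safe default.
func sanitizeDir(dir string) string {
	if allowedDirs[dir] {
		return dir
	}
	return "/tmp"
}

func handler(w http.ResponseWriter, r *http.Request) {
	dir := r.URL.Query().Get("dir")
	_ = exec.Command("ls", dir)              // intended: reported (no sanitizer on the path)
	_ = exec.Command("ls", sanitizeDir(dir)) // intended: not reported (barrier on sanitizeDir's result)
}

func main() {
	http.HandleFunc("/ls", handler)
}
```

Both calls sit in one handler with distinct sink locations. If the barrier is lost during migration, an extra row appears in the `.expected` diff instead of the change going unnoticed.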
Replace `cfg.hasFlow(source, sink)` with `MyFlow::flow(source, sink)`:

- Remove the configuration variable from the `from` clause
- Call the module's flow predicate directly
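Ensure proper node type handling. The `DataFlow::Node` conversions are the same in v1 and v2:

```ql
// Hypothetical helper predicates showing the conversions available on every DataFlow::Node
Expr nodeAsExpr(DataFlow::Node n) { result = n.asExpr() } // AST expression

IR::Instruction nodeAsInstruction(DataFlow::Node n) { result = n.asInstruction() } // IR instruction

Parameter nodeAsParameter(DataFlow::Node n) { result = n.asParameter() } // function parameter
```

`RemoteFlowSource` works identically in v1 and v2: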
```ql
predicate isSource(DataFlow::Node source) {
  source instanceof RemoteFlowSource
  or
  // Results of calls to os.Getenv
  source.asExpr().(CallExpr).getTarget().hasQualifiedName("os", "Getenv")
  or
  // Reads of os.Args (in Go, main takes no parameters; arguments come from os.Args)
  source = any(Variable v | v.hasQualifiedName("os", "Args")).getARead()
}
```

For concurrent flow patterns, ensure proper tracking:
```ql
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  // Basic channel send -> receive flow through the same channel variable.
  // Note: this is a simplified example. It does not model select statements or
  // channel closures, and it misses channels passed between variables or functions.
  exists(SendStmt send, RecvExpr recv, Variable ch |
    n1.asExpr() = send.getValue() and
    n2.asExpr() = recv and
    send.getChannel() = ch.getAReference() and
    recv.getOperand() = ch.getAReference()
  )
}
```

Use `codeql_query_compile` to check for errors:
```json
{
  "queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "searchPath": ["<query-pack>"]
}
```

Fix any compilation errors before testing.
Use `codeql_test_run` on the migrated query:

```json
{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}
```

Critical: Results MUST match the baseline from Phase 1.
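Compare results line by line:

```bash
diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
     <query-pack>/test/{QueryName}/{QueryName}.expected
```

If the test run failed, compare the baseline against the `{QueryName}.actual` file that the test run writes next to `.expected`.

- Success: empty diff (identical results)
- Failure: any difference requires investigation and a fix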
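When many rows differ, a plain `diff` can be hard to read. A small order-insensitive comparison lists exactly which result rows were lost or gained.

- `expdiff` is a hypothetical helper, not part of the CodeQL tooling.
- It treats each file as a multiset of lines, so duplicate rows are counted.
- It compares lines exactly, without trimming spaces.

```go
// expdiff: hypothetical helper comparing two .expected files as multisets of lines.
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
)

// diffLines returns rows in want but not in got (missing) and rows in got but not
// in want (extra). Duplicate rows are counted; line order is ignored.
func diffLines(want, got []string) (missing, extra []string) {
	counts := make(map[string]int)
	for _, l := range want {
		counts[l]++
	}
	for _, l := range got {
		counts[l]--
	}
	for l, c := range counts {
		for ; c > 0; c-- {
			missing = append(missing, l)
		}
		for ; c < 0; c++ {
			extra = append(extra, l)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

// readLines reads a file line by line. The scanner drops the line terminator
// (including a trailing \r) but keeps spaces, so whitespace changes still count.
func readLines(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var lines []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		lines = append(lines, s.Text())
	}
	return lines, s.Err()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: expdiff <baseline> <migrated>")
		os.Exit(2)
	}
	want, err := readLines(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	got, err := readLines(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	missing, extra := diffLines(want, got)
	for _, l := range missing {
		fmt.Println("- " + l)
	}
	for _, l := range extra {
		fmt.Println("+ " + l)
	}
	if len(missing) > 0 || len(extra) > 0 {
		os.Exit(1) // non-zero exit signals that the results are not equivalent
	}
}
```

Usage: `go run expdiff.go <baseline> <migrated>`. Rows prefixed with `-` were lost in the migration, and rows prefixed with `+` are new. An exit status of 0 means the two files hold the same rows.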
If baseline tests pass, add more test cases to ensure robustness:
Create additional test files covering:

- Complex goroutine data-sharing patterns
- Interface type assertions and conversions
- Error-handling flow patterns (ignored errors, wrapped errors)
- Standard library sink variations (`exec.CommandContext`, `sql.Prepare`)
- Channel-based concurrent flows
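The following is a sketch of an `Example2.go` covering several of these cases. It assumes the same `exec.Command` sink and HTTP query-parameter source as the migration example.

- All helper names are hypothetical.
- Whether each flow is reported depends on the CodeQL Go library models and on any custom `isAdditionalFlowStep` you keep. Treat the comments as cases to check, not as known results.
- `sql.Prepare` is not covered.

```go
// Example2.go: hypothetical edge-case test file for the migrated query.
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os/exec"
)

// viaChannel passes the input through a goroutine and an unbuffered channel.
func viaChannel(input string) string {
	ch := make(chan string)
	go func() { ch <- input }()
	return <-ch
}

// viaInterface stores the input in an interface value and recovers it with a type assertion.
func viaInterface(input string) string {
	var v interface{} = input
	s, _ := v.(string)
	return s
}

// viaPointer passes the input through a pointer dereference.
func viaPointer(input string) string {
	p := &input
	return *p
}

// parseDir returns the input together with a wrapped error, mimicking (value, error) APIs.
func parseDir(input string) (string, error) {
	if input == "" {
		return "", fmt.Errorf("parseDir: %w", errors.New("empty input"))
	}
	return input, nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	dir := r.URL.Query().Get("dir")
	_ = exec.Command("ls", viaChannel(dir))   // goroutine + channel flow
	_ = exec.Command("ls", viaInterface(dir)) // interface type assertion flow
	_ = exec.Command("ls", viaPointer(dir))   // pointer dereference flow
	d, _ := parseDir(dir)                     // (value, error) result with the error ignored
	// Only reported if isSink also matches exec.CommandContext.
	_ = exec.CommandContext(r.Context(), "ls", d)
}

func main() {
	http.HandleFunc("/ls", handler)
}
```

Each pattern goes through its own helper and reaches a distinct `exec.Command` call. A missing or extra row in the `.expected` diff therefore points at one specific pattern.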
For each new test:

- Add test code to `Example2.go`, `Example3.go`, etc.
- Update the `.expected` file with the anticipated results
- Re-extract the test database with `codeql_test_extract`
- Run the tests to verify
Run the query on a realistic database (e.g., with `codeql_query_run`) and monitor performance:

```json
{
  "query": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "database": "<path-to-realistic-go-database>",
  "searchPath": ["<query-pack>"]
}
```

If performance degrades significantly, consider:
- Caching expensive predicates with `cached`
- Using local flow instead of global flow where possible
- Limiting scope with additional constraints (e.g., tighter `isSource` and `isSink` definitions)
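Update the query metadata and imports for the v2 API. For a `path-problem` query, the v1 `DataFlow::PathGraph` import is replaced by the flow module's own `PathGraph`, and the `select` clause uses `MyFlow::PathNode` values:

```ql
/**
 * @name Command Injection via Untrusted Data
 * @description Executes system commands with user-controllable data
 * @kind path-problem
 * @id go/command-injection
 * @tags security
 */

import go

// ... MyConfig and MyFlow defined as in the migration example ...

import MyFlow::PathGraph

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Untrusted data flows to command execution"
```

- Remove the v1 baseline files after verification
- Add migration notes in query comments if helpful
- Format the query with `codeql_query_format`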
Go's explicit error handling affects dataflow. Each snippet below defines `isAdditionalFlowStep`. A configuration module can declare it only once, so combine the steps you need with `or` inside a single predicate.

```ql
// Track flows through error-returning functions
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(DataFlow::CallNode call |
    n1 = call.getAnArgument() and
    // Function returns a (value, error) pair: track the value (result index 0).
    // Deliberately broad: this adds a step for every multi-result call.
    call.getExpr().getType() instanceof TupleType and
    n2 = call.getResult(0)
  )
}
```

Track flows through interface conversions:
```ql
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(TypeAssertExpr assertion |
    n1.asExpr() = assertion.getExpr() and
    n2.asExpr() = assertion
  )
}
```

Track flows through pointer operations:
```ql
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
  exists(StarExpr deref |
    n1.asExpr() = deref.getBase() and
    n2.asExpr() = deref
  )
}
```

MCP tools used in this workflow:

- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_extract`: Extract test databases from Go source code
- `codeql_query_compile`: Compile queries and check for errors
- `codeql_query_run`: Run queries for analysis
- `codeql_bqrs_decode`: Decode binary query results
- `codeql_query_format`: Format query files for consistency
- `codeql_pack_install`: Install query pack dependencies
❌ Don't:
- Skip baseline test establishment before migration
- Change query logic alongside API migration (separate concerns)
- Accept test results without verifying equivalence
- Remove v1 baseline until migration is confirmed successful
- Ignore performance regressions
- Forget to update imports (e.g., `DataFlow::PathGraph` becomes `MyFlow::PathGraph`)
✅ Do:
- Establish test baseline BEFORE any changes
- Make purely mechanical API changes first
- Verify exact result equivalence after migration
- Keep v1 baseline for comparison during migration
- Test edge cases specific to Go (goroutines, channels, interfaces)
- Document any intentional behavior changes separately
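If results differ after migration:

- Check node type conversions: ensure `asExpr()` and `asInstruction()` usage is correct
- Verify predicate renames: confirm the `isBarrier` logic is identical to the original `isSanitizer`
- Review flow predicates: check that `isAdditionalFlowStep` mirrors `isAdditionalTaintStep`
- Check the flow module: a v1 `TaintTracking::Configuration` maps to `TaintTracking::Global`, and a v1 `DataFlow::Configuration` maps to `DataFlow::Global`
- Inspect missing results: evaluate `MyFlow::flow(source, sink)` restricted to one known source and sink to isolate which flows are lost
- Debug with partial flow: use flow exploration (e.g., `MyFlow::FlowExplorationFwd`) to find missing edges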
- New dataflow API for writing custom CodeQL queries - Official v2 API announcement
- Analyzing data flow in Go - Go dataflow guide
- CodeQL Go Library Reference - Standard library documentation
- Create CodeQL Query TDD Generic - TDD workflow for queries
- QSpec Reference for Go - Go-specific QSpec patterns
- Go Query Development Prompts - Go query guidance
Your dataflow migration is successful when:
- ✅ Test baseline established before migration
- ✅ Query compiles without errors using v2 API
- ✅ All configuration classes converted to modules
- ✅ All `isSanitizer` predicates renamed to `isBarrier`
- ✅ All `isAdditionalTaintStep` predicates renamed to `isAdditionalFlowStep`
- ✅ All `cfg.hasFlow()` calls replaced with module flow predicates
- ✅ Test results EXACTLY match the v1 baseline (zero diff)
- ✅ No performance regressions
- ✅ Query metadata updated appropriately
- ✅ Go-specific patterns (goroutines, channels, errors) handled correctly