@@ -27,6 +27,16 @@ For creating/updating **query documentation files** (`.md` or `.qhelp`), use the
 - Tuple counts: `grep -E "tuples|rows" <logfile>`
 - RA operations: `grep -E "SCAN|JOIN|AGGREGATE" <logfile>`

+## Choosing a Database
+
+Several steps below require a CodeQL database. Determine which one to use, in this order of preference:
+
+1. **User-provided `databasePath`**: use it if it was provided and is valid (check with #codeql_resolve_database).
+2. **Test database**: if Step 1 finds tests, run them in Step 3 to create a `.testproj` database.
+3. **No database available**: skip Steps 4-6 and base the explanation on source code analysis alone.
+
+Store the chosen database path as `$DB` for use in Steps 4-6.
+
 ## Workflow Checklist

 You MUST use the following MCP server tools in sequence to gather context before generating your explanation:
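The database choice in "Choosing a Database" can be approximated in shell as shown below. This is a hedged sketch and is not part of the MCP tooling. `is_codeql_db` and `choose_db` are hypothetical helpers. Treating a directory as a valid database when it contains `codeql-database.yml` is an assumption that stands in for the #codeql_resolve_database check.

```bash
# Hypothetical helper: approximate "is this a valid CodeQL database?" by checking
# for codeql-database.yml at the database root. This is an assumption; the
# workflow itself uses #codeql_resolve_database for this check.
is_codeql_db() {
  [ -n "$1" ] && [ -f "$1/codeql-database.yml" ]
}

# Hypothetical helper: print the database to use, in priority order:
#   1. user-provided databasePath, 2. .testproj test database, 3. nothing.
choose_db() {
  local user_db="$1" test_db="$2"
  if is_codeql_db "$user_db"; then
    printf '%s\n' "$user_db"
  elif is_codeql_db "$test_db"; then
    printf '%s\n' "$test_db"
  fi
  # Prints nothing when no database is usable (skip Steps 4-6).
}
```

For example, `DB="$(choose_db "$databasePath" "$testDb")"` leaves `$DB` empty when neither candidate is usable, which corresponds to skipping Steps 4-6. Here `testDb` is a hypothetical variable holding the `.testproj` path.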
@@ -51,16 +61,19 @@ You MUST use the following MCP server tools in sequence to gather context before
   - Parameters: `tests` = array of test directories from Step 1
   - Purpose: Ensures test database is created and current with test code
   - Gather: Test pass/fail status, test database path (`.testproj` directory)
+  - **If no tests exist**: Skip this step and use the user-provided `databasePath` as `$DB`, if one was provided.
+
+### Phase 3: Code Structure Analysis (requires `$DB`)

-### Phase 3: Code Structure Analysis (if test database exists)
+Skip this phase entirely if no database is available.

 - [ ] **Step 4: Generate PrintAST output**
   - Tool: #codeql_query_run
   - Parameters:
     - `queryName`: `"PrintAST"`
     - `queryLanguage`: provided language
-    - `database`: test database path from Step 3
-    - `sourceFiles`: test source file names
+    - `database`: `$DB`
+    - `sourceFiles`: test source file names (or representative source files from the database)
     - `format`: `"graphtext"`
   - Gather: AST hierarchy showing code structure representation

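Step 3 reports the test database as a `.testproj` directory. The sketch below locates such directories. It assumes each test directory contains its `.testproj` database as a direct child, which is the usual `codeql test run` layout. `find_test_dbs` is a hypothetical helper. If you drive the CLI directly, `codeql test run` may not keep the database of a passing test unless `--keep-databases` is passed.

```bash
# Hypothetical helper: print every *.testproj directory found directly inside
# each test directory given as an argument (assumed layout: <testdir>/<name>.testproj).
find_test_dbs() {
  local dir
  for dir in "$@"; do
    # -mindepth/-maxdepth are GNU and BSD find extensions (not strict POSIX).
    find "${dir%/}" -mindepth 1 -maxdepth 1 -type d -name '*.testproj'
  done
}
```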
@@ -69,24 +82,30 @@ You MUST use the following MCP server tools in sequence to gather context before
   - Parameters:
     - `queryName`: `"PrintCFG"`
     - `queryLanguage`: provided language
-    - `database`: test database path from Step 3
-    - `sourceFunction`: key function name(s) from test code
+    - `database`: `$DB`
+    - `sourceFunction`: key function name(s) from test code (or from source files in the database)
     - `format`: `"graphtext"`
   - Gather: Control flow graph showing execution paths

-### Phase 4: Query Profiling and Evaluation Order
+### Phase 4: Query Profiling and Evaluation Order (requires `$DB`)
+
+Skip this phase entirely if no database is available.

-- [ ] **Step 6: Profile the query**
-  - Tool: #profile_codeql_query
+- [ ] **Step 6a: Run the query with evaluator logging**
+  - Tool: #codeql_query_run (or #codeql_database_analyze)
+  - Run the query against `$DB` with evaluator logging enabled
+  - The tool returns the path to the evaluator log file (`.jsonl`)
+
+- [ ] **Step 6b: Profile from evaluator logs**
+  - Tool: #profile_codeql_query_from_logs
   - Parameters:
-    - `queryPath`: the query file path
-    - `database`: If `databasePath` input was provided and valid, use it; otherwise use test database from Step 3
-  - Gather: Query evaluator log, pipeline execution order, timing data
+    - `evaluatorLog`: path to the evaluator log file from Step 6a
+  - Gather: Pipeline execution order, predicate timing data, tuple counts
   - **Critical**: This reveals the actual bottom-up evaluation order of predicates

 - [ ] **Step 7: Analyze evaluator log** (for large logs)
   - Use CLI grep commands to extract key performance data from evaluator logs:
-    - Replace `<evaluator-log-file>` with the actual log file path from Step 6
+    - Replace `<evaluator-log-file>` with the log file path from Step 6a

 ```bash
 # Find pipeline evaluation order and timing
@@ -127,138 +146,28 @@ Understanding the query profiler output is critical for explaining how the query

 ### Verbal Explanation Structure

-```markdown
-## Query Overview
-
-[2-3 sentence summary of what this query detects and why it matters]
-
-## Query Metadata
-
-| Property    | Value                                    |
-| ----------- | ---------------------------------------- |
-| Name        | [from @name]                             |
-| Description | [from @description]                      |
-| Kind        | [problem/path-problem/diagnostic/metric] |
-| ID          | [from @id]                               |
-| Tags        | [from @tags]                             |
-| Precision   | [high/medium/low]                        |
-| Severity    | [error/warning/recommendation]           |
-
-## What This Query Detects
-
-[Detailed explanation of the vulnerability/issue being detected, including:]
-
-- Security implications (CWE, OWASP if applicable)
-- Why this pattern is problematic
-- Real-world attack scenarios or code quality impacts
-
-## How the Query Works
-
-### Bottom-Up Evaluation Order
-
-[Explain how CodeQL evaluates this query from the profiler data. CodeQL always evaluates queries "bottom-up" - starting with base predicates and building up to the final select clause.]
-
-**Evaluation Timeline** (from profiler data):
-
-| Order | Predicate/Pipeline           | Time (ms) | Tuples  |
-| ----- | ---------------------------- | --------- | ------- |
-| 1     | [first evaluated predicate]  | [time]    | [count] |
-| 2     | [second evaluated predicate] | [time]    | [count] |
-| ...   | ...                          | ...       | ...     |
-
-### Key Components
-
-#### Imports and Dependencies
-
-[List and explain imports]
-
-#### Classes and Characteristic Predicates
+Generate a single markdown document with these sections in order:

-[For each class, explain what it represents and its characteristic predicate]
-
-#### Helper Predicates
-
-[Explain each predicate, its purpose, parameters, and return type]
-
-#### Data Flow Configuration (if applicable)
-
-- **Sources**: [what defines a source]
-- **Sinks**: [what defines a sink]
-- **Sanitizers/Barriers**: [what stops the flow]
-- **Additional Flow Steps**: [custom flow propagation]
-
-#### Main Query (from/where/select)
-
-[Explain the final query structure]
-
-## Test Code Analysis
-
-### AST Structure
-
-[Summarize key insights from PrintAST output - what AST classes represent the test code patterns]
-
-### Control Flow
-
-[Summarize key insights from PrintCFG output - execution paths through key functions]
-
-## Example Patterns
-
-### Positive Test Cases (Should Match)
-
-[Code patterns that should trigger the query, with explanation]
-
-### Negative Test Cases (Should Not Match)
-
-[Code patterns that should NOT trigger the query, with explanation]
-
-## Performance Characteristics
-
-[Based on profiler data, note:]
-
-- Most expensive predicates (by evaluation time)
-- Predicates with highest tuple counts
-- Any potential performance optimizations
-
-## Limitations and Edge Cases
-
-[Note any patterns the query might miss or known false positive scenarios]
-```
+1. **Query Overview**: A 2-3 sentence summary of what the query detects and why it matters.
+2. **Query Metadata**: A table with Name, Description, Kind, ID, Tags, Precision, and Severity, taken from the query's metadata properties (`@name`, `@description`, `@kind`, `@id`, `@tags`, `@precision`, `@problem.severity`).
+3. **What This Query Detects**: Security implications (CWE/OWASP if applicable), why the pattern is problematic, and real-world attack scenarios or code quality impacts.
+4. **How the Query Works**: Two subsections:
+   - **Bottom-Up Evaluation Order**: An evaluation timeline table (Order, Predicate/Pipeline, Time (ms), Tuples) derived from profiler data. Explain that CodeQL evaluates queries bottom-up, from base predicates to the final `select` clause.
+   - **Key Components**: Imports; classes and their characteristic predicates; helper predicates; the data flow configuration (sources, sinks, sanitizers, additional flow steps); and the main query's `from`/`where`/`select` clause.
+5. **Test Code Analysis** (if a database was available): AST structure insights from PrintAST and control flow insights from PrintCFG.
+6. **Example Patterns**: Positive test cases (should match) and negative test cases (should not match), with explanations.
+7. **Performance Characteristics** (if profiler data is available): The most expensive predicates, the highest tuple counts, and optimization opportunities.
+8. **Limitations and Edge Cases**: Patterns the query might miss and known false-positive scenarios.

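The Query Metadata table in section 2 is filled from the property lines of the query's header comment. The sketch below pulls them out. It assumes the conventional layout of one `@property value` pair per line inside the leading `/** ... */` block. `query_metadata` is a hypothetical helper, and wrapped values (such as multi-line `@tags`) keep only their first line.

```bash
# Hypothetical helper: print "key: value" for each @-property in the first
# /** ... */ comment of a .ql file. Continuation lines (e.g. wrapped @tags)
# are not joined; this is a sketch, not a full QLDoc parser.
query_metadata() {
  awk '
    /\*\// { exit }                                  # stop at the end of the header comment
    match($0, /@[a-z.-]+[ \t]+/) {
      key = substr($0, RSTART + 1, RLENGTH - 1)      # drop the leading "@"
      sub(/[ \t]+$/, "", key)                        # drop trailing whitespace
      print key ": " substr($0, RSTART + RLENGTH)    # rest of the line is the value
    }
  ' "$1"
}
```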
 ### Visual Explanation: Mermaid Evaluation Diagram

-Generate a mermaid diagram showing the bottom-up evaluation order based on profiler data:
-
-```mermaid
-flowchart BU
-  subgraph "Base Predicates (Evaluated First)"
-    A[Predicate1] --> B[Predicate2]
-    C[Class1.member] --> B
-  end
-
-  subgraph "Intermediate Predicates"
-    B --> D[HelperPredicate]
-    D --> E[DataFlowConfig.isSource]
-    D --> F[DataFlowConfig.isSink]
-  end
-
-  subgraph "Flow Analysis"
-    E --> G[DataFlow::hasFlow]
-    F --> G
-  end
-
-  subgraph "Final Query (Evaluated Last)"
-    G --> H[select clause]
-  end
-```
-
-**Diagram Guidelines:**
-
-- Direction: Bottom-Up (`BU`) to reflect actual evaluation
-- Group predicates by evaluation phase
-- Show dependencies between predicates
-- Label with actual predicate/class names from the query
-- Use timing data from profiler to order predicates accurately
+Generate a Mermaid `flowchart BT` (bottom-to-top) diagram showing the evaluation order derived from profiler data. Guidelines:
+
+- Use subgraphs to group predicates by evaluation phase (base, intermediate, flow analysis, final select)
+- Label nodes with actual predicate/class names from the query
+- Show dependency edges between predicates
 - Annotate expensive operations with timing (e.g., `[500ms]`)
+- Direction must be `BT` (Mermaid has no `BU` direction) so the diagram reads bottom-up, matching the evaluation order

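The timeline table and the diagram both rest on the evaluator log from Steps 6a and 6b. Outside the MCP tools, a comparable log can be produced with the CodeQL CLI roughly as follows. This sketch makes several assumptions. `profile_query` and the output file names are hypothetical. The CLI is assumed to be on `PATH`, and the `CODEQL` variable can point elsewhere. The function runs `codeql query run` with `--evaluator-log` and `--tuple-counting`, then calls `codeql generate log-summary` to condense the structured log.

```bash
# Hypothetical helper: evaluate one query against a database with structured
# evaluator logging, then write a summary next to the raw log.
# Usage: profile_query <database> <query.ql> <output-dir>
profile_query() {
  local db="$1" query="$2" outdir="$3"
  local codeql="${CODEQL:-codeql}"   # CLI binary; override (e.g. for testing)
  local log="$outdir/evaluator-log.jsonl"

  mkdir -p "$outdir" || return 1

  # --tuple-counting adds per-step tuple counts to the evaluator log.
  "$codeql" query run \
    --database="$db" \
    --output="$outdir/results.bqrs" \
    --evaluator-log="$log" \
    --tuple-counting \
    "$query" || return 1

  # Condense the raw structured event log into a smaller summary file.
  "$codeql" generate log-summary "$log" "$outdir/evaluator-log-summary.jsonl" || return 1

  # Print the raw log path for use in Steps 6b and 7.
  printf '%s\n' "$log"
}
```

Running `profile_query "$DB" path/to/Query.ql profile-out` (illustrative paths) would print the raw log path, which is the value Step 7's `<evaluator-log-file>` placeholder expects.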
 ## Important Notes
