Conversation
Introduces scripts/generate_rules/misra_help/, a two-stage pipeline for (mostly) idempotent generation of per-query .md help files. It takes MISRA rules as input and creates (or updates, as needed) documentation for codeql-coding-standards queries for C and C++. Initial support covers MISRA C 2012/2023 and MISRA C++ 2023. Stage 1: deterministic docling-based extraction and rendering, with a JSON sidecar for downstream consumption. Stage 2: a headless Python driver for the Copilot SDK that rewrites each help file from the JSON sidecar against a fixed Markdown schema and American English spelling. Documentation is added in scripts/generate_rules/misra_help/README.md.
| # Drop trailing references of the form "C90 [...]" / "C99 [...]" etc. | ||
| s = re.sub( | ||
| r"\s+(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\]" | ||
| r"(?:\s*[,;]?\s*(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\])*\s*$", |
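For reference, a minimal sketch of what the regex in this hunk does to a rule title. The pattern is copied from the diff; the empty replacement string, the helper name `strip_trailing_refs`, and the sample title are assumptions, since the hunk is cut off before the end of the `re.sub` call.

```python
import re

# Pattern copied from the hunk above. Compiling it once is a choice made
# for this sketch, not necessarily what the PR does.
_TRAILING_REFS_RE = re.compile(
    r"\s+(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\]"
    r"(?:\s*[,;]?\s*(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\])*\s*$"
)


def strip_trailing_refs(s: str) -> str:
    """Drop trailing 'C90 [...]' / 'C99 [...]' references from a string.

    Assumption: the replacement is the empty string.
    """
    return _TRAILING_REFS_RE.sub("", s)


# Hypothetical title, not taken from any MISRA document.
title = "A sample rule headline C90 [Undefined 1], C99 [Undefined 2]"
print(strip_trailing_refs(title))
```

Only references at the very end of the string are removed; a reference in the middle of a sentence is left alone because the pattern is anchored with `$`.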
| ap.add_argument("--standard", required=True, choices=SUPPORTED_STANDARDS, | ||
| help="MISRA standard to populate (the source language is " | ||
| "derived from this)") | ||
| ap.add_argument("--query-repo", type=Path, default=DEFAULT_QUERY_REPO, |
IMO some of this stuff should be deleted as YAGNI.
I think it's totally fine to either assume that the working directory is the project root, or to find the project root via a path relative to __file__. We already have other scripts that assume the help repo can be found via ../codeql-coding-standards-help.
Given the size of this PR, I'd rather not add too many bells and whistles.
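A rough sketch of the simpler approach suggested here, assuming the script sits three directories below the repository root (scripts/generate_rules/misra_help/). The function names, the `SCRIPT_DEPTH` constant, and the example script name are hypothetical.

```python
from pathlib import Path

# Assumption: the script lives at
# <repo>/scripts/generate_rules/misra_help/<script>.py,
# so the repository root is three levels above the script's directory.
SCRIPT_DEPTH = 3


def repo_root_from(script_path: Path, depth: int = SCRIPT_DEPTH) -> Path:
    """Return the repository root for a script at a known depth."""
    return script_path.resolve().parents[depth]


def help_repo_from(script_path: Path) -> Path:
    """Return the sibling help repo, ../codeql-coding-standards-help."""
    return repo_root_from(script_path).parent / "codeql-coding-standards-help"


# Hypothetical location; in a real script this would be Path(__file__).
example = Path(
    "/work/codeql-coding-standards/scripts/generate_rules/misra_help/example.py"
)
print(repo_root_from(example))
print(help_repo_from(example))
```

With this in place, neither a --query-repo flag nor a default constant would be needed, which is the point of the YAGNI comment.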
| # | ||
| # If none of those resolve to exactly one file, we abort with a clear message. | ||
| PDF_ENV_VARS = { | ||
| "MISRA-C-2023": "MISRA_C_PDF", |
So just a clarification here. We should not differentiate C-2023 and C-2012 at all.
Every rule we have is part of both MISRA C 2012 and MISRA C 2023; there isn't an actual distinction. MISRA C 2012 with all amendments included = MISRA C 2023.
| } | ||
|
|
||
| RULE_DIR_RE = re.compile(r"^(?:RULE|DIR)-\d+(?:-\d+){1,2}$") | ||
| QL_NAME_RE = re.compile(r"@name\s+(?:RULE|DIR)-\d+(?:-\d+){1,2}:\s+(?P<title>.+?)\s*$") |
This is reproducing some prior art and needs to be consolidated.
We have normalized titles etc already in rule_packages/*.json. The script scripts/generate_rules/generate_package_files.py already takes the parsed rule_package data which is organized per rule and per query, and that's what's used to fill in the existing help template.
What's especially important is that some rule_package.json entries have an implementation_scope property (see here) that's added to the query help. This is critical, because it is the only part of our query help that isn't a direct copy of the MISRA text; instead, it describes expected false positives and false negatives to the user.
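To make the consolidation concrete, here is a small sketch of walking rule package data per rule and per query rather than re-parsing `@name` lines with a regex. The JSON layout (standard, then rule id, then `title` and `queries`) and the sample values are assumptions modeled on rule_packages/*.json; they should be checked against the real files and against generate_package_files.py.

```python
import json

# Hypothetical sample; the key layout is an assumption, not a copy of a
# real rule package.
SAMPLE_PACKAGE = json.loads("""
{
  "MISRA-C-2012": {
    "RULE-1-2": {
      "title": "Example rule title",
      "queries": [
        {
          "short_name": "ExampleQuery",
          "name": "Example query name",
          "implementation_scope": {
            "description": "Example note on expected false negatives."
          }
        },
        {"short_name": "OtherQuery", "name": "Other query name"}
      ]
    }
  }
}
""")


def iter_queries(package: dict):
    """Yield (standard, rule_id, rule_title, query) for every query."""
    for standard, rules in package.items():
        for rule_id, rule in rules.items():
            for query in rule.get("queries", []):
                yield standard, rule_id, rule["title"], query


for standard, rule_id, title, query in iter_queries(SAMPLE_PACKAGE):
    has_scope = "implementation_scope" in query
    print(standard, rule_id, query["short_name"], has_scope)
```

Reading titles and implementation_scope from the same parsed data keeps the help generator aligned with the existing template input instead of maintaining a second source of truth.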
| if not cli_pdf.is_file(): | ||
| raise SystemExit(f"error: --pdf {cli_pdf} does not exist") | ||
| return cli_pdf | ||
| env_var = PDF_ENV_VARS[standard] |
This is another thing we should cut via YAGNI: there's no need to support setting the PDF path as an environment variable; it just creates more code to maintain.
| f"error: ${env_var} is set to {p} which does not exist") | ||
| return p | ||
| matches: list[Path] = [] | ||
| for pattern in PDF_FILE_GLOBS[standard]: |
This is also unnecessary magic.
| p.add_argument("--model", default=DEFAULT_MODEL, | ||
| help=f"Copilot model id. Default: {DEFAULT_MODEL}. " | ||
| f"Known good: {', '.join(MODEL_FALLBACKS)}.") | ||
| p.add_argument("--no-overwrite", action="store_true", |
Again, I'd prefer that the default behavior not overwrite existing files, and that overwriting require an explicit --overwrite flag.
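A minimal sketch of the inverted flag, using only standard `argparse`. The helper names `build_parser` and `should_write` are hypothetical and only illustrate the intended default.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where overwriting is opt-in rather than opt-out."""
    p = argparse.ArgumentParser(description="Sketch of a safe-by-default CLI")
    p.add_argument(
        "--overwrite",
        action="store_true",
        help="Replace help files that already exist (default: skip them)",
    )
    return p


def should_write(target: Path, overwrite: bool) -> bool:
    """Write when the target is missing, or when --overwrite was given."""
    return overwrite or not target.exists()


args = build_parser().parse_args([])
print(args.overwrite)
```

With `store_true`, the attribute is False unless the flag is passed, so a run without arguments never clobbers existing help files.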
| import requests | ||
|
|
||
|
|
||
| SUPPORTED_STANDARDS = ("MISRA-C-2012", "MISRA-C-2023", "MISRA-C++-2023") |
Again, there aren't really two C standards.
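One possible shape for this, sketched under the assumption that "MISRA-C-2012" stays the single identifier for MISRA C and that the 2023 spelling is, at most, accepted as an alias. The alias table and the function name `canonical_standard` are hypothetical; dropping the alias entirely would be simpler still.

```python
# Assumption: one identifier per language, with MISRA C keyed as
# "MISRA-C-2012".
SUPPORTED_STANDARDS = ("MISRA-C-2012", "MISRA-C++-2023")

# Hypothetical alias table: MISRA C 2023 is treated as another name for
# the same rule set rather than as a separate standard.
_ALIASES = {"MISRA-C-2023": "MISRA-C-2012"}


def canonical_standard(name: str) -> str:
    """Map a user-supplied standard name onto a supported identifier."""
    name = _ALIASES.get(name, name)
    if name not in SUPPORTED_STANDARDS:
        raise ValueError(f"unsupported standard: {name}")
    return name


print(canonical_standard("MISRA-C-2023"))
```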
| return time.time() + slack_seconds >= self.expires_at | ||
|
|
||
|
|
||
| def fetch_copilot_token(oauth_token: str) -> CopilotToken: |
Should we be using the copilot API, or copilot CLI?
|
|
||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Prompt construction (mirrors codeql-coding-standards-agent/src/rewriteHelp.ts) |
Note that this is a "nice to have."
The "must have" is that we have query help. As a first pass, this should be a word for word match to the MISRA documents.
| "8. End with these two sections verbatim, with the rule id and the short rule statement substituted in:", | ||
| " \"## Implementation notes\"", | ||
| " \"\"", | ||
| " \"None\"", |
This is provided by the implementation_scope field in our rule_packages json files, and should not be None
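As a sketch of what that could look like, the section can be rendered from the query's rule package entry instead of being hard-coded in the prompt. The `description` and `items` keys and the function name are assumptions about the implementation_scope shape; the existing help template should be treated as the authority.

```python
def render_implementation_notes(query: dict) -> str:
    """Render the '## Implementation notes' section for one query.

    Assumption: implementation_scope has a 'description' string and an
    optional 'items' list. Falls back to 'None' only when the field is
    absent from the query entry.
    """
    scope = query.get("implementation_scope")
    if not scope:
        body = ["None"]
    else:
        body = [scope["description"]]
        items = scope.get("items", [])
        if items:
            body.append("")
            body.extend(f"- {item}" for item in items)
    return "\n".join(["## Implementation notes", "", *body]) + "\n"


# Hypothetical query entry, not taken from a real rule package.
print(render_implementation_notes({
    "implementation_scope": {
        "description": "Example limitation of the query.",
        "items": ["First known gap.", "Second known gap."],
    }
}))
```

Generating this section deterministically also keeps it out of the LLM rewrite path, so the described false positives and false negatives cannot be paraphrased away.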
Introduces scripts/generate_rules/misra_help/, a two-stage pipeline for (mostly) idempotent generation of per-query .md help files. It uses MISRA rule text as input and creates (or updates) documentation for codeql-coding-standards queries in C and C++.
Initial supported standards:
- MISRA C 2012/2023
- MISRA C++ 2023
Stage 1 — deterministic, docling-based extraction and rendering, with a JSON sidecar for downstream consumption.
Stage 2 — a headless Python driver for the Copilot SDK that rewrites each help file from the JSON sidecar against a fixed Markdown schema, normalized to American English.
See scripts/generate_rules/misra_help/README.md for usage, architecture, and operational notes.
Description
Adds a new internal tooling package under scripts/generate_rules/misra_help/ that automates generation of per-query Markdown help files for MISRA C/C++ queries. No query files, query metadata, rule packages, shared libraries, tests, .expected files, or release artifacts are modified by this PR; it is purely additive tooling (7 new files, ~2.1k lines, all under scripts/generate_rules/misra_help/).
The pipeline is split so that the deterministic extraction stage can be re-run cheaply and audited independently of the LLM-driven rewrite stage. The JSON sidecar is the contract between the two stages, which keeps Stage 2 reproducible against a pinned input.
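To illustrate the sidecar-as-contract idea, here is a minimal sketch of a deterministic, idempotent sidecar writer and its reader. The function names and the sidecar fields are hypothetical and are not taken from the PR; the only claim is the technique: stable serialization plus a write-if-changed check.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(path: Path, rules: dict) -> bool:
    """Write the sidecar only if its content changed; return True if written.

    Sorted keys and fixed indentation keep the output byte-stable across
    runs, so re-running Stage 1 on unchanged input is a no-op.
    """
    text = json.dumps(rules, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True


def read_sidecar(path: Path) -> dict:
    """Load the sidecar as Stage 2 would, without touching the PDF."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical payload; the real sidecar schema is defined by the PR.
payload = {"RULE-1-2": {"title": "Example rule title"}}
with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "rules.json"
    first = write_sidecar(sidecar, payload)
    second = write_sidecar(sidecar, payload)
    loaded = read_sidecar(sidecar)
print(first, second)
```

Because Stage 2 reads only the sidecar, a sidecar that is pinned or committed can be diffed and reviewed independently of any model output.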
Change request type
.ql, .qll, .qls or unit tests)
Rules with added or modified queries
Release change checklist
A change note (development_handbook.md#change-notes) is required for any pull request which modifies:
If you are only adding new rule queries, a change note is not required.
Author: Is a change note required?
scripts/generate_rules/misra_help/; no release artifacts, query results, or query performance are affected.
🚨🚨🚨
Reviewer: Confirm that format of shared queries (not the .qll file, the .ql file that imports it) is valid by running them within VS Code.
.ql/.qll files modified.
Reviewer: Confirm that either a change note is not required or the change note is required and has been added.
Query development review checklist
For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:
Author
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Reviewer
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.