Commit 36bd835

Merge pull request #13 from advanced-security/copilot/sub-pr-12
Fix prettier formatting in CodeQL performance review prompt
2 parents 049d981 + 3d445a5 commit 36bd835

Lines changed: 15 additions & 13 deletions
@@ -1,11 +1,13 @@
---
mode: agent
---

You are reviewing CodeQL log output for performance issues.

You need to understand the key aspects of CodeQL log output that can flag performance issues, and understanding the language being scanned is an essential part of the review. You should be able to identify the language being scanned, the number of files in the database, and the number of lines of code in the database. You should also be able to identify the time taken to extract the code, build the database, and analyze the code.

In general, look for the following key aspects in the log output:

- The time taken to extract the code into the CodeQL database insert format (`Extracting ..` and `Done extracting ..` will be logged for each file)
- The time taken to create/optimize the database, which indicates size/complexity (`TRAP import`)
- The time taken to analyze the code (each query: `[##/## eval ###ms] Evaluation done; writing results to...`)
@@ -16,43 +18,43 @@ In general, look for the following key aspects in the log output:

These log files will be huge. Instead of reading them line by line, run grep-style commands in the CLI to investigate the file.

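As a rough illustration of the grep-style approach, the Python sketch below filters a log down to the slowest query evaluations instead of reading every line. It assumes the evaluation lines follow the `[##/## eval ###ms] Evaluation done; writing results to...` shape quoted above, with the duration always in milliseconds; other duration formats are not handled. The function name `slowest_queries` and the sample log lines are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape, based on the pattern quoted in the prompt:
#   [12/85 eval 5321ms] Evaluation done; writing results to <path>
EVAL_RE = re.compile(
    r"\[(\d+)/(\d+) eval (\d+)ms\] Evaluation done; writing results to\s*(.*)"
)


def slowest_queries(lines: Iterable[str], top: int = 5) -> List[Tuple[int, str]]:
    """Return up to `top` (milliseconds, results target) pairs, slowest first."""
    hits = []
    for line in lines:
        match = EVAL_RE.search(line)
        if match:
            hits.append((int(match.group(3)), match.group(4).strip()))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, iterate over the open log file.
    sample = [
        "[1/3 eval 120ms] Evaluation done; writing results to a.bqrs",
        "unrelated log noise",
        "[2/3 eval 9000ms] Evaluation done; writing results to b.bqrs",
    ]
    for ms, target in slowest_queries(sample):
        print(f"{ms:>8} ms  {target}")
```
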
## Review Areas

### Excluding Code

This is one of the most important aspects of CodeQL performance. Excluding a file from analysis will speed up extraction, database creation, query execution, and result generation. We would expect to see some number of files excluded from the scan. Scanning unit tests or vendored dependencies is often not useful, and can slow down the scan. Any interpreted language, or any compiled language that uses `build-mode: none`, can take advantage of a `paths-ignore` array in a CodeQL configuration file.

To analyze this aspect, look for the following in the log output:

- In general, the number of files in the database should be lower than the number of files in the baseline (`CodeQL scanned <# in DB> out of <# in baseline > <language> files ... in this invocation`). If the two numbers match, no exclusions were made.
- Extractor output `Done extracting /home/runner/work/<repo>/<repo>/src/public/static/3rd-party-static/<CommonPackageName.2.1.0>.js (11164 ms)`
  - Identifying common 3rd party libraries by name and version is a good indicator of files that should be excluded from the scan, for example `jquery.3.5.1.js` or `react.16.8.6.js`. These commonly sit in a parent folder whose contents are all vendored and should be excluded completely using a `paths-ignore` array entry in the `codeql.yml` file. For example, `paths-ignore: [ '**/public/static/3rd-party-static/**' ]`.
  - Call out any extraction timings > 1000ms, such as `(11164 ms)`. This often indicates a large bundled JS file (and other files in the same folder are often generated or vendored).

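The slow-extraction check above can be sketched in Python as follows. This is a minimal sketch that assumes the extractor lines look like `Done extracting <path> (<n> ms)` as quoted above; the function name `slow_folders`, the default 1000 ms threshold argument, and the sample paths are illustrative assumptions. It only surfaces candidate folders for review and does not generate `paths-ignore` entries.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable

# Assumed line shape, based on the extractor output quoted in the prompt:
#   Done extracting /path/to/file.js (11164 ms)
DONE_RE = re.compile(r"Done extracting (.+) \((\d+) ms\)")


def slow_folders(lines: Iterable[str], threshold_ms: int = 1000) -> Dict[str, int]:
    """Sum extraction time per parent folder for files slower than threshold_ms."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        match = DONE_RE.search(line)
        if not match:
            continue
        path, ms = match.group(1), int(match.group(2))
        if ms > threshold_ms:
            totals[str(PurePosixPath(path).parent)] += ms
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample lines; real logs would be streamed from a file.
    sample = [
        "Done extracting /repo/src/static/3rd-party/jquery.3.5.1.js (11164 ms)",
        "Done extracting /repo/src/app/index.js (40 ms)",
        "Done extracting /repo/src/static/3rd-party/react.16.8.6.js (2000 ms)",
    ]
    for folder, total in sorted(slow_folders(sample).items(), key=lambda kv: -kv[1]):
        print(f"{total:>8} ms  {folder}")
```
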
See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#reduce-the-amount-of-code-being-analyzed-in-a-single-workflow

### Hardware Recommendations

The default GitHub runner has 8 GB of RAM and 2 CPUs. This is often not enough power for extracting code from large repos or scanning through complex databases. Log values of roughly 7 GB of RAM (`CODEQL_RAM: 6914`) and 2 cores (`CODEQL_THREADS: 2`) likely indicate that the scan is running on the default runner.

The recommended hardware sizes for running CodeQL are based on lines of code:

- Small (<100 K lines of code) = 8 GB or higher, 2 cores
- Medium (100 K to 1 M lines of code) = 16 GB or higher, 4 or 8 cores
- Large (>1 M lines of code) = 64 GB or higher, 8 cores

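The sizing tiers listed above can be expressed as a small lookup. The Python sketch below is illustrative only: the names `Recommendation`, `recommend_hardware`, and `looks_like_default_runner` are hypothetical, and the default-runner check is a heuristic assumption built from the `CODEQL_RAM: 6914` and `CODEQL_THREADS: 2` values mentioned in this section, not an official rule.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    size: str
    min_ram_gb: int
    cores: str


def recommend_hardware(lines_of_code: int) -> Recommendation:
    """Map lines of code to the Small/Medium/Large tiers listed in this section."""
    if lines_of_code < 100_000:
        return Recommendation("Small", 8, "2")
    if lines_of_code <= 1_000_000:
        return Recommendation("Medium", 16, "4 or 8")
    return Recommendation("Large", 64, "8")


def looks_like_default_runner(codeql_ram_mb: int, codeql_threads: int) -> bool:
    """Heuristic (assumption): about 7 GB of RAM or less and exactly 2 threads."""
    return codeql_ram_mb <= 7 * 1024 and codeql_threads == 2


if __name__ == "__main__":
    print(recommend_hardware(250_000))
    print(looks_like_default_runner(6914, 2))
```
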
See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#increase-the-memory-or-cores

`Compiling in one thread due to RAM limits.` indicates that limited RAM is available. This is rarely critical, because the CodeQL bundle in use includes precompiled queries.

### Breaking apart monorepos

CodeQL can detect data flows through the code, but a flow stops once it reaches a process boundary. This creates a natural separation point for CodeQL scans. A scan configuration that separates applications divided by process/network boundaries, such as front end (ex: Web.sln) and back end (ex: API.sln) code, would be optimal for performance. Each scan creates and analyzes a smaller database, so the time taken to extract the code, build the database, and analyze the code is reduced. Wall-clock scan time can decrease further by running per-solution scans in parallel with an Actions matrix strategy (such that each gets its own runtime and resources). It is important to include your common framework code in each solution so that compilation succeeds while you analyze other ways to share code.

Consider using https://github.com/advanced-security/monorepo-code-scanning-action, which builds scan filters from the monorepo project structure defined in a `projects.json` file. It further optimizes scanning by detecting which projects have changed on a PR and scanning only those projects. Each project is analyzed in parallel and the results are combined into a single report, which further reduces the time taken to scan the monorepo.

To find this scenario, review the extractor logs and identify common project structures that might indicate individual applications with no method calls or data flows between them. Applications are commonly organized using the techniques below. If any of these look like good candidates for separation, please call them out:

- monorepo structure (ex: `apps/` or `services/`)
- front end web/api / middle tier api / back end data access
- common project structures (ex: `src/` or `lib/` or `framework/` or `common/`)

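One way to surface such candidates from the extractor logs is to count extracted files per project folder. The Python sketch below is a rough illustration under stated assumptions: it relies on `Done extracting <path>` lines as quoted earlier, takes the checkout root as a caller-supplied `repo_root`, and treats `apps/` and `services/` as monorepo roots whose first-level children are separate projects. The function name `project_file_counts` and the sample paths are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: "Done extracting <path>" with an optional " (<n> ms)" suffix.
DONE_RE = re.compile(r"Done extracting (.+?)(?: \(\d+ ms\))?\s*$")

# Assumption: children of these folders are individual applications.
MONOREPO_ROOTS = ("apps", "services")


def project_file_counts(lines: Iterable[str], repo_root: str) -> Counter:
    """Count extracted files per candidate project folder under repo_root."""
    prefix = repo_root.rstrip("/") + "/"
    counts: Counter = Counter()
    for line in lines:
        match = DONE_RE.search(line)
        if not match or not match.group(1).startswith(prefix):
            continue
        parts = match.group(1)[len(prefix):].split("/")
        if len(parts) >= 3 and parts[0] in MONOREPO_ROOTS:
            counts["/".join(parts[:2])] += 1  # e.g. apps/web
        elif len(parts) >= 2:
            counts[parts[0]] += 1  # e.g. src
        else:
            counts["."] += 1  # file directly in the repository root
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines and root; real logs would be streamed from a file.
    sample = [
        "Done extracting /r/apps/web/a.ts (12 ms)",
        "Done extracting /r/apps/web/b.ts (5 ms)",
        "Done extracting /r/services/api/main.ts (7 ms)",
        "Done extracting /r/src/util.ts (3 ms)",
    ]
    for folder, count in project_file_counts(sample, "/r").most_common():
        print(f"{count:>6} files  {folder}")
```

Folders with large file counts and no shared code between them are the ones worth reviewing as separate scan targets.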