A robust CodeQL extractor for Core Data Services (CDS) files used in SAP Cloud Application Programming (CAP) model projects. This extractor processes .cds files and compiles them into .cds.json files for CodeQL analysis while maintaining project-aware parsing and dependency resolution.
The CodeQL CDS extractor is designed to efficiently process CDS projects by:
- Project-Aware Processing: Analyzes CDS files as related project configurations rather than independent definitions
- Optimized Dependency Management: Caches and reuses @sap/cds and @sap/cds-dk dependencies across projects
- Enhanced Precision: Reduces false positives in CodeQL queries by understanding cross-file relationships
- Performance Optimization: Avoids duplicate processing and unnecessary dependency installations
The extractor uses an autobuild approach with the following key components:
- cds-extractor.ts: Main entry point that orchestrates the extraction process
- src/cds/parser/: CDS project discovery and dependency graph building
- src/cds/compiler/: Compilation orchestration and .cds.json generation
- src/packageManager/: Dependency installation and caching
- src/logging/: Unified logging and performance tracking
- src/environment.ts: Environment setup and validation
- src/codeql.ts: CodeQL JavaScript extractor integration

- Environment Setup: Validates CodeQL tools and system requirements
- Project Discovery: Recursively scans for CDS projects and builds dependency graph
- Dependency Management: Installs and caches required CDS compiler dependencies
- CDS Compilation: Compiles .cds files to .cds.json using project-aware compilation
- JavaScript Extraction: Runs CodeQL's JavaScript extractor on source and compiled files

- Node.js (accessible via node command)
- CodeQL CLI tools
- SAP CDS projects with .cds files
The extractor is typically invoked by CodeQL during database creation:
codeql database create --language=cds --source-root=/path/to/project my-database

For development and testing purposes:
# Build the extractor
npm run build
# Run directly (from project source root)
node dist/cds-extractor.js /path/to/source/root
⚠️ IMPORTANT NOTE: Any changes to the CDS extractor's compilation task behavior (including how and where cds compile commands are executed, project detection logic, or output file generation patterns) MUST be reflected in the extractors/cds/tools/test/cds-compilation-for-actions.test.sh script. The .github/workflows/run-codeql-unit-tests-javascript.yml workflow executes this script during the "Compile CAP CDS files" step to simulate the CDS extractor's compilation process for unit tests. If the script and extractor implementations diverge, the CodeQL - Run Unit Tests (javascript) workflow will fail on PRs, causing status check failures. Always review and update the test script when modifying compilation behavior to maintain consistency between local testing and CI/CD environments.
extractors/cds/tools/
├── cds-extractor.ts # Main entry point
├── src/ # Source code modules
│ ├── cds/ # CDS-specific functionality
│ │ ├── compiler/ # Compilation orchestration
│ │ └── parser/ # Project discovery and parsing
│ ├── logging/ # Logging and performance tracking
│ ├── packageManager/ # Dependency management
│ ├── codeql.ts # CodeQL integration
│ ├── diagnostics.ts # Error reporting
│ ├── environment.ts # Environment setup
│ ├── filesystem.ts # File system utilities
│ └── utils.ts # General utilities
├── test/ # Test suites
├── dist/ # Compiled JavaScript output
└── package.json # Project configuration
# Install dependencies
npm install
# Build TypeScript to JavaScript
npm run build
# Run all checks and build
npm run build:all

# Run tests
npm test
# Run tests with coverage
npm run test:coverage
# Run tests in watch mode
npm run test:watch

# Lint TypeScript files
npm run lint
# Auto-fix linting issues
npm run lint:fix
# Format code
npm run format

The extractor respects several CodeQL environment variables:
- CODEQL_DIST: Path to CodeQL distribution
- CODEQL_EXTRACTOR_CDS_WIP_DATABASE: Target database path
- LGTM_INDEX_FILTERS: File filtering configuration
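As a rough illustration, these variables could be gathered into one typed settings object before extraction starts. The sketch below is not the extractor's actual code: `ExtractorEnv` and `readExtractorEnv` are hypothetical names, treating the first two variables as required is an assumption, and so is the one-filter-per-line format for LGTM_INDEX_FILTERS.

```typescript
// Hypothetical settings object derived from the environment variables.
interface ExtractorEnv {
  codeqlDist: string;
  wipDatabase: string;
  indexFilters: string[];
}

// Accepts the environment as a plain map (for example process.env),
// which keeps the function easy to test in isolation.
function readExtractorEnv(env: Record<string, string | undefined>): ExtractorEnv {
  const codeqlDist = env['CODEQL_DIST'];
  const wipDatabase = env['CODEQL_EXTRACTOR_CDS_WIP_DATABASE'];
  if (!codeqlDist) {
    throw new Error('CODEQL_DIST is not set');
  }
  if (!wipDatabase) {
    throw new Error('CODEQL_EXTRACTOR_CDS_WIP_DATABASE is not set');
  }
  // Assumption: one filter per line; blank lines are ignored.
  const indexFilters = (env['LGTM_INDEX_FILTERS'] ?? '')
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0);
  return { codeqlDist, wipDatabase, indexFilters };
}
```

Failing early on a missing variable gives a clearer error than a failure later in the compilation or JavaScript extraction phase.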
Projects are detected based on:
- Presence of package.json files
- CDS files (.cds) in the project directory tree
- Valid CDS dependencies (@sap/cds, @sap/cds-dk) in package.json
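The criteria above can be sketched as a small predicate. This is only an illustration under stated assumptions: `looksLikeCdsProject` and `hasCdsDependency` are hypothetical names, the sketch requires all three criteria at once, and it also looks at devDependencies; the real detection logic may combine the signals differently.

```typescript
// Only the parts of package.json that this sketch inspects.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

const CDS_PACKAGES = ['@sap/cds', '@sap/cds-dk'];

// True if package.json declares at least one of the CDS packages.
function hasCdsDependency(pkg: PackageJson): boolean {
  const all: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };
  return CDS_PACKAGES.some(name => name in all);
}

// pkg is the parsed package.json (undefined if the directory has none);
// filesInTree lists the file paths found under the candidate directory.
function looksLikeCdsProject(pkg: PackageJson | undefined, filesInTree: string[]): boolean {
  if (pkg === undefined) {
    return false;
  }
  const hasCdsFiles = filesInTree.some(file => file.endsWith('.cds'));
  return hasCdsFiles && hasCdsDependency(pkg);
}
```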
The extractor uses a sophisticated compilation approach:
- Dependency Graph Building: Maps relationships between CDS projects
- Smart Caching: Reuses compiled outputs and dependency installations
- Error Recovery: Handles compilation failures gracefully
- Performance Tracking: Monitors compilation times and resource usage
- Shared Dependency Cache: Single installation per unique dependency combination
- Isolated Environments: Dependencies installed in temporary cache directories
- No Source Modification: Original project files remain unchanged
- Project-Level Compilation: Compiles related CDS files together
- Duplicate Avoidance: Prevents redundant processing of imported files
- Memory Tracking: Monitors and reports memory usage throughout extraction
- Large Codebase Support: Optimized for enterprise-scale CDS projects
- Parallel Processing: Where possible, processes independent projects concurrently
- Resource Management: Cleans up temporary files and cached dependencies
The CDS extractor attempts to optimize performance for most projects by caching the installation of the unique combinations of resolved CDS dependencies across all projects under a given source root.
The "unique combinations of resolved CDS dependencies" means that we resolve the latest available version within the semantic version range for each @sap/cds and @sap/cds-dk dependency specified in the package.json file for a given CAP project.
In practice, this means that if "project-a" requires @sap/cds@^6.0.0 and "project-b" requires @sap/cds@^7.0.0 while the latest available version is @sap/cds@9.0.0 (as a trivial example), the extractor will install @sap/cds@9.0.0 once and reuse it for both projects.
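To make the sharing concrete, the sketch below groups projects by their resolved version combination, so that each distinct combination maps to one cached installation. It assumes version resolution has already happened; `ResolvedProject`, `cacheKey`, and `groupByCombination` are hypothetical names, and the key format is invented for illustration.

```typescript
// A project together with the versions already resolved for it.
interface ResolvedProject {
  name: string;
  cds: string;   // resolved @sap/cds version
  cdsDk: string; // resolved @sap/cds-dk version
}

// One key per unique combination of resolved versions (format is illustrative).
function cacheKey(project: ResolvedProject): string {
  return `cds-${project.cds}_cds-dk-${project.cdsDk}`;
}

// Maps each cache key to the names of the projects that can share it.
function groupByCombination(projects: ResolvedProject[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const project of projects) {
    const key = cacheKey(project);
    const names = groups.get(key) ?? [];
    names.push(project.name);
    groups.set(key, names);
  }
  return groups;
}
```

With "project-a" and "project-b" both resolved to 9.0.0, the map holds a single entry for the two of them, so only one installation is needed.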
Sharing one installation is much faster than installing all dependencies for every project individually, especially for large projects with many CDS files. However, this approach has some limitations and trade-offs:
- This latest-first approach is more likely to choose the same version for multiple projects, which can reduce analysis time and can improve consistency in analysis between projects.
- This approach does not read (or respect) the package-lock.json file, which means that we are more likely to use a cds version that is different from the one most recently tested/used by the project developers.
- We are more likely to encounter incompatibility issues where a particular project hasn't been tested with the latest version of @sap/cds or @sap/cds-dk.
We can mitigate some of these issues through a (to be implemented) compilation retry mechanism for projects where some CDS compilation task(s) fail to produce the expected .cds.json output file(s).
The proposed retry mechanism would install the full set of dependencies for the affected project(s) while respecting the package-lock.json file, and then re-run the compilation for the affected project(s).
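Since the retry mechanism is not implemented yet, the following is only a sketch of its first step: finding the projects where some expected .cds.json output is missing. `CompilationTask` and `projectsNeedingRetry` are hypothetical names, and the set of existing files is passed in rather than read from disk to keep the example self-contained.

```typescript
// A compilation task and the output files it was expected to produce.
interface CompilationTask {
  project: string;
  expectedOutputs: string[];
}

// Returns the sorted, de-duplicated names of projects with at least one
// task whose expected output file does not exist.
function projectsNeedingRetry(tasks: CompilationTask[], existingFiles: Set<string>): string[] {
  const failed = new Set<string>();
  for (const task of tasks) {
    if (task.expectedOutputs.some(output => !existingFiles.has(output))) {
      failed.add(task.project);
    }
  }
  return Array.from(failed).sort();
}
```

Each project returned here would then get a full dependency installation that respects package-lock.json, followed by a second compilation attempt.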
TODO: retry mechanism expected before next release of the CDS extractor
TODO: implement installation of dependencies required for compilation to succeed for a given project
The CDS extractor uses the cds compile command to compile .cds files into .cds.json files, which are then processed by CodeQL's JavaScript extractor.
Where possible, a single model.cds.json file is generated for each project, containing all the compiled definitions from the project's .cds files. This results in a faster extraction process overall with minimal duplication of CDS code elements (e.g., annotations, entities, services, etc.) within the CodeQL database created from the extraction process.
Where project-level compilation is not possible (e.g., due to project structure), the extractor generates individual .cds.json files for each .cds file in the project. The main downside to this approach is that if one .cds file imports another .cds file, the imported definitions will be duplicated in the CodeQL database, which can lead to false positives in queries that expect unique definitions.
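The two output layouts can be sketched as a function that plans the expected output files for a project. The name `plannedOutputs` is hypothetical, the placement of model.cds.json in the project root is an assumption, and the decision of when project-level compilation is possible is reduced to a boolean flag here.

```typescript
// Plans the .cds.json files a project is expected to produce.
// projectLevel = true: one combined model.cds.json for the whole project.
// projectLevel = false: one <file>.cds.json next to each .cds file.
function plannedOutputs(projectDir: string, cdsFiles: string[], projectLevel: boolean): string[] {
  if (projectLevel) {
    // Assumption: the combined model is written to the project root.
    return [`${projectDir}/model.cds.json`];
  }
  // "db/schema.cds" becomes "db/schema.cds.json".
  return cdsFiles.map(file => `${file}.json`);
}
```

In the per-file layout, a definition imported by several .cds files ends up in several outputs, which is the source of the duplication described above.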
TODO: use the unique (session) ID of the CDS extractor run as the `<session>` part of `<basename>.<session>.cds.json` and set JS extractor env vars to only extract `.<session>.cds.json` files
The current version of the CDS extractor expects CAP projects to follow the default project structure, particularly regarding the names of the (app, db, & srv) subdirectories in which the extractor will look for .cds files to process (in addition to the root directory of the project).
The proposed solution will use the cds env command to discover configurations that affect the structure of the project and/or the expected "compilation tasks" for the project, including user customizations of settings such as:
- cds.folders.app
- cds.folders.db
- cds.folders.srv
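A possible shape for this, sketched under assumptions: user overrides are merged over the default folder names (app, db, srv), and the result determines which directories are searched for .cds files. `resolveFolders` and `searchDirs` are hypothetical names, and the input is assumed to be a JSON document with a top-level `folders` object; the actual output format of cds env is not confirmed here.

```typescript
// The three folder settings the extractor cares about.
interface CdsFolders {
  app: string;
  db: string;
  srv: string;
}

// Default CAP layout expected by the current extractor.
const DEFAULT_FOLDERS: CdsFolders = { app: 'app', db: 'db', srv: 'srv' };

// Merges overrides from a JSON configuration over the defaults.
// Assumption: the JSON has an optional top-level "folders" object.
function resolveFolders(envJson: string): CdsFolders {
  const parsed = JSON.parse(envJson) as { folders?: Partial<CdsFolders> };
  return { ...DEFAULT_FOLDERS, ...(parsed.folders ?? {}) };
}

// Directories to search: the project root plus the three configured folders.
function searchDirs(projectDir: string, folders: CdsFolders): string[] {
  return [projectDir, ...[folders.app, folders.db, folders.srv].map(f => `${projectDir}/${f}`)];
}
```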
TODO: add support for integration with `cds env` CLI command as a means of consistently getting configurations for CAP projects
The extractor processes both:
- Source Files: Original .cds files for source code analysis
- Compiled Files: Generated .cds.json files for semantic analysis
- Integrates with CodeQL's JavaScript extractor for final database population
- Maintains proper file relationships and source locations
- Supports CodeQL's standard indexing and filtering mechanisms
- Missing Node.js: Ensure node command is available in PATH
- CDS Dependencies: Verify projects have valid @sap/cds dependencies
- Compilation Failures: Check CDS syntax and cross-file references
- Memory Issues: Monitor memory usage for very large projects
The extractor provides comprehensive logging:
- Performance Tracking: Times for each extraction phase
- Memory Usage: Memory consumption at key milestones
- Error Reporting: Detailed error messages with context
- Project Discovery: Information about detected CDS projects

- info: General progress and milestone information
- warn: Non-critical issues that don't prevent extraction
- error: Critical failures that may affect extraction quality