| 1 | +# CodeQL CDS Extractor |
| 2 | + |
| 3 | +A CodeQL extractor for [Core Data Services (CDS)][CDS] files used in [SAP Cloud Application Programming (CAP)][CAP] model projects. It compiles `.cds` files into `.cds.json` files for CodeQL analysis, using project-aware parsing and dependency resolution. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The CodeQL CDS extractor is designed to process CDS projects efficiently through: |
| 8 | + |
| 9 | +- **Project-Aware Processing**: Analyzes CDS files as parts of related projects rather than as independent definitions |
| 10 | +- **Optimized Dependency Management**: Caches and reuses `@sap/cds` and `@sap/cds-dk` dependencies across projects |
| 11 | +- **Enhanced Precision**: Reduces false positives in CodeQL queries by understanding cross-file relationships |
| 12 | +- **Performance Optimization**: Avoids duplicate processing and unnecessary dependency installations |
| 13 | + |
| 14 | +## Architecture |
| 15 | + |
| 16 | +The extractor uses an `autobuild` approach with the following key components: |
| 17 | + |
| 18 | +### Core Components |
| 19 | + |
| 20 | +- **`cds-extractor.ts`**: Main entry point that orchestrates the extraction process |
| 21 | +- **`src/cds/parser/`**: CDS project discovery and dependency graph building |
| 22 | +- **`src/cds/compiler/`**: Compilation orchestration and `.cds.json` generation |
| 23 | +- **`src/packageManager/`**: Dependency installation and caching |
| 24 | +- **`src/logging/`**: Unified logging and performance tracking |
| 25 | +- **`src/environment.ts`**: Environment setup and validation |
| 26 | +- **`src/codeql.ts`**: CodeQL JavaScript extractor integration |
| 27 | + |
| 28 | +### Extraction Process |
| 29 | + |
| 30 | +1. **Environment Setup**: Validates CodeQL tools and system requirements |
| 31 | +2. **Project Discovery**: Recursively scans for CDS projects and builds a dependency graph |
| 32 | +3. **Dependency Management**: Installs and caches required CDS compiler dependencies |
| 33 | +4. **CDS Compilation**: Compiles `.cds` files to `.cds.json` using project-aware compilation |
| 34 | +5. **JavaScript Extraction**: Runs CodeQL's JavaScript extractor on source and compiled files |
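The five phases above run in a fixed order, each building on the output of the previous one. The following is a minimal sketch of that sequencing. The phase names, the `Phase` type, and the `runPhases` helper are hypothetical and not taken from `cds-extractor.ts`; the sketch also assumes that a failed phase aborts the run, which may differ from how the extractor treats recoverable compilation errors.

```typescript
// Hypothetical sketch: phase names mirror the list above, not the real source.
type Phase = { name: string; run: () => boolean };

interface PipelineResult {
  completed: string[]; // phases that finished successfully, in order
  failedAt?: string;   // first phase that reported failure, if any
}

// Runs phases in order and stops at the first one that returns false,
// on the assumption that later phases need the earlier output.
function runPhases(phases: Phase[]): PipelineResult {
  const completed: string[] = [];
  for (const phase of phases) {
    if (!phase.run()) {
      return { completed, failedAt: phase.name };
    }
    completed.push(phase.name);
  }
  return { completed };
}

// Stub phases that always succeed; real phases would do the actual work.
const phases: Phase[] = [
  { name: "environment-setup", run: () => true },
  { name: "project-discovery", run: () => true },
  { name: "dependency-management", run: () => true },
  { name: "cds-compilation", run: () => true },
  { name: "javascript-extraction", run: () => true },
];

const result = runPhases(phases);
```

Here `result.completed` lists the phases in execution order, and `result.failedAt` stays unset because every stub succeeds.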
| 35 | + |
| 36 | +## Usage |
| 37 | + |
| 38 | +### Prerequisites |
| 39 | + |
| 40 | +- Node.js (accessible via `node` command) |
| 41 | +- CodeQL CLI tools |
| 42 | +- SAP CDS projects with `.cds` files |
| 43 | + |
| 44 | +### Running the Extractor |
| 45 | + |
| 46 | +The extractor is typically invoked by CodeQL during database creation: |
| 47 | + |
| 48 | +```bash |
| 49 | +codeql database create --language=cds --source-root=/path/to/project my-database |
| 50 | +``` |
| 51 | + |
| 52 | +### Manual Execution |
| 53 | + |
| 54 | +For development and testing purposes: |
| 55 | + |
| 56 | +```bash |
| 57 | +# Build the extractor |
| 58 | +npm run build |
| 59 | + |
| 60 | +# Run directly (from project source root) |
| 61 | +node dist/cds-extractor.js /path/to/source/root |
| 62 | +``` |
| 63 | + |
| 64 | +## Development |
| 65 | + |
| 66 | +### Project Structure |
| 67 | + |
| 68 | +```text |
| 69 | +extractors/cds/tools/ |
| 70 | +├── cds-extractor.ts # Main entry point |
| 71 | +├── src/ # Source code modules |
| 72 | +│ ├── cds/ # CDS-specific functionality |
| 73 | +│ │ ├── compiler/ # Compilation orchestration |
| 74 | +│ │ └── parser/ # Project discovery and parsing |
| 75 | +│ ├── logging/ # Logging and performance tracking |
| 76 | +│ ├── packageManager/ # Dependency management |
| 77 | +│ ├── codeql.ts # CodeQL integration |
| 78 | +│ ├── diagnostics.ts # Error reporting |
| 79 | +│ ├── environment.ts # Environment setup |
| 80 | +│ ├── filesystem.ts # File system utilities |
| 81 | +│ └── utils.ts # General utilities |
| 82 | +├── test/ # Test suites |
| 83 | +├── dist/ # Compiled JavaScript output |
| 84 | +└── package.json # Project configuration |
| 85 | +``` |
| 86 | + |
| 87 | +### Building |
| 88 | + |
| 89 | +```bash |
| 90 | +# Install dependencies |
| 91 | +npm install |
| 92 | + |
| 93 | +# Build TypeScript to JavaScript |
| 94 | +npm run build |
| 95 | + |
| 96 | +# Run all checks and build |
| 97 | +npm run build:all |
| 98 | +``` |
| 99 | + |
| 100 | +### Testing |
| 101 | + |
| 102 | +```bash |
| 103 | +# Run tests |
| 104 | +npm test |
| 105 | + |
| 106 | +# Run tests with coverage |
| 107 | +npm run test:coverage |
| 108 | + |
| 109 | +# Run tests in watch mode |
| 110 | +npm run test:watch |
| 111 | +``` |
| 112 | + |
| 113 | +### Code Quality |
| 114 | + |
| 115 | +```bash |
| 116 | +# Lint TypeScript files |
| 117 | +npm run lint |
| 118 | + |
| 119 | +# Auto-fix linting issues |
| 120 | +npm run lint:fix |
| 121 | + |
| 122 | +# Format code |
| 123 | +npm run format |
| 124 | +``` |
| 125 | + |
| 126 | +## Configuration |
| 127 | + |
| 128 | +### Environment Variables |
| 129 | + |
| 130 | +The extractor respects several CodeQL environment variables: |
| 131 | + |
| 132 | +- `CODEQL_DIST`: Path to CodeQL distribution |
| 133 | +- `CODEQL_EXTRACTOR_CDS_WIP_DATABASE`: Target database path |
| 134 | +- `LGTM_INDEX_FILTERS`: File filtering configuration |
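As an illustration, a pre-flight check over these variables might look like the sketch below. The three variable names come from the list above; the `missingEnvVars` helper and the `Env` type are hypothetical, and whether any of the variables is mandatory is not stated here.

```typescript
// Variable names come from the list above; everything else is illustrative.
const EXTRACTOR_ENV_VARS = [
  "CODEQL_DIST",
  "CODEQL_EXTRACTOR_CDS_WIP_DATABASE",
  "LGTM_INDEX_FILTERS",
] as const;

type EnvVarName = (typeof EXTRACTOR_ENV_VARS)[number];
type Env = Record<string, string | undefined>;

// Returns the listed variables that are unset or empty in `env`.
function missingEnvVars(env: Env): EnvVarName[] {
  return EXTRACTOR_ENV_VARS.filter((name) => !env[name]);
}

// Example with an explicit object; the path is a placeholder.
const missing = missingEnvVars({ CODEQL_DIST: "/opt/codeql" });
```

In Node.js the same function could be called with `process.env`. Passing the environment as a parameter keeps the check easy to test.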
| 135 | + |
| 136 | +### CDS Project Detection |
| 137 | + |
| 138 | +Projects are detected based on: |
| 139 | + |
| 140 | +- Presence of `package.json` files |
| 141 | +- CDS files (`.cds`) in the project directory tree |
| 142 | +- Valid CDS dependencies (`@sap/cds`, `@sap/cds-dk`) in `package.json` |
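The three criteria above can be computed independently for a candidate directory. The sketch below does only that. The `detectionSignals` function and its types are hypothetical, checking `devDependencies` as well as `dependencies` is an assumption, and the sketch deliberately does not decide how the extractor combines the signals.

```typescript
// Hypothetical shape: only the package.json fields this sketch reads.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface DetectionSignals {
  hasPackageJson: boolean;
  hasCdsFiles: boolean;
  hasCdsDependency: boolean;
}

const CDS_PACKAGES = ["@sap/cds", "@sap/cds-dk"];

// Computes the three signals for one directory. `pkg` is the parsed
// package.json (undefined if absent); `files` are paths under the directory.
function detectionSignals(
  pkg: PackageJson | undefined,
  files: string[],
): DetectionSignals {
  // Assumption: a CDS package in either dependency section counts.
  const deps = { ...pkg?.dependencies, ...pkg?.devDependencies };
  return {
    hasPackageJson: pkg !== undefined,
    hasCdsFiles: files.some((file) => /\.cds$/.test(file)),
    hasCdsDependency: CDS_PACKAGES.some((name) => name in deps),
  };
}

// Example input; the version range and file names are placeholders.
const signals = detectionSignals(
  { dependencies: { "@sap/cds": "^7" } },
  ["db/schema.cds", "srv/service.js"],
);
```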
| 143 | + |
| 144 | +### Compilation Strategy |
| 145 | + |
| 146 | +The compilation approach combines four elements: |
| 147 | + |
| 148 | +1. **Dependency Graph Building**: Maps relationships between CDS projects |
| 149 | +2. **Smart Caching**: Reuses compiled outputs and dependency installations |
| 150 | +3. **Error Recovery**: Handles compilation failures gracefully |
| 151 | +4. **Performance Tracking**: Monitors compilation times and resource usage |
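One plausible input to the dependency graph in step 1 is the set of `using ... from '...'` directives in each CDS file, since these name the other files a model depends on. The sketch below collects them with a regular expression. It is an assumption that the extractor derives graph edges this way, and the `usingTargets` helper is hypothetical.

```typescript
// Collects the module paths referenced by `using ... from '...'` directives.
// A regex approximation for illustration: it does not skip comments and is
// not a substitute for the CDS compiler's own parser.
function usingTargets(cdsSource: string): string[] {
  const pattern = /\busing\b[^;]*?\bfrom\s+['"]([^'"]+)['"]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(cdsSource)) !== null) {
    const target = match[1];
    if (target) {
      targets.push(target);
    }
  }
  return targets;
}

// Example CDS source with two imports; the file names are placeholders.
const exampleSource = [
  "using { Books } from './schema';",
  "using from '../common/types';",
  "service CatalogService { entity ListOfBooks as projection on Books; }",
].join("\n");

const imports = usingTargets(exampleSource); // ["./schema", "../common/types"]
```

Resolving each returned path against the importing file's directory would then give file-to-file edges, from which project-level relationships can be derived.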
| 152 | + |
| 153 | +## Performance Features |
| 154 | + |
| 155 | +### Optimized Dependency Management |
| 156 | + |
| 157 | +- **Shared Dependency Cache**: Single installation per unique dependency combination |
| 158 | +- **Isolated Environments**: Dependencies installed in temporary cache directories |
| 159 | +- **No Source Modification**: Original project files remain unchanged |
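To make "single installation per unique dependency combination" concrete, the sketch below derives a deterministic key from a project's dependency versions and groups projects by that key. The `cacheKey` and `groupByCacheKey` helpers, the key format, and the version numbers are all hypothetical; the extractor's actual cache layout may differ.

```typescript
// Hypothetical shape: resolved versions of the CDS packages for one project.
type DependencyVersions = Record<string, string>;

interface ProjectDeps {
  dir: string;
  versions: DependencyVersions;
}

// Builds a deterministic key: sorting the package names makes the key
// independent of the order in which dependencies were listed.
function cacheKey(versions: DependencyVersions): string {
  return Object.keys(versions)
    .sort()
    .map((name) => `${name}@${versions[name]}`)
    .join("|");
}

// Groups project directories by key; one install per group would suffice.
function groupByCacheKey(projects: ProjectDeps[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const project of projects) {
    const key = cacheKey(project.versions);
    (groups[key] = groups[key] || []).push(project.dir);
  }
  return groups;
}

// Three projects, two distinct combinations (versions are placeholders).
const groups = groupByCacheKey([
  { dir: "app-a", versions: { "@sap/cds": "7.5.0", "@sap/cds-dk": "7.5.0" } },
  { dir: "app-b", versions: { "@sap/cds-dk": "7.5.0", "@sap/cds": "7.5.0" } },
  { dir: "app-c", versions: { "@sap/cds": "8.0.0", "@sap/cds-dk": "8.0.0" } },
]);
```

In this example `app-a` and `app-b` share one key, so only two installations would be needed for the three projects.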
| 160 | + |
| 161 | +### Efficient Processing |
| 162 | + |
| 163 | +- **Project-Level Compilation**: Compiles related CDS files together |
| 164 | +- **Duplicate Avoidance**: Prevents redundant processing of imported files |
| 165 | +- **Memory Tracking**: Monitors and reports memory usage throughout extraction |
| 166 | + |
| 167 | +### Scalability |
| 168 | + |
| 169 | +- **Large Codebase Support**: Optimized for enterprise-scale CDS projects |
| 170 | +- **Parallel Processing**: Where possible, processes independent projects concurrently |
| 171 | +- **Resource Management**: Cleans up temporary files and cached dependencies |
| 172 | + |
| 173 | +## Integration with CodeQL |
| 174 | + |
| 175 | +### File Processing |
| 176 | + |
| 177 | +The extractor processes both: |
| 178 | + |
| 179 | +- **Source Files**: Original `.cds` files for source code analysis |
| 180 | +- **Compiled Files**: Generated `.cds.json` files for semantic analysis |
| 181 | + |
| 182 | +### Database Population |
| 183 | + |
| 184 | +- Integrates with CodeQL's JavaScript extractor for final database population |
| 185 | +- Maintains proper file relationships and source locations |
| 186 | +- Supports CodeQL's standard indexing and filtering mechanisms |
| 187 | + |
| 188 | +## Troubleshooting |
| 189 | + |
| 190 | +### Common Issues |
| 191 | + |
| 192 | +1. **Missing Node.js**: Ensure the `node` command is available on `PATH` |
| 193 | +2. **CDS Dependencies**: Verify projects have valid `@sap/cds` dependencies |
| 194 | +3. **Compilation Failures**: Check CDS syntax and cross-file references |
| 195 | +4. **Memory Issues**: Monitor memory usage for very large projects |
| 196 | + |
| 197 | +### Debugging |
| 198 | + |
| 199 | +The extractor provides comprehensive logging: |
| 200 | + |
| 201 | +- **Performance Tracking**: Times for each extraction phase |
| 202 | +- **Memory Usage**: Memory consumption at key milestones |
| 203 | +- **Error Reporting**: Detailed error messages with context |
| 204 | +- **Project Discovery**: Information about detected CDS projects |
| 205 | + |
| 206 | +### Log Levels |
| 207 | + |
| 208 | +- `info`: General progress and milestone information |
| 209 | +- `warn`: Non-critical issues that don't prevent extraction |
| 210 | +- `error`: Critical failures that may affect extraction quality |
| 211 | + |
| 212 | +## References |
| 213 | + |
| 214 | +- [SAP Cloud Application Programming Model][CAP] |
| 215 | +- [Core Data Services (CDS)][CDS] |
| 216 | +- [Conceptual Definition Language (CDL)][CDL] |
| 217 | +- [CodeQL Documentation](https://codeql.github.com/docs/) |
| 218 | + |
| 219 | +[CAP]: https://cap.cloud.sap/docs/about/ |
| 220 | +[CDS]: https://cap.cloud.sap/docs/cds/ |
| 221 | +[CDL]: https://cap.cloud.sap/docs/cds/cdl |