Amazon Bedrock multi-agent applications add a second prompt/control plane on top of the base agent: a **router** or **supervisor** decides which collaborator receives the user request, and collaborators can expose **action groups**, **knowledge bases**, **memory**, or even **code interpretation**. If the application treats user text as policy and disables Bedrock **pre-processing** or **Guardrails**, a legitimate chatbot user can often steer orchestration, discover collaborators, leak tool schemas, and coerce a collaborator into invoking an allowed tool with attacker-chosen inputs.

This is an **application-level prompt-injection / policy-by-prompt failure**, not a Bedrock platform vulnerability.

### Attack surface and preconditions

The attack becomes practical when all of the following are true:

- The Bedrock application uses **Supervisor Mode** or **Supervisor with Routing Mode**.
- A collaborator has high-impact **action groups** or other privileged capabilities.
- The application accepts **untrusted user text** from a normal chat UI and lets the model decide routing, delegation, or authorization.
- **Pre-processing** and/or **Guardrails** are disabled, or tool backends trust model-selected arguments without independent authorization checks.
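
The preconditions above can be turned into a quick self-audit. The following is a minimal sketch, assuming you already keep an inventory of your own deployment; the `AppConfig` and `Collaborator` classes and all of their field names are hypothetical and are not Bedrock API attributes. It also treats any action group as high-impact, which is a deliberate over-approximation.

```python
from dataclasses import dataclass, field


# Hypothetical inventory model of a deployment (NOT Bedrock API objects).
@dataclass
class Collaborator:
    name: str
    action_groups: list[str] = field(default_factory=list)
    backend_authorizes_calls: bool = False  # authorization enforced outside the LLM


@dataclass
class AppConfig:
    collaboration_mode: str  # assumed labels: "SUPERVISOR", "SUPERVISOR_ROUTER", "DISABLED"
    preprocessing_enabled: bool
    guardrail_attached: bool
    accepts_untrusted_chat: bool
    collaborators: list[Collaborator] = field(default_factory=list)


def exposed_collaborators(cfg: AppConfig) -> list[str]:
    """Return the collaborators for which every listed precondition holds."""
    multi_agent = cfg.collaboration_mode in {"SUPERVISOR", "SUPERVISOR_ROUTER"}
    if not (multi_agent and cfg.accepts_untrusted_chat):
        return []
    # "Pre-processing and/or Guardrails are disabled"
    filters_off = not (cfg.preprocessing_enabled and cfg.guardrail_attached)
    return [
        c.name
        for c in cfg.collaborators
        # Any action group counts as privileged in this sketch.
        if c.action_groups and (filters_off or not c.backend_authorizes_calls)
    ]
```

An empty result only means that this coarse checklist found nothing; it does not show that the application is safe.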

### 1. Operating mode detection

- In **Supervisor with Routing Mode**, the router prompt contains an `<agent_scenarios>` block with `$reachable_agents$`. A detection payload can instruct the router to forward to the **first listed agent** and return a unique marker, which proves that direct routing occurred.
- In **Supervisor Mode**, the orchestration prompt forces responses and inter-agent communication through `AgentCommunication__sendMessage()`. A payload that requests a unique message via that tool fingerprints supervisor-mediated handling.

Useful artifacts:

- `<agent_scenarios>` / `$reachable_agents$` strongly suggest a router classification layer.
- `AgentCommunication__sendMessage()` strongly suggests supervisor orchestration and an explicit inter-agent messaging primitive.
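
The artifact check above is plain string matching. A minimal sketch follows, assuming the text under inspection is a response or log excerpt collected during an authorized test; the function name and the returned labels are illustrative choices, not Bedrock terminology.

```python
# Marker strings taken from the artifacts listed above.
ROUTER_MARKERS = ("<agent_scenarios>", "$reachable_agents$")
SUPERVISOR_MARKERS = ("AgentCommunication__sendMessage",)


def fingerprint_mode(observed_text: str) -> str:
    """Guess the operating mode from template artifacts in observed text."""
    if any(marker in observed_text for marker in ROUTER_MARKERS):
        # A router classification layer only exists in Supervisor with Routing Mode.
        return "supervisor-with-routing"
    if any(marker in observed_text for marker in SUPERVISOR_MARKERS):
        return "supervisor"
    return "unknown"
```

A result of `"unknown"` is inconclusive: the markers may simply not have leaked into the text that was inspected.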

### 2. Collaborator discovery

- In **Routing Mode**, discovery prompts should look **ambiguous or multi-step** so the router escalates to the supervisor instead of routing straight to one collaborator.
- The supervisor prompt embeds collaborators inside `<agents>$agent_collaborators$</agents>`, but usually also says not to reveal tools/agents/instructions.
- Instead of asking for the raw prompt, ask for **functional descriptions** of the available specialists. Even partial descriptions are enough to map collaborators to domains such as forecasting, solar management, or peak-load optimization.

### 3. Payload delivery to a chosen collaborator

- In **Supervisor Mode**, use the discovered collaborator role and instruct the supervisor to relay a payload **unchanged** through `AgentCommunication__sendMessage()`. The goal is payload integrity across the orchestration hop.
- In **Routing Mode**, craft the prompt with strong **domain cues** so the router classifier consistently sends it to the desired collaborator without supervisor review.
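
Payload integrity across the orchestration hop can be checked with a random canary. The sketch below assumes an authorized test in which the text that reached the collaborator is visible (for example through tracing); `new_canary` and `check_relay` are hypothetical helper names.

```python
import secrets


def new_canary(prefix: str = "CANARY") -> str:
    """Create a unique, easily searchable marker for one test message."""
    return f"{prefix}-{secrets.token_hex(8)}"


def check_relay(payload: str, canary: str, collaborator_input: str) -> str:
    """Classify what happened to a payload on its way to the collaborator."""
    if payload in collaborator_input:
        return "verbatim"    # relayed unchanged
    if canary in collaborator_input:
        return "rewritten"   # marker survived, surrounding text was altered
    return "dropped"         # nothing recognizable arrived
```

A `"rewritten"` result suggests that the supervisor paraphrased the message, so instructions embedded in it may not have survived intact.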

### 4. Exploitation progression: leakage to tool misuse

After delivery, a common progression is:

1. **Instruction extraction**: coerce the collaborator into paraphrasing its internal logic, operational limits, or hidden guidance.
2. **Tool schema extraction**: elicit tool names, purposes, required parameters, and expected outputs. This gives the attacker the effective API contract for later abuse.
3. **Tool misuse**: persuade the collaborator to invoke a legitimate action group with attacker-controlled arguments, causing unauthorized business actions such as fraudulent ticket creation, workflow triggering, record manipulation, or downstream API abuse.

The core issue is that the backend lets the model decide **who may do what** by prompt semantics instead of enforcing authorization and validation outside the LLM.
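
Enforcing authorization outside the LLM could look like the sketch below. It is a generic illustration, not Bedrock's handler contract: it assumes the caller identity comes from the application's authenticated session and never from model output, and the tool names, roles, and argument names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity established by the application's own authentication layer."""
    user_id: str
    roles: frozenset
    account_ids: frozenset


class Denied(Exception):
    """Raised when a model-proposed tool call must not be executed."""


# Hypothetical policy tables: required role and allow-listed arguments per tool.
REQUIRED_ROLE = {"create_ticket": "support_agent", "get_forecast": "viewer"}
ALLOWED_ARGS = {
    "create_ticket": {"account_id", "summary"},
    "get_forecast": {"account_id"},
}


def authorize_tool_call(caller: Caller, tool: str, args: dict) -> dict:
    """Check a model-proposed call against server-side policy; return clean args."""
    required = REQUIRED_ROLE.get(tool)
    if required is None:
        raise Denied(f"unknown tool: {tool}")
    if required not in caller.roles:
        raise Denied(f"{caller.user_id} lacks role {required}")
    account_id = args.get("account_id")
    if not isinstance(account_id, str) or account_id not in caller.account_ids:
        raise Denied("account_id is not owned by the caller")
    # Drop any argument the tool does not explicitly accept.
    return {k: v for k, v in args.items() if k in ALLOWED_ARGS[tool]}
```

With a check like this in the tool backend, a successful prompt injection can at most request actions the authenticated user was already allowed to perform.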

### Notes for operators and defenders

- **Trace** and **model invocation logs** are useful to confirm routing, prompt augmentation, collaborator selection, and whether tool calls executed with the attacker-supplied arguments.
- Treat each collaborator as a separate trust boundary: scope action groups narrowly, validate tool inputs in the backend, and require server-side authorization before high-impact actions.
- Bedrock **pre-processing** can reject or classify suspicious requests before orchestration, and **Guardrails** can block prompt-injection attempts at runtime. They should be enabled even if prompt templates already contain “do not disclose” rules.
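
Confirming from trace data whether a tool call ran with attacker-supplied arguments can be sketched as follows. The function takes a list of already-collected trace dictionaries; the field names (`orchestrationTrace`, `invocationInput`, `actionGroupInvocationInput`, `actionGroupName`, `function`, `parameters`) are an assumed shape and should be checked against the current Bedrock trace documentation before relying on them. The helper names are hypothetical.

```python
def tool_calls_from_trace(events):
    """Yield (action_group, function, params) for each action-group invocation.

    `events` is a list of trace dicts in the ASSUMED shape described above.
    """
    for event in events:
        orchestration = event.get("orchestrationTrace", {})
        call = orchestration.get("invocationInput", {}).get("actionGroupInvocationInput")
        if not call:
            continue  # rationale, observation, or routing events carry no tool call
        params = {p["name"]: p["value"] for p in call.get("parameters", [])}
        yield call.get("actionGroupName"), call.get("function"), params


def calls_containing(events, marker):
    """Return the tool calls whose argument values contain the given marker."""
    return [
        call for call in tool_calls_from_trace(events)
        if any(marker in str(value) for value in call[2].values())
    ]
```

Searching for a canary string used during testing gives a direct answer to whether user-controlled text reached a tool argument.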

## References

- [When AI Remembers Too Much – Persistent Behaviors in Agents’ Memory (Unit 42)](https://unit42.paloaltonetworks.com/indirect-prompt-injection-poisons-ai-longterm-memory/)
- [When an Attacker Meets a Group of Agents: Navigating Amazon Bedrock's Multi-Agent Applications (Unit 42)](https://unit42.paloaltonetworks.com/amazon-bedrock-multiagent-applications/)
- [Retain conversational context across multiple sessions using memory – Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/agents-memory.html)