In Part 1, I put Squad inside a deterministic LangGraph workflow. In Part 2, I moved risky tool execution into Azure Container Apps Dynamic Sessions.
That gave me a clean rule: the app owns the workflow, the graph owns the state transitions, and risky execution happens somewhere isolated instead of inside the process that is making decisions.
But then I hit the next problem.
What happens when the agent needs a real workspace?
Not one command. Not one tool call. A workspace. A repo checkout. A build cache. Logs. Generated artifacts. A place where the agent can stop, wait for me, and continue tomorrow without rebuilding the universe from scratch.
You can recreate all of that on every turn. You can also eat soup with a fork. It builds character. I do not recommend it.
So Part 3 is about the next piece of the architecture:
Use ACA Sandboxes as persistent, isolated workspaces for agent tasks that need continuity across tool calls, checkpoints, and human review.
The Control Plane and the Tool Plane
The app from the previous posts does not go away.
The LangGraph/Squad app is still the control plane. It decides what should happen next, records state, routes work, pauses for human input, and decides whether an artifact is good enough to move the graph forward.
The sandbox is the workspace/tool plane. It is where files live, commands run, dependencies get installed, and proof artifacts are produced.
That distinction matters because I do not want the agent’s brain and the agent’s workshop to be the same thing.
LangGraph / Squad app
|
| decide next step, choose approved command
v
ACA Sandbox workspace
|
| run command, update files, produce artifact
v
proof artifact
|
| attach result back to graph state
v
next node, human review, or cleanup
The graph should be able to say: “Run the approved analysis command in this workspace, bring me back the artifact, and tell me exactly what happened.”
Not “here is a shell, good luck.”
That is the pattern I liked in the implementation framing Tamir pointed me to: deterministic workflow plus a sandbox validation loop. The code runs before the artifact is shown, and the workflow only advances based on something we can inspect. I am not using that as Azure product documentation; I am using it as a useful engineering shape.
Dynamic Sessions vs. Sandboxes
Azure Container Apps has two similarly named primitives that I do not want to blur together.
Dynamic Sessions are managed, ephemeral execution environments behind a session pool. They are great when I need gloves: run one bounded action, return one artifact, clean up.
ACA Sandboxes are first-class preview sandbox resources under Microsoft.App/SandboxGroups. They are a better fit when I need a desk. Or a workshop.
Dynamic Sessions ask:
Where can I safely run this one risky thing?
Sandboxes ask:
Where can this agent have an isolated workspace that survives long enough to be useful?
For a real Squad workflow, that second question matters. An agent fixing a bug may need to clone a repo, install dependencies, reproduce a failing test, generate a review file, pause for human review, and resume from the same directory after I reply. Even a short-lived validation run starts leaning toward Sandboxes once it needs a real filesystem and multiple command phases like restore, build, test, smoke-run, score, and refine.
The sandbox is not the brain. It is the workbench. The durable decision state still belongs in the app, graph, issue, PR, or repo. The sandbox preserves operational continuity: files, caches, outputs, logs, and artifacts.
The Overall Flow
Here is the boring version of the workflow I want.
1. Create a sandbox group.
2. Add a sandbox-group credential.
3. Create or choose a disk image with the tools I need.
4. Create a sandbox from that image with explicit egress rules.
5. Run only allowlisted commands.
6. Export the proof artifact back to graph state.
7. Suspend, snapshot, or delete the sandbox.
Boring is good. Boring survives contact with calendars.
There are a few important details hiding inside that list.
The sandbox group is the administrative boundary. The credential belongs to the group, not to my laptop. The disk image contains tools, not my browser profile. The sandbox gets egress intentionally, not accidentally. The agent gets a command catalog, not arbitrary shell. The artifact comes back to the graph as evidence, not vibes.
That last part is important enough to name plainly: proof artifacts. The command log, the manifest, the generated review file, and the cleanup result. Basically: what can I prove happened?
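To make "what can I prove happened?" concrete, here is a minimal sketch of a proof record that hashes the command log and the exported artifact. The `ProofArtifact` class, its field names, and the helper functions are illustrative assumptions of mine, not the companion sample's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProofArtifact:
    """Hypothetical record of one sandbox run (field names are assumptions)."""
    command_name: str
    exit_code: int
    log_sha256: str
    artifact_sha256: str
    cleanup_result: str


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_proof(command_name: str, exit_code: int, command_log: bytes,
                artifact: bytes, cleanup_result: str) -> ProofArtifact:
    """Store hashes rather than raw content so the record is safe to attach to graph state."""
    return ProofArtifact(
        command_name=command_name,
        exit_code=exit_code,
        log_sha256=sha256_hex(command_log),
        artifact_sha256=sha256_hex(artifact),
        cleanup_result=cleanup_result,
    )


def to_manifest_json(proof: ProofArtifact) -> str:
    """Serialize with sorted keys so the same run always yields the same text."""
    return json.dumps(asdict(proof), sort_keys=True, indent=2)
```

Hashing instead of embedding the raw log keeps the record small and lets a reviewer check that the exported artifact is the one the command actually produced.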
Credentials: The Part That Needs to Be Boring on Purpose
This is the part that needs to be boring on purpose.
A credential, in this context, is a token or provider connection the sandbox can use to authenticate to an external service. For this post, the interesting provider is GitHub Copilot. If I want a Copilot-backed Squad command to run inside the sandbox, the sandbox needs a way to authenticate.
The wrong way is to copy my host machine’s token store, gh profile, browser cookies, or local credential cache into the sandbox. That would make the demo easier and the architecture worse.
The right shape is a sandbox-group credential:
- it is configured on the sandbox group;
- it can be shared by sandboxes in that group;
- it is injected into the sandbox through the platform path;
- it can be rotated or removed without baking secrets into a disk image;
- it keeps the custom disk focused on tools, not identity.
The Sandboxes UI is at https://sandboxes.azure.com/. Once you open your sandbox group, the path is:
Sandbox group > Credentials > Provider Tokens > GitHub Copilot > Set Token

That screenshot is from the real Sandboxes UI, with account and sandbox group details blurred. The important product point is still visible: provider tokens are attached to the sandbox group, and sandboxes in that group can use them without baking credentials into the disk image.
The UI is not required, though. For this architecture, I want the setup to be boring and scriptable.
First, bootstrap the group with the preview aca CLI. This is not az containerapp; Sandboxes use the separate aca command surface.
# Install once, then authenticate only when needed.
curl -fsSL https://aka.ms/aca-cli-install | sh
az account show -o none || az login
aca auth status || aca auth login

aca sandboxgroup create \
  --name <SANDBOX_GROUP> \
  --location <REGION> \
  --set-config

aca doctor
aca sandboxgroup create auto-assigns the caller the Container Apps SandboxGroup Data Owner role unless you opt out with --skip-role-check. For another principal, assign it explicitly:
aca sandboxgroup role create \
  --group <SANDBOX_GROUP> \
  --role "Container Apps SandboxGroup Data Owner" \
  --principal-id <PRINCIPAL_OBJECT_ID>
There is no YAML file to apply for this credential in the current flow. The earlier YAML-shaped sketch was conceptual; it was not a real manifest. The credential is a sandbox-group resource you create through the UI or the preview aca CLI. This is where the token value is set: in the UI, you paste it into the GitHub Copilot provider token field; in the CLI, aca sandboxgroup credential create stores it on the sandbox group and returns a credential id.
# Safer local flow: omit --token and let the CLI prompt for the value.
GITHUB_COPILOT_CREDENTIAL_ID=$(aca sandboxgroup credential create \
  --group <SANDBOX_GROUP> \
  --type github-copilot \
  -o json | jq -r .id)

aca sandboxgroup credential list \
  --group <SANDBOX_GROUP> \
  -o json

aca sandboxgroup credential delete \
  --group <SANDBOX_GROUP> \
  --id "$GITHUB_COPILOT_CREDENTIAL_ID" \
  --yes
For CI, use your secret store to provide the token to that command; do not hard-code it in source, .env, screenshots, or logs. The current preview credential types exposed by the CLI are github-copilot and anthropic-claude. For GitHub Copilot, the CLI validates a fine-grained GitHub PAT prefix (github_pat_...) and rejects classic ghp_... tokens. Use a disposable, tightly scoped token and rotate it. If it ships, it ships reliably.
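Since the CLI rejects classic tokens, a CI job can fail early with a local pre-check before it ever calls `aca`. This is a minimal sketch under stated assumptions: the environment variable name `SANDBOX_COPILOT_PAT` and both helper functions are hypothetical, and the check only mirrors the prefix rule described above. It does not reproduce the CLI's real validation.

```python
import os
from typing import Mapping, Optional

# Prefix of fine-grained GitHub PATs, as described in the text above.
FINE_GRAINED_PREFIX = "github_pat_"


def looks_like_fine_grained_pat(token: str) -> bool:
    """True only for non-empty tokens that carry the fine-grained prefix."""
    return token.startswith(FINE_GRAINED_PREFIX) and len(token) > len(FINE_GRAINED_PREFIX)


def load_token(var_name: str = "SANDBOX_COPILOT_PAT",
               environ: Optional[Mapping[str, str]] = None) -> str:
    """Read the token from the environment; error messages never include its value."""
    env = os.environ if environ is None else environ
    token = env.get(var_name, "").strip()
    if not token:
        raise ValueError(f"{var_name} is not set")
    if not looks_like_fine_grained_pat(token):
        raise ValueError(f"{var_name} does not look like a fine-grained PAT")
    return token
```

The error messages name the variable, never the value, so a failed run does not leak the secret into CI logs.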
The sandbox create path is also scriptable. The public disk path uses --disk; private or committed disks use --disk-id; credentials are attached by id; egress starts with a default action plus host allow rules. These flags are preview surface, so I would re-check aca sandbox create --help and aca sandbox egress set --help before automating them in CI.
aca sandbox create \
  --group <SANDBOX_GROUP> \
  --disk copilot \
  --credential "$GITHUB_COPILOT_CREDENTIAL_ID" \
  --egress-default Deny \
  --egress-rule "github.com:Allow" \
  --egress-rule "api.github.com:Allow" \
  --egress-rule "*.githubcopilot.com:Allow" \
  -o json

aca sandbox create \
  --group <SANDBOX_GROUP> \
  --disk-id <PRIVATE_OR_COMMITTED_DISK_ID> \
  --credential "$GITHUB_COPILOT_CREDENTIAL_ID" \
  --egress-default Deny \
  --egress-rule "github.com:Allow" \
  --egress-rule "api.github.com:Allow" \
  --egress-rule "*.githubcopilot.com:Allow" \
  -o json
For a sandbox that already exists, the same egress policy can be set or inspected without recreating the sandbox:
# Current aca preview build used in this validation:
aca sandbox egress set \
  --id <SANDBOX_ID> \
  --default Deny \
  --rule "github.com:Allow" \
  --rule "api.github.com:Allow" \
  --rule "*.githubcopilot.com:Allow" \
  --traffic-inspection Full

# If your aca build exposes --host-allow instead, use the equivalent form:
aca sandbox egress set \
  --id <SANDBOX_ID> \
  --default Deny \
  --host-allow "github.com" \
  --host-allow "api.github.com" \
  --host-allow "*.githubcopilot.com" \
  --traffic-inspection Full

aca sandbox egress show --id <SANDBOX_ID>
The important detail is the handoff: the credential id is passed to aca sandbox create --credential. That is how the sandbox gets access to the provider token. The token is not in the disk image, not in the repo, and not copied from my host profile.
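If the control plane is Python, the same handoff can be assembled as an argument list instead of a shell string, which avoids quoting surprises. This is only a sketch: the helper name is hypothetical, the flags are copied from the commands shown in this post rather than from a stable reference, and the function builds the argv without executing anything. Since the flags are preview surface, re-check `aca sandbox create --help` before relying on them.

```python
from typing import Optional, Sequence


def build_sandbox_create_argv(
    group: str,
    credential_id: str,
    allowed_hosts: Sequence[str],
    disk: Optional[str] = None,
    disk_id: Optional[str] = None,
) -> list[str]:
    """Build (but do not run) an `aca sandbox create` command line.

    Flags mirror the examples in this post; they are preview surface
    and may change, so treat them as assumptions.
    """
    # Exactly one disk source: a public disk name or a private/committed disk id.
    if (disk is None) == (disk_id is None):
        raise ValueError("pass exactly one of disk or disk_id")

    argv = ["aca", "sandbox", "create", "--group", group]
    argv += ["--disk", disk] if disk is not None else ["--disk-id", disk_id]
    # The credential is attached by id; the token value never appears here.
    argv += ["--credential", credential_id, "--egress-default", "Deny"]
    for host in allowed_hosts:
        argv += ["--egress-rule", f"{host}:Allow"]
    argv += ["-o", "json"]
    return argv
```

Hard-coding `Deny` as the default in the builder means a caller can only widen egress by naming hosts, never by forgetting a flag.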
For ARM/Bicep, I am drawing a hard line: Bicep can create the sandbox group control-plane resource, but I am not inventing Bicep children for provider credentials or individual runtime sandboxes. The published ARM template reference I could verify exposes Microsoft.ContainerInstance/sandboxGroups@2026-06-01-preview; the Sandboxes preview material and aca command surface also refer to the ACA sandbox group as Microsoft.App/SandboxGroups. In practice, validate the provider namespace registered for your preview subscription and keep runtime operations on the supported preview CLI/SDK/data plane.
The public Bicep shape is group-level only:
param location string = resourceGroup().location
param sandboxGroupName string
param sandboxSubnetId string

resource sandboxGroup 'Microsoft.ContainerInstance/sandboxGroups@2026-06-01-preview' = {
  name: sandboxGroupName
  location: location
  properties: {
    networkProfile: {
      subnets: [
        {
          id: sandboxSubnetId
        }
      ]
    }
  }
}
So the clean no-UI split is:
ARM/Bicep:
  create the sandbox group boundary, network profile, identity/tags where supported

aca CLI / SDK / data plane:
  create/list/delete credentials
  create sandboxes from disks or snapshots
  attach credentials
  set egress policy
  exec, shell, files, ports, lifecycle, snapshot, cleanup
For my custom disk path, the sample still expects the future prompt proof to use COPILOT_GITHUB_TOKEN inside the sandbox, but only after I verify the exact credential-injection behavior for the custom disk and wire delete-after-run cleanup. Until then, prompt-level proof stays fail-closed.
Gates Before Commands
Before showing the command catalog, I need one more piece: gates.
SandboxGates is the set of switches that prevents the demo from quietly becoming more powerful than I intended. Creating infrastructure, deleting resources, requiring egress verification, or running an authenticated Copilot prompt should all be explicit choices.
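from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxGates:
    allow_create: bool             # may this run create new sandbox infrastructure?
    delete_after_run: bool         # must cleanup actually delete disposable resources?
    require_verified_egress: bool  # fail unless the egress allowlist was checked
    allow_copilot_prompt: bool     # may the authenticated Copilot/Squad proof run?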
The point is not the dataclass. The point is the policy.
If allow_create is false, the sample should attach to an existing workspace instead of creating new infrastructure. If delete_after_run is false, cleanup should not pretend it deleted things. If require_verified_egress is true, the run should fail unless the egress allowlist was checked. And if allow_copilot_prompt is false, the authenticated Copilot/Squad proof command should not run.
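One way to turn that policy into something the coordinator can act on is a function that returns the reasons a run must not proceed. This is a minimal sketch, not the sample's implementation: `gate_violations` and its keyword arguments are names I am assuming for illustration, and `SandboxGates` is repeated so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxGates:
    allow_create: bool
    delete_after_run: bool
    require_verified_egress: bool
    allow_copilot_prompt: bool


def gate_violations(gates: SandboxGates, *, wants_create: bool,
                    egress_verified: bool, is_copilot_prompt: bool) -> list[str]:
    """Return every reason the requested run is blocked; an empty list means go."""
    reasons: list[str] = []
    if wants_create and not gates.allow_create:
        reasons.append("allow_create is false: attach to an existing workspace")
    if gates.require_verified_egress and not egress_verified:
        reasons.append("require_verified_egress is true: egress allowlist not checked")
    if is_copilot_prompt and not gates.allow_copilot_prompt:
        reasons.append("allow_copilot_prompt is false: Copilot/Squad proof must not run")
    return reasons


def report_cleanup(gates: SandboxGates) -> str:
    """Cleanup must not claim a deletion it was not allowed to perform."""
    return "deleted" if gates.delete_after_run else "retained (delete_after_run is false)"
```

Returning all violations at once, rather than raising on the first, gives the graph a complete explanation to attach to state when a node fails closed.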
Then comes COMMANDS.
COMMANDS: dict[str, CommandSpec] is the allowlist. It is the difference between “the agent can run a shell” and “the coordinator can request one of these known operations.”
COMMANDS: dict[str, CommandSpec] = {
    "prepare_workspace": CommandSpec(
        argv=["python", "-m", "demo.prepare"],
        writes=["/workspace/manifest.json"],
    ),
    "analyze_workspace": CommandSpec(
        argv=["python", "-m", "demo.analyze"],
        reads=["/workspace/manifest.json"],
        writes=["/workspace/review.md"],
    ),
    "read_artifact": CommandSpec(
        argv=["python", "-m", "demo.read_artifact"],
        reads=["/workspace/review.md"],
    ),
    "copilot_squad_proof": CommandSpec(
        argv=["python", "-m", "demo.copilot_squad_proof"],
        requires=[
            "COPILOT_GITHUB_TOKEN",
            "delete_after_run",
            "verified_egress_allowlist",
        ],
    ),
}
The last command is the interesting one. It represents the future proof I actually want: run a real Copilot-backed Squad prompt inside the sandbox, using a sandbox-group credential, with egress verified and cleanup enforced.
But the command being present in the allowlist is not the same as the proof being complete.
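The post does not show `CommandSpec` itself, so here is one plausible shape for it, plus a lookup that fails closed. Everything in this sketch is an assumption for illustration: the field defaults, the `resolve_command` helper, and the idea of passing the satisfied requirements as a set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommandSpec:
    """Assumed shape: an argv plus declared reads, writes, and preconditions."""
    argv: list[str]
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)
    requires: list[str] = field(default_factory=list)


def resolve_command(name: str, catalog: dict[str, CommandSpec],
                    satisfied: set[str]) -> list[str]:
    """Return the argv for an allowlisted command, or raise instead of guessing."""
    spec = catalog.get(name)
    if spec is None:
        raise KeyError(f"command {name!r} is not in the allowlist")
    missing = [req for req in spec.requires if req not in satisfied]
    if missing:
        raise PermissionError(f"command {name!r} is blocked; unmet: {missing}")
    # Copy so callers cannot mutate the catalog's argv.
    return list(spec.argv)


# A two-entry catalog, trimmed from the COMMANDS table above.
CATALOG = {
    "read_artifact": CommandSpec(
        argv=["python", "-m", "demo.read_artifact"],
        reads=["/workspace/review.md"],
    ),
    "copilot_squad_proof": CommandSpec(
        argv=["python", "-m", "demo.copilot_squad_proof"],
        requires=["COPILOT_GITHUB_TOKEN", "delete_after_run", "verified_egress_allowlist"],
    ),
}
```

With this shape, `copilot_squad_proof` can sit in the catalog today and still refuse to run until all three of its requirements are reported as satisfied.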
What We Can Prove Right Now
Here is the honest state of the implementation.
The local workspace path works. The package compiles, tests pass, and the local demo produces a deterministic review artifact.
The live Azure sandbox path works for the bounded demo commands. A disposable sandbox was created, default-deny egress was used, the approved workspace commands ran, a snapshot path was exercised, suspend-at-end was used, and cleanup removed the disposable resources.
The custom disk/toolchain path works without secrets. The proof artifact says the sandbox could run help/version checks for git, gh, Node, npm, GitHub Copilot CLI, and Squad.

This is the real sandbox detail view, redacted. It shows the pieces engineers actually need to understand: lifecycle state, terminal access, network audit, resource usage, disk image, and the surrounding sandbox group navigation.
What I cannot claim yet is the thing Tamir correctly pushed on:
Proved so far:
  workspace lifecycle
  default-deny sandbox path
  allowlisted demo commands
  custom disk/toolchain inspection
  proof artifact export
  cleanup of disposable resources

Current implementation gap:
  authenticated Copilot-backed Squad prompt inside the sandbox

Next engineering gate for prompt-level proof:
  credential-backed prompt using the sandbox-group credential path,
  delete-after-run enabled,
  verified egress allowlist,
  and confirmation from the infrastructure/code reviewers that it actually ran
That is less flashy than “Copilot ran in the sandbox.”
It is also the version I can defend.
So my recommendation is simple: keep the claim narrow until the authenticated Copilot/Squad proof runs for real. The architecture is useful now; the final proof still has one gate left.
Cleanup Is Part of the Feature
Persistent workspaces are useful because they preserve state. That also makes cleanup part of the design, not an afterthought.
For a Squad sandbox workflow, I want the cleanup contract to be explicit:
cleanup:
  export:
    - sanitized-command-log
    - manifest-hash
    - approved-artifacts
  redact:
    - resource-paths
    - sandbox-identifiers
    - snapshot-identifiers
    - account-details
  afterRun:
    - delete-disposable-snapshots
    - suspend-or-delete-disposable-sandbox
    - verify-no-unapproved-resources-remain
If I cannot explain why a workspace still exists, it should not still exist.
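The redact step is the easiest part of that contract to sketch. The helper below is hypothetical and deliberately simple: it replaces values the coordinator already knows (sandbox id, group name, and so on) with labeled placeholders before a log is exported. It does not try to detect secrets it was never told about.

```python
def redact_log(log_text: str, sensitive_values: dict[str, str]) -> str:
    """Replace each known sensitive value with a <LABEL> placeholder.

    `sensitive_values` maps a placeholder label to the literal value to hide.
    Longer values are replaced first so that a value containing a shorter
    one is redacted whole instead of being split.
    """
    redacted = log_text
    ordered = sorted(sensitive_values.items(), key=lambda item: len(item[1]), reverse=True)
    for label, value in ordered:
        if value:  # an empty value would match everywhere, so skip it
            redacted = redacted.replace(value, f"<{label}>")
    return redacted


if __name__ == "__main__":
    raw = "created sandbox sb-123 in group grp-a; snapshot sb-123-snap saved"
    print(redact_log(raw, {
        "SANDBOX_ID": "sb-123",
        "SNAPSHOT_ID": "sb-123-snap",
        "SANDBOX_GROUP": "grp-a",
    }))
```

The identifiers in the usage example are made up; real ones would come from the sandbox create output that the coordinator already holds in graph state.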
Sample Walkthrough
The companion sample is here:
https://github.com/tamirdresher/squad-aca-sandboxes-workspace
The important implementation details are concrete. I validated the workspace as a working companion implementation, not just as a diagram with aspirations and a nice haircut.
Where Dynamic Sessions Still Win
This is not “Sandboxes are better, forget Dynamic Sessions.” That would be wrong.
Dynamic Sessions still feel like the right primitive when the work is short-lived, the result is one structured artifact, and I do not want state preserved after the tool call.
Use the gloves when the task is one risky action.
Use the workshop when the task needs a place to live.
The mistake is pretending they solve the same problem because both have the word “sandbox” nearby.
The Point
Part 1 made Squad deterministic enough to reason about.
Part 2 made tool execution isolated enough to trust for short-lived actions.
Part 3 is about making agent workspaces persistent enough to review, resume, and govern.
deterministic workflow
+
isolated risky execution
+
persistent isolated workspace
=
agent work I can inspect, resume, and prove
That is the story I want this post to tell.
But the final proof matters. Until a real Copilot-backed Squad prompt runs inside the sandbox through the sandbox-group credential path, the honest claim is workspace plus toolchain validation, not full Copilot-backed Squad execution.
Agents do not just need brains. They need places to work. And if we give them a workshop, I want it to have a door, a lock, an allowlist, a cleanup policy, and a very boring manifest that tells me exactly what happened.
Sources
- Azure Container Apps Sandboxes overview
- Introducing Azure Container Apps Sandboxes: Secure Infrastructure for Agentic Workloads
- Azure Container Apps Sandboxes CLI preview
- ACA Sandboxes CLI reference skill
- ARM template reference: Microsoft.ContainerInstance/sandboxGroups@2026-06-01-preview
- Dynamic sessions in Azure Container Apps
- Companion sample workspace: https://github.com/tamirdresher/squad-aca-sandboxes-workspace, validated live for the bounded demo path, including local validation, live sandbox lifecycle, toolchain checks, and cleanup.
- A sanitized proof artifact accompanies the post at /assets/aca-sandboxes-squad-workspace/live-validation-evidence.md.