fix(cloud-agent): harden workspace bootstrap #3937
Merged
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Executive Summary
All previously identified issues have been resolved:
Resolved Issues
New commit reviewed
Files Reviewed (12 files)
Reviewed by claude-sonnet-4.6 · 423,536 tokens
Review guidance: REVIEW.md from base branch
alex-alecu reviewed on Jun 11, 2026
Branch updated from e5d2cb2 to 0897d4f
alex-alecu approved these changes on Jun 12, 2026
Summary
Why
Clones and checkouts of large repositories can keep making healthy progress for longer than the wrapper's previous two-minute wall-clock limit. When those operations were interrupted, the leftover .git directory could also cause a later attempt to reuse an incomplete workspace, turning one timeout into repeated setup failures.
What was done
Starting Kilo... while the runtime starts.
High-level architecture
sequenceDiagram
    participant Orchestrator
    participant Wrapper
    participant Workspace
    participant Kilo
    Orchestrator->>Wrapper: POST /session/ready (10-minute outer budget)
    Wrapper->>Workspace: Prepare repository and session (8-minute shared budget)
    alt Workspace is complete
        Workspace-->>Wrapper: Reuse and refresh credentials
    else Workspace is incomplete or cold
        Wrapper->>Workspace: Mark pending, remove stale state, clone, restore, and run setup
        Wrapper->>Workspace: Write completion marker
    end
    Wrapper->>Kilo: Start runtime
    Wrapper-->>Orchestrator: Session ready
Architecture decision
Decision: Keep bootstrap lifecycle policy in the wrapper and combine activity-aware command watchdogs with a shared workspace deadline and explicit completion markers.
Context: A single elapsed-time timeout could not distinguish a slow but active checkout from a stalled process, and the mere existence of .git could not distinguish a usable workspace from an interrupted clone.
Rationale: The wrapper owns the subprocesses and the persisted workspace state, so it can observe output, cancel child processes, and write completion state at the point where bootstrap actually succeeds. Layering the limits (two-minute inactivity, five-minute per command, eight-minute shared workspace, ten-minute readiness) keeps each boundary finite without colliding with startup cleanup.
Alternatives considered:
.git directory as warm. This avoids recloning but preserves the failure mode in which partial checkouts are mistaken for complete workspaces.
Consequences: Active long-running operations receive more time, and interrupted new workspaces recover deterministically. Silent commands can still time out after two minutes, markerless legacy workspaces rely on a compatibility heuristic, and cleanup remains best-effort so that the original setup failure is preserved.
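The completion-marker lifecycle from the sequence diagram can be sketched as follows. This is a minimal illustration, not the wrapper's implementation. The marker file names, the <root>/repo layout, and the classifyWorkspace/ensureWorkspace helpers are all hypothetical. The legacy rule (treat a markerless .git as reusable) is an assumption, because the PR does not spell out its compatibility heuristic. Credential refresh, restore, and the watchdogs are omitted.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical marker names; markers live beside the checkout, not inside it.
const PENDING_MARKER = ".bootstrap-pending";
const COMPLETE_MARKER = ".bootstrap-complete";

type WorkspaceState = "complete" | "incomplete" | "legacy" | "cold";

function classifyWorkspace(root: string): WorkspaceState {
  const complete = existsSync(join(root, COMPLETE_MARKER));
  const pending = existsSync(join(root, PENDING_MARKER));
  if (complete && !pending) return "complete";
  if (pending) return "incomplete"; // an earlier attempt started but never finished
  // Assumed compatibility heuristic: no markers but a .git directory means a
  // workspace created before markers existed.
  if (existsSync(join(root, "repo", ".git"))) return "legacy";
  return "cold";
}

// Returns the state observed before bootstrapping.
function ensureWorkspace(root: string, cloneAndSetup: (repoDir: string) => void): WorkspaceState {
  const state = classifyWorkspace(root);
  if (state === "complete" || state === "legacy") return state; // reuse as-is

  const repoDir = join(root, "repo");
  mkdirSync(root, { recursive: true });
  writeFileSync(join(root, PENDING_MARKER), ""); // mark pending before touching the repo
  rmSync(join(root, COMPLETE_MARKER), { force: true });
  rmSync(repoDir, { recursive: true, force: true }); // drop any stale partial clone

  try {
    cloneAndSetup(repoDir);
  } catch (err) {
    // Best-effort cleanup; a cleanup error must not mask the original failure.
    try {
      rmSync(repoDir, { recursive: true, force: true });
    } catch {
      // ignored on purpose
    }
    throw err; // the pending marker stays, so the next attempt starts clean
  }

  // Write completion before clearing pending: if interrupted in between,
  // both markers exist and the workspace is still classified as incomplete.
  writeFileSync(join(root, COMPLETE_MARKER), "");
  rmSync(join(root, PENDING_MARKER), { force: true });
  return state;
}
```

The ordering is the point of the design: an interrupted clone leaves the pending marker behind, so a retry removes the partial checkout instead of reusing it.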
Verification
Visual Changes
Reviewer Notes
The "Unauthorized: Invalid token" clone failure is intentionally not addressed by this PR.