fix(security): authz, IDOR, and abuse-prevention fixes #4944
The PATCH /api/knowledge/[id]/documents/[documentId]/chunks handler performs enable/disable/delete operations but authorized callers using only the read-level check (checkDocumentAccess). This let read-only workspace members disable or delete indexed chunks. Switch to checkDocumentWriteAccess (write/admin required), matching the sibling POST/PUT/DELETE chunk mutation endpoints.
GET /api/workspaces/:id/environment returned decrypted workspace environment variables to any member, including read-only collaborators, leaking API tokens, database URLs, and other secrets. Mask workspace variable values for non-admin viewers while preserving the variable names, so editor autocomplete and conflict detection keep working. A value is revealed only when the caller is a credential admin of that key, or — for legacy keys with no per-secret ACL — holds workspace admin permission. This mirrors the per-key edit gating already enforced by PUT/DELETE: if you can administer a secret, you can read it. Personal variables and execution-time resolution are unchanged.
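A minimal sketch of the masking rule described above. It assumes a hypothetical `VariableAcl` map that lists the credential admins for each key and holds `undefined` for legacy keys with no per-secret ACL. It also assumes a fixed placeholder string. Apart from `maskWorkspaceEnvForViewer`, every name here is illustrative, and the real helper's signature may differ.

```typescript
// Hypothetical per-key ACL: `undefined` marks a legacy key with no per-secret ACL.
type VariableAcl = Record<string, { adminUserIds: string[] } | undefined>;

interface Viewer {
  userId: string;
  isWorkspaceAdmin: boolean;
}

// Placeholder returned instead of a secret value (assumed format).
const MASK = "********";

/**
 * Returns a copy of `env` with every value the viewer cannot administer replaced
 * by MASK. Variable names are always preserved so autocomplete and conflict
 * detection keep working.
 */
function maskWorkspaceEnvForViewer(
  env: Record<string, string>,
  acl: VariableAcl,
  viewer: Viewer,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    const entry = acl[name];
    const canReveal = entry
      ? entry.adminUserIds.includes(viewer.userId) // credential admin of this key
      : viewer.isWorkspaceAdmin; // legacy key: fall back to workspace admin
    result[name] = canReveal ? value : MASK;
  }
  return result;
}
```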
POST /api/files/delete trusted a client-supplied `context`, letting any authenticated user delete another tenant's file by naming an arbitrary key with `context: "og-images"`. verifyFileAccess() short-circuited the three public contexts (profile-pictures, og-images, workspace-logos) to `true` before any ownership/requireWrite check.
- Derive the storage context strictly from the trusted key prefix in the delete route; reject a supplied `context` that disagrees with the key.
- Gate the public-context short-circuit to reads only. Destructive ops (requireWrite) now prove ownership via verifyPublicAssetWriteAccess: workspace-logos require write/admin on the bound workspace, profile-pictures require an exact owner match, og-images always deny. Reads of public assets are unchanged.
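A sketch of deriving the context from the key rather than from the request body. It makes two hypothetical assumptions: that a key's first path segment names its storage context, and that non-public files live under a `workspace/` prefix. `resolveDeleteContext` and `publicWriteRule` are illustrative names, not the route's actual helpers.

```typescript
// Illustrative context list; the real key layout and context names may differ.
const STORAGE_CONTEXTS = ["workspace", "profile-pictures", "og-images", "workspace-logos"] as const;
type StorageContext = (typeof STORAGE_CONTEXTS)[number];

/** Derives the context from the key's first path segment, the only trusted source. */
function contextFromKey(key: string): StorageContext | null {
  const prefix = key.split("/", 1)[0];
  return STORAGE_CONTEXTS.find((ctx) => ctx === prefix) ?? null;
}

/** A client-supplied context is accepted only if it agrees with the key. */
function resolveDeleteContext(key: string, suppliedContext?: string): StorageContext {
  const derived = contextFromKey(key);
  if (derived === null) throw new Error(`Unrecognized storage key prefix: ${key}`);
  if (suppliedContext !== undefined && suppliedContext !== derived) {
    throw new Error(`Supplied context "${suppliedContext}" does not match key prefix "${derived}"`);
  }
  return derived;
}

/** Ownership proof required before a destructive op on each public context. */
function publicWriteRule(ctx: StorageContext): "workspace-write" | "owner-match" | "deny" | "not-public" {
  switch (ctx) {
    case "workspace-logos":
      return "workspace-write"; // write/admin on the bound workspace
    case "profile-pictures":
      return "owner-match"; // exact owner match
    case "og-images":
      return "deny"; // never deletable through this route
    default:
      return "not-public"; // regular ownership checks apply
  }
}
```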
…ooks Telegram triggers accepted any forged update from anyone who knew the webhook URL path: verifyAuth was a no-op that always returned null, and setWebhook registered no secret_token. Generate a per-webhook secret in createSubscription, register it with Telegram as secret_token, and persist it to providerConfig. verifyAuth now fails closed — rejects when no token is configured, when the X-Telegram-Bot-Api-Secret-Token header is absent, or when it does not match via constant-time safeCompare.
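A minimal fail-closed sketch of the check this commit describes, built on Node's `crypto.randomBytes` and `crypto.timingSafeEqual`. `safeCompare` here is a local stand-in for the codebase helper of the same name, and `verifyTelegramSecret` is a hypothetical name. Following the `verifyAuth` convention mentioned above, it returns `null` when the update is authorized.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

/** Per-webhook secret; hex output stays within Telegram's allowed charset (A-Z, a-z, 0-9, _ and -). */
function generateWebhookSecret(): string {
  return randomBytes(32).toString("hex");
}

/** Constant-time comparison; a length mismatch returns false without comparing contents. */
function safeCompare(a: string, b: string): boolean {
  const ab = Buffer.from(a, "utf8");
  const bb = Buffer.from(b, "utf8");
  if (ab.length !== bb.length) return false;
  return timingSafeEqual(ab, bb);
}

function unauthorized(): Response {
  return new Response("Unauthorized", { status: 401 });
}

/** Rejects when no secret is configured, the header is absent, or the token does not match. */
function verifyTelegramSecret(configuredSecret: string | undefined, headers: Headers): Response | null {
  if (!configuredSecret) return unauthorized();
  const received = headers.get("x-telegram-bot-api-secret-token"); // Headers.get is case-insensitive
  if (received === null) return unauthorized();
  return safeCompare(received, configuredSecret) ? null : unauthorized();
}
```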
… tools The Agiloft directExecution tools (read/create/search/update/delete/lock/saved_search/select/get_choice_line_id/remove_attachment/attachment_info) and the Grafana update_dashboard/update_alert_rule postProcess hooks issued outbound HTTP to a fully user-controlled host (instanceUrl/baseUrl) via the global fetch(). The only guard was the synchronous validateExternalUrl(), which never resolves DNS, so a hostname resolving to an internal/reserved IP passed validation (SSRF). Route all of these through the codebase's standard SSRF-safe path:
- Agiloft: moved executeAgiloftRequest into utils.server.ts, where the existing pinned helpers live. It now resolves and validates the instance URL once and pins every hop (login, operation, logout) to that IP via secureFetchWithPinnedIP. The 11 tool configs now import it from utils.server; URL builders stay in the client-safe utils.ts.
- Grafana: the postProcess POST/PUT now uses validateUrlWithDNS + secureFetchWithPinnedIP, matching the already-pinned initial GET.
This completes the Agiloft SSRF pinning started in #4639 (which covered the attach/retrieve API routes) by closing the directExecution path, and extends the same guard to the Grafana update tools.
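The key difference from `validateExternalUrl()` is that the hostname is resolved once, and the resulting IP is both validated and reused. Below is a rough sketch of the resolve-and-validate step. It assumes an injectable resolver and a hand-rolled, non-exhaustive blocklist. The actual connection pinning is left to `secureFetchWithPinnedIP`, which is not reproduced here, and `resolvePinnedTarget` is a hypothetical name.

```typescript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

type Resolver = (hostname: string) => Promise<string>;

const defaultResolver: Resolver = async (hostname) => (await lookup(hostname)).address;

/** Rough blocklist for loopback, private, link-local, CGNAT and unspecified ranges (not exhaustive). */
function isBlockedAddress(ip: string): boolean {
  const version = isIP(ip);
  if (version === 0) return true; // not an IP at all: refuse
  if (version === 4) {
    const [a, b] = ip.split(".").map(Number);
    return (
      a === 0 || a === 10 || a === 127 ||
      (a === 169 && b === 254) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 100 && b >= 64 && b <= 127)
    );
  }
  const lower = ip.toLowerCase();
  return (
    lower === "::1" || lower === "::" || lower.startsWith("fe80:") ||
    lower.startsWith("fc") || lower.startsWith("fd") ||
    lower.startsWith("::ffff:") // IPv4-mapped: refused conservatively in this sketch
  );
}

/** Resolves the user-supplied URL once and returns the IP every later hop must connect to. */
async function resolvePinnedTarget(
  rawUrl: string,
  resolve: Resolver = defaultResolver,
): Promise<{ url: URL; pinnedIp: string }> {
  const url = new URL(rawUrl);
  if (url.protocol !== "https:" && url.protocol !== "http:") throw new Error("Unsupported protocol");
  const host = url.hostname.replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  const pinnedIp = isIP(host) ? host : await resolve(host);
  if (isBlockedAddress(pinnedIp)) throw new Error(`Blocked address ${pinnedIp} for host ${host}`);
  return { url, pinnedIp };
}
```

In this pattern, every later hop (login, operation, logout) connects to `pinnedIp`. A DNS answer that changes between validation and use (DNS rebinding) therefore cannot redirect the request.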
The external v1 API authenticated API keys without evaluating the per-workspace allowPersonalApiKeys setting, so a personal API key could read and mutate a workspace's resources (workflows, tables, files, knowledge, logs) even when the workspace had explicitly disabled personal keys. The same control is already enforced on the workflow-execution surface. Enforce the policy in checkWorkspaceScope (covering validateWorkspaceAccess too): reject personal keys with 403 when the workspace has allowPersonalApiKeys=false. checkWorkspaceScope becomes async; all v1 route callsites updated to await it.
The server-side usage-limit gate read already-recorded cost, but cost is only written when an execution finishes. A burst of concurrent executions all observed the same pre-burst usage, all passed the cap, and all ran — collectively spending far past the limit before any cost landed in the ledger (free-tier abuse / hard-cap defeat). manual/chat triggers also skip rate limiting, removing the only throttle. Add an atomic check-then-reserve admission step (Redis Lua) that bounds in-flight, un-costed executions per billing entity by both a per-plan concurrency cap and remaining usage headroom, so recordedUsage + reservedSlots * estimate <= limit always holds. The slot is released at execution completion via LoggingSession (skipped on pause; TTL self-heals crashes). Runs for all trigger types, covering the previously-unthrottled manual/chat paths. Fails open when billing is disabled or Redis is unavailable, matching the rate limiter — a Redis blip can't turn into an execution outage, and the recorded-usage gate still runs.
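The admission invariant can be illustrated with an in-memory stand-in for the Redis Lua script. A synchronous JavaScript function runs to completion without interleaving, so `tryReserve` is atomic within one process. The Lua script provides that same property across processes. This sketch is not shared between instances, and `UsageReservations`, `ReservationLimits` and the per-execution `estimate` are assumptions. It follows the tapering as first described here, before the later floor-at-1 adjustment.

```typescript
interface ReservationLimits {
  planConcurrencyCap: number; // max in-flight executions allowed by the plan
  limit: number; // usage cap, in cost units
  estimate: number; // assumed cost reserved per in-flight execution
  ttlMs: number; // slots expire so a crashed execution self-heals
}

/** In-memory stand-in for the per-entity Redis set: executionId -> expiry timestamp. */
class UsageReservations {
  private readonly inflight = new Map<string, Map<string, number>>();

  /** Check-then-reserve in one synchronous step; keeps recorded + reserved * estimate <= limit. */
  tryReserve(entity: string, executionId: string, recordedUsage: number, l: ReservationLimits, now = Date.now()): boolean {
    const slots = this.inflight.get(entity) ?? new Map<string, number>();
    slots.forEach((expiresAt, id) => {
      if (expiresAt <= now) slots.delete(id); // drop slots left behind by crashes
    });
    const byHeadroom = Math.floor((l.limit - recordedUsage) / l.estimate);
    const allowed = Math.min(l.planConcurrencyCap, byHeadroom);
    if (slots.size >= allowed) return false;
    slots.set(executionId, now + l.ttlMs);
    this.inflight.set(entity, slots);
    return true;
  }

  /** Idempotent release: a no-op if the slot was already released or never reserved. */
  release(entity: string, executionId: string): void {
    this.inflight.get(entity)?.delete(executionId);
  }
}
```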
…create/update/reorder Reject a folderId that references a folder in a different workspace (or an archived/non-existent folder) before writing it to workflow.folderId. Previously create, update, and reorder only checked workspace permission on the workflow and the folder's lock status, never that the folder lived in the workflow's own workspace, allowing a dangling cross-workspace folder reference. Adds isFolderInWorkspace/assertFolderInWorkspace + FolderNotFoundError to @sim/workflow-authz (mirroring assertTargetFolderMutable in the duplicate path), enforced in performCreateWorkflow, performUpdateWorkflow, and the reorder route. Invalid folders now return 400.
…order Folder write endpoints accepted a caller-supplied parentId and persisted it without verifying the parent existed in the same workspace, and the create and reorder paths had no cycle guard. A workspace member with write access could reparent a folder to a foreign-workspace folder, a non-existent id, or (via reorder) into a cycle, hiding the folder and its workflows from all members.
- performCreateFolder: reject self-parenting and validate the parent exists in the workspace and is not archived (mirrors the duplicate route).
- performUpdateFolder: add the same workspace/archived parent check alongside the existing circular-reference guard.
- folders/reorder: validate every target parent against the workspace, detect cycles in the resulting parent graph (catches batch cycles), and normalize falsy parentId to null to prevent orphaning.
Adds tests for cross-workspace parent rejection and batch-cycle rejection.
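A sketch of the reorder validation in three steps:
1. Apply the whole batch to a copy of the parent map.
2. Check each new parent against the workspace.
3. Walk every folder's ancestor chain, so that a cycle formed by two updates in the same batch is caught.

`FolderRow`, `ReorderUpdate` and `validateReorder` are hypothetical names. The real route works against the database.

```typescript
interface FolderRow {
  id: string;
  workspaceId: string;
  parentId: string | null;
  archived: boolean;
}

interface ReorderUpdate {
  id: string;
  parentId?: string | null;
}

/** Returns the resulting parent map, or throws on a foreign/archived/missing parent or a cycle. */
function validateReorder(workspaceId: string, folders: FolderRow[], updates: ReorderUpdate[]): Map<string, string | null> {
  const byId = new Map(folders.map((f) => [f.id, f] as const));
  const parentOf = new Map<string, string | null>();
  folders.forEach((f) => parentOf.set(f.id, f.parentId));

  for (const u of updates) {
    const target = byId.get(u.id);
    if (!target || target.workspaceId !== workspaceId) throw new Error(`Folder ${u.id} not in workspace`);
    const parentId = u.parentId || null; // normalize "" and undefined to the root
    if (parentId !== null) {
      const parent = byId.get(parentId);
      if (!parent || parent.workspaceId !== workspaceId || parent.archived) {
        throw new Error(`Invalid parent ${parentId}`);
      }
    }
    parentOf.set(u.id, parentId);
  }

  // Walk up from every folder; meeting a node twice on one walk means a cycle.
  Array.from(parentOf.keys()).forEach((start) => {
    const seen = new Set<string>();
    let cur: string | null = start;
    while (cur !== null) {
      if (seen.has(cur)) throw new Error(`Cycle detected at folder ${cur}`);
      seen.add(cur);
      cur = parentOf.get(cur) ?? null;
    }
  });
  return parentOf;
}
```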
Inbound webhook signature verification failed open for HMAC providers (GitHub, Intercom, Jira, JSM, Confluence, Cal.com, Notion, Greenhouse, Typeform, Fireflies, Circleback): when no signing secret was stored, verifyAuth returned null and the workflow executed on a fully attacker-controlled body. Reject these deliveries with 401 instead, matching the fail-closed Stripe/WhatsApp/Vercel providers. Run provider reachability/verification handshakes (Notion verification_token, Grain/Intercom ping) ahead of auth so the pre-secret setup handshake still completes — those return a canned 200 without executing the workflow, and real event payloads fall through to fail-closed verification. Update the trigger secret-field copy to state the secret is required for deliveries to be accepted (was misleadingly marked optional).
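For reference, here is a fail-closed HMAC check in the style of GitHub's `X-Hub-Signature-256` header: `sha256=` followed by the hex HMAC-SHA256 of the raw body. It sketches the behavior this commit introduced, which was later reverted on this branch. `verifyHmacSignature` is a hypothetical name, and other providers use different header formats.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** Returns 401 to reject, or null to accept. A missing secret rejects instead of running an unverified body. */
function verifyHmacSignature(rawBody: string, signatureHeader: string | null, secret: string | undefined): 401 | null {
  if (!secret) return 401; // fail closed when no signing secret is stored
  if (!signatureHeader?.startsWith("sha256=")) return 401;
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const expected = Buffer.from(`sha256=${digest}`, "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on unequal lengths, so check length first
  return expected.length === received.length && timingSafeEqual(expected, received) ? null : 401;
}
```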
The custom before-hook pre-check threw a distinguishing 422/USER_ALREADY_EXISTS for already-registered emails, letting an unauthenticated attacker enumerate accounts and defeating better-auth's own OWASP enumeration protection (active under requireEmailVerification). Remove the pre-check and rely on better-auth's generic duplicate-sign-up response, wiring:
- onExistingUserSignUp: notify the real account owner out-of-band, mirroring the privacy-preserving forget-password flow.
- customSyntheticUser: include admin (role/banned/banReason/banExpires) and Stripe (stripeCustomerId, billing-gated) user fields so the fake response shape is byte-identical to a real new-user response.
Adds an ExistingAccountEmail template + 'existing-account' subject.
Remove a redundant size annotation and two verbose multi-line materialization comments whose intent is already clear from the code. Load-bearing comments (race-condition and key-translation notes) kept.
Table-cell dispatch is row-bounded, async rate-limited, and already surfaces a graceful usage state. Applying the in-flight concurrency reservation there turned its 429 into a hard cell error on a normal >15-concurrent-cell run (only 402 was handled gracefully). Skip the reservation for that surface via a new skipConcurrencyReservation option (the usage-cost cap is still enforced), and tidy the reservation comments to TSDoc.
Password-protected public chat (POST /api/chat/[identifier]) had no throttling on the password check and compared with a non-constant-time !==, allowing unlimited brute-force and per-character timing leaks.
- Add per-IP rate limiting (10 / 15min) to the password branch of validateChatAuth, mirroring the OTP/SSO endpoints; return 429 with Retry-After. Only explicit unlock attempts consume tokens; message sends carry no password and ride the auth cookie.
- Replace password !== decrypted with safeCompare.
- Fails open on rate-limiter storage errors; no availability regression.
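A simplified fixed-window version of the per-IP limit of 10 attempts per 15 minutes, assuming an in-memory map. The real limiter is backed by shared storage, and this sketch does not model its fail-open handling of storage errors. `AttemptLimiter` and `checkUnlockAttempt` are illustrative names. Only requests that carry a password consume an attempt.

```typescript
interface Decision {
  allowed: boolean;
  retryAfterSeconds: number; // 0 when allowed
}

/** Fixed-window limiter: at most `max` attempts per `windowMs` per key (here, the client IP). */
class AttemptLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max = 10, windowMs = 15 * 60 * 1000) {
    this.max = max;
    this.windowMs = windowMs;
  }

  consume(key: string, now: number = Date.now()): Decision {
    let w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 }; // start a fresh window
      this.windows.set(key, w);
    }
    if (w.count >= this.max) {
      return { allowed: false, retryAfterSeconds: Math.ceil((w.start + this.windowMs - now) / 1000) };
    }
    w.count += 1;
    return { allowed: true, retryAfterSeconds: 0 };
  }
}

/** Cookie-authenticated message sends carry no password and skip the limiter entirely. */
function checkUnlockAttempt(limiter: AttemptLimiter, ip: string, password: string | undefined, now?: number): Response | null {
  if (password === undefined) return null;
  const decision = limiter.consume(ip, now);
  if (!decision.allowed) {
    return new Response("Too many password attempts", {
      status: 429,
      headers: { "Retry-After": String(decision.retryAfterSeconds) },
    });
  }
  return null;
}
```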
The shared parseJsonBody helper (behind parseRequest, used by nearly every contract route) read request bodies with no size limit, buffering the full body into memory before validation. The unauthenticated public deployed-chat endpoint reached this sink with no admission gate, enabling an anonymous memory-exhaustion DoS.
- parseRequest/parseJsonBody now enforce a byte cap via a size-limited stream read (content-length precheck + streamed cap), returning 413. Default is API_MAX_JSON_BODY_BYTES (50 MB), overridable per route via maxBodyBytes. Decoding uses TextDecoder to match request.json() BOM handling.
- Public chat POST is wrapped with the admission gate (tryAdmit) and passes an explicit CHAT_MAX_REQUEST_BYTES (20 MB) cap.
- Chat body contract gains .max() bounds on input, password, conversationId, file data/name/type, and files array length.
- Admin bulk workspace import opts into a higher 100 MB cap to avoid regressing large multi-workflow imports.
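Below is a sketch of a size-limited JSON read over the standard `Request` body stream. It rejects early on a declared `Content-Length`, then counts bytes while streaming, so a missing or lying header cannot bypass the cap. `parseJsonBodyWithLimit` and `PayloadTooLargeError` are hypothetical names. A route would translate the error's `status` into a 413 response.

```typescript
/** Carries the HTTP status a route should answer with. */
class PayloadTooLargeError extends Error {
  readonly status = 413;
}

/** Reads and parses a JSON body, never buffering more than `maxBytes`. */
async function parseJsonBodyWithLimit(request: Request, maxBytes: number): Promise<unknown> {
  // Cheap precheck: a declared Content-Length is only trusted to reject early.
  const declared = Number(request.headers.get("content-length"));
  if (Number.isFinite(declared) && declared > maxBytes) throw new PayloadTooLargeError("Request body too large");

  const chunks: Uint8Array[] = [];
  let total = 0;
  if (request.body) {
    const reader = request.body.getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.byteLength;
      if (total > maxBytes) {
        await reader.cancel(); // stop pulling the rest of the body
        throw new PayloadTooLargeError("Request body too large");
      }
      chunks.push(value);
    }
  }

  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  // TextDecoder strips a leading UTF-8 BOM by default, like request.json().
  return JSON.parse(new TextDecoder().decode(bytes));
}
```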
Password-protected public chat (POST /api/chat/[identifier]) had no throttling on the password check and compared with a non-constant-time !==, allowing unlimited brute-force and per-character timing leaks.
- Add per-IP rate limiting (10 / 15min) to the password branch of validateChatAuth, mirroring the OTP/SSO endpoints; return 429 with Retry-After. Only explicit unlock attempts consume tokens; message sends carry no password and ride the auth cookie.
- Replace password !== decrypted with safeCompare.
- Fails open on rate-limiter storage errors; no availability regression.
Reinstates the fix reverted by an intervening commit.
The admission reservation tapered allowed concurrency by remaining usage headroom. With less than one credit of headroom left (but not yet over the cap), floor(headroom / estimate) hit zero and rejected even a single execution with nothing else in flight. That was stricter than the recorded-usage gate, which would have allowed that last run, and it surfaced a misleading "too many concurrent executions" message. Floor the headroom term at 1 so a lone execution is governed only by the cost gate; concurrency above the first slot still tapers with headroom.
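The adjusted taper as a small function. This sketch assumes that zero or negative headroom yields 0 slots, since the recorded-usage gate rejects those runs anyway. `allowedConcurrency` is a hypothetical name.

```typescript
/**
 * In-flight executions allowed for a billing entity: capped by the plan,
 * tapered by headroom, and never below 1 while any headroom remains.
 */
function allowedConcurrency(planCap: number, limit: number, recordedUsage: number, estimate: number): number {
  const headroom = limit - recordedUsage;
  if (headroom <= 0) return 0; // at or over the cap: the cost gate decides
  return Math.min(planCap, Math.max(1, Math.floor(headroom / estimate)));
}
```

For example, `allowedConcurrency(15, 100, 99.5, 1)` now allows one execution, where the unfloored `floor(0.5 / 1)` allowed none.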
Extract the workspace-env value masking into a TSDoc-documented maskWorkspaceEnvForViewer helper and remove the redundant inline comments from the GET handler and its test. No behavior change.
Move the tiered-authorization rationale for the workspace env upsert and delete handlers into TSDoc blocks and drop the inline comments. No behavior change.
…llback The secret-token check rejected every webhook registered before secret_token support, breaking live triggers until re-saved. Fall back to verifying the request originates from Telegram's published webhook IP ranges when no secret is configured, so existing triggers keep firing with no re-save or migration while forged updates from arbitrary hosts are still rejected. Webhooks with a registered secret continue to use strict constant-time token verification.
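Below is a sketch of the IP-range fallback for secretless webhooks. The two CIDR blocks are the ranges Telegram's Bot API documentation lists for webhook delivery at the time of writing, and they should be re-checked upstream. `isFromTelegram` is a hypothetical name. Only IPv4 addresses are handled, including IPv4-mapped IPv6.

```typescript
import { isIPv4 } from "node:net";

// Telegram's documented webhook source ranges (assumed current; verify upstream).
const TELEGRAM_CIDRS = ["149.154.160.0/20", "91.108.4.0/22"];

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

/** Legacy fallback, used only when the webhook has no registered secret. */
function isFromTelegram(clientIp: string): boolean {
  const ip = clientIp.startsWith("::ffff:") ? clientIp.slice(7) : clientIp; // unwrap IPv4-mapped IPv6
  return isIPv4(ip) && TELEGRAM_CIDRS.some((cidr) => inCidr(ip, cidr));
}
```

`clientIp` must come from a trusted source, such as the socket address or a header set by your own proxy. A client-supplied `X-Forwarded-For` would let an attacker spoof the check.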
A billing commit (ac56525) reverted the public-chat auth hardening as collateral, leaving HEAD with a timing-oracle password comparison (password !== decrypted) and no per-IP brute-force rate limit. Restore safeCompare and the password-attempt rate limiter, and re-add the 429 test.
Reverts the Telegram inbound-token verification (3ed97a4, 41f133a) and the HMAC fail-closed change (5b6cae9). Production data shows ~79 live webhooks have no signing secret configured (63 GitHub, 9 Fireflies, 3 Jira, 2 Circleback, 1 Confluence, 1 Cal.com), so failing closed would 401 them. Restoring fail-open behavior until a backwards-compatible rollout (grandfather existing secretless webhooks / migration) is designed. Other security fixes on this branch are unaffected.
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 33881454 | Triggered | Generic Password | 86e26b8 | apps/sim/app/api/chat/utils.test.ts |
| 33881454 | Triggered | Generic Password | 62764cb | apps/sim/app/api/chat/utils.test.ts |
| 33881454 | Triggered | Generic Password | 536af73 | apps/sim/app/api/chat/utils.test.ts |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely, following secret-management best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
PR Summary
High Risk
Overview
- File storage stops trusting client-supplied
- Workspace & API policy: workspace env GET masks secret values unless the caller is a credential admin (or workspace admin for legacy keys). v1
- Billing & execution: new Redis usage reservations during preprocessing close the concurrent usage-cap race; slots release on
- Auth: duplicate email sign-up no longer returns “user exists”; it uses better-auth’s generic success path plus an existing-account email and a synthetic user shape that matches a real sign-up response.
- SSRF: Agiloft moves HTTP to
- Other: contract JSON bodies get a default size limit (configurable per route); KB batch chunk ops use write access; minor docs/icon tweaks.
Reviewed by Cursor Bugbot for commit 8975698.
Greptile Summary
This PR ships a batch of security hardening fixes across authorization, IDOR prevention, billing admission, and abuse protection. All fixes are surgical and well-scoped, with targeted unit tests added for each changed surface.
Confidence Score: 5/5
Safe to merge. All new access-control checks fail closed, the atomic billing reservation is correctly wired with idempotent release at every early-exit path, and the SSRF fixes cover both the POST/PUT and the initial GETs via the tool framework. The changes are well-scoped security hardening with targeted unit tests. The account-enumeration fix has a known timing caveat acknowledged in the PR notes, and the PEXPIRE lifetime extension in the reservation set is a cosmetic Redis hygiene note rather than a correctness issue. No broken control flow, missing releases, or incorrect ownership checks were found.
apps/sim/lib/auth/auth.ts: confirm whether better-auth awaits onExistingUserSignUp inline or fires it in the background, which determines whether a residual timing oracle exists.
Sequence Diagram
sequenceDiagram
participant Client
participant ChatRoute as chat/[identifier]/route.ts
participant ChatUtils as chat/utils.ts
participant Preprocess as preprocessing.ts
participant Reservation as usage-reservation.ts
participant Redis
participant LoggingSession
Client->>ChatRoute: "POST /api/chat/:id (body <= 220 MB)"
ChatRoute->>ChatRoute: tryAdmit() — process-level gate
ChatRoute->>ChatUtils: validateChatAuth(...)
ChatUtils->>ChatUtils: IP rate-limit check (10/15 min)
ChatUtils->>ChatUtils: safeCompare(password, decrypted)
ChatUtils-->>ChatRoute: "{ authorized }"
ChatRoute->>Preprocess: preprocessExecution(...)
Note over Preprocess: Steps 1-6: auth, usage check, ...
Preprocess->>Redis: RESERVE_SCRIPT (atomic ZCARD + ZADD)
Redis-->>Preprocess: 1 (admitted) or 0 (throttled)
Preprocess-->>ChatRoute: success / 429
ChatRoute->>ChatRoute: Execute workflow (stream)
ChatRoute->>LoggingSession: finalize(...)
LoggingSession->>Reservation: releaseExecutionSlot(executionId)
Reservation->>Redis: GETDEL pointer then ZREM inflight set
Reviews (4): Last reviewed commit: "fix(icons): make Linkup icon black for c..."
…picture delete deny Per PR review: when a profile-picture delete is denied, distinguish a missing owner record (no userId metadata) from a genuine ownership mismatch so the fail-closed denial is diagnosable. Behavior unchanged — both still deny.
If queueing the background workflow job throws, no job runs and no LoggingSession finalizes, so the admission slot reserved during preprocessing would leak until its TTL. Release it before returning 500.
…t limits
- DEFAULT_MAX_JSON_BODY_BYTES and CHAT_MAX_REQUEST_BYTES now fall back to hardcoded defaults (50 MB / 220 MB) when the env value is missing or non-numeric, so a misconfiguration can't silently produce a NaN cap that never rejects.
- Raise CHAT_MAX_REQUEST_BYTES default to 220 MB to cover 15 base64 file attachments, and MAX_CHAT_INPUT_CHARS to 1,000,000.
- Minor: tidy use-inline-rename onSave type; drop two redundant test comments.
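A minimal sketch of the NaN-safe fallback. Two details are assumptions: that 1 MB means 1024 * 1024 bytes, and the env variable name for the chat cap. The fallback logic is what matters here.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

/** Positive integer byte cap from `raw`, or `fallback` when missing, non-numeric or non-positive. */
function byteLimitFromEnv(raw: string | undefined, fallback: number): number {
  const parsed = Number(raw); // Number(undefined) and Number("abc") are NaN; Number("") is 0
  return Number.isFinite(parsed) && parsed > 0 ? Math.floor(parsed) : fallback;
}

const DEFAULT_MAX_JSON_BODY_BYTES = byteLimitFromEnv(process.env.API_MAX_JSON_BODY_BYTES, 50 * MB);
// Hypothetical env variable name for the chat cap.
const CHAT_MAX_REQUEST_BYTES = byteLimitFromEnv(process.env.CHAT_MAX_REQUEST_BYTES, 220 * MB);
```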
A prior commit changed onSave's return type from `void | Promise<unknown>` to `undefined | Promise<unknown>`, which broke the build: callbacks that return nothing (table-grid column rename, table header rename) infer a `void` return, which is not assignable to `undefined`. Restore the `void` union so both fire-and-forget and Promise-returning callbacks type-check.
@greptile
@cursor review
…ve 413 on oversized import
- Chat route: preprocessExecution reserves a billing concurrency slot, but the post-preprocess early exits (missing workspaceId, execution-setup failure) returned without releasing it, leaking the slot until TTL and wrongly throttling later runs. Release explicitly on those paths (idempotent), mirroring the workflows execute route.
- Admin import route: an oversized JSON body now returns the real 413 from parseJsonBody instead of being remapped to a 400; invalid JSON still 400s.
@greptile
@cursor review
The Infisical mark rendered near-white on its yellow block background and was barely visible; switch its fill from currentColor to #000000 (matching the hardcoded-fill pattern of sibling brand icons). Sync the docs icon copy and pick up a stale servicenow doc regeneration.
After preprocessExecution reserves a billing concurrency slot, the streaming path could exit without releasing it: the 503 return when initializeExecutionStreamMeta fails, and any throw during stream setup (caught by the outer handler, which only returned 500). Both left the slot held until TTL, wrongly throttling unrelated runs. Release on the 503 path and in the outer catch (executionId hoisted so the catch can see it; release is idempotent and a no-op when no slot was reserved).
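The release pattern can be sketched as follows, with hypothetical stand-ins for the reservation helper and the stream setup. `executionId` is declared before the `try` so that the `catch` can release it. Releasing an undefined or already-released id is a no-op.

```typescript
// Hypothetical stand-in for the reservation store.
const reservedSlots = new Set<string>();

async function releaseExecutionSlot(executionId: string | undefined): Promise<void> {
  if (executionId === undefined) return; // preprocessing never reserved a slot
  reservedSlots.delete(executionId); // idempotent: deleting twice is harmless
}

async function handleStreamingRequest(
  preprocess: () => Promise<string>, // reserves a slot and returns the executionId
  initStreamMeta: (id: string) => Promise<boolean>,
  startStream: (id: string) => Promise<Response>,
): Promise<Response> {
  let executionId: string | undefined; // hoisted so the catch block can see it
  try {
    executionId = await preprocess();
    if (!(await initStreamMeta(executionId))) {
      await releaseExecutionSlot(executionId);
      return new Response("Stream unavailable", { status: 503 });
    }
    return await startStream(executionId); // on success, the logging session releases later
  } catch {
    await releaseExecutionSlot(executionId);
    return new Response("Internal error", { status: 500 });
  }
}
```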
The Linkup mark rendered with currentColor (near-white on its block background); switch its fill to #000000 for legibility, matching the Infisical fix. Docs icon copy synced via generate-docs.
@greptile
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
In the inline (single-process) async path, if jobQueue.startJob threw before executeWorkflowJob ran, no LoggingSession finalized and the reserved billing slot was held until TTL. Release it in the fire-and-forget catch (idempotent; a no-op when the job already finalized and released). The queued-worker path and all in-job outcomes already release via the job's LoggingSession finalize.
Summary
Batch of security fixes (authorization / IDOR / abuse-prevention). Trigger-webhook auth hardening was intentionally reverted in this branch pending a backwards-compatible rollout (see note below).
Shipping:
- `/api/files/delete` no longer trusts client-supplied `context`; derives it from the trusted key prefix and gates public-asset deletes by ownership (fixes cross-tenant file deletion)
- `allowPersonalApiKeys` policy (DB default `true`, so existing consumers unaffected)
- `folderId`/`parentId` belong to the same workspace on create/update/reorder (cross-workspace IDOR)
Type of Change
Testing
Tested via targeted unit tests per fix (files, chat, webhooks, env, folders, workflows, billing, agiloft) plus `tsc` and `check:api-validation`. All passing.
Notes / Follow-ups
`EMAIL_VERIFICATION_ENABLED=true` (or `autoSignIn:false`); confirm in production.
Checklist