feat(block): Allow wait block to wait up to 30 days (#4331)
TheodoreSpeaks wants to merge 50 commits into staging from
Conversation
…ership workflow edits via sockets, ui improvements
…ng improvements, posthog, secrets mutations
…ration, signup method feature flags, SSO improvements
…nts, secrets performance, polling refactors, drag resources in mothership
…y invalidation, HITL docs
…endar triggers, docs updates, integrations/models pages improvements
…ions, jira forms endpoints
…mat, logs performance improvements
fix(csp): add missing analytics domains, remove unsafe-eval, fix workspace CSP gap (#4179)
fix(landing): return 404 for invalid dynamic route slugs (#4182)
improvement(seo): optimize sitemaps, robots.txt, and core web vitals across sim and docs (#4170)
fix(gemini): support structured output with tools on Gemini 3 models (#4184)
feat(brightdata): add Bright Data integration with 8 tools (#4183)
fix(mothership): fix superagent credentials (#4185)
fix(logs): close sidebar when selected log disappears from filtered list; cleanup (#4186)
v0.6.46: mothership streaming fixes, brightdata integration
…m integration, atlassian triggers
…ze, subagent thinking, files sorting, agentphone integration
fix(db): revert statement_timeout startup options breaking pooled connections (#4284)
v0.6.57: mothership reliability, ashby refactor, tables row count, copilot id fix, bun upgrade
@BugBot review
PR Summary — Medium Risk Overview
This introduces a new … Manual resume routes and paused-execution listing/detail now exclude time-based pauses and explicitly restrict manual resume to …
Reviewed by Cursor Bugbot for commit c234b01. Bugbot is set up for automated code reviews on this repo. Configure here.
Greptile Summary
This PR extends the Wait block to support durations up to 30 days by reusing the human-in-the-loop pause/resume infrastructure. Waits ≤ 5 minutes continue to execute in-process; longer waits suspend the workflow by writing …
Confidence Score: 3/5
Not safe to merge until the failed-dispatch silent-strand bug is fixed; any transient error during resume dispatch permanently locks a workflow. A confirmed P1 defect in the new poll route means any transient DB or lock error during dispatch will permanently orphan a paused execution with no retry or observability. The rest of the implementation (schema migration, in-process vs. suspended branching, UI filtering, cron config) is well-structured and correct.
apps/sim/app/api/resume/poll/route.ts — the failed-dispatch no-retry bug and missing ORDER BY
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant W as WaitBlockHandler
    participant E as ExecutionEngine
    participant M as PauseResumeManager
    participant DB as pausedExecutions DB
    participant C as CronJob (/api/resume/poll)
    W->>W: execute(inputs)
    alt waitMs ≤ 5 min (in-process)
        W->>W: sleep(waitMs)
        W-->>E: {status: 'completed'}
    else waitMs > 5 min (suspended)
        W-->>E: {status: 'waiting', _pauseMetadata: {pauseKind: 'time', resumeAt}}
        E->>M: persistPauseResult(pausePoints)
        M->>M: compute nextResumeAt (earliest time pause point)
        M->>DB: INSERT/UPDATE pausedExecutions {nextResumeAt}
    end
    loop Every 1 minute
        C->>DB: SELECT WHERE status='paused' AND nextResumeAt <= now LIMIT 200
        DB-->>C: dueRows[]
        loop for each dueRow
            loop for each duePoint (pauseKind='time', resumeAt <= now)
                C->>M: enqueueOrStartResume({executionId, contextId})
                M-->>C: {status: 'starting', ...}
                C->>M: startResumeExecution() [fire and forget]
            end
            C->>DB: UPDATE SET nextResumeAt = nextRemaining (null if all done)
        end
    end
```
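The in-process vs. suspended branching in the diagram can be sketched as follows. This is an illustrative sketch, not the repo's actual handler: identifiers like `planWait`, `IN_PROCESS_WAIT_MS`, and `MAX_WAIT_MS` are assumptions; only the 5-minute / 30-day thresholds and the result shapes come from the PR description above.

```typescript
// Hypothetical sketch of the wait-block branching: short waits sleep
// in-process, long waits suspend with a time pause point for the cron poll.
const IN_PROCESS_WAIT_MS = 5 * 60 * 1000       // 5 minutes (assumed threshold name)
const MAX_WAIT_MS = 30 * 24 * 60 * 60 * 1000   // 30 days

type WaitResult =
  | { status: 'completed' }
  | {
      status: 'waiting'
      resumeAt: string
      _pauseMetadata: { pauseKind: 'time'; resumeAt: string }
    }

function planWait(waitMs: number, now: Date = new Date()): WaitResult {
  const clamped = Math.min(Math.max(waitMs, 0), MAX_WAIT_MS)
  if (clamped <= IN_PROCESS_WAIT_MS) {
    // In-process: the real handler sleeps here, then reports completion.
    return { status: 'completed' }
  }
  // Suspended: persist a time pause point; the poll route resumes it later.
  const resumeAt = new Date(now.getTime() + clamped).toISOString()
  return { status: 'waiting', resumeAt, _pauseMetadata: { pauseKind: 'time', resumeAt } }
}
```

Under this sketch, `PauseResumeManager` would derive `nextResumeAt` from the earliest `_pauseMetadata.resumeAt` across pause points, which is what the cron query keys on.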
Reviews (1): Last reviewed commit: "restore ff"
```typescript
for (const point of duePoints) {
  const contextId = point.contextId
  if (!contextId) continue
  try {
    const enqueueResult = await PauseResumeManager.enqueueOrStartResume({
      executionId: row.executionId,
      contextId,
      resumeInput: {},
      userId,
    })

    if (enqueueResult.status === 'starting') {
      PauseResumeManager.startResumeExecution({
        resumeEntryId: enqueueResult.resumeEntryId,
        resumeExecutionId: enqueueResult.resumeExecutionId,
        pausedExecution: enqueueResult.pausedExecution,
        contextId: enqueueResult.contextId,
        resumeInput: enqueueResult.resumeInput,
        userId: enqueueResult.userId,
      }).catch((error) => {
        logger.error('Background time-pause resume failed', {
          executionId: row.executionId,
          contextId,
          error: toError(error).message,
        })
      })
    }
    dispatched++
  } catch (error) {
    const message = toError(error).message
    logger.warn('Failed to dispatch time-pause resume', {
      executionId: row.executionId,
      contextId,
      error: message,
    })
    failures.push({ executionId: row.executionId, contextId, error: message })
  }
}

await db
  .update(pausedExecutions)
  .set({ nextResumeAt: nextRemaining })
  .where(eq(pausedExecutions.id, row.id))
}

logger.info('Time-pause resume poll completed', {
  requestId,
  claimedRows,
  dispatched,
  failureCount: failures.length,
```
Failed dispatches permanently strand executions
When enqueueOrStartResume throws for a due pause point, the error is caught and pushed to failures[], but nextRemaining is unaffected (it only tracks future points). The loop then runs UPDATE … SET next_resume_at = nextRemaining (effectively NULL when all points were due). After this update, the row no longer satisfies the cron query (isNotNull(nextResumeAt)), so it is silently abandoned and the workflow is permanently stuck in status = 'paused'.
Any transient failure — DB timeout, lock contention, network hiccup inside enqueueOrStartResume — turns into a permanent hang with no visible alert and no retry path.
A simple fix is to re-schedule failed points by putting their resumeAt back into nextRemaining:
```typescript
for (const point of duePoints) {
  const contextId = point.contextId
  if (!contextId) continue
  try {
    // ... dispatch ...
    dispatched++
  } catch (error) {
    const message = toError(error).message
    logger.warn('Failed to dispatch time-pause resume', { ... })
    failures.push({ executionId: row.executionId, contextId, error: message })
    // Re-queue failed point
    if (point.resumeAt) {
      const retryAt = new Date(point.resumeAt)
      if (!Number.isNaN(retryAt.getTime())) {
        if (!nextRemaining || retryAt < nextRemaining) nextRemaining = retryAt
      }
    }
  }
}
```

Alternatively, schedule a short retry (e.g. `new Date(Date.now() + 60_000)`) to avoid hammering a bad point at full frequency.
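The fixed-delay alternative mentioned above could be sketched like this. `RETRY_DELAY_MS`, `rescheduleAfterFailure`, and the assumption that `nextRemaining` is a `Date | null` are all illustrative, not taken from the repo:

```typescript
// Illustrative sketch: on dispatch failure, push nextResumeAt one minute out
// instead of restoring the point's original (already past) resumeAt, so the
// cron retries at a bounded rate rather than every invocation.
const RETRY_DELAY_MS = 60_000

function rescheduleAfterFailure(nextRemaining: Date | null, now: Date = new Date()): Date {
  const retryAt = new Date(now.getTime() + RETRY_DELAY_MS)
  // Keep the earlier of the existing schedule and the retry slot, so a
  // still-pending future point is never pushed later than it should be.
  if (nextRemaining && nextRemaining < retryAt) return nextRemaining
  return retryAt
}
```

Either variant keeps `next_resume_at` non-null after a failure, so the row continues to match the cron query instead of being silently abandoned.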
```typescript
    metadata: pausedExecutions.metadata,
  })
  .from(pausedExecutions)
  .where(
    and(
      eq(pausedExecutions.status, 'paused'),
      isNotNull(pausedExecutions.nextResumeAt),
      lte(pausedExecutions.nextResumeAt, now)
    )
  )
  .limit(POLL_BATCH_LIMIT)
```
No ORDER BY on batch query — high-volume queues risk row starvation
Without an explicit ORDER BY, PostgreSQL returns rows in an unspecified order. When the queue depth exceeds POLL_BATCH_LIMIT = 200, the same 200 rows may be returned on every invocation (e.g. lowest physical heap order), while later-inserted rows are perpetually skipped. Adding .orderBy(pausedExecutions.nextResumeAt) ensures the most-overdue entries are always processed first and that all rows are eventually drained.
```typescript
  .orderBy(pausedExecutions.nextResumeAt)
  .limit(POLL_BATCH_LIMIT)
```

```typescript
async executeWithNode(
  ctx: ExecutionContext,
  block: SerializedBlock,
  inputs: Record<string, any>,
  nodeMetadata: {
    nodeId: string
    loopId?: string
    parallelId?: string
    branchIndex?: number
    branchTotal?: number
  }
): Promise<BlockOutput> {
```
executeWithNode signature is narrower than the BlockHandler interface
BlockHandler.executeWithNode in types.ts declares nodeMetadata with three additional optional fields (originalBlockId, isLoopNode, executionOrder). The WaitBlockHandler implementation omits all three, so the method technically does not satisfy the declared interface contract. While TypeScript currently allows this (the extra fields are optional and ignored at runtime), it means callers that pass full nodeMetadata objects will silently drop fields the handler might need in a future iteration. Widening the implementation's parameter type to match the interface definition prevents this drift.
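The widening fix could be sketched as below. The `NodeMetadata` shape is assembled from the field names mentioned in this review comment; the real `BlockHandler` interface in `types.ts` may differ, and `describeNode` is a hypothetical helper used only to show that a widened signature actually receives the extra fields:

```typescript
// Assumed interface shape, combining the five fields from the narrowed
// implementation with the three extras the review says types.ts declares.
interface NodeMetadata {
  nodeId: string
  loopId?: string
  parallelId?: string
  branchIndex?: number
  branchTotal?: number
  originalBlockId?: string
  isLoopNode?: boolean
  executionOrder?: number
}

// A handler typed against the full NodeMetadata can observe the extra
// optional fields whenever callers supply them, instead of dropping them.
function describeNode(meta: NodeMetadata): string {
  return meta.executionOrder !== undefined
    ? `${meta.nodeId}#${meta.executionOrder}`
    : meta.nodeId
}
```

Because the extra fields are optional, widening the parameter type is source-compatible with every existing call site while preventing the interface/implementation drift the comment describes.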
…rizations, mothership positional table row insertion, CI improvements, org-external users, file viewer improvements
v0.6.62: fix new copilot chat creation and selection on refresh
…ixes, db query optimizations, contract boundaries code hygiene, CORS, toast improvements, tables infinite query, executor robustness, reranker support
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 29606901 | Triggered | Generic High Entropy Secret | a54dcbe | apps/sim/providers/utils.test.ts | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace the secret and store it safely. Learn the best practices here.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c234b01. Configure here.
```typescript
status: 'completed',
status: 'waiting',
resumeAt,
_pauseMetadata: pauseMetadata,
```
Resumed wait block status stays "waiting" not "completed"
Medium Severity
When a long wait (>5 min) suspends and is later resumed by the poll, the block's output status remains 'waiting' because the resume merge in runResumeExecution spreads the existing output (which has status: 'waiting') but never updates it to 'completed'. In-process waits correctly return status: 'completed'. Downstream blocks referencing {{wait-block.status}} in conditional logic will see different values depending on whether the wait was short or long, potentially causing silent control-flow divergence.
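One way the merge could be corrected is sketched below. This is an assumption about the fix, not the repo's actual `runResumeExecution` code; `WaitOutput` and `mergeResumedOutput` are hypothetical names:

```typescript
// Illustrative: spreading the persisted output alone keeps status 'waiting';
// explicitly overwriting the field marks the resumed wait as completed, so
// downstream {{wait-block.status}} checks behave the same for short and
// long waits.
type WaitOutput = { status: 'waiting' | 'completed'; resumeAt?: string }

function mergeResumedOutput(persisted: WaitOutput): WaitOutput {
  return { ...persisted, status: 'completed' }
}
```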
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c234b01. Configure here.


Summary
Reuses the human-in-the-loop logic to allow wait blocks to wait up to 30 days. This could be useful for things like email automation, where you want to send follow-ups after x days.
Type of Change
Testing
Checklist
Screenshots/Videos