fix: make Frame.lookup() iterative to prevent StackOverflowError #107
fdelbrayelle wants to merge 1 commit into dashjoin:main from
Java does not optimize tail calls. On JVMs with smaller default thread stack sizes (e.g. Windows workers, containers with -Xss256k), deeply nested JSONata expressions cause Jsonata$Frame.lookup() to overflow the stack via tail recursion. StackOverflowError is a JVM Error: it cannot be safely caught, and the entire worker thread crashes.

Replace the recursive parent.lookup(name) call with an iterative loop over the scope chain. Semantics are identical; traversal order is unchanged. Eliminates all stack risk from scope chain lookup regardless of nesting depth.

Fixes worker crashes reported in kestra-io/plugin-transform#79.
Thanks for the PR, looks good. Can you provide an example of a JVM setup and an expression that reproduces this issue? Thanks!
Tests now assert isInstanceOf(StackOverflowError.class) — current behavior without the upstream fix. Rename constant to XSS_256K and add Javadoc linking it to -Xss256k so the stack constraint is self-documenting. Comment marks where assertions flip once dashjoin/jsonata-java#107 ships.
Hi @aeberhart 👋
Tests document the current bug state: StackOverflowError is expected. Will flip to assertThatNoException once dashjoin/jsonata-java#107 ships.
Hi @fdelbrayelle, I ran some tests with the expression from your repo: Your change does not really make a difference because the number of nested frames is independent of the recursion depth. Eventually, you get a StackOverflow at a more or less random code position. In your engine, you might want to do this: Then, you get a JException from com.dashjoin.jsonata.Timebox.checkRunnaway().
In protected JsonNode evaluateExpression(RunContext runContext, JsonNode jsonNode) {
    try {
        var timeoutInMilli = runContext.render(getTimeout()).as(Duration.class)
            .map(Duration::toMillis)
            .orElse(Long.MAX_VALUE);
        var rMaxDepth = runContext.render(getMaxDepth()).as(Integer.class).orElseThrow();
        var data = MAPPER.convertValue(jsonNode, Object.class);
        var frame = this.parsedExpression.createFrame();
        frame.setRuntimeBounds(timeoutInMilli, rMaxDepth);
        var result = this.parsedExpression.evaluate(data, frame);
        if (result == null) {
            return NullNode.getInstance();
        }
        return MAPPER.valueToTree(result);
    } catch (JException | IllegalVariableEvaluationException e) {
        throw new RuntimeException("Failed to evaluate expression", e);
    }
}
As evidence: Kestra already calls The change is zero-behavior-change, zero-risk and removes one class of
Hi @fdelbrayelle, running this is the stack trace: In the example, D=4 and N is some large number. In total, lookup is called N*D times, but the recursion depth is only O(N+D), because each lookup returns before the next level of evaluate is entered. You can see this in the trace above: the recursive call (evaluate) is repeated N times, with 4 calls to lookup "on top". Please let us know if we're missing something.
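The N*D versus O(N+D) argument can be checked with a toy model. This is only a sketch: `DepthModel`, `evaluate` and `lookup` here are hypothetical stand-ins that count simulated calls, not the library's code, and the model assumes each recursion level performs exactly one lookup over a scope chain of fixed length D.

```java
// Toy model only: names and structure are illustrative, not jsonata-java's code.
final class DepthModel {
    int depth;        // current simulated call depth
    int maxDepth;     // deepest point reached so far
    long lookupCalls; // total number of lookup invocations

    private void enter() {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    // Recursive lookup that walks a scope chain of length d.
    void lookup(int d) {
        enter();
        lookupCalls++;
        if (d > 1) {
            lookup(d - 1);
        }
        depth--;
    }

    // One level of expression recursion: resolve a variable, then recurse.
    void evaluate(int remaining, int d) {
        enter();
        lookup(d); // returns before the next evaluate call, so its frames are released
        if (remaining > 1) {
            evaluate(remaining - 1, d);
        }
        depth--;
    }
}
```

Under these assumptions, `evaluate(N, D)` performs N*D lookup calls but never goes deeper than N+D, so making lookup iterative removes at most D frames from a stack dominated by N.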
Each JSONata recursion level pushes ~8 JVM frames. On 256 KB worker stacks (~300 usable frames), the old default maxDepth=200 allowed 200 × 8 = 1600 frames before the bounds check fired, far past overflow.

Two-layer fix:
1. Lower default maxDepth 200 → 50 (50 × 8 = 400 frames, safe on 256 KB). setRuntimeBounds fires at depth 50, throwing JException cleanly.
2. Run evaluate() on a dedicated thread with an explicit 4 MB stack. Worker thread stack size (e.g. 256 KB on Windows) can no longer constrain the evaluator. If a user sets a very high maxDepth and triggers StackOverflowError anyway, it is caught as Throwable inside the throwaway eval thread; the worker thread gets a clean RuntimeException instead of crashing.

Update regression tests:
- Parametrized test: maxDepth=50/200/500/1000 all produce JException, never StackOverflowError, even on -Xss512k JVM.
- Isolation test: maxDepth=50000 with $f(49999) overflows the eval thread but worker receives RuntimeException(cause=StackOverflowError).

Closes dashjoin/jsonata-java#107 dependency: fix is now self-contained in plugin-transform without requiring upstream changes.
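The second layer of that fix could look roughly like the following. This is a minimal sketch, not plugin-transform's actual code: `BigStackRunner`, `run` and `depth` are hypothetical names, and the only real API relied on is the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor, whose stack size argument the JVM treats as a hint.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not plugin-transform's actual code.
final class BigStackRunner {

    // Runs the task on a throwaway thread with the requested stack size and
    // rethrows any failure, including StackOverflowError, as a RuntimeException.
    static <T> T run(Supplier<T> task, long stackBytes) {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread t = new Thread(null, () -> {
            try {
                result.set(task.get());
            } catch (Throwable e) { // Throwable also covers StackOverflowError
                failure.set(e);
            }
        }, "eval-thread", stackBytes); // stackBytes is a hint the JVM may adjust

        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for evaluation", e);
        }
        if (failure.get() != null) {
            throw new RuntimeException("Failed to evaluate expression", failure.get());
        }
        return result.get();
    }

    // Deliberately non-tail-recursive function, used only to provoke an overflow.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }
}
```

A caller might use `BigStackRunner.run(() -> evaluateSomething(), 4L * 1024 * 1024)`, where `evaluateSomething` is a placeholder. The overflow then unwinds only the evaluation thread, and the calling worker sees an ordinary exception whose cause is the StackOverflowError.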
You're right, thank you for the analysis and the stack trace. The recursion depth is We've fixed it on our side in plugin-transform instead: evaluation now runs on a dedicated thread with an explicit 4 MB stack and the default So in the end, no upstream changes are needed here. Closing this PR.
Problem
Jsonata$Frame.lookup() walks the parent scope chain via tail recursion:
Java does not optimize tail calls. On JVMs with smaller default thread stack sizes (e.g. Windows workers, containers with -Xss256k), deeply nested JSONata expressions overflow the stack. The StackOverflowError is a JVM Error, not an Exception: it cannot be safely caught and recovered from, and it crashes the entire worker thread.
Fix
Replace with an iterative loop. Identical semantics, zero recursion:
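As a rough illustration of the idea, the sketch below contrasts a recursive and an iterative scope-chain lookup. It assumes a simplified frame with a `parent` reference and a `bindings` map; `ScopeFrame` and its members are hypothetical names chosen for this example and are not the actual Jsonata$Frame implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a scope frame; not the library's actual class.
final class ScopeFrame {
    private final Map<String, Object> bindings = new HashMap<>();
    private final ScopeFrame parent; // null for the root scope

    ScopeFrame(ScopeFrame parent) {
        this.parent = parent;
    }

    void bind(String name, Object value) {
        bindings.put(name, value);
    }

    // Recursive form: uses one JVM stack frame per enclosing scope.
    Object lookupRecursive(String name) {
        if (bindings.containsKey(name)) {
            return bindings.get(name);
        }
        return parent == null ? null : parent.lookupRecursive(name);
    }

    // Iterative form: same innermost-to-outermost order, constant stack usage.
    Object lookupIterative(String name) {
        for (ScopeFrame f = this; f != null; f = f.parent) {
            if (f.bindings.containsKey(name)) {
                return f.bindings.get(name);
            }
        }
        return null;
    }
}
```

Both methods visit scopes in the same order, so inner bindings still shadow outer ones; the loop simply does not grow the call stack with the length of the scope chain.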
Impact
plugin-transform-json on Windows workers (reported at kestra-io/plugin-transform#79, "fix(jsonata): lower default maxDepth to prevent Windows worker crash")