Skip to content

fix: make Frame.lookup() iterative to prevent StackOverflowError#107

Closed
fdelbrayelle wants to merge 1 commit intodashjoin:mainfrom
fdelbrayelle:fix/frame-lookup-iterative
Closed

fix: make Frame.lookup() iterative to prevent StackOverflowError#107
fdelbrayelle wants to merge 1 commit intodashjoin:mainfrom
fdelbrayelle:fix/frame-lookup-iterative

Conversation

@fdelbrayelle
Copy link
Copy Markdown

Problem

Jsonata$Frame.lookup() walks the parent scope chain via tail recursion:

Object lookup(String name) {
    if (bindings.containsKey(name)) return bindings.get(name);
    else if (parent != null) return parent.lookup(name);
    return null;
}

Java does not optimize tail calls. On JVMs with smaller default thread stack sizes (e.g. Windows workers, containers with -Xss256k), deeply nested JSONata expressions overflow the stack. The StackOverflowError is a JVM Error, not an Exception — it cannot be safely caught and recovered from, and it crashes the entire worker thread.

Fix

Replace with an iterative loop. Identical semantics, zero recursion:

Object lookup(String name) {
    for (Frame f = this; f != null; f = f.parent) {
        if (f.bindings.containsKey(name)) return f.bindings.get(name);
    }
    return null;
}

Impact

Java does not optimize tail calls. On JVMs with smaller default thread
stack sizes (e.g. Windows workers, containers with -Xss256k), deeply
nested JSONata expressions cause Jsonata$Frame.lookup() to overflow the
stack via tail recursion. StackOverflowError is a JVM Error — it cannot
be safely caught and the entire worker thread crashes.

Replace the recursive parent.lookup(name) call with an iterative loop
over the scope chain. Semantics are identical; traversal order is
unchanged. Eliminates all stack risk from scope chain lookup regardless
of nesting depth.

Fixes worker crashes reported in kestra-io/plugin-transform#79.
@aeberhart
Copy link
Copy Markdown
Contributor

thanks for the PR. looks good. can you provide an example of a JVM setup + expression that reproduces this issue. Thanks!

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request Apr 30, 2026
Tests now assert isInstanceOf(StackOverflowError.class) — current behavior
without the upstream fix. Rename constant to XSS_256K and add Javadoc linking
it to -Xss256k so the stack constraint is self-documenting.
Comment marks where assertions flip once dashjoin/jsonata-java#107 ships.
@fdelbrayelle
Copy link
Copy Markdown
Author

Hi @aeberhart 👋
We don't have the customer's initial data from his flow because it's sensitive data.
But I'm able to reproduce it. I opened a PR to let you check.

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request Apr 30, 2026
Tests document the current bug state: StackOverflowError is expected.
Will flip to assertThatNoException once dashjoin/jsonata-java#107 ships.
@aeberhart
Copy link
Copy Markdown
Contributor

Hi @fdelbrayelle, I ran some tests with the expression from your repo:

($f := function($n) { $n > 0 ? $f($n - 1) + 1 : 0 }; $f(...))

Your change does not really make a difference because the number of nested frames is independent of the recursion depth. Eventually, you get a StackOverflow at a more or less random code position.

In your engine, you might want to do this:

  @Test
  public void testStack() {
    var e = Jsonata.jsonata("($f := function($n) { $n > 0 ? $f($n - 1) + 1 : 0 }; $f(820))");
    Frame f = e.createFrame();
    f.setRuntimeBounds(Long.MAX_VALUE, 100);
    System.out.println(e.evaluate(null, f));
  }

Then, you get a JException from com.dashjoin.jsonata.Timebox.checkRunnaway().

@fdelbrayelle
Copy link
Copy Markdown
Author

fdelbrayelle commented Apr 30, 2026

In plugin-transform, we're already following this pattern with a maxDepth of 200 by default now (here see):

    protected JsonNode evaluateExpression(RunContext runContext, JsonNode jsonNode) {
        try {
            var timeoutInMilli = runContext.render(getTimeout()).as(Duration.class)
                .map(Duration::toMillis)
                .orElse(Long.MAX_VALUE);
            var rMaxDepth = runContext.render(getMaxDepth()).as(Integer.class).orElseThrow();

            var data = MAPPER.convertValue(jsonNode, Object.class);
            var frame = this.parsedExpression.createFrame();
            frame.setRuntimeBounds(timeoutInMilli, rMaxDepth);

            var result = this.parsedExpression.evaluate(data, frame);
            if (result == null) {
                return NullNode.getInstance();
            }
            return MAPPER.valueToTree(result);
        } catch (JException | IllegalVariableEvaluationException e) {
            throw new RuntimeException("Failed to evaluate expression", e);
        }
    }

Frame.lookup() is called inside the recursive evaluator. With maxDepth=N and a scope chain of depth D, each level of evaluation recursion adds D lookup frames on top of its own frame. Stack usage grows as O(N × D), not O(N). Making lookup() iterative collapses that D multiplier to 1 regardless of scope nesting.

setRuntimeBounds limits evaluation recursion depth but it fires after the frames are pushed. It can't prevent overflow if the stack is already exhausted by the time checkRunnaway() is reached. The iterative lookup() reduces the per level footprint giving the bounds check room to fire cleanly.

As an evidence: Kestra already calls frame.setRuntimeBounds(timeout, 200). That didn't prevent the crash. The overflow happened inside lookup() traversal while evaluating a moderately nested expression on a container with a reduced stack size. That's exactly the scenario this PR fixes.

The change is zero-behavior-change, zero-risk and removes one class of StackOverflowError entirely. It doesn't conflict with 'setRuntimeBounds`, it makes it more reliable.

@aeberhart
Copy link
Copy Markdown
Contributor

Hi @fdelbrayelle,

running

($f := function($n) { $n > 0 ? $f($n - 1) + 1 : 0 }; $f(1999))

this is the stack trace:

java.lang.StackOverflowError
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata$Frame.lookup(Jsonata.java:94)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata$Frame.lookup(Jsonata.java:97)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata$Frame.lookup(Jsonata.java:97)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:155)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluateFunction(Jsonata.java:1601)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:202)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluateBinary(Jsonata.java:565)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:166)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluateCondition(Jsonata.java:1247)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata._evaluate(Jsonata.java:190)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.evaluate(Jsonata.java:138)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.applyProcedure(Jsonata.java:1920)
	at com.dashjoin.jsonata/com.dashjoin.jsonata.Jsonata.applyInner(Jsonata.java:1722)
...

in the example D=4 and N is some large number. In total, lookup is called N*D times. The recursion depth is O(N+D). You can see this in the trace above. The recursive call (evaluate) is repeated N times with 4 calls to lookup "on top".
Of course, a while loop is minimally faster than recursion, but since D is very small, we won't merge the change.

Please let us know if we're missing something.

fdelbrayelle added a commit to kestra-io/plugin-transform that referenced this pull request May 2, 2026
Each JSONata recursion level pushes ~8 JVM frames. On 256 KB worker
stacks (~300 usable frames), the old default maxDepth=200 allowed
200 × 8 = 1600 frames before the bounds check fired — far past overflow.

Two-layer fix:
1. Lower default maxDepth 200 → 50 (50 × 8 = 400 frames, safe on 256 KB).
   setRuntimeBounds fires at depth 50, throwing JException cleanly.
2. Run evaluate() on a dedicated thread with an explicit 4 MB stack.
   Worker thread stack size (e.g. 256 KB on Windows) can no longer
   constrain the evaluator. If a user sets a very high maxDepth and
   triggers StackOverflowError anyway, it is caught as Throwable inside
   the throwaway eval thread; the worker thread gets a clean
   RuntimeException instead of crashing.

Update regression tests:
- Parametrized test: maxDepth=50/200/500/1000 all produce JException,
  never StackOverflowError, even on -Xss512k JVM.
- Isolation test: maxDepth=50000 with $f(49999) overflows the eval
  thread but worker receives RuntimeException(cause=StackOverflowError).

Closes dashjoin/jsonata-java#107 dependency — fix is now self-contained
in plugin-transform without requiring upstream changes.
@fdelbrayelle
Copy link
Copy Markdown
Author

You're right, thank you for the analysis and the stack trace.

The recursion depth is O(N+D) and D is small so making lookup() iterative doesn't prevent the overflow. I was wrong about the multiplier.

We've fixed it on our side in plugin-transform instead: evaluation now runs on a dedicated thread with an explicit 4MB stack and the default maxDepth has been lowered to 50. The worker thread is fully isolated from any StackOverflowError in the evaluator.

So in the end, no upstream changes are needed here. Closing this PR.

@fdelbrayelle fdelbrayelle deleted the fix/frame-lookup-iterative branch May 2, 2026 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants