[NFC] Refactor delta debugging to use coroutines #8657

Open
tlively wants to merge 3 commits into main from ddb-coroutine

Conversation

@tlively
Member

@tlively tlively commented Apr 29, 2026

Add a generator utility in a new support/coroutine.h header and use it to refactor away the callback in the delta debugging utility. Now the utility is a struct providing access to the test and working sets as well as `accept()` and `reject()` methods that cause the test and working sets to be updated appropriately. Rather than being refactored into an explicit state machine, the implementation of the delta debugging algorithm remains readable straight-line code that does a `co_yield` whenever it is ready to return control to the user. It co_yields a pointer to a local state object that exposes all the information that the delta debugging utility exposes in its public API. This local object stays alive across suspend points. When the delta debugging algorithm is complete, we suspend the coroutine one final time and make sure never to resume it, which ensures the state remains alive and available after delta debugging has finished. It will ultimately be cleaned up when the outer `DeltaDebugger` struct is cleaned up.
@tlively tlively requested a review from a team as a code owner April 29, 2026 05:08
@tlively tlively requested review from aheejin, kripken and stevenfontanella and removed request for a team April 29, 2026 05:08
@tlively
Member Author

tlively commented Apr 29, 2026

This is an alternative to #8651. It took some iteration, but I'm pretty happy with how it turned out. The arcane C++ coroutine nonsense is pretty well encapsulated in coroutine.h and the delta debugging implementation is essentially just as readable as before.

@kripken
Member

kripken commented Apr 29, 2026

What about compiler support for coroutines - https://en.cppreference.com/cpp/compiler_support suggests clang on windows may not be done yet, but perhaps that page is out of date?

@tlively
Member Author

tlively commented Apr 29, 2026

Looks like this is still a problem :( https://clang.llvm.org/cxx_status.html#:~:text=Clang%2017-,Coroutines,-P0912R5. But I think the ABI problem only affects 32-bit x86, and it doesn't look like we do any 32-bit releases at all. Maybe this is good enough for us? Here's the relevant LLVM bug: llvm/llvm-project#59382

@kripken
Member

kripken commented Apr 29, 2026

If it only affects 32-bit windows I think we are ok here. But the wording in those links is a little ambiguous to me if that is the case?

@tlively
Member Author

tlively commented Apr 29, 2026

The opening post on the LLVM issue says "The 32-bit Windows ABI passes objects of non-trivially-copyable class type by value on the stack" and never mentions any other problems, so I think we're good. (And I asked an expert internally and he confirmed this understanding.)

Member

@kripken kripken left a comment

Sounds good about compiler support!

if (working.empty()) {
  finished = true;
  co_yield &state;
  co_return;

Member

Why does this need to yield before returning? Isn't the output in the right place already?

Member Author

A tricky thing here is that we need to prevent the coroutine from ever returning because we depend on its local state staying live for the lifetime of the outer DeltaDebugger. So we yield before returning here and below, then make sure we never resume the coroutine again.

Member

I see, thanks, that's what I was missing. Please document that, it is indeed tricky...

Member

Or, could we std::move the final state from the coroutine?

Member Author

Unfortunately there is not a great way to do that. This is by far the simplest approach I tried. Will add comments.

Comment thread src/support/coroutine.h Outdated
return false;
}
PromiseType* await_resume() const noexcept { return promise; }
};

Member

Maybe add some docs for these classes? I'm not really sure what "GetPromise" means or does just from this code (which seems so generic as to do almost nothing but store a "promise"..?)

Member Author

All these methods are well-known to the compiler and configure the suspending and resuming behavior of our Generator utility. Unfortunately this is just a bunch of unavoidable boilerplate that doesn't do anything interesting (or comprehensible to non-experts). I'll document the interesting user-exposed methods, but for most of this there's not anything more to say than // Unavoidable boilerplate.

@tlively tlively requested review from juj and sbc100 April 29, 2026 20:20
@tlively
Member Author

tlively commented Apr 29, 2026

@sbc100, @juj, we'd like to land this PR introducing coroutine usage in Binaryen. But my understanding is that coroutines require Xcode 16 or above to be stable. Can we find a way to make that acceptable?

@juj
Collaborator

juj commented May 4, 2026

I tested current LLVM main & Binaryen main builds on different macOS build & test environments. Here are the results I got:

| Device | OS | Xcode tools version | `clang --version` | Build status |
| --- | --- | --- | --- | --- |
| MacBook Pro 2014 | macOS Big Sur 11.7.11 | 12.5 (OUTDATED) | Apple clang 12.0.5 | ❌ Fails LLVM build, SLPVectorizer.cpp: error: static_assert expression is not an integral constant expression |
| Mac Mini x64 2018 | macOS Sequoia 15.5 | 11.3 (OUTDATED) | Apple clang 12.0.5 | ❌ Fails LLVM build, SLPVectorizer.cpp: error: static_assert expression is not an integral constant expression |
| MacBook Pro 2014 | macOS Big Sur 11.7.11 | 13.2 | Apple clang 13.0.0 | ✔️ Builds LLVM. ❌ Fails Binaryen, lattice.h: no member named 'copyable' in namespace 'std' |
| Mac Pro 2013 | macOS Monterey 12.7.6 | 14.2 | Apple clang 14.0.0 | ✔️ Builds LLVM. ❌ Fails Binaryen, GlobalEffects.cpp: no member named 'keys' in 'std::ranges::views' |
| MacBook Pro 2017 | macOS Ventura 13.7.8 | 14.3 | Apple clang 14.0.3 | ✔️ Builds LLVM. ❌ Fails Binaryen, GlobalEffects.cpp: no member named 'keys' in 'std::ranges::views' |
| MacBook Air 2018 | macOS Sonoma 14.8.5 | 16.2 | Apple clang 16.0.0 | ✔️ OK, builds LLVM & Binaryen |
| Mac Mini x64 2018 | macOS Sequoia 15.7.5 | Xcode 26.3 | Apple clang 17.0.0 | ✔️ OK, builds LLVM & Binaryen |
| Mac Mini M1 2020 | macOS Sequoia 15.5 | ? | ? | ✔️ OK, builds LLVM & Binaryen |
| Mac M4 Mini 2024 | macOS Tahoe 26.0.1 | 16.4 | Apple clang 17.0.0 | ✔️ OK, builds LLVM & Binaryen |

Ideally, it would be nice if Binaryen could take a goal that it would build wherever LLVM builds. I.e. target Clang-13 at the moment?

Testing briefly, I see that Binaryen no longer builds with Visual Studio 2019, or older Visual Studio 2022 on Windows.

Do you know what macOS user version requirements coroutines would bring? Would it only be a build-time requirement to have Xcode 16, or would it also mean that e.g. macOS 14 would be required on user systems at minimum?

Is coroutine support needed only for the wasm-reduce test case reducer tool? If so, would it be possible to restrict coroutine use to wasm-reduce and, e.g., have a -DBUILD_WASM_REDUCE=ON/OFF CMake variable to allow skipping the wasm-reduce build on older systems? Or is the intent to migrate more areas to coroutines after that?

@sbc100
Member

sbc100 commented May 4, 2026

On the runtime vs. compile-time point, I believe all new C++ features are build-time only, i.e. they don't affect the version of macOS you can target with CMAKE_OSX_DEPLOYMENT_TARGET.

Or at least, Binaryen hardcodes CMAKE_OSX_DEPLOYMENT_TARGET to 10.15, and I believe this means that features that depend on > 10.15 at runtime are compile-time errors, regardless of how recent the build machine is.

@sbc100
Member

sbc100 commented May 4, 2026

BTW, thanks for taking the time to do that comprehensive survey on real machines. I imagine that must have taken a while, and it's great to have hard evidence.

@juj
Collaborator

juj commented May 4, 2026

Thanks. I posted PRs #8668 to fix Clang-14 build, and #8669 to fix Clang-13 build on the above macOS systems.

@tlively
Member Author

tlively commented May 4, 2026

> Ideally, it would be nice if Binaryen could take a goal that it would build wherever LLVM builds. I.e. target Clang-13 at the moment?

In general this is stronger than LLVM's own policy, since LLVM may build on compilers older than their documented minimum requirements. I would strongly object to adopting stricter requirements than LLVM has.

LLVM's policy for updating their minimum requirements is that they can adopt LLVM and GCC versions as recent as three years old as minimum requirements. There is an RFC in flight that would raise the minimum to clang 15, released on September 6, 2022, which corresponds to Xcode version 14.3, which was released March 30, 2023.

The first LLVM version to have good enough support for coroutines is 17, which was released on September 9, 2023. We are only four months away from this meeting LLVM's three-year-old requirement. The first XCode version to be based on clang 17 is XCode 16.0, released on September 16, 2024.

From my point of view, folks building emsdk will have to drop support for older compilers eventually as LLVM updates its minimum requirements, so this is already a problem such folks have to be prepared to solve. The question is how soon folks will have to solve it. I would very much like to use new language features, but I'd like to understand the other side of that trade off better.

@juj
Collaborator

juj commented May 4, 2026

Thanks for detailing the links. I have two main motivations with backwards compatibility:

  1. Typically, requiring the user to update system OS version and compiler toolchain to a newer one in order to build, is fine, for all installations/updates that are easy&possible for user to do. E.g. the "(OUTDATED)" fields above referred to updates that were easy for user to do, by going to OS system menus and clicking update. Keeping support for old versions that would have been easy to update, is not particularly important.

    But the difficulty arises e.g. on Apple devices when Apple financially decides that a device no longer gets new macOS versions. So certain versions, namely macOS 11.7, 12.7, 13.7, 14.8 and 15.7, represent versions that are "end of the road" OS versions for certain Apple hardware. This makes updating the system compilers impossible for the user in an easy way.

    This results in version cutoff charts like

    MacBook Supported OS Versions, Wikipedia

    and

    Mac Mini Supported OS Versions, Wikipedia

    As a result, certain macOS versions arbitrarily become much more important than others to target, since Apple leaves devices behind, stuck on those OS versions. This effect is unique to the Apple Mac, iOS, and Google Android ecosystems, and does not happen on Windows or Linux to the same extent.

    We find that older Apple devices (5-10 years old), are still very popular among second hand markets for student and hobbyist developers. So even though they are ancient (and Apple deploys anti-consumer tactics to actively prevent users from using such old devices), they are still in circulation. People tend to use them as a "secondary device", or pass them on to someone else, or repurpose them as a dedicated build device, or similar.

  2. When running a small test lab of test hardware on Emscripten, it has been very convenient to 1:1 test building the toolchain + testing the use of toolchain symmetrically on the same device. This simplifies testing setup considerably.

    At Unity, whenever possible, we build the distributed artifacts on the min-level required system, because experience shows that OS vendors have spotty testing of backwards OS targeting, so that feature is sometimes brittle. By building the artifacts on the minimum OS version, we have avoided that bug surface area from appearing.

All that being said, I don't think we want to conclude that "shouldn't use C++20 since it's not available on a Mac from 2013". I wonder instead, if we should plan on introducing an intermediate bootstrap compiler build into e.g. the Emsdk ecosystem, so that these old systems could first acquire a newer compiler, and then use that newer compiler to compile LLVM & Binaryen. When that mechanism is in place, we would be guaranteed to have C++20 support available for example in the Emsdk scripts. That would resolve emscripten-core/emsdk#1704.

So:

  1. @tlively how urgent is your interest to start using C++20? In the short term, would it be possible for you to constrain the use of C++20 coroutines behind a C++ feature flag, to disable building the wasm-reduce tool if current compiler does not support C++20? (wasm-reduce is a developer tool that's not part of e.g. Emscripten use, so it is not critical to build at the moment?)

  2. What do you think of landing the Clang-14 fix in #8668 ("Avoid use of C++20 <ranges> header in GlobalEffects.cpp") and the Clang-13 fix in #8669 ("Fix clang 13 build") to help short-term?

  3. I can develop an intermediate compiler bootstrap into Emsdk, that would build e.g. Clang-16 for example, and then use that to build Binaryen as part of Emsdk. That should then allow targeting the C++20 features unconditionally in Binaryen in the long run, as we can assume that we have a compiler that supports it.
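One way the feature-flag idea in item 1 might look at the source level is sketched below, using the standard C++20 feature-test macros `__cpp_impl_coroutine` (language support) and `__cpp_lib_coroutine` (library support). The `SKETCH_HAS_COROUTINES` macro and `hasCoroutines` function are hypothetical names, not anything in Binaryen. These macros only report what the toolchain claims, so they would not by themselves catch the partial-support cases discussed earlier in this thread.

```cpp
// Pull in the library feature-test macros when <version> is available.
#if defined(__has_include)
#if __has_include(<version>)
#include <version>
#endif
#endif

// Hypothetical project-level switch; the name is illustrative only.
#if defined(__cpp_impl_coroutine) && defined(__cpp_lib_coroutine)
#define SKETCH_HAS_COROUTINES 1
#else
#define SKETCH_HAS_COROUTINES 0
#endif

// Coroutine-based code would be compiled only when the switch is on.
constexpr bool hasCoroutines() { return SKETCH_HAS_COROUTINES != 0; }
```

A build-system option, such as the CMake variable suggested earlier, could then decide whether to build the tool at all when this reports no support.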

@tlively
Member Author

tlively commented May 4, 2026

@juj, thanks for all the details. It's not urgent that we start using C++20, but having a plan to allow us to use it soon would be great. Adding a bootstrap step to the emsdk build sounds like a good solution to me, and I would be happy to land #8668 and #8669 and hold off on introducing new C++20 usage while you develop that. cc @sbc100 and @dschuff for their thoughts as well.

@dschuff
Member

dschuff commented May 4, 2026

Yeah, I like the idea of a bootstrap compiler in emsdk. Do we want/need this on platforms other than MacOS? If no, then we maybe still want to figure out minimum supported versions of MSVC and gcc/clang/libc++/libstdc++. If yes, then I guess it will just take more work to build the bootstrap (including a libc++ if necessary) but I think it should be doable.

@sbc100
Member

sbc100 commented May 5, 2026

.. deleted my comment.. sorry I didn't finish reading your earlier reply @juj.

5 participants