New fuzzer mode: Fuzz against JavaScript#8655
New fuzzer mode: Fuzz against JavaScript#8655kripken wants to merge 45 commits intoWebAssembly:mainfrom
Conversation
|
@sbc100 I'm confused by the ruff lint errors here
|
Oops, looks like a bad merge in Regarding |
|
Fixed |
|
@sbc100 thanks, I'll disable lint on that one function then. |
| if (newHeapType.isBottom()) { | ||
| options.push_back(oldHeapType); |
There was a problem hiding this comment.
In cases where the old heap type is also bottom, this will end up with two copies of bottom in the options. Not incorrect, but wastes a bit of randomness.
There was a problem hiding this comment.
Hmm, yeah, but keeping the code simple seems good enough here.
| if (type == Type::unreachable) { | ||
| // Nothing sent here, so use the declared type - what we refine to | ||
| // must still validate even though this call is unreachable. | ||
| type = declaredParams[i]; | ||
| } |
There was a problem hiding this comment.
Using the declared type here will prevent any refinement. For reference types we should use the relevant non-nullable reference to bottom type to preserve maximum optionality.
| if (map[func->name].reffed) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
We might consider redirecting the references to new wrapper functions that have the original type and forward to the original function with its refined type.
| if (!lub.noted()) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Could use non-nullable bottom reference types here, too.
There was a problem hiding this comment.
I suppose, though in this case the import/export is never actually reached?
There was a problem hiding this comment.
Yes, but who knows what might end up happening in the engine 🤷 Might as well exercise as many situations as we can. Without looking at the V8 code, it seems unlikely but plausible that there would be some different code path taken on traps or exceptions, or when doing a JSPI suspension or something, depending on the return type even when the function never does a normal return.
There was a problem hiding this comment.
Hmm, but what would we refine it to? A totally random type..?
There was a problem hiding this comment.
A random subtype of the old result type, yes. The maybeRefine call below should not need to change.
| (import "module" "base" (func $import (param i32 anyref) (result eqref))) | ||
| (import "module" "base" (func $import-reffed (param i32 anyref) (result eqref))) | ||
|
|
||
| ;; Two exports, one which will be reffed. |
There was a problem hiding this comment.
Might as well expand "reffed" to "referenced" to keep hypothetical future spell checkers running in CI happier.
There was a problem hiding this comment.
What if we use a script similar to this to find an input module / seed for which the fuzzer generates output that contains all of the modifications we're interested in. Then we could just check in the input and output as a normal lit test and maybe check in the script in scripts/. That way there would be no need to read and understand any code to see that the test is testing the intended behavior.
There was a problem hiding this comment.
But the seed would constantly change, leading to a lot of churn and toil?
There was a problem hiding this comment.
Yeah, I guess if something unrelated changes the output we would need to entirely regenerate the test.
I still wish there were a more declarative way of doing this kind of property testing on our fuzzers, though. Can we at least refactor this into a generic utility for running the fuzzer iteratively and checking that the output satisfies various user-provided predicates with given probabilities? Laying this into reasonable abstractions would help make it more palatable.
There was a problem hiding this comment.
If we want something more generic here, another option is to move this code in the fuzzer itself. Each fuzzer could persist state over time, then run "variety testing" after enough iterations. It would basically be the same code as here, but running when the fuzzer runs, not in the unit tests. What do you think?
There was a problem hiding this comment.
Oh, yes, I kind of like that idea. I don't know about persisting the output modules over several iterations, but persisting statistics about the various predicates and raising an error once there is high confidence that there is problem sounds reasonable.
There was a problem hiding this comment.
Hmm, thinking more about it, I'm not sure it makes sense in the fuzzer. We need to start with a fixed testcase so we can see what mutations we added on top, like this test does, while the fuzzer starts with something totally random each time.
I pushed a refactor instead along the lines of your suggestion, a generic utility that is now used in a modular way.
|
Last commit fixes the exactness logic, which was wrong - it could pick an intermediate heap type and make it exact. |
| if (!lub.noted()) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Yes, but who knows what might end up happening in the engine 🤷 Might as well exercise as many situations as we can. Without looking at the V8 code, it seems unlikely but plausible that there would be some different code path taken on traps or exceptions, or when doing a JSPI suspension or something, depending on the return type even when the function never does a normal return.
There was a problem hiding this comment.
Yeah, I guess if something unrelated changes the output we would need to entirely regenerate the test.
I still wish there were a more declarative way of doing this kind of property testing on our fuzzers, though. Can we at least refactor this into a generic utility for running the fuzzer iteratively and checking that the output satisfies various user-provided predicates with given probabilities? Laying this into reasonable abstractions would help make it more palatable.
| if (newHeapType != new_.getHeapType() || newHeapType.isBasic()) { | ||
| newExactness = Inexact; | ||
| } |
There was a problem hiding this comment.
Might as well check these conditions before burning a bit to generate a new exactness above.
| if (!lub.noted()) { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
A random subtype of the old result type, yes. The maybeRefine call below should not need to change.
| if (newHeapType != new_.getHeapType() || newHeapType.isBasic()) { | ||
| newExactness = Inexact; | ||
| } |
|
|
||
| # Given the types we saw for params or results, look in detail for the | ||
| # things we expect to see. | ||
| def found_expected(self, data): |
There was a problem hiding this comment.
IIUC, this looks for a single iteration where all of these things are true. But that's stricter than we want. Can we make each of these conditions a bool on the class, set them to true whenever we see them, and pass the test once all have been seen?
The new fuzzer flag
--fuzz-against-jstells the fuzzer we will only run thewasm against JS - not link it to wasm or something else. This lets it make
changes that are valid from JS's point of view, like refining things on the
boundary while not changing the arity.
For example, if we sent JS an anyref, but the actual type we send is
(ref $A)then we can refine to that type (or any type between it andanyref). We can do this for both export results and import params, as in
both cases we send things to JS and know their type.
Original idea was @tlively 's. This is useful for fuzzers that generate JS
and let Binaryen mutate the wasm: they can emit anyrefs on the
boundary, and Binaryen will be able to add new GC types in the
module and even refine the boundary to those types. Such a fuzzer
does not even need to emit GC types itself (it can emit anyref and send
only nulls). cc @rmahdav