deps: add ata JSON Schema validator for node.config.json#62603
deps: add ata JSON Schema validator for node.config.json#62603mertcanaltin wants to merge 26 commits intonodejs:mainfrom
Conversation
|
Review requested:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #62603 +/- ##
==========================================
+ Coverage 89.65% 89.72% +0.06%
==========================================
Files 708 708
Lines 220413 220302 -111
Branches 42275 42277 +2
==========================================
+ Hits 197607 197657 +50
+ Misses 14658 14476 -182
- Partials 8148 8169 +21
🚀 New features to boost your workflow:
|
|
IMHO this is a bit too much, an extra dep for a schema parser inside Node.js (not to mention that is not even v1.x) This is not a blocker, this is just my personal opinion |
|
That's a fair concern. The dependency is intentionally minimal, it only uses simdjson which is already in core (no new external dependencies since it's built with ATA_NO_RE2). On the version, you're right it's pre-1.0, but it already passes 96.9% of the official JSON Schema Test Suite. Happy to work toward 1.0 if that's a requirement. |
|
Maybe ata could also be exposed to JS userland as an API if it ends up being vendored in? |
|
Yes, that's the plan! The initial PR focuses on The main prerequisite is getting the security story solid first, we've already submitted to OSS-Fuzz and all 283 nst/JSONTestSuite tests are passing. |
|
@mertcanaltin check also https://www.npmjs.com/package/@exodus/schemasafe testsuite |
|
Thanks @ChALkeR, I ran the schemasafe test suite against ata (draft2020-12): 95.3% pass rate, 1103/1157, 59 skipped (remote refs). Main gaps are unevaluatedProperties/Items annotation tracking, some $ref URI edge cases (URN, absolute-path), grapheme counting in minLength/maxLength, and a few optional format semantics. Running the suite also helped me catch a few things I fixed since: recursion depth guard for circular $ref, proto handling in const/enum, and date format range checks. $dynamicRef is fully passing now (42/42). Working on the rest. |
edc0176 to
85e4bca
Compare
|
CI says: |
Qard
left a comment
There was a problem hiding this comment.
While it is a non-trivial amount of C++, it is all neatly contained in the external ata dependency where we can trivially iterate on any correctness requirements if there's any of the remaining spec features folks think are essential. The actual integration surface area is extremely minimal and straightforward.
I think it looks great. Excellent work! 👏🏻
33202b7 to
32ba0e9
Compare
|
thanks for review @Qard, I some conflicts solved, if have available time, can you review again ? |
32ba0e9 to
5bdb5f5
Compare
lemire
left a comment
There was a problem hiding this comment.
Looks fine to me but I recommending adding more tests. Currently, the builtin tests are somewhat mininal.
|
Thanks @lemire, added more test coverage, type mismatches for boolean, number, array, and nested array items, plus positive tests for valid configs and empty objects. |
avivkeller
left a comment
There was a problem hiding this comment.
Can you add an update script?
d72a1af to
1bf468e
Compare
I added thanks @avivkeller |
|
@mertcanaltin could it notice mistakes like #61595? |
Signed-off-by: Mert Can Altin <mertgold60@gmail.com>
Signed-off-by: Mert Can Altin <mertgold60@gmail.com>
Signed-off-by: Mert Can Altin <mertgold60@gmail.com>
Signed-off-by: Mert Can Altin <mertgold60@gmail.com>
acd291c to
9414f0e
Compare
|
@marco-ippolito I removed unused codes 0d17c47, 9b19d8b |
|
I feel that as a 3000+ LOC deps this is a bit of an overkill to vendor in for this use case - node.config.json isn't really on a path that would benefit all that much from it compared to how it's currently implemented. The few extra ns gain is a drop in the ocean in the picture of start up time of dozens if not hundreds of ms; It's part of the trusted configuration that's outside our threat model so the security claim is out of scope. Until ata itself is battle tested enough in the wild by being used by more users, we are introducing maintenance burden for being the most common entry point where people are going to try out this new library, this means:
Since ata is also available as an npm package, I'd be more comfortable if ata is used by other projects e.g. server frameworks in paths where its stated goal - a more secure and performant validator - actually matters, and get this goal battle tested, bugs reported and fixed in ata through wider adoption, before vendoring it inside Node.js core. At that point using node.config.json would be mostly a drive-by, but at least we know for sure that the library can actually justify being in core as a input validator as proposed in #62598 instead of staying as an overkill for a use case where the gain is negligible. |
|
Thanks @joyeecheung, the maintenance burden point is the strongest one and fair. Some background: ata is in OSS-Fuzz, passes 98.5% of Draft 2020-12 suite, 95.3% of @exodus/schemasafe, 283/283 nst/JSONTestSuite. Dependency is minimal, three maintainers (@Qard, @lemire, @anonrig) approved after looking at the code. you are right on the userland adoption point, that's the bar I have not met yet. Fastify integration work is in progress (compatibility-focused, and perf), the plan was wider adoption before core. This PR was a bit early. Happy to convert this to draft and revisit once ata reaches 1.0 and has real-world traction. What would count as sufficient testing for you? That bar would help me plan the path back. |
|
It's hard to say what would be a good threshold for something to be considered mature, or whether a sharp threshold even exists or is it just going to be a very big grey area before it gets there; but I think usually when it reaches there this would be non-disputable for most people you ask; one possible measurement is npm download counts and dependent statistics e.g. this - this has the bias towards certain categories of libraries, but then this is in the category of libraries where these stats would skew towards, if you take a look at ajv's stats. Another possible measurement is simply time, the longer it's depended on by people the more issues will be uncovered and fixed. If the justification for a 3000+ LOC deps is that "it will be used for input validation", then IMO there needs to be some evidence that users actually prefer it over ajv because it's otherwise hard to explain to our users why we are adding a relatively unknown library that have not been battle tested instead of a well known, battle tested one whose API is already depended on by many. |
|
I’ll be focusing on the user side (starting with the Fastify integration) I’m marking this as a draft, and I’d like to thank everyone again for all the comments and feedback. |
|
Hey @mertcanaltin, I just opened rjsf-team/react-jsonschema-form#5052 to try to help you on the user validation front. I believe a highly performant JSON Schema validation solution vendored into Node out of the box would be valuable (at least in my projects, it would). Integrating with RJSF would increase the number of monthly ata downloads on npm and allow for faster, effective user validation of the library. For reference, the current RJSF AJV validator gets almost 1M downloads/month. Feel free to interact with the community there; the maintainers are very cooperative and would certainly help with a potential integration. I see it as a good step towards getting ata into Node eventually. To prevent this discussion from going off-topic, I suggest we continue the discussion about this in the above-linked issue. |
|
Really appreciate this @TheOneTheOnlyJJ, RJSF is a great direction, I will continue the discussion on the linked issue. |
Adds ata-validator (v0.10.3) to validate
node.config.jsonagainst its JSON Schema before parsing.Clear error messages for invalid config instead of generic "Invalid value" errors.
Example output:
What's included:
deps/ata/with ata-validator built usingATA_NO_RE2(no extra dependencies, only simdjson which is already in core)node_config_file.ccbefore existing parsingprocess.versions.ataSpec compliance:
@exodus/schemasafetest suite$dynamicRef/$anchor: 42/42 (100%)Security:
$ref__proto__safeconst/enumcomparisonrefs: #62598