Skip to content

deps: add ata JSON Schema validator for node.config.json#62603

Draft
mertcanaltin wants to merge 26 commits intonodejs:mainfrom
mertcanaltin:feat/json-schema-validation
Draft

deps: add ata JSON Schema validator for node.config.json#62603
mertcanaltin wants to merge 26 commits intonodejs:mainfrom
mertcanaltin:feat/json-schema-validation

Conversation

@mertcanaltin
Copy link
Copy Markdown
Member

@mertcanaltin mertcanaltin commented Apr 5, 2026

Adds ata-validator (v0.10.3) to validate node.config.json against its JSON Schema before parsing.

Clear error messages for invalid config instead of generic "Invalid value" errors.

Example output:

Invalid configuration in node.config.json: /nodeOptions/import: expected exactly one oneOf match, got 0

What's included:

  • deps/ata/ with ata-validator built using ATA_NO_RE2 (no extra dependencies, only simdjson which is already in core)
  • Schema validation step in node_config_file.cc before existing parsing
  • process.versions.ata

Spec compliance:

  • 98.5% Draft 2020-12 test suite (1172/1190)
  • 95.3% @exodus/schemasafe test suite
  • $dynamicRef / $anchor: 42/42 (100%)

Security:

  • Recursion depth guard for circular $ref
  • __proto__ safe const/enum comparison
  • Date format range validation
  • OSS-Fuzz submitted, 283/283 JSONTestSuite passing

refs: #62598

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

nodejs-github-bot commented Apr 5, 2026

Review requested:

  • @nodejs/tsc
  • @nodejs/gyp
  • @nodejs/security-wg

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 5, 2026
@mertcanaltin mertcanaltin marked this pull request as draft April 5, 2026 20:03
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 81.08108% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.72%. Comparing base (80e0f14) to head (a68e675).

Files with missing lines Patch % Lines
src/node_options.cc 82.11% 15 Missing and 7 partials ⚠️
src/node_config_file.cc 76.19% 0 Missing and 5 partials ⚠️
lib/internal/options.js 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #62603      +/-   ##
==========================================
+ Coverage   89.65%   89.72%   +0.06%     
==========================================
  Files         708      708              
  Lines      220413   220302     -111     
  Branches    42275    42277       +2     
==========================================
+ Hits       197607   197657      +50     
+ Misses      14658    14476     -182     
- Partials     8148     8169      +21     
Files with missing lines Coverage Δ
src/node_metadata.cc 92.95% <100.00%> (+0.10%) ⬆️
src/node_options.h 97.98% <ø> (ø)
lib/internal/options.js 96.05% <66.66%> (+49.46%) ⬆️
src/node_config_file.cc 75.26% <76.19%> (-8.70%) ⬇️
src/node_options.cc 85.96% <82.11%> (+9.35%) ⬆️

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@juanarbol
Copy link
Copy Markdown
Member

IMHO this is a bit too much, an extra dep for a schema parser inside Node.js (not to mention that is not even v1.x)

This is not a blocker, this is just my personal opinion

@mertcanaltin
Copy link
Copy Markdown
Member Author

That's a fair concern. The dependency is intentionally minimal, it only uses simdjson which is already in core (no new external dependencies since it's built with ATA_NO_RE2).

On the version, you're right it's pre-1.0, but it already passes 96.9% of the official JSON Schema Test Suite. Happy to work toward 1.0 if that's a requirement.

@TheOneTheOnlyJJ
Copy link
Copy Markdown
Contributor

Maybe ata could also be exposed to JS userland as an API if it ends up being vendored in?

@mertcanaltin
Copy link
Copy Markdown
Member Author

Yes, that's the plan! The initial PR focuses on node.config.json validation as a starting point, but exposing it as a public API is the natural next step. We discussed this in #62598, Joyee mentioned it would be the strongest use case.

The main prerequisite is getting the security story solid first, we've already submitted to OSS-Fuzz and all 283 nst/JSONTestSuite tests are passing.

@ChALkeR
Copy link
Copy Markdown
Member

ChALkeR commented Apr 7, 2026

@mertcanaltin check also https://www.npmjs.com/package/@exodus/schemasafe testsuite

@mertcanaltin
Copy link
Copy Markdown
Member Author

Thanks @ChALkeR, I ran the schemasafe test suite against ata (draft2020-12):

95.3% pass rate, 1103/1157, 59 skipped (remote refs).

Main gaps are unevaluatedProperties/Items annotation tracking, some $ref URI edge cases (URN, absolute-path), grapheme counting in minLength/maxLength, and a few optional format semantics.

Running the suite also helped me catch a few things I fixed since: recursion depth guard for circular $ref, proto handling in const/enum, and date format range checks.

$dynamicRef is fully passing now (42/42). Working on the rest.

@mertcanaltin mertcanaltin force-pushed the feat/json-schema-validation branch from edc0176 to 85e4bca Compare April 10, 2026 20:26
@mertcanaltin mertcanaltin marked this pull request as ready for review April 10, 2026 20:28
@Qard
Copy link
Copy Markdown
Member

Qard commented Apr 11, 2026

CI says:

../../src/node_config_file.cc:2:10: fatal error: 'ata.h' file not found
    2 | #include "ata.h"
      |          ^~~~~~~

Copy link
Copy Markdown
Member

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is a non-trivial amount of C++, it is all neatly contained in the external ata dependency where we can trivially iterate on any correctness requirements if there's any of the remaining spec features folks think are essential. The actual integration surface area is extremely minimal and straightforward.

I think it looks great. Excellent work! 👏🏻

@mertcanaltin mertcanaltin force-pushed the feat/json-schema-validation branch 3 times, most recently from 33202b7 to 32ba0e9 Compare April 11, 2026 06:33
@mertcanaltin mertcanaltin requested a review from Qard April 11, 2026 06:34
@mertcanaltin
Copy link
Copy Markdown
Member Author

thanks for review @Qard, I some conflicts solved, if have available time, can you review again ?

@mertcanaltin mertcanaltin force-pushed the feat/json-schema-validation branch from 32ba0e9 to 5bdb5f5 Compare April 11, 2026 07:51
Copy link
Copy Markdown
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me but I recommending adding more tests. Currently, the builtin tests are somewhat mininal.

@mertcanaltin
Copy link
Copy Markdown
Member Author

Thanks @lemire, added more test coverage, type mismatches for boolean, number, array, and nested array items, plus positive tests for valid configs and empty objects.

Copy link
Copy Markdown
Member

@avivkeller avivkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an update script?

@mertcanaltin mertcanaltin force-pushed the feat/json-schema-validation branch from d72a1af to 1bf468e Compare April 12, 2026 05:29
@mertcanaltin
Copy link
Copy Markdown
Member Author

Can you add an update script?

I added thanks @avivkeller

@ChALkeR
Copy link
Copy Markdown
Member

ChALkeR commented Apr 12, 2026

@mertcanaltin could it notice mistakes like #61595?

@mertcanaltin mertcanaltin force-pushed the feat/json-schema-validation branch from acd291c to 9414f0e Compare April 30, 2026 18:46
@mertcanaltin
Copy link
Copy Markdown
Member Author

@marco-ippolito I removed unused codes 0d17c47, 9b19d8b

@joyeecheung
Copy link
Copy Markdown
Member

joyeecheung commented Apr 30, 2026

I feel that as a 3000+ LOC deps this is a bit of an overkill to vendor in for this use case - node.config.json isn't really on a path that would benefit all that much from it compared to how it's currently implemented. The few extra ns gain is a drop in the ocean in the picture of start up time of dozens if not hundreds of ms; It's part of the trusted configuration that's outside our threat model so the security claim is out of scope. Until ata itself is battle tested enough in the wild by being used by more users, we are introducing maintenance burden for being the most common entry point where people are going to try out this new library, this means:

  1. When people notice bugs, it's likely they will be mostly reported to Node.js core and fixed here instead of being noticed elsewhere first. It means we would effectively take over ata's issue triage load
  2. Many security reporters would report vunerabilities they find in Node.js's dependencies even though they are not applicable to Node.js's use cases (take V8's type confusion vulnerabilities for example). This leads to an issue similar to 1 but even more serious because Node.js security triagers are even more overloaded by spams
  3. When the iteration of a library effectively happens in Node.js core, it will be coupled into Node.js's semver and introduce backport churn for releasers.

Since ata is also available as an npm package, I'd be more comfortable if ata is used by other projects e.g. server frameworks in paths where its stated goal - a more secure and performant validator - actually matters, and get this goal battle tested, bugs reported and fixed in ata through wider adoption, before vendoring it inside Node.js core. At that point using node.config.json would be mostly a drive-by, but at least we know for sure that the library can actually justify being in core as a input validator as proposed in #62598 instead of staying as an overkill for a use case where the gain is negligible.

@mertcanaltin
Copy link
Copy Markdown
Member Author

mertcanaltin commented May 1, 2026

Thanks @joyeecheung, the maintenance burden point is the strongest one and fair.

Some background: ata is in OSS-Fuzz, passes 98.5% of Draft 2020-12 suite, 95.3% of @exodus/schemasafe, 283/283 nst/JSONTestSuite. Dependency is minimal, ATA_NO_RE2 so only simdjson (already in core).

three maintainers (@Qard, @lemire, @anonrig) approved after looking at the code.

you are right on the userland adoption point, that's the bar I have not met yet. Fastify integration work is in progress (compatibility-focused, and perf), the plan was wider adoption before core. This PR was a bit early.

Happy to convert this to draft and revisit once ata reaches 1.0 and has real-world traction. What would count as sufficient testing for you? That bar would help me plan the path back.

@joyeecheung
Copy link
Copy Markdown
Member

joyeecheung commented May 1, 2026

It's hard to say what would be a good threshold for something to be considered mature, or whether a sharp threshold even exists or is it just going to be a very big grey area before it gets there; but I think usually when it reaches there this would be non-disputable for most people you ask; one possible measurement is npm download counts and dependent statistics e.g. this - this has the bias towards certain categories of libraries, but then this is in the category of libraries where these stats would skew towards, if you take a look at ajv's stats. Another possible measurement is simply time, the longer it's depended on by people the more issues will be uncovered and fixed. If the justification for a 3000+ LOC deps is that "it will be used for input validation", then IMO there needs to be some evidence that users actually prefer it over ajv because it's otherwise hard to explain to our users why we are adding a relatively unknown library that have not been battle tested instead of a well known, battle tested one whose API is already depended on by many.

@mertcanaltin
Copy link
Copy Markdown
Member Author

I’ll be focusing on the user side (starting with the Fastify integration) I’m marking this as a draft, and I’d like to thank everyone again for all the comments and feedback.

@mertcanaltin mertcanaltin marked this pull request as draft May 1, 2026 20:27
@TheOneTheOnlyJJ
Copy link
Copy Markdown
Contributor

Hey @mertcanaltin, I just opened rjsf-team/react-jsonschema-form#5052 to try to help you on the user validation front.

I believe a highly performant JSON Schema validation solution vendored into Node out of the box would be valuable (at least in my projects, it would).

Integrating with RJSF would increase the number of monthly ata downloads on npm and allow for faster, effective user validation of the library. For reference, the current RJSF AJV validator gets almost 1M downloads/month. Feel free to interact with the community there; the maintainers are very cooperative and would certainly help with a potential integration. I see it as a good step towards getting ata into Node eventually.

To prevent this discussion from going off-topic, I suggest we continue the discussion about this in the above-linked issue.

@mertcanaltin
Copy link
Copy Markdown
Member Author

Really appreciate this @TheOneTheOnlyJJ, RJSF is a great direction, I will continue the discussion on the linked issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.