Eliminate the parallelization overhead when not needed by xificurk · Pull Request #5538 · phpstan/phpstan-src

xificurk · 2026-04-26T08:46:52Z

The main target of this optimization is partial analysis run with a small number of files.

In my test runs with a single file analyzed, it cuts the total time almost in half (8.9 s -> 5.0 s).

Note that this actually reverts commit 2159057 (Run the parallel worker even for a low number of files). I am not sure what the motivation was at the time.

phpstan-bot · 2026-04-26T08:47:08Z

You've opened the pull request against the latest branch 2.2.x. PHPStan 2.2 is not going to be released for months. If your code is relevant on 2.1.x and you want it to be released sooner, please rebase your pull request and change its target to 2.1.x.

Revert "Run the parallel worker even for a low number of files" This reverts commit 2159057.

ondrejmirtes · 2026-04-26T10:36:07Z

The current code is important so that every analysis runs in a child worker. It means we will get a nice output even if the analysis in the worker crashes.

Addressing the root cause (the overhead you get for running a child worker) is really important here. It'd mean the analysis gets faster for everyone, rather than working around it by running the analysis in the main thread. So if you're experiencing overhead, it'd be great to profile it with Blackfire or similar and make the code faster, or skip something that it doesn't have to do at all.

Again, to repeat my usual point, it's best to analyse the whole project instead of select files.

xificurk · 2026-04-26T14:03:09Z

> Addressing the root cause (the overhead you get for running a child worker) is really important here. It'd mean the analysis gets faster for everyone, rather than working around it by running the analysis in the main thread. So if you're experiencing overhead, it'd be great to profile it with Blackfire or similar and make the code faster, or skip something that it doesn't have to do at all.

I can try to dig into this more, but the bottom line is that there will always be some cost for the orchestration of child workers: subprocess management is not free and never will be. So I wouldn't call this a workaround. When analyzing just a few files, skipping the subprocess management falls exactly into the category "skip something that it doesn't have to do at all".

xificurk · 2026-04-26T19:16:47Z

@ondrejmirtes I profiled it with SPX, and almost all of the worker overhead comes from PHPStan initialization: phar loading, DI initialization, and so on.
That is why doing the analysis directly in the main process, instead of spinning up a worker, almost halves the time needed: the bootup cost is paid only once instead of twice.
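As a rough back-of-the-envelope check, assume the only difference between the two runs is one extra bootup. The symbols $B$ (cost of one bootup) and $A$ (everything else) are introduced here purely for illustration; the two timings are the single-file numbers from the PR description.

```latex
\begin{aligned}
2B + A &= 8.9\ \text{s} \\
B + A  &= 5.0\ \text{s} \\
\Rightarrow\quad B &\approx 3.9\ \text{s}, \qquad A \approx 1.1\ \text{s}
\end{aligned}
```

Under that simplified model, a single bootup accounts for roughly 3.9 s of the run.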

ondrejmirtes · 2026-04-26T19:20:47Z

From your comment I understood this would take 4 seconds, which seems weird to me and suggests that something could definitely be cut there.

ondrejmirtes · 2026-04-26T19:27:38Z

Also - is this a graph of the worker process or the main process? My bet is on the worker process.

But the main process might be interesting too; maybe we can cut something from there. If we started to analyse a small number of files in the main process, this optimization door would close.

xificurk · 2026-04-29T07:01:48Z

Yes, the posted graph was from the worker, but the initialization is almost identical between the main thread and the worker. The notable parts of both traces are phar loading, neon parsing, and DI initialization. DI initialization is dominated primarily by Better Reflection init, secondarily by symfony kernel parsing. As these are the core services of the PHPStan type system, I don't think you can easily cut them away.

Main thread for completeness:

One notable difference from the worker is the result cache loading. For partial analysis there is probably an easy optimization to consider: as the result cache is not used anyway, it should be possible to eliminate the metadata loading, or at least the costly call to getScannedFiles().

ondrejmirtes · 2026-04-29T07:50:40Z

Main worker should not need BetterReflection at all (maybe only at the end for collectors), it could be more lazy.

Child worker doesn't need result cache at all. In fact, in only-files analysis, result cache shouldn't be needed by the main worker at all. So things could definitely be made more lazy (on demand) for everyone.

xificurk · 2026-04-29T10:04:43Z

> Main worker should not need BetterReflection at all (maybe only at the end for collectors), it could be more lazy.

It is being initialized by a direct request from ContainerFactory::postInitializeContainer(). Any tips on how to easily test whether it can be eliminated from the main thread completely?

> Child worker doesn't need result cache at all. In fact, in only-files analysis, result cache shouldn't be needed by the main worker at all. So things could definitely be made more lazy (on demand) for everyone.

As can be seen from the traces, this currently affects only the main thread. I already have a local draft for this fix, but its effect is 10x smaller than eliminating the wasteful parallelization as proposed in this PR.

ondrejmirtes · 2026-04-29T10:16:06Z

Right now the expensive stuff is done in https://github.com/phpstan/phpstan-src/blob/2.2.x/src/Reflection/BetterReflection/BetterReflectionSourceLocatorFactory.php. But I imagine it could run lazily on the first request to locateIdentifier. So a new SourceLocator implementation would act as a de-facto factory.
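A minimal sketch of the "lazy on first lookup" idea, for illustration only. It is not the code in phpstan-src: `SimpleSourceLocator`, `LazySourceLocator` and `ArraySourceLocator` are hypothetical names, and the interface is a simplified stand-in with a single string-based `locateIdentifier()` rather than the real BetterReflection signature.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for a source locator interface.
interface SimpleSourceLocator
{
    public function locateIdentifier(string $identifier): ?string;
}

// Defers building the real locator until the first lookup,
// so the wrapper itself acts as a de-facto factory.
final class LazySourceLocator implements SimpleSourceLocator
{
    private ?SimpleSourceLocator $inner = null;

    public function __construct(private \Closure $factory)
    {
    }

    public function locateIdentifier(string $identifier): ?string
    {
        // The expensive construction runs here, at most once.
        $this->inner ??= ($this->factory)();

        return $this->inner->locateIdentifier($identifier);
    }
}

// Trivial locator used only to make the sketch runnable.
final class ArraySourceLocator implements SimpleSourceLocator
{
    /** @param array<string, string> $map identifier => file path */
    public function __construct(private array $map)
    {
    }

    public function locateIdentifier(string $identifier): ?string
    {
        return $this->map[$identifier] ?? null;
    }
}

// Counts how many times the "expensive" factory actually runs.
$built = 0;
$locator = new LazySourceLocator(static function () use (&$built): SimpleSourceLocator {
    $built++;

    return new ArraySourceLocator(['App\\Foo' => 'src/Foo.php']);
});
```

With this shape, a process that never asks for an identifier never pays the construction cost, while callers still see an ordinary locator.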

staabm · 2026-05-03T07:17:48Z

@xificurk could you do another profile on the latest 2.1.x? I guess you manually adjusted the codebase to be able to create a profile for the worker.

Could you write down how you created these 2 profiles?

ondrejmirtes · 2026-05-03T07:25:42Z

Yes, I'd love to see profiles for main process and worker process, and see what else we could eliminate there.

xificurk · 2026-05-03T19:45:39Z

I have a hacky local PoC that eliminates the BetterReflection overhead from the main thread. The change implemented in #5577 is only the first step. In short, there are two more hurdles to clear:

- DefaultStubFileProvider: the doctrine & symfony extensions both use reflection when deciding which stub files to use
- AnalyserResultFinalizer: it only needs collector rules, but the current implementation of RuleRegistry needs to instantiate all the rules to filter them, and some rules (at least our internal ones) access reflection during their instantiation

I'll try to clean this up and send PRs for each of those.

But I would still consider the current PR valid. When the cost of partial analysis is small compared to the cost of the PHPStan lifecycle, the parallelization will always be an unnecessary overhead.

@staabm I'll get back to you with some howto and updated profiles later

xificurk added 2 commits April 26, 2026 10:58

Eliminate the parallelization overhead when not needed

37037d5

Revert "Run the parallel worker even for a low number of files" This reverts commit 2159057.

Update E2E test

c4db303

xificurk force-pushed the optimize-parellalization branch from 16c8e91 to c4db303 Compare April 26, 2026 08:58

xificurk changed the base branch from 2.2.x to 2.1.x April 26, 2026 08:59

staabm requested a review from ondrejmirtes April 26, 2026 10:20

staabm mentioned this pull request May 1, 2026

Lazily initialize AggregateSourceLocator to speedup bootstrapping #5577

Merged
