Skip to content

Eliminate the parallelization overhead when not needed#5538

Open
xificurk wants to merge 2 commits intophpstan:2.1.xfrom
xificurk:optimize-parellalization
Open

Eliminate the parallelization overhead when not needed#5538
xificurk wants to merge 2 commits intophpstan:2.1.xfrom
xificurk:optimize-parellalization

Conversation

@xificurk
Copy link
Copy Markdown

The main target of this optimization is partial analysis run with a small number of files.

In my test runs with single file analyzed, it cuts down the total time almost in half (8.9 s -> 5.0 s).

Note this actually reverts commit 2159057 (Run the parallel worker even for a low number of files) - I am not sure, what the motivation has been then.

@phpstan-bot
Copy link
Copy Markdown
Collaborator

You've opened the pull request against the latest branch 2.2.x. PHPStan 2.2 is not going to be released for months. If your code is relevant on 2.1.x and you want it to be released sooner, please rebase your pull request and change its target to 2.1.x.

Revert "Run the parallel worker even for a low number of files"

This reverts commit 2159057.
@xificurk xificurk force-pushed the optimize-parellalization branch from 16c8e91 to c4db303 Compare April 26, 2026 08:58
@xificurk xificurk changed the base branch from 2.2.x to 2.1.x April 26, 2026 08:59
@staabm staabm requested a review from ondrejmirtes April 26, 2026 10:20
@ondrejmirtes
Copy link
Copy Markdown
Member

The current code is important so that every analysis runs in a child worker. It means we will get a nice output even if the analysis in the worker crashes.

Addressing the root cause (the overhead you get for running a child worker) is really important here. It'd mean the analysis gets faster for everyone. Instead of working around it and run the analysis in the main thread. So if you're experiencing overhead, it'd be great to profile it with Blackfire or similar and make the code faster, or skip something that it doesn't have to do at all.

Again, to repeat my usual point, it's best to analyse the whole project instead of select files.

@xificurk
Copy link
Copy Markdown
Author

Addressing the root cause (the overhead you get for running a child worker) is really important here. It'd mean the analysis gets faster for everyone. Instead of working around it and run the analysis in the main thread. So if you're experiencing overhead, it'd be great to profile it with Blackfire or similar and make the code faster, or skip something that it doesn't have to do at all.

I can try dig into this more, but the bottom line is that there will always be some cost for the orchestration of child workers - the subprocess management is not free and never will be. So, I wouldn't call this a workaround - when analyzing just a few files skipping the subprocess management falls exactly into the category "skip something that it doesn't have to do at all".

@xificurk
Copy link
Copy Markdown
Author

@ondrejmirtes I did profile via SPX and the worker overhead comes almost all from the phpstan initialization - phar load, DI initialization...
That is why doing the analysis directly in the main process, instead of spinning up a worker, almost halfs the needed time - the bootup cost is paid only once, instead of twice.

Scrn_20260426_210816

@ondrejmirtes
Copy link
Copy Markdown
Member

From your comment I understood this would take 4 seconds which seems weird to me and that something could definitely be cut there.

@ondrejmirtes
Copy link
Copy Markdown
Member

Also - is this a graph of the worker process or the main process? My bet is on the worker process.

But the main process might be intersting too - maybe we can cut something from there. If we started to analyse small number of files in the main process, this optimization door would close.

@xificurk
Copy link
Copy Markdown
Author

Yes, the posted graph has been from worker, but the initialization is almost identical between main thread and the worker. The notable parts from both traces are: phar loading, neon parsing, DI initialization. DI initialization is dominated primarily by Better Reflection init, secondarily by symfony kernel parsing. As these are the core services phpstan type system, I don't think you can easily cut them away.

Main thread for completeness:
Scrn_20260429_084545

One notable difference from the worker is the result cache loading. For partial analysis there is probably an easy optimization to consider - as the result cache is not used anyway, it should be possible to eliminate the metadata loading, or at least the costly call to getScannedFiles().

@ondrejmirtes
Copy link
Copy Markdown
Member

Main worker should not need BetterReflection at all (maybe only at the end for collectors), it could be more lazy.

Child worker doesn't need result cache at all. In fact, in only-files analysis, result cache shouldn't be needed by the main worker at all. So things could definitely be made more lazy (on demand) for everyone.

@xificurk
Copy link
Copy Markdown
Author

Main worker should not need BetterReflection at all (maybe only at the end for collectors), it could be more lazy.

It is being initialized by direct request from ContainerFactory::postInitializeContainer(). Any tip, how to easily test, if it can be eliminated in the main thred completely?

Child worker doesn't need result cache at all. In fact, in only-files analysis, result cache shouldn't be needed by the main worker at all. So things could definitely be made more lazy (on demand) for everyone.

As can be seen from the traces this currently affects only the main thread. I already do have a local draft for this fix, but it brings 10x less effect than elimination of the wasteful parallelization prosed in this PR.

@ondrejmirtes
Copy link
Copy Markdown
Member

Right now the expensive stuff is done in https://github.com/phpstan/phpstan-src/blob/2.2.x/src/Reflection/BetterReflection/BetterReflectionSourceLocatorFactory.php. But I imagine it could run lazily on the first request to locateIdentifier. So a new SourceLocator implementation would act as a de-facto factory.

@staabm
Copy link
Copy Markdown
Contributor

staabm commented May 3, 2026

@xificurk could you do another profile on latest 2.1.x? I guess you manually adjusted the codebase to be able to create a profile for the worker.

could you write down how you created these 2 profiles?

@ondrejmirtes
Copy link
Copy Markdown
Member

Yes, I'd love to see profiles for main process and worker process, and see what else we could eliminate there.

@xificurk
Copy link
Copy Markdown
Author

xificurk commented May 3, 2026

I have a hacky local PoC that eliminates the BetterReflection overhad from the main thread. The change implemented in #5577 is only the first step. In short there are two more hurdles to clear

  1. DefaultStubFileProvider - doctrine & symfony extensions both use reflection when deciding what stub files to use
  2. AnalyserResultFinalizer - it only needs collector rules, but the current implementation of RuleRegistry needs to instantiate all the rules to filter them, and some rules (at least our internal ones) access reflection during their instantiation

I'll try to clean this up and send PRs for each of those.

But I would still consider the current PR valid. When the cost of partial analysis is small compared to the cost of phpstan lifecycle, the paralelization will always be an unncessary overhead.

@staabm I'll get back to you with some howto and updated profiles later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants