Eliminate the parallelization overhead when not needed#5538
Eliminate the parallelization overhead when not needed#5538xificurk wants to merge 2 commits intophpstan:2.1.xfrom
Conversation
|
You've opened the pull request against the latest branch 2.2.x. PHPStan 2.2 is not going to be released for months. If your code is relevant on 2.1.x and you want it to be released sooner, please rebase your pull request and change its target to 2.1.x. |
Revert "Run the parallel worker even for a low number of files" This reverts commit 2159057.
16c8e91 to
c4db303
Compare
|
The current code is important so that every analysis runs in a child worker. It means we will get a nice output even if the analysis in the worker crashes. Addressing the root cause (the overhead you get for running a child worker) is really important here. It'd mean the analysis gets faster for everyone. Instead of working around it and run the analysis in the main thread. So if you're experiencing overhead, it'd be great to profile it with Blackfire or similar and make the code faster, or skip something that it doesn't have to do at all. Again, to repeat my usual point, it's best to analyse the whole project instead of select files. |
I can try dig into this more, but the bottom line is that there will always be some cost for the orchestration of child workers - the subprocess management is not free and never will be. So, I wouldn't call this a workaround - when analyzing just a few files skipping the subprocess management falls exactly into the category "skip something that it doesn't have to do at all". |
|
@ondrejmirtes I did profile via SPX and the worker overhead comes almost all from the phpstan initialization - phar load, DI initialization...
|
|
From your comment I understood this would take 4 seconds which seems weird to me and that something could definitely be cut there. |
|
Also - is this a graph of the worker process or the main process? My bet is on the worker process. But the main process might be intersting too - maybe we can cut something from there. If we started to analyse small number of files in the main process, this optimization door would close. |
|
Yes, the posted graph has been from worker, but the initialization is almost identical between main thread and the worker. The notable parts from both traces are: phar loading, neon parsing, DI initialization. DI initialization is dominated primarily by Better Reflection init, secondarily by symfony kernel parsing. As these are the core services phpstan type system, I don't think you can easily cut them away. One notable difference from the worker is the result cache loading. For partial analysis there is probably an easy optimization to consider - as the result cache is not used anyway, it should be possible to eliminate the metadata loading, or at least the costly call to |
|
Main worker should not need BetterReflection at all (maybe only at the end for collectors), it could be more lazy. Child worker doesn't need result cache at all. In fact, in only-files analysis, result cache shouldn't be needed by the main worker at all. So things could definitely be made more lazy (on demand) for everyone. |
It is being initialized by direct request from ContainerFactory::postInitializeContainer(). Any tip, how to easily test, if it can be eliminated in the main thred completely?
As can be seen from the traces this currently affects only the main thread. I already do have a local draft for this fix, but it brings 10x less effect than elimination of the wasteful parallelization prosed in this PR. |
|
Right now the expensive stuff is done in https://github.com/phpstan/phpstan-src/blob/2.2.x/src/Reflection/BetterReflection/BetterReflectionSourceLocatorFactory.php. But I imagine it could run lazily on the first request to locateIdentifier. So a new SourceLocator implementation would act as a de-facto factory. |
|
@xificurk could you do another profile on latest 2.1.x? I guess you manually adjusted the codebase to be able to create a profile for the worker. could you write down how you created these 2 profiles? |
|
Yes, I'd love to see profiles for main process and worker process, and see what else we could eliminate there. |
|
I have a hacky local PoC that eliminates the BetterReflection overhad from the main thread. The change implemented in #5577 is only the first step. In short there are two more hurdles to clear
I'll try to clean this up and send PRs for each of those. But I would still consider the current PR valid. When the cost of partial analysis is small compared to the cost of phpstan lifecycle, the paralelization will always be an unncessary overhead. @staabm I'll get back to you with some howto and updated profiles later |


The main target of this optimization is partial analysis run with a small number of files.
In my test runs with single file analyzed, it cuts down the total time almost in half (8.9 s -> 5.0 s).
Note this actually reverts commit 2159057 (Run the parallel worker even for a low number of files) - I am not sure, what the motivation has been then.