Hi guys,
Thanks a lot for all the tips you have provided; they are very much appreciated. We believe we’ve identified the root cause of the recent memory issues.
Previously, running bundle exec rake assets:precompile:build at build time (as root) did not require Redis or a database connection. This behavior has changed (ref: Introducing pre-compiled JS assets for self-hosters and Introducing a new build system for plugins).
To accommodate this, we moved the bundle exec rake assets:precompile:build step to an init container at runtime (before running db:migrate, etc.). This allows it to run as the discourse user with the necessary access to both Redis and the database.
However, during execution, the process hits a loop in lib/plugin/js_manager.rb. Looking at ps -fe, we see pnpm repeatedly attempting to add itself, which leads to memory saturation:
...
discour+ 704 688 5 11:00 pts/0 00:00:00 node /usr/bin/pnpm -C=frontend/asset-processor node build.js
discour+ 718 704 5 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
discour+ 729 718 6 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
discour+ 740 729 6 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
discour+ 754 740 7 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
...
# ...and the list keeps growing, one nested pnpm process per step, until memory is saturated
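In case it helps anyone else spot the same symptom, here is a minimal Python sketch that measures how deep the nested `pnpm add pnpm@...` chain is in a `ps -fe` listing. It is only an illustration: the function names (`parse_ps`, `self_install_depth`) and the marker string are my own choices, and it assumes the usual eight-column `ps -fe` layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) seen in the output above.

```python
# Sketch only: helper names and the marker string are hypothetical,
# not part of Discourse or pnpm.

SAMPLE = """\
discour+ 704 688 5 11:00 pts/0 00:00:00 node /usr/bin/pnpm -C=frontend/asset-processor node build.js
discour+ 718 704 5 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
discour+ 729 718 6 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
discour+ 740 729 6 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
discour+ 754 740 7 11:00 pts/0 00:00:00 node /usr/bin/pnpm add pnpm@10.28.0 --loglevel=error --allow-build=@pnpm
"""


def parse_ps(text):
    """Map PID -> (PPID, command) for each well-formed `ps -fe` line."""
    procs = {}
    for line in text.splitlines():
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD (CMD keeps its spaces).
        parts = line.split(None, 7)
        if len(parts) < 8 or not parts[1].isdigit() or not parts[2].isdigit():
            continue  # skips the header row, "..." lines, and anything malformed
        procs[int(parts[1])] = (int(parts[2]), parts[7])
    return procs


def self_install_depth(procs, marker="pnpm add pnpm@"):
    """Length of the longest parent/child chain whose commands all contain marker."""

    def depth(pid):
        count = 0
        seen = set()
        # Walk up through the parents while each one is also a self-install.
        while pid in procs and pid not in seen and marker in procs[pid][1]:
            seen.add(pid)
            count += 1
            pid = procs[pid][0]
        return count

    return max((depth(pid) for pid in procs), default=0)
```

Calling `self_install_depth(parse_ps(SAMPLE))` on the excerpt above counts the four nested `pnpm add` processes; a value that keeps climbing between successive `ps -fe` snapshots is the loop described here.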
In our tests, we found that running the init container as root instead, and running npm uninstall -g pnpm followed by npm install -g pnpm@10.28.0, resolves the loop and allows the plugin compilation to finish successfully:
...
[Plugin::JsManager] Compiling 49 plugins...
[Plugin::JsManager] Finished initial compilation of plugins in 5.82s
So before overengineering our infra and possibly changing our design, I think this question is more for @david: are there plans to restore the previous behavior for assets:precompile:build so that it can run without Redis or a DB connection (similar to what you are doing with the DISCOURSE_DOWNLOAD_PRE_BUILT_ASSETS: 0 flow)?
On a side note, and out of curiosity: why does running the node process as a non-root user trigger this recursive pnpm installation loop, whereas running it as root seems to avoid it?
Cheers,
Ismael