Tried to upgrade, when prompted, from 3.2.0.beta3-dev to 3.2.0.beta3, and it broke my Discourse instance due to running out of memory during the ember asset build. Tried ./launcher rebuild app with the same result.
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb83f50 node::Abort() [ember]
2: 0xa94834 [ember]
3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [ember]
4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [ember]
5: 0xf42265 [ember]
6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [ember]
7: 0xf2ee4e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [ember]
8: 0xf30217 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [ember]
9: 0xf113ea v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [ember]
10: 0x12d674f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [ember]
11: 0x17035b9 [ember]
Aborted (core dumped)
error Command failed with exit code 134.
I, [2023-11-26T17:19:26.345389 #1] INFO -- : yarn run v1.22.19
$ /var/www/discourse/app/assets/javascripts/node_modules/.bin/ember build
Environment: development
WARNING: ember-test-selectors: You are using an unsupported ember-cli-babel version. data-test properties are not automatically stripped from your JS code.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Running on a 1GB DigitalOcean instance for a non-profit, so I can't afford to resize it with more memory. 1GB is the minimum size for Discourse, and previous versions ran without issues. Any ideas on how to make it run again?
Adding 1G of swap should be functionally the same as adding 1G of RAM, if you have the disk space for it. (It will probably make the upgrade take longer, but that's a performance cost rather than a functional one. What you want is to avoid the out-of-memory situation.)
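For anyone following along, adding a swap file on a typical Linux droplet looks roughly like this (a sketch, run as root; adjust the size and path to taste):

```shell
# Create and enable a 2G swap file
fallocate -l 2G /swapfile        # or: dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile              # swap files must not be readable by other users
mkswap /swapfile                 # format the file as swap
swapon /swapfile                 # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots
free -h                          # verify the new swap shows up
```

The Discourse standard install docs also cover swap setup; this is just the condensed version.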
I have additional info in case it helps mitigate the issue from Discourse's end. My instance (a DigitalOcean 1GB droplet with 2GB swap) recently began taking significantly longer to rebuild, and it reports the same fatal error about 3 out of every 4 times (luck seems to improve after running ./launcher cleanup, but I don't have a large enough sample size to confirm this).
Shortly before the heap out-of-memory error, these lines are logged:
Node.js heap_size_limit (491.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (491.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
I am out of my depth here, so I apologize if I get something wrong. Some quick research indicates that ember-cli depends on Node.js, which is why I think this is relevant. The --max-old-space-size flag can be set higher than the available RAM (the excess would just go into swap space, which as mentioned is fine for this case), so perhaps 1024 is an artificial ceiling that Discourse rebuilds can no longer fit under.
Side notes: apparently --optimize-for-size is a Node.js flag that helps reduce memory usage (I'm not sure whether Discourse/ember already uses it), and there is an anecdote out there about the garbage collector not being enabled for certain Node.js uses, which may be an issue.
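To see the flag's effect directly, you can ask V8 for its heap limit (the 491.0 and 1010.0 figures in the logs above come from the same statistic), first with Node's default and then raised via NODE_OPTIONS:

```shell
# Print V8's heap_size_limit in MB with the default settings
node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576), "MB")'

# The same, with the old-space limit raised to 2048 MB; this can exceed
# physical RAM, with the excess landing in swap
NODE_OPTIONS="--max-old-space-size=2048" \
  node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576), "MB")'
```

The second command should report a limit of at least 2048 MB regardless of how much RAM the machine has, which is what makes the swap-backed approach workable.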
If any of this is relevant and controllable from the Discourse side of ember/node.js usage, it might be worth someone looking into it. If not, no worries, I will do the temporary 2GB upgrade solution proposed above.
That is a very good point! Right now we bump it to 1024MB on low-RAM machines here. We could certainly experiment with increasing that to 1500 or 2000 and see if it helps.
If you have the time/inclination to try it out yourself, you could configure it by adding a new variable to the env: section of your app.yml file:
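Something like this (2048 is a value to experiment with, not an official recommendation):

```yaml
env:
  ## ... your existing variables ...
  NODE_OPTIONS: "--max-old-space-size=2048"
```

Then run ./launcher rebuild app so the container picks up the new variable.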
Edit: this is now the Discourse default. No need to configure it yourself.
Since the fatal error doesn't happen every time, and a rebuild lately takes about 25 minutes (up from 5-10), it may be some time before I know whether increasing that number solves the memory issue for these server specs.
But I can already confirm that the two Node.js heap_size_limit warnings no longer appear in the rebuild log, and my first rebuild was successful, so that's promising.
EDIT: I've been able to rebuild several times now with no issues, thanks to the NODE_OPTIONS setting above in my app.yml. Yay!
EDIT2: This solution should probably make its way into Discourse by increasing that magic number (link in David's post) so that other low-RAM machines can continue to operate. If anyone reading this knows how to do that, that'd be great.
We ran ./launcher rebuild app a few times and it failed with various problems.
First we had problems with bundle install complaining about rbtrace (finishing with "An error occurred while installing rbtrace (0.5.0), and Bundler cannot continue.").
Then eventually we had this OOM issue:
I, [2023-12-12T07:50:59.497921 #1] INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake themes:update assets:precompile'
Node.js heap_size_limit (1010.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (1010.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
<--- Last few GCs --->
[3683:0x5dab130] 279104 ms: Scavenge 981.3 (1037.1) -> 974.5 (1037.1) MB, 8.3 / 0.0 ms (average mu = 0.699, current mu = 0.681) allocation failure;
[3683:0x5dab130] 279136 ms: Scavenge 981.8 (1037.1) -> 975.0 (1037.1) MB, 8.0 / 0.0 ms (average mu = 0.699, current mu = 0.681) allocation failure;
[3683:0x5dab130] 282606 ms: Mark-sweep 994.8 (1050.6) -> 987.7 (1048.9) MB, 3316.1 / 0.0 ms (average mu = 0.593, current mu = 0.501) allocation failure; GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb83f50 node::Abort() [ember]
2: 0xa94834 [ember]
3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [ember]
4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [ember]
5: 0xf42265 [ember]
6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, [snip]
Aborted (core dumped)
error Command failed with exit code 134.
And finally, running it with ./discourse_doctor managed to get past that eventually (why, though? More stuff cached from the earlier runs, making it use less memory?)
I, [2023-12-12T08:02:50.556442 #1] INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake themes:update assets:precompile'
Node.js heap_size_limit (1010.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (1010.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
110:M 12 Dec 2023 08:07:50.026 * 100 changes in 300 seconds. Saving...
110:M 12 Dec 2023 08:07:50.030 * Background saving started by pid 3706
3706:C 12 Dec 2023 08:07:51.292 * DB saved on disk
3706:C 12 Dec 2023 08:07:51.294 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
110:M 12 Dec 2023 08:07:51.334 * Background saving terminated with success
Purging temp files
Bundling assets
But this was friction we shouldn't have had to run into. Hopefully this improves in the future.
FWIW:
# free -h
total used free shared buff/cache available
Mem: 1.9Gi 1.3Gi 87Mi 138Mi 593Mi 394Mi
Swap: 2.0Gi 337Mi 1.7Gi
Definitely, which is why we are gathering info here.
It appears that tweaking our NODE_OPTIONS environment variable is all that is needed, so I'd guess that either a dependency of the app or a V8 change made our previous value stop working.
Looks good to me! Obviously 30m+ rebuilds are still not ideal, so I hope we can improve things in the not-too-distant future. But this seems like a good solution to stop the bleeding.
It is worth noting that Postgres 16 consumes less space and is much better optimized than version 13. This can reduce the total amount of server memory consumed.