Tried to upgrade, when prompted, from 3.2.0.beta3-dev to 3.2.0.beta3, and it broke my Discourse instance due to running out of memory during the ember asset build. Tried ./launcher rebuild app with the same result.
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb83f50 node::Abort() [ember]
2: 0xa94834 [ember]
3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [ember]
4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [ember]
5: 0xf42265 [ember]
6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [ember]
7: 0xf2ee4e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [ember]
8: 0xf30217 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [ember]
9: 0xf113ea v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [ember]
10: 0x12d674f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [ember]
11: 0x17035b9 [ember]
Aborted (core dumped)
error Command failed with exit code 134.
I, [2023-11-26T17:19:26.345389 #1] INFO -- : yarn run v1.22.19
$ /var/www/discourse/app/assets/javascripts/node_modules/.bin/ember build
Environment: development
WARNING: ember-test-selectors: You are using an unsupported ember-cli-babel version. data-test properties are not automatically stripped from your JS code.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Running on a 1GB DigitalOcean instance for a non-profit, so I can't afford to resize it with more memory. 1GB is the minimum size for Discourse, and previous versions ran without issues. Any ideas on how to make it run again?
Adding 1G of swap should be functionally the same as adding 1G of RAM, if you have the disk space for it. (It will probably make the upgrade take longer, but that's a performance cost rather than a functional one. What you want is to avoid the out-of-memory situation.)
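For anyone following along, adding a swap file on a typical Linux droplet looks roughly like this (a sketch, run as root; adjust the size and path to taste):

```shell
# Create and enable a 2G swap file
fallocate -l 2G /swapfile        # or: dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile              # swap files must not be readable by other users
mkswap /swapfile                 # format the file as swap
swapon /swapfile                 # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots
free -h                          # verify the new swap shows up
```

The Discourse standard install docs also cover swap setup; this is just the condensed version.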
I have additional info in case it helps mitigate the issue from Discourse's end. My instance (a DigitalOcean 1GB droplet with 2GB swap) recently began taking significantly longer to rebuild, and it reports the same fatal error about 3 out of every 4 times (luck seems to improve after running ./launcher cleanup, but I don't have a large enough sample size to confirm this).
Shortly before the heap out-of-memory error, these lines are logged:
Node.js heap_size_limit (491.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (491.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
I am out of my depth here, so I apologize if I get something wrong. Some quick research indicates that ember-cli depends on Node.js, which is why I think this is relevant. The --max-old-space-size flag can be set higher than the available RAM (the excess would just go into swap space, which as mentioned is fine for this case), so perhaps 1024 is an artificial ceiling that Discourse rebuilds can no longer fit under.
Side notes: apparently --optimize-for-size is a Node.js flag that helps reduce memory usage (I'm not sure whether Discourse/ember already uses it), and there is an anecdote out there about the garbage collector not being enabled for certain Node.js uses, which may be an issue.
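To see the flag's effect directly, you can ask V8 for its heap limit (the 491.0 and 1010.0 figures in the logs above come from the same statistic), first with Node's default and then raised via NODE_OPTIONS:

```shell
# Print V8's heap_size_limit in MB with the default settings
node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576), "MB")'

# The same, with the old-space limit raised to 2048 MB; this can exceed
# physical RAM, with the excess landing in swap
NODE_OPTIONS="--max-old-space-size=2048" \
  node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576), "MB")'
```

The second command should report a limit of at least 2048 MB regardless of how much RAM the machine has, which is what makes the swap-backed approach workable.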
If any of this is relevant and controllable from the Discourse side of ember/node.js usage, it might be worth someone looking into it. If not, no worries, I will do the temporary 2GB upgrade solution proposed above.
That is a very good point! Right now we bump it to 1024MB on low-RAM machines here. We could certainly experiment with increasing that to 1500 or 2000 and see if it helps.
If you have the time/inclination to try it out yourself, you could configure it by adding a new variable to the env: section of your app.yml file:
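Something like this (2048 is a value to experiment with, not an official recommendation):

```yaml
env:
  ## ... your existing variables ...
  NODE_OPTIONS: "--max-old-space-size=2048"
```

Then run ./launcher rebuild app so the container picks up the new variable.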
Edit: this is now the Discourse default. No need to configure it yourself.
Since the fatal error doesn't happen every time, and a rebuild lately takes about 25 minutes (up from 5-10), it may be some time before I know whether increasing that number solves the memory issue for these server specs.
But I can already confirm that the two Node.js heap_size_limit warnings no longer appear in the rebuild log, and my first rebuild was successful, so that's promising.
EDIT: I've been able to rebuild several times now with no issues, thanks to the NODE_OPTIONS setting above in my app.yml. Yay!
EDIT2: This solution should probably make its way into Discourse by increasing that magic number (link in David's post) so that other low-RAM machines can continue to operate. If anyone reading this knows how to do that, that'd be great.
We ran ./launcher rebuild app a few times and it failed with various problems.
First we had problems with bundle install complaining about rbtrace (finishing with "An error occurred while installing rbtrace (0.5.0), and Bundler cannot continue.").
Then eventually we had this OOM issue:
I, [2023-12-12T07:50:59.497921 #1] INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake themes:update assets:precompile'
Node.js heap_size_limit (1010.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (1010.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
<--- Last few GCs --->
[3683:0x5dab130] 279104 ms: Scavenge 981.3 (1037.1) -> 974.5 (1037.1) MB, 8.3 / 0.0 ms (average mu = 0.699, current mu = 0.681) allocation failure;
[3683:0x5dab130] 279136 ms: Scavenge 981.8 (1037.1) -> 975.0 (1037.1) MB, 8.0 / 0.0 ms (average mu = 0.699, current mu = 0.681) allocation failure;
[3683:0x5dab130] 282606 ms: Mark-sweep 994.8 (1050.6) -> 987.7 (1048.9) MB, 3316.1 / 0.0 ms (average mu = 0.593, current mu = 0.501) allocation failure; GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb83f50 node::Abort() [ember]
2: 0xa94834 [ember]
3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [ember]
4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [ember]
5: 0xf42265 [ember]
6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, [snip]
Aborted (core dumped)
error Command failed with exit code 134.
And finally, running it with ./discourse_doctor managed to get past that eventually (why, though? More stuff cached from the earlier runs, making it use less memory?)
I, [2023-12-12T08:02:50.556442 #1] INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake themes:update assets:precompile'
Node.js heap_size_limit (1010.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (1010.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
110:M 12 Dec 2023 08:07:50.026 * 100 changes in 300 seconds. Saving...
110:M 12 Dec 2023 08:07:50.030 * Background saving started by pid 3706
3706:C 12 Dec 2023 08:07:51.292 * DB saved on disk
3706:C 12 Dec 2023 08:07:51.294 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
110:M 12 Dec 2023 08:07:51.334 * Background saving terminated with success
Purging temp files
Bundling assets
But this was friction we shouldn't have had to run into. Hopefully this improves in the future.
FWIW:
# free -h
total used free shared buff/cache available
Mem: 1.9Gi 1.3Gi 87Mi 138Mi 593Mi 394Mi
Swap: 2.0Gi 337Mi 1.7Gi
Definitely, which is why we are gathering info here.
It appears that tweaking our NODE_OPTIONS environment variable is all that is needed, so I'd guess that either a dependency of the app or a V8 change made our previous value stop working.
Looks good to me! Obviously 30m+ rebuilds are still not ideal, so I hope we can improve things in the not-too-distant future. But this seems like a good solution to stop the bleeding.
It is worth noting that Postgres 16 consumes less space and is much better optimized than version 13. This can reduce the total amount of server memory consumed.