Upgrade from 3.2.0.beta3-dev to 3.2.0.beta3 failed due to out of memory

Hello,

I tried to upgrade when prompted, from 3.2.0.beta3-dev to 3.2.0.beta3, and it broke my Discourse instance: the ember build of assets ran out of memory. I then tried ./launcher rebuild app, with the same result.

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb83f50 node::Abort() [ember]
 2: 0xa94834  [ember]
 3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [ember]
 4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [ember]
 5: 0xf42265  [ember]
 6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [ember]
 7: 0xf2ee4e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [ember]
 8: 0xf30217 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [ember]
 9: 0xf113ea v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [ember]
10: 0x12d674f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [ember]
11: 0x17035b9  [ember]
Aborted (core dumped)
error Command failed with exit code 134.
I, [2023-11-26T17:19:26.345389 #1]  INFO -- : yarn run v1.22.19
$ /var/www/discourse/app/assets/javascripts/node_modules/.bin/ember build
Environment: development
WARNING: ember-test-selectors: You are using an unsupported ember-cli-babel version. data-test properties are not automatically stripped from your JS code.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

I'm running on a DigitalOcean instance with 1GB for a non-profit, so I can't afford to resize it with more memory. 1GB is the minimum size for Discourse, and previous versions used to run without issues. Any ideas on how to make it run again?

Do you have Swap?

What is the output of

free -h
               total        used        free      shared  buff/cache   available
Mem:           952Mi       321Mi       414Mi       3.1Mi       374Mi       631Mi
Swap:          2.0Gi        75Mi       1.9Gi

You would only need to resize it during the rebuild.


You might want to consider moving to Hetzner, who offer competitive prices and 2 GB of RAM on their base plan.


Hello and welcome @andreid :slight_smile:

My 1GB DO test site has been struggling with memory issues during rebuilds recently too. I temporarily upgraded to a 2GB just to get it over the line.


Might it be worth updating the minimum requirements in the docs to 2GB RAM, then?


I remember it happening last year, and some tweaks were made: JavaScript heap out of memory due to Ember CLI - #4 by JammyDodger. I'm not sure if something can be done this time too, but I'll ask. :+1:


Thank you @RGJ and @JammyDodger, temporarily resizing it did the trick.


Adding 1G of swap should be functionally the same as adding 1G of RAM, if you have the disk space to do it. (It will probably take longer to run the upgrade, but that's performance, rather than function. What you want is to avoid the out-of-memory situation.)
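For anyone who wants to try the swap route, here is a minimal sketch of adding an extra swap file on the host. The path /swapfile_extra and the 1G size are assumptions, not something from this thread, and fallocate-backed swap files are not supported on every filesystem. By default the script only prints the commands; set DRY_RUN=0 and run it as root to actually apply them.

```shell
# Sketch: create and enable an extra swap file on the host (not in the container).
# SWAP_FILE and SWAP_SIZE are assumed values; adjust them to your disk layout.
SWAP_FILE="${SWAP_FILE:-/swapfile_extra}"
SWAP_SIZE="${SWAP_SIZE:-1G}"

# With DRY_RUN=1 (the default here) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

add_swap() {
  run fallocate -l "$SWAP_SIZE" "$SWAP_FILE"   # reserve the disk space
  run chmod 600 "$SWAP_FILE"                   # readable by root only
  run mkswap "$SWAP_FILE"                      # format the file as swap
  run swapon "$SWAP_FILE"                      # enable it immediately
  run free -h                                  # confirm the new swap total
}

add_swap
```

Swap enabled this way does not survive a reboot unless it is also listed in /etc/fstab, which may be fine if you only need it for a rebuild.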

I have additional info in case it helps mitigate the issue from Discourse's end. My instance (DigitalOcean ~1GB droplet w/ 2GB swap) recently began taking significantly longer to rebuild and reporting the same fatal error about 3 out of every 4 times (luck seems to improve after running ./launcher cleanup, but I don't have a large enough sample size to confirm this).

Shortly before the heap out of memory error, these lines are logged:

Node.js heap_size_limit (491.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (491.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.

I am out of my domain here, so I apologize if I get something wrong. Some quick research indicates that ember-cli depends on node.js, which is why I think this is relevant. The --max-old-space-size flag can potentially be set higher than the RAM (it would just go into swap space, which as mentioned is fine for this case), so perhaps 1024 is an artificial ceiling that Discourse rebuilds no longer fit under.
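To make the two log lines quoted above easier to reason about, here is a rough sketch of the decision they seem to describe. This is not Discourse's actual code; the function name build_env_for_heap is hypothetical, and the thresholds are simply read off the log messages.

```shell
# Hypothetical sketch of the behaviour implied by the two log messages.
# NOT Discourse's real implementation; thresholds come from the log text.
build_env_for_heap() {
  heap_mb=$1   # Node.js heap_size_limit, in whole MB

  # Below 1024MB the build raises the old-space limit to 1024.
  if [ "$heap_mb" -lt 1024 ]; then
    echo "NODE_OPTIONS=--max-old-space-size=1024"
  fi

  # Below 2048MB Webpack parallelization is disabled to conserve memory.
  if [ "$heap_mb" -lt 2048 ]; then
    echo "JOBS=0"
  fi
}

build_env_for_heap 491
```

With a reported limit of 491 both branches fire, which matches the two messages in the log.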

Side notes: apparently --optimize-for-size is a node.js flag which helps reduce memory usage (not sure if it's being used by Discourse/ember, perhaps it already is), and there is an anecdote out there of the garbage collector not being turned on for certain node.js uses, which may be an issue.

If any of this is relevant and controllable from the Discourse side of ember/node.js usage, it might be worth someone looking into it. If not, no worries, I will do the temporary 2GB upgrade solution proposed above. :slight_smile:


That is a very good point! Right now we up it to 1024mb on low-RAM machines here. We could certainly experiment with increasing that to 1500 or 2000 and see if it helps.

If you have the time/inclination to try it out yourself, you could configure it by adding a new variable to the env: section of your app.yml file:

Edit: :warning: this is now the Discourse default. No need to configure it yourself.

  NODE_OPTIONS: "--max-old-space-size=2048"
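If you want to check what heap limit Node.js actually ends up with for a given setting, something like the sketch below may help. It assumes node is on the PATH, for example inside the container after ./launcher enter app; the helper name heap_limit_mb is made up for illustration. The reported number is usually somewhat above the old-space value, since the heap limit covers more than old space.

```shell
# Print the V8 heap_size_limit (in MB) that Node.js reports when started
# with the given --max-old-space-size value. Requires node on the PATH.
heap_limit_mb() {
  NODE_OPTIONS="--max-old-space-size=$1" \
    node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576))'
}

if command -v node >/dev/null 2>&1; then
  heap_limit_mb 2048
else
  echo "node not found; try this inside the container" >&2
fi
```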

Ah, perfect! I went ahead and tried it out.

Since the fatal error doesn't happen every time, and a rebuild takes about 25 minutes lately (up from 5-10), it may be some time before I know if increasing that number solves the memory issue for these server specs.

But I can already confirm that the two Node.js heap_size_limit warnings no longer appear in the rebuild log, and my first rebuild was successful, so that's promising.

EDIT: I've been able to rebuild several times now with no issues, thanks to the NODE_OPTIONS setting above in my app.yml. Yay!

EDIT2: This solution should probably make its way into Discourse by way of increasing that magic number (link from David's post) so that other low-RAM machines can continue to operate. If anyone reads this who knows how to do that, that'd be great. :slight_smile:


We ran into this as well on https://caddy.community.

We ran ./launcher rebuild app a few times and it failed with various problems.

First we had problems with bundle install complaining about rbtrace (finishing with An error occurred while installing rbtrace (0.5.0), and Bundler cannot continue.)

Then eventually we had this OOM issue:

I, [2023-12-12T07:50:59.497921 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake themes:update assets:precompile'
Node.js heap_size_limit (1010.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (1010.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.

<--- Last few GCs --->

[3683:0x5dab130]   279104 ms: Scavenge 981.3 (1037.1) -> 974.5 (1037.1) MB, 8.3 / 0.0 ms  (average mu = 0.699, current mu = 0.681) allocation failure; 
[3683:0x5dab130]   279136 ms: Scavenge 981.8 (1037.1) -> 975.0 (1037.1) MB, 8.0 / 0.0 ms  (average mu = 0.699, current mu = 0.681) allocation failure; 
[3683:0x5dab130]   282606 ms: Mark-sweep 994.8 (1050.6) -> 987.7 (1048.9) MB, 3316.1 / 0.0 ms  (average mu = 0.593, current mu = 0.501) allocation failure; GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb83f50 node::Abort() [ember]
 2: 0xa94834  [ember]
 3: 0xd647c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [ember]
 4: 0xd64b67 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [ember]
 5: 0xf42265  [ember]
 6: 0xf5474d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, [snip]
Aborted (core dumped)
error Command failed with exit code 134.

And finally, running it with ./discourse_doctor, it eventually managed to get past that (why though? more stuff in cache in subsequent runs which made it use less memory? :thinking:)

I, [2023-12-12T08:02:50.556442 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake themes:update assets:precompile'
Node.js heap_size_limit (1010.0) is less than 1024MB. Setting --max-old-space-size=1024.
Node.js heap_size_limit (1010.0) is less than 2048MB. Disabling Webpack parallelization with JOBS=0 to conserve memory.
110:M 12 Dec 2023 08:07:50.026 * 100 changes in 300 seconds. Saving...
110:M 12 Dec 2023 08:07:50.030 * Background saving started by pid 3706
3706:C 12 Dec 2023 08:07:51.292 * DB saved on disk
3706:C 12 Dec 2023 08:07:51.294 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
110:M 12 Dec 2023 08:07:51.334 * Background saving terminated with success
Purging temp files
Bundling assets

But this was friction we shouldn't have had to run into. Hopefully this improves in the future.

FWIW:

# free -h
              total        used        free      shared  buff/cache   available
Mem:          1.9Gi       1.3Gi        87Mi       138Mi       593Mi       394Mi
Swap:         2.0Gi       337Mi       1.7Gi

Definitely, which is why we are gathering info here.

It appears that tweaking our NODE_OPTIONS environment variable is all that is needed, so I'd guess that either a dependency of the app or a V8 change made our previous value there not work anymore.

@david how does this look?


Looks good to me! Obviously 30m+ rebuilds are still not ideal, so I hope we can improve things in the not-too-distant future. But this seems like a good solution to stop the bleeding.


It is worth noting that upgrading Postgres from version 13 to version 16 helps too: version 16 consumes less space and is much better optimized. This can reduce the total amount of server memory consumed.

I've run into a similar rebuild problem today (two-container) with a 2GB + 2GB swap setup, for a small site.

Expanding it to 2GB + 4GB swap has gotten it over the line this time.


2 posts were split to a new topic: Rebuild is showing “Environment: development” during ember-cli build

FWIW, in my case, adding

to the app.yml didn't help. What helped was simply


sudo apt update
sudo apt upgrade