Ember-cli build memory usage may cause failures on the minimum instance size (OOM)

Ed_S · November 14, 2022, 17:35

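During an upgrade, the greatest pressure on memory (RAM+swap) comes while the 'ember' process is running. I think it has been larger each time I've run an update, and it is getting close to the point where it can't run on a machine of the recommended minimum size.

It would be good to look into this before it actually fails. (Hopefully, for cost reasons, the answer will not be to increase the recommended minimum size. Increasing swap would help, if disk space permits. In principle, one could temporarily migrate to a more expensive instance with more RAM.)

I run two modest-sized forums on small instances, both within the recommended minimums, I believe. In both cases RAM+swap=3G. One is a Digital Ocean instance with 1G RAM and 2G swap; the other is a Hetzner instance with 2G RAM and 1G swap.

Here are three snapshots of the ember process on the DO machine, using ps auxc: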

USER       PID %CPU %MEM      VSZ    RSS TTY   STAT START   TIME COMMAND
1000     10342 87.7 65.1 32930460 657936 ?     Rl   16:57   2:23 ember

USER       PID %CPU %MEM      VSZ    RSS TTY   STAT START   TIME COMMAND
1000     10342 84.9 60.7 43572204 612668 ?     Rl   16:57   2:57 ember

USER       PID %CPU %MEM      VSZ    RSS TTY   STAT START   TIME COMMAND
1000     10342 81.2 55.2 43405220 557128 ?     Rl   16:57   3:40 ember

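Clearly, the 43GB process size is not all present in virtual memory, as we only have 3G available. Using 65% of RAM as RSS is impressive, but not in itself a problem. The amounts of free memory and free swap show that the machine is close to an out-of-memory (OOM) condition, which would most likely result in some process being killed and an untidy end to the update.

Here's free as a point-in-time snapshot: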

# free
              total        used        free      shared  buff/cache   available
Mem:        1009140      863552       72768        6224       72820       34868
Swap:       2097144     1160628      936516

To try to catch the situation at its closest to failure, I used vmstat 5:

# vmstat 5 5
procs -----------memory----------    ---swap-- -----io----  -system-- ------cpu----
 r  b   swpd    free   buff  cache    si    so    bi    bo   in    cs us sy id wa st
 3  0 1392140  61200  11632  76432    41    32   117    93    0     1  2  1 97  0  0
 1  1 1467220  63416    324  67284  8786 20499 13178 20567 2539  8924 77 13  0 10  0
 0  2 1593340  57916   1096  53832 24262 46868 29986 46889 5377 18534 44 22  0 34  0
 4  0 1155632 120680   2772  86280 39111 35424 54768 37824 6987 25174 38 27  0 35  0
 3  0 1102988  74096   2852  85276 11261   246 12610   271 1879  6365 86  6  0  8  0

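You'll notice lots of context switches (cs), lots of disk activity (bi, bo) and lots of swap activity (si, so), but the most important thing is swap usage up to 1.6G, with free memory down to 60M and only 54M of buffer/cache use. This means about 2.6G of the 3G of available virtual memory is in use. That's 87% of capacity. (It could be a little worse, as we're only sampling every 5 seconds.)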
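As a rough cross-check of that estimate, here is a minimal sketch that computes the committed share of RAM+swap from a single vmstat row. The function name mem_pressure_pct is hypothetical, and treating buffers and cache as reclaimable is an assumption of this sketch. The totals come from the free output earlier in this post.

```shell
#!/bin/sh
# Estimate how much of RAM+swap is committed, from one vmstat sample.
# Arguments (all in KB): swpd free buff cache ram_total swap_total
# Assumption: buffers and cache are counted as reclaimable, not as "used".
mem_pressure_pct() {
    swpd=$1; free=$2; buff=$3; cache=$4; ram_total=$5; swap_total=$6
    used=$(( swpd + ram_total - free - buff - cache ))
    total=$(( ram_total + swap_total ))
    # Integer percentage, rounded down.
    echo $(( used * 100 / total ))
}

# Worst sample from the vmstat output above, totals from the free output.
mem_pressure_pct 1593340 57916 1096 53832 1009140 2097144   # prints 80
```

With exact totals this lands at about 80%, somewhat below the rounded 87% figure, but the conclusion is the same: very little headroom remains.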

Note that when I updated in August the situation was already worrying (about 2G in use, though nowhere near as critical as today):

# vmstat 5 5
procs -----------memory----------    ---swap-- -----io----  -system-- ------cpu----
 r  b    swpd   free   buff  cache    si    so    bi    bo   in    cs us sy id wa st
 3  0  700404  62740   1956  48748    35    29   108    92    3     8  2  1 96  0  1
 1  0  741000  65996   1880  44360  3708 11190  3982 11191  643  1437 92  4  0  3  1
 1  0  834836  70452   1480  53856   528 18969  4274 18974  532  1575 93  6  0  1  0
 4  1 1010144  82192   4644  44400 30065 38803 35455 39946 4432 19267 28 26  0 39  7
 1  0  644116 307764   1644  55348 24406 21154 27724 21945 2551  8672 52 22  0 21  6

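david · November 15, 2022, 10:28

Hi @Ed_S - which version of Discourse were you using for these tests? We update ember-cli and its addons regularly, so I want to make sure we're looking at the same thing.

Also, how many CPU cores does your VM have? 1? (You can check by running lscpu in the console.)

To make sure we're working from the same data, could you try running the following commands: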

/var/discourse/launcher enter app
cd /var/www/discourse/app/assets/javascripts/discourse
apt-get update && apt-get install time
NODE_OPTIONS='--max-old-space-size=2048' /usr/bin/time -v yarn ember build -prod

On my test droplet (1 CPU, 1GB RAM, 2GB swap), I see this:

Command being timed: "yarn ember build -prod"
	User time (seconds): 369.74
	System time (seconds): 22.62
	Percent of CPU this job got: 81%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 8:02.73
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 774912
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 253770
	Minor (reclaiming a frame) page faults: 1158920
	Voluntary context switches: 519269
	Involuntary context switches: 383328
	Swaps: 0
	File system inputs: 7521784
	File system outputs: 316304
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

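We're using fairly standard ember tooling here, so I'm not sure there is much we can do in terms of configuration to reduce memory usage. Our long-term aim is to move to Embroider, which may give us more options.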

Ed_S · November 15, 2022, 16:40

Thanks @david - I do appreciate that Ember is its own thing.

I've just run those commands.

# /var/discourse/launcher enter app
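x86_64 arch detected.

WARNING: We are about to start downloading the Discourse base image
This process may take anywhere between a few minutes to an hour, depending on your network speed

Please be patient

2.0.20220720-0049: Pulling from discourse/base
Digest: sha256:7ff397003c78b64c9131726756014710e2e67568fbc88daad846d2b368a02364
Status: Downloaded newer image for discourse/base:2.0.20220720-0049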
docker.io/discourse/base:2.0.20220720-0049

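This is a production install and, as of yesterday, it was up to date. It currently reports:

Installed 2.9.0.beta12 (8f5936871c)

It's a single-CPU instance like yours, with 1G RAM and 2G swap.

The result of the time command was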

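Done in 303.21s.

	Command being timed: "yarn ember build -prod"
	User time (seconds): 222.71
	System time (seconds): 17.17
	Percent of CPU this job got: 78%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 5:04.15
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 702292
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 348190
	Minor (reclaiming a frame) page faults: 1152689
	Voluntary context switches: 617736
	Involuntary context switches: 774189
	Swaps: 0
	File system inputs: 5001936
	File system outputs: 318280
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0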

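Before this, I had updated the host and rebooted, so everything in the container was freshly restarted.

The worst memory usage, as reported by vmstat in another window: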

# vmstat 1
procs  -----------memory----------    ---swap--  -----io----   -system-- ------cpu-----
 r  b    swpd   free   buff  cache    si     so    bi     bo    in    cs us sy id wa st
 2  0  704000 136044  24136 158144  1517   3503  8256   4377   886  3564 43  8 43  6  0
...
 5  0 1451436  71604   1248  50196 55016 110236 73204 121060 13152 45971 29 60  0 10  1

Ed_S · November 15, 2022, 17:46

It looks like we explicitly increased Node's allowed heap from 500M to 2G - perhaps that was a bit much, and 1.5G would be better:

github.com/discourse/discourse

PERF: Update ember-auto-import and webpack

main ← webpack-5

opened 11:31AM - 12 Feb 22 UTC

davidtaylorhq

+1740 -123

This makes a small improvement to 'cold cache' ember-cli build times, and a large improvement to 'warm cache' build times. The ember-auto-import update means that vendor is now split into multiple files for efficiency. These are named `chunk.*`, and should be included immediately after the `vendor.js` file. This commit also updates the rails app to render script tags for these chunks. This change was previously merged, and caused memory-related errors on RAM-constrained machines. This was because Webpack 5 switches from multiple worker processes to a single multi-threaded process. This meant that it was hitting node's default heap size limit (~500mb on a 1GB RAM server). Discourse's standard install procedure recommends adding 2GB swap to 1GB-RAM machines, so we can afford to override Node's default via the `--max-old-space-size` flag.

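It's worth noting that Ember is not the only thing running on the machine, and we're up against a global limit of RAM+swap. So the machine's history and the needs of all the other running processes come into play. My reboot may have helped reach a lower high-water mark than yesterday's.

The pull request above is referenced in Failed to upgrade discourse instance to Feb 15 2022, where we also note that someone ran into an out-of-memory problem that was resolved by a reboot.

Unfortunately, the time command doesn't report peak memory usage (RAM+swap), only the maximum resident set size. Possibly, on a machine with at least 3G of RAM and no swap, the RSS count would tell us Ember's peak usage. Or we could adopt other tactics - some are outlined here, and there are some further ideas here.

Awkwardly, we really are interested in memory usage here, whereas in many cases people are interested in RAM usage, which is a different problem.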
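One possible tactic, sketched here under assumptions: on Linux, /proc/<pid>/status reports both VmRSS and VmSwap in kB, so their sum approximates a process's footprint across RAM and swap at one instant. The helper name rss_plus_swap_kb is hypothetical, and this is only one way to look at it, not something the build does today.

```shell
#!/bin/sh
# Sum VmRSS and VmSwap (both in kB) from /proc/<pid>/status text on stdin.
# Prints 0 if neither field is present (for example, for kernel threads).
rss_plus_swap_kb() {
    total=0
    while read -r key value unit; do
        case "$key" in
            VmRSS:|VmSwap:) total=$(( total + $value )) ;;
        esac
    done
    echo "$total"
}

# Example: report this shell's own footprint (Linux only).
if [ -r "/proc/$$/status" ]; then
    rss_plus_swap_kb < "/proc/$$/status"
fi
```

This is a point-in-time reading, so catching the peak would mean polling the ember process's status file repeatedly during the build and keeping the largest value.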

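david · November 15, 2022, 21:43

The reason we added that flag was that Node's own OOM killer was killing the build - 500M wasn't enough. I'm happy to try tuning it down to 1.5G - I just tried it on my droplet and it seems to work OK. In fact, even 1.0G seems to be enough.

I tried tracking memory usage with different max_heap sizes: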

(while(true); do (free -m -t | grep Total | awk '{print $3}') && sleep 0.5; done) | tee 1000mb.csv

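That showed the following usage during the build:

The difference in build time is small, but the 1GB and 1.5GB limits clearly produce less overall usage. As expected, the time output shows significantly fewer "Major page faults" when the node limit is lower.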
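To compare runs by a single number, the peak can be pulled out of each sample log. This is a minimal sketch: peak_usage_mb is a hypothetical helper, and the 1500mb.csv and 2000mb.csv names are assumed by analogy with the 1000mb.csv file written by the sampling loop.

```shell
#!/bin/sh
# Print the largest value in a file holding one numeric sample per line,
# such as the MB figures written by the free -m -t sampling loop.
peak_usage_mb() {
    sort -n "$1" | tail -n 1
}

# Hypothetical usage: only files that actually exist are reported.
for f in 1000mb.csv 1500mb.csv 2000mb.csv; do
    if [ -f "$f" ]; then
        echo "$f peak: $(peak_usage_mb "$f") MB"
    fi
done
```

Because samples are taken every 0.5 seconds, a short spike between samples could still be missed.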

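Curious that the difference between 1.5GB and 1GB is so small...

In any case, I agree that lowering the limit is a good idea. To make sure it doesn't affect build performance on higher-spec machines, I think we should only override the limit when we know it's too low. Otherwise, we can let Node use its default.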
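The idea of overriding only when the default is too low can be sketched as follows. This is an illustration under assumptions, not the code in the PR: node_options_for_build is a hypothetical helper, and the 1024 MB threshold is taken from the 1GB figure discussed above.

```shell
#!/bin/sh
# Decide whether Node's old-space limit needs an override.
# $1: Node's current heap size limit in MB
# $2: heap size in MB the build is assumed to need
# Prints the NODE_OPTIONS value to use, or nothing if no override is needed.
node_options_for_build() {
    current_mb=$1
    needed_mb=$2
    if [ "$current_mb" -lt "$needed_mb" ]; then
        echo "--max-old-space-size=$needed_mb"
    fi
}

# Ask Node for its default limit when node is installed; otherwise skip.
if command -v node >/dev/null 2>&1; then
    limit_mb=$(node -e 'console.log(Math.floor(require("v8").getHeapStatistics().heap_size_limit / 1048576))')
    opts=$(node_options_for_build "$limit_mb" 1024)
    echo "NODE_OPTIONS would be: '${opts}'"
fi
```

On a machine where Node already allows 1024 MB or more, the helper prints nothing and the build would run with Node's own default.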

Here's a PR - we'll get it merged soon. Thanks for raising this @Ed_S!

github.com/discourse/discourse

PERF: Adjust node memory threshold for `assets:precompile`

main ← ember-build-memory

opened 09:37PM - 15 Nov 22 UTC

davidtaylorhq

+14 -2

Previously we were forcing node's max-old-space-size to be 2GB. This override was added in a01b1dd6 to avoid issues caused by a lower default node heap_size_limit on machines with less memory. This commit makes that `max-old-space-size` override more specific so that it only applies to machines with less memory. Other machines will use Node's defaults. The override is also lowered to 1GB. This is still high enough for the build to complete, while reducing memory usage. https://meta.discourse.org/t/245547

system · December 15, 2022, 21:44

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
