Extreme load / disk I/O / CPU / memory usage issue

Hello, our instance has recently been receiving frequent “extreme load” messages. Today it happened at 9:40 am.

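I checked the server logs. At 9:40 am, CPU and load looked normal:

Memory also stayed fairly stable throughout the day:

The spikes appear to be in disk I/O and in outbound/inbound bandwidth: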

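This makes me suspect crawler requests at 9:40 am, but I'm not sure whether there is a way to verify that. I looked at the list of crawlers and their request counts. Most of them come from Google and Bing, so we definitely won't block those.

This leaves me with the following questions: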

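  • Is there a log that shows who was accessing the site at a specific point in time?
  • Is there a way to make these “good” crawlers spread out their requests?
  • Would adding CPU or RAM help here? I have some doubts, because neither CPU nor memory usage spiked. Is an average memory usage of 80% too high?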

Our current configuration is 2 vCPUs and 2 GB of RAM. We have the instance set to 4 Unicorn workers, which seems to match the amount of memory we have.

Yes, check /var/discourse/shared/standalone/log/var-log/nginx/access.log.
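To narrow that log down to a specific minute, something like the following might help. It is only a minimal sketch: it assumes each line carries an nginx-style bracketed timestamp such as `[12/Mar/2024:09:40:01 +0000]`, and that crawler requests can be spotted by the substrings in `CRAWLER_HINTS`. Both are assumptions to check against your actual log lines, and the function names are illustrative, not part of Discourse.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed timestamp shape: [12/Mar/2024:09:40:01 +0000]
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")

# Assumed user-agent substrings for the crawlers of interest; extend as needed.
CRAWLER_HINTS = ("Googlebot", "bingbot")


def parse_time(line):
    """Return the timezone-aware timestamp found in a log line, or None."""
    match = TS_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def summarize_window(lines, start, end):
    """Count requests with start <= timestamp < end, grouped by crawler hint.

    Lines that match no hint are counted under "other".
    Lines without a recognizable timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        ts = parse_time(line)
        if ts is None or not (start <= ts < end):
            continue
        lowered = line.lower()
        label = next((h for h in CRAWLER_HINTS if h.lower() in lowered), "other")
        counts[label] += 1
    return counts
```

One way to use it is to open the access log with `open(path, errors="replace")`, build `start` and `end` with `parse_time("[<date>:09:40:00 +0000]")` and `parse_time("[<date>:09:45:00 +0000]")` (with the real date and the server's UTC offset filled in), and pass all three to `summarize_window`. A large "other" count suggests regular browser traffic rather than the listed crawlers.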

Yes, there is the slow down crawler user agents site setting.

Looks like what you had was I/O wait during the 9:40 peak. Increasing RAM may help, as more data can be kept in cache, but I don’t know whether that peak was read or write, since you cut off the graph legend :upside_down_face:.

That said, if you can afford it, increasing the droplet size to the next available size will always help.
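If you want to confirm I/O wait directly on the server rather than from the graphs, a rough sketch like this could work. It assumes a Linux host, where the aggregate `cpu` line in `/proc/stat` lists cumulative times in the order user, nice, system, idle, iowait, irq, softirq, steal. The helper names are made up for illustration. It does not tell reads from writes; for that, a tool such as `iostat` would be needed.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the cumulative counters from the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(value) for value in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Only the first eight fields are summed: guest time is already
    # included in user time, so adding it would double count.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval=1.0):
    """Take two samples `interval` seconds apart and return the iowait percentage."""
    before = read_cpu_times()
    time.sleep(interval)
    return iowait_percent(before, read_cpu_times())
```

Calling `sample_iowait(5.0)` around the time of the spike gives a single percentage. A consistently high value while CPU usage stays low would match the I/O wait explanation above.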

2 Likes

Thanks. I checked out the log at 9:40am and the requests look like they come from user actions (consumer browsers) rather than crawlers.

The green peak was read.

Yeah, since it doesn’t look like it’s crawlers, I think I will try the next droplet size, increasing RAM from 2 GB to 4 GB, and see if it helps.

It is a little surprising to me that user activity is the cause, because I have always been under the impression that we have had fewer active posters in the last 2 years than before. But when I looked at Google Analytics, we do have a steadily increasing number of users. Perhaps even though posters have decreased, lurkers have increased…

Thanks for the pointers. Appreciate it.

1 Like

It looks like doubling the memory does make a difference, at least judging from the graphs, especially for the disk I/O and load spikes.

2 Likes