"Random" 502 errors

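I run a Discourse installation on a GCE server. Users report that the system returns 502 errors at random. I can reproduce the problem by clicking through the "Latest", "New", "Unread", "Top", and "Categories" links; sooner or later one of them returns a 502.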
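The manual click-through can be approximated with a small loop, which makes it easier to tell whether a change actually lowers the failure rate. This is only a sketch: `count_502s` is a hypothetical helper, the base URL is a placeholder, and the paths are guessed from the link names. `/new` and `/unread` normally need a logged-in session, so anonymous requests may not exercise the same code path.

```shell
# Hypothetical helper: request each listing page repeatedly and count 502s.
# Usage: count_502s BASE_URL [ROUNDS]
count_502s() {
    base_url=$1
    rounds=${2:-20}
    failures=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Paths are assumptions derived from the link names in the post.
        for path in /latest /new /unread /top /categories; do
            # Print only the HTTP status code, discard the body.
            status=$(curl -s -o /dev/null --max-time 30 -w '%{http_code}' "$base_url$path")
            if [ "$status" = "502" ]; then
                failures=$((failures + 1))
                echo "502 from $path (round $i)" >&2
            fi
        done
        i=$((i + 1))
    done
    # Total number of 502 responses seen across all rounds.
    echo "$failures"
}
```

For example, `count_502s https://forum.example.invalid 50` would make 250 requests and print how many came back as 502.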

I checked the proxy server's logs and found entries like this for the failing URLs:
"upstream prematurely closed connection while reading response header from upstream". There are many of these errors, and they appear to hit random URLs.
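To check whether the URLs really are random, the error log can be grouped by request path. A minimal sketch, assuming the proxy is nginx (the message text is nginx's) and that each entry carries the usual `request: "GET /path HTTP/1.1"` field. `summarize_upstream_errors` is a hypothetical name, and the log path is whatever the proxy is configured to use.

```shell
# Count "prematurely closed" errors per request path, most frequent first.
# Usage: summarize_upstream_errors /path/to/nginx/error.log
summarize_upstream_errors() {
    log_file=$1
    # Keep only the relevant error lines, pull the path out of the
    # request: "METHOD /path HTTP/x.y" field, then tally duplicates.
    grep 'upstream prematurely closed connection' "$log_file" \
        | sed -n 's/.*request: "[A-Z]* \([^ "]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

If one or two paths dominate the output, the failures are less random than they look. An even spread points more toward a resource limit in the backend than toward one bad page. Query strings are kept as part of the path, so the same page may appear under several entries.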

Based on posts I have read, I have tried the following steps to fix the problem:

  • Upgraded the operating system
  • Upgraded Docker
  • Upgraded Discourse
  • Rebooted the server

The original install was done using the Docker Cloud setup guide. I then followed the guide to switch backups and images over to S3.

My server is running:
Ubuntu 14.04.6 LTS (GNU/Linux 4.4.0-148-generic x86_64)

From the discourse-doctor output:

     DOCKER VERSION: Docker version 18.06.3-ce, build d7080c1

==================== MEMORY INFORMATION ====================
RAM (MB): 4820

             total       used       free     shared    buffers     cached
Mem:          4707       2206       2501        140        101        948
-/+ buffers/cache:       1156       3550
Swap:         2047          0       2047

==================== DISK SPACE CHECK ====================
---------- OS Disk Space ----------
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   33G   15G  70% /
/dev/sda1        50G   33G   15G  70% /var/lib/docker

==================== DISK INFORMATION ====================

Disk /dev/sda: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *       16065   104856254    52420095   83  Linux
Partition 1 does not start on physical sector boundary.

==================== END DISK INFORMATION ====================

I ran top and watched the CPU and memory figures, and saw nothing unusual. I also looked through the logs but found nothing that points to the problem.

What other details can I provide to help troubleshoot this? What steps should I take to track down the root cause?

Thanks,

Stephen


It could be that Postgres needs a bit more memory. You’ve got plenty, so you might bump db_shared_buffers to 1024MB. You might also bump db_work_mem to 80MB.
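A quick way to see what the container definition currently says is to grep it for the two keys. This is a sketch under assumptions: `show_db_params` is a hypothetical helper, and the default path assumes a standard install under `/var/discourse` with a container named `app`. In the stock template, as far as I know, both keys sit under `params:` and `db_work_mem` ships commented out, so its leading `#` has to be removed for the value to apply.

```shell
# Print the Postgres memory settings (active or commented out) with line numbers.
# Usage: show_db_params [PATH_TO_YML]
show_db_params() {
    # Default path is an assumption based on a standard /var/discourse install.
    yml=${1:-/var/discourse/containers/app.yml}
    grep -nE '^[[:space:]]*#?[[:space:]]*db_(shared_buffers|work_mem):' "$yml"
}

# After editing, the two lines would be expected to read:
#   db_shared_buffers: "1024MB"
#   db_work_mem: "80MB"
```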

Thank you for the suggestion. I made both of those changes in the yml file. Restarting the app didn’t seem to make a difference, so I ended up rebooting the server. Unfortunately I can still replicate the problem.

You need to rebuild the container, or run

cd /var/discourse
./launcher destroy app
./launcher start app

for the changes to take effect.
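Once the container is back up, whether via `./launcher rebuild app` or the destroy/start sequence, it may be worth confirming that Postgres is actually running with the new values rather than trusting the yml. A minimal sketch, assuming the container is named `app` as in the commands above and that `psql` can be run as the `postgres` user inside it. `show_live_db_settings` is a hypothetical helper.

```shell
# Ask Postgres inside the container which values it is running with.
# Usage: show_live_db_settings [CONTAINER_NAME]
show_live_db_settings() {
    container=${1:-app}
    for setting in shared_buffers work_mem; do
        printf '%s = ' "$setting"
        # -A (unaligned) and -t (tuples only) make psql print just the value.
        docker exec "$container" su postgres -c "psql -At -c 'SHOW $setting;'"
    done
}
```

Postgres reports sizes in the largest unit that fits, so a `db_shared_buffers` of 1024MB should show up as `1GB`. If the old values are still displayed, the change has not been applied yet.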

And, this might not be a silver bullet, but I have seen it help.


So far so good, we’ll monitor and see how this helps. Thank you!
