Health check API

axil · June 4, 2019 08:40

It would be nice to have some sort of health check API. We recently faced an issue with an update and Discourse was producing 500 errors.

However, curl returned 200:

curl -I https://forum.gitlab.com
HTTP/2 200
server: nginx

codinghorror · June 4, 2019 08:58

This exists, see

/srv/status
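A minimal sketch of an uptime probe against /srv/status, assuming the endpoint answers HTTP 200 with the plain-text body `ok` when healthy. The function names `is_healthy` and `probe` and the host in the usage comment are hypothetical. The endpoint is deliberately lightweight, so a passing probe does not by itself prove that every page renders.

```python
import http.client
import urllib.request


def is_healthy(status_code: int, body: str) -> bool:
    """Healthy only if the server answered 200 with the body "ok"."""
    # Assumption: /srv/status returns the plain-text body "ok" when healthy.
    return status_code == 200 and body.strip() == "ok"


def probe(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/srv/status and apply is_healthy()."""
    url = base_url.rstrip("/") + "/srv/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_healthy(resp.status, body)
    except (OSError, http.client.HTTPException):
        # urlopen raises HTTPError (a URLError, itself an OSError) for error
        # statuses; timeouts and DNS failures are OSErrors as well.
        return False


# Usage (hypothetical host):
# probe("https://discourse.example.org")
```

Checking the body as well as the status code is what distinguishes this from the plain `curl -I` check in the first post, which only looked at the status line.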

axil · June 4, 2019 11:53

Ah, thank you! I was searching “health check” and that didn’t yield any results.

ryancey · July 29, 2019 08:44

That post 404s. What is the endpoint?

codinghorror · July 29, 2019 08:56

I've updated the post.

michaeld · July 29, 2019 11:21

Actually, I don't think /srv/status would catch a migration issue like the one above…

(And building a check that does catch that kind of issue is quite hard.)

sam · July 30, 2019 08:22

Yeah… /srv/status exists as a very lightweight test; all it does is make sure the app's middleware stack is working.

To catch problems with automatic deploys, we recommend monitoring HTTP status codes and alerting when the number of non-200 responses rises sharply.
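Alerting on a sharp rise in non-200 responses can be sketched as a sliding-window error-rate check. This is a minimal illustration, not how Discourse or any particular monitoring tool implements it: the class name `ErrorRateMonitor`, the 100-request window and the 5% threshold are assumptions to tune for your own traffic, and a fixed threshold stands in for "rises sharply".

```python
from collections import deque


class ErrorRateMonitor:
    """Alert when the share of non-200 responses among the last `window`
    requests exceeds `threshold`. Defaults are illustrative assumptions."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.window = window
        self.threshold = threshold
        # A bounded deque silently drops the oldest status code when full.
        self.codes = deque(maxlen=window)

    def record(self, status_code: int) -> None:
        self.codes.append(status_code)

    def error_rate(self) -> float:
        if not self.codes:
            return 0.0
        errors = sum(1 for code in self.codes if code != 200)
        return errors / len(self.codes)

    def should_alert(self) -> bool:
        # Wait for a full window so one early error does not page anyone.
        return len(self.codes) == self.window and self.error_rate() > self.threshold
```

Feed `record()` from access logs or a load balancer; a real setup would more likely compare against a baseline rate than a fixed threshold, but the windowing idea is the same.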

downey · January 28, 2020 21:18

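Would https://discourse.example.org/srv/status be a reasonable target for uptime monitoring? I realize it alone probably isn't enough to reliably tell whether the site is up, but it would be nice to have a monitoring target that puts less load on the system.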

(Or are there plans to expand the set of components this endpoint reports on?)

sam · January 28, 2020 21:19

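Yes, that's a reasonable place. If you want something more advanced, you can also point the monitor at a specific topic and search for text on the page.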
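Pointing a monitor at a specific topic and searching for text might look like the sketch below. It assumes the HTML returned to a plain HTTP client contains the post text you search for; verify that against your own instance before relying on it. `check_topic`, `page_contains` and the topic URL in the usage comment are hypothetical.

```python
import http.client
import urllib.request


def page_contains(html: str, expected_text: str) -> bool:
    """Whitespace-insensitive substring search, so line wrapping in the
    HTML does not break the match."""
    haystack = " ".join(html.split())
    needle = " ".join(expected_text.split())
    return needle in haystack


def check_topic(topic_url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Fetch a topic page and confirm it contains a known phrase."""
    try:
        with urllib.request.urlopen(topic_url, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Error statuses raise HTTPError; network failures raise URLError.
        return False
    return page_contains(html, expected_text)


# Usage (hypothetical topic URL and phrase):
# check_topic("https://discourse.example.org/t/hypothetical-topic/1", "Welcome to our forum")
```

Unlike /srv/status, this exercises a real page render, so it is heavier but closer to "is the site up" from a visitor's point of view.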

downey · January 28, 2020 21:27

Yes, we had been using /about, but would rather switch to this.

Coming from a background in ops and on-call work, I think that if it looked something like this:

db ok
middleware ok
whatever-else ok
...
all systems ok

it could still be interesting (and occasionally helpful when troubleshooting).
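The per-component report proposed above could be produced by an aggregator along these lines. This is a hypothetical sketch of the suggested output format, not something /srv/status currently returns; `run_checks` is an illustrative name, and each check is a caller-supplied function returning True or False.

```python
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[str, bool]:
    """Run each named check and build a "name ok" / "name FAIL" report.

    A check that raises is treated as failed, so one broken probe cannot
    crash the whole status page.
    """
    lines = []
    all_ok = True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        all_ok = all_ok and ok
        lines.append(f"{name} {'ok' if ok else 'FAIL'}")
    lines.append("all systems ok" if all_ok else "some systems FAILING")
    return "\n".join(lines), all_ok
```

A web handler wrapping this would typically return 200 when `all_ok` is true and 503 otherwise, so monitors that only look at status codes keep working while humans get the detailed text.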

Topic	Category	Replies	Views	Activity
How to test /srv/status	Support	1	739	March 17, 2021
`/srv/status` returns OK even if database is broken	Development	6	687	July 18, 2020
What URL should we monitor to be sure Discourse is up	Support	2	1582	April 25, 2016
Webhook for Discourse Uptime Monitoring?	Development	24	1911	January 16, 2026
`/srv/status` monitoring endpoint doesn't catch some service unavailability issues - one example free space	Feature	14	1618	April 26, 2017