It would be nice to have some sort of health check API. We recently faced an issue with an update and Discourse was producing 500 errors.
However, curl returned 200:
curl -I https://forum.gitlab.com
HTTP/2 200
server: nginx
This exists, see /srv/status
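A minimal sketch of an uptime probe against /srv/status, in Python. It assumes that a healthy instance answers with HTTP 200 and a plain-text body of `ok`; confirm that against your own instance before relying on it. Unlike the `curl -I` above, it also looks at the body. As noted later in the thread, though, this only confirms that the middleware stack responds and will not catch every broken deploy. The function names `status_ok` and `check_srv_status` are hypothetical.

```python
import urllib.error
import urllib.request


def status_ok(status_code: int, body: str) -> bool:
    """Healthy only if the response is HTTP 200 and the body is exactly 'ok'.

    The 'ok' body is an assumption about what /srv/status returns.
    """
    return status_code == 200 and body.strip() == "ok"


def check_srv_status(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/srv/status and report whether it looks healthy."""
    url = base_url.rstrip("/") + "/srv/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return status_ok(resp.status, body)
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx responses: count them as down.
        return False
    except OSError:
        # DNS failures, refused connections and timeouts also count as down.
        return False
```

A monitor would call something like `check_srv_status("https://discourse.example.org")` on a schedule and page someone when it returns False.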
Ah, thank you! I was searching “health check” and that didn’t yield any results.
That post is a 404. What's the endpoint?
I've updated the post.
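Actually, I don't think /srv/status would catch a migration problem like the one described above...
(and building a check that would catch that kind of problem is quite hard)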
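Yes... /srv/status exists as a very lightweight test; all it does is confirm that the app's middleware stack is working.
To catch problems with automated deploys, we recommend monitoring HTTP 200 status codes and alerting if the number of non-200 responses rises sharply.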
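The suggestion above, alerting when non-200 responses spike, could be sketched as a sliding-window counter fed from access logs or a load balancer. The class name, window size, threshold and `min_samples` guard are all illustrative assumptions. Real traffic always includes some legitimate non-200 responses (redirects, 404s), so the threshold would need tuning against your normal baseline.

```python
from collections import deque


class Non200RateAlarm:
    """Fire when the share of non-200 responses in a sliding window is too high."""

    def __init__(self, window: int = 1000, threshold: float = 0.05,
                 min_samples: int = 100) -> None:
        self.codes: deque = deque(maxlen=window)  # oldest codes drop off automatically
        self.threshold = threshold                # e.g. 0.05 means 5% non-200
        self.min_samples = min_samples            # don't alert on a tiny sample

    def non200_rate(self) -> float:
        """Fraction of responses in the window that were not HTTP 200."""
        if not self.codes:
            return 0.0
        bad = sum(1 for code in self.codes if code != 200)
        return bad / len(self.codes)

    def record(self, status_code: int) -> bool:
        """Record one response code; return True if the alert should fire now."""
        self.codes.append(status_code)
        if len(self.codes) < self.min_samples:
            return False
        return self.non200_rate() > self.threshold
```

Comparing against a fixed threshold keeps the sketch simple; a production setup might instead compare the current rate with the rate seen before the deploy.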
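Is https://discourse.example.org/srv/status a suitable target for uptime monitoring? I don't think it alone is a reliable measure of “is the site up”, but something that puts less load on the system would be nice.
(Alternatively, are there plans to expand the list of components this endpoint reports on?)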
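Yes, that's a reasonable place. If you want something more advanced, you can also point it at a specific topic and search for text.
Yes, we had been using /about, but we'd prefer to switch to this.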
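The more advanced check mentioned above, pointing the monitor at a specific topic and searching for text, might look like the sketch below. The topic URL and the expected text are placeholders you would choose yourself, and the function names are hypothetical. Because Discourse is a JavaScript application, first verify that your chosen text actually appears in the raw response a non-browser client receives.

```python
import urllib.request


def page_has_text(status_code: int, body: str, needle: str) -> bool:
    """Healthy only if the page returned HTTP 200 and contains the expected text."""
    return status_code == 200 and needle in body


def check_topic_text(url: str, needle: str, timeout: float = 10.0) -> bool:
    """Fetch a topic page (URL chosen by you) and look for a known string in it."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return page_has_text(resp.status, body, needle)
    except OSError:
        # HTTPError (4xx/5xx), URLError and socket timeouts are all OSError subclasses.
        return False
```

This exercises more of the stack than /srv/status (database reads, rendering), at the cost of a heavier request per probe.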
My habits from years of ops and on-call work make me think that if it looked like this:
db ok
middleware ok
whatever-else ok
...
all systems ok
it could still be interesting (and occasionally helpful for troubleshooting).
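The per-component report described above is not something the thread says /srv/status provides; the following is only a hypothetical sketch of how such output could be assembled. The check names and callables are placeholders. In a real deployment each would wrap an actual probe (a database query, a Redis ping, and so on).

```python
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, str]:
    """Run each named check and build an 'name ok' / 'name FAIL' report.

    Returns (all_ok, report_text). A check that raises counts as a failure.
    """
    lines = []
    all_ok = True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A crashing probe is treated the same as a failing one.
            ok = False
        all_ok = all_ok and ok
        lines.append(f"{name} {'ok' if ok else 'FAIL'}")
    lines.append("all systems ok" if all_ok else "some systems FAILING")
    return all_ok, "\n".join(lines)
```

A handler serving this report could also return HTTP 503 when `all_ok` is False, so that monitors which only look at the status code still notice the failure.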