It would be nice to have some sort of health check API. We recently hit an issue after an update where Discourse was returning 500 errors.
However, a curl request against the homepage still returned 200:
curl -I https://forum.gitlab.com
HTTP/2 200
server: nginx
This exists, see
/srv/status
Ah, thank you! I was searching “health check” and that didn’t yield any results.
Post is 404. What are the endpoints?
I’ve updated my post.
Actually I don’t think /srv/status would catch migration issues like the one mentioned above…
(and it would be pretty hard to build a check that does catch issues like that one)
Yes… /srv/status is there as a very cheap test; all it does is ensure the app’s middleware stack is working.
To catch issues when you auto-deploy, I would recommend monitoring response codes and alerting if there is a large increase in non-200s.
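For anyone wanting to try the "alert on a jump in non-200s" idea, here is a minimal Python sketch. It is not anything Discourse ships; the function names and the `factor` and `floor` thresholds are made up for illustration, and it assumes you already collect recent status codes from your proxy or access logs.

```python
from typing import Iterable


def non_200_rate(status_codes: Iterable[int]) -> float:
    """Return the fraction of responses whose status is not 200."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if code != 200) / len(codes)


def should_alert(recent: Iterable[int], baseline_rate: float,
                 factor: float = 3.0, floor: float = 0.05) -> bool:
    """Alert when the recent non-200 rate is well above the usual rate.

    `factor` and `floor` are illustrative thresholds, not recommended values.
    """
    rate = non_200_rate(recent)
    # Require both an absolute minimum and a relative jump over the baseline,
    # since a forum normally serves some redirects and 404s.
    return rate >= floor and rate > baseline_rate * factor
```

With a usual non-200 rate of 1%, a window of 80 successes and 20 errors would trigger the alert, while 99 successes and a single 404 would not.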
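Is https://discourse.example.org/srv/status a suitable target for uptime monitoring? I don’t think it is enough on its own to reliably measure “is the site up”, but it would be nice to have a monitoring option that puts less load on the system.
(Or are there any plans to expand the components this endpoint reports on?)
Yes, that is a reasonable place. If you want something more advanced, you can also point at a specific topic and search for text.
Yes, we had been using /about until now, but would prefer to switch to this.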
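The "point at a specific topic and search for text" suggestion could be sketched roughly like this in Python. The helper names are hypothetical, and the topic URL and marker text you pass in are whatever you choose on your own site; only the standard library is used.

```python
import urllib.error
import urllib.request
from typing import Tuple


def page_looks_healthy(status: int, body: str, expected_text: str) -> bool:
    """True only if the response is a 200 and contains the expected text."""
    return status == 200 and expected_text in body


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch a URL and return (status, body); HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def check(url: str, expected_text: str) -> bool:
    """Full check: connection failures and timeouts count as unhealthy."""
    try:
        status, body = fetch(url)
    except OSError:
        return False
    return page_looks_healthy(status, body, expected_text)
```

Calling `check` with the URL of a pinned topic and a phrase from its first post exercises more of the stack than /srv/status does, at the cost of a heavier request.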
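My old habits from ops and on-call work make me think that if it looked like this:
db ok
middleware ok
whatever-else ok
...
all systems ok
it might still be interesting (and occasionally helpful for troubleshooting).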
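To make the wished-for output above concrete, here is a small Python sketch of a per-component report. This is purely hypothetical and does not describe how /srv/status works; `run_checks` and the check names are invented for the example, and each check is assumed to be a function that returns True when its component is fine.

```python
from typing import Callable, Dict, List, Tuple


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run each named check and return (all_ok, report_lines)."""
    lines = []
    all_ok = True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that raises is reported as failed instead of crashing the report.
            ok = False
        all_ok = all_ok and ok
        lines.append(f"{name} {'ok' if ok else 'FAIL'}")
    # The summary line mirrors the last line of the desired output.
    lines.append("all systems ok" if all_ok else "some systems failing")
    return all_ok, lines
```

A monitor could then alert on the summary line alone, while a human on call reads the per-component lines to see where to start looking.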