I would suggest making it send a weekly or monthly message to admins which summarizes:
Disk space growth rate (uploads and backup size)
Estimated time until disk doesn't have enough space for the next update or backup (based on growth rate above).
Bounce count and growth rate (i.e. are ISPs blacklisting you)
Even a rudimentary correlation between event log errors and plugins would be helpful.
Some kind of indication of the "distance" between the instance and the current git version, particularly if it picked up on git changes to major bits like Ruby, or Postgres, or Docker, would be nice.
Even if all it created was a plain ASCII report which answered the usual support questions (RAM, Swap, disk space, Version, docker version etc) that would be a start... Many of us running Discourse aren't experienced sys admins, so we end up posting support threads of the form: "Something's broke, please help." to which the response is "How much disk space have you got free?" which prompts "How do I check that?". Having a report that covered the most salient points which you could include in your "Something broke" message would save the people answering support questions some time.
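The "estimated time until the disk is full" item above is simple arithmetic once you have two usage measurements. Here is a minimal sketch, not anything discourse-doctor actually does: the function name, the `reserve_bytes` parameter (space you want to keep free for the next update or backup) and the sample figures are all made up for illustration, and it assumes linear growth between the first and last sample.

```python
from datetime import date

def days_until_full(samples, capacity_bytes, reserve_bytes=0):
    """Estimate days until used space reaches capacity minus the reserve.

    samples: list of (date, used_bytes) tuples, oldest first.
    Returns None when usage is flat or shrinking (no meaningful estimate).
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        return None
    growth_per_day = (last_used - first_used) / elapsed_days
    if growth_per_day <= 0:
        return None
    remaining = capacity_bytes - reserve_bytes - last_used
    return max(remaining, 0) / growth_per_day

GB = 1024 ** 3
# Hypothetical measurements: 10 GB used on 1 Jan, 13 GB used on 31 Jan.
samples = [(date(2024, 1, 1), 10 * GB), (date(2024, 1, 31), 13 * GB)]
print(days_until_full(samples, capacity_bytes=25 * GB, reserve_bytes=2 * GB))
```

With those made-up numbers the disk grows by about 0.1 GB per day and has 10 GB of usable headroom, so the estimate comes out at roughly 100 days.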
Hi @JagWaugh, I can understand the value of this, but as I see it at the moment, it's out of discourse-doctor scope. This utility is meant as a fast answer to "Why is my discourse instance not working?" and not a monitoring tool.
However, I do plan to add checks on RAM, disk space, swap, docker... but as a one-shot command, at least for the moment. Let's concentrate on building a tool answering these questions, and maybe in the future we can see how to integrate it more into Discourse.
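To make the one-shot idea concrete, here is a rough sketch of what such a report could look like. It is only an illustration, not the actual discourse-doctor code: the function names are invented, it assumes a Linux host (it reads `/proc/meminfo`) and it assumes the `docker` binary is on the PATH. The Discourse version is left out because how to obtain it is not covered in this thread.

```python
import shutil
import subprocess

def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields (values in kB on Linux)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def read_meminfo(path="/proc/meminfo"):
    """Read the meminfo file, or return '' if it is not available."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""

def docker_version():
    """Return the output of `docker --version`, or a note if unavailable."""
    try:
        result = subprocess.run(["docker", "--version"],
                                capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return "not available"
    return result.stdout.strip() or "not available"

def build_report(meminfo_text, disk_path="/"):
    """Build a plain-text report that can be pasted into a support topic."""
    mem = parse_meminfo(meminfo_text)
    disk = shutil.disk_usage(disk_path)
    mb = 1024 * 1024
    lines = [
        f"RAM total:  {mem.get('MemTotal', 0) // 1024} MB",
        f"RAM avail:  {mem.get('MemAvailable', 0) // 1024} MB",
        f"Swap total: {mem.get('SwapTotal', 0) // 1024} MB",
        f"Swap free:  {mem.get('SwapFree', 0) // 1024} MB",
        f"Disk free:  {disk.free // mb} MB of {disk.total // mb} MB on {disk_path}",
        f"Docker:     {docker_version()}",
    ]
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_report(read_meminfo()))
```

The output is deliberately plain text so it can be copied straight into a "Something broke" post.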
The problem is, if it's a plugin, then it's only going to help with "Why is my discourse instance only working partially?" Once the instance has failed to rebuild it's often working less than partially.
But a report which says "Here is what it was like before it broke" would help.
Like the status emails you get with your air miles membership: It can't tell where you're going, but it clearly says where you've been, and it gives you an indication of how much it will cost to get where you want to go.
Yes, it's not clear to me at the moment what kind of checks we could do concerning plugins. The only thing I can think of would be searching GitHub issues or meta for recent topics with the installed plugin names, but I'm not sure it would be very effective.
Checking which plugins are not official ones and recommending that the user tries disabling them if you don't find other issues might be a good (and easy) start?
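One possible way to do that check, as a sketch only: treat a plugin as "official" when its git remote points at the `discourse` organization on GitHub. That definition is my assumption, as are the function names and the idea that each plugin is a git checkout inside a plugins directory you pass in.

```python
import re
import subprocess
from pathlib import Path

# Assumption: "official" means hosted under the github.com/discourse organization.
OFFICIAL_PATTERN = re.compile(r"github\.com[:/]discourse/", re.IGNORECASE)

def is_official(remote_url):
    """True if the remote URL points at the discourse GitHub organization."""
    return bool(OFFICIAL_PATTERN.search(remote_url or ""))

def plugin_remote(plugin_dir):
    """Return the git origin URL seen from a plugin directory, or '' if unknown."""
    try:
        result = subprocess.run(
            ["git", "-C", str(plugin_dir), "config", "--get", "remote.origin.url"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout.strip()

def unofficial_plugins(plugins_root):
    """List plugin directory names whose remote is not under the discourse org."""
    root = Path(plugins_root)
    if not root.is_dir():
        return []
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and not is_official(plugin_remote(d))
    )
```

The script could then print something like "try disabling these first" for whatever `unofficial_plugins(...)` returns. A plugin with no detectable remote is reported as unofficial, which errs on the side of mentioning it.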
Although my last contribution was not so successful, I still want to help. Checking RAM and disk space should be done. The same requirements as in the discourse docker setup script apply, I guess?
Giving it a little bit more thought, we should just make sure anything we add into discourse-doctor hints at a real-world problem a user once faced. That would be my rule of thumb.
So is this meant to be only a command line script, or are you planning to also provide a UI in discourse itself?
When your instance is down and fails to start, you obviously need to go via the command line, but they say that it's good to see a doctor before it's too late, i.e. while your forum is still up and running. If you're planning to go down that route, there surely is some "An apple a day" type advice that the plugin could provide. For example, how about performing various checks on sidekiq, the error logs, perhaps even the NGINX logs or screened IPs?
As regards the plugins, you could hook into admin/plugins and highlight which plugin is official or even which one is tagged as broken here on meta.
It's a script, not a plugin. Its goal at the moment is to answer this kind of question:
I just set up my discourse instance and xxx is not working
Help, I have done xxx and my instance is not working
My instance was working yesterday and is not working today
So at the moment nothing in the UI, nothing automatic and nothing not directly related to operations. Also we have to make sure we are not checking something already checked in ./launcher rebuild app (like the storage driver...)
I'm not against widening the scope in the future, but I would like to focus on this at the moment.
Basically, the best help you can provide is topics where users were having issues and we lost time figuring out what was wrong, when discourse-doctor could have found something in a few seconds.
So checking available RAM would not be very important at the moment? But checking available disk space is useful, since that might be the cause for many problems (if it is near zero).
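A "disk is nearly full" check could be as small as the sketch below. The function name is made up and the 5 GB default threshold is only a placeholder, not a value taken from the setup script.

```python
import shutil

def check_disk(path="/", min_free_bytes=5 * 1024**3):
    """Return (ok, message) for free space on the filesystem holding path.

    min_free_bytes is a placeholder threshold; adjust it to whatever the
    real requirement turns out to be.
    """
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1024**3
    ok = usage.free >= min_free_bytes
    status = "OK" if ok else "WARNING"
    return ok, f"{status}: {free_gb:.1f} GB free on {path}"

if __name__ == "__main__":
    print(check_disk("/")[1])
```

Returning both a boolean and a message lets the script print the line and also decide whether to flag low disk space as the likely culprit.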