Discourse-doctor 👩‍⚕️

Hi everyone,

lately I’ve been playing with the idea of having a discourse-doctor utility to help people troubleshoot their Docker-based Discourse instance.

https://github.com/jjaffeux/discourse-doctor

I’m opening this topic to:

  • get feedback on this idea
  • get ideas on what checks we could add
  • maybe get some PRs :innocent:
18 Likes

I really like the idea! Made a PR :slight_smile:

3 Likes

Good idea.

I would suggest making it send a weekly or monthly message to admins which summarizes:

  • Disk space growth rate (uploads and backup size)
  • Estimated time until the disk doesn’t have enough space for the next update or backup (based on the growth rate above)
  • Bounce count and growth rate (i.e. are ISPs blacklisting you?)
  • Even a rudimentary correlation between event log errors and plugins would be helpful
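
A rough sketch of the "time until the disk is full" estimate, in Python purely for illustration (this is not taken from discourse-doctor). The function name `days_until_full` and the sample figures are made up. It assumes you already have dated measurements of the combined uploads and backup size, and it simply extrapolates the average growth linearly:

```python
from datetime import date


def days_until_full(samples, free_bytes):
    """Estimate how many days remain before free_bytes is used up.

    samples is a list of (date, used_bytes) tuples in chronological order.
    Only the first and last sample are used, so this is an average linear
    growth rate. Returns None when usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("need samples from at least two different days")
    growth_per_day = (last_used - first_used) / elapsed_days
    if growth_per_day <= 0:
        return None
    return free_bytes / growth_per_day


# Hypothetical figures: 10 GB used on 1 January, 13 GB used on 31 January.
samples = [
    (date(2024, 1, 1), 10_000_000_000),
    (date(2024, 1, 31), 13_000_000_000),
]
print(days_until_full(samples, free_bytes=5_000_000_000))  # 50.0
```

A real report would probably want more than two data points and some smoothing, since a single large backup can distort the rate.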

Some kind of indication of the “distance” between the instance and the current git version, particularly if it picked up on git changes to major bits like Ruby, or Postgres, or Docker, would be nice.

Even if all it created was a plain ASCII report which answered the usual support questions (RAM, swap, disk space, version, Docker version, etc.), that would be a start… Many of us running Discourse aren’t experienced sysadmins, so we end up posting support threads of the form “Something’s broke, please help.”, to which the response is “How much disk space have you got free?”, which prompts “How do I check that?”. Having a report that covered the most salient points, which you could include in your “Something broke” message, would save the people answering support questions some time.
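
To make the idea concrete, here is a minimal sketch of the RAM and swap part of such a report, in Python for illustration only. It assumes a Linux host, where `/proc/meminfo` lists values in kB; the function names `parse_meminfo` and `build_memory_report` are hypothetical and not part of discourse-doctor:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def build_memory_report(meminfo_text):
    """Return a plain-text summary of RAM and swap, converted from kB to MiB."""
    mem = parse_meminfo(meminfo_text)
    fields = [
        ("RAM total", "MemTotal"),
        ("RAM available", "MemAvailable"),
        ("Swap total", "SwapTotal"),
        ("Swap free", "SwapFree"),
    ]
    lines = []
    for label, key in fields:
        if key in mem:
            lines.append(f"{label}: {mem[key] // 1024} MiB")
        else:
            lines.append(f"{label}: unknown")
    return "\n".join(lines)
```

On a Linux host you would feed it the real file, for example `print(build_memory_report(open("/proc/meminfo").read()))`, and paste the output into your support topic.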

3 Likes

Hi @JagWaugh, I can understand the value of this, but as I see it at the moment, it’s out of discourse-doctor’s scope. This utility is meant as a fast answer to “Why is my Discourse instance not working?” and not as a monitoring tool.

However, I do plan to add checks on RAM, disk space, swap, Docker… but as a one-shot command, at least for the moment. Let’s concentrate on building a tool that answers these questions, and maybe in the future we can see how to integrate it more into Discourse.

5 Likes

The problem is, if it’s a plugin, then it’s only going to help with “Why is my discourse instance only working partially?” Once the instance has failed to rebuild it’s often working less than partially.

But a report which says “Here is what it was like before it broke” would help.

Like the status emails you get with your air miles membership: It can’t tell where you’re going, but it clearly says where you’ve been, and it gives you an indication of how much it will cost to get where you want to go.

1 Like

It is not a plugin, it’s a script you can run at the command line. So I am not sure much of this commentary makes sense…

1 Like

Yes, it’s not clear to me at the moment what kind of checks we could do concerning plugins. The only thing I can think of would be searching GitHub issues or meta for recent topics mentioning the installed plugin names, and I’m not sure it would be very effective.

Ah… when you put it that way.

I missed that. I’ll go set the coffee machine up.

2 Likes

Checking which plugins are not official ones and recommending that the user try disabling them if you don’t find other issues might be a good (and easy) start? :slight_smile:

5 Likes

and also suggest safe-mode as a final check

+ :100: on that one; unofficial plugins are much riskier than official.

4 Likes

Checking whether the URL points to Discourse’s namespace on GitHub is a good enough approximation for “is this plugin official?”, right?
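
Something along these lines, sketched in Python just to illustrate the check (not the actual discourse-doctor code). It assumes the plugin clone URLs have already been collected from the container config; the helper names and the `example-user` owner are made up, and a URL that doesn’t look like a GitHub repository is simply treated as unofficial:

```python
import re

# GitHub organisation that hosts the official plugins.
OFFICIAL_OWNER = "discourse"

# Matches https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git)
_GITHUB_URL = re.compile(
    r"^(?:https?://|ssh://git@|git@)github\.com[:/]([^/]+)/([^/]+?)(?:\.git)?/?$"
)


def parse_github_url(url):
    """Return (owner, repo) for a GitHub clone URL, or None if it is not one."""
    match = _GITHUB_URL.match(url.strip())
    return (match.group(1), match.group(2)) if match else None


def unofficial_plugins(urls):
    """List the plugins whose clone URL is outside the official namespace."""
    names = []
    for url in urls:
        parsed = parse_github_url(url)
        if parsed is None:
            names.append(url)  # unrecognised URL: report it as-is
        elif parsed[0].lower() != OFFICIAL_OWNER:
            names.append(parsed[1])
    return names


urls = [
    "https://github.com/discourse/docker_manager.git",
    "git@github.com:example-user/discourse-mathjax.git",
]
print(unofficial_plugins(urls))  # ['discourse-mathjax']
```

It is only an approximation: a fork under another account would be flagged even if its code is identical to the official plugin.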

5 Likes

that’s my plan indeed

The latest revision should issue a warning if you have unofficial plugins:

warning If you encounter issues, you might want to consider disabling these unofficial plugins: discourse-mathjax

2 Likes

After my last contribution was not so successful :slight_smile: I still want to help. Checking RAM and disk space should be done. The same requirements as in the discourse docker setup script apply, I guess?

3 Likes

I haven’t looked at it yet, but yes, we should work on it. Feel free to open a PR; we can tweak it afterwards if the numbers are not right, no worries.

Giving it a little bit more thought, we should just make sure anything we add to discourse-doctor gives a hint about a real-world problem a user has actually faced. That would be my rule of thumb.

3 Likes

So is this meant to be only a command-line script, or are you planning to also provide a UI in Discourse itself?

When your instance is down and fails to start, you obviously need to go via the command line, but they say that it’s good to see a doctor before it’s too late, i.e. while your forum is still up and running. If you’re planning to go down that route, there surely is some “An apple a day” type advice that the plugin could provide. For example, how about performing various checks on Sidekiq, the error logs, perhaps even the NGINX logs or screened IPs?

As regards the plugins, you could hook into admin/plugins and highlight which plugin is official or even which one is tagged as broken here on meta.

It’s a script, not a plugin. Its goal at the moment is to answer these kinds of questions:

  • I just set up my discourse instance and xxx is not working
  • Help, I have done xxx and my instance is not working
  • My instance was working yesterday and is not working today

So at the moment nothing in the UI, nothing automatic and nothing not directly related to operations. Also we have to make sure we are not checking something already checked in ./launcher rebuild app (like the storage driver…)

I’m not against widening the scope in the future, but I would like to focus on this at the moment.

Basically, the best help you can provide is topics where users were having issues and we lost time figuring out what was wrong, when discourse-doctor could have found something in a few seconds.

9 Likes

So checking available RAM would not be very important at the moment? But checking available disk space is useful, since that might be the cause for many problems (if it is near zero).
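
For what it’s worth, a disk space check can be very small. This is only a sketch in Python to show the idea, not the script’s real code. The 5 GiB threshold is an arbitrary placeholder rather than the number used by the setup script, and `disk_space_status` is a hypothetical name:

```python
import shutil

# Placeholder threshold (assumption): warn below 5 GiB of free space.
MIN_FREE_BYTES = 5 * 1024**3


def disk_space_status(free_bytes, min_free_bytes=MIN_FREE_BYTES):
    """Return (ok, message) for a given amount of free disk space in bytes."""
    free_gib = free_bytes / 1024**3
    if free_bytes < min_free_bytes:
        return False, f"warning Only {free_gib:.1f} GiB of disk space free"
    return True, f"ok {free_gib:.1f} GiB of disk space free"


# shutil.disk_usage reports total, used and free bytes for the given path.
free = shutil.disk_usage("/").free
print(disk_space_status(free)[1])
```

Keeping the threshold as a parameter makes it easy to adjust once the right numbers are agreed on.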