Discourse-doctor 👩‍⚕️

docker

(Joffrey Jaffeux) #1

Hi everyone,

Lately I’ve been playing with the idea of a discourse-doctor utility to help people troubleshoot their Docker-based Discourse instances.

I’m opening this topic to:

  • get feedback on this idea
  • get ideas on what checks we could add
  • maybe get some PRs :innocent:

(Sven Koschnicke) #2

I really like the idea! Made a PR :slight_smile:


(Andrew Waugh) #3

Good idea.

I would suggest making it send a weekly or monthly message to admins summarizing:

  • Disk space growth rate (uploads and backup size)
  • Estimated time until the disk no longer has enough space for the next update or backup (based on the growth rate above)
  • Bounce count and growth rate (i.e. are ISPs blacklisting you?)
  • Even a rudimentary correlation between event log errors and plugins would be helpful.
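The “estimated time until the disk runs out” item above is a simple linear extrapolation. A minimal Python sketch, assuming you already have periodic (day, used bytes) samples and a reserve figure such as the size of the latest backup; both inputs are hypothetical, and nothing in discourse-doctor collects them:

```python
from typing import List, Optional, Tuple

def growth_per_day(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of used space over time, in bytes per day.

    `samples` is a hypothetical list of (day_number, used_bytes) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a rate")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den

def days_until_full(free_bytes: float, rate_per_day: float,
                    reserve_bytes: float = 0.0) -> Optional[float]:
    """Days until free space drops below `reserve_bytes` at a constant rate.

    Returns None when usage is flat or shrinking, 0.0 if already below the reserve.
    """
    if rate_per_day <= 0:
        return None
    headroom = free_bytes - reserve_bytes
    if headroom <= 0:
        return 0.0
    return headroom / rate_per_day

GB = 1024 ** 3
rate = growth_per_day([(0, 10 * GB), (1, 12 * GB), (2, 14 * GB)])
print(days_until_full(20 * GB, rate, reserve_bytes=6 * GB))  # 7.0
```

A least-squares slope is used instead of "last sample minus first sample" so that a single unusual day (a large backup, say) skews the estimate less.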

Some indication of the “distance” between the instance and the current git version would be nice, particularly if it picked up git changes to major components like Ruby, Postgres, or Docker.

Even if all it created was a plain ASCII report answering the usual support questions (RAM, swap, disk space, version, Docker version, etc.), that would be a start… Many of us running Discourse aren’t experienced sysadmins, so we end up posting support threads of the form “Something’s broke, please help,” to which the response is “How much disk space have you got free?”, which prompts “How do I check that?”. A report covering the most salient points, which you could include in your “Something broke” message, would save the people answering support questions some time.
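As a rough illustration of such a report, here is a Python sketch that parses Linux’s /proc/meminfo format and uses the standard library’s shutil.disk_usage. It covers only RAM, swap and disk space; the function names and layout are made up for this example and are not part of discourse-doctor.

```python
import shutil
from typing import Dict

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in KiB}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field:" prefix
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def support_report(meminfo_text: str, path: str = "/") -> str:
    """Build a plain-text summary suitable for pasting into a support topic."""
    mem = parse_meminfo(meminfo_text)
    kib_per_gib = 1024 ** 2
    gib = 1024 ** 3
    disk = shutil.disk_usage(path)  # total, used, free in bytes
    return "\n".join([
        f"RAM total: {mem.get('MemTotal', 0) / kib_per_gib:.1f} GiB",
        f"RAM available: {mem.get('MemAvailable', 0) / kib_per_gib:.1f} GiB",
        f"Swap total: {mem.get('SwapTotal', 0) / kib_per_gib:.1f} GiB",
        f"Disk free ({path}): {disk.free / gib:.1f} of {disk.total / gib:.1f} GiB",
    ])
```

On a Linux host you would pass it the contents of /proc/meminfo, e.g. support_report(open("/proc/meminfo").read()); that file does not exist on macOS or Windows.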


(Joffrey Jaffeux) #4

Hi @JagWaugh, I understand the value of this, but as I see it at the moment, it’s outside the scope of discourse-doctor. This utility is meant as a quick answer to “Why is my Discourse instance not working?”, not as a monitoring tool.

However, I do plan to add checks on RAM, disk space, swap and Docker, but as a one-shot command, at least for the moment. Let’s concentrate on building a tool that answers these questions, and maybe in the future we can see how to integrate it more closely into Discourse.


(Andrew Waugh) #5

The problem is, if it’s a plugin, then it’s only going to help with “Why is my Discourse instance only partially working?” Once the instance has failed to rebuild, it’s often working less than partially.

But a report which says “Here is what it was like before it broke” would help.

Like the status emails you get with your air miles membership: they can’t tell where you’re going, but they clearly say where you’ve been, and they give you an indication of how much it will cost to get where you want to go.


(Jeff Atwood) #6

It is not a plugin; it’s a script you run at the command line. So I am not sure much of this commentary makes sense…


(Joffrey Jaffeux) #7

Yes, it’s not clear to me at the moment what kind of checks we could do concerning plugins. The only thing I can think of would be searching GitHub issues or Meta for recent topics mentioning the installed plugin names, but I’m not sure it would be very effective.


(Andrew Waugh) #8

Ah… when you put it that way.

I missed that. I’ll go set the coffee machine up.


(Felix Freiberger) #9

Checking which plugins are unofficial, and recommending that the user try disabling them if no other issues are found, might be a good (and easy) start? :slight_smile:


(Joffrey Jaffeux) #10

And also suggest safe-mode as a final check.


(Jeff Atwood) #11

+ :100: on that one; unofficial plugins are much riskier than official ones.


(Felix Freiberger) #12

Checking whether the URL points to Discourse’s namespace on GitHub is a good enough approximation of whether a plugin is official, right?
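A Python sketch of that approximation. It assumes plugins are installed through plain “git clone <url>” lines in containers/app.yml, as in the standard install, and treats anything owned by github.com/discourse as official. The helper names and the example-user repository are hypothetical; this is not discourse-doctor’s actual code.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumes each plugin appears as a plain "git clone <url>" line (no extra flags).
CLONE_RE = re.compile(r"git clone\s+(\S+)")

def plugin_urls(app_yml_text: str) -> List[str]:
    """Return every URL that follows 'git clone' in the given app.yml text."""
    return CLONE_RE.findall(app_yml_text)

def is_official(url: str) -> bool:
    """Approximate "official" as "owned by the discourse GitHub account"."""
    if url.startswith("git@github.com:"):
        # SSH form: git@github.com:owner/repo.git
        owner = url[len("git@github.com:"):].split("/")[0]
        return owner.lower() == "discourse"
    parsed = urlparse(url)
    owner = parsed.path.strip("/").split("/")[0]
    return parsed.netloc.lower() == "github.com" and owner.lower() == "discourse"

app_yml = """
hooks:
  after_code:
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/example-user/discourse-mathjax.git
"""
print([u for u in plugin_urls(app_yml) if not is_official(u)])
# ['https://github.com/example-user/discourse-mathjax.git']
```

Matching on the owner segment rather than a substring avoids false positives such as an account named “discourse-fan”.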


(Joffrey Jaffeux) #13

That’s my plan indeed.


(Joffrey Jaffeux) #14

The latest revision should now issue a warning if you have unofficial plugins:

warning If you encounter issues, you might want to consider disabling these unofficial plugins: discourse-mathjax
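Producing that message from a list of unofficial plugin URLs mostly comes down to turning each URL into a name. A minimal sketch, assuming the plugin name is the last path segment of the clone URL without “.git”; the function names are hypothetical and this is not necessarily how discourse-doctor does it.

```python
from typing import List, Optional

def plugin_name(url: str) -> str:
    """Derive a display name from a clone URL, e.g. .../discourse-mathjax.git."""
    last = url.rstrip("/").rsplit("/", 1)[-1]
    return last[: -len(".git")] if last.endswith(".git") else last

def unofficial_warning(unofficial_urls: List[str]) -> Optional[str]:
    """Return the warning line, or None when there are no unofficial plugins."""
    if not unofficial_urls:
        return None
    names = ", ".join(plugin_name(u) for u in unofficial_urls)
    return ("warning If you encounter issues, you might want to consider "
            f"disabling these unofficial plugins: {names}")
```

Returning None instead of an empty string lets the caller simply print nothing when every installed plugin is official.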


(Sven Koschnicke) #15

My last contribution was not so successful :slight_smile:, but I still want to help. We should add checks for RAM and disk space. I guess the same requirements as in the discourse_docker setup script apply?
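A RAM/swap check could be as small as the Python sketch below. The thresholds are placeholders chosen for illustration, not the values the setup script actually enforces, so they would need to be compared against that script. Inputs are in KiB, as reported by /proc/meminfo on Linux.

```python
from typing import List

# Placeholder minimums for illustration only; compare with discourse-setup.
MIN_RAM_GIB = 1.0
MIN_RAM_PLUS_SWAP_GIB = 2.0
KIB_PER_GIB = 1024 ** 2

def memory_hints(mem_total_kib: int, swap_total_kib: int,
                 min_ram_gib: float = MIN_RAM_GIB,
                 min_combined_gib: float = MIN_RAM_PLUS_SWAP_GIB) -> List[str]:
    """Return human-readable warnings; an empty list means memory looks fine."""
    ram_gib = mem_total_kib / KIB_PER_GIB
    combined_gib = (mem_total_kib + swap_total_kib) / KIB_PER_GIB
    hints = []
    if ram_gib < min_ram_gib:
        hints.append(f"warning Only {ram_gib:.1f} GiB of RAM "
                     f"(placeholder minimum {min_ram_gib:.1f} GiB)")
    if combined_gib < min_combined_gib:
        hints.append(f"warning RAM + swap is only {combined_gib:.1f} GiB; "
                     "consider adding swap")
    return hints
```

Checking RAM and RAM plus swap separately lets the hint suggest adding swap when that alone would be enough.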


(Joffrey Jaffeux) #16

I haven’t looked at it yet, but yes, we should work on it. Feel free to open a PR; we can tweak it afterwards if the numbers are not right, no worries.


(Joffrey Jaffeux) #17

Giving it a bit more thought, we should make sure that anything we add to discourse-doctor gives hints about a real-world problem a user has actually faced. That would be my rule of thumb.


(Christoph) #18

So is this meant to be only a command-line script, or are you planning to also provide a UI in Discourse itself?

When your instance is down and fails to start, you obviously need to go via the command line, but they say it’s good to see a doctor before it’s too late, i.e. while your forum is still up and running. If you’re planning to go down that route, there is surely some “An apple a day” type advice that the plugin could provide. For example, how about performing various checks on Sidekiq, the error logs, perhaps even the NGINX logs or screened IPs?

As regards plugins, you could hook into admin/plugins and highlight which plugins are official, or even which ones are tagged as broken here on Meta.


(Joffrey Jaffeux) #19

It’s a script, not a plugin. At the moment its goal is to answer these kinds of questions:

  • I just set up my Discourse instance and xxx is not working
  • Help, I did xxx and my instance is not working
  • My instance was working yesterday and is not working today

So, at the moment: nothing in the UI, nothing automatic, and nothing not directly related to operations. We also have to make sure we are not duplicating checks already done by ./launcher rebuild app (like the storage driver…)

I’m not against widening the scope in the future, but I would like to focus on this at the moment.

Basically, the best help you can provide is pointing to topics where users were having issues and we lost time figuring out what was wrong, when discourse-doctor could have found something in a few seconds.


(Sven Koschnicke) #20

So checking available RAM is not very important at the moment? Checking available disk space is useful, though, since running out of it (free space near zero) may be the cause of many problems.
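A minimal sketch of such a disk check using Python’s shutil.disk_usage. The 5 GiB threshold is a hypothetical placeholder, not a value taken from discourse-doctor or launcher, and the path should be whichever filesystem holds the Discourse data: for a standard install, the one containing /var/discourse, and possibly /var/lib/docker, where Docker keeps its images by default.

```python
import shutil
from typing import Optional

# Hypothetical threshold, not a value taken from discourse-doctor or launcher.
MIN_FREE_GIB = 5.0

def disk_space_hint(path: str = "/", min_free_gib: float = MIN_FREE_GIB) -> Optional[str]:
    """Return a warning when free space on `path` is below `min_free_gib`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 1024 ** 3
    if free_gib < min_free_gib:
        return (f"warning Only {free_gib:.1f} GiB free on {path}; "
                "rebuilds, uploads and backups may fail")
    return None
```

Checking absolute free space rather than a usage percentage keeps the hint meaningful on both small and large disks.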