Webhooks/Sidekiq issue on dev instance

I run 2 Discourse instances:

  1. a standard one (Docker)
  2. a dev one, behind an nginx proxy

I’ve set up the same webhook on both instances. It works well from the standard instance, but not from the dev instance:

  • the Ping button never gets a response and stays gray, although the corresponding POST event seems to appear in the nginx log (EDIT: this log entry is about the click event localhost->Discourse, not the outgoing webhook ping message).

  • I see no error in the Discourse server console or nginx logs.

What should I check?

My nginx is set up according to @riking’s excellent post.

You might need to set a pointer in /etc/hosts to 127.0.0.1 (or your public IP, depending on where your nginx is listening) for your dev host.

Thanks @hellekin.
My nginx routes requests from www.myhostname.org to 127.0.0.1:3000.
So I tried adding 127.0.0.1 www.myhostname.org to my hosts file, but it hasn’t solved the problem…

What do the logs say?

I don’t see anything pertaining to the webhook call in either the nginx or the Discourse logs.
However, I do see the webhook call in the Sidekiq “Enqueued” list. The entry stays there forever. Any idea why Sidekiq never processes the job?

P.S.: because I could curl to my webhook service from the Discourse server, I believe nginx is not involved in the issue. I’m going to change the topic’s title to reflect that.

Using Firefox or Chromium, do you have network logs?

I’m sorry I can’t help you much, as I didn’t look into web hooks so far.

It’s solved now. The problem was that Sidekiq wasn’t processing jobs. I did a lot of things (updating Discourse, flushing Redis, restarting Sidekiq, changing database.yml then restoring it, rebooting the server) and now it works.
Thanks again @hellekin!

It was too good to be true. Here is what I do on my dev instance:

  1. Create a webhook
  2. Use the Ping button: it works (I can trace it all the way up to my webhook service)
  3. Create a post: this adds 4 jobs to Sidekiq (2 x “event_name”=>“user_updated” and 2 x “event_name”=>“post_created”). But those jobs stay in the Busy list and aren’t processed.
  4. If I keep triggering events, they pile up in the Busy list. Somewhere along the way, even Ping events get stuck.
  5. At that point, I need to flush Redis and restart Sidekiq if I want to go back to point 2.

If I do the same on my Docker instance, it works like a charm.

I also want to mention that, in my admin Dashboard (on the dev instance), I have the following warning: “A check for updates has not been performed. Ensure sidekiq is running.”

Stuck jobs in development mode are a known problem with Sidekiq and Rails 5.1.
It’s probably because of missing dependencies. See the following post for more information on that.

Feel free to send a pull request if you find missing dependencies in sidekiq jobs.

Unfortunately, the ProcessPost job can’t be fixed that way. We are aware of the problem…
As a workaround, you can set config.eager_load to true in development.rb.
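
For reference, the workaround might look like the fragment below. This is a minimal sketch that assumes the standard Rails location config/environments/development.rb; the rest of the file is omitted and the comment is illustrative only.

```ruby
# config/environments/development.rb (assumed standard Rails path)
Rails.application.configure do
  # Load all application code at boot instead of autoloading it lazily,
  # so Sidekiq jobs do not have to autoload constants while they run.
  config.eager_load = true
end
```

Restart both the Rails server and Sidekiq after changing this setting so that both processes pick it up.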

Thanks a lot @gerhard, setting config.eager_load to true seemed to solve the issue.
EDIT: I was able to work for 2 hours without any problem, then the issue came back…

Hi @jack2,

The next time it gets stuck, can you run kill -TTIN <pid of sidekiq process>? It’ll print out a backtrace showing where the code is stuck.
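
To give an idea of what such a dump contains, here is a rough, self-contained sketch that prints the backtrace of every live thread in a plain Ruby process. It is not Sidekiq's own handler; `dump_thread_backtraces` and `stuck_job` are hypothetical names used only for this illustration.

```ruby
require "stringio"

# Write the status and backtrace of every live thread to +io+.
# Hypothetical helper for illustration; Sidekiq ships its own TTIN handler.
def dump_thread_backtraces(io = $stdout)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} (#{thread.status})"
    backtrace = thread.backtrace
    if backtrace && !backtrace.empty?
      io.puts backtrace.join("\n")
    else
      io.puts "<no backtrace available>"
    end
    io.puts
  end
end

# Stand-in for a job that never finishes.
def stuck_job
  sleep
end

worker = Thread.new { stuck_job }
sleep 0.05 until worker.status == "sleep" # wait until the worker is blocked

report = StringIO.new
dump_thread_backtraces(report)
puts report.string # the worker's frames point at stuck_job

worker.kill
```

From Ruby, the equivalent of the shell command is `Process.kill("TTIN", pid)`, where `pid` is the Sidekiq process ID.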

@tgxworld, the trace is too long to be posted here (52390 characters > 40000 character limit). Please advise.

Post it on https://pastebin.com/

Here it is: Sidekiq trace log - Pastebin.com

Autoreloading for Sidekiq, which became available with the Rails 5 upgrade, wasn’t compatible with our code, and that was causing the jobs to get stuck. The main problem is that a job executed by Sidekiq has to run in the same thread as the Sidekiq processor. That wasn’t the case, because we were wrapping each job in a new thread from within Sidekiq itself. Once I figured out what the problem was, the fix was pretty straightforward.

https://github.com/discourse/discourse/commit/59aeb0bc56634edd8a8b35f638c30c014a826004
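
One way to see why thread identity matters: state recorded on the processor thread is invisible from a thread spawned for the job. The sketch below is plain Ruby, not Discourse's or Sidekiq's code; the `:holds_lock` flag and the helper names are hypothetical, standing in for whatever per-thread bookkeeping the reloader keeps (an assumption on my part).

```ruby
# Run the job on the calling ("processor") thread.
def run_inline(job)
  job.call
end

# Run the job on a freshly spawned thread and wait for its result.
def run_in_new_thread(job)
  Thread.new { job.call }.value
end

# Mark the current thread as holding a (pretend) lock, then ask the job
# whether the thread it runs on carries that mark.
def demo
  Thread.current.thread_variable_set(:holds_lock, true)
  job = -> { Thread.current.thread_variable_get(:holds_lock) }
  { inline: run_inline(job), new_thread: run_in_new_thread(job) }
ensure
  Thread.current.thread_variable_set(:holds_lock, nil)
end

p demo
```

The inline run sees the flag, while the run in a new thread gets nil. Any code that decides whether it may proceed based on what the current thread already holds will behave differently in the two cases.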

It also seems like ActiveSupport::Concurrency::ShareLock code in Rails 5 doesn’t have any form of timeout and just waits forever if it can’t acquire the lock.

We need to add a timeout. Infinite timeout on db related stuff is a recipe for suffering, as we have seen several times now…
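
As a generic illustration of the idea, here is a minimal sketch of acquiring a plain Ruby Mutex with a deadline instead of blocking forever. It does not use or patch ShareLock; `lock_with_timeout` and its `poll_interval` parameter are hypothetical names for this example.

```ruby
# Try to acquire +mutex+, giving up after +timeout+ seconds.
# Returns true if the lock was acquired, false if the deadline passed.
def lock_with_timeout(mutex, timeout, poll_interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until mutex.try_lock
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end
  true
end

mutex = Mutex.new
holder = Thread.new { mutex.synchronize { sleep 0.5 } } # keeps the lock busy
sleep 0.01 until mutex.locked?

if lock_with_timeout(mutex, 0.1)
  mutex.unlock
  puts "acquired"
else
  puts "gave up waiting for the lock"
end

holder.join
```

Polling with `try_lock` keeps the sketch short; the point is only that the caller gets control back and can log or raise instead of hanging.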

Yea I’ll get a reproducible script up and open an issue with the Rails team.

It looks like there is something unique about our setup as I couldn’t reproduce it on a fresh Rails app. I’m going to leave this for now as I’ve spent way too much time on this for something that only affects development mode.
