Number of scheduled jobs is near 2,000

(Paul Apostolos) #1

In Sidekiq the “Scheduled” tab shows 1,960 scheduled jobs with When = 3 minutes from now.

Many of them seem to be Jobs::RunHeartBeat.

I think this is bad, because some of the site’s functionality seems to be broken (email notifications most notably).

How can I fix this without losing data (pending notifications)?

"Ensure sidekiq is running." when it is definitely running
(Sam Saffron) #2

Simplest immediate fix is to rebuild the container, but this does sound like a bug to me.

(Paul Apostolos) #3

@sam you wanna have a look? I just upgraded Docker Manager and Discourse… Still stuck around 2,000.

(Sam Saffron) #4

I mean from outside…

cd /var/docker
git pull
./launcher rebuild container

(Paul Apostolos) #5

Yeah, I knew what you meant… I just thought that if you wanted to see the bug live while it was happening (if it truly is a bug), you might want to do that before I rebuild.

No worries. I started the rebuild. Thanks.

(Paul Apostolos) #6

I did the git pull and rebuild app (three times).

I even upgraded the server instance to dual core and 4 GB during one of the reboots.

Still the same result: 1,970 scheduled jobs.

(Sam Saffron) #7

Let me log in then. Run “ssh-import-id sam-saffron” as root and PM me the IP address. I will take a look in a couple of hours once I am done with child care (that I am doing a bit poorly).

(Sam Saffron) #8

Looks like a bug with backup restore … somehow your instance is stuck thinking it is in the middle of a backup/restore operation.

@zogstrip we need some guarantee that this cannot happen. I recommend you set the key with a 1 minute expiry and have the job performing the work extend the expiry by a minute every 30 seconds.

That means that if you kill -9 the whole business, the key will go away on its own.
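
A minimal sketch of that expiring-key idea, not the actual Discourse implementation. It assumes a key-value store with per-key expiry (Redis in practice); here a small in-memory stand-in with a fake clock is used so the example runs on its own. The names `ExpiringStore`, `FakeClock` and `RUNNING_KEY` are hypothetical, and the heartbeat is interpreted as resetting the expiry to one minute from now.

```ruby
# Hypothetical in-memory stand-in for a store with per-key expiry.
# A real deployment would use Redis; none of these names come from Discourse.
class ExpiringStore
  def initialize(clock)
    @clock = clock        # callable returning the current time in seconds
    @expires_at = {}
  end

  # Set the key (or refresh it) so that it expires ttl seconds from now.
  def set_with_expiry(key, ttl)
    @expires_at[key] = @clock.call + ttl
  end

  # A key exists only while its deadline is still in the future.
  def exists?(key)
    deadline = @expires_at[key]
    !deadline.nil? && @clock.call < deadline
  end
end

# Fake clock so the example does not need to sleep.
class FakeClock
  def initialize
    @now = 0
  end

  def advance(seconds)
    @now += seconds
  end

  def call
    @now
  end
end

RUNNING_KEY = "backup_restore_running" # hypothetical key name
TTL = 60        # key expires after 1 minute without a heartbeat
HEARTBEAT = 30  # the working job refreshes the key every 30 seconds

clock = FakeClock.new
store = ExpiringStore.new(clock)

# The job starts and marks itself as running.
store.set_with_expiry(RUNNING_KEY, TTL)

# Healthy job: every 30 seconds it pushes the expiry out again.
4.times do
  clock.advance(HEARTBEAT)
  store.set_with_expiry(RUNNING_KEY, TTL)
end
alive_while_working = store.exists?(RUNNING_KEY)

# Simulated kill -9: no more heartbeats and no cleanup code runs.
clock.advance(TTL)
alive_after_kill = store.exists?(RUNNING_KEY)

puts "running while heartbeating: #{alive_while_working}"
puts "running 60s after kill -9:  #{alive_after_kill}"
```

The point of the design is that nothing has to run at shutdown: the "running" state clears itself once the heartbeats stop, so a crashed job cannot leave the instance stuck.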

Workaround to fix issue:

cd /var/docker
./launcher ssh app
rails c
irb> BackupRestore.mark_as_not_running!
irb> Sidekiq.unpause!

Same fix should apply to Sidekiq pausing; it's a very hard problem to solve if it crops up.

Bonus points, I was able to debug this by looking at logs when I ran a backup job

Backup process broken
Number of schedule jobs not going down after backup / restore
(Paul Apostolos) #9

Thank you so much @sam for helping. If it helps… I have noticed this before and @zogstrip helped me last time.

I think it may have something to do with the automated backup to S3. It started shortly after I implemented that.

Here is the previous (related) issue.

(Régis Hanol) #10

This is now done :crocodile:

(Sam Saffron) #11

Closing this, I think it is fixed now.

Feel free to flag to reopen if needed.
