Locking issues in sidekiq jobs?


(Darix) #1

Twice already we have had cases where Sidekiq was at 100% CPU load. If we strace the two processes shown maxing out the CPUs, we get:

[pid 17688] futex(0x1e9b6e4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e9b6e0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b718, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b71c, FUTEX_WAIT_PRIVATE, 759265, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17688] futex(0x1e9b6e4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e9b6e0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b718, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b71c, FUTEX_WAIT_PRIVATE, 759267, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17688] sched_yield()               = 0
[pid 17688] futex(0x1e9b6e4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e9b6e0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b718, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b71c, FUTEX_WAIT_PRIVATE, 759269, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17688] futex(0x1e9b6e4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1e9b6e0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b718, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b71c, FUTEX_WAIT_PRIVATE, 759271, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17688] write(6, "!", 1)            = 1
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b6e0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b6e4, FUTEX_WAIT_PRIVATE, 2478513, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17688] sched_yield()               = 0
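
To make a long trace like this easier to read, here is a minimal sketch that tallies the calls by type and result. It assumes `strace -f` style lines in the format shown above. The names `LINE_RE`, `summarize` and `SAMPLE` are made up for illustration and are not part of Discourse, Sidekiq or strace. A high count of `FUTEX_WAIT_PRIVATE` returning `EAGAIN` means the futex value had already changed by the time the thread tried to sleep on it, which is consistent with threads spinning on a contended lock rather than blocking.

```python
import re
from collections import Counter

# Matches strace -f lines such as:
# [pid 17688] futex(0x1e9b71c, FUTEX_WAIT_PRIVATE, 759265, NULL) = -1 EAGAIN (...)
# Assumption: the "[pid N] call(args) = ret [ERRNO]" layout seen in the trace above.
LINE_RE = re.compile(
    r"^\[pid\s+(?P<pid>\d+)\]\s+(?P<call>\w+)\((?P<args>.*)\)"
    r"\s+=\s+(?P<ret>-?\d+)(?:\s+(?P<errno>[A-Z]+))?"
)


def summarize(lines):
    """Count syscalls by (name, result); futex calls are split by operation."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a syscall line
        call = m.group("call")
        if call == "futex":
            args = [a.strip() for a in m.group("args").split(",")]
            if len(args) > 1:
                call = f"futex:{args[1]}"  # second argument is the futex op
        counts[(call, m.group("errno") or "ok")] += 1
    return counts


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 17688] futex(0x1e9b71c, FUTEX_WAIT_PRIVATE, 759265, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17688] futex(0x1e9b6b0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 17688] sched_yield()               = 0
"""

if __name__ == "__main__":
    for (call, status), n in summarize(SAMPLE.splitlines()).most_common():
        print(f"{n:4d}  {call:28s} {status}")
```

To run it on a saved trace, pass the file's lines to `summarize` instead of `SAMPLE`.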

Any advice on how to debug the issue?

Running: discourse-1.9.0.beta8~git1.3bdade8970


(Darix) #2

A few updates:

  1. Sidekiq didn't show any busy jobs while we saw the high CPU usage.
  2. Jobs::PullHotlinkedImages was the only job scheduled when we checked why the CPU usage was so high.