"Your Redis network connection is performing extremely poorly"

I am consistently getting this in the logs, with values between ~100k and ~1.35m, though readings near 100k seem to be quite common:

Your Redis network connection is performing extremely poorly. Last RTT readings were [97069, 103986, 98459, 100762, 381617], ideally these should be < 1000. Ensure Redis is running in the same AZ or datacenter as Sidekiq. If these values are close to 100,000, that means your Sidekiq process may be CPU-saturated; reduce your concurrency and/or see https://github.com/mperham/sidekiq/discussions/5039
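For what it's worth, a quick way to sanity-check the raw Redis round trip is to measure it from inside the container (this assumes the standard all-in-one container, where Redis listens locally; adjust the host if yours is external). Note that redis-cli --latency reports milliseconds, whereas the Sidekiq readings above look like microseconds:

cd /var/discourse
./launcher enter app
# prints min/max/avg latency; press Ctrl-C to stop sampling
redis-cli -h localhost -p 6379 --latency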

Does this indicate that Redis isn't able to get enough CPU? There seems to be plenty of breathing room for CPU and RAM on the server itself, though.
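The warning text actually points at the Sidekiq process being CPU-saturated rather than Redis itself, and an overall CPU average can look healthy while a single core or a single busy process is pegged. A rough way to check (standard Linux tooling, nothing Discourse-specific):

# top CPU consumers, per thread, with the core each one is running on
ps -eLo pid,psr,pcpu,comm --sort=-pcpu | head -n 15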

Also:
Sidekiq is consuming too much memory (using: 3570.19M) for 'www.example.com', restarting
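In case it is useful, this is roughly how to see which Sidekiq processes are holding that memory (assuming the standard /var/discourse install; the RSS column below is in kilobytes):

cd /var/discourse
./launcher enter app
# with UNICORN_SIDEKIQS: 9 there should be nine sidekiq processes listed
ps -eo pid,rss,args --sort=-rss | grep -i '[s]idekiq'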

This is using the all-in-one app.yml with Discourse stable 3.3.2.

From the app.yml:

UNICORN_SIDEKIQS: 9
DISCOURSE_SIDEKIQ_WORKERS: 5

I added this configuration to the host also: [settings block not shown]

Sidekiq dashboard info: [screenshot]


It does seem like Redis is not able to exceed 1024M of memory usage.
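To confirm whether that 1024M ceiling is an actual maxmemory cap rather than just the size of the working set, Redis can be asked directly from inside the container (these are standard redis-cli commands and INFO fields, not anything Discourse-specific):

redis-cli info memory | grep -E 'used_memory_human|maxmemory_human|maxmemory_policy'
redis-cli config get maxmemory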

If anyone has any ideas, I’d appreciate it! :meow_heart:

To follow up on this, I'm having the same issue with Jobs::PostAlert:

Those jobs often take up to 15 minutes when using 4 Sidekiqs with 5 (default) threads in my current testing. It seems like Sidekiq's jobs-per-second throughput mostly depends on how many of those jobs are running simultaneously and how many threads are left free for the other jobs.

Increasing Sidekiqs to 6 or higher (5 threads each) does increase the queue-clearing speed, but Postgres will crash fairly regularly (I am guessing from too many Jobs::PostAlert jobs running simultaneously).
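One thing that may be worth ruling out first (this is my own guess, not something confirmed by logs yet): each Sidekiq thread holds its own database connection, so 6+ Sidekiqs with 5 threads each, on top of the unicorn workers, could be pushing Postgres toward its connection limit rather than genuinely crashing it. From inside the container:

su postgres -c "psql discourse -c 'show max_connections'"
su postgres -c "psql discourse -c 'select count(*) from pg_stat_activity'"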

This is on Stable 3.3.2. The changes and fixes from the linked thread seem to already be implemented in 3.3.2, if I am not mistaken.

Postgres should never crash; a crash generally indicates a Postgres bug or some sort of larger problem.

Do you have logs?
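If it helps, the usual places to look on a standard /var/discourse install (paths may differ on other setups), plus a host-side check to rule out the kernel OOM killer taking Postgres down rather than a genuine crash:

cd /var/discourse
./launcher logs app | tail -n 200
tail -n 200 shared/standalone/log/rails/production.log
# on the host: was any process OOM-killed?
dmesg -T | grep -i -E 'oom|killed process'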

Have you rebooted the server since making those kernel config changes?
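If not, you can at least confirm whether the values are currently live. Assuming the changes were the usual Redis-related host settings (a guess, since the exact settings weren't shown):

sysctl vm.overcommit_memory vm.swappiness
cat /sys/kernel/mm/transparent_hugepage/enabled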

Maybe the output of lscpu would also be helpful.

You should never bump UNICORN_SIDEKIQS that high; only increase the workers. That aside, what you are describing should never happen.

The possibilities are:

  1. You are constrained on resources (see the quick checks after this list) because either
    a) Your site has outgrown the server resources
    b) You are misallocating resources
  2. There is a bug somewhere in the stack
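A few quick checks help tell 1a apart from 1b (nothing Discourse-specific, just a sketch of what to compare against your app.yml allocations):

nproc                  # CPU cores available to size workers and Sidekiq concurrency against
free -h                # RAM and swap actually in use
df -h /var/discourse   # disk headroom for the shared volume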

I’d start making

UNICORN_SIDEKIQS: 1
DISCOURSE_SIDEKIQ_WORKERS: 20

which should free up some RAM on your server.

For further information, you will need to run the offending jobs' queries in a PostgreSQL console and report what the bottleneck is.
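For reference, a rough way to get there on a standard install (the EXPLAIN hint below is only a placeholder; the actual query has to come from whatever the slow job runs):

cd /var/discourse
./launcher enter app
su postgres -c 'psql discourse'
# inside psql, run the job's query under EXPLAIN (ANALYZE, BUFFERS) ...; to see where the time goes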