"Your Redis network connection is performing extremely poorly"

I am consistently getting this in the logs, with values between ~100k and ~1.35m, though readings near 100k seem to be the most common:

Your Redis network connection is performing extremely poorly. Last RTT readings were [97069, 103986, 98459, 100762, 381617], ideally these should be < 1000. Ensure Redis is running in the same AZ or datacenter as Sidekiq. If these values are close to 100,000, that means your Sidekiq process may be CPU-saturated; reduce your concurrency and/or see https://github.com/mperham/sidekiq/discussions/5039

Does this indicate that Redis isn’t able to get enough CPU? There seems to be plenty of headroom for both CPU and RAM on the server itself, though.
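One way to sanity-check the RTT numbers yourself (a sketch, assuming the standard Discourse install layout under /var/discourse) is to measure latency with redis-cli from inside the container:

```shell
cd /var/discourse
./launcher enter app

# measure round-trip latency to Redis, in milliseconds (Ctrl-C to stop)
redis-cli --latency

# measure the host's intrinsic latency (scheduling/CPU stalls) for 5 seconds;
# high values here point at CPU saturation rather than the network
redis-cli --intrinsic-latency 5
```

If `--intrinsic-latency` reports large spikes, the delay is in the host/process scheduling rather than in Redis or the network, which matches the CPU-saturation hint in the warning message.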

Also:
Sidekiq is consuming too much memory (using: 3570.19M) for 'www.example.com', restarting

This is using the all-in-one app.yml with Discourse stable 3.3.2.

From the app.yml:

UNICORN_SIDEKIQS: 9
DISCOURSE_SIDEKIQ_WORKERS: 5

I also added this configuration to the host:

Sidekiq dashboard info:


It does seem like Redis is unable to exceed 1024M of memory usage.

If anyone has any ideas, I’d appreciate it! :meow_heart:

To follow up on this, I’m having the same issue with Jobs::PostAlert:

Those jobs often run for up to 15 minutes in current testing, using 4 Sidekiqs with 5 (default) threads each. It seems like Sidekiq's jobs-per-second throughput mostly depends on how many of those jobs are running simultaneously and how many threads are left free for the other jobs.

Increasing the Sidekiqs to 6 or higher (5 threads each) does increase the queue-clearing speed, but Postgres then crashes fairly regularly (I am guessing from too many Jobs::PostAlert jobs being run simultaneously).

This is on stable 3.3.2. The changes and fixes from the linked thread already seem to be implemented in 3.3.2, if I am not mistaken.

Postgres should never crash; a crash generally indicates a Postgres bug or some sort of larger problem.

Do you have logs?


Have you rebooted the server since making those kernel config changes?

Maybe the output of lscpu would also be helpful.
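If the kernel config changes included disabling transparent huge pages, the current state can be checked directly (these sysfs paths are standard on most Linux distributions):

```shell
# the bracketed value is the active setting; THP is only off if it reads [never]
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
```

A reading like `[always] madvise never` means THP is still active despite any config edits, which is worth ruling out before rebooting.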


You should never bump UNICORN_SIDEKIQS that high; only increase the workers. But:

This should never happen.

The possibilities are:

  1. You are constrained on resources, because either
    a) your site has outgrown the server resources, or
    b) you are misallocating resources
  2. There is a bug somewhere in the stack

I’d start by setting

UNICORN_SIDEKIQS: 1
DISCOURSE_SIDEKIQ_WORKERS: 20

which should release some RAM from your server.

For further information, you will need to run the offending jobs' queries in a PostgreSQL console and report back on what the bottleneck is.
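As a hedged sketch of how one might look for that bottleneck, assuming the pg_stat_statements extension is available (it is not necessarily enabled on a stock install) and the standard Discourse container layout, the slowest statements can be listed from psql:

```shell
cd /var/discourse
./launcher enter app

# list the ten statements with the highest mean execution time; the database
# name and postgres role are assumptions based on a standard install
su postgres -c "psql discourse -c \"
  SELECT calls,
         round(mean_exec_time::numeric, 1) AS mean_ms,
         left(query, 80) AS query
  FROM pg_stat_statements
  ORDER BY mean_exec_time DESC
  LIMIT 10;\""
```

Any Jobs::PostAlert query that surfaces near the top can then be run manually with `EXPLAIN (ANALYZE, BUFFERS)` to see where the time goes.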


Apologies for disappearing and thank you for the responses. :slight_smile:

I believe the main reason Redis was slow was that transparent huge pages (THP) were still enabled (when I had thought otherwise):
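For reference, one common way to actually turn THP off (an illustrative sketch, not necessarily the exact steps taken here):

```shell
# disable THP for the running kernel (does not survive a reboot)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# to make it persistent, add transparent_hugepage=never to the kernel
# command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub), then
# regenerate the grub config and reboot
```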

For PG crashing, the main solution for me was adding this to the app.yml:

docker_args:
  - "--shm-size=34g"

The value is set to db_shared_buffers + 2 GB, with db_shared_buffers being 25% of the total host machine RAM.
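As a worked example of that formula (the 128 GB figure is assumed for illustration; substitute your host's actual RAM):

```shell
total_ram_gb=128                                # assumed host RAM
db_shared_buffers_gb=$((total_ram_gb / 4))      # 25% of host RAM -> 32
shm_size_gb=$((db_shared_buffers_gb + 2))       # plus 2 GB headroom -> 34
echo "--shm-size=${shm_size_gb}g"               # prints --shm-size=34g
```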

This overrides the default of 512m.


I looked back at your posting history, and I see in Very slow Sidekiq issue … massive numbers of unread user notifications that you were running a 32-core, 128 GB server with a very large and active userbase. In that context, I see why 34G is not such a large number! For context, though, it might be helpful (and interesting) to know the size of your setup, possibly here or even in your bio (maybe daily and monthly active users, database backup size, and server config: RAM, swap, disk, CPUs). Maybe even a thread where we all just share our stats, large and small.