Redis network connection is extremely slow

I keep seeing this in the logs — the values range from ~100k to ~1.35M, but readings around 100k are quite common:

Your Redis network connection is performing extremely poorly. Last RTT readings were [97069, 103986, 98459, 100762, 381617]; ideally these should be < 1000. Ensure Redis is running in the same AZ or datacenter as Sidekiq. If these values are close to 100,000, it can mean your Sidekiq process is CPU overloaded; reduce your concurrency and/or see https://github.com/mperham/sidekiq/discussions/5039
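The RTT values in that warning are in microseconds, so readings near 100,000 mean roughly 100 ms per Redis round trip. A quick conversion, using the numbers from the log line above, shows the scale:

```shell
# The RTT readings in the warning are microseconds; convert to milliseconds.
rtts="97069 103986 98459 100762 381617"
for us in $rtts; do
  echo "${us} us = $((us / 1000)) ms"
done
# The "< 1000" target therefore means sub-millisecond round trips.
```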

Could this indicate that Redis isn't able to get enough CPU? On the server itself, however, there seems to be plenty of CPU and RAM headroom.

Also:

Sidekiq is consuming too much memory (using: 3570.19M) for 'www.example.com', restarting

This is a single-file app.yml setup on Discourse stable 3.3.2.

From app.yml:

UNICORN_SIDEKIQS: 9
DISCOURSE_SIDEKIQ_WORKERS: 5

I also added this configuration on the host:

Info from the Sidekiq dashboard:


It seems that Redis cannot exceed 1024 MB of memory usage.
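If Redis really is capped, checking its configured `maxmemory` would confirm it: `redis-cli config get maxmemory` (run inside the container) reports the limit in bytes, with 0 meaning unlimited. A 1024 MB ceiling would correspond to this byte value (a sketch; the figure below is just the arithmetic, not taken from your instance):

```shell
# 1024 MB expressed in bytes, as "redis-cli config get maxmemory" would
# report it if a 1 GB cap were set (a value of 0 means unlimited).
maxmemory_bytes=1073741824
echo "$((maxmemory_bytes / 1024 / 1024)) MB"
```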

If anyone has any ideas, I'd appreciate it! :meow_heart:

To follow up on this, I’m having the same issue with Jobs::PostAlert:

In current testing, those jobs often run for up to 15 minutes when using 4 Sidekiq processes with 5 (default) threads each. It seems that Sidekiq's jobs-per-second throughput mostly depends on how many of those jobs are running simultaneously and how many threads are left free for the other jobs.

Increasing Sidekiqs to 6 or higher (5 threads each) increases the queue-clearing speed, but Postgres then crashes fairly regularly (I’m guessing from too many Jobs::PostAlert jobs being run simultaneously).

This is on stable 3.3.2. The changes and fixes from the linked thread already seem to be implemented in 3.3.2, if I am not mistaken.

Postgres should never crash; a crash generally indicates a Postgres bug or some sort of larger problem.

Do you have logs?


Have you rebooted the server since making those kernel config changes?

Maybe

lscpu

would also be helpful


You should never bump UNICORN_SIDEKIQS that high, only increase the workers. But

This should never happen.

The possibilities are:

  1. You are constrained on resources, because either
    a) your site has outgrown the server's resources, or
    b) you are misallocating resources
  2. There is a bug somewhere in the stack

I’d start by setting

UNICORN_SIDEKIQS: 1
DISCOURSE_SIDEKIQ_WORKERS: 20

which should free up some RAM on your server.
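For scale, the suggested change also cuts the total Sidekiq thread count while collapsing nine processes into one. A sketch of the arithmetic, assuming total threads is simply processes times workers per process:

```shell
# Total Sidekiq threads: processes x workers-per-process
before=$((9 * 5))    # UNICORN_SIDEKIQS: 9, DISCOURSE_SIDEKIQ_WORKERS: 5
after=$((1 * 20))    # UNICORN_SIDEKIQS: 1, DISCOURSE_SIDEKIQ_WORKERS: 20
echo "before: ${before} threads, after: ${after} threads"
```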

For further information, you will need to run the offending jobs in a PostgreSQL console and report what the bottleneck is.


Apologies for disappearing and thank you for the responses. :slight_smile:

I believe the main reason Redis was slow was that THP (transparent huge pages) was still enabled, when I had thought otherwise:
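For anyone checking the same thing: the current THP setting is exposed under /sys on Linux, and Redis recommends it be disabled. A sketch of the check (the bracketed value in the output is the active setting, so "[never]" means disabled):

```shell
# Show the current transparent huge pages setting on the host.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
  thp_state=$(cat "$thp_file")
else
  thp_state="thp sysfs entry not present on this kernel"
fi
echo "THP: $thp_state"
```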

For PG crashing, the main solution for me was adding this to the app.yml:

docker_args:
  - "--shm-size=34g"

With the value set to db_shared_buffers + 2 GB, where db_shared_buffers is 25% of the total host machine RAM.
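As a worked example of that sizing rule (assuming a 128 GB host, which matches the hardware mentioned later in the thread):

```shell
# db_shared_buffers = 25% of host RAM; shm-size = db_shared_buffers + 2 GB
host_ram_gb=128
db_shared_buffers_gb=$((host_ram_gb / 4))
shm_size_gb=$((db_shared_buffers_gb + 2))
echo "--shm-size=${shm_size_gb}g"
```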

Overriding the default 512m:


I looked back at your posting history, and I see in Very slow Sidekiq issue … massive numbers of unread user notifications that you were running a 32 core 128 GB server, with a very large and active userbase. So in that context, I see why 34G is not such a large number! For context, though, it might be helpful (and interesting) to know the size of your setup - possibly here or even in your bio? (maybe daily and monthly active users, size of database backups, server config in RAM, swap, disk, CPUs.) Maybe even a thread where we just share our stats - large and small.