Too many connections to DB, how to optimize

I have frequent 500 errors caused by exceeding the number of connections to my DB.

This results in errors like this in /logs:

Job exception: FATAL: remaining connection slots are reserved for non-replication superuser connections

My config:

UNICORN_WORKERS: 12
UNICORN_SIDEKIQS: 2
DISCOURSE_DB_POOL: 90

My DB under load performs like this:

How can I optimize this? Please keep in mind that the default PostgreSQL config has max_connections set to 100.
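A quick way to see how close you are to that cap is to ask PostgreSQL directly; this is a sketch assuming you can open a psql shell on the data container:

```sql
-- The configured cap (100 by default)
SHOW max_connections;

-- Current connections grouped by application and state
SELECT application_name, state, count(*)
FROM pg_stat_activity
GROUP BY application_name, state
ORDER BY count(*) DESC;
```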

I have plenty of RAM and CPUs for both web_data container and data container (which is hosted on a second server).

Update: I have a lot of open connections to the DB from sidekiq, far more than there should be (I have just 2 sidekiqs with 5 threads each).


Did you change sidekiq threads manually? I believe the default is 25, and you have 2 processes, so 50 connections from sidekiq alone.

At high load I strongly recommend running pgbouncer. How is your DB configured?


I would have to check, but /sidekiq reports only 5 threads per sidekiq:

Each web runs multiple threads as well, the numbers add up real quick
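Assuming the default of 25 sidekiq threads per process, a rough worst-case count (one connection per unicorn worker and per sidekiq thread; web threads and pool settings push the real number higher) can be done right in psql:

```sql
-- Back-of-envelope: 12 unicorn workers + 2 sidekiq processes * 25 threads,
-- each potentially holding one connection
SELECT 12 + 2 * 25 AS worst_case_connections;  -- 62 of the default 100 slots
```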


I have a data container running on a second server.

I have changed the data.yml to expose db and redis ports and to give some extra memory:

db_shared_buffers: "8GB"
db_work_mem: "100MB"

I have 25 connections from unicorn workers and over 70 from sidekiqs.

Restarting the web_only container doesn’t solve the problem; it seems like the sidekiq connections persist.
Restarting the data container solves the problem for another couple of days.
It seems like there are some issues with sidekiq connections.

Interesting. We are usually shielded from this bug internally because we use pgbouncer, but we will have a look to see if we see similar leaks. :man_farmer:t4:

Can you dig into the “connections” hanging there? Do you have any info on the last query these stalled connections ran? How long have they been stuck without running anything?
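One way to get that info is pg_stat_activity; this sketch lists each non-active connection with its last query and how long it has been sitting in its current state:

```sql
-- One row per connection: who it is, what it last ran, and how long it
-- has been in its current state
SELECT pid, application_name, state,
       now() - state_change AS stuck_for,
       left(query, 80)      AS last_query
FROM pg_stat_activity
WHERE state <> 'active'
ORDER BY stuck_for DESC;
```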

@tgxworld where is our official pg image with bouncer? We need a howto explaining how to use it.


How do you rotate connections with pgbouncer (session pooling or transaction pooling)?
I will try to debug these stalled connections but I think most of them were idle.

We use transaction pooling; session pooling is a no-go with Rails.
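For reference, a minimal pgbouncer config with transaction pooling might look like this; the database name, addresses, pool sizes, and file paths here are placeholders, not Discourse's official settings:

```ini
[databases]
discourse = host=127.0.0.1 port=5432 dbname=discourse

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; session pooling is a no-go with Rails
default_pool_size = 20
max_client_conn = 200
```

The app then connects to port 6432 instead of 5432, and pgbouncer hands each transaction a server connection from the pool, so many idle clients share a few real connections.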


Thanks, I will set this up, but it will make it much harder to diagnose these problems.

After a DB restart, sidekiq connections slowly start piling up again.

Right now I have 3 stuck connections with such queries:

INSERT INTO directory_items(period_type, user_id, likes_received, likes_given, topics_entered, days_visited, posts_read, topic_count, post_count) SELECT 5, u.id, 0, 0, 0, 0, 0, 0, 0 FROM users u LEFT JOIN directory_items di ON di.user_id = u.id AND di.period_type = 5 WHERE di.id IS NULL AND u.id > 0

INSERT INTO directory_items(period_type, user_id, likes_received, likes_given, topics_entered, days_visited, posts_read, topic_count, post_count) SELECT 5, u.id, 0, 0, 0, 0, 0, 0, 0 FROM users u LEFT JOIN directory_items di ON di.user_id = u.id AND di.period_type = 5 WHERE di.id IS NULL AND u.id > 0	

WITH x AS (SELECT u.id user_id, SUM(CASE WHEN ua.action_type = 2 THEN 1 ELSE 0 END) likes_received, SUM(CASE WHEN ua.action_type = 1 THEN 1 ELSE 0 END) likes_given, COALESCE((SELECT COUNT(topic_id) FROM topic_views AS v WHERE v.user_id = u.id AND v.viewed_at >= '2017-07-18 10:08:47.784897'), 0) topics_entered, COALESCE((SELECT COUNT(id) FROM user_visits AS uv WHERE uv.user_id = u.id AND uv.visited_at >= '2017-07-18 10:08:47.784897'), 0) days_visited, COALESCE((SELECT SUM(posts_read) FROM user_visits AS uv2 WHERE uv2.user_id = u.id AND uv2.visited_at >= '2017-07-18 10:08:47.784897'), 0) posts_read, SUM(CASE WHEN ua.action_type = 4 THEN 1 ELSE 0 END) topic_count, SUM(CASE WHEN ua.action_type = 5 THEN 1 ELSE 0 END) post_count FROM users AS u LEFT OUTER JOIN user_actions AS ua ON ua.user_id = u.id LEFT

Curious: how big is your database, and how big is the directory_items table on your instance?

select pg_size_pretty(pg_database_size('discourse'));

returns 55 GB

The directory_items table is 207 MB.
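For anyone wanting to reproduce these numbers, the table size presumably comes from a query along these lines (whether indexes are included depends on using pg_relation_size vs pg_total_relation_size):

```sql
-- Table heap only; pg_total_relation_size('directory_items') would
-- include the indexes as well
SELECT pg_size_pretty(pg_relation_size('directory_items'));

-- Row count
SELECT count(*) FROM directory_items;
```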


And in number of rows?

The record count for the table is 1,698,938.


Just updated the description with our README :slight_smile:

https://hub.docker.com/r/discourse/postgres/
