Endlessly running Postgres processes & bad performance after Reinstall/restore

In an effort to update our forum I did a fresh VPS install + restore over the weekend.

This was going to solve multiple issues for us:

  • Renew the outdated Ubuntu install
  • Update Discourse
  • Upgrade to Postgres 15

While things generally went fine, afterwards I saw Postgres processes going wild and pinning a core each at 100% CPU, with the number of processes varying. I tried a few things, from a rebuild to restarts. I’m currently trying a rake db:validate_indexes, which has already been running for a few hours without any feedback. I’m not sure whether I’ve run this before and whether it’s supposed to finish quicker.

Using the forum basically works, but it has definitely slowed down. Longer-running tasks, like pulling up the profiles of more active users, take notably longer than usual.

I’m quite sure there are some issues with the database but I’m having a hard time finding out which ones.

I gotta say our database is quite huge - we’re at around 150 GB after the restore and index creation. From monitoring the restore process I did not see any errors, and index creation looked fine to my eyes.

Any idea how to tackle this? It’s 3 postgres processes right now - it was 6 before a restart I did a few hours ago - and right after the restore I’ve even seen all 16 cores in use.

EDIT: I just noticed that 3 sidekiq processes are busy with “indexing categories for search”. Could this all just be the search index rebuilding? If so, can it be handled differently? When we do the live-system restore, this will be a huge problem if it degrades performance like that for multiple hours or even days.

Right now only one sidekiq task is running (“Jobs::BackfillBadge”), but 7 postgres processes are still constantly pegged at 100% CPU. Really curious what’s going on there.
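In case it helps with diagnosing: a query against pg_stat_activity should show what those backends are actually running (a minimal sketch; it assumes the standard Discourse docker layout with psql opened as the postgres user inside the app container):

    -- from the host: ./launcher enter app, then: sudo -u postgres psql discourse
    SELECT pid,
           state,
           now() - query_start AS runtime,
           left(query, 100)    AS current_query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC NULLS LAST;

If the long runners turn out to be search-index updates, that would at least confirm the sidekiq suspicion.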

After moves like this it is a good idea to run a vacuum so the database statistics get refreshed.
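Concretely that is a single command in psql against the Discourse database (a sketch; “discourse” as the database name assumes a stock install):

    -- ANALYZE refreshes the planner statistics that a restore leaves stale;
    -- plain VACUUM also tidies up dead rows, and both run online
    VACUUM VERBOSE ANALYZE;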


How much RAM and CPU do you have?

How much memory do you give Postgres?

That test server has 32 GB of RAM and 16 cores; work_mem is set to 64 MB.

EDIT: shared_buffers is at 8 GB.
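For reference, the values Postgres is actually using can be checked from psql (a sketch; in a stock Discourse docker install these normally come from db_shared_buffers and db_work_mem in containers/app.yml, which is an assumption about this setup):

    SHOW shared_buffers;  -- should report 8GB here
    SHOW work_mem;        -- should report 64MB here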

Currently doing a vacuum, which looks stuck.

Not sure if it’s doing anything, but it’s been sitting there for 30+ minutes already.
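Before killing it I can at least check pg_stat_progress_vacuum to tell “slow but moving” from “actually stuck” (a sketch; the view exists on Postgres 9.6 and later and only covers plain VACUUM, not VACUUM FULL):

    -- one row per backend currently running VACUUM, with its phase and block counters
    SELECT p.pid,
           p.relid::regclass AS table_name,
           p.phase,
           p.heap_blks_scanned,
           p.heap_blks_total
    FROM pg_stat_progress_vacuum p;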

I put the forum into read-only mode and rebooted the VM to end the 7 Postgres processes that were “stuck” before. Shortly after the restart, 2 of these postgres processes were back and didn’t go away. Nothing is running in sidekiq right now.

You really don’t want to run a VACUUM FULL. All you need to get performance back is a VACUUM VERBOSE ANALYZE. You can’t run a FULL on a running site.
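The locking behaviour is the reason (a rough sketch of the difference, not specific to Discourse):

    -- plain VACUUM (optionally with ANALYZE) takes only a SHARE UPDATE EXCLUSIVE lock,
    -- so normal reads and writes keep working while it runs
    VACUUM VERBOSE ANALYZE;
    -- VACUUM FULL rewrites every table under an ACCESS EXCLUSIVE lock,
    -- blocking all queries on that table until it finishes
    -- VACUUM FULL;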


I’m no expert at huge databases, but I’d make the buffers two or three times that.

I’m sure you have indexes that are 8 GB.

:thinking: Postgres recommends never setting shared_buffers over 40% of the machine’s memory?

That said, your server might be underdimensioned.


Aha! Sensible advice from an expert! So maybe I was right that 8 GB/25% isn’t enough, and though 16 GB is more than 40%, it might still be a good suggestion because . . . .


Guys, as said, this is a test server - there is no traffic on it. This server definitely isn’t good enough for production use, but that’s not the issue here. The question is why we see postgres processes sticking around like that (at 100% CPU usage) and slowing things down drastically. We were running the test server with even lower capacity until a few days ago - it was only upsized because it lacked the disk space for a restore.

The production machine is running with 128 GB of RAM and the same shared_buffers setting without any issues - so I don’t think there’s a general problem with these settings and the shared buffer size - especially not on a private test machine without traffic.

But anyway - I will just redo the restore and see whether something went wrong, as there’s seemingly no good explanation for the behaviour.