Postgres using 100% of CPU


#1

Hi there, I’m seeing an issue on my docker install. It’s on Linode on Ubuntu 14.04.

Postgres is using 100% of the CPU. The forum itself is working just fine, version v1.6.0.beta7 +156. Doesn’t seem any slower or anything.

Here’s a screen capture from “htop”:

Any pointers on how I can diagnose this and remedy it?

I’ve tried rebuilding with ./launcher rebuild and even cleaning up with ./launcher clean. I’ve tried rebooting the server as well.

I have plenty of disk space left (10GB free), and Discourse is the only thing running on this server.

Thanks in advance for any pointers.


(Dean Taylor) #2

Are there any jobs running?

Try opening the following URL for your site:
/sidekiq


#3

Doesn’t look like anything else is running.


#4

Ok, there is one job in the “scheduler” tab that is running.

Jobs::EnsureDbConsistency 41 minutes ago RUNNING -1ms _scheduler_localhost-app:29119:1:29c83b7cc82551f6a2b801e82a1acb38

Should I just wait that out? Seems like it’s checking the database?


(Dean Taylor) #5

To see exactly what queries are being executed you can run something like the following from the console:

cd /var/discourse/
./launcher enter app
sudo -u postgres psql -c "SELECT * FROM pg_stat_activity WHERE state='active';"

Try running this a couple of times with a 3-4 minute wait in between, capturing the output each time.


#6

Ok, looks like it’s running 3 copies of this query (with 3 different pids)

DELETE FROM user_actions USING user_actions ua2 
WHERE (  user_actions.action_type = ua2.action_type AND
  user_actions.user_id = ua2.user_id AND
  user_actions.acting_user_id = ua2.acting_user_id AND
  user_actions.target_post_id = ua2.target_post_id AND
  user_actions.target_post_id > 0 AND 
  user_actions.id > ua2.id 
  )  

Full copy/paste here: datid | datname | pid | usesysid | usename | application_name - Pastebin.com
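
For what it’s worth, that DELETE looks like a de-duplication pass over user_actions: for each pair of rows that match on (action_type, user_id, acting_user_id, target_post_id), it removes the one with the higher id. A read-only count with the same join condition (my guess at an equivalent check, not something from the job itself) can estimate how many duplicates are still pending:

```sql
-- Count duplicate user_actions rows still pending deletion
-- (read-only; same conditions as the DELETE above).
SELECT count(*)
FROM user_actions
JOIN user_actions ua2
  ON  user_actions.action_type    = ua2.action_type
  AND user_actions.user_id        = ua2.user_id
  AND user_actions.acting_user_id = ua2.acting_user_id
  AND user_actions.target_post_id = ua2.target_post_id
WHERE user_actions.target_post_id > 0
  AND user_actions.id > ua2.id;
```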

I’ll let it run for a while and see what happens. Maybe I’m just being impatient :slight_smile:


(Matt Palmer) #7

Yes, you’ll need to let those queries run, because otherwise the scheduler will just keep spawning them and you’ll be playing whack-a-mole with them forever.


#8

Do they start every x minutes or something like that? I currently have 5 of them running, up from 3 a few hours ago.


#9

Well, this morning we’re up to a load average of 12 :slight_smile:

But it does appear to have moved on from the “DELETE” queries, and is running others. I assume this is some sort of database check.

The forum is still responsive with this load, so I’ll let it proceed for a while. Seems a bit crazy though.


(Sam Saffron) #10

Are you running on our latest docker image? How many posts does your forum have?


#11

Yes, I updated the docker image yesterday, which is when this started happening.

I did

git pull
./launcher cleanup
./launcher rebuild app
./launcher cleanup

As for posts and topics, it has 17001 topics, with 178297 posts. They were imported from a phpBB install last year.

Anyway, it eventually got up to 16 instances of that query running, so I just killed them off. They haven’t returned in a few hours. I’ll check if it starts using CPU when that scheduled task starts again.


#12

So it was running fine for most of yesterday after I killed the processes. This morning they’re back, with 8 of those queries running, and a load average of 8.

Is there something I can do about this? Either not have that scheduled task run, or somehow run it manually? If it’s trying to verify the integrity of the database, can I just do that manually so it doesn’t spawn all these processes when the scheduled task runs?
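
If you do end up killing them by hand again, terminating the backends from inside Postgres is safer than killing the OS processes. A sketch, assuming the same pg_stat_activity view used earlier in this thread:

```sql
-- Cleanly terminate every backend currently running the stuck DELETE.
-- Safer than kill -9 on the postgres processes themselves.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query LIKE 'DELETE FROM user_actions%';
```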


(Rafael dos Santos Silva) #13

If the deletes are taking this long, your tables may be bloated.

EXPLAIN ANALYZE
SELECT * FROM user_actions , user_actions ua2 
WHERE (  user_actions.action_type = ua2.action_type AND
  user_actions.user_id = ua2.user_id AND
  user_actions.acting_user_id = ua2.acting_user_id AND
  user_actions.target_post_id = ua2.target_post_id AND
  user_actions.target_post_id > 0 AND 
  user_actions.id > ua2.id 
  )

The query above may show you where time is being spent the most:

"Nested Loop  (cost=0.84..52037.91 rows=174200175 width=96) (actual time=8612.056..8612.056 rows=0 loops=1)"
"  ->  Index Scan using index_user_actions_on_target_post_id on user_actions  (cost=0.42..3752.75 rows=68381 width=48) (actual time=6.663..38.637 rows=74415 loops=1)"
"        Index Cond: (target_post_id > 0)"
"  ->  Index Scan using index_user_actions_on_user_id_and_action_type on user_actions ua2  (cost=0.42..0.70 rows=1 width=48) (actual time=0.115..0.115 rows=0 loops=74415)"
"        Index Cond: ((user_id = user_actions.user_id) AND (action_type = user_actions.action_type))"
"        Filter: ((user_actions.id > id) AND (user_actions.acting_user_id = acting_user_id) AND (user_actions.target_post_id = target_post_id))"
"        Rows Removed by Filter: 224"
"Total runtime: 8612.127 ms"

The above ran on a 130k-post Discourse instance in 8 seconds.
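
Before vacuuming, you can confirm the bloat by looking at dead-tuple statistics (assuming the default statistics collector is on, which it normally is):

```sql
-- A large n_dead_tup relative to n_live_tup suggests bloat
-- that VACUUM can reclaim.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'user_actions';
```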

Maybe running this can help:

VACUUM VERBOSE ANALYZE user_actions;
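
On a standard Docker install that would look something like this from the host, assuming the database has the default name discourse:

```sql
cd /var/discourse
./launcher enter app
sudo -u postgres psql discourse -c "VACUUM VERBOSE ANALYZE user_actions;"
```

Note the explicit database name: running it against the default postgres database will just report that the table does not exist.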

#14

That seems to have done the trick.

I killed the (16!) running processes, ran that query, and then manually triggered the Jobs::EnsureDbConsistency job from the Sidekiq web interface. This time it took 5 seconds to run.

Fingers crossed.

Thanks everyone, for your assistance. Your time and attention is definitely appreciated!


(Alexander Semyoshin) #15

Hi to everyone,

I have the same problem, and the trick with VACUUM didn’t help.

I have a lot of connections running “DELETE FROM user_actions…” and it seems that this SQL query never ends.

The table is rather big, over 9 million rows.

Any help or advice is highly appreciated.


#16

So I’m getting the exact same problem again, 2 years later :slight_smile:

It’s still getting stuck on the EnsureDbConsistency task, and spawning new ones until the server load is very high.

All of the connections are the same as before:

DELETE FROM user_actions USING user_actions ua2    
WHERE (user_actions.action_type = ua2.action_type AND
 user_actions.user_id = ua2.user_id AND user_actions.acting_user_id = ua2.acting_user_id AND 
 user_actions.target_post_id > 0 AND user_actions.id > ua2.id
 )

I’ve tried the previous solution, the vacuum command… except now it tells me this:

postgres=# VACUUM VERBOSE ANALYZE user_actions;
ERROR:  relation "user_actions" does not exist

Any help would be appreciated. Is there a command I can run on the Postgres command line that will clean up the whole database?

This keeps spawning new jobs but they never seem to finish.


#17

Replying to my own post, because I realized my mistake :slight_smile:

VACUUM VERBOSE ANALYZE only works if I select the discourse database, of course :slight_smile:
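
For anyone landing here later, the working sequence from inside psql is (database name assumed to be the default discourse):

```sql
-- Connect to the Discourse database first, then vacuum:
\c discourse
VACUUM VERBOSE ANALYZE user_actions;
```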