OK, it looks like it’s running 3 copies of this query (with 3 different pids):
DELETE FROM user_actions USING user_actions ua2
WHERE ( user_actions.action_type = ua2.action_type AND
user_actions.user_id = ua2.user_id AND
user_actions.acting_user_id = ua2.acting_user_id AND
user_actions.target_post_id = ua2.target_post_id AND
user_actions.target_post_id > 0 AND
user_actions.id > ua2.id
)
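(For anyone else digging into this: the duplicate backends and their pids show up in pg_stat_activity. A rough query along these lines should list them; this assumes PostgreSQL 9.2 or newer, where the columns are named pid and query:)
-- List the duplicate DELETE backends and how long each has been running
SELECT pid, state, now() - query_start AS running_for, query
FROM pg_stat_activity
WHERE query ILIKE '%DELETE FROM user_actions%'
  AND pid <> pg_backend_pid()   -- skip this session itself
ORDER BY query_start;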
Yes, you’ll need to let those queries run, because otherwise the scheduler will just keep spawning them and you’ll be playing whack-a-mole with them forever.
As for posts and topics, it has 17,001 topics and 178,297 posts. They were imported from a phpBB install last year.
Anyway, it eventually got up to 16 instances of that query running, so I just killed them off. They haven’t returned in a few hours. I’ll check if it starts using CPU when that scheduled task starts again.
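(Side note, in case anyone else wants to do the same: killing the backends from inside Postgres is a bit gentler than kill at the OS level. Roughly, assuming the same pg_stat_activity columns as above:)
-- Ask each duplicate DELETE to cancel; swap in pg_terminate_backend(pid) if cancel is ignored
SELECT pid, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query ILIKE '%DELETE FROM user_actions%'
  AND pid <> pg_backend_pid();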
So it was running fine for most of yesterday after I killed the processes. This morning they’re back, with 8 of those queries running, and a load average of 8.
Is there something I can do about this? Either not have that scheduled task run, or somehow run it manually? If it’s trying to verify the integrity of the database, can I just do that manually so it doesn’t spawn all these processes when the scheduled task runs?
If the deletes are taking this long, your tables may be bloated.
EXPLAIN ANALYZE
SELECT * FROM user_actions , user_actions ua2
WHERE ( user_actions.action_type = ua2.action_type AND
user_actions.user_id = ua2.user_id AND
user_actions.acting_user_id = ua2.acting_user_id AND
user_actions.target_post_id = ua2.target_post_id AND
user_actions.target_post_id > 0 AND
user_actions.id > ua2.id
)
The query above should show you where most of the time is being spent. For comparison, here is the output from another instance:
"Nested Loop (cost=0.84..52037.91 rows=174200175 width=96) (actual time=8612.056..8612.056 rows=0 loops=1)"
" -> Index Scan using index_user_actions_on_target_post_id on user_actions (cost=0.42..3752.75 rows=68381 width=48) (actual time=6.663..38.637 rows=74415 loops=1)"
" Index Cond: (target_post_id > 0)"
" -> Index Scan using index_user_actions_on_user_id_and_action_type on user_actions ua2 (cost=0.42..0.70 rows=1 width=48) (actual time=0.115..0.115 rows=0 loops=74415)"
" Index Cond: ((user_id = user_actions.user_id) AND (action_type = user_actions.action_type))"
" Filter: ((user_actions.id > id) AND (user_actions.acting_user_id = acting_user_id) AND (user_actions.target_post_id = target_post_id))"
" Rows Removed by Filter: 224"
"Total runtime: 8612.127 ms"
The above ran in 8 seconds on a Discourse instance with 130k posts.
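If you want to check the bloat theory directly, the dead-tuple counters are a quick first look. A sketch, run while connected to the Discourse database (not the postgres one); if n_dead_tup dwarfs n_live_tup and last_autovacuum is old or empty, autovacuum hasn't been keeping up:
-- Dead vs. live tuples and the last (auto)vacuum times for the table in question
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'user_actions';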
I killed the (16!) running processes, ran that query, and then manually triggered the Jobs::EnsureDbConsistency job from the Sidekiq web interface. This time it took 5 seconds to run.
Fingers crossed.
Thanks everyone, for your assistance. Your time and attention is definitely appreciated!
So I’m getting the exact same problem again, two years later.
It’s still getting stuck on the EnsureDbConsistency task, which keeps spawning new copies of itself until the server load is very high.
All of the connections are running the same query as before:
DELETE FROM user_actions USING user_actions ua2
WHERE (user_actions.action_type = ua2.action_type AND
user_actions.user_id = ua2.user_id AND user_actions.acting_user_id = ua2.acting_user_id AND
user_actions.target_post_id > 0 AND user_actions.id > ua2.id
)
I’ve tried the previous solution, the vacuum command… except now it tells me this:
postgres=# VACUUM VERBOSE ANALYZE user_actions;
ERROR: relation "user_actions" does not exist
Any help would be appreciated. Is there a command I can run on the Postgres command line that will clean up the whole database?
This keeps spawning new jobs but they never seem to finish.
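One likely reason for the error above: the postgres=# prompt shows the session is connected to the postgres maintenance database, which has no user_actions table; the Discourse tables live in the site's own database. A sketch, assuming the default database name of discourse from a standard Docker install:
\c discourse
-- Vacuum and re-analyze just the problem table, as before
VACUUM VERBOSE ANALYZE user_actions;
-- Or, with no table name, vacuum and analyze every table in the current database
VACUUM VERBOSE ANALYZE;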