Query causing 400% CPU Load

We recently been having CPU overload.
We’ve managed to pinpoint it to one query that keeps running simultaneously for hours (see below).
The task running is postmaster - once all are killed CPU is reduced until it is rebuilt every five minutes.

SELECT “users”.“id”, ARRAY_AGG(ARRAY [uccm.id, c_msg.id]) AS memberships_with_unread_messages
FROM “users”
         INNER JOIN “user_options” ON “user_options”.“user_id” = “users”.“id”
         INNER JOIN user_chat_channel_memberships uccm ON uccm.user_id = users.id
         INNER JOIN chat_channels cc ON cc.id = uccm.chat_channel_id
         INNER JOIN chat_messages c_msg ON c_msg.chat_channel_id = uccm.chat_channel_id
         LEFT OUTER JOIN chat_mentions c_mentions ON c_mentions.chat_message_id = c_msg.id
WHERE “user_options”.“chat_enabled” = TRUE
  AND “user_options”.“chat_email_frequency” = 1
  AND (users.last_seen_at < ‘2023 - 11 - 15 09:54:26.931723’)
  AND (c_msg.deleted_at IS NULL AND c_msg.user_id <> users.id)
  AND (c_msg.created_at > ‘2023 - 11 - 08 10:09:26.931848’)
  AND ((uccm.last_read_message_id IS NULL OR c_msg.id > uccm.last_read_message_id) AND
       (uccm.last_unread_mention_when_emailed_id IS NULL OR c_msg.id > uccm.last_unread_mention_when_emailed_id) AND
       (uccm.user_id = c_mentions.user_id AND uccm.following IS true AND
        cc.chatable_type = ‘Category’

Can anyone help or suggest a solution?

More info:

  • running 3.2.0.beta4-dev, but this has been happening for at least a few months
  • Here’s a list of plugins we have installed:

Thank you!

2 Likes

@andrei worked on optimizing chat queries recently, does this still happen after updating to latest?

4 Likes

I worked on another similar query and that fix was local to that query, so it probably won’t help here. Still, it’s worth updating to latest, there can be other optimizations.

2 Likes

We are on the latest indeed.

I figure it was a chat issue from the query. Anything we can tweak from the settings to fix it?

1 Like

We had a client turn on @all mentions in chat, with two channels of 16k users. That was fun!
After 19 mentions to all they quickly turned it off again. 12 hours later their (beefy) system was still at high CPU load.

The problem was resolved by clearing the 320k offending records from the mentions table

delete from chat_mentions 
where chat_message_id in (
  (select chat_message_id 
   from chat_mentions 
   group by chat_message_id having count(*) > 100)
);

but we were still puzzled why this was causing so much issues.
There were a number of instances of this query still running.

Because the rows were already gone, an explain did not give good results, but it is clear that it was doing some heavy grinding. All indexes are in place.

1 Like

I think someone mentioned that query was intensive yesterday as well:

2 Likes

Thanks - I should learn how to search this forum more effectively.

1 Like

That didn’t solve it for us.

We’ve also seen chat messages disappear for up to a few minutes before reappearing.

Generally I think there’s a lot to improve in the chat, though I totally like the direction it’s going in. My main concern is how a chat can essentially cause an overload that won’t stop. We turned off the ability to mention, and that didn’t work either.

Is there a way to stop the query I mentioned from coming back? @andrei @JammyDodger

TIA

Hey @sok777 , thanks for following up — we’ve got a few engineers looking into this, and we’ll check in with you here once we have an update for you. Appreciate your patience in the meantime!

2 Likes

One trick that works well is Admin - Customize - Watched Words and add both mentions to the “Censor” list.

so ‘all’ and ‘here’ with a prefix of @ ?

Thank you!

1 Like

Great. Thank you!

1 Like

A quick update @lindsey @RGJ

We had one message with @all which we deleted. However this has not solved the issue. The query keeps coming back every 5 minutes and is taking forever to a point that there’s a queue of many other queries.
Once we disabled the chat, the CPU load went down to 15%. Crazy

You need to go into the database and remove the mention records

psql : delete from chat_mentions where chat_message_id = X
where X is the ID of the message you removed.

A more generic query is described in my message here, adjust the number as you see fit.

1 Like

Yes we already tried this.

For reference these are our stats:

users: 239251
chat_channels: 2864
chat_drafts: 205
chat_mentions: 155527 ->7500~
chat_messages: 390453
chat_threads: 25131
chat_message_reactions: 5993
user_chat_channel_memberships: 158480

1 Like

The only reason that query can take long is because of the many entries in the chat_mentions table. Have you tried the generic query I posted? Maybe it’s not @all but another large group that is mentioned.

The problematic query scans for all mentions from chats in the past 7 days.

You should also kill all the existing instances of the query.

Yes we killed then but they keep coming back.
We checked all mentions above 100 and we didn’t have any. Not sure why else it would happen.
Sounds to me like there’s a fundamental issue with the query but I could be wrong.

1 Like

This query is likely slow even with small amount of mentions. I’ll be working on fixing it.

2 Likes

Great. Happy to continue to support from our use case!

2 Likes

@sok777 I’m going to need a bit of your assistance with this. I’d like to take a look at the query plan Postrges uses on your site. Could you please run this against your database and share the results:

EXPLAIN VERBOSE SELECT "users"."id", ARRAY_AGG(ARRAY [uccm.id, c_msg.id]) AS memberships_with_unread_messages
FROM "users"
         INNER JOIN "user_options" ON "user_options"."user_id" = "users"."id"
         INNER JOIN "group_users" ON "group_users"."user_id" = "users"."id"
         INNER JOIN "groups" ON "groups"."id" = "group_users"."group_id"
         INNER JOIN user_chat_channel_memberships uccm ON uccm.user_id = users.id
         INNER JOIN chat_channels cc ON cc.id = uccm.chat_channel_id
         INNER JOIN chat_messages c_msg ON c_msg.chat_channel_id = uccm.chat_channel_id
         LEFT OUTER JOIN chat_mentions c_mentions ON c_mentions.chat_message_id = c_msg.id
WHERE "user_options"."chat_enabled" = TRUE
  AND "user_options"."chat_email_frequency" = 1
  AND (users.last_seen_at < '2023-11-27 08:00:00.0000000')
  AND (c_msg.deleted_at IS NULL AND c_msg.user_id <> users.id)
  AND (c_msg.created_at > '2023-11-20 08:00:00.0000000')
  AND ((uccm.last_read_message_id IS NULL OR c_msg.id > uccm.last_read_message_id) AND
       (uccm.last_unread_mention_when_emailed_id IS NULL OR c_msg.id > uccm.last_unread_mention_when_emailed_id) AND
       (
               (uccm.user_id = c_mentions.user_id AND uccm.following IS true AND cc.chatable_type = 'Category') OR
               (cc.chatable_type = 'DirectMessage')
           )
    )
GROUP BY users.id, uccm.user_id
ORDER BY "users"."id" ASC
LIMIT 1000;
1 Like