Oh I see, that will struggle with this query. After updating to latest, can you please extract the query from the mini_profiler and run a explain analyze
on it and share with us?
On Meta (RDS db.r6g.large 30GB) it looks like this:
Oh I see, that will struggle with this query. After updating to latest, can you please extract the query from the mini_profiler and run a explain analyze
on it and share with us?
On Meta (RDS db.r6g.large 30GB) it looks like this:
I’ll try running explain analyze again, but we haven’t been able to get that query to successfully complete.
Update: this commit didn’t change anything. Database still roaring at >95% constantly after update and killing all old running queries. At beta4 it used to run at a 20-30% stable cadence for this forum. We already ran an auto-vacuum and schema reindex as well.
How do we disable this feature? It doesn’t seem right to get a larger database instance to resolve this, there seems to be something wrong with how this query is computed based on the fact it ran great before this update.
Query managed to finish after ~21 minutes
https://explain.depesz.com/s/gs84
Also to note we are on PG 10.17 as it seems like our outputs differ quite a bit.
We monkey-patched the issue by re-routing all */private-messages-all/*
routes to just */private-messages/*
temporarily. The result is that “all inboxes” has the same content as “Personal”, but at least we don’t need to deal with 100% CPU usage constantly this way.
Monkeypatch code:
# name: discourse-private-messages-perf-hotfix
# version: 0.0.1
# authors:
# Prepend to override existing routes
Discourse::Application.routes.prepend do
scope path: nil, constraints: { format: /(json|html|\*\/\*)/ } do
scope "/topics", username: RouteFormat.username do
# Reroute all */private-messages-all/* routes to go to */private-messages/* instead (personal messages)
# Former is expensive, latter is cheap, potentially saves significant database CPU usage
get "private-messages-all/:username" => "list#private_messages", as: "topics_private_messages_override", defaults: { format: :json }
get "private-messages-all-sent/:username" => "list#private_messages_sent", as: "topics_private_messages_sent_override", defaults: { format: :json }
get "private-messages-all-new/:username" => "list#private_messages_new", as: "topics_private_messages_new_override", defaults: { format: :json }
get "private-messages-all-unread/:username" => "list#private_messages_unread", as: "topics_private_messages_unread_override", defaults: { format: :json }
get "private-messages-all-archive/:username" => "list#private_messages_archive", as: "topics_private_messages_archive_override", defaults: { format: :json }
end
end
end
CPU usage of our forum database after deploying the monkeypatch above:
@tgxworld You’ll want to take another look at whatever the */private-messages-all/*
routes are doing, there’s clearly something wrong with how it’s implemented, it’s not efficient enough for large forum databases. Either the implementation is not using the right database index or the query generation is resulting in extremely expensive queries (particular on PSQL 10, not sure about 12/13?).
The current implementation basically crippled our forum, increasing CPU from ~15% to constant 100% and causing slow performance on all other forum features. I don’t see any reason why Personal / group inbox queries take <50ms but the “all inboxes” ones take 20+ minutes to complete.
You can use the analyze dump @forkythetoy posted just above to see the 20m+ run before.
@Falco I just noticed that you merged our topic here, but this actually seems to be about a different endpoint. This bug report is about private-message-topic-tracking-state
endpoint, but we’re talking about */private-messages-all/*
. This might be causing some of the confusion here, my apologies for that. (I initially linked to this one which might have started the mix-up)
The private-message-topic-tracking-state
is fast on our forum so this is not the issue for us.
For us this one is taking ~200-300ms of database time. A bit longer than one may expect but still within normal ranges, yeah.
We’re on Postgres 13, however.
@Hooksmith I have a fix in the pipeline
@Hooksmith @forkythetoy Will you be able to update to PG 13? That is the minimum version that Discourse requires at the moment. It is also harder for me to compare query plans when queries are not executed using the same PG version.
I had to revert this because the performance of the new query differs too much between users.
@blattersturm Are you still seeing topic tracking state perf being slow?
Unsure, haven’t done an upgrade in a few days now. Any commits to take in to see if anything improved?
Nope nothing changed but it you can provide me with an EXPLAIN ANALYZE on the query it’ll be helpful.
Note that we revert the all inboxes for now because of the performance concerns.
This topic was automatically closed after 14 days. New replies are no longer allowed.