Our site keeps full chat history, which has been accumulating for a long time now. Recently I had an issue where users were complaining about how long the chat button takes to load or if it even shows up at all! The behaviour also seemed inconsistent between users.
I think I found the issue with the help of Claude Code and it may affect all sites. Below is a report compiled of the issue. It might have implications for load on the entire site since the problem query is triggered so often.
The tldr; is that it seems replies in chat create tracking events that begin to bloat over time (even when threading is off). The query that checks for updates to those tracked messages is slow or gets timed out for many users.
Below is the AI generated report with slight edits from me
The problem
Active users were reporting that chat would spin forever and never load. In the admin logs we saw Pitchfork worker timeouts being dumped with backtraces like:
Pitchfork worker is about to timeout, dumping backtrace for main thread
...
lib/mini_sql/postgres/connection.rb:... MiniSql::Postgres::Connection#query
plugins/chat/app/queries/chat/thread_unreads_query.rb:132 Chat::ThreadUnreadsQuery.call
plugins/chat/app/queries/chat/tracking_state_report_query.rb:71 Chat::TrackingStateReportQuery.call
plugins/chat/app/services/chat/list_user_channels...
The web worker was being killed because a single SQL query (ThreadUnreadsQuery) did not return within the worker timeout. The same backtrace repeated hundreds of times in a day.
On the client side, the failure surfaces as a 500 on /chat/api/me/channels:
/chat/api/me/channels Failed to load resource: the server responded with a status of 500
That’s the browser-side face of the same Pitchfork timeout — the endpoint that triggers ThreadUnreadsQuery never finishes, the worker is recycled, and the UI never gets the channel list it needs to render chat.
The identified cause, in plain language
Every time a user clicks “Reply” on a specific chat message, Discourse creates (or reuses) a thread to hold that reply — even when the channel has threading disabled. In group DMs, it also auto-adds every DM participant to the new thread’s membership. Over time, every active user accumulates thousands of membership rows in user_chat_thread_memberships without ever intentionally choosing to track anything.
When any of those users opens chat, the core tracking query iterates across all of their tracked threads and runs three correlated subqueries per thread. A user with a few thousand memberships pushes the query past what Postgres + Pitchfork will wait for, and the worker is killed before chat finishes loading.
There is no cleanup mechanism: memberships never get pruned, and nothing in the current plugin code ever sets notification_level = muted/normal on the auto-created rows.
Data
From our production DB at the point of investigation:
- Stale tracking-level memberships (
notification_level = 2) on threads idle 180+ days: 29,309 - Same rule at 15 days: 41,139
- Users with >1,500 thread memberships: 11
- Top individual user: 3,738 memberships — above the
MAX_THREADS = 3000cap inThreadUnreadsQuery - Among those 11 heavy users, effectively 100% of memberships were
notification_level = 2(auto-tracking) — none were user-chosen “watching” (level 3)
Tracking memberships for top users
Which channels were affected
Two distinct paths both produced the same outcome:
-
Group DMs (
chatable_type = DirectMessage,threading_enabled = false)
On every reply in a group DM, a thread is created and all DM participants are auto-added. This is explicit increate_message.rb(see code below). -
Category/public channels with
threading_enabled = false
We found thousands of threads on a category channel whosethreading_enabledflag is currentlyfalseand has been for over a year — yet threads were still being created today (fresh thread IDs within minutes of running the query). Thefetch_threadpath increate_message.rbsilently builds a thread on any reply without checking the channel’s threading setting.
So “turn threading off” is not effective: the setting hides the dedicated thread UI but does not stop threads from being created under the hood when users use “Reply to message.”
Relevant code
Reply path that silently creates threads — plugins/chat/app/services/chat/create_message.rb:131
def fetch_thread(params:, reply:, channel:, options:)
return Chat::Thread.find_by(id: params.thread_id) if params.thread_id.present?
reply.thread ||
reply.build_thread(
...
force: options.force_thread,
...
)
end
No check against channel.threading_enabled?. In non-DM category channels, a thread is built regardless.
DM auto-enrollment of all participants — same file, ~line 203
if channel.direct_message_channel? && !channel.threading_enabled
# Add all DM participants to threads so they have memberships
# for unread tracking and mark-as-read functionality
channel.chatable.users.each { |user| thread.add(user) }
thread.membership_for(guardian.user).update!(last_read_message: message_instance)
Every reply in a group DM expands membership by the size of the DM, and those rows are never pruned.
Bottleneck query — plugins/chat/app/queries/chat/thread_unreads_query.rb:14-54
class ThreadUnreadsQuery
MAX_THREADS = 3000
...
# three correlated subqueries per row of user_chat_thread_memberships:
# SELECT (SELECT COUNT(*) ... unread) AS unread_count,
# (SELECT COUNT(*) ... mentions) AS mention_count,
# (SELECT COUNT(*) ... watched) AS watched_threads_unread_count,
# chat_threads.channel_id, memberships.thread_id
# FROM user_chat_thread_memberships AS memberships
# ...
# WHERE memberships.user_id = :user_id
# ...
# LIMIT :limit
For N memberships, this produces 3 Ă— N correlated subquery executions, each joining 5+ tables. Once a user exceeds a few thousand memberships, the query reliably blows past the Pitchfork timeout. MAX_THREADS = 3000 is already an acknowledgement of the scaling problem, but users can and do exceed it.
Indexes on user_chat_thread_memberships (user_id, thread_id) UNIQUE and (thread_id, user_id) are present, so this is not a missing-index issue.
Temporary solution
As an immediate mitigation we deleted tracking-level memberships where the underlying thread had no activity in the last 15 days:
DELETE FROM user_chat_thread_memberships uctm
USING chat_threads t
WHERE t.id = uctm.thread_id
AND uctm.notification_level = 2
AND NOT EXISTS (
SELECT 1 FROM chat_messages m
WHERE m.thread_id = t.id
AND m.created_at > NOW() - INTERVAL '15 days'
)
After pruning, that top user dropped from 3,738 → 77 memberships and chat loaded instantly.
This is safe: none of those rows correspond to a user’s explicit choice (they’re all auto-tracking, level 2), and if a user interacts with a pruned thread again, a fresh membership is auto-created on the next reply/read.
Heavy users dropped from 1,500–3,700 memberships to under 150, and chat loads normally again. We plan to schedule this as a periodic job until the underlying behaviour changes upstream.
Suggested directions for a fix
- Cap or prune auto-tracking memberships at a sensible horizon (e.g. drop
notification_level = 2rows for threads idle > N days via a scheduled job). - Consider whether the “auto-add every DM participant to every thread” behaviour in group DMs can use a cheaper bookkeeping mechanism than a full
user_chat_thread_membershipsrow per participant per thread.

