Chat Pitchfork timeouts: replies silently create threads and auto-tracking bloats over time

Our site keeps full chat history, which has been accumulating for a long time now. Recently I had an issue where users were complaining about how long the chat button takes to load or if it even shows up at all! The behaviour also seemed inconsistent between users.

I think I found the issue with the help of Claude Code and it may affect all sites. Below is a report compiled of the issue. It might have implications for load on the entire site since the problem query is triggered so often.

The tldr; is that it seems replies in chat create tracking events that begin to bloat over time (even when threading is off). The query that checks for updates to those tracked messages is slow or gets timed out for many users.

Below is the AI generated report with slight edits from me


The problem

Active users were reporting that chat would spin forever and never load. In the admin logs we saw Pitchfork worker timeouts being dumped with backtraces like:

Pitchfork worker is about to timeout, dumping backtrace for main thread
...
lib/mini_sql/postgres/connection.rb:... MiniSql::Postgres::Connection#query
plugins/chat/app/queries/chat/thread_unreads_query.rb:132  Chat::ThreadUnreadsQuery.call
plugins/chat/app/queries/chat/tracking_state_report_query.rb:71 Chat::TrackingStateReportQuery.call
plugins/chat/app/services/chat/list_user_channels...

The web worker was being killed because a single SQL query (ThreadUnreadsQuery) did not return within the worker timeout. The same backtrace repeated hundreds of times in a day.

On the client side, the failure surfaces as a 500 on /chat/api/me/channels:

/chat/api/me/channels  Failed to load resource: the server responded with a status of 500

That’s the browser-side face of the same Pitchfork timeout — the endpoint that triggers ThreadUnreadsQuery never finishes, the worker is recycled, and the UI never gets the channel list it needs to render chat.


The identified cause, in plain language

Every time a user clicks “Reply” on a specific chat message, Discourse creates (or reuses) a thread to hold that reply — even when the channel has threading disabled. In group DMs, it also auto-adds every DM participant to the new thread’s membership. Over time, every active user accumulates thousands of membership rows in user_chat_thread_memberships without ever intentionally choosing to track anything.

When any of those users opens chat, the core tracking query iterates across all of their tracked threads and runs three correlated subqueries per thread. A user with a few thousand memberships pushes the query past what Postgres + Pitchfork will wait for, and the worker is killed before chat finishes loading.

There is no cleanup mechanism: memberships never get pruned, and nothing in the current plugin code ever sets notification_level = muted/normal on the auto-created rows.


Data

From our production DB at the point of investigation:

  • Stale tracking-level memberships (notification_level = 2) on threads idle 180+ days: 29,309
  • Same rule at 15 days: 41,139
  • Users with >1,500 thread memberships: 11
  • Top individual user: 3,738 memberships — above the MAX_THREADS = 3000 cap in ThreadUnreadsQuery
  • Among those 11 heavy users, effectively 100% of memberships were notification_level = 2 (auto-tracking) — none were user-chosen “watching” (level 3)

Tracking memberships for top users


Which channels were affected

Two distinct paths both produced the same outcome:

  1. Group DMs (chatable_type = DirectMessage, threading_enabled = false)
    On every reply in a group DM, a thread is created and all DM participants are auto-added. This is explicit in create_message.rb (see code below).

  2. Category/public channels with threading_enabled = false
    We found thousands of threads on a category channel whose threading_enabled flag is currently false and has been for over a year — yet threads were still being created today (fresh thread IDs within minutes of running the query). The fetch_thread path in create_message.rb silently builds a thread on any reply without checking the channel’s threading setting.

So “turn threading off” is not effective: the setting hides the dedicated thread UI but does not stop threads from being created under the hood when users use “Reply to message.”


Relevant code

Reply path that silently creates threads — plugins/chat/app/services/chat/create_message.rb:131

def fetch_thread(params:, reply:, channel:, options:)
  return Chat::Thread.find_by(id: params.thread_id) if params.thread_id.present?

  reply.thread ||
    reply.build_thread(
      ...
      force: options.force_thread,
      ...
    )
end

No check against channel.threading_enabled?. In non-DM category channels, a thread is built regardless.

DM auto-enrollment of all participants — same file, ~line 203

if channel.direct_message_channel? && !channel.threading_enabled
  # Add all DM participants to threads so they have memberships
  # for unread tracking and mark-as-read functionality
  channel.chatable.users.each { |user| thread.add(user) }
  thread.membership_for(guardian.user).update!(last_read_message: message_instance)

Every reply in a group DM expands membership by the size of the DM, and those rows are never pruned.

Bottleneck query — plugins/chat/app/queries/chat/thread_unreads_query.rb:14-54

class ThreadUnreadsQuery
  MAX_THREADS = 3000
  ...
  # three correlated subqueries per row of user_chat_thread_memberships:
  #   SELECT (SELECT COUNT(*) ... unread) AS unread_count,
  #          (SELECT COUNT(*) ... mentions) AS mention_count,
  #          (SELECT COUNT(*) ... watched)  AS watched_threads_unread_count,
  #          chat_threads.channel_id, memberships.thread_id
  #   FROM user_chat_thread_memberships AS memberships
  #   ...
  #   WHERE memberships.user_id = :user_id
  #   ...
  #   LIMIT :limit

For N memberships, this produces 3 Ă— N correlated subquery executions, each joining 5+ tables. Once a user exceeds a few thousand memberships, the query reliably blows past the Pitchfork timeout. MAX_THREADS = 3000 is already an acknowledgement of the scaling problem, but users can and do exceed it.

Indexes on user_chat_thread_memberships (user_id, thread_id) UNIQUE and (thread_id, user_id) are present, so this is not a missing-index issue.


Temporary solution

As an immediate mitigation we deleted tracking-level memberships where the underlying thread had no activity in the last 15 days:

DELETE FROM user_chat_thread_memberships uctm
USING chat_threads t
WHERE t.id = uctm.thread_id
  AND uctm.notification_level = 2
  AND NOT EXISTS (
    SELECT 1 FROM chat_messages m
    WHERE m.thread_id = t.id
      AND m.created_at > NOW() - INTERVAL '15 days'
  )

After pruning, that top user dropped from 3,738 → 77 memberships and chat loaded instantly.

This is safe: none of those rows correspond to a user’s explicit choice (they’re all auto-tracking, level 2), and if a user interacts with a pruned thread again, a fresh membership is auto-created on the next reply/read.

Heavy users dropped from 1,500–3,700 memberships to under 150, and chat loads normally again. We plan to schedule this as a periodic job until the underlying behaviour changes upstream.


Suggested directions for a fix

  1. Cap or prune auto-tracking memberships at a sensible horizon (e.g. drop notification_level = 2 rows for threads idle > N days via a scheduled job).
  2. Consider whether the “auto-add every DM participant to every thread” behaviour in group DMs can use a cheaper bookkeeping mechanism than a full user_chat_thread_memberships row per participant per thread.
1 Like