Performance improvements on long topics?

(Jeff Atwood) #8

Why are you scrolling without reading?

  • If you need to jump to the bottom, click the bottom date on the timeline at the right… or press end on your keyboard.

  • If you need to enter the topic at the bottom, click the last post date in the topic list.


I noticed a couple of months ago that our major off-topic/spam thread was becoming slower than other threads. Yesterday a user finally complained about it. At 188.7k replies it consistently takes 4 seconds or longer to perform any navigation in the thread, while other threads take less than a second.

We’re used to shutting these threads down and creating news ones, as this was a problem on vBulletin as well. I’m not a forum architect so I’m not here to say it can (or should) be fixed, and I don’t know how many other forums have topics with so many replies, but I figured I’d put the information here. This is obviously nonessential, but if you ever achieve all of your hopes for Discourse and want one last thing to work on, I guess there’s this. :joy:

(Jeff Atwood) #10

By default we auto-close topics that reach 10k replies. So did you bypass that default?


Ah, we did. I guess we didn’t consider that there might be consequences. We have three other threads that are longer than 10k posts at the moment, so I guess it’d be good to keep an eye on them.

When auto-closing, is there any sort of thread re-creation that ties the new thread back to the old one? Just curious, as we might re-enable that limit if so. Otherwise we’ll just handle long threads ourselves since they don’t crop up too often.

(Jeff Atwood) #12

Links between topics always connect them, this has been true of Discourse since beta. So just add a link to a post pointing to the new topic.

(ljpp) #13

Whats the performance impact of long threads?

I have one that has escalated to 4.5K very rapidly and will keep escalating for at least 36 hours. Expecting thousands of messages, server is screaming.

Will it help anything to close the thread and start a new one, a “part 2” to continue the discussion? Ping @codinghorror, @sam

Topic Ratings Plugin
(Sam Saffron) #14

I would recommend keeping topics below the 10k count for now. Worth closing and doing a part2 when it reaches that point.

(ljpp) #15

Yeah, I am guessing it will be below 10K and cools down tomorrow midnight (trade deadline). Just wondering if it gives me any benefit to cut it as we have our annual super peak.

(Jeff Atwood) #16

We keep deferring this work, the bandage to stop the bleeding was a new default that auto-closes topics at 10k replies… but the underlying issue is still present.

(Mittineague) #17

Only ignorant brainstorming. If the bottleneck is a large number of posts belonging to a topic, would there be a way to insert a middle layer something like
topic has many blocks
block has many posts
similar to changing a large array into a multidimensional array but for database queries.

(Alan Tan) #18

I stopped this bleeding in

While I was looking through the client side code, I noticed that the client only cares about the posts in the response payload and doesn’t do anything with the stream (all the post ids).

(Jeff Atwood) #19

Maybe only the timeline does, and @sam already optimized that yesterday (by removing date from the timeline), which should also be covered in this topic I think.

Threads with >10k Posts have the date missing on the scrollbar
(Alan Tan) #20

From what I’m seeing the client do with the response I’m 99% sure that we don’t need to send down all the post ids when scrolling.

Maybe @eviltrout can review?

Yup we removed the date in

However, it didn’t actually affect the performance of the query much when we tried it out. The root cause is due to the fact that we’re over selecting the ids here.

Instead of having the client send the post ids that it wants to the server while scrolling, I plan to have the client send the current position and tell the server whether it is looking up/down the stream. I’ve got it working locally but I’ll need to sort out all the other edge cases.

(Jeff Atwood) #21

:+1: I’m fine with a few days spent on this because we have deferred the work for two years now…

(Robin Ward) #22

It’s tricky though, because what about filters? For example best of mode, or when restricting to one particular poster, or when moderators have “gaps” that they can expand to see deleted posts.

There are a lot of edge cases. I’m all for optimizing it, but be prepared for a LOT of regressions and testing if you take this path.

(Alan Tan) #23

I’m not sure about the challenges with filters at the moment but my plan is to move most if not all of the calculations that we’re doing client side into the server side. The logic for scrolling up and down would simply be given the current position which is determined by the post_number ask the server for the next segment of posts/gaps. The server has knowledge of what filters are applied and can query for the segments accordingly. The client would just render what is given by the server. I’m still in the experimental stages but that is how I think the architecture should be.

(Jeff Atwood) #24

Did you want to summarize here tomorrow?

(Alan Tan) #27

There was another spot that has been fixed and should resolve the following problem:

What is the deal with performance as the size of a topic grows?

The first load performance of a topic degrades as a topic gets larger and is due to the fact that we have to send down every single post id when we load a topic. This would naturally raise a red flag for anyone reading this for the first time but after reading through the code base, I’ve concluded that this is a trade off that we’re making in order to be able to have clean and accurate implementation for the topic timeline.


If we look at the two numbers on our topic timeline and translate them into their respective DB queries, we would get the following:

SELECT posts.*
  SELECT posts.*, ROW_NUMBER() OVER () AS row_number 
  FROM posts 
  WHERE posts.deleted_at IS NULL 
  AND posts.topic_id = 1 
  AND posts.post_type in (1, 2, ,3, 4)
) AS posts
FROM posts 
WHERE posts.deleted_at IS NULL 
AND posts.topic_id = 1 AND posts.post_type in (1, 2, ,3, 4)

For the first query, we basically need the DB to fetch all the posts for the given filters before we can figure out the “index” of each post in the stream.

The second query is a count which naturally becomes expensive as the topic grows. A counter cache would not work here because the count changes based on what filters are applied to the topic.

In addition, having the entire stream of post ids present on the client side makes the following features straight forward to implement:

  1. Jumping to a certain index in the stream,
  2. Selecting all posts below a certain index
  3. Fetching excepts while scrolling through a stream.

Attempts were made to re-implement (1) and (3) above to work without the stream of post ids but that either made the feature inaccurate or complicated the client side code so much that I felt it wasn’t worth the trade off

Stopping the bleeding on MEGATOPICs (> 10_000 posts)

Megatopics are expensive to load for three main reasons:

  1. To calculate all the gaps within a topic, a query is run to fetch all the for posts regardless of whether the posts have been deleted or not.
  2. To generate the stream of all post ids, a query is run to fetch all the for posts given a set of filters.
  3. First load time suffers because the client has to download the stream of post ids which has a length that is greater than 10_000.

Our plan to stop the bleeding here is to drop/approximate certain features on Megatopics:

  1. The ability to display the closest date for a given index is dropped on megatopics

  2. Gaps are not supported on megatopics

  3. Loading excerpts while scrolling through the timeline is not supported


  4. The numbers on the timeline becomes an approximation. Instead of the index of the post in the set of possible results, we use Post#post_number of the post. Instead of a count of all the possible results, we use the Topic#highest_post_number.

  5. Jumping to an index on the timeline becomes jumping to the closest post number on the timeline

This may seem like taking a step backwards but do note that MEGATOPICs are quite rate in the wild. We’ve mainly seen them appear on sites with imported content since normal sites would hit the SiteSetting#auto_close_topics_post_count guard that is in place. For more information on why we think Megatopics are bad, you can read @codinghorror’s analysis about it.

(Mittineague) #28

Might there be a way to take advantage of the difference between count(*) and count(field_identier) ?

eg. count(*) counts all rows, whereas count(field_intentier) counts only the fields that are not null.

My thinking is that if each row had something like an is_countable field that was null for rows not wanted to be counted it could simplify the query and hopefully improve performance. The cost being that the field would need to be updated which might offset any benefit it might provide.

(Jeff Atwood) #29

Just confirming one thing. With these megatopic-specific shortcuts in place, performance on megatopics improved by two thirds, correct? So what used to take 1000ms will now take ~333ms with these changes?