Improve performance of Topic load - ORDER BY RANDOM

Continuing the discussion from Slow sql queries:

## The Problem
Clicking between topics is slow for a few reasons, one of which is the generation of suggested topics.

I currently consider this a bug, because the suggested topics query adds roughly 640ms of additional delay to the initial display of a topic, even though it is not critical to that topic's display.

## What’s the impact?
With 40K topics and 430K posts on a 2GB Digital Ocean instance, I’m seeing almost 500ms added to each topic load just from the “random” query for suggested topics.

## Problem Detail
Currently the function TopicQuery::list_suggested_for generates a list of random topics to augment the unread and new topics listed at the bottom of each topic page.

The random element of this uses ORDER BY RANDOM() in the SQL query. That is an easy but slow way of getting a random list of rows, because the database has to generate a random value for every candidate row and sort them all just to return a handful.

Here is the function:
https://github.com/discourse/discourse/blob/738f2cb72c295c63c3581ae69d990344d9b947a8/lib/topic_query.rb#L55-L63
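For context, the random part of that query boils down to something like the following (a rough ActiveRecord paraphrase of the linked code, not the exact source; category_id stands in for the current topic's category):

    # Rough paraphrase only - see the linked function for the real code.
    # category_id is assumed to be the current topic's category.
    Topic.where(visible: true, deleted_at: nil)
         .where("archetype <> 'private_message'")
         .order("CASE WHEN topics.category_id = #{category_id} THEN 0 ELSE 1 END, RANDOM()")
         .limit(5)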

## The Suggestion Queries
As an example, here are the numbers for a slightly faster-than-typical load, perhaps due to caching.

### New Topic Suggestions
The new topic suggestions query is sorted as follows:

ORDER BY CASE WHEN topics.category_id = 79 THEN 0 ELSE 1 END, topics.bumped_at DESC LIMIT 5 

This “new” query takes 120.1 ms.

:straight_ruler: Screen capture of "new" query times

### Random Suggestions

ORDER BY CASE WHEN topics.category_id = 79 THEN 0 ELSE 1 END, RANDOM() LIMIT 5 

This “random” query takes 479.3 ms, and this was a relatively fast example.

:straight_ruler: Screen capture of "random" query times

## Solution / Alternative to ORDER BY RANDOM
There are several alternatives to ORDER BY RANDOM, one of which is this:

Linked from here:

1 Like

I believe @sam talked about adding some intermediate cache tables here. It is more difficult for us because we are doing random within a category, not random across all topics.

The proposed solution is something like this:

  • have a Redis key named after the category, for example suggested:support/wordpress (category.url)
  • if we have N suggested topic slots to fill, try to RPOP N times
  • if a RPOP returns nil (the list ran out), refill it:
    • Query for a thousand topics

       SELECT id FROM topics WHERE category_id = ? ORDER BY RANDOM() LIMIT 1000
      
    • LPUSH them all into the list 100 at a time (LPUSH suggested:support/wordpress 123 54 87 35 78 15 64 79 ...)

    • Expire the key in 2 days or so

  • Continue RPOPing until either we have N unique topic ids or it returns nil again.
  • If RPOP returned nil again, and the category was specified, repeat the fill attempt, but use suggested:@all instead of a category-specific one
  • Pass our topic ID list to Topic.find and then on to the serializer

This is also known as a “tetris random”.
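Here is a minimal Ruby sketch of the list above, assuming a plain redis-rb client and a caller-supplied refill query; the method and key handling are illustrative, not Discourse's actual implementation:

    require "redis"

    # Pop up to n random topic ids for a category, refilling the Redis list
    # from the database once if it runs dry, roughly as described above.
    def random_topic_ids(redis, category_key, n, refill)
      key = "suggested:#{category_key}"
      ids = []
      refilled = false

      while ids.length < n
        id = redis.rpop(key)
        if id
          ids << id.to_i
          ids.uniq!
        elsif refilled
          break  # list ran out twice; caller can fall back to the suggested:@all key
        else
          # Refill with a fresh random batch, 100 ids per LPUSH, expiring in ~2 days.
          fresh = refill.call  # e.g. SELECT id FROM topics WHERE category_id = ? ORDER BY RANDOM() LIMIT 1000
          fresh.each_slice(100) { |batch| redis.lpush(key, batch) }
          redis.expire(key, 2 * 86_400)
          refilled = true
        end
      end

      ids
    end

The fallback described in the last bullets would simply be a second call to the same helper using the suggested:@all key.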

6 Likes

That is actually quite efficient and simple; I like it.

This is now implemented per @riking’s suggestion:

https://github.com/discourse/discourse/commit/b760d22460ef5eef4c81b4ba4fbf1a8cdd384558

We backfill up to 3,000 random items into the list and trigger a background refresh if the list drops below 500 items.

This technique works very efficiently for categories with a large number of topics. If a category is short we round-trip some extra data to Redis, but it’s still more efficient because we stabilize on issuing the backfill in the background.
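A rough sketch of that threshold logic, with illustrative constants and a hypothetical enqueue_backfill job (not the actual names from the linked commit):

    BACKFILL_SIZE  = 3000   # ids pushed into the Redis list by a backfill
    LOW_WATER_MARK = 500    # refresh in the background before the list runs dry

    def pop_suggested_id(redis, key)
      id = redis.rpop(key)
      # Top the list up asynchronously so the request itself never waits on
      # the expensive ORDER BY RANDOM() backfill query.
      # enqueue_backfill is a stand-in for a background job (e.g. Sidekiq),
      # not the actual Discourse method.
      enqueue_backfill(key, BACKFILL_SIZE) if redis.llen(key) < LOW_WATER_MARK
      id && id.to_i
    end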

8 Likes

Thanks so much @sam - this update is a great improvement!

Here is a little more detail about what I get now.

### Initial Random Backfill

Executed in ~64.5ms

### Random Selection

Executed in ~6.2ms

### Total Topic Load Time
The total topic load time is now far lower, at ~390ms to ~450ms.


### In-category ORDER BY topics.bumped_at
This is now the slowest query in a topic load, taking somewhere between ~112ms and ~140ms.

Executed in ~112ms

### Question
Is there any chance of the topics.bumped_at query being cached?

1 Like

Remind me again what the topic / post / user count is for your Discourse dataset? Visiting /about should be fine.

Hmm, surprising; that does not seem that much higher than other sites we have, though all Discourse sites (that are not imports) cannot be too large, as they have not had sufficient years to grow.

I know @sam was testing with a dataset that had 1 million topics (but not so many posts, proportionally) just to experience maximum pain. Surprised it is an issue with 40k…

Does the count on the about page include PMs?

There are also 101K PMs - but those are posts, right?

That’s a lot of PMs…

Why are there 101,000 PMs? Yes, that means your topic count is effectively 140k. Still nowhere near a million.

Users like their PMs - over 11 years of users being active.

Stuff was way worse with a million: on my desktop it was 800ms for that query, which is about 2 seconds on Digital Ocean.

Out of interest in improving the topics.bumped_at list_suggested_for query, I tested adding an index.

### Without Index

Executed in ~112ms

Executed in ~141.3ms

### With Index

Executed in ~13.2ms

Executed in ~18.7ms

### Index tested

CREATE INDEX idx_topic_load_suggested_for
    ON topics
    USING btree
    (deleted_at, visible, archetype COLLATE pg_catalog."default", created_at);

### Tested table / index sizes
topics row count: 66,373
topics table size: 29MB
idx_topic_load_suggested_for index size: 3,488KB

### Thoughts and questions
To me it seems far better to scan only 3.5MB of data instead of 29MB per topic page load, considering how frequently this query runs.

This was just a quick pass at what I thought was a matching index for the query - I’m sure someone with more knowledge of the details could do better.

Is this an index the Discourse Team can get behind and perhaps include?
Or is there a more cached solution that’s needed?
Perhaps just a good stop-gap?

1 Like

Totally open to improving indexing, but in this case I think a filtered index (WHERE deleted_at IS NULL AND visible) should work better.

I assume you mean this?

CREATE INDEX idx_topic_load_not_deleted_and_visible
    ON topics
    USING btree
    (deleted_at, visible)
    WHERE (deleted_at IS NULL AND visible)

I tested this and the index doesn’t get used; it still does a Seq Scan of the table.

Edit: I tried the original index with the addition of the WHERE (deleted_at IS NULL AND visible) constraint; it had no performance or index-size impact as far as I could tell, but it did prevent the table scan.

I think @sam meant this:

CREATE INDEX idx_topic_load_suggested_for
    ON topics
    USING btree
    (archetype COLLATE pg_catalog."default", created_at)
    WHERE (deleted_at IS NULL AND visible);

By default, indexes in Postgres use btree, so we don’t need that; the collation should also be fine, so we don’t need that either. What I want you to test is:

create index idx_topics_suggested_for
on topics (visible, created_at) 
where archetype <> 'private_message' and deleted_at is null

This is a bit odd because we are ordering on bumped_at, so I would assume

create index idx_topics_suggested_for
on topics (bumped_at) 
where archetype <> 'private_message' and deleted_at is null and visible

should work best; however, admins see non-visible topics, so perhaps adding it in makes sense.

What I am really struggling with here is why you would add deleted_at to the actual index columns when the only valid value for it on selection is NULL; we are never ordering on deleted_at.

### Best Results
Yes, this gives the best results:

create index idx_topics_suggested_for
on topics (visible, created_at) 
where archetype <> 'private_message' and deleted_at is null

Index size: 1,248KB
Method: uses an Index Scan
Query time: reduced to ~5.8-6.7ms from ~112-141.3ms

### Other Notes
#### Note 1
The following index is not used for this query, because the join reduces the result set before the bumped_at ordering becomes useful:

create index idx_topics_suggested_for
on topics (bumped_at) 
where archetype <> 'private_message' and deleted_at is null and visible

Index size: 896KB

#### Note 2
The following index causes a “recheck condition” on the topics table via a Bitmap Heap Scan after initially reducing the result set, which makes the total query time slower.

create index idx_topics_suggested_for
on topics (created_at) 
where archetype <> 'private_message' and deleted_at is null and visible

Index size: 896KB

#### Note 3

Yeah, that would be because I’m not used to working with indexes that have WHERE conditions; you don’t need to be ordering on a column for it to help with index lookup performance in those cases.

3 Likes