Slow tags query when alphabetical sort is disabled


(Simon Wu) #1

Hi,

I found a slow query during the test.

SELECT  COUNT(topics.id) AS count_topics_id, tags.id, tags.name AS tags_id_tags_name
FROM "tags"
LEFT JOIN topic_tags ON tags.id = topic_tags.tag_id
LEFT JOIN topics ON topics.id = topic_tags.topic_id AND topics.deleted_at IS NULL
WHERE (topics.category_id in (3,5,12,11,9,6,8,1,7,10,2,4))
GROUP BY tags.id, tags.name
ORDER BY count_topics_id DESC
LIMIT 30

This query will be effective by enabling Show a dropdown a filter a topic list by tag. and disabling Show tags in alphabetical order. Configuration as below:
image

And the table topic_tags has more than 3 million records. The query will run 4 seconds.

Regards,


(Matt Palmer) #2

What does EXPLAIN show for the query, on your database?


(Simon Wu) #3

Explain


(Sam Saffron) #4

@neil is working on improving tagging here so he may have already addressed this in his dev env.


(Neil Lalonde) #6

Improving this performance is a bit tricky. I think we’ll need to change the functionality of that filter. This case is easy and can be optimized now that we normalize the number of times a tag has been used:

51 PM

But when scoped to a category, we currently re-count how many times every tag has been used in topics in that category and sub-categories and then sort.

10 PM

Also included in the query is which categories you have permission to see. Tags allowed only in private categories shouldn’t be exposed in this filter.

One option is to only show tags used in the category, but sort by their global counts instead of category-specific counts.


(Sam Saffron) #7

I just question the entire premise of this, is 3 million tags even a real use case?

I guess we could create a “category_tag” table and keep it up to date then sort on that. But I really want to deal with some real world data here before going crazy.


(Jeff Atwood) #11

3 million tags isn’t remotely a target in our software. We’re looking at more like 100,000 max certainly FAR under a million.


(David Taylor) #12

I think 3 million in this case is the number of topic-tag relations, rather than the number of unique tags.

So that could be 100,000 tags, with 30 topics attached to each tag. That’s still very large, but not quite as unbelievable as 3 million unique tags.


(Sam Saffron) #13

Well I guess we are going to need a table to track counts of tag per category, then we can cheaply figure this out.


(Simon Wu) #14

@David_Taylor , you’re right. It’s 20 unique tags. And 1~5 tags per topic.
@sam, please refer to Tags - Stack Overflow. You will see the real world data could be way more than 3 million records in topic_tags table.


(Sam Saffron) #15

I know that page, I remember hacking on that


(Simon Wu) #16

Great to hear :slight_smile:


(Neil Lalonde) #17

I committed a fix for this, so please give it a try.


(Simon Wu) #18

Hi Neil. I try the latest bits. But I don’t find the tag dropdown after turning on show filter by tag.
But /tags page looks good.


(Neil Lalonde) #19

Do you see any errors in your logs that mention “category_tag_stats”? There’s a job that runs which will initialize the stats needed by the dropdown. It might take a while to run on your database, but should have completed.


(Simon Wu) #20

Yes, I can see it now. It will take a while since my forum has over 5M topics.