SELECT COUNT(topics.id) AS count_topics_id, tags.id, tags.name AS tags_id_tags_name
FROM "tags"
LEFT JOIN topic_tags ON tags.id = topic_tags.tag_id
LEFT JOIN topics ON topics.id = topic_tags.topic_id AND topics.deleted_at IS NULL
WHERE (topics.category_id IN (3,5,12,11,9,6,8,1,7,10,2,4))
GROUP BY tags.id, tags.name
ORDER BY count_topics_id DESC
LIMIT 30
This query is generated when the setting “Show a dropdown to filter a topic list by tag” is enabled and “Show tags in alphabetical order” is disabled.
The topic_tags table has more than 3 million records, and the query takes about 4 seconds to run.
Improving this performance is a bit tricky; I think we’ll need to change how that filter works. The unscoped case is easy and can be optimized, since we denormalize the number of times each tag has been used:
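For the unscoped case, a minimal sketch of what the optimized query could look like, assuming a denormalized tags.topic_count column is kept up to date (the column name is an assumption, not taken from the thread):

```sql
-- Sketch only: assumes a denormalized tags.topic_count column.
-- No join against the 3M-row topic_tags table is needed at read time.
SELECT tags.id, tags.name, tags.topic_count
FROM tags
ORDER BY tags.topic_count DESC
LIMIT 30
```

With an index on topic_count this becomes a cheap top-N scan instead of a 3-million-row aggregation.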
But when scoped to a category, we currently re-count how many times every tag has been used in topics in that category and its sub-categories, and then sort by those counts.
The query also restricts results to categories you have permission to see: tags allowed only in private categories shouldn’t be exposed in this filter.
One option is to only show tags used in the category, but sort by their global counts instead of category-specific counts.
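A sketch of that option, assuming the same hypothetical tags.topic_count column and a :category_id bind parameter: the join only filters which tags appear, while the ORDER BY reuses the global count instead of re-aggregating per category.

```sql
-- Sketch only: show tags used in the category, but sort by the
-- global denormalized count rather than a per-category COUNT(*).
SELECT DISTINCT tags.id, tags.name, tags.topic_count
FROM tags
JOIN topic_tags ON topic_tags.tag_id = tags.id
JOIN topics ON topics.id = topic_tags.topic_id
           AND topics.deleted_at IS NULL
WHERE topics.category_id = :category_id
ORDER BY tags.topic_count DESC
LIMIT 30
```

The trade-off is that the displayed ordering no longer reflects usage within the category, only overall popularity.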
I question the entire premise of this: are 3 million rows in topic_tags even a real use case?
I guess we could create a “category_tag” table, keep it up to date, and sort on that. But I really want to see some real-world data here before going crazy.
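A rough sketch of what such a summary table could look like; all names here are assumptions for illustration, not the actual Discourse schema:

```sql
-- Sketch only: summary table keyed by (category_id, tag_id).
CREATE TABLE category_tag_stats (
  category_id integer NOT NULL,
  tag_id      integer NOT NULL,
  topic_count integer NOT NULL DEFAULT 0,
  PRIMARY KEY (category_id, tag_id)
);

-- Periodic refresh: recompute counts in one pass and upsert.
INSERT INTO category_tag_stats (category_id, tag_id, topic_count)
SELECT topics.category_id, topic_tags.tag_id, COUNT(*)
FROM topic_tags
JOIN topics ON topics.id = topic_tags.topic_id
           AND topics.deleted_at IS NULL
GROUP BY topics.category_id, topic_tags.tag_id
ON CONFLICT (category_id, tag_id)
  DO UPDATE SET topic_count = EXCLUDED.topic_count;

-- The dropdown query then becomes an indexed lookup:
SELECT tags.id, tags.name, s.topic_count
FROM category_tag_stats s
JOIN tags ON tags.id = s.tag_id
WHERE s.category_id = :category_id
ORDER BY s.topic_count DESC
LIMIT 30;
```

The expensive aggregation moves into a background job, so the per-request query touches only the rows for one category.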
@david, you’re right: it’s 20 unique tags, with 1–5 tags per topic. @sam, please refer to Tags - Stack Overflow; you’ll see that real-world data could be far more than 3 million records in the topic_tags table.
Do you see any errors in your logs that mention “category_tag_stats”? There’s a scheduled job that initializes the stats needed by the dropdown. It may take a while to run on a large database, but it should have completed by now.