Large forums with 1+ MM posts, how's your Search performing?

Jumanji · April 28, 2020, 3:51pm

We’ve recently had an influx of users and a surge in traffic, and with that, more searches being performed simultaneously.

We’re starting to see search timeouts across about 1.2 million posts.

What are you larger forums doing Search-wise? Have you moved away from the core Discourse search feature, or is it still performing well? Have you perhaps updated the search algorithm or queries to be more restrictive or specific? Done any DB tuning?

Thanks for the comments!

Ghan · April 28, 2020, 4:01pm

What are your server specs and what resources seem most constrained?

I’m looking into potentially moving a board with about 20 million posts (public posts + private conversations) so I’m definitely interested in learning about potential performance issues with large boards.

pfaffman · April 28, 2020, 4:56pm

It’s a long way from 1M posts to 20M posts, so there’s little advice that will apply to both of you. FWIW, I’ve got a site with 9M posts that has acceptable performance on an i7-7500U @ 2.7GHz with 16GB of RAM. There are a few other low-volume sites on the same server.

Another site with 4M posts, using an AWS database with 8GB of RAM, is struggling, especially on searches.

@Jumanji, what are your server specs?

Falco · April 28, 2020, 4:58pm

Jumanji · May 7, 2020, 1:31am

We’re on AWS. The forums are running on 4 instances, I believe t3.larges, so 4 vCPUs and 16 GB RAM. We’re running Kubernetes with a custom Docker image. Our DB is on its own instance, an m4.large, so 2 vCPUs and 8 GB RAM. The RAM on the DB instance could be the bottleneck for sure.

The queries that are timing out are generally the more generic terms, single words. We’ve had some run for longer than 60s. We have also seen an influx of traffic to our website in the past month: traffic jumped 40% month over month. We also moved from v1.9 to v2.4 during that time, and I know that since 1.9 Search has gained autocomplete, so I assume that puts more load on the DB server.
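For anyone wanting to catch slow searches like these while they are still running, one option is to poll `pg_stat_activity`. This is only a minimal sketch using a standard PostgreSQL view, not anything Discourse-specific; the 5-second threshold and the 120-character truncation are arbitrary example values.

```sql
-- List statements that have been running for more than 5 seconds.
-- The 5-second cutoff is an example value, not a recommendation.
SELECT pid,
       now() - query_start AS runtime,
       state,
       left(query, 120)    AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()                      -- skip this monitoring query itself
  AND now() - query_start > interval '5 seconds'
ORDER BY runtime DESC;
```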

We did some query analysis and the following query was running anywhere from 20-60s:

`SELECT "posts".*
FROM "posts"
JOIN (
    SELECT *, row_number() over() row_number
    FROM (
        SELECT topics.id, min(posts.post_number) post_number
        FROM "posts"
        INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id"
        INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL)
        LEFT JOIN categories ON categories.id = topics.category_id
        WHERE 
            ("posts"."deleted_at" IS NULL)
            AND "posts"."post_type" IN (1, 2, 3)
            AND (topics.visible)
            AND (topics.archetype <> 'private_message')
            AND (post_search_data.search_data @@ TO_TSQUERY('english', '''a'':*ABD & ''price'':*ABD'))
            AND (categories.id NOT IN (SELECT categories.id WHERE categories.search_priority = 1))
            AND ((categories.id IS NULL) OR (NOT categories.read_restricted))
        GROUP BY topics.id
        ORDER BY
            MAX((
                TS_RANK_CD(
                    post_search_data.search_data,
                    TO_TSQUERY('english', '''a'':*ABD & ''price'':*ABD'),
                    1|32
                ) * (
                    CASE categories.search_priority
                    WHEN 2
                    THEN 0.6
                    WHEN 3
                    THEN 0.8
                    WHEN 4
                    THEN 1.2
                    WHEN 5
                    THEN 1.4
                    ELSE
                    CASE
                        WHEN topics.closed
                        THEN 0.9
                        ELSE 1
                        END
                    END
                )
            )) DESC,
            topics.bumped_at DESC
        LIMIT 6
        OFFSET 0
    ) xxx
) x ON x.id = posts.topic_id AND x.post_number = posts.post_number
WHERE ("posts"."deleted_at" IS NULL)
ORDER BY row_number;

---

 Sort  (cost=495214.55..495214.56 rows=1 width=1177) (actual time=20529.725..20529.727 rows=6 loops=1)
   Sort Key: (row_number() OVER (?))
   Sort Method: quicksort  Memory: 29kB
   ->  Nested Loop  (cost=495164.08..495214.54 rows=1 width=1177) (actual time=20525.899..20529.703 rows=6 loops=1)
         ->  WindowAgg  (cost=495163.65..495163.80 rows=6 width=16) (actual time=20520.976..20521.078 rows=6 loops=1)
               ->  Limit  (cost=495163.65..495163.66 rows=6 width=24) (actual time=20520.969..20521.063 rows=6 loops=1)
                     ->  Sort  (cost=495163.65..495232.24 rows=27438 width=24) (actual time=20520.967..20520.969 rows=6 loops=1)
                           Sort Key: (max((ts_rank_cd(post_search_data.search_data, '''price'':*ABD'::tsquery, 33) * (CASE categories.search_priority WHEN 2 THEN 0.6 WHEN 3 THEN 0.8 WHEN 4 THEN 1.2 WHEN 5 THEN 1.4 ELSE CASE WHEN topics.closed THEN 0.9 ELSE '1'::numeric END END)::double precision))) DESC, topics.bumped_at DESC
                           Sort Method: top-N heapsort  Memory: 25kB
                           ->  GroupAggregate  (cost=493642.90..494671.83 rows=27438 width=24) (actual time=19082.214..20506.763 rows=32951 loops=1)
                                 Group Key: topics.id
                                 ->  Sort  (cost=493642.90..493711.50 rows=27438 width=400) (actual time=19082.184..19283.907 rows=191436 loops=1)
                                       Sort Key: topics.id
                                       Sort Method: external merge  Disk: 77632kB
                                       ->  Hash Left Join  (cost=36655.60..486646.69 rows=27438 width=400) (actual time=1562.696..18611.724 rows=191436 loops=1)
                                             Hash Cond: (topics.category_id = categories.id)
                                             Filter: (((categories.id IS NULL) OR (NOT categories.read_restricted)) AND (NOT (SubPlan 1)))
                                             ->  Gather  (cost=36645.63..486471.60 rows=58991 width=400) (actual time=1562.623..18249.349 rows=191436 loops=1)
                                                   Workers Planned: 2
                                                   Workers Launched: 2
                                                   ->  Nested Loop  (cost=35645.63..479572.50 rows=24580 width=400) (actual time=1556.547..18541.793 rows=63812 loops=3)
                                                         ->  Hash Join  (cost=35645.20..328688.61 rows=157831 width=25) (actual time=1551.912..13356.416 rows=285279 loops=3)
                                                               Hash Cond: (posts_1.topic_id = topics.id)
                                                               ->  Parallel Seq Scan on posts posts_1  (cost=0.00..286245.92 rows=504559 width=12) (actual time=0.280..11249.160 rows=404770 loops=3)
                                                                     Filter: ((deleted_at IS NULL) AND (post_type = ANY ('{1,2,3}'::integer[])))
                                                                     Rows Removed by Filter: 21884
                                                               ->  Hash  (cost=33938.80..33938.80 rows=92912 width=17) (actual time=1549.103..1549.103 rows=80351 loops=3)
                                                                     Buckets: 65536  Batches: 2  Memory Usage: 2557kB
                                                                     ->  Seq Scan on topics  (cost=0.00..33938.80 rows=92912 width=17) (actual time=0.010..1492.606 rows=80351 loops=3)
                                                                           Filter: ((deleted_at IS NULL) AND visible AND ((archetype)::text <> 'private_message'::text))
                                                                           Rows Removed by Filter: 216751
                                                         ->  Index Scan using posts_search_pkey on post_search_data  (cost=0.43..0.96 rows=1 width=383) (actual time=0.017..0.017 rows=0 loops=855836)
                                                               Index Cond: (post_id = posts_1.id)
                                                               Filter: (search_data @@ '''price'':*ABD'::tsquery)
                                                               Rows Removed by Filter: 1
                                             ->  Hash  (cost=9.43..9.43 rows=43 width=9) (actual time=0.059..0.059 rows=43 loops=1)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                                   ->  Seq Scan on categories  (cost=0.00..9.43 rows=43 width=9) (actual time=0.007..0.045 rows=43 loops=1)
                                             SubPlan 1
                                               ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=191436)
                                                     One-Time Filter: (categories.search_priority = 1)
         ->  Index Scan using index_posts_on_topic_id_and_post_number on posts  (cost=0.43..8.45 rows=1 width=1169) (actual time=1.435..1.435 rows=1 loops=6)
               Index Cond: ((topic_id = topics.id) AND (post_number = (min(posts_1.post_number))))
               Filter: (deleted_at IS NULL)
 Planning time: 3.508 ms
 Execution time: 20541.411 ms
(46 rows)`
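In the plan above, the `@@` match shows up as a `Filter` under an index scan on `posts_search_pkey` that is looped 855836 times, and about 191436 rows survive it, so the term is not very selective. A possible next step, sketched here under the assumption that you can run ad hoc queries against the database, is to list the indexes on `post_search_data` and time the text match on its own. Note that `EXPLAIN (ANALYZE ...)` actually executes the query.

```sql
-- Which indexes exist on the search table (look for one covering search_data)?
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'post_search_data';

-- How many rows match the term by itself, and how does Postgres find them?
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM post_search_data
WHERE search_data @@ to_tsquery('english', '''price'':*ABD');
```

If the isolated match is fast, the time is more likely going into the joins and the scan of `posts` than into the text search itself.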

Falco · May 7, 2020, 1:34am

Is your total database size less than 8GB? You will want to fit as much of your data as possible in RAM.
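A quick way to answer that question, as a sketch using standard PostgreSQL functions and settings (on a managed service the memory settings may be controlled through a parameter group rather than set by hand):

```sql
-- Total on-disk size of the current database.
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;

-- Memory-related settings to compare against the instance's RAM.
SHOW shared_buffers;
SHOW effective_cache_size;
```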

Oh this will be slow. And a lot worse if the disk is a network mount.
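One step in the posted plan that touches disk is the sort reported as `Sort Method: external merge  Disk: 77632kB`. A rough way to see whether more sort memory helps is to raise `work_mem` for a single session and re-run the query. The `128MB` value below is only an example, and since `work_mem` applies per sort operation and per parallel worker, it should be raised with care on an instance with 8 GB of RAM. In this particular plan the sort is a small share of the roughly 20 s total, so the gain may be modest.

```sql
-- Current per-operation sort/hash memory.
SHOW work_mem;

-- Raise it for this session only; 128MB is an example value.
SET work_mem = '128MB';

-- Re-run EXPLAIN (ANALYZE, BUFFERS) on the slow query here and check
-- whether "Sort Method" changes from "external merge" to an in-memory sort.

-- Return to the server default afterwards.
RESET work_mem;
```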

Also check my reply above for tweaks.

Faizan_Zahid · May 7, 2020, 12:20pm

@Jumanji Could you please share links to some high-traffic forum sites running on Discourse? Ones with traffic in the millions.

Jumanji · May 7, 2020, 6:18pm

I’m not part of the Discourse team, but I found this using search.

Jumanji · May 8, 2020, 6:49pm

We’re at 24 GB. Definitely larger than the RAM available to the instance.
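To see where those 24 GB go and how often the search-related tables are served from PostgreSQL's buffer cache, something like the following could be run. It is a sketch using the standard `pg_statio_user_tables` view; the table names are taken from the query earlier in the thread, and the hit ratio only reflects `shared_buffers`, not the operating system's page cache.

```sql
-- Ten largest tables, including their indexes and TOAST data.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

-- Share of heap block requests answered from shared_buffers.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
WHERE relname IN ('posts', 'post_search_data', 'topics');
```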

sam · May 12, 2020, 8:04am

In some cases, after massive moves, running `VACUUM ANALYZE` can improve performance.
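As a sketch of what that might look like for the tables involved in the search query (the table list is an assumption based on the query posted earlier; a plain `VACUUM ANALYZE;` covers the whole database):

```sql
-- When were these tables last vacuumed and analyzed, and how many dead rows do they hold?
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('posts', 'post_search_data', 'topics');

-- Reclaim dead rows and refresh planner statistics.
VACUUM (VERBOSE, ANALYZE) posts;
VACUUM (VERBOSE, ANALYZE) post_search_data;
VACUUM (VERBOSE, ANALYZE) topics;
```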

Also if you are on AWS are you using RDS? If not… why not?

Jumanji · May 12, 2020, 11:56pm

Thanks for that. Yup, we ran that and are on RDS.

Falco · May 14, 2020, 4:05pm

17 posts were split to a new topic: Posting is taking multiple seconds
