Slow Profile Loads with 100GB+ database

Use the queries you listed in the OP. Run those in a psql shell prepended by EXPLAIN ANALYZE and paste the output here.

2 Likes

Looks like there’s some caching at work so subsequent loads tend to be faster, but > 5 seconds is still brutal when the cache expires.

                                                                                             QUERY PLAN                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=26396.41..26397.10 rows=6 width=1278) (actual time=90.879..114.499 rows=6 loops=1)
   ->  Gather Merge  (cost=26396.41..26448.85 rows=456 width=1278) (actual time=90.877..114.496 rows=6 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=25396.40..25397.54 rows=456 width=1278) (actual time=87.195..87.197 rows=5 loops=2)
               Sort Key: posts.like_count DESC, posts.created_at DESC
               Sort Method: top-N heapsort  Memory: 38kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 39kB
               ->  Nested Loop  (cost=103.30..25388.23 rows=456 width=1278) (actual time=3.200..83.497 rows=3610 loops=2)
                     ->  Parallel Bitmap Heap Scan on posts  (cost=99.06..9509.95 rows=1356 width=783) (actual time=3.075..56.880 rows=5932 loops=2)
                           Recheck Cond: (user_id = 27510)
                           Filter: ((deleted_at IS NULL) AND (post_number > 1) AND (post_type = ANY ('{1,2,3,4}'::integer[])))
                           Rows Removed by Filter: 1071
                           Heap Blocks: exact=6272
                           ->  Bitmap Index Scan on index_posts_on_user_id_and_created_at  (cost=0.00..98.48 rows=2389 width=0) (actual time=3.916..3.917 rows=20157 loops=1)
                                 Index Cond: (user_id = 27510)
                     ->  Index Scan using topics_pkey on topics  (cost=4.24..11.71 rows=1 width=495) (actual time=0.004..0.004 rows=1 loops=11864)
                           Index Cond: (id = posts.topic_id)
                           Filter: ((deleted_at IS NULL) AND (deleted_at IS NULL) AND visible AND ((archetype)::text <> 'private_message'::text) AND ((category_id IS NULL) OR (hashed SubPlan 1)))
                           Rows Removed by Filter: 0
                           SubPlan 1
                             ->  Seq Scan on categories  (cost=0.00..3.74 rows=28 width=4) (actual time=0.013..0.029 rows=35 loops=2)
                                   Filter: ((NOT read_restricted) OR (id = ANY ('{3,4,5,17,19,25,26,27,28}'::integer[])))
 Planning Time: 2.535 ms
 Execution Time: 114.657 ms
(25 rows)
                                                                                          QUERY PLAN                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=25004.87..25004.87 rows=1 width=1464) (actual time=98.136..121.987 rows=6 loops=1)
   ->  Sort  (cost=25004.87..25004.87 rows=1 width=1464) (actual time=98.134..121.984 rows=6 loops=1)
         Sort Key: topic_links.clicks DESC, topic_links.created_at DESC
         Sort Method: top-N heapsort  Memory: 55kB
         ->  Nested Loop  (cost=1103.75..25004.86 rows=1 width=1464) (actual time=6.763..118.114 rows=3443 loops=1)
               ->  Gather  (cost=1099.51..24993.34 rows=1 width=969) (actual time=6.464..88.294 rows=9130 loops=1)
                     Workers Planned: 1
                     Workers Launched: 1
                     ->  Nested Loop  (cost=99.51..23993.24 rows=1 width=969) (actual time=3.151..73.939 rows=4565 loops=2)
                           ->  Parallel Bitmap Heap Scan on posts  (cost=99.08..9506.46 rows=1405 width=783) (actual time=3.032..43.325 rows=7003 loops=2)
                                 Recheck Cond: (user_id = 27510)
                                 Filter: ((deleted_at IS NULL) AND (post_type = ANY ('{1,2,3,4}'::integer[])))
                                 Heap Blocks: exact=5953
                                 ->  Bitmap Index Scan on index_posts_on_user_id_and_created_at  (cost=0.00..98.48 rows=2389 width=0) (actual time=3.790..3.791 rows=20157 loops=1)
                                       Index Cond: (user_id = 27510)
                           ->  Index Scan using index_topic_links_on_post_id on topic_links  (cost=0.43..10.30 rows=1 width=186) (actual time=0.003..0.004 rows=1 loops=14006)
                                 Index Cond: (post_id = posts.id)
                                 Filter: ((NOT internal) AND (NOT reflection) AND (NOT quote) AND (user_id = 27510))
                                 Rows Removed by Filter: 0
               ->  Index Scan using topics_pkey on topics  (cost=4.24..11.52 rows=1 width=495) (actual time=0.003..0.003 rows=0 loops=9130)
                     Index Cond: (id = topic_links.topic_id)
                     Filter: ((deleted_at IS NULL) AND (deleted_at IS NULL) AND visible AND ((archetype)::text <> 'private_message'::text) AND ((category_id IS NULL) OR (hashed SubPlan 1)))
                     Rows Removed by Filter: 1
                     SubPlan 1
                       ->  Seq Scan on categories  (cost=0.00..3.74 rows=28 width=4) (actual time=0.009..0.022 rows=35 loops=1)
                             Filter: ((NOT read_restricted) OR (id = ANY ('{3,4,5,17,19,25,26,27,28}'::integer[])))
 Planning Time: 1.102 ms
 Execution Time: 122.098 ms
(28 rows)
SELECT "posts"."id" AS t0_r0, "posts"."user_id" AS t0_r1, "posts"."topic_id" AS t0_r2, "posts"."post_number" AS t0_r3, "posts"."raw" AS t0_r4, "posts"."cooked" AS t0_r5, "posts"."created_at" AS t0_r6, "posts"."updated_at" AS t0_r7, "posts"."reply_to_post_number" AS t0_r8, "posts"."reply_count" AS t0_r9, "posts"."quote_count" AS t0_r10, "posts"."deleted_at" AS t0_r11, "posts"."off_topic_count" AS t0_r12, "posts"."like_count" AS t0_r13, "posts"."incoming_link_count" AS t0_r14, "posts"."bookmark_count" AS t0_r15, "posts"."score" AS t0_r16, "posts"."reads" AS t0_r17, "posts"."post_type" AS t0_r18, "posts"."sort_order" AS t0_r19, "posts"."last_editor_id" AS t0_r20, "posts"."hidden" AS t0_r21, "posts"."hidden_reason_id" AS t0_r22, "posts"."notify_moderators_count" AS t0_r23, "posts"."spam_count" AS t0_r24, "posts"."illegal_count" AS t0_r25, "posts"."inappropriate_count" AS t0_r26, "posts"."last_version_at" AS t0_r27, "posts"."user_deleted" AS t0_r28, "posts"."reply_to_user_id" AS t0_r29, "posts"."percent_rank" AS t0_r30, "posts"."notify_user_count" AS t0_r31, "posts"."like_score" AS t0_r32, "posts"."deleted_by_id" AS t0_r33, "posts"."edit_reason" AS t0_r34, "posts"."word_count" AS t0_r35, "posts"."version" AS t0_r36, "posts"."cook_method" AS t0_r37, "posts"."wiki" AS t0_r38, "posts"."baked_at" AS t0_r39, "posts"."baked_version" AS t0_r40, "posts"."hidden_at" AS t0_r41, "posts"."self_edits" AS t0_r42, "posts"."reply_quoted" AS t0_r43, "posts"."via_email" AS t0_r44, "posts"."raw_email" AS t0_r45, "posts"."public_version" AS t0_r46, "posts"."action_code" AS t0_r47, "posts"."locked_by_id" AS t0_r48, "posts"."image_upload_id" AS t0_r49, "topics"."id" AS t1_r0, "topics"."title" AS t1_r1, "topics"."last_posted_at" AS t1_r2, "topics"."created_at" AS t1_r3, "topics"."updated_at" AS t1_r4, "topics"."views" AS t1_r5, "topics"."posts_count" AS t1_r6, "topics"."user_id" AS t1_r7, "topics"."last_post_user_id" AS t1_r8, "topics"."reply_count" AS t1_r9, "topics"."featured_user1_id" AS t1_r10, "topics"."featured_user2_id" AS t1_r11, "topics"."featured_user3_id" AS t1_r12, "topics"."deleted_at" AS t1_r13, "topics"."highest_post_number" AS t1_r14, "topics"."like_count" AS t1_r15, "topics"."incoming_link_count" AS t1_r16, "topics"."category_id" AS t1_r17, "topics"."visible" AS t1_r18, "topics"."moderator_posts_count" AS t1_r19, "topics"."closed" AS t1_r20, "topics"."archived" AS t1_r21, "topics"."bumped_at" AS t1_r22, "topics"."has_summary" AS t1_r23, "topics"."archetype" AS t1_r24, "topics"."featured_user4_id" AS t1_r25, "topics"."notify_moderators_count" AS t1_r26, "topics"."spam_count" AS t1_r27, "topics"."pinned_at" AS t1_r28, "topics"."score" AS t1_r29, "topics"."percent_rank" AS t1_r30, "topics"."subtype" AS t1_r31, "topics"."slug" AS t1_r32, "topics"."deleted_by_id" AS t1_r33, "topics"."participant_count" AS t1_r34, "topics"."word_count" AS t1_r35, "topics"."excerpt" AS t1_r36, "topics"."pinned_globally" AS t1_r37, "topics"."pinned_until" AS t1_r38, "topics"."fancy_title" AS t1_r39, "topics"."highest_staff_post_number" AS t1_r40, "topics"."featured_link" AS t1_r41, "topics"."reviewable_score" AS t1_r42, "topics"."image_upload_id" AS t1_r43, "topics"."slow_mode_seconds" AS t1_r44 FROM "posts" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) WHERE ("posts"."deleted_at" IS NULL) AND (posts.post_type IN (1,2,3,4)) AND ("topics"."deleted_at" IS NULL) AND (topics.archetype <> 'private_message') AND "topics"."visible" = TRUE AND (topics.category_id IS NULL OR topics.category_id IN (SELECT id FROM categories WHERE NOT read_restricted OR id IN (3,4,5,17,19,25,26,27,28))) AND "posts"."user_id" = 27510 AND (post_number > 1) ORDER BY posts.like_count DESC, posts.created_at DESC LIMIT 6;
SELECT "topic_links"."id" AS t0_r0, "topic_links"."topic_id" AS t0_r1, "topic_links"."post_id" AS t0_r2, "topic_links"."user_id" AS t0_r3, "topic_links"."url" AS t0_r4, "topic_links"."domain" AS t0_r5, "topic_links"."internal" AS t0_r6, "topic_links"."link_topic_id" AS t0_r7, "topic_links"."created_at" AS t0_r8, "topic_links"."updated_at" AS t0_r9, "topic_links"."reflection" AS t0_r10, "topic_links"."clicks" AS t0_r11, "topic_links"."link_post_id" AS t0_r12, "topic_links"."title" AS t0_r13, "topic_links"."crawled_at" AS t0_r14, "topic_links"."quote" AS t0_r15, "topic_links"."extension" AS t0_r16, "topics"."id" AS t1_r0, "topics"."title" AS t1_r1, "topics"."last_posted_at" AS t1_r2, "topics"."created_at" AS t1_r3, "topics"."updated_at" AS t1_r4, "topics"."views" AS t1_r5, "topics"."posts_count" AS t1_r6, "topics"."user_id" AS t1_r7, "topics"."last_post_user_id" AS t1_r8, "topics"."reply_count" AS t1_r9, "topics"."featured_user1_id" AS t1_r10, "topics"."featured_user2_id" AS t1_r11, "topics"."featured_user3_id" AS t1_r12, "topics"."deleted_at" AS t1_r13, "topics"."highest_post_number" AS t1_r14, "topics"."like_count" AS t1_r15, "topics"."incoming_link_count" AS t1_r16, "topics"."category_id" AS t1_r17, "topics"."visible" AS t1_r18, "topics"."moderator_posts_count" AS t1_r19, "topics"."closed" AS t1_r20, "topics"."archived" AS t1_r21, "topics"."bumped_at" AS t1_r22, "topics"."has_summary" AS t1_r23, "topics"."archetype" AS t1_r24, "topics"."featured_user4_id" AS t1_r25, "topics"."notify_moderators_count" AS t1_r26, "topics"."spam_count" AS t1_r27, "topics"."pinned_at" AS t1_r28, "topics"."score" AS t1_r29, "topics"."percent_rank" AS t1_r30, "topics"."subtype" AS t1_r31, "topics"."slug" AS t1_r32, "topics"."deleted_by_id" AS t1_r33, "topics"."participant_count" AS t1_r34, "topics"."word_count" AS t1_r35, "topics"."excerpt" AS t1_r36, "topics"."pinned_globally" AS t1_r37, "topics"."pinned_until" AS t1_r38, "topics"."fancy_title" AS t1_r39, "topics"."highest_staff_post_number" AS t1_r40, "topics"."featured_link" AS t1_r41, "topics"."reviewable_score" AS t1_r42, "topics"."image_upload_id" AS t1_r43, "topics"."slow_mode_seconds" AS t1_r44, "posts"."id" AS t2_r0, "posts"."user_id" AS t2_r1, "posts"."topic_id" AS t2_r2, "posts"."post_number" AS t2_r3, "posts"."raw" AS t2_r4, "posts"."cooked" AS t2_r5, "posts"."created_at" AS t2_r6, "posts"."updated_at" AS t2_r7, "posts"."reply_to_post_number" AS t2_r8, "posts"."reply_count" AS t2_r9, "posts"."quote_count" AS t2_r10, "posts"."deleted_at" AS t2_r11, "posts"."off_topic_count" AS t2_r12, "posts"."like_count" AS t2_r13, "posts"."incoming_link_count" AS t2_r14, "posts"."bookmark_count" AS t2_r15, "posts"."score" AS t2_r16, "posts"."reads" AS t2_r17, "posts"."post_type" AS t2_r18, "posts"."sort_order" AS t2_r19, "posts"."last_editor_id" AS t2_r20, "posts"."hidden" AS t2_r21, "posts"."hidden_reason_id" AS t2_r22, "posts"."notify_moderators_count" AS t2_r23, "posts"."spam_count" AS t2_r24, "posts"."illegal_count" AS t2_r25, "posts"."inappropriate_count" AS t2_r26, "posts"."last_version_at" AS t2_r27, "posts"."user_deleted" AS t2_r28, "posts"."reply_to_user_id" AS t2_r29, "posts"."percent_rank" AS t2_r30, "posts"."notify_user_count" AS t2_r31, "posts"."like_score" AS t2_r32, "posts"."deleted_by_id" AS t2_r33, "posts"."edit_reason" AS t2_r34, "posts"."word_count" AS t2_r35, "posts"."version" AS t2_r36, "posts"."cook_method" AS t2_r37, "posts"."wiki" AS t2_r38, "posts"."baked_at" AS t2_r39, "posts"."baked_version" AS t2_r40, "posts"."hidden_at" AS t2_r41, "posts"."self_edits" AS t2_r42, "posts"."reply_quoted" AS t2_r43, "posts"."via_email" AS t2_r44, "posts"."raw_email" AS t2_r45, "posts"."public_version" AS t2_r46, "posts"."action_code" AS t2_r47, "posts"."locked_by_id" AS t2_r48, "posts"."image_upload_id" AS t2_r49 FROM "topic_links" INNER JOIN "topics" ON "topics"."id" = "topic_links"."topic_id" AND ("topics"."deleted_at" IS NULL) INNER JOIN "posts" ON "posts"."id" = "topic_links"."post_id" AND ("posts"."deleted_at" IS NULL) WHERE "posts"."user_id" = 27510 AND (posts.post_type IN (1,2,3,4)) AND ("topics"."deleted_at" IS NULL) AND (topics.archetype <> 'private_message') AND "topics"."visible" = TRUE AND (topics.category_id IS NULL OR topics.category_id IN (SELECT id FROM categories WHERE NOT read_restricted OR id IN (3,4,5,17,19,25,26,27,28))) AND "topic_links"."user_id" = 27510 AND "topic_links"."internal" = FALSE AND "topic_links"."reflection" = FALSE AND "topic_links"."quote" = FALSE ORDER BY clicks DESC, topic_links.created_at DESC LIMIT 6; 

Both are using memory for temp files, and the execution times are good enough. Maybe it’s stabilizing?

I think this analyze might be pulling some cached data. The summary page from the site shows much longer execution times:

I’ll try rerunning the analyze later once the cache has maybe expired.

The reindex should only need to run once - after the indexes are on the v13 disk format, they stay in v13 disk format.

Yes, if your VACUUM didn’t include ANALYZE you’re missing out on index usage.

2 Likes

Okay, so this query is fetching the most-liked posts for a given user.

The estimate here was way off - the planner estimated for 2,389 rows and got 20,157 rows.

One level up, another x5 discrepancy.

That’s the kind of misestimations that will cause terrible query performance.

2 Likes

Well, they should never have been on anything but the v13 disk format as far as I know. We imported the data from another forum software to a fresh Discourse install that was already running v13. Right now just trying to make sure all the indexes are properly rebuilt after the import.

Makes sense. I’m trying to run the reindex again after increasing maintenance_work_mem. It looks like postmaster is using more memory now than previous but I still don’t know if it is going to complete in a reasonable amount of time.

2 Likes

Looks like there’s a badge job running that is holding up the reindex.

This has been running for over two hours? That seems odd.

Could be something in this weekly job that’s still running as well:

You might do a

sv stop unicorn

to stop the web server during the database reindex.

1 Like

Ok, so the reindex completed finally in a reasonable amount of time. There were some indexes it said it couldn’t rebuild and skipped, but no further details as to why. Here are the table stats now:

The profile load issue appears to have gotten worse:

Here are the EXPLAINs from those queries:

                                                                                                QUERY PLAN                               
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=625609.73..625610.43 rows=6 width=1278) (actual time=2058.211..2154.991 rows=6 loops=1)
   ->  Gather Merge  (cost=625609.73..630928.70 rows=45588 width=1278) (actual time=1654.881..1751.658 rows=6 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=624609.70..624666.69 rows=22794 width=1278) (actual time=1619.464..1619.469 rows=5 loops=3)
               Sort Key: posts.like_count DESC, posts.created_at DESC
               Sort Method: top-N heapsort  Memory: 37kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 37kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 38kB
               ->  Parallel Hash Join  (cost=63000.65..624201.13 rows=22794 width=1278) (actual time=1383.820..1600.913 rows=20578 loops=3)
                     Hash Cond: (posts.topic_id = topics.id)
                     ->  Parallel Bitmap Heap Scan on posts  (cost=1875.97..562899.73 rows=67322 width=783) (actual time=63.310..264.627 rows=20579 loops=3)
                           Recheck Cond: ((user_id = 7237) AND (deleted_at IS NULL))
                           Filter: ((post_number > 1) AND (post_type = ANY ('{1,2,3,4}'::integer[])))
                           Rows Removed by Filter: 39566
                           Heap Blocks: exact=50390
                           ->  Bitmap Index Scan on idx_posts_user_id_deleted_at  (cost=0.00..1835.58 rows=167352 width=0) (actual time=36.507..36.508 rows=180435 loops=1)
                                 Index Cond: (user_id = 7237)
                     ->  Parallel Hash  (cost=59504.20..59504.20 rows=129638 width=495) (actual time=1319.362..1319.364 rows=131785 loops=3)
                           Buckets: 524288  Batches: 1  Memory Usage: 190560kB
                           ->  Parallel Seq Scan on topics  (cost=3.81..59504.20 rows=129638 width=495) (actual time=316.007..1204.917 rows=131785 loops=3)
                                 Filter: ((deleted_at IS NULL) AND (deleted_at IS NULL) AND visible AND ((archetype)::text <> 'private_message'::text) AND ((category_id IS NULL) OR (hashed SubPlan 1)))
                                 Rows Removed by Filter: 174529
                                 SubPlan 1
                                   ->  Seq Scan on categories  (cost=0.00..3.74 rows=28 width=4) (actual time=17.841..17.887 rows=35 loops=3)
                                         Filter: ((NOT read_restricted) OR (id = ANY ('{3,4,5,17,19,25,26,27,28}'::integer[])))
 Planning Time: 0.751 ms
 JIT:
   Functions: 91
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 11.057 ms, Inlining 134.342 ms, Optimization 762.277 ms, Emission 451.708 ms, Total 1359.384 ms
 Execution Time: 2204.845 ms
(32 rows)
                                                                                             QUERY PLAN                                  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=726143.96..726144.66 rows=6 width=1464) (actual time=5250.325..5305.144 rows=6 loops=1)
   ->  Gather Merge  (cost=726143.96..726351.17 rows=1776 width=1464) (actual time=4800.064..4854.881 rows=6 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=725143.93..725146.15 rows=888 width=1464) (actual time=4762.610..4762.615 rows=5 loops=3)
               Sort Key: topic_links.clicks DESC, topic_links.created_at DESC
               Sort Method: top-N heapsort  Memory: 36kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 37kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 32kB
               ->  Nested Loop  (cost=574851.64..725128.02 rows=888 width=1464) (actual time=649.373..4710.853 rows=39385 loops=3)
                     ->  Parallel Hash Join  (cost=574847.40..695445.64 rows=2624 width=969) (actual time=630.974..3847.821 rows=336541 loops=3)
                           Hash Cond: (topic_links.post_id = posts.id)
                           ->  Parallel Bitmap Heap Scan on topic_links  (cost=11248.93..130753.84 rows=416505 width=186) (actual time=58.964..3095.540 rows=336541 loops=3)
                                 Recheck Cond: (user_id = 7237)
                                 Filter: ((NOT internal) AND (NOT reflection) AND (NOT quote))
                                 Rows Removed by Filter: 8
                                 Heap Blocks: exact=23544
                                 ->  Bitmap Index Scan on index_topic_links_on_user_id  (cost=0.00..10999.02 rows=1000879 width=0) (actual time=45.320..45.322 rows=1009648 loops=1)
                                       Index Cond: (user_id = 7237)
                           ->  Parallel Hash  (cost=562726.85..562726.85 rows=69730 width=783) (actual time=571.264..571.266 rows=60145 loops=3)
                                 Buckets: 262144  Batches: 1  Memory Usage: 71264kB
                                 ->  Parallel Bitmap Heap Scan on posts  (cost=1877.42..562726.85 rows=69730 width=783) (actual time=377.440..481.561 rows=60145 loops=3)
                                       Recheck Cond: ((user_id = 7237) AND (deleted_at IS NULL))
                                       Filter: (post_type = ANY ('{1,2,3,4}'::integer[]))
                                       Heap Blocks: exact=141396
                                       ->  Bitmap Index Scan on idx_posts_user_id_deleted_at  (cost=0.00..1835.58 rows=167352 width=0) (actual time=29.225..29.226 rows=180435 loops=1)
                                             Index Cond: (user_id = 7237)
                     ->  Index Scan using topics_pkey on topics  (cost=4.24..11.31 rows=1 width=495) (actual time=0.002..0.002 rows=0 loops=1009624)
                           Index Cond: (id = topic_links.topic_id)
                           Filter: ((deleted_at IS NULL) AND (deleted_at IS NULL) AND visible AND ((archetype)::text <> 'private_message'::text) AND ((category_id IS NULL) OR (hashed SubPlan 1)))
                           Rows Removed by Filter: 1
                           SubPlan 1
                             ->  Seq Scan on categories  (cost=0.00..3.74 rows=28 width=4) (actual time=17.793..17.835 rows=35 loops=3)
                                   Filter: ((NOT read_restricted) OR (id = ANY ('{3,4,5,17,19,25,26,27,28}'::integer[])))
 Planning Time: 4.526 ms
 JIT:
   Functions: 118
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 14.724 ms, Inlining 94.921 ms, Optimization 934.359 ms, Emission 525.383 ms, Total 1569.387 ms
 Execution Time: 5309.478 ms
(40 rows)

I’ve added the indexes I came up with previously to help with this (Slow Page Loads on User Profiles - #2 by Ghan) and that has cut the load times down quite a bit.

Query plan after adding indexes:

                                                                                                 QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1004.13..25136.02 rows=6 width=1473) (actual time=84.592..96.171 rows=6 loops=1)
   ->  Gather Merge  (cost=1004.13..9649737.42 rows=2399 width=1473) (actual time=84.591..96.168 rows=6 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop  (cost=4.11..9648460.49 rows=1000 width=1473) (actual time=60.452..60.471 rows=4 loops=3)
               ->  Nested Loop  (cost=0.87..9619828.27 rows=2943 width=977) (actual time=0.115..47.798 rows=6058 loops=3)
                     ->  Parallel Index Scan using index_topic_links_on_clicks_and_created_desc on topic_links  (cost=0.43..6728256.06 rows=403188 width=186) (actual time=0.051..30.428 rows=6058 loops=3)
                           Filter: ((NOT internal) AND (NOT reflection) AND (NOT quote) AND (user_id = 7237))
                           Rows Removed by Filter: 54345
                     ->  Index Scan using posts_pkey on posts  (cost=0.44..7.17 rows=1 width=791) (actual time=0.002..0.002 rows=1 loops=18173)
                           Index Cond: (id = topic_links.post_id)
                           Filter: ((deleted_at IS NULL) AND (user_id = 7237) AND (post_type = ANY ('{1,2,3,4}'::integer[])))
               ->  Index Scan using topics_pkey on topics  (cost=3.24..9.73 rows=1 width=496) (actual time=0.002..0.002 rows=0 loops=18173)
                     Index Cond: (id = topic_links.topic_id)
                     Filter: ((deleted_at IS NULL) AND (deleted_at IS NULL) AND visible AND ((archetype)::text <> 'private_message'::text) AND ((category_id IS NULL) OR (hashed SubPlan 1)))
                     Rows Removed by Filter: 1
                     SubPlan 1
                       ->  Seq Scan on categories  (cost=0.00..2.74 rows=28 width=4) (actual time=0.031..0.049 rows=35 loops=3)
                             Filter: ((NOT read_restricted) OR (id = ANY ('{3,4,5,17,19,25,26,27,28}'::integer[])))
 Planning Time: 1.205 ms
 Execution Time: 96.258 ms
(21 rows)

Here are the indexes if anyone wants to add them:

CREATE INDEX index_topic_links_on_clicks_and_created ON public.topic_links USING btree (clicks, created_at);
CREATE INDEX index_posts_on_like_count_and_created ON public.posts USING btree (like_count, created_at);
CREATE INDEX index_topic_links_on_clicks_and_created_desc ON public.topic_links USING btree (clicks DESC, created_at DESC);
CREATE INDEX index_posts_on_like_count_and_created_desc ON public.posts USING btree (like_count DESC, created_at DESC, user_id) WHERE deleted_at IS NULL AND post_number > 1 AND (post_type = ANY ('{1,2,3,4}'::integer[]));
1 Like

@codinghorror

Here and here you asked if those indexes were missing. Just adding these in is essentially night and day in terms of profile load speeds. These indexes should definitely be considered.

It is true that the cache added here: Slow Page Loads on User Profiles - #12 by codinghorror has made a huge difference but that’s only on subsequent loads, the indexes improve the initial experience over all. Users should not be subjected to slow load times even if they get better due to the cache for the time that is cached. The goal was to improve the first initial load (when ever the cache expires) and this accomplishes that big time.

4 Likes

Maybe after the holidays we can look at these indexes @sam @falco?

2 Likes

I am unclear how an index on every single row sorted on the number of clicks/like count is required for a user page that is scoped to a user id

2 Likes

As far as the clicks, in our case this appears to be where the gain is:

->  Sort  (cost=725143.93..725146.15 rows=888 width=1464) (actual time=4762.610..4762.615 rows=5 loops=3)
               Sort Key: topic_links.clicks DESC, topic_links.created_at DESC

This sort costs 4 seconds, which is part of why profiles are loading so slowly. This is just one query of many on the profile page (though one of the most costly in terms of speed). After adding the index, the query planner picks it up here, reducing the query time for this subtree down to about 30ms:

 ->  Parallel Index Scan using index_topic_links_on_clicks_and_created_desc on topic_links  (cost=0.43..6728256.06 rows=403188 width=186) (actual time=0.051..30.428 rows=6058 loops=3)

Obviously the query plan changes in other ways, too, but the index is definitely being reference here and the query runs much faster.

1 Like

But conceptually all we care about is links scoped to the user, why are the new proposed indexes not scoped to users?

1 Like

That might help the performance further. I’m not sure. The filter for the user_id is on the posts table so I’m not sure how that would interact with the topic_links side of the join, but it’s certainly worth taking a look at.

1 Like

What would you suggest we try to index, user IDs?

OK, let me break this down:

SELECT "topic_links"."id" AS t0_r0, "topic_links"."topic_id" AS t0_r1, "topic_links"."post_id" AS t0_r2, "topic_links"."user_id" AS t0_r3, "topic_links"."url" AS t0_r4, "topic_links"."domain" AS t0_r5, "topic_links"."internal" AS t0_r6, "topic_links"."link_topic_id" AS t0_r7, "topic_links"."created_at" AS t0_r8, "topic_links"."updated_at" AS t0_r9, "topic_links"."reflection" AS t0_r10, "topic_links"."clicks" AS t0_r11, "topic_links"."link_post_id" AS t0_r12, "topic_links"."title" AS t0_r13, "topic_links"."crawled_at" AS t0_r14, "topic_links"."quote" AS t0_r15, "topic_links"."extension" AS t0_r16, "topics"."id" AS t1_r0, "topics"."title" AS t1_r1, "topics"."last_posted_at" AS t1_r2, "topics"."created_at" AS t1_r3, "topics"."updated_at" AS t1_r4, "topics"."views" AS t1_r5, "topics"."posts_count" AS t1_r6, "topics"."user_id" AS t1_r7, "topics"."last_post_user_id" AS t1_r8, "topics"."reply_count" AS t1_r9, "topics"."featured_user1_id" AS t1_r10, "topics"."featured_user2_id" AS t1_r11, "topics"."featured_user3_id" AS t1_r12, "topics"."deleted_at" AS t1_r13, "topics"."highest_post_number" AS t1_r14, "topics"."like_count" AS t1_r15, "topics"."incoming_link_count" AS t1_r16, "topics"."category_id" AS t1_r17, "topics"."visible" AS t1_r18, "topics"."moderator_posts_count" AS t1_r19, "topics"."closed" AS t1_r20, "topics"."archived" AS t1_r21, "topics"."bumped_at" AS t1_r22, "topics"."has_summary" AS t1_r23, "topics"."archetype" AS t1_r24, "topics"."featured_user4_id" AS t1_r25, "topics"."notify_moderators_count" AS t1_r26, "topics"."spam_count" AS t1_r27, "topics"."pinned_at" AS t1_r28, "topics"."score" AS t1_r29, "topics"."percent_rank" AS t1_r30, "topics"."subtype" AS t1_r31, "topics"."slug" AS t1_r32, "topics"."deleted_by_id" AS t1_r33, "topics"."participant_count" AS t1_r34, "topics"."word_count" AS t1_r35, "topics"."excerpt" AS t1_r36, "topics"."pinned_globally" AS t1_r37, "topics"."pinned_until" AS t1_r38, "topics"."fancy_title" AS t1_r39, "topics"."highest_staff_post_number" AS t1_r40, "topics"."featured_link" AS t1_r41, "topics"."reviewable_score" AS t1_r42, "topics"."image_upload_id" AS t1_r43, "topics"."slow_mode_seconds" AS t1_r44, "posts"."id" AS t2_r0, "posts"."user_id" AS t2_r1, "posts"."topic_id" AS t2_r2, "posts"."post_number" AS t2_r3, "posts"."raw" AS t2_r4, "posts"."cooked" AS t2_r5, "posts"."created_at" AS t2_r6, "posts"."updated_at" AS t2_r7, "posts"."reply_to_post_number" AS t2_r8, "posts"."reply_count" AS t2_r9, "posts"."quote_count" AS t2_r10, "posts"."deleted_at" AS t2_r11, "posts"."off_topic_count" AS t2_r12, "posts"."like_count" AS t2_r13, "posts"."incoming_link_count" AS t2_r14, "posts"."bookmark_count" AS t2_r15, "posts"."score" AS t2_r16, "posts"."reads" AS t2_r17, "posts"."post_type" AS t2_r18, "posts"."sort_order" AS t2_r19, "posts"."last_editor_id" AS t2_r20, "posts"."hidden" AS t2_r21, "posts"."hidden_reason_id" AS t2_r22, "posts"."notify_moderators_count" AS t2_r23, "posts"."spam_count" AS t2_r24, "posts"."illegal_count" AS t2_r25, "posts"."inappropriate_count" AS t2_r26, "posts"."last_version_at" AS t2_r27, "posts"."user_deleted" AS t2_r28, "posts"."reply_to_user_id" AS t2_r29, "posts"."percent_rank" AS t2_r30, "posts"."notify_user_count" AS t2_r31, "posts"."like_score" AS t2_r32, "posts"."deleted_by_id" AS t2_r33, "posts"."edit_reason" AS t2_r34, "posts"."word_count" AS t2_r35, "posts"."version" AS t2_r36, "posts"."cook_method" AS t2_r37, "posts"."wiki" AS t2_r38, "posts"."baked_at" AS t2_r39, "posts"."baked_version" AS t2_r40, "posts"."hidden_at" AS t2_r41, "posts"."self_edits" AS t2_r42, "posts"."reply_quoted" AS t2_r43, "posts"."via_email" AS t2_r44, "posts"."raw_email" AS t2_r45, "posts"."public_version" AS t2_r46, "posts"."action_code" AS t2_r47, "posts"."locked_by_id" AS t2_r48, "posts"."image_upload_id" AS t2_r49 

FROM "topic_links" 
INNER JOIN "topics" ON "topics"."deleted_at" IS NULL AND "topics"."id" = "topic_links"."topic_id" 
INNER JOIN "posts" ON "posts"."deleted_at" IS NULL AND "posts"."id" = "topic_links"."post_id" 

WHERE "posts"."user_id" = 32 AND (posts.post_type IN (1,2,3,4)) AND "topics"."deleted_at" IS NULL AND (topics.archetype <> 'private_message') AND "topics"."visible" = TRUE AND (topics.category_id IS NULL OR topics.category_id IN (SELECT id FROM categories WHERE NOT read_restricted OR id IN (20,25,69,83,87,88,89,91,94,95,96,97,99,100))) AND "topic_links"."user_id" = 32 AND "topic_links"."internal" = FALSE AND "topic_links"."reflection" = FALSE AND "topic_links"."quote" = FALSE 

ORDER BY clicks DESC, topic_links.created_at DESC LIMIT 6; 

This query exists for this section:

Topic links has the indexes:

https://github.com/discourse/discourse/blob/1bef8356d6f9b08e88b8ab38b66f4c56873a9c55/app/models/topic_link.rb#L405-L410

It also has the column user_id

https://github.com/discourse/discourse/blob/1bef8356d6f9b08e88b8ab38b66f4c56873a9c55/app/models/topic_link.rb#L388-L388

Per:

https://github.com/discourse/discourse/blob/1bef8356d6f9b08e88b8ab38b66f4c56873a9c55/app/models/topic_link.rb#L314-L325

The links we create are correctly bound to the user id.

Therefore, what makes sense to me here is simply to improve the index:

Our clause is now:

"topic_links"."user_id" = 32 AND "topic_links"."internal" = FALSE AND "topic_links"."reflection" = FALSE AND "topic_links"."quote" = FALSE

This is joining on topic_id / post_id

So the index

create index idxFaster on topic_links(user_id, clicks desc, created_at desc) where (NOT reflection and NOT quote and NOT internal) include (post_id, topic_id)

Though, given that AR over selects here, covering it is kind of pointless so the include can be removed.

Can you nuke all your extra indexes and see if idxFaster fixes the issue for you?

EDIT

https://github.com/discourse/discourse/pull/11598

Should correct the issue once merged.

7 Likes