用户个人资料页面加载缓慢

我们已完成从当前软件到 Discourse 的用户帖子导入,但注意到部分用户页面加载非常缓慢(需数秒),这似乎是由慢查询引起的。我们该如何排查并定位瓶颈?

以下是我们环境的一些详细信息:

服务器配置:8 核 CPU,32 GB 内存,NVMe SSD 存储,采用 ZFS 文件系统

该论坛约有 2500 万篇帖子。其中大部分为私密对话,公开帖子约 1050 万篇。
我们确实存在一些巨型主题(我们是一个写作网站,因此单个故事往往随时间缓慢发展,积累大量帖子),某些主题包含数千条帖子,这可能也是影响因素之一,但我尚不确定如何验证这一点。

以下是其中一条疑似存在问题的查询,我在内置负载分析器中观察到其执行时间超过 5000 毫秒:

SELECT "topic_links"."id" AS t0_r0, "topic_links"."topic_id" AS t0_r1, "topic_links"."post_id" AS t0_r2, "topic_links"."user_id" AS t0_r3, "topic_links"."url" AS t0_r4, "topic_links"."domain" AS t0_r5, "topic_links"."internal" AS t0_r6, "topic_links"."link_topic_id" AS t0_r7, "topic_links"."created_at" AS t0_r8, "topic_links"."updated_at" AS t0_r9, "topic_links"."reflection" AS t0_r10, "topic_links"."clicks" AS t0_r11, "topic_links"."link_post_id" AS t0_r12, "topic_links"."title" AS t0_r13, "topic_links"."crawled_at" AS t0_r14, "topic_links"."quote" AS t0_r15, "topic_links"."extension" AS t0_r16, "topics"."id" AS t1_r0, "topics"."title" AS t1_r1, "topics"."last_posted_at" AS t1_r2, "topics"."created_at" AS t1_r3, "topics"."updated_at" AS t1_r4, "topics"."views" AS t1_r5, "topics"."posts_count" AS t1_r6, "topics"."user_id" AS t1_r7, "topics"."last_post_user_id" AS t1_r8, "topics"."reply_count" AS t1_r9, "topics"."featured_user1_id" AS t1_r10, "topics"."featured_user2_id" AS t1_r11, "topics"."featured_user3_id" AS t1_r12, "topics"."deleted_at" AS t1_r13, "topics"."highest_post_number" AS t1_r14, "topics"."like_count" AS t1_r15, "topics"."incoming_link_count" AS t1_r16, "topics"."category_id" AS t1_r17, "topics"."visible" AS t1_r18, "topics"."moderator_posts_count" AS t1_r19, "topics"."closed" AS t1_r20, "topics"."archived" AS t1_r21, "topics"."bumped_at" AS t1_r22, "topics"."has_summary" AS t1_r23, "topics"."archetype" AS t1_r24, "topics"."featured_user4_id" AS t1_r25, "topics"."notify_moderators_count" AS t1_r26, "topics"."spam_count" AS t1_r27, "topics"."pinned_at" AS t1_r28, "topics"."score" AS t1_r29, "topics"."percent_rank" AS t1_r30, "topics"."subtype" AS t1_r31, "topics"."slug" AS t1_r32, "topics"."deleted_by_id" AS t1_r33, "topics"."participant_count" AS t1_r34, "topics"."word_count" AS t1_r35, "topics"."excerpt" AS t1_r36, "topics"."pinned_globally" AS t1_r37, "topics"."pinned_until" AS t1_r38, "topics"."fancy_title" AS t1_r39, "topics"."highest_staff_post_number" AS t1_r40, "topics"."featured_link" AS t1_r41, "topics"."reviewable_score" AS t1_r42, "topics"."image_upload_id" AS t1_r43, "posts"."id" AS t2_r0, "posts"."user_id" AS t2_r1, "posts"."topic_id" AS t2_r2, "posts"."post_number" AS t2_r3, "posts"."raw" AS t2_r4, "posts"."cooked" AS t2_r5, "posts"."created_at" AS t2_r6, "posts"."updated_at" AS t2_r7, "posts"."reply_to_post_number" AS t2_r8, "posts"."reply_count" AS t2_r9, "posts"."quote_count" AS t2_r10, "posts"."deleted_at" AS t2_r11, "posts"."off_topic_count" AS t2_r12, "posts"."like_count" AS t2_r13, "posts"."incoming_link_count" AS t2_r14, "posts"."bookmark_count" AS t2_r15, "posts"."score" AS t2_r16, "posts"."reads" AS t2_r17, "posts"."post_type" AS t2_r18, "posts"."sort_order" AS t2_r19, "posts"."last_editor_id" AS t2_r20, "posts"."hidden" AS t2_r21, "posts"."hidden_reason_id" AS t2_r22, "posts"."notify_moderators_count" AS t2_r23, "posts"."spam_count" AS t2_r24, "posts"."illegal_count" AS t2_r25, "posts"."inappropriate_count" AS t2_r26, "posts"."last_version_at" AS t2_r27, "posts"."user_deleted" AS t2_r28, "posts"."reply_to_user_id" AS t2_r29, "posts"."percent_rank" AS t2_r30, "posts"."notify_user_count" AS t2_r31, "posts"."like_score" AS t2_r32, "posts"."deleted_by_id" AS t2_r33, "posts"."edit_reason" AS t2_r34, "posts"."word_count" AS t2_r35, "posts"."version" AS t2_r36, "posts"."cook_method" AS t2_r37, "posts"."wiki" AS t2_r38, "posts"."baked_at" AS t2_r39, "posts"."baked_version" AS t2_r40, "posts"."hidden_at" AS t2_r41, "posts"."self_edits" AS t2_r42, "posts"."reply_quoted" AS t2_r43, "posts"."via_email" AS t2_r44, "posts"."raw_email" AS t2_r45, "posts"."public_version" AS t2_r46, "posts"."action_code" AS t2_r47, "posts"."locked_by_id" AS t2_r48, "posts"."image_upload_id" AS t2_r49 FROM "topic_links" INNER JOIN "topics" ON "topics"."id" = "topic_links"."topic_id" AND ("topics"."deleted_at" IS NULL) INNER JOIN "posts" ON "posts"."id" = "topic_links"."post_id" AND ("posts"."deleted_at" IS NULL) WHERE (posts.post_type IN (1,2,3,4)) AND ("topics"."deleted_at" IS NULL) AND (topics.archetype <> 'private_message') AND "topics"."visible" = TRUE AND (topics.category_id IS NULL OR topics.category_id IN (SELECT id FROM categories WHERE NOT read_restricted OR id IN (3,4))) AND "topic_links"."user_id" = 7237 AND "topic_links"."internal" = FALSE AND "topic_links"."reflection" = FALSE AND "topic_links"."quote" = FALSE ORDER BY clicks DESC, topic_links.created_at DESC LIMIT 6

在加载用户页面时观察服务器,发现 PostgreSQL 进程在查询执行期间 CPU 使用率急剧上升。除为导入过程添加 MySQL 支持外,我未对安装配置进行任何其他修改。如有任何关于后续步骤的建议,将不胜感激!

7 个赞

在此附上供参考。我已为数据库添加了 4 个索引,性能得到了显著提升,不过我认为仍有优化空间。以下是我添加的内容:

CREATE INDEX index_topic_links_on_clicks_and_created ON public.topic_links USING btree (clicks, created_at);
CREATE INDEX index_posts_on_like_count_and_created ON public.posts USING btree (like_count, created_at);
CREATE INDEX index_topic_links_on_clicks_and_created_desc ON public.topic_links USING btree (clicks DESC, created_at DESC);
CREATE INDEX index_posts_on_like_count_and_created_desc ON public.posts USING btree (like_count DESC, created_at DESC, user_id) WHERE deleted_at IS NULL AND post_number > 1 AND (post_type = ANY ('{1,2,3,4}'::integer[]));
6 个赞

嗯,我们这里是不是漏掉了一些索引,@tgxworld

2 个赞

你好 @Ghan

非常感谢你提供的这些索引。我们有一段时间也遇到了同样的问题,但一直未能解决。
应用这些索引后,加载用户资料的速度有了显著改善。之前有些“最坏情况”下的资料加载需要长达 30 秒,现在只需约 5 秒即可显示。

再次感谢,希望还有更多优化空间可以进一步提升速度。

此致
Sascha

附:即使是 /about 页面,加载时间也长达约 20 秒。或许这个页面也可以进行优化。

1 个赞

再次询问。我们是否遗漏了这些索引 @tgxworld @sam

2 个赞

让我检查一下 Meta 中这些查询的执行计划。

在 Meta 上查询速度相当快,但我能理解它们在性能较弱的数据库上可能表现不佳。我们使用的是 ActiveRecord,代码相当整洁,但生成的 SQL 却相当混乱。

手动编写 SQL 后,我们可以让查询利用现有索引,从而大幅优化性能:

优化前

优化后

在优化后的查询中,我们仅检索用户的帖子,避免了因关联操作过晚而导致的帖子循环问题。

因此,我认为我们可以在不增加索引开销的情况下获得更好的性能。

7 个赞

重写后的版本可以在 ActiveRecord 代码中完成,还是您需要在其中传入手动查询字符串?

希望如此,因为当前的 Ruby 代码非常优美。但即使不是这样,我们在应用中也有一个非常简单且已广泛使用的 SQL 处理模式。

1 个赞

看起来用户摘要页面上运行的许多查询确实缺少索引。我们需要逐个优化这些查询。

@Falco 我看到你已经将自己分配给了这个任务,你仍然想负责吗?这里的棘手之处在于,这些查询大多未经测试,因此我们必须小心,以免破坏现有功能。

3 个赞

我添加了一个用于测试序列化器输出的草稿提交,这样我们就不用盲目操作了:

之后,我尝试重新应用我之前 在这里 所做的操作,但结果发现我添加了一个额外的子句,虽然让它变快了,但结果却是错误的。

由于我们所有这些数据(包括“最受喜欢”和“被谁喜欢”)都要经过 guardian 处理,这意味着信息始终非常准确且针对查看者量身定制,但也导致缓存无法帮助应对多个用户的情况。

我想我们 至少 可以为此端点添加一个更长的缓存?在我看来,1 小时的缓存就足够了,因为摘要中的信息并不经常变化。

4 个赞

听起来不错……就这么办!

1 个赞

我已将缓存推送到我的 PR 分支:

在进一步调试时,我注意到其中还存在一些 N+1 查询问题。缩略图和解决状态会触发主题列表的 N+1 查询。对此大家有什么建议吗?@david @tgxworld

2 个赞

已合并缓存相关工作。

仍在处理 N+1 问题。

7 个赞

用户摘要页面在主题列表中使用了哪个序列化器?对于完整页面搜索,我最近通过从搜索序列化器中移除 thumbnails 属性修复了 N+1 问题,因为该属性根本未被使用。

3 个赞

已在 PERF: Preload topic thumbnails for all topic lists by davidtaylorhq · Pull Request #11238 · discourse/discourse · GitHubPERF: Simplify topic serialization for user summary page by davidtaylorhq · Pull Request #11236 · discourse/discourse · GitHub 中修复。

4 个赞