My journey through a large post rebake job

This is now done via:

We no longer carry post ids in memory, and the rebake task can be resumed by running posts:rebake_uncooked_posts.

One caveat here is that the resume task will not rebake posts in reverse order (i.e. the sort order will be id ascending).
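To make the resume behaviour concrete, here is a minimal plain-Ruby sketch. It assumes a rebake that walks posts in id-ascending windows and remembers only the last id it finished. `FakePost`, `CURRENT_BAKED_VERSION`, `next_uncooked_batch`, and `rebake_uncooked` are hypothetical names. This is not Discourse's actual code; it only illustrates why a resumable pass ends up running in ascending id order.

```ruby
# Hypothetical stand-in for a row in the posts table.
FakePost = Struct.new(:id, :baked_version)

# Assumed "current" bake version, for illustration only.
CURRENT_BAKED_VERSION = 2

# Up to `limit` posts with id > after_id that still need a rebake, in
# ascending id order (think WHERE id > ? ... ORDER BY id LIMIT ?).
def next_uncooked_batch(posts, after_id, limit)
  posts
    .select { |p| p.id > after_id && (p.baked_version.nil? || p.baked_version < CURRENT_BAKED_VERSION) }
    .sort_by(&:id)
    .first(limit)
end

# Rebake everything still uncooked, one batch at a time. Only `last_id` is
# carried between batches, never the full list of ids. Returns the ids
# processed, in order.
def rebake_uncooked(posts, batch_size: 2)
  processed = []
  last_id = 0
  loop do
    batch = next_uncooked_batch(posts, last_id, batch_size)
    break if batch.empty?

    batch.each do |post|
      post.baked_version = CURRENT_BAKED_VERSION # stand-in for the real rebake
      processed << post.id
    end
    last_id = batch.last.id
  end
  processed
end
```

Already-rebaked posts no longer match the filter. Calling `rebake_uncooked` again after an interruption therefore picks up whatever is left, and a second call on fully rebaked data returns an empty list.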


So @techAPJ, if I need to trigger a rebake of every post on a Discourse install, is @pfaffman’s method the proper one to use?

If you need to rebake all posts immediately, run bundle exec rake posts:rebake.

Post.update_all("baked_version = NULL") marks every post for rebaking; a background job then rebakes them 100 posts at a time (by default) every 15 minutes.
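For a sense of scale, here is a rough duration estimate based on the figures above (100 posts every 15 minutes). It is plain arithmetic and assumes the default rate holds and nothing else is queued. `throttled_rebake_duration` is a hypothetical helper.

```ruby
# Rough duration of a throttled background rebake: `per_batch` posts are
# rebaked every `interval_minutes`. Defaults are the figures quoted above.
# Returns [hours, days].
def throttled_rebake_duration(total_posts, per_batch: 100, interval_minutes: 15)
  batches = (total_posts.to_f / per_batch).ceil
  hours = batches * interval_minutes / 60.0
  [hours, hours / 24.0]
end

hours, days = throttled_rebake_duration(1_000_000)
puts format("%.0f hours (about %.0f days)", hours, days)
```

For one million posts this works out to 2500 hours, roughly 104 days. That is why the rake task is the option to use when you need everything rebaked right away.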


Thanks, Arpit.

FYI, I encountered some performance issues with that approach, so I went with this, which alleviated the problem and resulted in the same outcome:

Post.in_batches.update_all('baked_version = NULL')
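Here is a rough model of what the batched version changes. It assumes ActiveRecord's default `in_batches` batch size of 1000, so the single table-wide UPDATE becomes many smaller ones. `batched_update_statements` is a hypothetical helper, and the SQL strings are illustrative only; Rails generates its own queries.

```ruby
# Simplified model of `Post.in_batches(of: 1000).update_all(...)`: rather
# than one UPDATE touching every row at once, issue one UPDATE per chunk of
# primary keys. The SQL text is illustrative; Rails builds its own queries.
def batched_update_statements(ids, set_clause, table: "posts", batch_size: 1000)
  ids.sort.each_slice(batch_size).map do |chunk|
    "UPDATE #{table} SET #{set_clause} WHERE id IN (#{chunk.join(', ')})"
  end
end

statements = batched_update_statements((1..2500).to_a, "baked_version = NULL")
puts statements.size
```

With 2,500 ids this produces three statements. Each shorter statement holds its locks for less time, which plausibly explains the reported performance improvement. The end state (every post marked) is the same.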


@techAPJ I have a silly question. Where do you run this command? After entering the app?

It tells me

bash: syntax error near unexpected token ''baked_version = NULL''

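That line is Ruby, not a shell command, which is why bash rejects it. Run it in the Rails console inside the container instead:
./launcher enter app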
rails c
Post.in_batches.update_all('baked_version = NULL')

Would the batch method be suitable for a large number of rebakes?

2851000 / 27182220 ( 10.5%)

This is our current progress after starting it yesterday with the normal rebake command; it seems to tick up by about 1000 every 3 seconds. We are very close to the end of our import and testing, and I wanted to make sure there wasn't a better way to rebake a large site before we settled on this slower method.
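To check the numbers in that progress line, here is a small sketch that turns a `done / total` count and an observed rate into a time estimate. `rebake_eta` is a hypothetical helper, and the estimate assumes the rate stays constant.

```ruby
# Estimate time remaining for a progress line like "2851000 / 27182220",
# given an observed rate in posts per second.
def rebake_eta(done, total, posts_per_second)
  remaining = total - done
  seconds = remaining / posts_per_second.to_f
  {
    percent: (100.0 * done / total).round(1),
    remaining: remaining,
    hours: (seconds / 3600.0).round(1)
  }
end

# About 1000 posts every 3 seconds, as reported above.
p rebake_eta(2_851_000, 27_182_220, 1000 / 3.0)
```

With 2,851,000 of 27,182,220 done at about 1000 posts per 3 seconds, 24,331,220 posts remain. At that rate, the rest would take roughly 20 hours.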


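Can someone explain how this in_batches version works? Presumably it rebakes in batches, but according to the post above, by default it rebakes in batches of 100 every 15 minutes.

I have a 2 million post rebake job to do and am trying to work out the best way to do it. The job isn't urgent, but I want to make sure normal operation and admin tasks (such as backups) aren't affected by a long-running job.

I just read this post: https://meta.discourse.org/t/rebaked-all-my-posts-but-whats-it-doing-now/179782, which says the rebake task doesn't actually rebake the posts; it only marks them to be rebaked (how is this marking done?). The process is very slow, and I really find it hard to believe that just marking a post for rebaking takes this long.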
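Here is one way to picture the difference between marking and rebaking. It assumes, as the earlier reply about Post.update_all("baked_version = NULL") suggests, that marking means clearing `baked_version`. A periodic job later cooks a limited number of marked posts per run. `PostRow`, `mark_all_for_rebake`, and `periodic_rebake_run` are hypothetical. Wrapping the raw text in a paragraph tag is only a placeholder for real Markdown cooking.

```ruby
# Hypothetical stand-in for a post row.
PostRow = Struct.new(:id, :raw, :cooked, :baked_version)

# "Marking" for rebake: one cheap column write per row, no rendering.
def mark_all_for_rebake(rows)
  rows.each { |r| r.baked_version = nil }
end

# One run of a hypothetical periodic job: cook at most `limit` marked rows.
# Returns how many rows it rebaked.
def periodic_rebake_run(rows, limit, version)
  batch = rows.select { |r| r.baked_version.nil? }.first(limit)
  batch.each do |r|
    r.cooked = "<p>#{r.raw}</p>" # placeholder for real Markdown cooking
    r.baked_version = version
  end
  batch.size
end
```

In this model, marking 250 rows is a single pass of column writes. Cooking them takes three runs of the job at a limit of 100.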

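Then move to a faster server.

Be glad it isn't overwhelming your site. The whole point of the process is to stop it from consuming too many resources, so your site stays responsive while it runs.

Looking at the source code is always a good idea: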


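Indeed, the marking should be fast. rebake_post does seem to call cooking. Maybe some asynchronous tasks are spawned inside it or as a result of it?

Yes, of course: it's a job that spawns a set of jobs.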
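Here is a minimal model of a job that spawns jobs, with Ruby's thread-safe `Queue` standing in for the real job queue. `enqueue_rebake_jobs`, `drain`, and the `:rebake_post` job hash are hypothetical. The point is only that the parent job is cheap, and the cooking cost lands on the workers that run the child jobs.

```ruby
# Hypothetical fan-out: the parent job renders nothing itself, it only
# enqueues one child job per post. Returns the number of jobs enqueued.
def enqueue_rebake_jobs(queue, post_ids)
  post_ids.each { |id| queue << { job: :rebake_post, post_id: id } }
  post_ids.size
end

# Worker threads pop child jobs until the queue is empty and "process" them.
# Returns the processed post ids (order depends on thread scheduling).
def drain(queue, workers: 4)
  done = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          job = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        done << job[:post_id] # stand-in for actually cooking the post
      end
    end
  end
  threads.each(&:join)
  Array.new(done.size) { done.pop }
end

jobs = Queue.new
enqueue_rebake_jobs(jobs, (1..10).to_a)
p drain(jobs).sort
```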

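This isn't an ideal solution, but I found another way!

I wrote my own re-baker. It's 1000 times faster, so instead of taking a month it takes only a few minutes.

I actually do the re-bake before the database insert, so the cost of re-baking disappears into the insert time.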


Ah, OK, I didn't know the context you were working in.

Yes, this was written for a production use case.


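Out of curiosity, could you share what you did?

I wrote a program to scan all the imported posts for the markup/emoji they contain. Then I wrote another program to bake the raw posts into HTML and update the database directly.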
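The author's programs weren't posted. The first step, scanning imported posts for the markup and emoji they use, might look something like this sketch. The shortcode regex, the `MARKUP_PATTERNS` table, and `scan_posts` are all assumptions for illustration. They are not the author's code or Discourse's parser.

```ruby
require "set"

# Assumed emoji shortcode syntax, e.g. ":smile:" or ":+1:".
EMOJI_RE = /:([a-z0-9_+\-]+):/
# A few assumed markup features to detect; a real importer would need more.
MARKUP_PATTERNS = {
  bold: /\*\*[^*]+\*\*/,
  link: /\[[^\]]+\]\([^)]+\)/,
  quote: /\[quote[^\]]*\]/,
  fenced_code: /`{3}/
}.freeze

# Scan raw post bodies once and report which emoji (with counts) and which
# markup features occur, so a custom baker knows what it must support.
def scan_posts(raws)
  emoji = Hash.new(0)
  features = Set.new
  raws.each do |raw|
    raw.scan(EMOJI_RE).flatten.each { |name| emoji[name] += 1 }
    MARKUP_PATTERNS.each { |feature, re| features << feature if raw.match?(re) }
  end
  [emoji, features]
end
```

The shortcode pattern is deliberately loose. It would also match text such as `10:30:45`, so a real scan would likely check matches against a list of known emoji names.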