My journey into a massive posts rebake job

This is now done via:

We no longer hold all post ids in memory, and an interrupted rebake can be resumed by running `posts:rebake_uncooked_posts`.

One caveat here is that the resume task will not rebake posts in reverse order (i.e. the sort order will be id ascending).
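To make the resume behaviour concrete, here is a toy model in plain Ruby. It is a sketch under stated assumptions, not the actual `posts:rebake_uncooked_posts` code: `StubPost`, `BAKER_VERSION`, the batch size and the simulated interruption are all hypothetical. The idea it shows is that progress is recorded in the data itself (a post counts as done once its `baked_version` is set), so each step only needs the next small batch of uncooked posts, and a rerun continues with whatever is left, in ascending id order.

```ruby
# Toy model of a resumable rebake (illustration only, not Discourse's implementation).
# StubPost and BAKER_VERSION are hypothetical stand-ins.
StubPost = Struct.new(:id, :baked_version)
BAKER_VERSION = 2 # hypothetical "current" baker version

# Rebakes uncooked posts in ascending id order, one small batch at a time.
# max_posts simulates the task being interrupted partway through.
def rebake_uncooked(posts, batch_size: 3, max_posts: nil)
  done = 0
  loop do
    # In a database this would be a LIMITed query for the next uncooked rows;
    # here we simply filter the in-memory array.
    batch = posts.select { |post| post.baked_version.nil? }.sort_by(&:id).first(batch_size)
    break if batch.empty?

    batch.each do |post|
      return done if max_posts && done >= max_posts # simulated interruption
      post.baked_version = BAKER_VERSION             # stand-in for cooking and saving
      done += 1
    end
  end
  done
end

posts = (1..10).map { |i| StubPost.new(i, nil) }
first_run  = rebake_uncooked(posts, max_posts: 4) # "crashes" after 4 posts
second_run = rebake_uncooked(posts)               # resumes with the remaining posts
puts "first run: #{first_run}, second run: #{second_run}"
```

The first run stops after 4 posts and the second picks up the remaining 6 without any saved checkpoint, because the "not yet done" state lives in the rows themselves.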


So @techAPJ, if I need to trigger a rebake of every post on a Discourse install, is @pfaffman’s method the proper one to use?

If you need to rebake all posts right away, run `bundle exec rake posts:rebake`.

`Post.update_all("baked_version = NULL")` only marks every post as needing a rebake; the scheduled background job then rebakes them, 100 posts (by default) every 15 minutes.
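A toy model of this mark-then-drain flow, in plain Ruby. `FakePost`, `CURRENT_BAKED_VERSION` and the one-line "cooking" are hypothetical stand-ins rather than Discourse code, and the order in which stale posts are picked is simplified to array order. It shows why marking is cheap (one column write per post) while the real rebaking is spread over many scheduled runs of at most 100 posts each.

```ruby
# Toy model of "mark now, rebake later" (illustration only, not Discourse code).
# FakePost and CURRENT_BAKED_VERSION are hypothetical stand-ins.
FakePost = Struct.new(:id, :raw, :cooked, :baked_version)
CURRENT_BAKED_VERSION = 2 # hypothetical "current" baker version

# Marking: one cheap column write per post; no cooking happens here.
def mark_all_for_rebake(posts)
  posts.each { |post| post.baked_version = nil }
end

# One scheduled run: rebake at most `limit` stale posts and report how many were done.
def periodic_rebake(posts, limit: 100)
  stale = posts.select { |post| post.baked_version.nil? || post.baked_version < CURRENT_BAKED_VERSION }
  batch = stale.first(limit) # pick order simplified to array order
  batch.each do |post|
    post.cooked = "<p>#{post.raw}</p>" # placeholder for the real, expensive cooking step
    post.baked_version = CURRENT_BAKED_VERSION
  end
  batch.size
end

posts = (1..250).map { |i| FakePost.new(i, "post #{i}", nil, CURRENT_BAKED_VERSION) }
mark_all_for_rebake(posts)

runs = 0
runs += 1 while periodic_rebake(posts).positive? # count runs until nothing is left to rebake
puts "drained in #{runs} runs"
```

With 250 marked posts the loop needs three runs (100, 100, 50); at one run every 15 minutes that is about 45 minutes of wall-clock time for very little work per run.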


Thanks, Arpit.

FYI, I ran into some performance issues with that approach, so I went with the following instead, which alleviated the problem and gave the same outcome:

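```ruby
# Rails console: mark every post for rebaking, updating rows in batches (1000 per batch by default)
Post.in_batches.update_all('baked_version = NULL')
```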
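A likely reason the batched version behaves better: `in_batches` (batches of 1000 by default in Rails) splits the work into many short UPDATE statements instead of one statement that rewrites every row of the posts table at once. The plain-Ruby sketch below only illustrates that splitting by contiguous id ranges; `batched_update_statements` is a hypothetical helper, and the SQL Rails actually generates differs between versions.

```ruby
# Illustration of batching an UPDATE by contiguous id ranges (not Rails' actual SQL).
def batched_update_statements(ids, batch_size: 1000)
  ids.sort.each_slice(batch_size).map do |slice|
    # Ids are unique and sorted, so every existing id between slice.first and
    # slice.last is in this slice; the range condition matches exactly these rows.
    "UPDATE posts SET baked_version = NULL WHERE id BETWEEN #{slice.first} AND #{slice.last}"
  end
end

statements = batched_update_statements((1..2500).to_a)
puts statements.size # one short statement per batch of up to 1000 ids
puts statements
```

Each statement touches at most `batch_size` rows, so it finishes quickly and holds its row locks only briefly; the total amount of work is the same as the single large UPDATE.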


@techAPJ I have a basic question. Where do you run this command? After entering the app?

It tells me:

```
bash: syntax error near unexpected token `'baked_version = NULL''
```

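Run it in the Rails console inside the container, not in bash:

```
./launcher enter app   # open a shell inside the Discourse container
rails c                # start the Rails console
Post.in_batches.update_all('baked_version = NULL')   # Ruby: type this at the console prompt
```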

Would the batch method be suitable for a large number of rebakes?

2851000 / 27182220 ( 10.5%)

This is our current progress after starting it yesterday with the normal rebake command; it ticks up by about 1000 every 3 seconds. We are very close to the end of our import journey and testing, and I just wanted to make sure there wasn't a better way to rebake a large site before we settle on this slower method.
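For a rough sense of what that pace means, here is the arithmetic as a small Ruby snippet. It assumes the observed rate of about 1000 posts every 3 seconds stays constant, which it may not on a busy server; `remaining_hours` is a hypothetical helper.

```ruby
# Back-of-the-envelope estimate for the progress line quoted above,
# assuming the observed pace (about 1000 posts every 3 seconds) holds.
def remaining_hours(done, total, posts_per_tick:, seconds_per_tick:)
  posts_per_second = posts_per_tick.to_f / seconds_per_tick
  (total - done) / posts_per_second / 3600.0
end

done  = 2_851_000
total = 27_182_220
percent = (100.0 * done / total).round(1)
hours   = remaining_hours(done, total, posts_per_tick: 1000, seconds_per_tick: 3)
puts format("%.1f%% done, roughly %.1f hours left at this pace", percent, hours)
```

At that pace the remaining ~24.3 million posts would take roughly 20 hours.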


Can anyone explain how this `in_batches` version works? Presumably it does the re-bake in batches, but the posts above say that, by default, posts are rebaked in batches of 100 every 15 minutes.

I have a 2-million-post re-bake job to do and am trying to figure out the best way to do it. The job isn’t urgent, but I want to make sure that normal operation and administrative operations (such as backups) are not impacted by a long-running job.
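To put the default pace in perspective, here is a back-of-the-envelope estimate. It takes the figures quoted earlier in the thread (100 posts per scheduled run, one run every 15 minutes) at face value and ignores queue delays, so treat it as an order-of-magnitude sketch; `drain_days` is a hypothetical helper.

```ruby
# Rough drain time for the scheduled background rebake, assuming a fixed number of
# posts per run and a fixed interval between runs (figures quoted in this thread).
def drain_days(post_count, per_run:, minutes_between_runs:)
  runs = (post_count.to_f / per_run).ceil
  runs * minutes_between_runs / 60.0 / 24.0
end

default_pace = drain_days(2_000_000, per_run: 100, minutes_between_runs: 15)
larger_runs  = drain_days(2_000_000, per_run: 1_000, minutes_between_runs: 15)
puts format("%.0f days at 100 per run, %.0f days at 1000 per run", default_pace, larger_runs)
```

At 100 posts per run, 2 million posts work out to about 208 days; ten times as many posts per run cuts that to about 21 days, at the cost of ten times more work in each run.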

And now I’ve just read the topic “Rebaked all my posts, but what's it doing now?”, which tells me the re-bake task isn’t even re-baking them, only marking them for re-baking (how is this mark done?). The process is so slow that I’m really struggling to believe it takes this long just to mark posts for re-baking.

So migrate to a faster server.

Be thankful it doesn’t overwhelm your site. The whole point is to prevent this process from taking too many resources, keeping your site responsive during the process.

Consulting the source is always a good idea:


Indeed, marking should be very quick. And `rebake_post` does seem to call the cooking itself. Maybe some async tasks run as part of this, or are triggered by it?

Yes, of course: it’s a job spawning a set of jobs.

Not the ideal solution, but I found another way!

I just wrote my own re-baker that is 1000x faster, so instead of taking a month, it takes just a few minutes.

I’ll actually re-bake just before the database insertion so the rebake cost will disappear within the db insertion time.


Ah, OK, I wasn’t aware of your context.

Yes, this is written for the production case.


Out of curiosity, can you share what you did?

I wrote a program to scan all the imported posts to find what mark-ups/smileys they contained. Then wrote another program to bake the raw posts into HTML and update the database directly.
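A minimal sketch of that two-pass idea in plain Ruby, for readers in the same situation. It is not the poster's program and not Discourse's cooker, which runs a full Markdown pipeline; the two features handled (bold and `:emoji:` codes), the regexes, the emoji map and the helper names are hypothetical choices for illustration. Pass 1 inventories which features the imported posts actually use; pass 2 converts raw text to HTML supporting only those.

```ruby
# Hypothetical two-pass "custom baker" (illustration only, not Discourse's cooker).
FEATURES = {
  bold:   /\*\*(.+?)\*\*/,
  smiley: /:([a-z_]+):/
}.freeze

# Hypothetical emoji map, rendered as HTML numeric character references.
SMILEYS = { "slight_smile" => "&#128578;", "heart" => "&#10084;" }.freeze

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Pass 1: count how many posts use each feature, to decide what the baker must support.
def inventory(raws)
  FEATURES.transform_values { |re| raws.count { |raw| raw.match?(re) } }
end

# Pass 2: convert raw text to HTML, handling only the features found in pass 1.
def bake(raw)
  html = raw.gsub(/[&<>"]/, HTML_ESCAPES) # escape first so raw HTML stays inert
  html = html.gsub(FEATURES[:bold]) { "<strong>#{Regexp.last_match(1)}</strong>" }
  # Unknown emoji codes are left exactly as typed.
  html = html.gsub(FEATURES[:smiley]) { SMILEYS.fetch(Regexp.last_match(1), Regexp.last_match(0)) }
  # Blank lines separate paragraphs; single newlines become <br>.
  html.split(/\n{2,}/).map { |para| "<p>#{para.gsub("\n", "<br>")}</p>" }.join("\n")
end

raws = ["Hello **world** :slight_smile:", "a < b\n\nnext line", "plain text"]
p inventory(raws)
raws.each { |raw| puts bake(raw) }
```

The trade-off is the one described above: by supporting only the markup the import actually contains, the baker can be much simpler than a general-purpose cooker, but any post using something outside that list will not render as intended.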