How to spread out an import over a longer time to avoid the rate limits of external services

Backstory: Markdown image embed is not rendered correctly

I already have a custom import script written in Ruby. It works and adds all topics; however, it is too fast. This leads to the server fetching a lot of images from external services like Imgur when baking the posts.

What would be the best way to slow this down? The trivial way would be to simply let the import process sleep a bit between posts, but that would lead to a potentially long-running script (given the number of images and the Imgur rate limit, approximately 3 days). Is there another way, such as importing all topics at once but telling Discourse to only bake a certain number of posts per minute? That way the topics would be created immediately, but the HTML would be built asynchronously.
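For reference, the sleep variant I mean would look roughly like this inside the import loop; `source_posts`, `map_post`, and `create_post` are placeholders for what my script already does, not Discourse API:

```ruby
# Sketch of the "sleep between posts" idea. Assumption: the script iterates
# over rows from the old database and creates one Discourse post per row;
# every identifier here is a placeholder for the real import script.
SECONDS_BETWEEN_POSTS = 2 # tuned so image fetches stay under the rate limit

source_posts.each do |row|
  create_post(map_post(row))    # hypothetical helpers from my importer
  sleep SECONDS_BETWEEN_POSTS   # spreads out the baking and the image pulls
end
```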

You can just rebake the posts later; every few days, rebake the ones that came out bad.
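Something like this from the Rails console would do it, assuming you know which posts are affected (`Post#rebake!` re-cooks a single post; the ID list is just a placeholder):

```ruby
# Rails console sketch: rebake a known set of posts after the fact.
bad_post_ids = [101, 102, 103] # placeholder IDs of badly baked posts

Post.where(id: bad_post_ids).find_each do |post|
  post.rebake! # re-cooks the post, which should re-trigger the remote image fetch
end
```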

Is there a way to find the bad posts so that I can rebuild just those? I know I can rebake everything via rake, but that always starts with the same posts and runs into the rate limit before reaching the bad ones.
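The only heuristic I can think of (an assumption on my part, not an official check) is that a failed fetch leaves the cooked HTML still pointing at the external host instead of a local upload, so something like this from the Rails console might list candidates:

```ruby
# Rails console sketch: posts whose cooked HTML still references imgur
# directly, i.e. the image was never pulled in as a local upload.
# The LIKE pattern is only a heuristic for "bad" posts.
bad_posts = Post.where("cooked LIKE ?", "%imgur.com%")
puts "#{bad_posts.count} posts still reference imgur"
bad_posts.pluck(:id).first(20) # inspect a few IDs
```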

After further testing, I really need to prevent hitting the rate limit in the first place. With the number of images embedded, the IP ends up being blacklisted for at least a couple of days (it has been 4 days since I last baked on that machine, and it's still blacklisted), so I can't just rebake the posts later.

Any ideas on how to spread out the baking over a longer timeframe in the first place?
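For now, the best workaround I can come up with is to pace a rebake of just the affected posts slowly enough that the limit is never hit, along these lines (same assumptions as above: `Post#rebake!` and the imgur `LIKE` heuristic), but that still means days of runtime, which is exactly what I would like to avoid during the initial import:

```ruby
# Rails console sketch: throttled rebake of only the affected posts.
# SECONDS_PER_POST must be derived from the external rate limit and the
# average number of images per post; the value here is a guess.
SECONDS_PER_POST = 30

Post.where("cooked LIKE ?", "%imgur.com%").find_each do |post|
  post.rebake!
  sleep SECONDS_PER_POST # keeps the image fetch rate under the limit
end
```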