Thanks Jay! Appreciate the encouragement.
Ugh, I’d prefer to not think about that. It was probably upwards of 15 or 20 hours after you put me on the right path with the SQL query.
I’d like to pick your brain on this if you have any thoughts:
It took around 70 hours to do a complete trial run with production data on a very powerful VPS. I’d like to get my users interacting again ASAP, even if the posts and PMs import is still incomplete.

One alternative I’ve considered is disabling the `preprocess_posts` function, which I heavily modified with additional `gsub` regexp replacements and a step that passes every post and PM through Pandoc, using one of two different commands depending on whether the original post was Textile markup or pure HTML. Disabling the entire `preprocess_posts` routine would probably cut the import time almost in half, and I could then move all that formatting work into the `postprocess_posts` section once the raw data is imported. The downside is that after the fact I wouldn’t have easy access to the original database column that records the source format (Textile or HTML) for each post, which is the conditional for my Pandoc manipulation. Or could I add a custom field to each post labeling it as `textile` or `html` and then retrieve that later during post-processing? Dunno, just thinking out loud here.