This is totally worth pursuing. But I am biased: I worked on something similar a few years ago as a hobby project.
The goal was not only to create a Jekyll site from curated content in Discourse, but also to publish it as an e-book (EPUB or PDF), which is my answer to the ‘why not use page publishing’ question.
I followed pretty much the same approach, with a yml file containing an array of post URLs.
Images were an issue, but I still have the Python code lying around that finds all upload:// links, decodes them, downloads and resizes the images, and changes the links to local image URLs.
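To give an idea of the link-rewriting part, here is a minimal sketch (not my original code). It assumes that the part after upload:// is the upload's SHA-1 encoded in base62 with the alphabet 0-9, A-Z, a-z, followed by the file extension. The names `base62_to_sha1`, `localize_upload_links` and the `assets/images` directory are made up for illustration, and downloading and resizing are left out.

```python
import re

# Assumption: short upload URLs look like upload://<base62>.<ext>, where
# <base62> is the upload's SHA-1 encoded with this alphabet.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
UPLOAD_LINK = re.compile(r"upload://([0-9A-Za-z]+)(\.[0-9A-Za-z]+)?")


def base62_to_sha1(token):
    """Decode a base62 token into a 40-character, zero-padded hex string."""
    value = 0
    for char in token:
        value = value * 62 + BASE62_ALPHABET.index(char)
    return format(value, "040x")


def localize_upload_links(markdown, image_dir="assets/images"):
    """Replace upload:// links with local paths.

    Returns the rewritten text and a dict mapping each original link to its
    local path, so the images can be downloaded in a separate step.
    """
    found = {}

    def replace(match):
        sha1 = base62_to_sha1(match.group(1))
        extension = match.group(2) or ""
        local_path = f"{image_dir}/{sha1}{extension}"
        found[match.group(0)] = local_path
        return local_path

    return UPLOAD_LINK.sub(replace, markdown), found
```

The returned mapping is what a download step would iterate over: fetch each image, resize it, and save it under the local path.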
I never finished the project, but I would certainly be interested in picking it up again and contributing the (few) things I already made.