This issue has cropped up again
Just thinking out loud here, but I wonder if we can sidestep the tricky problem here (i.e. the conversion of HTML to markdown). To recap (just to help think this through):
- Discourse supports the importation of HTML for the creation of post content (e.g. HTML from WP Discourse).
- In some contexts the user expects the integrity of the original HTML to be retained exactly.
- “Integrity” here has at least two aspects:
  - How the content is rendered, e.g. line breaks
  - Where media is hosted, e.g. downloading images to local storage to avoid broken images, or potentially for security concerns
- The conversion of HTML to markdown potentially creates issues for the first type of integrity; however, it is currently necessary to ensure the second type of integrity.
So perhaps one way to address this issue for certain imported posts would be for the imported HTML to be stored directly as the cooked post content, and the `pull_hotlinked_images` job would support downloading images in such content without converting `img` to markdown.
Yes, put more simply, perhaps the code could support downloading hotlinked images without requiring a conversion of the `img` to markdown. For such posts you would interpolate the downloaded image URL into the cooked content instead of the raw.
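
To make the idea concrete, here is a rough sketch of what "interpolate the downloaded image URL into the cooked content" could look like. This is not Discourse's code (Discourse is written in Ruby, and I haven't checked how the existing job is structured); it is a language-neutral illustration in Python. The names `localize_images`, `download` and `local_host` are all hypothetical, and `download` stands in for whatever actually fetches the remote file and returns its new local URL.

```python
import re
from typing import Callable
from urllib.parse import urlparse

# Matches the src attribute of an <img> tag. The quote character is captured
# so the original quoting style is preserved. A regex is a simplification;
# a real implementation would more likely use an HTML parser.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def localize_images(cooked: str, download: Callable[[str], str], local_host: str) -> str:
    """Rewrite remote <img> sources in cooked HTML, leaving everything else untouched.

    `download` is a hypothetical callable: it receives the remote URL and
    returns the URL of the locally stored copy.
    """

    def swap(match: re.Match) -> str:
        prefix, quote, url = match.groups()
        host = urlparse(url).netloc
        # Relative URLs (no host) and images already on our host are left alone.
        if not host or host == local_host:
            return match.group(0)
        return f"{prefix}{quote}{download(url)}{quote}"

    return IMG_SRC.sub(swap, cooked)
```

The point of the sketch is that only the `src` value changes: line breaks and the rest of the markup pass through exactly as imported, which would cover both kinds of integrity without a round trip through markdown.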