Disabling HTML entities in posts

farmdawgnation · July 8, 2017, 2:37pm

Hi there,

I’m trying to migrate a Google Group to a Discourse forum and am having some issues with HTML being posted in emails. The mailing list is for a web framework, so folks posting HTML back and forth is pretty common. Since the emails we’re trying to import came from a system that didn’t support Markdown, these HTML segments don’t have code fences around them, and there are too many to edit manually.

As a compromise, I’d like to disable HTML entities in posts completely. While searching the forum for this I saw @codinghorror mention that @eviltrout added a mode where this was possible, but I couldn’t find the magic setting in my admin panel.

Does anyone know how to flip Discourse into this mode?

Thanks!

pfaffman · July 8, 2017, 3:22pm

Without thinking about it too hard, I think that the thing to do might be to encode the HTML tags before the import. It sounds like it might be safe enough to just convert all of the <s and >s into &lt;s and &gt;s.
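
A minimal sketch of that encoding step, in Python for illustration only. The function name `escape_tags` is hypothetical, and this assumes the message body has already been extracted as plain text. Note that `html.escape` also converts `&` to `&amp;`, which goes slightly beyond the suggestion above but keeps text that already contains entities literal.

```python
import html


def escape_tags(body: str) -> str:
    """Encode &, < and > so that HTML in the body is shown as text.

    quote=False leaves single and double quotes untouched.
    """
    return html.escape(body, quote=False)


if __name__ == "__main__":
    print(escape_tags('<div class="a">x & y</div>'))
```

One caveat: if the result is later treated as Markdown, a leading `>` that was meant as an email-style quote marker would be encoded as well and would no longer start a blockquote.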

farmdawgnation · July 8, 2017, 3:45pm

The problem is these are mbox files so that transformation would have to try and distinguish between parts of the mbox and HTML.

pfaffman · July 8, 2017, 3:52pm

You could also modify the import script to pass the body through a filter at that point.
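
As a rough illustration of filtering at that point, here is a sketch using Python's standard `mailbox` module. It is an assumption-laden stand-in, not the Discourse import script (which is written in Ruby): the helper names `plain_body` and `escaped_bodies` are hypothetical. The point it shows is that a mail parser separates the mbox structure and headers from the body, so only the body text gets escaped.

```python
import html
import mailbox
from email.message import Message


def plain_body(msg: Message) -> str:
    """Return the first text/plain part of a message, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def escaped_bodies(mbox_path: str):
    """Yield (subject, escaped body) for each message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Headers and mbox separators are never touched; only the
            # extracted body text passes through the escaping filter.
            yield msg["Subject"], html.escape(plain_body(msg), quote=False)
    finally:
        box.close()
```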

farmdawgnation · July 8, 2017, 4:00pm

That’s true. Although I’ll admit I’m also concerned about folks interacting with our community via email. Including HTML without a fence should, ideally, only render uncolored HTML. Anything else will likely cause frustration.

pfaffman · July 8, 2017, 4:07pm

Oh. That’s a bigger problem. I have a BS in computer science and a PhD in education. I have learned that programming people is lots harder than programming computers.

eviltrout · July 9, 2017, 10:13pm

There’s no mode to disable HTML entities that I recall. There is a type of post you can mark as HTML (see the mbox importer for an example) and no cooking or processing will be done.

farmdawgnation · July 9, 2017, 10:34pm

Got it. I don’t think that would be desirable either.

What I really need is something that, upon seeing raw HTML in a post, will escape the HTML before the Markdown processor converts Markdown to HTML. In poking around the code I noticed that there’s a tags whitelist of sorts, so I could see just disabling that as a possibility. Or, potentially, preprocessing the text before the markdown processor runs.

Is either of these doable in the form of a plugin?

Thanks!
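
The preprocessing idea can be sketched as follows. This is Python for illustration only, not a Discourse plugin, and `escape_outside_code` is a hypothetical name. It assumes a simplified view of Markdown: fenced blocks open and close with a line starting with three or more backticks or tildes, and inline code uses single backticks. It encodes `&` and `<` everywhere else, and deliberately leaves `>` alone so that email-style `> ` quotes still work as blockquotes.

```python
import re

# A fence line: up to three spaces of indent, then ``` or ~~~ (or longer).
FENCE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")
# A single-backtick inline code span on one line.
INLINE_CODE = re.compile(r"(`[^`\n]+`)")


def _escape(text: str) -> str:
    # Encode & first so the & in the generated &lt; is not re-encoded.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def escape_outside_code(raw: str) -> str:
    """Escape raw HTML in a post, leaving code fences and spans intact."""
    out = []
    fence = None  # marker character of the currently open fence, if any
    for line in raw.split("\n"):
        m = FENCE.match(line)
        if m:
            marker = m.group(1)[0]
            if fence is None:
                fence = marker      # opening fence
            elif fence == marker:
                fence = None        # closing fence
            out.append(line)
        elif fence is not None:
            out.append(line)        # inside a fenced block: untouched
        else:
            # re.split with a capture group alternates text / code span.
            parts = INLINE_CODE.split(line)
            out.append("".join(
                p if i % 2 else _escape(p) for i, p in enumerate(parts)
            ))
    return "\n".join(out)
```

A known gap in this sketch is that Markdown autolinks such as `<https://example.com>` would be escaped too, so a real implementation would need to decide how to treat them.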

sam · July 9, 2017, 10:52pm

Yeah, the new markdown engine has a rule for dealing with HTML tags. It could be replaced in a plugin, but what you would have at the end of the process is not CommonMark, it is some sort of mister hydra.

farmdawgnation · July 9, 2017, 10:54pm

Yeah, that’s unfortunate but for certain communities I think the trade off would be worth it. Thanks!

JammyDodger · June 8, 2024, 12:44pm

This topic was automatically closed after 2526 days. New replies are no longer allowed.
