Disabling HTML entities in posts


(Matt Farmer) #1

Hi there,

I’m trying to migrate a Google Group to a Discourse forum and am having some issues with HTML being posted in emails. The mailing list is for a web framework, so folks posting HTML back and forth is pretty common. Since the emails we’re trying to import came from a system that didn’t support Markdown, these HTML segments don’t have code fences around them, and there are too many to edit manually.

As a compromise, I’d like to disable HTML entities in posts completely. While searching the forum for this I saw @codinghorror mention that @eviltrout added a mode where this was possible, but I couldn’t find the magic setting in my admin panel.

Does anyone know how to flip Discourse into this mode?

Thanks!


(Jay Pfaffman) #2

Without thinking about it too hard, I think that the thing to do might be to encode the HTML tags before the import. It sounds like it might be safe enough to just convert all of the <s and >s into &lt;s and &gt;s.
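
A minimal sketch of that conversion, assuming the message body is already available as a plain string. The helper name `escape_tags` is hypothetical, and the sketch uses Python's standard `html.escape`, which also rewrites `&` in addition to the `<` and `>` mentioned above.

```python
import html


def escape_tags(body: str) -> str:
    """Return body with &, < and > replaced by HTML entities."""
    # quote=False leaves " and ' untouched, so attribute values stay readable.
    return html.escape(body, quote=False)


print(escape_tags('Try <div class="content">hello</div>'))
```

One caveat with a blanket replacement like this: a `>` at the start of a line, as used for quoting in email replies, is escaped as well, so it would no longer be read as a Markdown blockquote.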


(Matt Farmer) #3

The problem is these are mbox files, so that transformation would have to try to distinguish between parts of the mbox and HTML. :confused:
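
One way around that concern, sketched under the assumption that the archive can be pre-processed outside Discourse: Python's standard `mailbox` module parses the mbox structure, so only the decoded plain-text body is ever escaped. The function name `escaped_bodies` is hypothetical, and this is not how the Discourse importer itself works.

```python
import html
import mailbox


def escaped_bodies(mbox_path: str):
    """Yield (subject, escaped_body) for each message in an mbox file."""
    for message in mailbox.mbox(mbox_path):
        # The parser handles the "From " separator lines and the headers,
        # so only text/plain body content is escaped below.
        parts = message.walk() if message.is_multipart() else [message]
        for part in parts:
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body = payload.decode(charset, errors="replace")
            yield message["subject"], html.escape(body, quote=False)
            break  # use the first plain-text part only
```

The escaped bodies would still need to be written back out in whatever form the import step expects, which this sketch leaves open.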


(Jay Pfaffman) #4

You could also modify the import script to pass the body through a filter at that point.


(Matt Farmer) #5

That’s true. Although I’ll admit I’m also concerned about folks interacting with our community via email. Including HTML without a fence should, ideally, only render uncolored HTML. Anything else will likely cause frustration.


(Jay Pfaffman) #6

Oh. That’s a bigger problem. I have a BS in computer science and a PhD in education. I have learned that programming people is lots harder than programming computers.


(Robin Ward) #7

There’s no mode to disable HTML entities that I recall. There is a type of post you can mark as HTML (see the mbox importer for an example) and no cooking or processing will be done.


(Matt Farmer) #8

Got it. I don’t think that would be desirable either.

What I really need is something that, upon seeing raw HTML in a post, will escape the HTML before the Markdown processor converts Markdown to HTML. In poking around the code I noticed that there’s a tags whitelist of sorts, so I could see just disabling that as a possibility. Or, potentially, preprocessing the text before the markdown processor runs.

Is either of these doable in the form of a plugin?

Thanks!
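
The preprocessing idea can be sketched as follows. This is an illustration of the technique in Python, not Discourse plugin code: the function name `escape_outside_code` is hypothetical, and the fence and inline-code detection is deliberately simplified (a closing fence only has to use the same character as the opening one, and nested backtick runs are not handled).

```python
import html
import re

# An opening or closing fence line: three or more backticks or tildes.
FENCE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")
# Inline code spans; the capture group makes re.split keep them.
INLINE_CODE = re.compile(r"(`+[^`]*`+)")


def escape_outside_code(raw: str) -> str:
    """Escape &, < and > everywhere except fenced blocks and inline code."""
    out = []
    fence = None  # fence character of the block we are inside, if any
    for line in raw.split("\n"):
        match = FENCE.match(line)
        if fence is None and match:
            fence = match.group(1)[0]
            out.append(line)
        elif fence is not None:
            if match and match.group(1)[0] == fence:
                fence = None
            out.append(line)
        else:
            pieces = INLINE_CODE.split(line)
            # Odd indexes hold the captured inline code spans; keep them as-is.
            out.append("".join(
                piece if i % 2 else html.escape(piece, quote=False)
                for i, piece in enumerate(pieces)
            ))
    return "\n".join(out)
```

Running the text through a step like this before the Markdown processor means unfenced tags arrive as literal text, while anything the author did put in a fence or in backticks is left alone. As written it also escapes a leading `>` used for quoting and the angle brackets of Markdown autolinks, so a real version would probably need exceptions for those.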


(Sam Saffron) #9

Yeah, the new Markdown engine has a rule for dealing with HTML tags, and it could be replaced in a plugin. But what you would have at the end of the process is not CommonMark; it is some sort of mister hydra.


(Matt Farmer) #10

Yeah, that’s unfortunate, but for certain communities I think the trade-off would be worth it. Thanks!