Email reply includes inline styles with "incoming email prefer html" site setting on

We have had the Site Setting “incoming email prefer html” turned on for the past year.

It’s been working great, but yesterday, an email reply from a user came in with a bunch of inline styles showing up:

a {word-wrap:normal;word-break:break-word;}.background-contain {background-size:contain;}@media only screen and (max-width:600px) {.container {-webkit-text-size-adjust:none !important;}.container,.palm-one-whole {width:100% !important;min-width:100% !important;}.palm-one-half {width:50% !important;min-width:50% !important;box-sizing:border-box;}blockquote .container,blockquote .container div,blockquote .container table {width:auto !important;min-width:0 !important;position:relative !important;}img {max-width:100%;}.border-outer,.border-middle,.border-inner,.inner,[title="separator"] {width:100% !important;}.innercell {padding:8px !important;}.palm-block {display:block !important;}td.palm-one-whole {display:inline-block !important;padding:0;}td.palm-one-whole:first-child:not(:only-child) {margin-bottom:16px;}td.hostname {padding-top:3px !important;}}@media only screen and (min-width:601px) {.preview-card {max-width:600px !important;}}@media only screen and (min-device-width :320px) and (max-device-width :568px),only screen and (min-device-width :768px) and (max-device-width :1024px),only screen and (max-device-width:640px),only screen and (max-device-width:667px),only screen and (max-width:480px){.container {width:100% !important;min-width:100% !important;}.p,.small,li,font[size="2"],font[size="3"] {font-size:1em !important;}}@media only screen and (min-device-width :320px) and (max-device-width :568px),only screen and (min-device-width :768px) and (max-device-width :1024px),only screen and (min-device-width :1224px) {.message-wrapper {padding-top:6px;}.apple-only[style] {display:block !important;max-height:none !important;line-height:normal !important;overflow:visible !important;height:auto !important;width:100% !important;position:relative !important;}.no-apple {display:none !important;}form {font-size:inherit;}input[type="text"] {height:43px;padding-left:4px !important;}button:hover {cursor:pointer;}}@media only screen and (min-device-width :1224px) {.apple-mail-form {display:block !important;background-color:white !important;}}* [office365] .outlook-com-hidden {display:none !important;}* [office365] .outlook-com-button {display:block;}* [office365] .outlook-com-only {display:block !important;max-height:none !important;line-height:normal !important;overflow:visible !important;height:auto !important;width:100% !important;position:relative !important;}.ExternalClass {width:100%;}.ExternalClass .outlook-com-button {display:block;}.ExternalClass button {height:auto;}.ExternalClass .outlook-com-hidden {display:none !important;}.ExternalClass .outlook-com-only {display:block !important;max-height:none !important;line-height:normal !important;overflow:visible !important;height:auto !important;width:100% !important;position:relative !important;}.ExternalClass .ecxlabels {display:none !important;}.ExternalClass .ecxlabels {display:none !important;}.ExternalClass .ecxarrow {display:none !important;}.ExternalClass cite >div + div {padding:0 0 4px 0;}.ExternalClass .h1 {padding-bottom:5px;}.ExternalClass .h2 {padding-bottom:5px;}.ExternalClass .h3 {padding-bottom:5px;}.ExternalClass [lang="brand-pinterest"] {width:280px !important;}Duly noted. Thanks for the tip.

But if I click the email icon on the post, and then look at the HTML tab, it looks fine.

Any ideas how to fix this? I have the raw email saved which I can PM if there is any interest from someone on the team in helping to take a look.

Now that we have a Html To Markdown converter, I was thinking of ditching that setting and always converting HTML to Markdown. This would fix a lot of issues with HTML.

Would you be on-board with this or do you need HTML?

5 Likes

The main reason we are using this is because of the funky line break issues referenced in the topic I linked to in the OP. Would prefer things as markdown if it works well enough.

Happy to PM you this email as an example to try that with if you’d like.

Update pretty pleased with the test results @zogstrip showed me.

2 Likes

Here’s to better :incoming_envelope:

https://github.com/discourse/discourse/commit/b76674f6406f94539e6b3060d60c010f2de99d19

4 Likes

I don’t think this response is very clear…

  1. We always prefer plain text to html

  2. If we cannot get the plain text we will use html because we were forced to

  3. If we are forced to use HTML we will (now, this is new) convert it to Markdown

Can you confirm?

If that’s the case then I think we should keep the site setting to prefer HTML so we can avoid he linebreak issue

My worry is switching the whole pipeline from plain text to HTML is a radical change this late in a release.

1 Like

The setting incoming_email_prefer_html has already existed for a year, this commit removes it.

I agree that this late in the game it probably does not make sense to switch everyone to the new converter, it is guaranteed to have edge cases, but stripping a “site setting” just cause we fixed the implementation of the “site setting” does not really make sense to me.

Instead if we feel this has too much risk just revert the commit for now. Or leave the setting and mark in (experimental)

I am referring to what @mcwumbly proposed, not what @zogstrip implemented.

1 Like

Not following, @zogstrip removed the site setting in the commit he linked. There is no longer any way to prefer html.

In particular, I am not following how fixing the parsing and dropping the setting are in any way related.

1 Like

We should decide on one or the other paths. As I recall there were technical problems using the HTML as the preferred source. We should either

  • resolve those problems and always prefer HTML

or

  • give up and always prefer plain text

I don’t like carrying around random secondary code paths hidden behind Booleans that we aren’t even testing, and we have no idea if they even work. Sextuply so for the wild world of email…

The code path already hits, all the setting does is makes it hit more, and makes it easier for us to transition to HTML preferred

For example I would recommend turning that setting on, on meta, so we live with the new HTML to markdown converter

As it turns out, it’s sometimes better to use the markdown generated from the HTML rather than using the text version of the email.

So, I restored the site setting and enabled it here on meta so that we can exercise it :wink:

https://github.com/discourse/discourse/commit/0ec15af970e53abc0503ef867f9400a245e57845

6 Likes

If the Html to Markdown converter works so well, might it also help better identifying the breakpoint from where to strip previous posts?

In a recent reply-by-email post where it completely failed to strip the previous post, I noticed that in the text part of email, the break point looked like this:

	username Full Name Title=20
September 25
Hi Name!

Whereas the html version looked like this:

  <div></div>

    <div style="margin-bottom:25px;">
  <table cellspacing="0" cellpadding="0" border="0">
    <tbody><tr>
      <td style="vertical-align:top;width:55px;">
        <img src="http://forum.mydomain.com/user_avatar/forum.mydomain.com/username/45/26_1.png" title="username" width="45" height="45">
      </td>
      <td>
        <a href="http://forum.mydomain.com/u/username" target="_blank" style="text-decoration: none; font-weight: bold; color: #006699;; font-size:13px;font-family:'lucida grande',tahoma,verdana,arial,sans-serif;color:#3b5998;text-decoration:none;font-weight:bold">username</a>
          <a href="http://forum.mydomain.com/u/username" target="_blank" style="text-decoration: none; font-weight: bold; color: #006699;; font-size:13px;font-family:'lucida grande',tahoma,verdana,arial,sans-serif;text-decoration:none;margin-left:7px;color: #3b5998;font-weight:normal;">Full Name</a>
          <span style="font-size:13px;font-family:'lucida grande',tahoma,verdana,arial,sans-serif;text-decoration:none;margin-left:7px;color: #999;">Title</span>
        <br>
        <span style="text-align:right;color:#999999;padding-right:5px;font-family:'lucida grande',tahoma,verdana,arial,sans-serif;font-size:11px">September 25</span>
      </td>
    </tr>
  </tbody></table>
  <div style="padding-top:5px;">
<p style="margin-top:0; border: 0;">Hi Name</p>

In other words, I can see why discourse fails to identify the breakpoint based on the text version…

Has this been considered?

In any case, I had another reply-by-email post today that failed to be stripped of any of the replied-to post and when I wanted to do that manually, I couldn’t do it because the resulting post would have been under 20 characters. So it struck me whether the min character requirement might at some point affect the processing of incoming emails? In other words: if someone replies by emails saying just “yes”, how is that email going to be processed?