URLs being dropped from Thunderbird-generated replies

bsoares · October 4, 2021, 2:32pm

Hmmm, in that example, and in the full email source you messaged me (thank you!), the moz-do-not-send attribute should not affect the display of the <img> or <a> tags it appears in. The mozfilter only looks at values in the class attribute, and I can’t immediately see anywhere else that it might get filtered.

As such, I can’t work out why the link-with-alias’s content and href get separated out, unless for some reason the Discourse importer is suddenly deciding to use the text/plain part of the MIME-encoded email (which does have the text and URL separated). Why it would do that I don’t know.

In your test Discourse setup, can you try importing/emailing a Thunderbird HTML email with both a link-with-alias and, say, an embedded image or something else that will mark it out as the HTML part of the email?
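
For anyone following along, here is a minimal Python sketch (standard library only) of the two alternative parts being discussed. The sample message, its subject, and the link text are made up for illustration and are not taken from the real email; the point is only that the text/plain part carries the link text and the URL as separate pieces, while the text/html part keeps them together in one <a> tag.

```python
from email.message import EmailMessage


def build_sample() -> EmailMessage:
    """Build a hypothetical multipart/alternative message like Thunderbird sends."""
    msg = EmailMessage()
    msg["Subject"] = "Test"  # made-up subject
    # text/plain alternative: link text and URL end up on separate lines
    msg.set_content("Discourse Meta\nhttps://meta.discourse.org/\n")
    # text/html alternative: link text and href stay together in one tag
    msg.add_alternative(
        '<p><a moz-do-not-send="true" '
        'href="https://meta.discourse.org/">Discourse Meta</a></p>',
        subtype="html",
    )
    return msg


def list_parts(msg: EmailMessage) -> list[tuple[str, str]]:
    """Return (content type, decoded body) for every non-container part."""
    return [
        (part.get_content_type(), part.get_content())
        for part in msg.walk()
        if not part.is_multipart()
    ]


if __name__ == "__main__":
    for content_type, body in list_parts(build_sample()):
        print(content_type)
        print(body)
```

Running something like list_parts on the raw source of a real message would show which alternative matches what ends up in the post.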

Flominator · October 4, 2021, 4:32pm

Thanks for trying.

Then the picture, which sits in between the text in the mail (left), ends up at the end in Discourse (right):

bsoares · October 5, 2021, 8:52am

I’m still thinking this might be Discourse using the other MIME parts (in this case a text/plain part followed by an image/… part of the email) to create its markdown version, though why I don’t know. Perhaps an HTML validator is rejecting the text/html part because of strictly-non-validating attributes like moz-do-not-send!?
Could you do one more test with the same post (some text with an image in the middle), but also make some (but not all) of the text bold, even just one word? I think that will determine if the text part is coming from a text/plain or text/html block.

And sorry to ask, but just to make sure, you have incoming_email_prefer_html set to true (checked)?!

Flominator · October 16, 2021, 10:00am

The email:

The post:

bsoares · October 25, 2021, 10:50am

Thanks @Flominator . I see that the emboldened text has become italicised, so it’s definitely not using the HTML directly, but it is picking up on the emphasis somehow. I wonder if the text/plain part of the email gets some kind of markup/down added – would you be able to PM me the email source like last time?

bsoares · October 27, 2021, 11:13am

Hi @Flominator , thanks for the raw email. The text/plain alternative part of the email does indeed put asterisks around the text that’s in bold in the text/html part. Most markdown renderers (such as the one in Discourse) interpret this as italics. Here’s the text/plain segment copied and pasted on its own:

Und nochmal soll ich für hier
https://meta.discourse.org/t/urls-being-dropped-from-thunderbird-generated-replies/163751/24
eine Testnachricht schicken.

Bild in die Mitte dann wieder Text von dem ein Teil sogar fett
geschrieben ist

Gruß

Flo

which looks identical to your screenshot.
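
To make the bold-turns-italic effect concrete, here is a toy Python sketch. It is not Discourse’s markdown engine, just a two-rule regex stand-in (the function name render_emphasis is made up) that handles emphasis the way Markdown does: single asterisks give emphasis (italics), double asterisks give strong (bold). Thunderbird’s plain-text part marks bold with single asterisks, so it comes out as italics.

```python
import re


def render_emphasis(text: str) -> str:
    """Toy subset of Markdown emphasis handling (illustration only)."""
    # **text** is strong emphasis (bold); handle it first so the
    # single-asterisk rule below does not consume the double markers.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # *text* is plain emphasis (italics)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


if __name__ == "__main__":
    # Single asterisks, as found in the text/plain part of the email
    print(render_emphasis("partly *bold* text"))
    # Double asterisks, which is what Markdown needs for bold
    print(render_emphasis("partly **bold** text"))
```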

So what I think is happening is that the text/html segment is being rejected as invalid HTML (probably down to the non-standard moz-do-not-send attribute name in the <a> tags). Fixing this will require the patch to change how valid HTML is checked (possibly just removing those attributes), and I’m less confident how stable that will be without it going into the core code. I’ll have a look when I get some time.
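
As a rough idea of what “just removing those attributes” could look like, here is a small Python sketch. It is only an illustration of the idea: Discourse itself is Ruby, this is not code from receiver.rb or from any patch, and the function name strip_moz_do_not_send is made up. A regex is good enough for a demonstration, though a real fix would more likely work on a parsed document.

```python
import re

# Matches the attribute with a double-quoted, single-quoted, or bare value,
# together with the whitespace in front of it.
MOZ_ATTR = re.compile(
    r"""\s+moz-do-not-send\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_moz_do_not_send(html: str) -> str:
    """Remove every moz-do-not-send attribute, leaving the tags otherwise intact."""
    return MOZ_ATTR.sub("", html)


if __name__ == "__main__":
    sample = '<a moz-do-not-send="true" href="https://example.com/">link</a>'
    print(strip_moz_do_not_send(sample))
```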

bsoares · April 26, 2022, 8:56am

Hi all following this topic,

I’ve just spotted (before updating) that a separate fix (along the same lines but more specific) for this issue has been committed:
issue: FIX: properly clean Thunderbird emails, don't remove links by ValdikSS · Pull Request #16543 · discourse/discourse · GitHub
commit: FIX: properly clean Thunderbird emails, don't remove links (#16543) · discourse/discourse@f7540aa · GitHub

This means that the patch attached in an above comment will fail (probably not “spectacularly” but it might then require a rebuild to get upgrades going again) when you upgrade to include this new commit (probably your next upgrade).
If you have it automatically being applied (e.g. with a git apply cmd in your app.yml as described above), you should remove that before your next upgrade. In fact a rebuild might be in order as that commit might fail to apply since the place in receiver.rb where it will want to apply the commit diff has already been changed by the patch.

I’m going to 1) remove the git apply cmd from app.yml, 2) rebuild app, 3) update (if it hasn’t already in the rebuild). I’ll let you know how that goes…

[10 minutes later…]

In the end I did the following instead because it doesn’t require any downtime during the rebuild.

1) Remove the git apply for the patch from app.yml (only needs to be done before your next app container rebuild).
2) Revert the patched file with:
   i) launcher enter app
   ii) (now in app container)
       cd /var/www/discourse
       git checkout ./lib/email/receiver.rb
       exit
3) Update Discourse using the web admin update.

ValdikSS · April 28, 2022, 2:27pm

I’m the author of this patch. It works great for me, and I found no drawbacks.
