Mailing list mode: "Upload" links broken (?) in e-mails

Hello,

I’m subscribed to the Rust forums, which use Discourse, and after first submitting this bug there, it was suggested that I post it here:

On the Rust Discourse forums I have “Mailing list mode” enabled in my preferences, and also “Include previous replies at the bottom of emails”.

In the plain-text (non-HTML) version of such e-mails, the replies for a thread (e.g. Ideas for a Rayon logo - community - The Rust Programming Language Forum) have links to images of the form upload://... which means it’s not possible to view them.

In the HTML version (as well as in the web UI) they have the form https://discourse-cdn-sjc1.com/business5/uploads/... It would be nice to have these full URLs in the text version too.

Thanks

Yes, this is definitely a bug if we are leaking raw URLs into the text content of HTML emails.

This is a very interesting problem.

At the moment we simply ship the raw markdown to the plaintext section of emails.

We made that decision many years ago and were kind of aware of some of the cracks.

@eviltrout / @codinghorror how do you feel about changing the mailer so it does HTML => text and uses that for the text part of the MIME vs using Post raw?

We now have very robust patterns for doing HTML => text living inside Discourse and could make a custom one here. It is a pretty big job though, cause we would have to be ultra careful with quotes and oneboxes.

For example, the text content now has stuff like:

[quote="codinghorror, post:2, topic:112354"]
this is definitely a bug
[/quote]

This would probably be a nicer way of rendering this in a text email.

> this is definitely a bug
URL-TO-QUOTE
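
A minimal sketch of that quote transformation, in Python for illustration only (it is not Discourse code). The function name `quote_to_citation`, the `base_url` parameter, and the `/t/<topic>/<post>` link layout are assumptions; the thread only specifies "URL-TO-QUOTE". Nested quotes are not handled.

```python
import re

# Matches a quote block of the form shown above. Simplified: the attributes
# must be double-quoted and nested [quote] blocks are not supported.
QUOTE_RE = re.compile(
    r'\[quote="(?P<attrs>[^"]*)"\]\n?(?P<body>.*?)\n?\[/quote\]',
    re.DOTALL,
)


def quote_to_citation(raw, base_url):
    """Rewrite [quote=...] blocks as "> " e-mail citations plus a link."""

    def render(match):
        # attrs looks like: codinghorror, post:2, topic:112354
        _user, *rest = [part.strip() for part in match.group("attrs").split(",")]
        fields = dict(part.split(":", 1) for part in rest if ":" in part)
        lines = [f"> {line}" if line else ">" for line in match.group("body").split("\n")]
        if "topic" in fields and "post" in fields:
            # Assumed link layout standing in for URL-TO-QUOTE.
            lines.append(f"{base_url}/t/{fields['topic']}/{fields['post']}")
        return "\n".join(lines)

    return QUOTE_RE.sub(render, raw)
```

Text outside the quote block is passed through untouched, so the rest of the Markdown stays as the author wrote it.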

Well, Markdown is designed to look good in raw text… it’s one of the design directives

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

Problem is we have nasty BBcode in there. I’d rather convert BBCode to text, personally, rather than playing a game of “telephone” where we go Markdown → HTML → text

We would need to design a Markdown -> Markdown renderer. We could teach markdown.it to do this with a heavily modified pipeline, but it is not trivial, probably a couple of weeks for a first shot at it. Even if we got the uploads and BBCode hacks working, we would still be left with inline HTML tags.

Playing “telephone” here is faster and more robust, cause we get to normalize the Markdown that way.

No, we only need

BBCode → Plain Text

You don’t need to touch the existing Markdown, just convert BBCode

It is not that simple:

[quote="codinghorror, post:6, topic:112354"]
You don’t need to touch the existing Markdown, just convert BBCode
[/quote]

Is BBCODE

`[b]test[/b]` 

Is not BBCODE cause it is ` quoted

In the past we went down the regex hell of hack on top of hack to allow for all the edge cases, and we ended up with a nightmare pipeline.

The problem is there is no clean way of fishing out BBCODE and only converting it.

I’m not really following, playing telephone and going

Markdown → HTML → text

is nowhere near clean, as HTML doesn’t convert to text properly, there are zillions of edge cases in that, plus “telephone” adds multiple layers of errors in the conversion.

I still think it makes more sense to surgically convert BBCode to plain text, because the Markdown is guaranteed to look OK as plain text. It’s literally in the definition of the Markdown project:

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

The problem is the BBCode, so I think it makes the most sense to focus on the problem rather than going for 3 levels of conversions on everything, even Markdown that does not need to be converted to “plain text”.

Say I have a post with this Markdown:

I just tried typing `[b]bold[/b]` and I did not get bold.

If we blindly convert this we get:

I just tried typing `**bold**` and I did not get bold.

This is clearly wrong, so we add a regex, and then another regex for ``` to handle hoisting, and then for 3 space indents, and finally have a regex tower high enough that it is both impossible to maintain and somehow works for all the crazy cases people hit in the wild.

Right but isn’t that part of the normal Markdown pipeline? You have to do that anyway, so at the time the markdown pipeline is already deciding to hoist stuff out, hoist it all out, and then convert BBCode to Markdown.
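
The "hoist it all out, then convert" idea could look roughly like the sketch below. It is Python for illustration, not the markdown.it pipeline, and it is deliberately simplified: only backtick code spans and ``` fences are hoisted (indented code blocks are ignored), only `[b]` is converted, and the NUL-delimited placeholder is an assumption that the raw text never contains a NUL character. All names are hypothetical.

```python
import re

# Fenced blocks are tried first, then inline code spans.
CODE_RE = re.compile(r"```.*?```|`[^`\n]+`", re.DOTALL)
BOLD_RE = re.compile(r"\[b\](.*?)\[/b\]", re.DOTALL)
PLACEHOLDER_RE = re.compile(r"\x00(\d+)\x00")


def convert_bbcode_outside_code(raw):
    """Convert [b]...[/b] to **...** everywhere except inside code."""
    hoisted = []

    def hoist(match):
        # Park the code verbatim and leave a numbered placeholder behind.
        hoisted.append(match.group(0))
        return f"\x00{len(hoisted) - 1}\x00"

    text = CODE_RE.sub(hoist, raw)
    text = BOLD_RE.sub(r"**\1**", text)
    # Put the untouched code back where the placeholders are.
    return PLACEHOLDER_RE.sub(lambda m: hoisted[int(m.group(1))], text)
```

With this, the example post about typing `[b]bold[/b]` comes back unchanged, while a bare [b]test[/b] elsewhere becomes \*\*test\*\*. The cases this sketch skips (indents, inline HTML, nested constructs) are exactly where the regex tower described above starts to grow.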

Yes we can do this.

But it is a pretty complex change, certainly not a 1 day change. I think my estimate of 2 weeks is probably about right here.

HTML->Markdown is a matter of hours of work especially since @vinothkannans already built this for our quote and cut-and-paste feature.

At the moment we simply ship the raw markdown to the plaintext
section of emails.

Having markdown in the body of e-mails is nice (some mail clients may
prettify it), there are just a few kinks with the current format, as you
mention.

For example now text contents has stuff like:

[quote="codinghorror, post:2, topic:112354"]
this is definitely a bug
[/quote]

This would probably be a nicer way of rendering this in a text email.

> this is definitely a bug
URL-TO-QUOTE

This is the reason why I wrote my “Discourse article” extension to Emacs’
Gnus mail client: the transformation of the Discourse quote format into
regular e-mail citation format.

And then I added other things, and realized that except for a handful of
transformations (including a header for the “Previous Replies” section
in mailing list mode and separators for replies therein) I could hand
off most of the rendering to markdown mode…

I would definitely welcome a more “canonical” markdown, if you will.

Something about this still does not sit right with me @sam – we are letting ugly BBCode style markup leak into the raw.

I guess I can see the HTML → text as a quick fix, but to me it does not address the underlying issue.

Perhaps we just don’t care that much about this, the plain text version of the email is a “best effort” thing so if we have @vinothkannans existing code to do it, that would be good enough?

I think it may be worth trying it out; on the upside, this can lead to other fixes like… what happens when you quote a onebox and reply? What happens when you quote a poll and reply?

It is replacing one evil with a different evil, but it may be crazy enough to work… not sure.


About to quote :arrow_down:

Let’s do a little test here…

  • a thing
  • another thing

0 voters

A quote:

A Bold bold

A [b]Bold[/b] bold

A onebox:

A mini onebox: https://www.reddit.com/r/i3wm/comments/49pwag/changing_the_title_bar_font/

An emoji :slight_smile:

About to quote :arrow_down:

Let’s do a little test here…

A quote:

> so if we have @vinothkannans existing code to do it, that would be good enough?

A Bold bold

```
A [b]Bold[/b] bold
```

A onebox:

![|16x16](https://www.redditstatic.com/desktop2x/img/favicon/android-icon-192x192.png) [reddit](https://www.reddit.com/r/i3wm/comments/49pwag/changing_the_title_bar_font/)

### [r/i3wm - Changing the title bar font](https://www.reddit.com/r/i3wm/comments/49pwag/changing_the_title_bar_font/)

0 votes and 5 comments so far on Reddit

A mini onebox: [Changing the title bar font : i3wm](https://www.reddit.com/r/i3wm/comments/49pwag/changing_the_title_bar_font/)

An emoji :slight_smile:

We totally EAT polls, oneboxes are odd, so I am not sure how great this is as a solution.

Maybe we just have to pay the price and do this properly in a custom renderer.

I still think “strip all bbcode” is NOT as hard as you’re making it out to be.

Most bbcode could just be stripped with no real loss of meaning, and only a handful would need conversion.

Because the Markdown is plain text compatible as-is …

I see improving plain text e-mails could prove quite complicated!

From my point of view, the readability of the plain text e-mails is already quite good and indeed, not much conversion is required. Here’s what I think could improve readability the most, in order of decreasing importance:

  • Links to images (and other assets?): currently unusable (see my
    original post) as upload:// links can’t be handled by mail
    clients.

  • Quotes: my naive conversion into regular e-mail citation improved
    readability a lot, so having the server do it would be great.

  • Oneboxes, as you mentioned – although I don’t think I’ve seen them
    often.

  • Polls: I don’t think polls would be very useful in plain text as there
    would be no means of voting. Showing the results when the poll is
    closed could be nice, but in any case I think a simple link to the
    Discourse post would do, polls being infrequent.

  • Previous Replies: if possible, this section could be in markdown form
    too (section heading, post separators…).

I understand this could be a lot of effort, and possibly not worth it if
there aren’t many users reading the e-mails in plain text – which I’m
guessing is the case.

I’d say that fixing the upload:// links would have the most impact on
usability, based on all the Discourse posts I’ve received by mail.
Could this be fixed independently of more involved pipeline changes?

(And thank you for having plain text in the first place :+1:t2:)

Thanks Damien,

I have filed this in my mind castle :european_castle: , and set a reminder for next week, we may have to just make some small hacky attempts here vs a fully 100% robust super implementation, will decide what path we are going to take here some time next week. At a minimum something for uploads cause that is by far the worst pain we have here.

This is still sitting in my list a few months later and yet another one of those small yet extremely annoying edge cases where we really want a reverse markdown parser.

The problem is that the raw for this image:

image

Is:

![image|65x79](upload://bkLTU01hzKturAl7WsxEXeHYk1y.png)

I guess the trivial thing we can do here, which is a bit destructive and will ruin stuff for code blocks, is to substitute, like so:

upload://bkLTU01hzKturAl7WsxEXeHYk1y.png

with

https://d11a6trkgmumsb.cloudfront.net/original/3X/4/f/4f710992a5de8a73bfc0198ac9fa557cdfcc3380.png

If we use a straight regex gsub, this will clearly break all sorts of things for the 0.1% case but will improve the 99.9% case for the OP here.

(also we need to be careful not to have a rogue post issue 10,000 queries, so we need to do this in a batch of sorts)

@codinghorror does this sound good to you? It makes me feel a bit yuck, but that is close to the best we can do now without a reverse parser.

cc @CvX / @zogstrip

That looks do-able and safe-ish enough.

@CvX mind adding that to your list? Before sending out an email, we need to massage the raw/markdown to change all upload:// urls into an absolute URL.
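
As a rough sketch of the substitution discussed above, with the batching concern addressed: collect every upload:// URL in the raw first, resolve them in a single lookup, then substitute. This is Python for illustration only, not the Discourse mailer. `lookup_many` is a hypothetical callable that maps a list of short URLs to absolute URLs in one query, and the regex is an assumption about what short URLs look like, based on the example in this thread. As noted above, this also rewrites URLs inside code blocks.

```python
import re

# Assumed shape of a short URL: upload://<alphanumeric id>[.<extension>]
UPLOAD_RE = re.compile(r"upload://[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)?")


def absolutize_upload_urls(raw, lookup_many):
    """Replace upload:// short URLs in raw with absolute URLs.

    lookup_many(short_urls) must return a dict {short_url: absolute_url};
    it is called at most once per post, however many uploads the post has.
    """
    short_urls = sorted(set(UPLOAD_RE.findall(raw)))
    if not short_urls:
        return raw
    resolved = lookup_many(short_urls)
    # Unknown short URLs are left as they are rather than dropped.
    return UPLOAD_RE.sub(lambda m: resolved.get(m.group(0), m.group(0)), raw)
```

Passing the lookup in as a function keeps the substitution testable without a database and makes the "one query per post" property easy to check.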
