Better email reply parsing ✉

There are many open bugs on meta related to incorrect email reply parsing, and I have been looking into them for the past few days.

Almost every time, the fix is to correct or modify something in GitHub’s email_reply_parser library.

GitHub’s email_reply_parser library is now abandoned. The last commit was over a year ago, and there are many open issues and pull requests.

I propose that we include email_reply_parser in the core Discourse email library so that we can customize it to our requirements without depending on GitHub’s library.

I have laid the groundwork for this, and in the process fixed these bugs:

https://meta.discourse.org/t/html-email-signature-not-being-stripped-out-of-notification-reply/21351

Here is the PR for these changes:

Looking forward to hearing @team’s feedback and suggestions.

17 Likes

Wow I did not realize that library was only about 280 lines of code!

One downside of importing it this way is that you are not importing their tests. If we regress on something they already have a test for, we are not going to know, are we?

Maybe it makes more sense to fork it as discourse-email-parser, and add your new functionality and tests to the suite in that project?

12 Likes

I agree :100:%. Forking is a better idea.

11 Likes

Okay, I just published a new gem, discourse_email_parser :incoming_envelope:

Here is the GitHub repo: https://github.com/discourse/discourse_email_parser

and the RubyGems.org page: discourse_email_parser

I updated the PR to use the discourse_email_parser gem instead of the email_reply_parser gem.

7 Likes

This change is now live: @zogstrip just merged the PR! :tada:

5 Likes

It looks like the parser does not yet try to detect the “On <date>, <name> wrote” line in any non-English language:

https://github.com/discourse/discourse_email_parser/blob/dfc6031cfce718e4d0cadd9a52a72be0016e2c55/lib/discourse_email_parser.rb#L89

Is this correct? How can we contribute translations for this to at least cover the most common languages?
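
To make the question concrete, here is a minimal sketch of what multi-language detection of that line could look like. It is not the code from discourse_email_parser: the names `QUOTE_HEADER_PATTERNS`, `quote_header?`, and `strip_quoted_reply` are hypothetical, and the German and Dutch patterns are assumptions based on the headers common mail clients tend to produce. It also assumes the header fits on a single line, which is not always the case in practice.

```ruby
# Hypothetical sketch, not the actual discourse_email_parser implementation.
# Each pattern matches a single-line "quote header" that a mail client
# inserts above the quoted original message.
QUOTE_HEADER_PATTERNS = [
  /\AOn\s.+\swrote:\z/i,        # English: "On <date>, <name> wrote:"
  /\AAm\s.+\sschrieb\s.+:\z/i,  # German (assumed): "Am <date> schrieb <name>:"
  /\AOp\s.+\sschreef\s.+:\z/i   # Dutch (assumed): "Op <date> schreef <name>:"
].freeze

# Returns true when the line looks like a quote header in a known language.
def quote_header?(line)
  stripped = line.strip
  QUOTE_HEADER_PATTERNS.any? { |pattern| pattern.match?(stripped) }
end

# Keeps everything above the first quote header and drops the rest.
def strip_quoted_reply(body)
  kept = body.lines.take_while { |line| !quote_header?(line) }
  kept.join.rstrip
end
```

With something like this, contributing a language would amount to adding one pattern plus a sample email that exercises it.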

4 Likes

discourse_email_parser has now been replaced by email_reply_trimmer

https://github.com/discourse/email_reply_trimmer

and it covers common languages :wink:

https://github.com/discourse/email_reply_trimmer/blob/c7f36f34afdf3b40b47ec89f01dc4a1a7b8eb194/lib/email_reply_trimmer/embedded_email_matcher.rb#L13-L26

6 Likes

That list might soon include German:

https://github.com/discourse/email_reply_trimmer/pull/1

6 Likes
