Better email reply parsing 📧

There are many open bugs on Meta related to incorrect email reply parsing, and I have been looking into them for the past few days.

Almost every time, the fix is to correct or modify something in GitHub’s email_reply_parser library.

GitHub’s email_reply_parser library is now abandoned: the last commit was over a year ago, and there are many open issues and pull requests.

I propose that we include email_reply_parser in the core Discourse email library so that we can customize it to our requirements without depending on GitHub’s library.

I have laid the groundwork for this and, in the process, fixed these bugs:

https://meta.discourse.org/t/html-email-signature-not-being-stripped-out-of-notification-reply/21351

Here is the PR for this:

Looking forward to hearing @team’s feedback and suggestions.
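For context, signature removal is one of the simplest pieces of trimming logic a reply parser needs. Below is a minimal Ruby sketch, assuming a plain-text body and the conventional `-- ` delimiter line (dash, dash, space; RFC 3676, section 4.3). `strip_signature` is a hypothetical helper, not the PR's code or email_reply_parser's API. HTML signatures like the one in the linked bug would first need converting to text, and many signatures have no delimiter at all, so a real parser needs more heuristics than this.

```ruby
# Minimal sketch, assuming plain-text email bodies: drop everything from the
# conventional "-- " signature delimiter onward.
# strip_signature is a hypothetical helper, not part of any gem's API.
def strip_signature(body)
  lines = body.lines
  # The delimiter must be exactly "-- " on a line of its own
  # (chomp removes a trailing "\n" or "\r\n").
  index = lines.index { |line| line.chomp == "-- " }
  kept = index ? lines[0...index] : lines
  # Trailing blank lines before the delimiter are not part of the reply.
  kept.join.rstrip
end

reply = "Thanks, that fixed it!\n\n-- \nJane Doe\nSent from my phone\n"
puts strip_signature(reply)
```

A line of just `--` (no trailing space) is deliberately left alone here, since it often appears in ordinary text.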

Wow, I did not realize that library was only about 280 lines of code!

One downside of importing it this way is that you are not importing their tests. If we regress on something they already have a test for, we are not going to know, are we?

Maybe it makes more sense to fork it as discourse-email-parser and add your new functionality and tests to that project’s test suite?

I agree 100%. Forking is a better idea.

Okay, I just published a new gem, discourse_email_parser 📨

Here is the GitHub repo: https://github.com/discourse/discourse_email_parser

and the RubyGems.org page: discourse_email_parser

I have updated the PR to use discourse_email_parser instead of the email_reply_parser gem.

This change is now live: @zogstrip just merged the PR! 🎉

It looks like the parser does not yet try to detect the “On <date>, <name> wrote” line in any non-English language:

https://github.com/discourse/discourse_email_parser/blob/dfc6031cfce718e4d0cadd9a52a72be0016e2c55/lib/discourse_email_parser.rb#L89

Is this correct? How can we contribute translations for this so that it covers at least the most common languages?
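To make the problem concrete, here is a simplified Ruby sketch of detecting an English “On <date>, <name> wrote:” line and keeping only the text above it. The `QUOTE_HEADER` regex and `trim_quoted_reply` helper are hypothetical and not taken from discourse_email_parser; they also ignore headers that mail clients wrap across two lines.

```ruby
# Simplified sketch: find an English "On <date>, <name> wrote:" header line and
# keep only the text above it. The regex is an illustrative assumption and is
# much less thorough than what a real reply parser needs.
# "On\b" avoids matching words like "One"; "\bwrote:" must end the line.
QUOTE_HEADER = /^On\b.+\bwrote:\s*$/

# Hypothetical helper, not discourse_email_parser's API.
def trim_quoted_reply(body)
  lines = body.lines
  index = lines.index { |line| line.match?(QUOTE_HEADER) }
  kept = index ? lines[0...index] : lines
  kept.join.rstrip
end

reply = "Sounds good to me.\n\n" \
        "On Mon, Jan 5, 2015 at 10:00 AM, Jane Doe <jane@example.com> wrote:\n" \
        "> Shall we ship it?\n"
puts trim_quoted_reply(reply)
```

A non-English reply such as “Le … a écrit :” slips straight through a single English pattern like this, which is exactly the gap described above.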

discourse_email_parser has now been replaced by email_reply_trimmer:

https://github.com/discourse/email_reply_trimmer

and it covers common languages 😉

https://github.com/discourse/email_reply_trimmer/blob/c7f36f34afdf3b40b47ec89f01dc4a1a7b8eb194/lib/email_reply_trimmer/embedded_email_matcher.rb#L13-L26
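A per-language table of patterns is one way to extend that kind of detection. The Ruby sketch below is illustrative only: the phrases are assumed common mail-client defaults, not copied from embedded_email_matcher.rb, and `QUOTE_HEADER_PATTERNS` and `quote_header_language` are hypothetical names.

```ruby
# Illustrative per-language patterns for the quote header line. The phrases
# are assumptions for this example, NOT email_reply_trimmer's actual patterns.
QUOTE_HEADER_PATTERNS = {
  en: /^On\b.+\bwrote:\s*$/,          # On <date>, <name> wrote:
  fr: /^Le\b.+\ba écrit ?:\s*$/,      # Le <date>, <name> a écrit :
  es: /^El\b.+\bescribió:\s*$/,       # El <date>, <name> escribió:
  de: /^Am\b.+\bschrieb\b.*:\s*$/,    # Am <date> schrieb <name>:
  pt: /^Em\b.+\bescreveu:\s*$/        # Em <date>, <name> escreveu:
}.freeze

# Hypothetical helper: returns the language key of the first matching
# pattern (hashes keep insertion order), or nil if no pattern matches.
def quote_header_language(line)
  QUOTE_HEADER_PATTERNS.each do |lang, pattern|
    return lang if line.match?(pattern)
  end
  nil
end

puts quote_header_language("Am 05.01.2015 um 10:00 schrieb Hans Müller <hans@example.com>:")
```

Keeping the phrases in one table means contributing a new language is mostly a one-line change plus a test, which fits the translation question above.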

That list might soon include German:

https://github.com/discourse/email_reply_trimmer/pull/1

This topic was automatically closed after 24 hours. New replies are no longer allowed.