Better email reply parsing 📧

There are many open bugs on meta related to incorrect email reply parsing, and I have been looking into them for the past few days.

Almost every time, the fix is to correct or modify something in GitHub's email_reply_parser library.

GitHub's email_reply_parser library is now abandoned. The last commit was over a year ago, and there are many open issues and pull requests.

I propose that we include email_reply_parser in the core Discourse email library so that we can customize it to our requirements without depending on GitHub's library.

I have laid the groundwork for this, and in the process fixed these bugs:

https://meta.discourse.org/t/html-email-signature-not-being-stripped-out-of-notification-reply/21351

Here is the PR for the same:

Looking forward to hearing @team's feedback/suggestions.

17 Likes

Wow, I did not realize that library was only about 280 lines of code!

One downside of importing it this way is that you are not importing their tests. If we regress on something they already have a test for, we are not going to know, are we?

Maybe it makes more sense to fork it as discourse-email-parser, and add your new functionality and tests to the suite in that project?

12 Likes

I agree :100:%. Forking is a better idea.

11 Likes

Okay, I just published a new gem discourse_email_parser :incoming_envelope:

Here is the GitHub repo: https://github.com/discourse/discourse_email_parser

and the RubyGems.org page: discourse_email_parser

I updated the PR to use the discourse_email_parser gem instead of the email_reply_parser gem.

7 Likes

This change is now live, @zogstrip just merged the PR! :tada:

5 Likes

It looks like the parser is not yet trying to detect the “On <date>, <name> wrote” line in any non-English language:

https://github.com/discourse/discourse_email_parser/blob/dfc6031cfce718e4d0cadd9a52a72be0016e2c55/lib/discourse_email_parser.rb#L89

Is this correct? How can we contribute translations for this to at least cover the most common languages?
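
To illustrate what per-language detection could look like, here is a minimal sketch. The `REPLY_HEADER_PATTERNS` table and the `reply_header?` and `trim_reply` helpers are hypothetical names made up for this example, and the patterns are my own guesses at common mail-client formats, not anything taken from the gem:

```ruby
# Hypothetical per-language patterns for the quoted-reply header line.
# Illustrative only: these are assumptions, not copied from any library.
REPLY_HEADER_PATTERNS = {
  en: /\AOn\s.+\swrote:?\z/i,          # On <date>, <name> wrote:
  fr: /\ALe\s.+\sa écrit\s?:?\z/i,     # Le <date>, <name> a écrit :
  de: /\AAm\s.+\sschrieb\s.+\z/i,      # Am <date> schrieb <name>:
  es: /\AEl\s.+\sescribió:?\z/i        # El <date>, <name> escribió:
}.freeze

# Returns true when the line looks like a reply header in any known language.
def reply_header?(line)
  stripped = line.strip
  REPLY_HEADER_PATTERNS.values.any? { |pattern| pattern.match?(stripped) }
end

# Keeps only the text that appears above the first reply header line.
def trim_reply(body)
  body.lines.take_while { |line| !reply_header?(line) }.join.rstrip
end

email = "Thanks, that fixed it!\n\nAm 12.03.2016 um 10:15 schrieb Max Mustermann:\n> Did the patch help?\n"
puts trim_reply(email)
```

Patterns this loose will produce false positives (an ordinary sentence that starts with "On" and ends with "wrote" would match), so a real contribution would probably need tighter expressions plus test emails for each language.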

4 Likes

discourse_email_parser has now been replaced by email_reply_trimmer

https://github.com/discourse/email_reply_trimmer

and it covers common languages :wink:

https://github.com/discourse/email_reply_trimmer/blob/c7f36f34afdf3b40b47ec89f01dc4a1a7b8eb194/lib/email_reply_trimmer/embedded_email_matcher.rb#L13-L26

6 Likes

That list might soon include German:

https://github.com/discourse/email_reply_trimmer/pull/1

6 Likes

This topic was automatically closed after 24 hours. New replies are no longer allowed.