Better email reply parsing ✉


(Arpit Jalan) #1

There are many open bugs on meta related to incorrect email reply parsing, and I have been looking into them for the past few days.

Almost every time, the fix is to correct or modify something in GitHub’s email_reply_parser library.

GitHub’s email_reply_parser library is now abandoned. The last commit was over a year ago, and there are many open issues / pull requests.

I propose that we include email_reply_parser in the core Discourse email library so that we can customize it to our requirements, without depending on GitHub’s library.

I have laid the groundwork for this, and in the process fixed these bugs:

Email parsing reply not correctly stripping `----Original message----` marker
Email-created reply including reply preamble
MOSS Roadmap - Mailing lists
Malformed bullet causes emailed in topic to be incorrectly truncated
HTML email signature not being stripped out of notification reply

Here is the PR for the same:

Looking forward to hearing @team’s feedback/suggestions.

(Robin Ward) #2

Wow I did not realize that library was only about 280 lines of code!

One downside with importing it this way is that you are not importing their tests. If we regress on something they already have a test for, we are not going to know, are we?

Maybe it makes more sense to fork it as discourse-email-parser, and add your new functionality and tests to the suite in that project?

(Régis Hanol) #3

I agree :100:%. Forking is a better idea.

(Arpit Jalan) #4

Okay, I just published a new gem discourse_email_parser :incoming_envelope:

Here is the GitHub repo: GitHub - discourse/discourse_email_parser: Small library to parse plain text email content

and the gem page: discourse_email_parser | RubyGems.org | your community gem host

I have updated the PR to use discourse_email_parser instead of the email_reply_parser gem.

(Arpit Jalan) #5

This change is now live, @zogstrip just merged the PR! :tada:

(Felix Freiberger) #6

It looks like the parser is not yet trying to detect the “On <date>, <name> wrote” line in any non-English language:

Is this correct? How can we contribute translations for this to at least cover the most common languages?
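
To make the question concrete, per-language detection of that line could be pictured as a small table of patterns, one per locale. This is only a rough sketch: `REPLY_HEADER_PATTERNS` and `strip_quoted_reply` are hypothetical names, and the regexes are my own assumptions, not what discourse_email_parser actually does.

```ruby
# Hypothetical per-language patterns for the "On <date>, <name> wrote:" line.
# Illustrative assumptions only, not the gem's real rules.
REPLY_HEADER_PATTERNS = {
  en: /^On\s.+\swrote:\s*$/i,          # "On Mon, Mar 14, 2016, Jane Doe wrote:"
  de: /^Am\s.+\sschrieb\s.+:\s*$/i     # "Am 12.03.2016 um 10:15 schrieb Max Mustermann:"
}.freeze

# Returns the text that appears above the first line matching any known
# reply header; if no header is found, the whole body is returned.
def strip_quoted_reply(body)
  kept = body.lines.take_while do |line|
    REPLY_HEADER_PATTERNS.values.none? { |pattern| pattern.match?(line.strip) }
  end
  kept.join.rstrip
end
```

With something shaped like this, contributing a language would mean adding one entry to the table plus a test email for it. The patterns are deliberately naive: an ordinary sentence such as "On reflection he wrote:" would also match, so a real implementation would need stricter rules.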

(Arpit Jalan) #7

discourse_email_parser has now been replaced by email_reply_trimmer

and it covers common languages :wink:

(Felix Freiberger) #8

That list might soon include German:

(Arpit Jalan) #9

This topic was automatically closed after 24 hours. New replies are no longer allowed.