techAPJ
(Arpit Jalan)
December 9, 2015, 6:49pm
1
There are many open bugs on meta related to incorrect email reply parsing, and I have been looking into them for the past few days.
Almost every time, the fix is to correct or modify something in GitHub's email_reply_parser library.
GitHub's email_reply_parser library is now abandoned. The last commit was over a year ago, and there are many open issues and pull requests.
I propose that we include email_reply_parser in the core Discourse email library so that we can customize it to our requirements without depending on GitHub's library.
I have laid the groundwork for this, and in the process fixed these bugs:
Email parsing reply not correctly handling ----Original message---- marker.
Here is a modified example (to protect privacy):
[image]
I have two examples of this now with the same user.
I'm willing to send the raw original email replies via PM to one of the Discourse Team to help resolve this.
Cheers,
Dean.
https://meta.discourse.org/t/html-email-signature-not-being-stripped-out-of-notification-reply/21351
Reproduce:
Reply using Gmail to a post notification.
Get a separate rejection error because your reply is less than the default 20 character minimum.
Reply to your first attempt to reply (which still is addressed to Discourse) with a longer response.
Expected:
Only the response content appears.
Actual:
Your email address is exposed.
[image]
Pastebin of original message content available to those who request.
I know that a line with two dashes truncates everything below it.
--
But it turns out that a line with a single dash immediately followed by text (no space) will also truncate everything below it. See screenshots.
[image]
[image]
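The report above suggests the parser was treating any line that starts with a dash as a signature delimiter. Here is a minimal Ruby sketch of a stricter check. It assumes a delimiter is a line containing only `--` plus optional trailing whitespace (the conventional form is `-- `). `strip_signature` and `SIGNATURE_DELIMITER` are hypothetical names for illustration, not part of email_reply_parser.

```ruby
# Hypothetical helper: cut the text at the first signature delimiter.
# Assumption: a delimiter is a line holding only "--" plus optional
# trailing spaces or tabs, so "-foo" and "--foo" stay ordinary content.
SIGNATURE_DELIMITER = /^--[ \t]*$/

def strip_signature(text)
  lines = text.lines
  # Index of the first delimiter line, or nil when there is none.
  cut = lines.index { |line| line.chomp =~ SIGNATURE_DELIMITER }
  kept = cut ? lines[0...cut] : lines
  kept.join.rstrip
end

puts strip_signature("Thanks!\n-not a signature\nMore text\n-- \nDean")
```

With this check, the line `-not a signature` is kept and only the text from the `-- ` line onward is dropped.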
Here is the PR for the same:
master ← arpitjalan:better-email-parsing
merged 01:51PM - 11 Dec 15 UTC
Meta topic for discussion: https://meta.discourse.org/t/better-email-reply-parsi… ng-email/36495
Looking forward to hearing the @team's feedback and suggestions.
17 Likes
eviltrout
(Robin Ward)
December 9, 2015, 8:41pm
2
Wow I did not realize that library was only about 280 lines of code!
One downside of importing it this way is that you are not importing their tests. If we regress on something they already have a test for, we are not going to know, are we?
Maybe it makes more sense to fork it as discourse-email-parser, and add your new functionality and tests to the suite in that project?
12 Likes
zogstrip
December 9, 2015, 9:08pm
3
I agree 100%. Forking is a better idea.
11 Likes
techAPJ
(Arpit Jalan)
December 10, 2015, 7:45pm
4
Okay, I just published a new gem, discourse_email_parser.
Here is the GitHub repo: discourse/discourse_email_parser (a small library to parse plain text email content)
and the RubyGems.org page: discourse_email_parser
I updated the PR to use discourse_email_parser instead of the email_reply_parser gem.
7 Likes
techAPJ
(Arpit Jalan)
December 11, 2015, 2:31pm
5
This change is now live, @zogstrip just merged the PR!
5 Likes
fefrei
(Felix Freiberger)
February 4, 2016, 9:41am
6
It looks like the parser is not yet trying to detect the "On <date>, <name> wrote" line in any non-English language:
    # Returns this same Email instance.
    def read(text)
      # in 1.9 we want to operate on the raw bytes
      text = text.dup.force_encoding('binary') if text.respond_to?(:force_encoding)
      # Normalize line endings.
      text.gsub!("\r\n", "\n")
      # Check for multi-line reply headers. Some clients break up
      # the "On DATE, NAME <EMAIL> wrote:" line into multiple lines.
      if text =~ /^(?!On.*On\s.+?wrote:)(On\s(.+?)wrote:)$/nm
        # Remove all new lines from the reply header.
        text.gsub! $1, $1.gsub("\n", " ")
      end
      # Check for "---- Original Message ----"
      # and strip email content after that part
      if text =~ /^([\s_-]+Original (?i)message?[\s_-]+$.*)/nm
        text.gsub!($1, "")
      end
Is this correct? How can we contribute translations for this to at least cover the most common languages?
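To illustrate the gap, here is a small Ruby sketch. It uses a simplified form of the English-only header pattern from the excerpt above (without the binary-encoding flag and the negative lookahead) and runs it against two made-up replies. The sample emails and the constant name `ENGLISH_REPLY_HEADER` are assumptions for illustration only.

```ruby
# Simplified form of the English-only "On ... wrote:" check.
# Assumption: this is an illustration, not the library's exact pattern.
ENGLISH_REPLY_HEADER = /^On\s.+?wrote:$/m

# Made-up sample replies, one with an English and one with a German header.
english = "Sounds good.\n\nOn Thu, Feb 4, 2016 at 9:41 AM, Felix wrote:\n> quoted text"
german  = "Klingt gut.\n\nAm 04.02.2016 um 09:41 schrieb Felix:\n> zitierter Text"

puts english.match?(ENGLISH_REPLY_HEADER) # true
puts german.match?(ENGLISH_REPLY_HEADER)  # false
```

The German header is never recognised, so the quoted text below it would end up in the post.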
4 Likes
techAPJ
(Arpit Jalan)
February 4, 2016, 9:52am
7
discourse_email_parser has now been replaced by email_reply_trimmer, a library to trim replies from plain text email,
and it covers common languages:
ON_DATE_SOMEONE_WROTE_MARKERS = [
  # Dutch
  ["Op", "het volgende geschreven"],
  # English
  ["On", "wrote"],
  # French
  ["Le", "a écrit "],
  # Polish
  ["Dnia", "napisał\\(a\\)"],
  # Portuguese
  ["Em", "escreveu"],
  # Spanish
  ["El", "escribió"],
]
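As a rough sketch of how marker pairs like these could be used, the Ruby below turns each pair into a regular expression and checks a single line against all of them. It assumes each marker is a regex fragment (which the escaped parentheses in the Polish entry suggest) and that a header is one line starting with the first marker and ending with the second, followed by an optional colon. `MARKERS`, `HEADER_REGEXES` and `reply_header?` are hypothetical names, and this is not necessarily how email_reply_trimmer builds its patterns.

```ruby
# A subset of the marker pairs quoted above, treated as regex fragments.
MARKERS = [
  ["On", "wrote"],     # English
  ["Le", "a écrit "],  # French
  ["El", "escribió"],  # Spanish
]

# Assumption: a header is a single line that starts with the first marker
# and ends with the second one, optionally followed by a colon.
HEADER_REGEXES = MARKERS.map do |first, last|
  /^\s*#{first}\s.+#{last}\s*:?\s*$/
end

# Hypothetical helper: true when the line looks like a reply header
# in any of the configured languages.
def reply_header?(line)
  HEADER_REGEXES.any? { |regex| line.match?(regex) }
end

puts reply_header?("On Thu, Feb 4, 2016 at 9:41 AM, Felix wrote:")
puts reply_header?("Am 04.02.2016 um 09:41 schrieb Felix:")
```

Under these assumptions, adding a language only means adding one more pair to the list.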
6 Likes
fefrei
(Felix Freiberger)
February 4, 2016, 9:58am
8
That list might soon include German:
master ← fefrei:patch-1
merged 10:34AM - 04 Feb 16 UTC
I haven't tested this, but the comment is copied from a real email (with redacte… d details).
6 Likes
techAPJ
(Arpit Jalan)
Closed on
February 5, 2016, 10:06am
9
This topic was automatically closed after 24 hours. New replies are no longer allowed.