What is the problem with this PrettyText.markdown call?

PrettyText.markdown("❤️❤️❤️", {}) 

Will not generate the same code as:

PrettyText.markdown(":heart::heart::heart:", {})

It generates:

=> "<p><img src=\"/images/emoji/apple/heart.png?v=5\" title=\":heart:\" class=\"emoji\" alt=\":heart:\">️:heart:️:heart:️</p>" 
(Some of the ':' characters are followed by an invisible character and some are not; paste the output into https://www.soscisurvey.de/tools/view-chars.php to see the difference.)

I think it has something to do with

replacement = "\u200b" + replacement;

In lib/pretty_text/shims.js

( on 1.9.4 )

The control code insertion isn’t happening on latest:

[1] pry(main)> puts PrettyText.markdown("❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️</p>
[2] pry(main)> puts PrettyText.markdown(":heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

although I do note it behaves differently:

[1] pry(main)> puts PrettyText.markdown("❤️❤️❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️:heart:️:heart:️</p>
[2] pry(main)> puts PrettyText.markdown(":heart::heart::heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>
❤️❤️❤️

:heart::heart::heart:

generates:

:heart::heart::heart:

:heart::heart::heart:

FFEF is not an assigned Unicode character, so I wonder what’s putting it in. (note: it’s not the BOM, that’s FEFF.)

I think it’s locale character encoding involved.

I don’t know what it means, but U+FFB8 is “HALFWIDTH HANGUL LETTER CIEUC”.

That was my initial hunch, a big endian - little endian paste thing. (been there, done that)

If the leading colon isn’t at the “beginning of a ‘word’”, Discourse adds a zero-width space, the \u200b.

https://github.com/discourse/discourse/blob/master/lib/pretty_text/shims.js#L12-L25

Looking at bytes, if it is endian related, I don’t see how.

\u200b
01011100 01110101 00110010 00110000 00110000 01100010

\uffef
01011100 01110101 01100110 01100110 01100101 01100110

\uffb8
01011100 01110101 01100110 01100110 01100010 00111000

\uffe2
01011100 01110101 01100110 01100110 01100101 00110010

So I think it must be the regex’s interpretation of what it considers to be a “word”, i.e. locale related.

This is what I see on the console:

$ echo ":heart:️:heart:️" | hexdump
0000000 3a 68 65 61 72 74 3a ef b8 8f 3a 68 65 61 72 74
0000010 3a ef b8 8f 0a                                 
0000015

Well, if I type it, it will be:

$ echo ":heart::heart:" | hexdump
0000000 3a 68 65 61 72 74 3a 3a 68 65 61 72 74 3a 0a
000000f

The first one, with the invisible characters, is 6 bytes longer (two 3-byte sequences, ef b8 8f), and it stops the following emoji code from being converted to an image URL.

Maybe the OS clipboard or the terminal changed something, but the problem we want to fix is why Unicode hearts are not converted to image URLs.

What is the best way to debug it? console.log doesn’t work on the server side js.
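
One way to debug this without console.log is to inspect the string from the Rails console before it reaches the markdown engine. The following is only a sketch using core Ruby String methods; the helper name `describe_chars` is made up for illustration and is not part of Discourse. The sample input is rebuilt from the bytes `3a ef b8 8f 3a` seen in the hexdump above.

```ruby
# Hypothetical helper (not part of Discourse): list each code point of a
# string together with its UTF-8 bytes, so invisible characters show up.
def describe_chars(str)
  str.each_char.map do |ch|
    hex_bytes = ch.bytes.map { |b| format("%02x", b) }.join(" ")
    format("U+%04X (%s)", ch.ord, hex_bytes)
  end
end

# Rebuild the pasted text from the hexdump bytes 3a ef b8 8f 3a.
pasted = ["3aefb88f3a"].pack("H*").force_encoding("UTF-8")
typed  = "::"

puts describe_chars(pasted)
# U+003A (3a)
# U+FE0F (ef b8 8f)
# U+003A (3a)

puts describe_chars(typed)
# U+003A (3a)
# U+003A (3a)
```

Read this way, the three bytes ef b8 8f are a single code point sitting between the two colons, not three separate characters.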

Please satisfy my curiosity and let me know what the locale is.

I’m guessing it’s something where the typical western concept of what constitutes a “word” doesn’t apply. But if I’m completely off-base it’s the wrong rabbit hole to go down into.

It is from the Bitnami Discourse Stack for Virtual Machines.

So en_US.

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Thanks. “Bitnami install” is a completely different rabbit hole :rabbit2:

I’m not saying the Bitnami install is the cause, but I have seen many posts here dealing with problems it had. So I’m leaning towards thinking it is more a Bitnami thing than Discourse itself. Is there any way to reach out to others with Bitnami installs to see if they have the same issue?

@Mittineague Can you reproduce the problem?

Because @supermathie did reproduce it, and I don’t think he is on Bitnami.

Yes, 1-2 are supermathie’s results. 5,6,7,8 are mine

[1] pry(main)> puts PrettyText.markdown("❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️</p>
[2] pry(main)> puts PrettyText.markdown(":heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

[5] pry(main)> puts PrettyText.markdown("❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️</p>
[6] pry(main)> puts PrettyText.markdown(":heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

[1] pry(main)> puts PrettyText.markdown("❤️❤️❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️:heart:️:heart:️</p>
[2] pry(main)> puts PrettyText.markdown(":heart::heart::heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

[7] pry(main)> puts PrettyText.markdown("❤️❤️❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️:heart:️:heart:️</p>
[8] pry(main)> puts PrettyText.markdown(":heart::heart::heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

@joffreyjaffeux is there some internal bug in the emoji remapper?

Uhhh what’s with the two different colons here?

% echo -n :<fe0f>: | xxd
00000000: 3aef b88f 3a                              :...:

There’s a U+FE0F VARIATION SELECTOR-16, also known as “force emoji display”, that’s not being picked up by the remapper.
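
To illustrate what “picking up” the selector could look like, here is a rough sketch in plain Ruby. It is not Discourse’s remapper: the `HEART` pattern and the `remap_hearts` function are hypothetical and cover only one emoji. The idea is to let each match optionally consume a trailing U+FE0F so that nothing is left between the generated codes.

```ruby
# Hypothetical pattern: HEAVY BLACK HEART (U+2764), optionally followed by
# VARIATION SELECTOR-16 (U+FE0F). A real remapper would cover all emoji.
HEART = /\u2764\uFE0F?/

# Replace each heart, with or without the selector, by its emoji code.
def remap_hearts(text)
  text.gsub(HEART, ":heart:")
end

with_selector    = "\u2764\uFE0F" * 3
without_selector = "\u2764" * 3

puts remap_hearts(with_selector)     # :heart::heart::heart:
puts remap_hearts(without_selector)  # :heart::heart::heart:
```

With the selector consumed as part of the match, both inputs produce the same run of codes, which is the behaviour the original question expected.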

Your acuity is much better than mine. I still can’t discern a difference even when I look for it.

As to where they came from, I simply copied supermathie’s example puts into my console to see if I could reproduce the results.

Should we move this to the bug category? Or should I create a new topic in the bug category?

There is probably a minor issue with the emoji -> image converter, but I am struggling to see the real-world impact of this.

Can you provide an example in a post of where this is an actual issue short of hexedit showing something off?

This is from an iPad. But I think it applies to many mobile users.

If I just :heart: Discourse, it is ok.

But if we :heart::heart::heart::heart::heart::heart::heart::heart::heart::heart::heart::heart: Discourse it is a problem, isn’t it?

In mobile view, this is another problem.

The real world problem is when people are excited, they use a lot of emojis, and this issue breaks their hearts.