What is the problem with PrettyText.markdown call?


(Ballistic Tire) #1
PrettyText.markdown("❤️❤️❤️", {}) 

Will not generate the same code as:

PrettyText.markdown(":heart::heart::heart:", {})

It generates:

=> "<p><img src=\"/images/emoji/apple/heart.png?v=5\" title=\":heart:\" class=\"emoji\" alt=\":heart:\">️:heart:️:heart:️</p>" 
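(The output contains two different sequences: a ':' followed by an invisible character, and a plain ':'. Copy and paste it into https://www.soscisurvey.de/tools/view-chars.php to see the difference.)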

I think it has something to do with

replacement = "\u200b" + replacement;

In lib/pretty_text/shims.js

( on 1.9.4 )
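
As an alternative to pasting into an external character viewer, the hidden characters can be listed directly in a Ruby console. This is only a minimal sketch: the helper name `show_codepoints` is made up for illustration and is not part of Discourse, and it assumes the pasted heart is U+2764 followed by U+FE0F, which is the usual form of the red heart emoji.

```ruby
# Hypothetical helper (not part of Discourse): list every code point in a
# string so that invisible characters become visible.
def show_codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# The heart is written with escapes so the invisible character cannot get
# lost in copy and paste (assumption: U+2764 followed by U+FE0F).
emoji_heart = "\u2764\uFE0F"

p show_codepoints(emoji_heart)  # ["U+2764", "U+FE0F"]
p show_codepoints(":heart:")    # seven plain ASCII code points
```

Running the same helper on the string returned by PrettyText.markdown should show whether an extra code point sits between the shortcodes.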


Emojis selected on iOS displaying additional rectangles
(Michael Brown) #2

The control code insertion isn’t happening on latest:

[1] pry(main)> puts PrettyText.markdown("❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️</p>
[2] pry(main)> puts PrettyText.markdown(":heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

although I do note it behaves differently:

[1] pry(main)> puts PrettyText.markdown("❤️❤️❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️:heart:️:heart:️</p>
[2] pry(main)> puts PrettyText.markdown(":heart::heart::heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>
❤️❤️❤️

:heart::heart::heart:

generates:

:heart:️:heart:️:heart:️

:heart::heart::heart:


(Kane York) #3

FFEF is not an assigned Unicode character, so I wonder what’s putting it in. (note: it’s not the BOM, that’s FEFF.)


(Mittineague) #4

I think it’s locale character encoding involved.

I don’t know what it means, but U+FFB8 is “Halfwidth Hangul Letter Cieuc”


(Mittineague) #7

That was my initial hunch, a big endian - little endian paste thing. (been there, done that)

If the leading colon isn’t at the “beginning of a ‘word’”, Discourse adds a zero-width space, the \u200b.

Looking at bytes, if it is endian related, I don’t see how.

\u200b
01011100 01110101 00110010 00110000 00110000 01100010

\uffef
01011100 01110101 01100110 01100110 01100101 01100110

\uffb8
01011100 01110101 01100110 01100110 01100010 00111000

\uffe2
01011100 01110101 01100110 01100110 01100101 00110010

So I think it must be the regex’s interpretation of what it considers to be a “word”, i.e. locale related.
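
Note that the bit patterns above are the ASCII codes of the six typed characters of each escape (backslash, u, and four hex digits), not the bytes of the character itself. A rough sketch of how the real bytes could be compared across encodings to test the endian idea; `hex_bytes` is a hypothetical helper name:

```ruby
# Encode the real character (not the ASCII text of its escape) and print
# the resulting bytes in hex for a given encoding.
def hex_bytes(str, encoding)
  str.encode(encoding).bytes.map { |b| format("%02x", b) }.join(" ")
end

zwsp = "\u200B"  # ZERO WIDTH SPACE

puts hex_bytes(zwsp, "UTF-8")     # e2 80 8b
puts hex_bytes(zwsp, "UTF-16BE")  # 20 0b
puts hex_bytes(zwsp, "UTF-16LE")  # 0b 20
```

A byte swap of U+200B would read as 0b 20, which does not lead to any of the ff.. values, so this seems consistent with endianness not being involved.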


(Ballistic Tire) #8

This is what I see on the console:

$ echo ":heart:️:heart:️" | hexdump
0000000 3a 68 65 61 72 74 3a ef b8 8f 3a 68 65 61 72 74
0000010 3a ef b8 8f 0a                                 
0000015

well, if I type it, it will be:
$ echo ":heart::heart:" | hexdump
0000000 3a 68 65 61 72 74 3a 3a 68 65 61 72 74 3a 0a
000000f

(I’m having trouble formatting this post.)

The first one, with the invisible characters, is 6 bytes longer (two 3-byte sequences, ef b8 8f), and it prevents the following emoji from being converted to an image URL.

Maybe the OS’s clipboard or the terminal changed something, but the problem we want to fix is why Unicode hearts are not converted to image URLs.
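
To see which character the extra bytes in the first dump stand for, they can be decoded as UTF-8 in a Ruby console. A small sketch, assuming the dump shows raw UTF-8 bytes; the variable names are only for illustration:

```ruby
# The three extra bytes that follow each ":heart:" in the first dump.
extra = [0xef, 0xb8, 0x8f].pack("C*").force_encoding("UTF-8")

puts extra.valid_encoding?                                         # true
puts extra.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")  # U+FE0F

# Byte counts of the pasted and the typed input, including the newline
# that echo appends.
pasted = ":heart:\uFE0F:heart:\uFE0F\n"
typed  = ":heart::heart:\n"

puts pasted.bytesize  # 21 (0x15, as in the first dump)
puts typed.bytesize   # 15 (0x0f, as in the second dump)
```

So the invisible character is a single code point, U+FE0F, encoded as three bytes.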


(Ballistic Tire) #9

What is the best way to debug it? console.log doesn’t work on the server side js.


(Mittineague) #10

Please satisfy my curiosity and let me know what the locale is.

I’m guessing it’s something where the typical western concept of what constitutes a “word” doesn’t apply. But if I’m completely off-base it’s the wrong rabbit hole to go down into.


(Ballistic Tire) #11

it is from Bitnami Discourse Virtual Machine

So en_US.

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

(Mittineague) #12

Thanks. “Bitnami install” is a completely different rabbit hole :rabbit2:

I’m not saying the Bitnami install is the cause, but I have seen many posts here about problems with it, so I’m leaning towards this being a Bitnami thing rather than Discourse itself. Is there any way to reach out to others with Bitnami installs to see if they have the same issue?


(Ballistic Tire) #13

@Mittineague Can you reproduce the problem?

@supermathie did reproduce it, and I don’t think he is on Bitnami.


(Mittineague) #14

Yes. 1-2 are supermathie’s results; 5, 6, 7, and 8 are mine.

[1] pry(main)> puts PrettyText.markdown("❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️</p>
[2] pry(main)> puts PrettyText.markdown(":heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

[5] pry(main)> puts PrettyText.markdown("❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️</p>
[6] pry(main)> puts PrettyText.markdown(":heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

[1] pry(main)> puts PrettyText.markdown("❤️❤️❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️:heart:️:heart:️</p>
[2] pry(main)> puts PrettyText.markdown(":heart::heart::heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

[7] pry(main)> puts PrettyText.markdown("❤️❤️❤️", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:">️:heart:️:heart:️</p>
[8] pry(main)> puts PrettyText.markdown(":heart::heart::heart:", {});
<p><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"><img src="/images/emoji/twitter/heart.png?v=5" title=":heart:" class="emoji" alt=":heart:"></p>

(Sam Saffron) #15

@joffreyjaffeux is there some internal bug in the emoji remapper?


(Kane York) #16

Uhhh what’s with the two different colons here?


% echo -n :<fe0f>: | xxd
00000000: 3aef b88f 3a                              :...:

There’s a U+FE0F VARIATION SELECTOR-16, also known as “force emoji display”, that’s not being picked up by the remapper.
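
One possible way to handle this, sketched below, is to treat an optional U+FE0F after the emoji as part of the match, so the selector is consumed together with the emoji. This is not Discourse’s remapper: the function `remap_emoji` and its one-entry table are hypothetical stand-ins, just to show the idea.

```ruby
# Hypothetical stand-in for an emoji remapper; not Discourse's code.
# `table` maps a bare emoji character to its shortcode name.
def remap_emoji(text, table = { "\u2764" => "heart" })
  # Match any known emoji plus an optional trailing VARIATION SELECTOR-16,
  # so the selector is not left behind in the output.
  pattern = /(#{Regexp.union(table.keys).source})\uFE0F?/
  text.gsub(pattern) { ":#{table[$1]}:" }
end

hearts = "\u2764\uFE0F" * 3
puts remap_emoji(hearts)    # :heart::heart::heart:
puts remap_emoji("\u2764")  # :heart:
```

With the selector consumed, the result is the same string as the typed shortcodes, which the thread shows rendering as three images.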


(Mittineague) #17

Your acuity is much better than mine. I still can’t discern a difference even when I look for it.

As to where they came from, I simply copied supermathie’s example puts into my console to see if I could reproduce the results.


(Ballistic Tire) #18

Should we move this to the bug category, or should I create a new topic there?


(Sam Saffron) #19

There is probably a minor issue with the emoji -> image converter, but I am struggling to see the real-world impact of this.

Can you provide an example in a post of where this is an actual issue short of hexedit showing something off?


(Ballistic Tire) #20

This is from an iPad. But I think it applies to many mobile users.

If I just :heart:️ Discourse, it is ok.

But if we :heart:️:heart:️:heart:️:heart:️:heart:️:heart:️:heart:️:heart:️:heart:️:heart:️:heart:️:heart:️ Discourse it is a problem, is it?


(Ballistic Tire) #21

Mobile view, this is another problem.


(Ballistic Tire) #22

The real world problem is when people are excited, they use a lot of emojis, and this issue breaks their hearts.