Khmer Unicode Zero Width Space disappears after posting

(Vannak Eng) #1

Hi there, I just installed Discourse and found that it does not support the Zero Width Space at all. It is a special Unicode character used in Khmer as a hidden space to separate words. You can take a look here

(Michael Brown) #2

Can you give us an example of how it should look?

Not being familiar with Khmer it’s hard to tell what’s going wrong.

(Jeff Atwood) #3

It should work, all unicode is supported. Where are you trying to use this character, specifically?

(Sam Saffron) #4

unless … this is another nokogiri bug …

(Jeff Atwood) #5

The OP did not say “entity”, and what was referred to is a Unicode character, not a string representing an HTML entity… so unless it is a strange HTML entity, it should work.

(Vannak Eng) #6

@supermathie the thing is that it replaces the zero width space with a regular space, for example here:

ប្រទេស កម្ពុជា this one with zero width space
ប្រទេសកម្ពុជា this one with no zero width space


(Vannak Eng) #7

@codinghorror something turns the zero width space into a space. Do you have any idea?

(Sam Saffron) #8

It is a bug, we will try to find it. In the meantime, perhaps you should look at translating Discourse :slight_smile:

(Vannak Eng) #9

Hi Sam, I just requested the Khmer language on Transifex. I hope it is approved soon so I can invite my team to work on it.

(Régis Hanol) #10

Just making sure @techAPJ sees this ^^

(Arpit Jalan) #11

Approved! Thanks for contributing translations.

(Sam Saffron) #12

I am not seeing this bug in our markdown cooking code:

    it 'does not strip zero width spaces' do
      from = "hi\u{200B}\u{FEFF}there"
      cooked = PrettyText.cook(from)

      # drop the HTML tags so only the text is compared
      cooked = cooked.gsub(/<[^>]*>/, "")
      expect(cooked).to eq from
    end

So let me take this one step back, does the preview look right before you post?

(Vannak Eng) #13

@sam Yes, it works fine in the preview. It only breaks the words after posting.

(Vannak Eng) #14

@sam Any update on this?

(Sam Saffron) #15

I am confused about the repro, in your example:

ប្រទេស កម្ពុជា

has a real space in the raw markdown, I need an example I can work with.
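
One way to get a sample that survives copy and paste is to build it with an explicit escape and then check which code point sits between the two words. This is only a minimal sketch in plain Ruby: `codepoints_of` is a made-up helper name, not something from Discourse, and the two Khmer words are taken from the example earlier in the topic.

```ruby
# Build the sample with an explicit \u200B escape so the separator
# cannot be silently swapped for a real space while copying.
word_a = "ប្រទេស"
word_b = "កម្ពុជា"
sample = "#{word_a}\u200B#{word_b}"

# Hypothetical helper: list every character's code point as U+XXXX.
def codepoints_of(text)
  text.each_char.map { |ch| format("U+%04X", ch.ord) }
end

# The character right after the first word is the separator.
separator = codepoints_of(sample)[word_a.length]
puts separator
```

If the separator prints as U+0020 instead of U+200B, the sample already contains a real space and cannot reproduce the problem.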

(Marcin Rataj) #16

@sam, I was able to replicate the issue this way:

I copied sample from:

and pasted here:

Word Word Word

After each Word you should find one zero-width space (U+200B).

Discourse displayed ‘WordWordWord’ correctly in the preview pane (separated with U+200B), but after I posted it, the post got cooked/refreshed, and this operation replaced ‘WordWordWord’ with ‘Word Word Word’.

So indeed, Discourse replaces U+200B with real space :frowning:
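
To confirm this outside the browser, the text before and after posting can be compared character by character. The following is a rough sketch in plain Ruby, not Discourse code: `replaced_zwsp_positions` is a hypothetical helper, and `posted` only simulates the replacement with `String#tr` rather than calling anything in Discourse.

```ruby
# Hypothetical helper: return the character indices where `before` has a
# zero width space (U+200B) and `after` has an ordinary space instead.
def replaced_zwsp_positions(before, after)
  pairs = before.chars.zip(after.chars)
  pairs.each_index.select { |i| pairs[i] == ["\u200B", " "] }
end

raw    = "Word\u200BWord\u200BWord\u200B"
posted = raw.tr("\u200B", " ") # simulates the behaviour reported above

positions = replaced_zwsp_positions(raw, posted)
p positions
```

For a real check, `posted` would be the text copied back from the published post; an empty result means no zero width space was turned into a space.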

(Vannak Eng) #17

@lidel thank you very much for the explanation. That is the point, @sam. :slight_smile:

(Sam Saffron) #18

AHA I can see why this is happening

@zogstrip added this …

This is actually a case of us being too smart for our own good. It was added as a system to avoid people “gaming” edits and bypassing rules. However, it is really not acceptable that we break formatting here, so I patched it with:

(Vannak Eng) #21

@sam So how can I fix it? Or should I wait for another release of Discourse?

(Sam Saffron) #22

Wait for the tests to pass, then update your instance, if you are on the tests-passed branch, which is the default in our configs.