Khmer Unicode Zero Width Space disappears after posting


(Vannak Eng) #1

Hi there, I just installed Discourse and found that it doesn’t seem to support the Zero Width Space at all. It is a special character in Khmer Unicode text: an invisible space that separates words. You can take a look here: http://www.askcambodia.org/t/angelababy/21


(Michael Brown) #2

Can you give us an example of how it should look?

Not being familiar with Khmer it’s hard to tell what’s going wrong.


(Jeff Atwood) #3

It should work, all unicode is supported. Where are you trying to use this character, specifically?


(Sam Saffron) #4

unless … this is another nokogiri bug …


(Jeff Atwood) #5

The OP did not say “entity”, and what was referred to is a Unicode character, not a string representing a HTML entity… so unless it is a strange HTML entity, it should work.


(Vannak Eng) #6

@supermathie the thing is that it replaces the zero width space with a regular space. For example:

ប្រទេស កម្ពុជា this one with zero width space
ប្រទេសកម្ពុជា this one with no zero width space

Thanks,


(Vannak Eng) #7

@codinghorror something is turning the zero width space into a regular space. Do you have any idea what?


(Sam Saffron) #8

It is a bug, we will try to find it. In the meantime, perhaps you should look at translating Discourse :slight_smile:


(Vannak Eng) #9

Hi Sam, I just requested the Khmer language on Transifex. I hope it is approved soon so I can invite my team to work on it.


(Régis Hanol) #10

Just making sure @techAPJ sees this ^^


(Arpit Jalan) #11

Approved! Thanks for contributing translations.


(Sam Saffron) #12

I am not seeing this bug in our markdown cooking code:


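    it 'does not strip zero width spaces' do
      # U+200B is ZERO WIDTH SPACE, U+FEFF is ZERO WIDTH NO-BREAK SPACE
      from = "hi\u{200B}\u{FEFF}there"
      cooked = PrettyText.cook(from)

      # strip the HTML tags so only the text content is compared
      cooked = cooked.gsub(/<[^>]*>/, "")
      expect(cooked).to eq from
    end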

So let me take this one step back, does the preview look right before you post?


(Vannak Eng) #13

@sam Yes, it works fine in the preview. The words only break after posting.


(Vannak Eng) #14

@sam Any update on this?


(Sam Saffron) #15

I am confused about the repro. In your example:

ប្រទេស កម្ពុជា

has a real space in the raw markdown. I need an example I can work with.
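
One way to get an unambiguous repro is to write the zero width space as an escape and list the code points of the string, so U+200B and U+0020 cannot be confused. A minimal sketch in plain Ruby; the helper name `describe_codepoints` is made up for illustration and is not part of Discourse:

```ruby
# Hypothetical helper: list each character's code point so that
# invisible characters can be told apart from ordinary spaces.
def describe_codepoints(text)
  text.each_char.map { |ch| format("U+%04X", ch.ord) }
end

zwsp = "\u200B" # ZERO WIDTH SPACE

with_zwsp  = "ប្រទេស#{zwsp}កម្ពុជា" # words separated by U+200B
with_space = "ប្រទេស កម្ពុជា"       # words separated by a real space

puts describe_codepoints(with_zwsp).join(" ")
puts describe_codepoints(with_space).join(" ")
```

The first line of output should contain U+200B between the two words, the second U+0020.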


(Marcin Rataj) #16

@sam, I was able to replicate the issue this way:

I copied sample from:

and pasted here:

Word Word Word

After each Word you should find one zero-width space (U+200B).

Discourse displayed ‘WordWordWord’ correctly in the preview pane (separated with U+200B), but after I posted it, the post got cooked/refreshed, and that operation replaced ‘WordWordWord’ with ‘Word Word Word’.

So indeed, Discourse replaces U+200B with a real space :frowning:
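
To confirm this kind of replacement without relying on the eye, the pasted text can be compared with what comes back after posting. A rough sketch in plain Ruby, assuming both strings are available and have the same length; `replaced_zwsp_indexes` is a hypothetical name, not a Discourse API:

```ruby
# Hypothetical helper: return the indexes where `before` has U+200B
# and `after` has an ordinary space (U+0020) in the same position.
# Returns nil when the lengths differ, since positions no longer line up.
def replaced_zwsp_indexes(before, after)
  return nil unless before.length == after.length

  before.each_char.with_index
        .select { |ch, i| ch == "\u200B" && after[i] == " " }
        .map { |_ch, i| i }
end

raw    = "Word\u200BWord\u200BWord\u200B" # what was pasted
posted = "Word Word Word "                # what came back after posting

p replaced_zwsp_indexes(raw, posted) # => [4, 9, 14]
```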


(Vannak Eng) #17

@lidel thank you very much for the explanation. That is exactly the point, @sam. :slight_smile:


(Sam Saffron) #18

AHA I can see why this is happening

@zogstrip added this …

https://github.com/discourse/discourse/commit/f4208ae83fd43e0cdd663d82a73fabfb65f327bb

This is actually a case of us being too smart for our own good. It was added as a way to stop people “gaming” edits and bypassing rules. However, it is really not acceptable that we break formatting here, so I patched it with:

https://github.com/discourse/discourse/commit/58c95f64d2887134604b8024afb0d673c48f433f
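
For readers who do not want to open the commits: the snippet below is only an illustration of how a whitespace-normalizing step can cause this symptom, and how narrowing it avoids it. It is not the code from either commit; the patterns and the `normalize_whitespace` name are assumptions made up for this sketch.

```ruby
# Hypothetical character classes, NOT taken from the Discourse source.
broad  = /[\u00A0\u2000-\u200B\u3000]/ # range ends at U+200B, so ZWSP matches
narrow = /[\u00A0\u2000-\u200A\u3000]/ # range ends at U+200A, so ZWSP is left alone

# Replace every character matched by `pattern` with an ordinary space.
def normalize_whitespace(text, pattern)
  text.gsub(pattern, " ")
end

sample = "Word\u200BWord"

p normalize_whitespace(sample, broad)            # ZWSP becomes a real space
p normalize_whitespace(sample, narrow) == sample # ZWSP survives
```

The design point is that a rule meant to collapse "sneaky" whitespace still has to exclude characters that carry meaning in some scripts, as U+200B does for Khmer word separation.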


(Vannak Eng) #21

@sam So how can I fix it? Or should I wait for another release of Discourse?


(Sam Saffron) #22

Wait for the tests to pass, then update your instance, assuming you are on the tests-passed branch, which is the default in our configs.