Khmer Unicode Zero Width Space disappears after posting

Hi there, I just installed Discourse and found that it doesn’t handle the Zero Width Space (U+200B) correctly. It is a special Unicode character used in Khmer as an invisible space to separate words. You can take a look here: http://www.askcambodia.org/t/angelababy/21

Can you give us an example of how it should look?

Not being familiar with Khmer it’s hard to tell what’s going wrong.

It should work; all of Unicode is supported. Where are you trying to use this character, specifically?

Unless … this is another Nokogiri bug …

The OP did not say “entity”. What was referred to is a Unicode character, not a string representing an HTML entity, so unless it is a strange HTML entity, it should work.

@supermathie the thing is that it replaces the zero width space with a regular space, for example here:

ប្រទេស កម្ពុជា this one with zero width space
ប្រទេសកម្ពុជា this one with no zero width space

Thanks,

@codinghorror something turns the zero width space into a regular space. Do you have any idea why?

It is a bug and we will try to find it. In the meantime, perhaps you should look at translating Discourse :slight_smile:

Hi Sam, I just requested the Khmer language on Transifex. I hope it is approved soon so I can invite my team to work on it.

Just making sure @techAPJ sees this ^^

Approved! Thanks for contributing translations.

I am not seeing this bug in our markdown cooking code:


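    it 'does not strip zero width spaces' do
      # U+200B (zero width space) and U+FEFF (zero width no-break space)
      from = "hi\u{200B}\u{FEFF}there"
      cooked = PrettyText.cook(from)

      # Drop the HTML tags so only the text content is compared
      cooked = cooked.gsub(/<[^>]*>/, "")
      expect(cooked).to eq from
    end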

So let me take one step back: does the preview look right before you post?

@sam Yes, it works fine in the preview. It only breaks the words after posting.

@sam Any update on this?

I am confused about the repro. Your example:

ប្រទេស កម្ពុជា

has a real space in the raw markdown. I need an example I can work with.

@sam, I was able to replicate the issue this way:

I copied a sample from:

and pasted here:

Word Word Word

After each Word you should find one zero-width space (U+200B).

Discourse displayed ‘WordWordWord’ correctly in the preview pane (separated with U+200B), but after I posted, the text got cooked/refreshed and ‘WordWordWord’ was replaced with ‘Word Word Word’.

So indeed, Discourse replaces U+200B with a real space :frowning:
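If anyone wants to check which character actually ended up in a string, dumping the space-like code points makes the difference visible. This is only a minimal Ruby sketch; the helper name `describe_spaces` and the sample strings are made up for illustration and are not part of Discourse.

```ruby
# Hypothetical helper (not Discourse code): reports which space-like
# characters a string contains, in order of appearance.
ZWSP  = "\u200B" # ZERO WIDTH SPACE
SPACE = " "      # U+0020 SPACE

def describe_spaces(text)
  text.each_char.filter_map do |ch|
    case ch
    when ZWSP  then "U+200B"
    when SPACE then "U+0020"
    end
  end
end

raw    = "Word\u200BWord\u200BWord" # what was pasted into the composer
posted = "Word Word Word"           # what showed up after posting

p describe_spaces(raw)    # => ["U+200B", "U+200B"]
p describe_spaces(posted) # => ["U+0020", "U+0020"]
```

Running this against the raw text of a post (rather than the rendered HTML) should show whether the zero width spaces survived saving.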

@lidel thank you very much for the explanation. That is exactly the point, @sam. :slight_smile:

AHA I can see why this is happening

@zogstrip added this …

https://github.com/discourse/discourse/commit/f4208ae83fd43e0cdd663d82a73fabfb65f327bb

This is actually a case of us being too smart for our own good. It was added as a way to stop people from “gaming” edits and bypassing rules. However, it is really not acceptable that we break formatting here, so I patched it with:

https://github.com/discourse/discourse/commit/58c95f64d2887134604b8024afb0d673c48f433f
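To illustrate the general shape of this kind of bug, here is a rough sketch. It is not the code from either commit; both method names are hypothetical and only assume what was observed above, namely that U+200B was being turned into a regular space when the post was saved.

```ruby
# Hypothetical illustration only; neither method is taken from Discourse.
ZERO_WIDTH_SPACE = "\u200B"

# Over-eager cleanup: treats U+200B as if it were ordinary whitespace,
# which inserts visible spaces between Khmer words.
def clean_replacing_zwsp(raw)
  raw.gsub(ZERO_WIDTH_SPACE, " ").squeeze(" ").strip
end

# Gentler cleanup: collapses runs of real spaces and trims the ends,
# but leaves U+200B untouched.
def clean_preserving_zwsp(raw)
  raw.squeeze(" ").strip
end

raw = "Word\u200BWord\u200BWord"

puts clean_replacing_zwsp(raw).include?(ZERO_WIDTH_SPACE) # false
puts clean_preserving_zwsp(raw) == raw                    # true
```

The trade-off is that leaving invisible characters alone makes it slightly easier to game edits, but it keeps text correct for scripts such as Khmer that rely on them.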

@sam So how can I fix it? Or should I wait for another release of Discourse?

Wait for the tests to pass, then update your instance. This applies if you are on the tests-passed branch, which is the default in our configs.
