Khmer Unicode Zero Width Space disappears after posting

evannak · March 12, 2015, 12:25pm

Hi there, I just installed Discourse and found that it doesn’t support the Zero Width Space at all. It is a special Unicode character used in Khmer as a hidden space to separate words. You can take a look here: http://www.askcambodia.org/t/angelababy/21

supermathie · March 12, 2015, 2:01pm

Can you give us an example of how it should look?

Not being familiar with Khmer it’s hard to tell what’s going wrong.

codinghorror · March 12, 2015, 9:56pm

It should work, all unicode is supported. Where are you trying to use this character, specifically?

sam · March 12, 2015, 10:27pm

unless … this is another nokogiri bug …

codinghorror · March 12, 2015, 11:10pm

The OP did not say “entity”, and what was referred to is a Unicode character, not a string representing an HTML entity… so unless it is a strange HTML entity, it should work.

evannak · March 13, 2015, 2:06am

@supermathie the thing is that it replaces the zero width space with a regular space, for example here:

ប្រទេស កម្ពុជា this one with zero width space
ប្រទេសកម្ពុជា this one with no zero width space

Thanks,

evannak · March 14, 2015, 4:43pm

@codinghorror something is turning the zero width space into a space. Do you have any idea?

sam · March 14, 2015, 8:36pm

It is a bug, we will try to find it. In the meantime, perhaps you should look at translating Discourse.

evannak · March 15, 2015, 6:33am

Hi Sam, I just requested the Khmer language on Transifex. I hope it is approved soon so I can invite my team to work on it.

zogstrip · March 15, 2015, 9:53pm

Just making sure @techAPJ sees this ^^

techAPJ · March 16, 2015, 2:58am

Approved! Thanks for contributing translations.

sam · March 16, 2015, 3:13am

I am not seeing this bug in our markdown cooking code:


it 'does not strip zero width spaces' do
  from = "hi\u{200B}\u{FEFF}there"
  cooked = PrettyText.cook(from)

  # strip the HTML tags added by cooking so only the text is compared
  cooked = cooked.gsub(/<[^>]*>/, "")
  expect(cooked).to eq from
end

So let me take this one step back, does the preview look right before you post?

evannak · March 16, 2015, 8:30am

@sam Yes, it works fine in the preview. It just breaks the words after posting.

evannak · March 26, 2015, 12:52pm

@sam Any update on this?

sam · March 26, 2015, 10:14pm

I am confused about the repro, in your example:

ប្រទេស កម្ពុជា

has a real space in the raw markdown, I need an example I can work with.

lidel · March 26, 2015, 10:45pm

@sam, I was able to replicate the issue this way:

I copied sample from:

and pasted here:

Word Word Word

After each Word you should find one zero-width space (U+200B).

Discourse displayed ‘WordWordWord’ correctly in the preview pane (separated with U+200B), but after I posted it, the post got cooked/refreshed and that operation replaced ‘WordWordWord’ with ‘Word Word Word’.

So indeed, Discourse replaces U+200B with a real space.
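Since U+200B is invisible, it helps to look at the code points rather than the rendered text. Here is a minimal Ruby sketch for checking what a string really contains; `codepoints_of` is a made-up helper name for illustration, not something from Discourse:

```ruby
# Hypothetical helper (not part of Discourse): list the code points of a
# string so invisible characters such as U+200B become visible.
def codepoints_of(text)
  text.each_char.map { |ch| format("U+%04X", ch.ord) }
end

with_zwsp  = "Word\u200BWord" # zero-width space between the words
with_space = "Word Word"      # ordinary space (U+0020) between the words

puts codepoints_of(with_zwsp).join(" ")
puts codepoints_of(with_space).join(" ")

# Quick boolean check for a zero-width space anywhere in the string
puts with_zwsp.include?("\u200B")
puts with_space.include?("\u200B")
```

Running this on the raw text before and after posting should show whether a U+200B turned into a U+0020.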

evannak · March 27, 2015, 12:30am

@lidel thank you very much for the explanation. That is the point, @sam.

sam · March 27, 2015, 2:03am

AHA I can see why this is happening

@zogstrip added this …

https://github.com/discourse/discourse/commit/f4208ae83fd43e0cdd663d82a73fabfb65f327bb

This is actually a case of us being too smart for our own good. It was added as a system to avoid people “gaming” edits and bypassing rules. However, it is really not acceptable that we break formatting here, so I patched it with:

https://github.com/discourse/discourse/commit/58c95f64d2887134604b8024afb0d673c48f433f
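For anyone following along, here is a rough sketch of how this kind of whitespace normalization can break Khmer text. This is an illustration only and is not the code from either commit; the constant names, the character lists, and `normalize_whitespace` are all assumptions made up for the example:

```ruby
# Hypothetical sketch, NOT the actual Discourse code. The idea: a cleaner that
# collapses "unusual" whitespace into a plain space will also rewrite U+200B
# if that character is included in its pattern.
AGGRESSIVE_WHITESPACE = /[\u00A0\u200B\u3000\uFEFF]/ # assumed list, includes U+200B
SAFER_WHITESPACE      = /[\u00A0\u3000]/             # assumed list, leaves U+200B alone

def normalize_whitespace(text, pattern)
  # Replace every character matched by the pattern with an ordinary space
  text.gsub(pattern, " ")
end

raw = "Word\u200BWord"

broken    = normalize_whitespace(raw, AGGRESSIVE_WHITESPACE)
preserved = normalize_whitespace(raw, SAFER_WHITESPACE)

puts broken == "Word Word" # the zero-width space became a real space
puts preserved == raw      # the zero-width space survived
```

With the first pattern the words are split by a visible space, which matches what was reported above; with the second, the raw text passes through unchanged.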

evannak · March 27, 2015, 2:06am

@sam so how can I fix it? Or should I wait for another release of Discourse?

sam · March 27, 2015, 2:07am

Wait for the tests to pass, then update your instance if you are on the tests-passed branch, which is the default in our configs.
