Wrong -> arrow direction in RTL text contexts

This has nothing to do with bidi settings in Discourse.

When I type -> it gets converted into an arrow character (→), so A -> B renders as “A → B”. Pretty cool.

However, the arrow points the wrong way in RTL text: א -> ב renders as “א → ב”. (If you’re reading this in the future, after this bug has been fixed: it was rendered as “א → ב”.)

Note that the input character sequence here is:

Character   Code point   Name
א           U+05D0       HEBREW LETTER ALEF
(space)     U+0020       SPACE
-           U+002D       HYPHEN-MINUS
>           U+003E       GREATER-THAN SIGN
(space)     U+0020       SPACE
ב           U+05D1       HEBREW LETTER BET

which you can verify by copying the string א -> ב into this tool: https://unicodedecode.com/
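If you would rather check the sequence locally, Python's standard `unicodedata` module gives the same information. This is a minimal sketch: `describe` is a hypothetical helper name, and the string uses `\u` escapes so that the editor's own bidi rendering cannot visually reorder it.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode character name) for each character, in logical order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    # Same input as above: ALEF, space, hyphen-minus, greater-than, space, BET.
    for code_point, name in describe("\u05d0 -> \u05d1"):
        print(code_point, name)
```

The output lists the characters in the order they are stored, which is what the replacement code sees, regardless of how the line is displayed.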

This is because arrow characters don’t bidi-mirror in Unicode. Relevant document: https://www.unicode.org/L2/L2022/22026r-non-bidi-mirroring.pdf

In particular, arrow and arrow-like characters each often has a mirror character. One could argue that they should have had the Bidi_Mirrored=Yes property value, but they don’t, and cannot now get that.
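Python exposes the Bidi_Mirrored property through `unicodedata.mirrored()`, which makes the asymmetry easy to check. Here is a small sketch (`is_bidi_mirrored` is a hypothetical helper name). The comparison signs come out as mirrored, and the arrows do not.

```python
import unicodedata


def is_bidi_mirrored(ch: str) -> bool:
    """True if the character has Bidi_Mirrored=Yes (unicodedata.mirrored returns 1 or 0)."""
    return unicodedata.mirrored(ch) == 1


if __name__ == "__main__":
    # "<" and ">" are mirrored in RTL runs; LEFTWARDS/RIGHTWARDS ARROW are not.
    for ch in ["<", ">", "\u2190", "\u2192"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: Bidi_Mirrored={is_bidi_mirrored(ch)}")
```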

There is unfortunately no bidi-flipping arrow character, meaning that if you want to make this substitution correctly, you have to determine the bidi direction of the surrounding text to correctly pick between ← and → arrows. No easy task.
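One rough way to make that choice is the "first strong character" heuristic, which is roughly rules P2 and P3 of UAX #9 without the isolate handling: the paragraph counts as RTL if its first character with bidi class L, R or AL is R or AL. The sketch below is in Python for illustration only. It is not Discourse's replacement code, which lives in JavaScript, and `paragraph_is_rtl` and `replace_arrows` are hypothetical names.

```python
import unicodedata

RIGHTWARDS_ARROW = "\u2192"  # →
LEFTWARDS_ARROW = "\u2190"   # ←


def paragraph_is_rtl(paragraph: str) -> bool:
    """First-strong heuristic: look at the first character of bidi class L, R, or AL."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return False
        if bidi_class in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
            return True
    return False  # no strong character at all: fall back to LTR


def replace_arrows(paragraph: str) -> str:
    """Replace "->" with the arrow that points forward in this paragraph's direction."""
    arrow = LEFTWARDS_ARROW if paragraph_is_rtl(paragraph) else RIGHTWARDS_ARROW
    return paragraph.replace("->", arrow)
```

With this heuristic, `replace_arrows("A -> B")` gives "A → B" and the Hebrew example gives "א ← ב". The heuristic still fails for mixed-direction paragraphs, such as an English sentence that quotes a Hebrew phrase. It also fails whenever the real direction comes from context outside the text, such as the page's `dir` or the forum locale.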


@falco I would argue that this is indeed a bug, not a feature request. The output is the exact opposite of user intentions and expectations.

Given that, we would have to build a new feature, since we are currently following the Unicode spec. That is why I recategorized this as a Feature request.

Moving on to actually addressing your issue, I think this could be easily done in a Theme component, using our existing api.decorateCooked API.


Thanks. I’m in no hurry to get it fixed on any particular forum; I just think this should be fixed in Discourse itself.

I don’t want to get into a pointless argument about semantics, so I’ll leave it at that. I’ve said my piece: I think this should be considered a bug, but what you do with that is now up to you.

Thanks for your attention and quick response :slight_smile:


Well… A man can only resist so much. I will say one last thing (I promise). As far as I’m aware, the Unicode spec does not encourage converting -> to → (and this issue would be one reason why), so this existing Discourse feature is not following any Unicode spec. It implicitly assumes the text is left-to-right and introduces this bug in the process. That’s how I see it. (The feature is still neat, though.)

Now I’ve said what I have!


If I’m typing in a right-to-left language, I might hope to type ‘dash’ followed by ‘less than’ and have it converted to a leftward arrow, like this: ←. That seems a reasonable expectation to me. But when I type a less-than, the composer inserts a greater-than. This was quite unexpected. Is that the bug?

I notice that an RTL text box (such as the search box on aljazeera.net) inserts numbers and maths symbols in LTR order within the RTL text. This seems natural enough. (It does the same for Latin letters.)

Below I will type “less than is < and greater than is >” in an RTL context (I don’t know if this is how things would work in an RTL locale):

‮less than is < and greater than is >


You don’t use a right-to-left script in everyday life, right? There is no bug in what you described. There is some ambiguity in what you said, so to prevent confusion I will address the second part of your comment first.

This is exactly how it’s supposed to work. Think of it this way:

The character > literally means “greater than”. The string “A > B” means “A is greater than B”.

Similarly, to say “א is greater than ב” I would replace “is greater than” with the same greater-than character, with the same code point U+003E. However, because the string is entirely RTL, “א” appears to the right of “ב”. If the “greater than” character were rendered the same way as in LTR text, it would show: א<ב, which reads as “א is less than ב” or “ב is greater than א”, the exact opposite of the relationship being described.

This is why the greater-than character is visually flipped when rendered in RTL. But the underlying character, and the Unicode data backing it, is still the “greater than” symbol. The string still means “א is greater than ב”.
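To make "mirrored on display" concrete, here is a deliberately simplified model of how a single, purely RTL run is laid out. It applies only two steps of UAX #9: reversing the logical order for display (rule L2) and substituting the mirror glyph for Bidi_Mirrored characters (rule L4). Python's `unicodedata` does not expose Bidi_Mirroring_Glyph, so `MIRROR_GLYPH` is a hand-written subset, and `visual_order_rtl_run` is a hypothetical name. The result is what appears on screen, read left to right.

```python
import unicodedata

# Hand-picked subset of Bidi_Mirroring_Glyph pairs; enough for this demo only.
MIRROR_GLYPH = {"<": ">", ">": "<", "(": ")", ")": "(", "[": "]", "]": "["}


def visual_order_rtl_run(logical: str) -> str:
    """Left-to-right screen order of text that forms one RTL run.

    Reverses the logical order (UAX #9 rule L2) and swaps in mirror glyphs
    for Bidi_Mirrored characters (rule L4). Numbers, embedded LTR text and
    the rest of the algorithm are ignored.
    """
    out = []
    for ch in reversed(logical):
        if unicodedata.mirrored(ch):
            ch = MIRROR_GLYPH.get(ch, ch)
        out.append(ch)
    return "".join(out)
```

In this model, א>ב comes out on screen as ב<א, which still says “א is greater than ב”. By contrast, א → ב comes out as ב → א: the arrow has no mirror glyph, so it points back toward א, the letter an RTL reader sees first. That is the original bug in a nutshell.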

Now back to your first question:

If you change your keyboard layout to an RTL language (like Hebrew or Arabic), then the key combination Shift+, (the key with < printed on it) would actually type the “greater than” character >. This would render as ‏>‏ in an RTL context, like in the search box you found.

[Edit: the next paragraph was written when I slightly misunderstood what you said you had tested. I thought you were typing into an RTL box with an LTR keyboard, when you were actually doing the opposite. Hopefully I still cleared up your confusion.]

But you are still using a Latin keyboard layout, so when you press that key combination, it inserts a “less than” character <. But it gets rendered as ‏<‏ because in RTL, it means what’s to the right is less than what’s to the left.

Bottom line: the character is the same, but its rendering gets mirrored.

If you’ve understood what I said up to this point, you’ll see that your proposed ‘dash’ followed by ‘less than’ would produce -< (rendered in RTL as ‏-<‏), which I don’t think is what you meant.

Did I successfully explain it or did I just make you more confused?


If you think you’d do better with official Unicode documents, try this one: UAX #9: Unicode Bidirectional Algorithm. Search the page (Ctrl+F) for “mirror” and you’ll find some good descriptions and examples.


You’re quite right, I’m jumping in without experience, and also with a Latin keyboard!

So I should be quiet… but if I type 3<6 (on my Latin keyboard) into the aljazeera search box, I see this:

Which probably shows that you’re right, and I’m wrong, and that should be no surprise!


Not at all! If only RTL users were allowed to discuss and fix RTL bugs, we’d be much worse off! I just took this opportunity to introduce you to the subject. It’s supposed to take some time to wrap your head around it. I’m happy to answer any more questions or curiosities you have about this.


I have joined the Unicode mailing list to propose an addition to Unicode that would solve cases like this. One of the responses I got was this:

(Me:) The problem is this replacement is done (as far as I know) outside of any rendering context, when the text is just a sequence of character codes. It’s not reasonable to know which direction the text goes. Sometimes it’s completely impossible, if the text direction depends on context that isn’t available at the time of replacement.

The above is strictly speaking inaccurate. Any serious text rendering nowadays requires a shaping engine, such as HarfBuzz, and ligation of “->” into “→” would be done by such a shaper in cooperation with a font that supports ligatures. The shaping engine is aware of the bidi context and the script of the text it shapes, so it could in principle mirror the arrow.

They are talking about something like this: GitHub - tonsky/FiraCode: Free monospaced font with programming ligatures

Consider switching to the ligature approach instead of blind character replacement. Another arguable advantage would be that when copy-pasted, the text would still be “->” instead of an arrow.

I haven’t looked into the technical details of how to implement this; I’ll leave that to you if you choose to use this solution.

Edit: well, unsurprisingly, Fira Code in particular isn’t designed with RTL in mind, so the rendering is off, but at least it’s pointing the right way! https://fonts.google.com/specimen/Fira+Code?preview.text=A%20->%20B,%20א%20->%20ב
Firefox:

I’m not sure if there exists a font today that does this correctly and explicitly supports RTL/bidi.


Interestingly, I get a different result in Chromium:
Edit: I can’t reproduce this now so I think I typed it in wrong when I took this screenshot.
Edit: And now I can reproduce it again. The situation is bad.

It’s possible that browser rendering engines/shapers are not up to this task. I’ll need to investigate more, and this is not what I’m supposed to be focusing on right now…

Edit: forum limits forced me to remove this from my previous reply:
For reference, this is the code responsible for this replacement:
