Inline language markup? (language learning site)

Is there enough interest in the community to merit adding inline markup for language?
This would be helpful in mixed-language communities, and especially in language learning forums.

For example, something like this:

[lang=ja]…[/lang]

To produce HTML:

<span lang="ja">…</span>
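To make the idea concrete, here is a minimal sketch of that conversion in TypeScript. The function name renderLangTags and the regex are my own assumptions for illustration, not anything Discourse ships. It assumes the input has already been HTML-escaped elsewhere, and it does not handle nested [lang] tags.

```typescript
// Hypothetical helper, not part of Discourse: turns [lang=xx]…[/lang]
// into <span lang="xx">…</span>.
// Assumption: `input` was already HTML-escaped by the surrounding pipeline.
// Only simple language tags such as "ja" or "zh-Hans" are accepted, so
// arbitrary text can never end up inside the lang attribute.
const LANG_TAG =
  /\[lang=([A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*)\]([\s\S]*?)\[\/lang\]/g;

function renderLangTags(input: string): string {
  return input.replace(
    LANG_TAG,
    (_match: string, code: string, body: string) =>
      `<span lang="${code}">${body}</span>`
  );
}

console.log(renderLangTags("Compare [lang=ja]漢字[/lang] and [lang=zh-Hans]漢字[/lang]"));
// Compare <span lang="ja">漢字</span> and <span lang="zh-Hans">漢字</span>
```

Anything that doesn't look like a valid language code is left untouched rather than converted.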

The same Unicode character is rendered differently in different languages:

In the image above, the browser renders it differently when it knows the text is Japanese.

I realize there are site-wide and user-specific locale settings, but in a mixed-language post (like on a language learning site), two languages could be used in the same post.

Is this worthy of mainline integration?

7 Likes

Very interesting. I wonder if we should simply whitelist <span lang="ja"> in core; it seems pretty low risk.

Not sure about the general interest here, this is the first time I have seen this request come up, but totally get that mixed Chinese / Japanese communities exist and they surely need this feature.

3 Likes

That’s certainly a simpler solution.
I guess I would lean toward whitelisting the lang attribute on all Discourse-supported HTML tags, since the HTML standard allows it on every tag.

One other tag set worth whitelisting is ‘ruby’ tags, which are a standard part of html for Japanese language support.

Since there are thousands of kanji (e.g. 漢字) in the Japanese language, Japanese students are still learning them all the way through high school. So, it is common for publications to use “furigana” to mark the pronunciation above kanji (see snapshot below from NHK News website).

The W3C site has the following for a complete list of ruby tags:

<ruby> </ruby>
<rbc> </rbc>
<rtc> </rtc>
<rb> </rb>
<rt> </rt>
<rp> </rp>

I can’t think of any risks to whitelisting, since they’re pretty straightforward.

Here’s an example of html and result (as an image):

<ruby><rb>漢字</rb><rt>かんじ</rt></ruby>

(Looks like font size might need to be set to 1.2em, as I’ve done in the sample text above)
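For anyone generating this markup programmatically, here is a rough sketch of a helper that builds the same structure, with <rp> fallback parentheses added for browsers that don't render ruby. The names rubyMarkup and escapeHtml are made up for this example.

```typescript
// Hypothetical helpers for illustration only.
// Escapes the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds ruby markup for a base text and its reading (furigana).
// The <rp> elements hold parentheses shown only when ruby is unsupported.
function rubyMarkup(base: string, reading: string): string {
  return (
    `<ruby><rb>${escapeHtml(base)}</rb>` +
    `<rp>(</rp><rt>${escapeHtml(reading)}</rt><rp>)</rp></ruby>`
  );
}

console.log(rubyMarkup("漢字", "かんじ"));
// <ruby><rb>漢字</rb><rp>(</rp><rt>かんじ</rt><rp>)</rp></ruby>
```

In a browser without ruby support, this would degrade to something like 漢字(かんじ).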

5 Likes

Be sure to mention that here:

@codinghorror what is your call here

  • Whitelist <span lang> in core and ruby tags (not the Ruby language; it's a Japanese feature)

or

  • Add a site setting for the whitelisting

3 Likes

Yep… We’ve got a few other people working on mentioning it at CommonMark.
Thanks!!

2 Likes

FYI,

I did some testing, and there are side-effects if <ruby> is allowed in thread titles. Post body seems fine, though.

I suppose CSS could hide them in thread titles. That seemed to work fine in tests.
Without additional research, I’m not sure how important it is to support <ruby> in titles. But I do feel like it would be a big boost for the message body.

Anyway, here are some samples (randomly inserted text):


and this:


Body text works fine, since line-height isn’t forced:



1 Like

I know it’s been a year since this topic was created, but is there any update on whether ruby tags will be whitelisted?

1 Like

Any updates on the lang attribute whitelisting? Been having some annoyances today with not being able to mark text up properly.

Sure, I can add a whitelist in core for lang on spans. Do you need it on divs as well?

<span>, <p>, and <div> would all be reasonable to have it whitelisted for, hm.

Technically <a> tags too, but for that you could have a <span> to mark up the link text, so it’s not as important I think.

Since ‘lang’ is a global attribute, I would be inclined to whitelist it on any Discourse-supported tags… but I don’t know what the implications of that would be for Discourse’s filtering efficiency.

If there’s good reason for keeping the whitelisting to a minimum, it would still be helpful to support at least one block-level element (e.g. <div>) and one inline element (e.g. <span>).

5 Likes

But still no word on the ruby tags?

1 Like

Happy to whitelist them; I just need details of exactly what needs whitelisting.

4 Likes

lang is whitelisted per:

https://github.com/discourse/discourse/commit/0b3d51a8bc43391b3f64719c6606ff3298128824

4 Likes

<ruby>, <rb>, <rt>, and <rp>
And the [lang] attribute on <ruby>, <rb>, and <rt>
(<rp> is just for enclosing parentheses as a fallback for old browsers, so [lang] isn’t useful.)
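In case it helps to see the list as data, here is a small sketch of how such an allowlist could be expressed and checked. This is purely illustrative: ALLOWED and isAllowed are hypothetical names, and this is not how Discourse's sanitizer is actually configured.

```typescript
// Hypothetical allowlist modeled on the list above.
// Maps each allowed tag to the set of attributes permitted on it.
const ALLOWED = new Map<string, ReadonlySet<string>>([
  ["ruby", new Set(["lang"])],
  ["rb", new Set(["lang"])],
  ["rt", new Set(["lang"])],
  ["rp", new Set<string>()], // fallback parentheses only, so no lang
]);

// Returns true if the tag is allowed and, when an attribute is given,
// that attribute is allowed on the tag. Matching is case-insensitive.
function isAllowed(tag: string, attribute?: string): boolean {
  const attrs = ALLOWED.get(tag.toLowerCase());
  if (attrs === undefined) return false;
  return attribute === undefined || attrs.has(attribute.toLowerCase());
}

console.log(isAllowed("rt", "lang")); // true
console.log(isAllowed("rp", "lang")); // false
console.log(isAllowed("script"));     // false
```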

Based on the tests I ran above, I’d recommend only whitelisting them in the post body, not in titles. It would cause layout problems in titles, but is okay in the post body.

Also, a slightly increased font size makes quite a bit of difference for readability:

before: [image]
after: [image]

ruby {
    font-size: 16px; /* default is 14px from css on 'html' in base.scss */
}
rt {
    font-size: 10px; /* default is 50% in Chrome */
}

4 Likes

Oops… I had a few typos above with <rp> and <rb>. It’s corrected now.

2 Likes

This is now done, per https://github.com/discourse/discourse/commit/280c318c49e862d4226ffee48134413ca84dcd85

@awesomerobot do you want to add some basic styling here? We can't use the px-based rules, but something out of the box would be nice for CJK communities.

Read all about Ruby tags here…

3 Likes

I made an update to increase the rt font-size from 50% to 72%, which is approximately 10px (based on our default 14px font size).


The font-size in the ruby tag is based off of our base font-size for all content, so it seems like that should be increased along with all site or post text and not on its own.
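For reference, the arithmetic behind those numbers can be checked with a quick sketch. percentToPx is just a throwaway helper name; the 14px base, the 50% and 72% values, and the 16px alternative all come from the posts above.

```typescript
// Converts a percentage font-size into pixels for a given parent size.
function percentToPx(parentPx: number, percent: number): number {
  return (parentPx * percent) / 100;
}

console.log(percentToPx(14, 50)); // 7     (the 50% Chrome default at 14px)
console.log(percentToPx(14, 72)); // 10.08 (the new 72% rule at 14px)
console.log(percentToPx(16, 72)); // 11.52 (72% if the base were raised to 16px)
```

Because the rule is a percentage, the rt size follows whatever base font size the site or post text ends up using.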

4 Likes