Inline language markup? (language learning site)


Is there enough interest in the community to merit adding inline markup for language?
This would be helpful in mixed-language communities, and especially on language-learning forums.

For example, something like this:


To produce html:

<span lang="ja">…</span>

The same Unicode character is rendered differently in different languages:

In the image above, the browser renders it differently when it knows the text is Japanese.

I realize there are site-wide and user-specific locale settings, but in a mixed-language post (like on a language learning site), two languages could be used in the same post.

Is this worthy of mainline integration?

(Sam Saffron) #2

Very interesting. I wonder if we should simply whitelist <span lang="ja"> in core; it seems pretty low risk.

Not sure about the general interest here, this is the first time I have seen this request come up, but totally get that mixed Chinese / Japanese communities exist and they surely need this feature.


That’s certainly a simpler solution.
I guess I would lean toward whitelisting the lang attribute on all Discourse-supported html tags, since the html standard allows it on all html tags.
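
To make the idea concrete, here is a minimal sketch of an allowlist filter that keeps the lang attribute on every permitted tag and drops all other attributes. It uses Python's standard html.parser purely for illustration; the tag list, the `sanitize` function, and the class name are hypothetical and are not how Discourse's sanitizer is actually implemented.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; not Discourse's real configuration.
ALLOWED_TAGS = {"span", "p", "em", "strong"}
ALLOWED_ATTRS = {"lang"}  # permitted on every allowed tag


class LangPreservingSanitizer(HTMLParser):
    """Re-emit allowed tags, keeping only allowlisted attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still emitted
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                safe_value = escape(value or "", quote=True)
                kept.append(f' {name}="{safe_value}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always escaped before being written back out.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = LangPreservingSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With this sketch, sanitize('<span lang="ja" onclick="x()">直</span>') keeps the lang attribute and discards onclick. Treating lang as a global attribute in one place avoids repeating it per tag.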


One other tag set worth whitelisting is ‘ruby’ tags, which are a standard part of html for Japanese language support.

Since there are thousands of kanji (e.g. 漢字) in the Japanese language, Japanese students are still learning them all the way through high school. So, it is common for publications to use “furigana” to mark the pronunciation above kanji (see snapshot below from NHK News website).

The W3C site gives the following as the complete list of ruby tags:

<ruby> </ruby>
<rbc> </rbc>
<rtc> </rtc>
<rb> </rb>
<rt> </rt>
<rp> </rp>

I can’t think of any risks to whitelisting them, since they’re pretty straightforward.
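
As a rough illustration of what simple ruby markup looks like, the sketch below builds it from a base text and its reading. The helper name `ruby_markup` is hypothetical; the output uses only the standard <ruby>, <rt>, and <rp> elements, with <rp> supplying parentheses as a fallback for browsers that do not render ruby.

```python
from html import escape


def ruby_markup(base, reading):
    """Return HTML ruby markup that annotates `base` with `reading`.

    The <rp> elements wrap the reading in parentheses for browsers
    that do not support ruby annotations.
    """
    return (
        f"<ruby>{escape(base)}"
        f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp>"
        "</ruby>"
    )


if __name__ == "__main__":
    # Annotate the kanji for "kanji" with its hiragana reading.
    print(ruby_markup("漢字", "かんじ"))
```

In a browser with ruby support the reading appears above the kanji; elsewhere it falls back to 漢字(かんじ).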

Here’s an example of html and result (as an image):


(Looks like font size might need to be set to 1.2em, as I’ve done in the sample text above)

(Sam Saffron) #5

Be sure to mention that here:

@codinghorror what is your call here?

  • Whitelist <span lang> and ruby tags in core (not the Ruby language; it's a Japanese feature)


  • Add a site setting for the whitelisting


Yep… We’ve got a few other people working on mentioning it at CommonMark.



I did some testing, and there are side-effects if <ruby> is allowed in thread titles. Post body seems fine, though.

I suppose CSS could hide them in thread titles. That seemed to work fine in tests.
Without additional research, I’m not sure how important it is to support <ruby> in titles. But I do feel like it would be a big boost for the message body.

Anyway, here are some samples (randomly inserted text):

and this:

Body text works fine, since line-height isn’t forced: