Minimum title lengths on international sites

There I was on 臺鐵用Openstreetmap! - 台灣/Tâi-oân/Tâi-uân/Thòi-vân (Taiwan) - OpenStreetMap Community Forum
trying to post the subject
臺鐵用OSM! (roughly "Taiwan Railways uses OSM!")
Well, I got a warning that it is below 15 characters.
Let’s see:
$ echo 臺鐵用OSM!|wc -c
14
So I had to change it.
$ echo 臺鐵用Openstreetmap!|wc -c
24
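For comparison, here is a small Python sketch (my own illustration, not Discourse code) of the two counts involved. `wc -c` counts UTF-8 bytes, including the newline that `echo` appends, while Python's `len()` counts code points. Which of the two Discourse actually checks is not established here.

```python
def byte_count(s: str) -> int:
    """Bytes in the UTF-8 encoding, like `printf '%s' "$s" | wc -c`."""
    return len(s.encode("utf-8"))

def char_count(s: str) -> int:
    """Unicode code points in the string."""
    return len(s)

for title in ["臺鐵用OSM!", "臺鐵用Openstreetmap!"]:
    # `echo` appends a newline, which is why wc -c reported one byte more.
    print(title, byte_count(title), byte_count(title + "\n"), char_count(title))
```

So the short subject is 13 bytes (14 with the newline) but only 7 characters; either way it falls short of 15.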
Anyway, there is a cultural bias in the Discourse defaults.
It just so happens that a Chinese character can pack a lot of meaning and should not be
assumed to be just three boring little bytes. The same goes for Japanese kanji.

Note I do not run any of these sites. I am just the lowliest of users.

Well, this can be set by the site administrators. I think the blame lies with them.
I am also a Chinese user; the administrator of my forum set the minimum subject length to 1.


Update: Oh, it seems this is a site for the whole world.
In any case, since site administrators want to be global, respecting language differences is the way to go.

I’m saying that the worldwide default needs to not be just Euro-centric.
Discourse allows UTF-8, so it should also have a better algorithm for setting the default.
How about this: all Western admins would need to set a higher default,
because Discourse assumed users wrote Chinese. That would be just as unfair as the current opposite situation,
where Eastern admins all need to change the default.

Discourse ships with adjusted automatic defaults for sites if the locale is set to Chinese.

We have per-locale defaults. Is our Chinese default not loose enough?

I don’t know anything, and I was triggered by the single word “blame”; that’s why I’m drifting into meta-discussion again :wink:

I’m from Finland. I don’t know anything about Chinese or any other non-alphabetic language. Because of that, I wouldn’t even have thought of such a situation.

That’s why we have (or need) two things:

  • defaults per language or language group (AFAIK there are)
  • a PM system to tell the admin about a minor issue, such as a ridiculously tight character limit (and that we have for sure :wink:)

So basically there are two different but overlapping parts:

  • what the options are by design (on topic here)
  • how an admin uses the options (off topic right here)

What you could do is add something like this to your post:

臺鐵用OSM! <!-- some more text because limits -->

If a user must resort to such tricks, it is a strong sign of bad design or a bad admin :wink:

Sure, it is a duct-tape fix.

Discourse ships with adjusted automatic defaults for sites if the locale
is set to Chinese

Big mistake!

In the olden days, each community stayed in their home country.
All was well.

In the newer days, we now have lots of international communities,
like the example site I posted.

Well, you might say that the aforementioned international community should
set a different default.

Fine, but the problem is still there: then it is too loose for ASCII posters.
Yup, we’re talking about two kinds of posters right there on the same
site. And those pesky posters might even switch languages from one
post to the next, or within the same post, or even within the same title.

Therefore I propose that you consider each Chinese character worth 5
ASCII (etc.) characters, not just the current three, which, gosh, happens
to be how many bytes a Chinese character takes in UTF-8… And well, those emojis would
score anywhere from 2 to 4… what a mess.
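A minimal Python sketch of that proposal, assuming Chinese characters can be recognized by their Unicode names (CJK unified or compatibility ideographs) and that everything else, emoji included, counts as 1. The weight of 5 and the helper names are hypothetical, not anything Discourse ships.

```python
import unicodedata

# Hypothetical weights from the proposal: each Chinese character counts as 5.
HAN_WEIGHT = 5
DEFAULT_WEIGHT = 1
MIN_TITLE = 15  # the default threshold discussed in this thread

def is_han(ch: str) -> bool:
    """True for CJK unified or compatibility ideographs, judged by Unicode name."""
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))

def weighted_length(title: str) -> int:
    """Sum of per-character weights instead of a plain character count."""
    return sum(HAN_WEIGHT if is_han(ch) else DEFAULT_WEIGHT for ch in title)

print(weighted_length("臺鐵用OSM!"))               # 19
print(weighted_length("臺鐵用OSM!") >= MIN_TITLE)  # True
print(weighted_length("OSM!") >= MIN_TITLE)        # False
```

Note that an emoji here counts as 1 per code point, so multi-code-point emoji sequences would still score unevenly.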

Anyway, I bet somebody has written a library that will give you the real
‘worthiness’ of each character, be it ASCII or not.

That was not even remotely true. All of those platforms were made for and by Americans, and there were constant character issues. Any European can confirm it, because English is a minority language if we look at the big picture :wink:

Maybe an (optional) per-category setting for lowering the minimum length could help, because in an international forum, posts in Chinese (or any other language) normally go in a dedicated category.

This is an interesting feature request, but it is a complex one due to coherence.

Additionally, non-international sites may dislike this due to spam vectors.

There is no easy change here, and lots of knobs already exist.

Actually, you could count punctuation as, e.g., -1:
$ unicode ,
U+002C COMMA …
Category: Po (Punctuation, Other)
$ unicode ' '
U+0020 SPACE …
Category: Zs (Separator, Space)

And here we would get +5 from the first English translation, “heart” (h-e-a-r-t):
$ unicode -v 心|grep Def
kDefinition: heart; mind, intelligence; soul
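A rough Python sketch of this scoring idea, using `unicodedata.category` for the Po/Zs checks. Assumptions: separators score 0 (the post only shows their category), and `K_DEFINITION` is a hypothetical one-entry stand-in for the Unihan kDefinition data that the `unicode -v` output shows.

```python
import unicodedata

# Hypothetical one-entry sample of Unihan kDefinition data; a real version
# would load the full Unihan database instead.
K_DEFINITION = {"心": "heart; mind, intelligence; soul"}

def first_gloss(definition: str) -> str:
    """First English translation: the text before the first ';' or ','."""
    return definition.split(";")[0].split(",")[0].strip()

def char_score(ch: str) -> int:
    category = unicodedata.category(ch)
    if category.startswith("P"):  # punctuation, e.g. COMMA is Po
        return -1
    if category.startswith("Z"):  # separators, e.g. SPACE is Zs (assumed 0)
        return 0
    if ch in K_DEFINITION:        # score by the length of the first gloss
        return len(first_gloss(K_DEFINITION[ch]))
    return 1

def title_score(title: str) -> int:
    return sum(char_score(ch) for ch in title)

print(unicodedata.category(","), unicodedata.category(" "))  # Po Zs
print(title_score("心"), title_score("a, b"))                 # 5 1
```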

See e.g.,
$ apropos perluniprops
perluniprops (1) - Index of Unicode Version 14.0.0 character properties in Perl
$ apt-cache policy unicode #Debian
unicode:
Installed: 2.9-1

Well, this is why those settings are adjustable. I would imagine that on the sites in question, the primary language is not Chinese.

This minimum setting is designed to help avoid very vague topic titles. It is very true that Chinese and other languages that use logographic characters can pack a much more complete statement into a single glyph than, say, English and other languages written with a 26-letter alphabet.

So, naturally, system-installed defaults typically follow conventions like a 15-character minimum for titles and a 20-character minimum for posts.

I don’t think anyone should be blamed, as sites will be tailored primarily to their target audiences. For example, as an English speaker, it would be bad faith on my part to expect all sites foreign to me to have properly translated English content.

That’s the consensus, but I’m not totally sure it really works. “Hello you all guys” is 15 characters long (18 with spaces) and is really vague. Same with posts: how many times have we seen a short answer here that ends with “(plus a few more characters)”? That 20 limit doesn’t work then, and it can just add more noise. Actually, in English-speaking forums the limit for posts could be just 6, and that stops “I agree” (are we counting spaces too?). On titles, 3 is enough.
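Whether spaces count matters for that very example. A quick Python check (nothing Discourse-specific; `title_length` is a made-up helper):

```python
def title_length(title: str, count_spaces: bool = True) -> int:
    """Length of a title, optionally ignoring whitespace characters."""
    if count_spaces:
        return len(title)
    return sum(1 for ch in title if not ch.isspace())

example = "Hello you all guys"
print(title_length(example))                      # 18
print(title_length(example, count_spaces=False))  # 15
```

Either way, it clears a 15-character minimum while saying almost nothing.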

Of course I understand why we need some limits, but I’m not totally sure 15, 20, or any other multiple of 5 is really accurate :wink: But hey, that’s why we have settings for those.

Global forums must make some adjustments and use moderators if overly short titles become an issue.

Sorry, guys, if I’m hurting someone’s feelings, but this topic says more about us having way too much spare time than about a real issue :rofl:

(Why isn’t Meta offering me the emoji list anymore?)

We think we are forcing users to avoid titles that are too short.

But we are in fact forcing other users to write titles that are too long.

It’s all a cultural bias.

And no, please don’t say “they should contact their admin to adjust it.”

The root of the cultural bias is right here at Discourse.com.

A lack of sensitivity.

It is also easy to fix: count characters differently.

But who cares. Not a big problem in the “markets we cover.”

Worth noting this from above again:

As you can imagine, we get a lot of feature and ux requests, and we generally work on the principle that the ones with more interest get prioritised sooner. As yet, this one has not picked up a lot of steam from anyone other than yourself.

I appreciate that it may be something you’d love to be implemented, though I think we should be mindful to keep discussion civil and productive rather than resort to casting aspersions.

The problem with this, though, is that if you arbitrarily set every Chinese character to equal 5 Latin characters and the error still says “Title must be at least 15 characters”, then the error disappears once you type 3 Chinese characters, and suddenly the error message makes no sense (because you typed 3 characters, not 15).

I guess on an international site, if I ask myself “how would God do it?”, the site would detect what language you’re typing the title in and show an appropriate minimum character requirement accordingly.

  • Title must be at least 15 characters (for Latin characters)
  • Title must be at least 7 characters (for Japanese characters since most of them equal 2 Latin characters, except for the vowels…)

And so on for every character set.
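A sketch of that per-script idea in Python, assuming the dominant script can be guessed from Unicode character names. The 15 and 7 come from the list above; the Han value of 5 is hypothetical, and mixed-script titles like the original one are exactly where this heuristic gets shaky.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Per-script minimums: 15 and 7 follow the list above; the Han value is hypothetical.
MIN_BY_SCRIPT = {"latin": 15, "kana": 7, "han": 5}

def script_of(ch: str) -> Optional[str]:
    """Guess a character's script from its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "kana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "han"
    return None  # digits, punctuation, spaces, etc. are ignored

def check_title(title: str) -> Optional[str]:
    """Return an error naming the dominant script's own minimum, or None."""
    counts = Counter(s for s in map(script_of, title) if s)
    script = counts.most_common(1)[0][0] if counts else "latin"
    minimum = MIN_BY_SCRIPT[script]
    if len(title) < minimum:
        # The number in the message is the one actually enforced.
        return f"Title must be at least {minimum} characters ({script})"
    return None

print(check_title("Hello"))       # Title must be at least 15 characters (latin)
print(check_title("ありがとう"))  # Title must be at least 7 characters (kana)
print(check_title("臺鐵用"))      # Title must be at least 5 characters (han)
```

Because the message quotes the threshold that was actually applied, it avoids the “typed 3, told 15” confusion.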

Another issue with this though is that people can also use multiple languages in a title, as you did in the original post.

It’s about weighing up the work for all of this and how useful it will be vs. the current global defaults.

Just saying I’m not sure there’s a need to get personal – I speak English and Vietnamese both at a native level, currently learning Russian and have learnt some Latin, Chinese and Japanese. Discourse has one of the most geographically diverse teams I’ve seen anywhere by the way (spanning 6 continents and 15 timezones). I don’t think it’s related to sensitivity but logistics.

Most sites use just one language, and that’s why there is a single global default for minimum title length for admins to change, with defaults per locale.

For a site that has discussion in many languages, which is much rarer, the admins for that site could lower the limit to 3 or 5 characters.

(The reason it’s in the hands of the admins is because with a minimum of 3 characters for example, people could then type really short spammy titles in English, if that’s the most common language. With a mature userbase, it might not be a problem, but with other communities, it might. It’s up to the admins to weigh it based on their userbase.)

I did think of another solution, which is making it so that the minimum topic title length could be set per category, if there are multiple categories with different languages.

For core, it would probably still depend on how frequently used it would be – but it could also be done in a plugin.
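A per-category override could be as simple as a lookup with a site-wide fallback. This Python sketch is hypothetical (the class, field names, and the `taiwan` category slug are made up) and is not how Discourse stores its settings.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TitleLengthSettings:
    """Hypothetical settings: a site-wide minimum plus per-category overrides."""
    site_minimum: int = 15
    category_minimums: Dict[str, int] = field(default_factory=dict)

    def minimum_for(self, category: str) -> int:
        # Fall back to the site-wide value when a category has no override.
        return self.category_minimums.get(category, self.site_minimum)

    def title_ok(self, title: str, category: str) -> bool:
        return len(title) >= self.minimum_for(category)

settings = TitleLengthSettings(category_minimums={"taiwan": 3})
print(settings.title_ok("臺鐵用OSM!", "taiwan"))   # True
print(settings.title_ok("臺鐵用OSM!", "general"))  # False
```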

The spam reason / clarity of titles is mainly why it is in the hands of the admins rather than users. Though I also get how frustrating it would be as a user in a multi-language forum at present.

Better than counting bytes would be, e.g.,

wcswidth (3) - determine columns needed for a fixed-size wide-character string
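Python's standard library has no `wcswidth`, but `unicodedata.east_asian_width` gives a rough approximation. This is a sketch, not a faithful port: it treats wide ('W') and fullwidth ('F') characters as 2 columns and combining marks as 0, and it ignores the locale-dependent and non-printable cases the C function handles (where it returns -1).

```python
import unicodedata

def approx_wcswidth(s: str) -> int:
    """Rough stand-in for wcswidth(3): display columns, not bytes."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):  # combining marks take no column
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

print(approx_wcswidth("臺鐵用OSM!"))            # 10
print(approx_wcswidth("臺鐵用Openstreetmap!"))  # 20
```

By this measure each Chinese character counts as 2, which sits between the 1 of a character count and the 3 of a UTF-8 byte count.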

Anyway, it’s like a “Choose your avatar” screen where every option is somebody else’s race and gender.

And you know what, I thought I had already posted a reply to this via email. But I think the topic got reassigned while I was offline, so when the email arrived at Discourse there was no place to post it, and it went into the bit bucket. So I have discovered a bug: at the very least, the user should get a bounce message.

A search for 'wcswidth' on Discourse Meta finds only this one reply. My other email posts arrived fine, but not the one where I mentioned wcswidth. That must be due to the topic change etc.

I’m fine with the topic change. But the system should make sure to email the user back, telling them that their post failed… due to…

OK, I can confirm that sending a message to an already closed topic does result in a rejection email (though it throws away the body the user typed!), but for topic changes etc., it seems the email is simply thrown away.