Please replace German umlauts in URLs with their respective digraphs

When generating topic URLs from titles, German umlauts could be “translated” better.

Currently:

Title:

Kürzest mögliche Logging-Klasse

URL:

/kurzest-mogliche-logging-klasse

(Example)

Suggestion:

Replace umlauts with their respective digraphs:

Title:

Kürzest mögliche Logging-Klasse

URL:

/kuerzest-moegliche-logging-klasse

I.e.:

ä => ae
ö => oe
ü => ue
ß => ss

Most likely there is some kind of automatic conversion that would work for other languages, too.

3 Likes

You can create your own language-specific slug rules with a custom plugin.
See this one for an example:
https://github.com/thangngoc89/discourse-vietnamese-slug

In case you don’t want to type out all the characters yourself, here is a good slug project to take them from:
https://github.com/cocur/slugify/blob/master/src/Slugify.php

2 Likes

Thanks. From a quick look, it seems that in the source code, one character is replaced by one other character.

Since I need to replace one character by two characters, I would have to modify the code.

On the other hand, I don’t have the slightest clue how to develop in Ruby, so I’d better wait until (if ever) this goes into the main Discourse sources.

Actually, I’m a noob in Ruby too, but I could write that code quite easily (with a little help on the plugin part).
Ideally, you would make a simple mapping like

ä : ae
ö : oe
ü : ue
ß : ss

With a loop, you can then replace all of these characters in the title before passing it to the Discourse slug generator.
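
A minimal sketch of that idea in plain Ruby. The `UMLAUT_DIGRAPHS` table and both method names are made up for illustration, and `naive_slug` is only a stand-in for the real Discourse slug generator, which does more than this:

```ruby
# Hypothetical mapping table; not part of Discourse.
UMLAUT_DIGRAPHS = {
  "ä" => "ae", "ö" => "oe", "ü" => "ue",
  "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue",
  "ß" => "ss"
}.freeze

# Replace each umlaut with its digraph, one mapping entry at a time.
def replace_umlauts(title)
  result = title
  UMLAUT_DIGRAPHS.each do |char, digraph|
    result = result.gsub(char, digraph)
  end
  result
end

# Stand-in for the real slug generator (assumption): lowercase,
# turn runs of other characters into a single dash, trim dashes.
def naive_slug(title)
  replace_umlauts(title)
    .downcase
    .gsub(/[^a-z0-9]+/, "-")
    .gsub(/\A-+|-+\z/, "")
end

puts naive_slug("Kürzest mögliche Logging-Klasse")
# prints "kuerzest-moegliche-logging-klasse"
```

The replacement has to run before anything strips non-ASCII characters, otherwise the umlauts are already gone by the time the table is consulted.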

1 Like

I could probably submit a patch for that.
The slugs are calculated in slug.rb. It already contains exceptions for zh_CN and ja and there seems to be an open pull request for Vietnamese.

@sam I guess there will be a lot more languages with the same problem. How should we handle this? Should I just submit a simple patch for German or are you going to use a gem like babosa? Any other ideas?

2 Likes

I’ll withdraw that pull request. I created the slug plugin myself to work around that.

1 Like

@sam and @techAPJ: Can you please help me out here?

I’ve been looking into this a little bit. It looks like I have a few options for solving this:

Option 1
Add the following transliteration rules to server.de.yml and let the existing transliteration replace the German umlauts:

  i18n:
    transliterate:
      rule:
        Ä: "AE"
        Ö: "OE"
        Ü: "UE"
        ß: "ss"
        ä: "ae"
        ö: "oe"
        ü: "ue"

But I guess I would have to add those to server.en.yml as well; otherwise they would get lost at the next pull from Transifex. I’m not comfortable doing that. Why should the translators of all the other languages have to worry about this? I’m not even sure why we already have some rules in there…

Option 2
Add an additional, language-specific translation file (e.g. transliterations.de.yml) that contains only the transliteration rules and is not managed in Transifex.
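
To make Option 2 concrete, here is a rough sketch of how such a file could be read and applied, using only Ruby's standard library. The file layout (locale as the top-level key), the `TRANSLITERATIONS_DE` constant and both method names are assumptions for illustration, not how Discourse actually loads its locale files:

```ruby
require "yaml"

# Hypothetical contents of transliterations.de.yml; the real file
# layout, if such a file exists, may differ.
TRANSLITERATIONS_DE = <<~RULES
  de:
    Ä: "AE"
    Ö: "OE"
    Ü: "UE"
    ß: "ss"
    ä: "ae"
    ö: "oe"
    ü: "ue"
RULES

# Parse the rules for one locale; an unknown locale yields no rules.
def load_rules(yaml_text, locale)
  YAML.safe_load(yaml_text).fetch(locale, {})
end

# Apply all rules in a single pass; characters without a rule are kept.
def transliterate(text, rules)
  return text if rules.empty?
  text.gsub(Regexp.union(rules.keys), rules)
end

rules = load_rules(TRANSLITERATIONS_DE, "de")
puts transliterate("Kürzest mögliche Logging-Klasse", rules)
# prints "Kuerzest moegliche Logging-Klasse"
```

Because the rules live in data rather than in code, another language would only need its own file, and nothing in the translated server.*.yml files has to change.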

Option 3
Replace the German umlauts within slug.rb. Doing this with a regex wouldn’t be that hard since there are only 7 characters to replace. But, this feels wrong on so many levels. What if other languages want to do the same? I imagine this would become ugly quite fast.

Option 4
Use stringex, as is already done for Simplified Chinese. According to the documentation, it should even have German transliterations out of the box. However, I couldn’t get it working. Somehow it didn’t use the locale files that should come with stringex. :frowning: It only used the two translations from server.en.yml.

What I also like about it is the localization of stringex conversions:

It is possible to localize the different conversions and unidecoding in Stringex, so for example “100%” becomes “100 percent” in English and “100 prozent” in German.

Currently, symbols like % are always removed when the slug is created. Replacing them with, e.g., “percent” would be great.
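
As a rough illustration of what that could look like, the sketch below spells out symbols per locale before building the slug. It does not use stringex or its locale data; the `SYMBOL_WORDS` table and `slug_with_symbols` are hypothetical names, and the slug step is a simplified stand-in:

```ruby
# Hypothetical per-locale words for symbols; stringex ships its own
# locale data, which this sketch does not use.
SYMBOL_WORDS = {
  "en" => { "%" => "percent" },
  "de" => { "%" => "prozent" }
}.freeze

# Spell out known symbols for the locale, then build a simple slug.
# Symbols without an entry are dropped, as they are today.
def slug_with_symbols(title, locale)
  spelled = title
  SYMBOL_WORDS.fetch(locale, {}).each do |symbol, word|
    spelled = spelled.gsub(symbol, " #{word} ")
  end
  spelled.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts slug_with_symbols("100% sicher", "de") # prints "100-prozent-sicher"
puts slug_with_symbols("100% sure", "en")   # prints "100-percent-sure"
```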

3 Likes

I think we should go for option 2.

transliterations.de.yml seems like the cleanest way to solve this.

1 Like

German umlauts are now handled correctly if the default locale of the forum is “de”:

https://github.com/discourse/discourse/commit/8a236c06e282e7bc64e2bf963d52ade261755255

2 Likes

Closing as the PR was now merged, thanks @gerhard!

1 Like