Exclamation mark and special characters in usernames

The only problem will be if you’re unhappy with the usernames that the script chooses. So if you don’t care that something like this happens:

  • clan|nickname → clan_nickname
  • [clan]nickname → clan_nickname2

then you’re probably OK. It seems pretty unlikely that you’d have two people in the same clan with the same nickname but different extra characters.
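To make the renaming above concrete, here is a rough sketch of a sanitize-then-deduplicate step. It is not the import script's actual code (the real rules live in Discourse's username suggester); the class name, the replacement rules, and the `unnamed` fallback are all assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not Discourse's real implementation.
public class UsernameSuggester
{
    // Usernames handed out so far, compared case-insensitively.
    private readonly HashSet<string> _taken =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public string Suggest(string original)
    {
        // Assumption: each run of disallowed characters becomes one underscore.
        var candidate = Regex.Replace(original, @"[^A-Za-z0-9_.-]+", "_");

        // Assumption: leading and trailing separators are dropped.
        candidate = candidate.Trim('_', '.', '-');
        if (candidate.Length == 0)
            candidate = "unnamed"; // arbitrary placeholder

        // On a collision, append 2, 3, ... until the name is free.
        var unique = candidate;
        for (var n = 2; !_taken.Add(unique); n++)
            unique = candidate + n;

        return unique;
    }
}
```

With a single `UsernameSuggester` instance, `Suggest("clan|nickname")` gives `clan_nickname`, and a later `Suggest("[clan]nickname")` gives `clan_nickname2`, which is the kind of result described above.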

Unfortunately, we ended up having to write our entire migration script ourselves. The script that was available at the time (end of 2022) was not even in a “finished” state: pieces of referenced code were missing and others made no sense at all.

Among the things we had to deal with was renaming the usernames that did not respect the Discourse rules. We had over 10k users, so we simply assigned the username to the “user name” field, which doesn’t have such restrictions, and when a username broke any of the rules, we changed it to the MD5 hash of the username. Then we gave users one month to change their username freely, so they could change it back to something that resembled their original one.

Here is an excerpt of the code:

    public static string ParseName(string text)
    {
        var reservedName = new string[] { "user", "system", "moderators" };
        var result = NormalizeText(text);

        // If reserved change it
        if (reservedName.Contains(result, StringComparer.OrdinalIgnoreCase))
            result = text.ToLower().ToMD5();

        // Invalid alphanumeric
        else if (!Regex.IsMatch(result, @"^[a-zA-Z0-9_.-]*$"))
            result = text.ToLower().ToMD5();

        // Invalid repeated
        else if (Regex.IsMatch(result, @"[-_.]{2,}"))
            result = text.ToLower().ToMD5();

        // Must end with a letter or digit (.NET patterns take no /.../ delimiters)
        else if (!Regex.IsMatch(result, @"[\p{L}\p{N}]$"))
            result = text.ToLower().ToMD5();

        // Confusing extensions (case-insensitive via RegexOptions, not a trailing /i)
        else if (Regex.IsMatch(result, @"\.(js|json|css|htm|html|xml|jpg|jpeg|png|gif|bmp|ico|tif|tiff|woff)$", RegexOptions.IgnoreCase))
            result = text.ToLower().ToMD5();

        // Starting and ending dots
        else if (result.StartsWith(".") || result.EndsWith("."))
            result = text.ToLower().ToMD5();

        // No more than 60 characters
        result = result.Truncate(60);

        return result;
    }

We used this file as a reference to determine what counts as a restriction. It might have changed in the meantime, so keep an eye out.
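The excerpt calls a couple of extension methods that aren't shown. As a guess at what they might look like (these are stand-ins written for illustration, not the original helpers, and `NormalizeText` is left out because its behaviour isn't described):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-ins for the extension methods the excerpt relies on.
public static class StringExtensions
{
    // Lowercase hex MD5 of the UTF-8 bytes: always 32 characters from [0-9a-f].
    public static string ToMD5(this string value)
    {
        using (var md5 = MD5.Create())
        {
            var hash = md5.ComputeHash(Encoding.UTF8.GetBytes(value));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (var b in hash)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Cuts the string to at most maxLength characters; shorter strings pass through.
    public static string Truncate(this string value, int maxLength)
    {
        return value.Length <= maxLength ? value : value.Substring(0, maxLength);
    }
}
```

A hex digest only contains digits and the letters a-f, so it passes the character checks in the excerpt, and at 32 characters it fits under the 60-character cut. MD5 is only used here to derive a placeholder name, not for anything security-related.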


Did you base it on an existing import script? If you had, it would have called lib/user_name_suggester.rb (in discourse/discourse on GitHub) and it would have just worked.

No, we moved away from the script because it was incredibly slow at processing a forum with 10 million+ replies.

We created everything from scratch in C#. It took around 4-5 hours to migrate everything besides avatars and create redirects to match the different URL formats. A second pass of the script, run with a specific parameter, would also update the avatar for each user active in the last 6 months and add all the necessary redirects by scanning the replies and looking for the old URL format.

That took a couple of days, more or less, but it could happen in the background, as the important thing for us was to have the forum up and running with all content in less than a day.
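For anyone curious what the redirect pass described above might look like, here is a minimal sketch. The old URL shape (`viewtopic.php?t=<id>`), the ID map, and the class name are invented for illustration, since the real old format isn't given here; `/t/<id>` is used as the target on the assumption that Discourse resolves topic links by ID.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: the old URL shape and the ID map are assumptions.
public static class RedirectScanner
{
    // Assumed old format: .../viewtopic.php?t=<numeric id>
    private static readonly Regex OldTopicLink =
        new Regex(@"viewtopic\.php\?t=(\d+)", RegexOptions.IgnoreCase);

    // Returns old-path -> new-path pairs for every old-style link in a reply body.
    // topicIdMap translates old topic IDs to the IDs assigned during the import.
    public static Dictionary<string, string> FindRedirects(
        string replyBody, IReadOnlyDictionary<int, int> topicIdMap)
    {
        var redirects = new Dictionary<string, string>();
        foreach (Match m in OldTopicLink.Matches(replyBody))
        {
            // Skip IDs that don't parse or were never imported.
            if (int.TryParse(m.Groups[1].Value, out var oldId)
                && topicIdMap.TryGetValue(oldId, out var newId))
            {
                redirects["/" + m.Value] = "/t/" + newId;
            }
        }
        return redirects;
    }
}
```

Links whose old ID isn't in the map are skipped rather than guessed at, so a pass like this can be re-run safely as more content is imported.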


Nice work! That’s very cool.

FWIW, the existing scripts skip existing content, so the final import likely would have taken under an hour, especially if you told it the date of the most recently imported data. The first run would be very painful, though.

Yeah, I know it also resumed but some old content was so messed up (the forum had a migration in early 2000) that the script was failing most of the time or slowing to a crawl very quickly.

Believe me, if we could have avoided 3 months of writing and testing, we would have! :smiley:
