Anchoring not working when using non-Latin characters in headings

I believe I have come across a potential issue while using the auto-generated table of contents feature in DiscoTOC:

When utilizing headings of varying levels in languages other than English, such as Chinese, it seems that the data-d-toc IDs in the auto-generated table of contents only capture numerical digits and English letters from the headings. This situation can result in the creation of identical IDs quite easily, subsequently leading to incorrect links in the right-hand scrollbar.

In the image above, if the serial numbers within the headings are both 5, the resulting data-d-toc IDs will both be toc-h2-5. Consequently, this will lead to two distinct links erroneously directing to the same section.

However, by modifying the serial numbers to 1.5 and 2.5, the data-d-toc IDs will differ (toc-h2-15 and toc-h2-25), effectively ensuring accurate and appropriate links.
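To make the collision concrete, here is a minimal sketch of the behavior described above. The `asciiOnlySlug` and `tocId` helpers and the sample headings are hypothetical, written only for illustration. They are not DiscoTOC's or Discourse's actual code, and the sketch simply assumes that everything except ASCII letters and digits gets dropped.

```javascript
// Hypothetical stand-in for a slug function that keeps only ASCII letters
// and digits. This is NOT Discourse's slugify; it only mimics the symptom
// reported in this topic.
function asciiOnlySlug(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Hypothetical helper that builds an ID in the toc-h<level>-<slug> shape.
function tocId(level, headingText) {
  return `toc-h${level}-${asciiOnlySlug(headingText)}`;
}

// Made-up sample headings: two share the serial number 5,
// two use multi-level serial numbers.
const headings = ["5 安装步骤", "5 常见问题", "1.5 安装步骤", "2.5 常见问题"];
const ids = headings.map((h) => tocId(2, h));

console.log(ids);
// [ 'toc-h2-5', 'toc-h2-5', 'toc-h2-15', 'toc-h2-25' ]
```

The first two headings end up with the same ID because only the digit survives, while the multi-level serial numbers happen to stay distinct.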

In order to ensure accurate linking within the scrollbar, is it advisable to keep the headings in English?

Furthermore, for languages like Chinese, would the most viable solution involve incorporating multi-level serial numbers (e.g., 3.5, 3.6.5, 4.2.5.6) to the headings?

Reference:

I have already reported this issue and made a fork:

https://meta.discourse.org/t/discotoc-automatic-table-of-contents/111143/399?u=lhc_fl

It works, although it is not perfect, and I have been too lazy to improve the code.

Obviously, a better solution would be to use base64 to generate the data-d-toc value and add a unique identifier, so that duplicate titles cannot collide.
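As a rough sketch of the base64 idea, and not code from the fork, something like the following could work. The `base64Id` name, the `toc-` prefix, and the use of the heading's position as the unique identifier are all assumptions made for illustration. It relies only on the standard `TextEncoder` and `btoa` globals.

```javascript
// Hypothetical helper: encodes the heading text as URL-safe base64 and
// appends an index so that identical titles still get distinct IDs.
function base64Id(headingText, index) {
  // btoa only accepts characters in the Latin-1 range, so convert the
  // text to UTF-8 bytes first and map each byte to one character.
  const bytes = new TextEncoder().encode(headingText);
  const binary = String.fromCharCode(...bytes);
  const encoded = btoa(binary)
    .replace(/\+/g, "-") // URL-safe alphabet
    .replace(/\//g, "_")
    .replace(/=+$/, ""); // drop padding
  return `toc-${encoded}-${index}`;
}

console.log(base64Id("5 安装步骤", 0));
console.log(base64Id("5 常见问题", 1));
console.log(base64Id("5 安装步骤", 2)); // same title as the first, different index
```

The trade-off is that the resulting IDs are unique but no longer human-readable, and long headings produce long IDs.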


I currently do not have the authority to make such changes to my company’s forum, but I do appreciate your response!

Furthermore, I’d like to ask whether the official team has considered adding support for non-Latin languages to the auto-generated table of contents in future DiscoTOC component releases. @Lilly @awesomerobot

Once more, thank you, everyone!

In truth, they have taken into consideration non-Latin languages, utilizing the slugify(h.textContent) method. I suspect that this slugify function was developed in accordance with the forum’s slug generation method. When it is not in ‘encode’ mode, issues tend to arise, although I have not personally tested this hypothesis.

Previously, when we used the official version of the theme component, our forum’s slug generation method was set to ‘none’, which caused similar problems. May I suggest that you try changing the setting to ‘encode’?

Additionally, considering the official team’s speed in fixing components… there’s an issue I reported last year on a component from the same team that still hasn’t been addressed. I suggest you apply to use my fork instead.


I made an attempt to modify this setting, but the issues mentioned earlier still persist. The data-d-toc ID can only read numbers and letters, and there are still instances of duplicated table of contents IDs. I guess the crux lies elsewhere?

Update: I updated the code today. The suffix now includes the post number and the heading's index:

const suffix = `${slugify(h.textContent)}-${post?.post_number}-${index}`;

This change fixes the problem of non-Latin characters or identical titles producing the same anchor.
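For anyone who wants to see the effect of that suffix outside the component, here is a small standalone sketch. The `stubSlugify` function, the `buildSuffixes` wrapper, and the mock heading and post objects are hypothetical stand-ins. Only the template string itself is taken from the update above.

```javascript
// Hypothetical stub that keeps only ASCII letters and digits, imitating
// the symptom reported in this topic. The real slugify may behave differently.
function stubSlugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Hypothetical wrapper that applies the suffix template to each heading.
function buildSuffixes(headingElements, post) {
  return headingElements.map(
    (h, index) => `${stubSlugify(h.textContent)}-${post?.post_number}-${index}`
  );
}

// Mock objects standing in for heading elements and a post.
const suffixes = buildSuffixes(
  [{ textContent: "5 安装步骤" }, { textContent: "5 常见问题" }, { textContent: "概述" }],
  { post_number: 1 }
);

console.log(suffixes);
// [ '5-1-0', '5-1-1', '-1-2' ]
```

Even when the slug part is identical or empty, the trailing index keeps each suffix distinct within a post.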