Anchoring not working when using non-latin characters in headings

Ellery · August 25, 2023, 3:42am

I believe I have come across a potential issue while using the auto-generated table of contents feature in DiscoTOC:

When headings of any level are written in a language other than English, such as Chinese, the data-d-toc IDs in the auto-generated table of contents appear to keep only the digits and English letters from each heading. Identical IDs are therefore easy to produce, which leads to incorrect links in the right-hand scrollbar.

In the image above, if the serial numbers within the headings are both 5, the resulting data-d-toc IDs will both be toc-h2-5. Consequently, this will lead to two distinct links erroneously directing to the same section.

However, by modifying the serial numbers to 1.5 and 2.5, the data-d-toc IDs will differ (toc-h2-15 and toc-h2-25), effectively ensuring accurate and appropriate links.
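The collision described above can be reproduced with a small sketch. It assumes the slug step keeps only ASCII letters and digits, which is what the observed IDs suggest. `asciiOnlySlug` and `tocId` are hypothetical stand-ins rather than DiscoTOC's actual functions, and the heading texts are invented for illustration.

```javascript
// Hypothetical stand-in for the slug step: NOT DiscoTOC's real slugify.
// It keeps only ASCII letters and digits, matching the behavior described above.
function asciiOnlySlug(text) {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Build an ID in the toc-h<level>-<slug> shape seen in the examples above.
function tocId(level, headingText) {
  return `toc-h${level}-${asciiOnlySlug(headingText)}`;
}

// Two different Chinese headings that share the serial number 5.
const first = tocId(2, "5 安装步骤");
const second = tocId(2, "5 常见问题");
console.log(first, second); // both are "toc-h2-5", so the two links collide

// The same headings with multi-level serial numbers.
const third = tocId(2, "1.5 安装步骤");
const fourth = tocId(2, "2.5 常见问题");
console.log(third, fourth); // "toc-h2-15" and "toc-h2-25", so they stay distinct
```

Under this assumption, distinct serial numbers only work around the problem: two headings whose digits and Latin letters happen to match would still collide.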

In order to ensure accurate linking within the scrollbar, is it advisable to keep the headings in English?

Furthermore, for languages like Chinese, would the most viable solution involve incorporating multi-level serial numbers (e.g., 3.5, 3.6.5, 4.2.5.6) to the headings?

Lhc_fl · August 25, 2023, 3:46am

I have already reported this issue and published a fork:

https://meta.discourse.org/t/discotoc-automatic-table-of-contents/111143/399?u=lhc_fl

It works, though it is not perfect, and I have been too lazy to improve the code further.

Obviously, a better solution would be to use base64 to generate the data-d-toc value and to add a unique identifier, so that duplicate titles cannot collide.
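A minimal sketch of that idea follows. It is not code from DiscoTOC or from the fork. It assumes a URL-safe base64 variant of the UTF-8 heading text, and `base64Slug`, `tocIdBase64`, and the `uniqueId` parameter are hypothetical names chosen for illustration.

```javascript
// Encode heading text as URL-safe base64 so non-Latin characters are preserved.
function base64Slug(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the heading
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // btoa expects one char per byte
  }
  return btoa(binary)
    .replace(/\+/g, "-") // URL-safe alphabet
    .replace(/\//g, "_")
    .replace(/=+$/, ""); // drop padding
}

// `uniqueId` is a hypothetical caller-supplied value, for example a counter.
function tocIdBase64(level, headingText, uniqueId) {
  return `toc-h${level}-${base64Slug(headingText)}-${uniqueId}`;
}

const a = tocIdBase64(2, "5 安装步骤", 0);
const b = tocIdBase64(2, "5 常见问题", 1);
const c = tocIdBase64(2, "5 安装步骤", 2); // same title as `a`
console.log(a !== b, a !== c); // true true
```

The base64 part keeps different titles apart, and the unique suffix separates headings whose titles are identical. The trade-off is that the resulting anchors are no longer human-readable.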

Ellery · August 25, 2023, 5:22am

I currently do not have the authority to make such changes to my company’s forum, but I do appreciate your response!

I'd also like to ask whether the official team has considered supporting non-Latin languages in the auto-generated table of contents in a future DiscoTOC release. @Lilly @awesomerobot

Once more, thank you, everyone!

Lhc_fl · August 25, 2023, 5:27am

In truth, they have taken into consideration non-Latin languages, utilizing the slugify(h.textContent) method. I suspect that this slugify function was developed in accordance with the forum’s slug generation method. When it is not in ‘encode’ mode, issues tend to arise, although I have not personally tested this hypothesis.

Previously, when we used the official version of the theme component, our forum’s slug generation method was set to ‘none’, which gave rise to similar problems. May I suggest that you try changing the setting to ‘encode’?

Additionally, considering the official team’s speed in fixing components… there’s an issue I reported last year on a component from the same team that still hasn’t been addressed. I suggest you apply to use my fork instead.

Ellery · August 25, 2023, 5:47am

I tried changing this setting, but the issues mentioned earlier persist. The data-d-toc ID still contains only digits and Latin letters, and duplicate table of contents IDs still occur. I guess the crux lies elsewhere?

Lhc_fl · August 26, 2023, 3:49am

Update: I updated the code today. The suffix now includes the post number and the heading index:

const suffix = `${slugify(h.textContent)}-${post?.post_number}-${index}`;

This fixes the problem of headings with non-Latin characters, or headings with identical titles, generating the same anchor.
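To see why the index makes the anchors unique, here is a self-contained sketch built around that line. Everything except the suffix template is an assumption: `slugifyStandIn` is a hypothetical ASCII-only stand-in for slugify, `buildTocIds` and the `{ level, text }` heading objects are invented for illustration, and the `toc-h<level>-` prefix is taken from the IDs quoted earlier in the thread.

```javascript
// Hypothetical stand-in for slugify: keeps ASCII letters and digits only.
function slugifyStandIn(text) {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// headings: array of { level, text }; post: object with post_number (may be undefined).
function buildTocIds(headings, post) {
  return headings.map((h, index) => {
    // Same shape as the suffix line above: slug, post number, heading index.
    const suffix = `${slugifyStandIn(h.text)}-${post?.post_number}-${index}`;
    return `toc-h${h.level}-${suffix}`;
  });
}

const ids = buildTocIds(
  [
    { level: 2, text: "5 安装步骤" },
    { level: 2, text: "5 常见问题" },
    { level: 2, text: "5 安装步骤" }, // duplicate title
  ],
  { post_number: 1 }
);
console.log(ids); // [ 'toc-h2-5-1-0', 'toc-h2-5-1-1', 'toc-h2-5-1-2' ]
console.log(new Set(ids).size === ids.length); // true
```

Because the index differs for every heading within a post, the IDs stay distinct even when the slug part is identical or empty. If `post` is undefined, the template renders the text "undefined" in the middle segment, which is harmless for uniqueness but worth guarding against in real code.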
