Anchoring not working when using non-Latin characters in headings

I believe I have come across a potential issue while using the auto-generated table of contents feature in DiscoTOC:

When headings at various levels are written in a language other than English, such as Chinese, the data-d-toc IDs in the auto-generated table of contents seem to keep only the digits and English letters from each heading. Identical IDs are therefore easy to produce, which leads to incorrect links in the right-hand scrollbar.

In the image above, if two headings both carry the serial number 5, both receive the data-d-toc ID toc-h2-5. As a result, two distinct links point to the same section.

However, by modifying the serial numbers to 1.5 and 2.5, the data-d-toc IDs will differ (toc-h2-15 and toc-h2-25), effectively ensuring accurate and appropriate links.
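
To illustrate the collision, here is a minimal sketch. `asciiOnlySlug` and `tocId` are hypothetical stand-ins, not DiscoTOC's actual code; the sketch simply assumes the slug keeps only ASCII letters and digits, which matches the IDs described above. The Chinese heading texts are made-up examples.

```javascript
// Hypothetical stand-in for the slug step: keeps only ASCII letters and digits,
// and turns runs of whitespace or hyphens into a single hyphen.
function asciiOnlySlug(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // drop dots, CJK characters, punctuation, etc.
    .trim()
    .replace(/[\s-]+/g, "-");
}

// Hypothetical helper that builds an ID in the toc-h<level>-<slug> shape.
function tocId(level, text) {
  return `toc-h${level}-${asciiOnlySlug(text)}`;
}

console.log(tocId(2, "5 安装步骤"));   // toc-h2-5
console.log(tocId(2, "5 常见问题"));   // toc-h2-5  (same ID, different heading)
console.log(tocId(2, "1.5 安装步骤")); // toc-h2-15
console.log(tocId(2, "2.5 常见问题")); // toc-h2-25
```

Under this assumption, only the ASCII part of each heading contributes to the ID, so any two headings that share that part collide.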

In order to ensure accurate linking within the scrollbar, is it advisable to keep the headings in English?

Furthermore, for languages like Chinese, would the most viable solution be to add multi-level serial numbers (e.g., 3.5, 3.6.5) to the headings?


I have already mentioned this issue and forked a copy.


The fork works, although it is not perfect, and I am too lazy to change the code.

Obviously, a better solution would be to generate data-d-toc from a base64 encoding of the heading and to add a unique identifier, so that duplicate titles cannot collide.
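
A rough sketch of that idea, not code from the fork: `base64UrlEncode` and `headingId` are hypothetical names, the URL-safe base64 variant is my assumption (so the ID contains no `+`, `/`, or `=`), and the heading's position in the post is used as the unique identifier.

```javascript
// Encode arbitrary Unicode text as URL-safe base64 (no +, /, or = padding).
function base64UrlEncode(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // btoa expects one character per byte
  }
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Hypothetical ID builder: encoded heading text plus the heading's index.
function headingId(level, text, index) {
  return `toc-h${level}-${base64UrlEncode(text)}-${index}`;
}

const first = headingId(2, "5 安装步骤", 0);
const second = headingId(2, "5 常见问题", 1);
const repeated = headingId(2, "5 安装步骤", 2); // same title as `first`

console.log(first !== second);   // true
console.log(first !== repeated); // true, the index keeps duplicate titles apart
```

The trade-off is that the IDs are no longer human-readable, and long headings produce long IDs.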


I currently do not have the authority to make such changes to my company’s forum, but I do appreciate your response!

Also, has the official team considered supporting non-Latin languages in the auto-generated table of contents in future DiscoTOC component releases? @Lilly @awesomerobot

Once more, thank you, everyone!

Actually, they have taken non-Latin languages into account: the code calls slugify(h.textContent). I suspect that this slugify function follows the forum's slug generation method setting, and that problems arise when the setting is not ‘encode’, although I have not tested this.

When we used the official version of the theme component, our forum's slug generation method was set to ‘none’, and we ran into the same problem you describe. Could you try changing the setting to ‘encode’?
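
To make the hypothesis concrete, here is a small sketch of what an ‘encode’-style slug could look like. It assumes that ‘encode’ means percent-encoding the heading text; `encodedSlug` is a hypothetical function, and none of this has been checked against Discourse's implementation.

```javascript
// Hypothetical 'encode'-style slug: keep every character and percent-encode
// whatever is not URL-safe, instead of discarding non-ASCII text.
function encodedSlug(text) {
  const hyphenated = text.trim().toLowerCase().replace(/\s+/g, "-");
  return encodeURIComponent(hyphenated);
}

console.log(encodedSlug("安装")); // %E5%AE%89%E8%A3%85

// Two headings that share the serial number 5 stay distinct,
// because the Chinese characters survive in encoded form.
console.log(encodedSlug("5 安装步骤") !== encodedSlug("5 常见问题")); // true
```

If the slug step behaved like this, headings that differ only in their non-Latin text would no longer map to the same ID.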



I tried changing this setting, but the problem described above persists: the data-d-toc ID still picks up only digits and letters, and duplicate table of contents IDs still occur. Perhaps the crux lies elsewhere?

Update: I updated the code today. The suffix is now generated by index:

This fixes the problem where non-Latin characters or identical heading text generate the same anchor.

const suffix = `${slugify(h.textContent)}-${post?.post_number}-${index}`;
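
For anyone who wants to see the effect in isolation, here is a self-contained sketch. The `slugify` below is only a stand-in that keeps ASCII letters and digits (an assumption, not the component's real function), `buildSuffixes` is a hypothetical wrapper, and the post number is passed in directly instead of being read from `post?.post_number`.

```javascript
// Stand-in slugify (assumption): keeps only ASCII letters, digits, and hyphens.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .trim()
    .replace(/[\s-]+/g, "-");
}

// Build one suffix per heading: <slug>-<post number>-<index within the post>.
function buildSuffixes(headingTexts, postNumber) {
  return headingTexts.map(
    (text, index) => `${slugify(text)}-${postNumber}-${index}`
  );
}

const suffixes = buildSuffixes(["5 安装步骤", "5 常见问题", "附录"], 1);
console.log(suffixes); // ["5-1-0", "5-1-1", "-1-2"]
```

Even when the slug part is identical or empty, the index makes every suffix within a post unique, and the post number separates posts in the same topic.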