Anchors not working when headings contain non-Latin characters

I believe I have come across a potential issue while using the auto-generated table of contents feature in DiscoTOC:

When headings of various levels are written in a language other than English, such as Chinese, the data-d-toc IDs in the auto-generated table of contents seem to keep only the digits and English letters from each heading. Different headings can therefore easily end up with identical IDs, which leads to incorrect links in the table of contents on the right-hand side.

In the image above, both headings have the serial number 5, so both get the data-d-toc ID toc-h2-5. As a result, two distinct links point to the same section.

However, by modifying the serial numbers to 1.5 and 2.5, the data-d-toc IDs will differ (toc-h2-15 and toc-h2-25), effectively ensuring accurate and appropriate links.
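The collision described above can be reproduced with a minimal sketch. It assumes the slug step keeps only ASCII letters and digits, which matches the IDs reported here (1.5 becomes 15). The functions `asciiOnlySlug` and `tocId` and the sample headings are hypothetical stand-ins for illustration, not DiscoTOC's actual code.

```javascript
// Hypothetical stand-in for the slug step; NOT DiscoTOC's actual implementation.
// It drops every character that is not an ASCII letter or digit.
function asciiOnlySlug(text) {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Hypothetical helper that builds an ID in the toc-h2-... shape seen in this thread.
function tocId(level, text) {
  return `toc-h${level}-${asciiOnlySlug(text)}`;
}

// Made-up example headings: only the leading number survives the slug step.
console.log(tocId(2, "5 安装说明"));   // toc-h2-5
console.log(tocId(2, "5 常见问题"));   // toc-h2-5 (same ID, so the links collide)
console.log(tocId(2, "1.5 安装说明")); // toc-h2-15
console.log(tocId(2, "2.5 常见问题")); // toc-h2-25
```

Under this assumption, a heading with no digits or Latin letters at all would produce an empty slug, so distinct serial numbers only work around the problem rather than fix it.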

In order to ensure accurate linking within the scrollbar, is it advisable to keep the headings in English?

Furthermore, for languages like Chinese, would the most viable solution involve incorporating multi-level serial numbers (e.g., 3.5, 3.6.5, 4.2.5.6) to the headings?

Reference:

I found and reported this issue a while ago, and forked a copy of the component while I was at it:

https://meta.discourse.org/t/discotoc-automatic-table-of-contents/111143/399?u=lhc_fl

My fork works, but it is not perfect, mainly because I have been too lazy to polish the code.

A better solution would clearly be to generate data-d-toc with base64 and add a unique identifier, so that duplicate titles cannot produce the same ID.
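One way the base64 idea could look is sketched below. This is only an illustration under stated assumptions: `base64UrlSlug` and `uniqueTocId` are hypothetical names, the heading index is used as the unique identifier, and none of this is taken from DiscoTOC or the fork. It relies on the standard `TextEncoder` and `btoa` globals, since `btoa` alone cannot encode non-Latin text.

```javascript
// Hypothetical helper, not part of DiscoTOC: encodes the heading's UTF-8 bytes
// as URL-safe base64 so that non-Latin characters are preserved in the ID.
function base64UrlSlug(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  // Swap the characters that are awkward in IDs and URLs, and drop the padding.
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Appending the heading's index keeps IDs unique even for identical titles.
function uniqueTocId(level, text, index) {
  return `toc-h${level}-${base64UrlSlug(text)}-${index}`;
}

console.log(uniqueTocId(2, "5 安装说明", 0));
console.log(uniqueTocId(2, "5 常见问题", 1));
```

The trade-off is that the resulting IDs are long and unreadable, so the index alone may already be enough to guarantee uniqueness.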

I currently do not have the authority to make such changes to my company’s forum, but I do appreciate your response!

Furthermore, I’d like to ask whether the official team has considered adding support for non-Latin languages to the auto-generated table of contents in future DiscoTOC component releases? @Lilly @awesomerobot

Once more, thank you, everyone!

In truth, they have taken into consideration non-Latin languages, utilizing the slugify(h.textContent) method. I suspect that this slugify function was developed in accordance with the forum’s slug generation method. When it is not in ‘encode’ mode, issues tend to arise, although I have not personally tested this hypothesis.

Previously, when we used the official version of the theme component, our forum’s slug generation method was set to ‘none’, which gave rise to the same problem you describe. May I therefore suggest that you try changing the setting to ‘encode’?

Also, given how slowly the official team fixes components (an issue I reported on another component last year still has no response), I suggest you simply ask for permission to use my forked version.

I tried changing this setting, but the problem described earlier persists: the data-d-toc ID still picks up only digits and letters, and duplicate table-of-contents IDs still occur. I guess the crux lies elsewhere?

I’ll ask my manager. Thanks for the answer!

Update: I updated the code today. The suffix now includes the heading’s index:

const suffix = `${slugify(h.textContent)}-${post?.post_number}-${index}`;

This change fixes the problem where non-Latin characters or identical heading titles generate the same anchor.
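To illustrate why the index makes the suffix unique, here is a small self-contained sketch. The `asciiOnlySlug` function is a hypothetical stand-in for `slugify` (assumed to keep only ASCII letters and digits, as observed earlier in this thread), and `buildSuffixes`, the sample headings, and the post number are made up for the example.

```javascript
// Hypothetical stand-in for slugify; assumed to keep only ASCII letters and digits.
function asciiOnlySlug(text) {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Mirrors the shape of the suffix above: slug, post number, then heading index.
function buildSuffixes(headings, postNumber) {
  return headings.map(
    (text, index) => `${asciiOnlySlug(text)}-${postNumber}-${index}`
  );
}

// Made-up headings: two share the slug "5" and one has an empty slug.
const suffixes = buildSuffixes(["5 安装说明", "5 常见问题", "总结"], 1);
console.log(suffixes); // [ '5-1-0', '5-1-1', '-1-2' ]
```

Because the index differs for every heading in a post, the suffix stays unique even when the slug part is identical or empty.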