Forum topics are not indexed by Baidu

Hi. We have a Chinese version of our website, but unfortunately the forum topics are not indexed by Baidu. I noticed that this forum is indexed by Baidu without any problems. This question is for both the admins of this forum and community members who have launched a forum in Chinese: what have you done to make Baidu index topics?

How old is your forum? Indexing takes time, depending on the search engine. Have you blacklisted any specific crawlers through the blacklisted crawler user agents site setting?

4 Likes

The forum was launched in April, so I guess that’s not the reason. The blacklisted crawler settings are the same as in your screenshot. Also, robots.txt has no restrictions for the Baidu bot.
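
For anyone who wants to repeat the robots.txt check, here is a minimal sketch using Python's standard `urllib.robotparser`. The user agent token `Baiduspider`, the host `forum.example.cn`, and the sample rules are assumptions for illustration, not our real configuration.

```python
from urllib.robotparser import RobotFileParser


def baidu_allowed(robots_txt: str, url: str, agent: str = "Baiduspider") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical rules: everything is open except the admin area.
    sample = "User-agent: *\nDisallow: /admin/\n"
    print(baidu_allowed(sample, "https://forum.example.cn/t/some-topic/42"))
```

To test the live file instead, the same parser offers `set_url()` and `read()` to download robots.txt before calling `can_fetch()`.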

1 Like

What does Baidu itself say about this? As far as I know, there used to be a section (ziyuan.baidu) where you could get detailed information about the status of a site you had added to Baidu.

Discourse does not require additional tuning for indexing. If the crawler is not blacklisted, there should be no problems.

6 Likes

An update on this topic. We created an account on ziyuan.baidu.com, and nothing seems to be wrong there: the crawler fetches page content correctly. We also enabled logging on the server. Baidu makes dozens of requests a day to topic pages, and the server responds with 200.

One more interesting thing: we are not alone. I checked the Baidu index for your customers listed on the corresponding page. At least 4 of them have similar problems:
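
In case it helps others run the same check, below is a rough sketch of how such a log count could be done. It is not our actual logging setup. It assumes the access log is in the common "combined" format, that the crawler identifies itself with `Baiduspider` in the user agent, and that topic URLs start with `/t/` as they do in Discourse.

```python
import re
from collections import Counter

# Assumed "combined" log format: ... "GET /path HTTP/1.1" 200 1234 "referer" "user agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def baidu_topic_hits(lines):
    """Count Baiduspider requests to topic pages, grouped by HTTP status."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that do not look like combined-format entries
        if "baiduspider" not in match["agent"].lower():
            continue  # only interested in the Baidu crawler
        if not match["path"].startswith("/t/"):
            continue  # only topic pages
        counts[match["status"]] += 1
    return counts


if __name__ == "__main__":
    # "access.log" is a placeholder path.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        print(baidu_topic_hits(handle))
```

A result dominated by `200` would match what we see: the crawler reaches topic pages, so the problem is on the indexing side rather than the fetching side.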

Please advise if you have any ideas.

1 Like

Perhaps Baidu prefers sites hosted in China? Baidu is a regional search engine. It does crawl other resources, of course, but its main task is high-quality local search, so it is hard to say. You may need to do a little research on how the region affects this. Maybe Baidu has some other special requirements?

If Baidu receives the page and, as you write, everything is fine, then internal ranking mechanisms may come into play, and those may not depend on the forum software.

Crawl speed also depends on many factors: for example, the relevance of the information, how often the site is updated, and how quickly and how often backlinks appear on other sites.

2 Likes

Why wouldn’t you take this up with Baidu? Maybe because only your site contents (and the required registration phone number) are Chinese. If so then you’ve got the result I would expect.

This is just speculation without your site’s URL but we do know that Baidu prioritises the following, among other things:

  • simplified Chinese over other languages
  • Chinese hosted sites over hosting outside of China
  • Chinese TLD i.e. .cn sites

That’s why it is no help to look at Discourse customer sites that are English language, hosted outside China and without a Chinese TLD.

4 Likes

@Stranik @Remah
Thank you for your replies.
The URL of the website is not a secret – https://forum.cuba-platform.cn/.
It is on the .cn TLD, the server is in Hong Kong, and only Chinese is used.

2 Likes

I assume that your problem has gone away now because I can search for your topics and find them. They are just a bit further down the search results than I expected: those I searched for were on page 2 of the search results even with the exact text of the topic title.

So most likely your site doesn’t yet have sufficient reputation with Baidu. Or does Baidu have a further requirement that your site has not fulfilled?

3 Likes

Which queries did you enter that returned our forum? I check it this way: I enter the query site:forum.cuba-platform.cn in Baidu search. Right now I see only 5 links, and none of them are links to topics.

Meanwhile, we followed Baidu's recommendations and implemented a tool that pushes new URLs to Baidu using cURL. We will get back with the results in a while.
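
For reference, a push of this kind can be sketched as below. This is not our actual tool. The endpoint `http://data.zz.baidu.com/urls` and its `site` and `token` parameters are written from memory of Baidu's webmaster documentation and should be treated as assumptions to verify in your own ziyuan.baidu.com account; the site, token, and topic URLs are placeholders.

```python
import urllib.parse
import urllib.request

# Assumed push endpoint; confirm it in your own Baidu webmaster account.
PUSH_ENDPOINT = "http://data.zz.baidu.com/urls"


def build_push_request(site, token, urls):
    """Build a POST request that submits one URL per line as plain text."""
    query = urllib.parse.urlencode({"site": site, "token": token})
    body = "\n".join(urls).encode("utf-8")
    return urllib.request.Request(
        f"{PUSH_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def push_urls(site, token, urls, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_push_request(site, token, urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder values; nothing is sent, the request is only built and shown.
    demo = build_push_request(
        "forum.example.cn",
        "YOUR_TOKEN",
        ["https://forum.example.cn/t/some-topic/42"],
    )
    print(demo.get_method(), demo.full_url)
```

Building the request separately from sending it makes it easy to inspect what would be submitted before any real call to Baidu is made.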

You’re right that topics aren’t being indexed. Baidu finds topic titles in the topic list views but not the topic view.

I don’t know why Baidu would index a topic list but not a topic. It means the crawler is working on your site but not crawling topics, so I’d check your site configuration first.

1 Like

We have the same problem.

1 Like