Discovering older topics: drinking a lake through a straw

TL;DR: Are there any plug-ins or settings that add a selection of random topics to the bottom of each topic (not just the most recent ones), even when not logged in, so that a crawler can discover all the older topics?

Having now merged an older forum into the current forum, I have the curious problem of ‘surfaceability’/discoverability.

In the category and latest views, you can see only the most recent topics, like the surface of a lake. The view contains only a limited number of topics, and infinite scrolling makes it difficult to ‘go deep’.

Having multiple categories widens the ‘straw’, but even if you have, say, 50 categories and look 100 topics deep in each, that is still only 50 × 100 = 5,000 topics.

Only 1%-2% of the topics are accessible; the rest are hidden below the surface.

I thought sitemap.xml could help, but in the default configuration it exposes only a handful of topics.

Huge chunks of content are effectively blocked from crawlers.

So I was wondering: are there any plug-ins or settings that add a selection of random topics to the bottom of a topic (not just the most recent ones), even when not logged in, so that a crawler can discover all the older topics?


Crawlers should use the sitemap, not scrape randomly.


Have you considered bumping older topics? Seems like an easier way to resurface old topics without creating plugins or something.

In the category settings you can find the relevant auto-bump options.

This works well here on meta, as it brings back old topics which can then be ‘updated’ with the latest info or closed. It’s a good way of curating your content too. Here is an example from 2020 which has been bumped
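For reference, the same option can also be set from a Rails console. A minimal sketch, assuming num_auto_bump_daily is the per-category column behind that UI option (worth double-checking on your Discourse version):

```ruby
# Minimal sketch, run in a Rails console (./launcher enter app, then rails c).
# Assumption: num_auto_bump_daily is the column behind the category's
# auto-bump option; verify on your Discourse version before relying on it.
category = Category.find_by(slug: "support") # hypothetical category slug
category.num_auto_bump_daily = 2             # bump up to 2 old topics per day
category.save!
```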


It’s a matter of taste how good system bumping is, or whether it is done for bots rather than for humans.

Here, bumped topics irritate me big time. I don’t know why I should see outdated topics just because there is some non-urgent need to close them.


Wait… what? Why do you think that?

I looked at the generated sitemap, and sitemap_1.xml had only half a month of posts in it; sitemap_recent.xml has even fewer.

And did you check sitemap_2.xml etcetera?

Hence the word “recent” in the name.

That’s the issue: there are no further sitemap pages other than sitemap_1.xml, and that one has fewer than the 10k max URLs specified in the settings.
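A quick way to see it from a crawler’s point of view is to walk the sitemap index and count the URLs it actually exposes. A rough sketch (the forum URL is a placeholder, and it assumes the standard layout of a sitemap.xml index pointing at sitemap_1.xml, sitemap_2.xml, and so on):

```ruby
require "net/http"
require "rexml/document"

BASE = "https://forum.example.com" # placeholder forum URL

# Fetch a sitemap and return the text of every <loc> element.
# For illustration only: no redirect or error handling.
def locs(path)
  xml = Net::HTTP.get(URI("#{BASE}#{path}"))
  REXML::Document.new(xml).get_elements("//loc").map(&:text)
end

pages = locs("/sitemap.xml") # the index lists the sub-sitemap pages
total = pages.sum { |url| locs(URI(url).path).size }
puts "#{pages.size} sitemap pages, #{total} URLs exposed to crawlers"
```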

I know; I mention it in case people think I’m referring to that file.

Maybe I’ll see if I can re-trigger a sitemap generation somehow.
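If it helps, the job can also be kicked off by hand from a Rails console inside the container. A sketch, assuming the underscored symbol maps to the Jobs::RegenerateSitemaps job visible in Sidekiq, per the usual Discourse convention:

```ruby
# From ./launcher enter app, then `rails c`.
# Assumption: :regenerate_sitemaps maps to Jobs::RegenerateSitemaps.
Jobs.enqueue(:regenerate_sitemaps)
```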

And are all those topics that are missing public? Can you post or PM the forum this is about?

The vast majority of the topics are public.

I see that sitemap_1 has the max 10k URLs in it (generated just under an hour ago), but no other sitemap pages. I’ll wait 15 minutes, when the next scheduled generation should take place, to see if the others appear.

I see in Sidekiq that the regeneration job did run, but I don’t know why it is producing just sitemap_1.xml and not other pages.

Jobs::RegenerateSitemaps | last run: 48 minutes ago | result: OK | duration: 257ms | next run: in 11 minutes | queue: default

I can PM you the forum link if the regeneration job scheduled in 11 minutes doesn’t fix it. I also bumped the setting to 50k URLs, so even if it doesn’t generate further pages, that single page will at least have five times the number of URLs.

EDIT: Just an update. The job did run and created the new larger 50k sitemap, but again just one page.

EDIT2: After letting it settle a bit, I’m happy to report the additional sitemap pages have been generated in the most recent job. I’m not sure why they didn’t generate in the earlier jobs.

Yes, but bumping is unstructured and again is just a thin straw: even if you bump 100 topics per day in 50 categories (which would also render the forum useless, as it would bump out all recent content), that still only gives you the same 2% of topics.

I guess I could try to implement something like the ‘suggested topics’ at the bottom of each thread but with a different algorithm to tilt towards exploration.
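As a rough illustration of the idea, something along these lines could feed such an exploration block. Model names follow Discourse conventions, but the one-year cutoff and the limit are arbitrary assumptions, and this is a sketch rather than a working plugin:

```ruby
# Sketch: pick a handful of random, older, public topics to show
# alongside the recency-based suggested topics.
random_old_topics =
  Topic
    .visible                              # exclude hidden/unlisted topics
    .where(archetype: Archetype.default)  # regular topics only, no PMs
    .where("created_at < ?", 1.year.ago)  # tilt toward older content
    .order("RANDOM()")                    # explore rather than favour recency
    .limit(5)
```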

You are overlooking one thing. Bumping helps only if

  • a useful bot is scraping at that moment
  • a useful bot will follow such links

In most cases that will not happen. All you get is annoyed users, unless the bumping happens only in the non-JS version.

But one question: do you want all this just because your sitemaps may be broken, or do you not trust sitemaps?

The AI-generated related topics are, I find, the best way to discover old topics.


I think sitemaps solve the crawler issue. I started a different topic to discuss how to increase browsability and discoverability: Easy ways to navigate and browse large categories? - #2 by Jagster

There’s some crossover with searchability, but it is slightly different.

The impact of the fixed sitemaps was immediate:

EDIT: just to say that the AI captioning is scarily good. I didn’t expect it to interpret the chart so well!
