We’re looking to identify which specific topics are being indexed/crawled most frequently by AI crawlers to ensure our most “cited” content isn’t feeding LLM hallucinations. Is there a way with Data Explorer to attribute crawler hits to individual topic IDs?
I could be wrong, but I don’t think Discourse tracks web crawler traffic at a category or topic level. (Perhaps there is some query math that could be applied to derive the figures?)
Most AI crawlers do not identify themselves via the user agent. They generally claim to be outdated Chrome versions. The only way to identify them is that they visit a single page and don’t stay on the site to visit a second one. They often live in data centers, but I have also seen a lot of single-page traffic from mobile and residential IPs, which I assume comes from compromised devices.
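If you have web server access logs to work from, the single-page heuristic could be sketched roughly like this. This is only an illustration, not anything Discourse provides: the `(ip, user_agent, path)` tuple format, the function name `suspected_crawler_ips`, and the `OUTDATED_CHROME_BELOW` cutoff are all my own assumptions, and you would need to tune the cutoff yourself.

```python
import re
from collections import defaultdict

# Hypothetical cutoff: Chrome major versions below this are treated as "outdated".
OUTDATED_CHROME_BELOW = 100

CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def suspected_crawler_ips(requests):
    """Flag IPs that requested exactly one distinct page with an old Chrome UA.

    `requests` is an iterable of (ip, user_agent, path) tuples, an assumed
    format you would have to produce from your own access logs.
    """
    pages_by_ip = defaultdict(set)
    outdated_ips = set()
    for ip, user_agent, path in requests:
        pages_by_ip[ip].add(path)
        match = CHROME_RE.search(user_agent)
        if match and int(match.group(1)) < OUTDATED_CHROME_BELOW:
            outdated_ips.add(ip)
    # Keep only IPs that saw a single distinct page and sent an outdated Chrome UA.
    return {
        ip
        for ip, pages in pages_by_ip.items()
        if len(pages) == 1 and ip in outdated_ips
    }
```

Expect false positives: a real visitor who lands on one topic from a search result and leaves looks the same to this check.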
Most AI crawlers do declare themselves in the user agent. The ones you are referring to are SEO bots/crawlers and other malicious/abusive/unwanted non-human actors.
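For the crawlers that do declare themselves, a rough way to attribute hits to topic IDs from access logs might look like the sketch below. Everything here is an assumption on my part: the `(user_agent, path)` tuple format, the function name `topic_hits_by_ai_crawlers`, the short and non-exhaustive token list, and that topic URLs have the form `/t/<slug>/<id>` (slugless URLs such as `/t/123` are deliberately skipped to avoid confusing the post number with the topic ID).

```python
import re
from collections import Counter

# Illustrative, non-exhaustive list of user agent tokens used by declared AI crawlers.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")

# Assumes topic URLs look like /t/<slug>/<id>[/<post_number>].
# The slug must contain at least one non-digit character.
TOPIC_PATH_RE = re.compile(r"^/t/[^/]*[^/\d][^/]*/(\d+)(?:[/?]|$)")


def topic_hits_by_ai_crawlers(requests):
    """Count requests per topic ID made by declared AI crawlers.

    `requests` is an iterable of (user_agent, path) tuples, an assumed
    format you would have to produce from your own access logs.
    """
    hits = Counter()
    for user_agent, path in requests:
        ua = user_agent.lower()
        if not any(token.lower() in ua for token in AI_CRAWLER_TOKENS):
            continue  # not a declared AI crawler
        match = TOPIC_PATH_RE.match(path)
        if match:
            hits[int(match.group(1))] += 1
    return hits
```

Calling `topic_hits_by_ai_crawlers(rows).most_common(20)` would then list the most-crawled topic IDs, which you could join against topic titles in Data Explorer.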