For those with admin access, the Web Crawler User Agents report can be checked here.
For example (change the URL for your site as needed):
https://swi-prolog.discourse.group/admin/reports/web_crawlers
Our site shows a sudden increase in Mastodon agents. I suspect these agents come from Mastodon instances. As I do not use Mastodon, I now need to investigate whether this is a potential issue for our site or just something to be aware of.
Because the report as shown on the web page truncates some of the needed info, I downloaded the report.
web-crawlers-251023-084425-10.zip (4.3 KB)
At the end of the file, notice lines such as:
http.rb/5.1.1 (Mastodon/4.2.20; +https://acc4e.com/),1
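To pull these rows out of the downloaded report without scanning it by eye, a short Python script can help. This is a minimal sketch, assuming the zip contains a CSV whose last column is the pageview count (as in the sample line above) and that Mastodon agents follow the `(Mastodon/<version>; +<instance URL>)` pattern seen in that line. The names `mastodon_agents`, `summarize`, `MastodonAgent` and the regex are my own, not anything provided by Discourse.

```python
import csv
import re
from typing import Iterable, NamedTuple

# Assumed pattern, based on the sample agent
# "http.rb/5.1.1 (Mastodon/4.2.20; +https://acc4e.com/)".
MASTODON_RE = re.compile(r"Mastodon/(?P<version>[^;\s]+);\s*\+(?P<instance>[^)\s]+)\)")


class MastodonAgent(NamedTuple):
    instance: str   # URL of the Mastodon server that made the request
    version: str    # Mastodon version reported in the user agent
    pageviews: int  # count from the last column of the report


def mastodon_agents(lines: Iterable[str]) -> list[MastodonAgent]:
    """Return the Mastodon rows from report lines of the form agent,pageviews."""
    found = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Rejoin everything but the last field, in case an unquoted agent contains commas.
        agent, count = ",".join(row[:-1]), row[-1].strip()
        match = MASTODON_RE.search(agent)
        if match is None or not count.isdigit():
            continue  # not a Mastodon agent, or a header row
        found.append(MastodonAgent(match["instance"], match["version"], int(count)))
    return found


def summarize(path: str) -> None:
    """Print each Mastodon agent (sorted by instance) and a pageview total."""
    with open(path, newline="", encoding="utf-8") as f:
        agents = mastodon_agents(f)
    for a in sorted(agents):
        print(f"{a.instance}\t{a.version}\t{a.pageviews}")
    total = sum(a.pageviews for a in agents)
    print(f"{len(agents)} Mastodon agents, {total} pageviews in total")
```

After unzipping, calling `summarize("web-crawlers.csv")` (a hypothetical file name; use whatever the zip actually contains) lists one line per Mastodon instance. The `csv` module is used rather than a plain `split(",")` because browser agents such as `Mozilla/5.0 (... (KHTML, like Gecko) ...)` contain commas and are normally quoted in CSV output.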
Can anyone shed some more light on the following?
- Are these agents related to the Mastodon social network sites?
- Will more of them appear in the future because of the way Mastodon works? In other words, do they appear not by design but as a side effect of how Mastodon is set up or used?
- Can (and should) they be blocked as crawlers if they provide no value to a Discourse site?
This is not a pressing issue at the moment, as each of the Mastodon agents shows only 1 pageview, while the agent at the top of the list, a Mozilla/5.0 agent, shows 37,279.
I asked the Discourse AI bot about these:
https://ask.discourse.com/t/understanding-mastodon-agents-as-web-crawlers/16732
Let me know if the link works for others.
After some more research and discussion with other admins on our site, we welcome these agents, as they are used to generate link previews. (ref)
An interesting side thought worth sharing:
Until researching this, I thought of a web crawler only as something that indexes all public pages of a site and often revisits the site on a regular schedule. Such a crawler would therefore appear regularly in the list of web crawler agents that visited a site.
As noted in this blog, such an agent is
a “fetcher”: a specialized type of web crawler that retrieves content on behalf of the Mastodon platform.
So these Mastodon agents may show up only once in the Web Crawler User Agents report.
It would therefore be nice to have a separate report that shows just the fetcher agents, since these hit very specific URLs on the site, and it would be useful to know what others find of value on the site.