Web Crawler User Agents - Mastodon agents increasing in number

For those with admin access who want to check the Web Crawler User Agents report:

For example, here is the actual link; change the URL for your site as needed.
https://swi-prolog.discourse.group/admin/reports/web_crawlers

Our site shows a sudden increase in Mastodon agents. I suspect these agents come from Mastodon sites. As I do not use Mastodon, I now have to investigate whether this is a potential issue for our site or just something to be aware of.

Since the report as shown on the web page cuts off some of the needed info, I downloaded the report.

web-crawlers-251023-084425-10.zip (4.3 KB)

At the end of the file, notice lines like:

http.rb/5.1.1 (Mastodon/4.2.20; +https://acc4e.com/),1
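
For anyone wanting to pull these rows out of the downloaded report, here is a minimal sketch. It assumes the extracted CSV has two columns (user agent, pageview count) as in the sample line above, and that Mastodon agents follow the pattern `http.rb/<version> (Mastodon/<version>; +<instance URL>)` seen in that line. The function names are hypothetical, not part of Discourse.

```python
import csv
import io
import re

# Assumed pattern, inferred from the single sample line in the report:
#   http.rb/<client version> (Mastodon/<mastodon version>; +<instance URL>)
MASTODON_UA = re.compile(
    r"Mastodon/(?P<version>[^;)\s]+);\s*\+(?P<instance>https?://[^)\s]+)"
)


def parse_mastodon_agent(user_agent):
    """Return (mastodon_version, instance_url), or None if not a Mastodon agent."""
    match = MASTODON_UA.search(user_agent)
    if match is None:
        return None
    return match.group("version"), match.group("instance")


def mastodon_rows(csv_text):
    """Yield (user_agent, pageviews, version, instance) for Mastodon rows.

    Assumes each row is: user agent, pageview count. Rows whose second
    column is not a plain integer (e.g. a header row) are skipped.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        parsed = parse_mastodon_agent(row[0])
        if parsed is not None:
            yield row[0], int(row[1]), parsed[0], parsed[1]


sample = "http.rb/5.1.1 (Mastodon/4.2.20; +https://acc4e.com/),1\n"
for agent, views, version, instance in mastodon_rows(sample):
    print(version, instance, views)  # prints: 4.2.20 https://acc4e.com/ 1
```

Matching on `Mastodon/` plus the `+URL` part, rather than on `http.rb`, keeps the sketch from depending on the HTTP client version.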

Can anyone shed some more light on the following?

  • Are these agents related to the Mastodon social network sites?
  • Will there be more appearing in the future because of the way Mastodon works? In other words, are these being created not on purpose but as a side effect of the way Mastodon is set up or used?
  • Should/can they be refused as a crawler if they are not of value to a Discourse site?

This is not a pressing issue at the moment: each Mastodon agent shows only 1 pageview, while the Mozilla/5.0 agent at the top of the list shows 37,279.


Asked the Discourse AI bot about these.

https://ask.discourse.com/t/understanding-mastodon-agents-as-web-crawlers/16732

Let me know if the link works for others.


After some more research and talking with other admins on our site, we welcome these agents as they are used to generate link previews. (ref)


Interesting side thought worth sharing.

Until researching this, I thought of a web crawler only as something that indexes all public pages of a site and revisits the site on a regular schedule. As such, the crawler would regularly appear in the list of web crawler agents that visited a site.

As noted in this blog post, these agents act as

a “fetcher” - a specialized type of web crawler that retrieves content on behalf of the Mastodon platform.

So each of these Mastodon agents may appear only once in the Web Crawler User Agents report.

Thus, it would be nice to have a separate report showing only the fetcher agents. They hit very specific URLs on the site, so such a report would show what others find of value there.


Cool, that likely means that something in your community got reposted by users on Mastodon. Because Mastodon is federated, the link preview crawlers will have different user agents, both because the instances run different Mastodon versions and because Mastodon appears to include the URL of the instance in the user agent.

Agreed. It could also be neat to group the user agents, so that you can see Mastodon link preview totals, Facebook onebox totals, Discourse onebox totals (from other communities) and so on.
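
Until something like that exists in Discourse, a rough version of this grouping can be done on the downloaded CSV. The sketch below assumes the two-column layout (user agent, pageview count) from the report discussed above. The family labels and match substrings are illustrative assumptions (only `Mastodon/` is taken from the report itself); check them against the user agents in your own report.

```python
import csv
import io
from collections import Counter

# Hypothetical family rules: (label, substring to look for in the user agent).
# The substrings are assumptions; the first matching rule wins.
FAMILY_RULES = [
    ("Mastodon link preview", "Mastodon/"),
    ("Facebook", "facebookexternalhit"),
    ("Discourse onebox", "Discourse Forum Onebox"),
]


def family_of(user_agent):
    """Map a user agent string to a family label, or 'Other' if none match."""
    for label, needle in FAMILY_RULES:
        if needle in user_agent:
            return label
    return "Other"


def family_totals(csv_text):
    """Sum pageviews per family; rows are assumed to be: user agent, count."""
    totals = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue  # skip header or malformed rows
        totals[family_of(row[0])] += int(row[1])
    return totals


# Illustrative input: example.social is a made-up instance, and the
# Mozilla/5.0 row is shortened from the real (longer) user agent string.
report = (
    "http.rb/5.1.1 (Mastodon/4.2.20; +https://acc4e.com/),1\n"
    "http.rb/5.0.6 (Mastodon/4.1.0; +https://example.social/),1\n"
    "Mozilla/5.0,37279\n"
)
for family, views in family_totals(report).most_common():
    print(family, views)
# prints:
# Other 37279
# Mastodon link preview 2
```

Substring rules are crude but easy to extend; a real report feature would more likely key on a maintained list of known crawlers.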
