Dark Visitors

:information_source: Summary Integrates the Dark Visitors service with Discourse to keep track of undesired crawlers and scrapers visiting your forum.
:hammer_and_wrench: Repository Link https://github.com/magicball-network/discourse-darkvisitors
:open_book: Install Guide How to install plugins in Discourse

Features

Dark Visitors is a service that keeps track of agents (crawlers, scrapers, and other kinds of bots) visiting your websites. Their main focus is on analyzing AI agents.

It offers two services:

  • robots.txt generation and monitoring
  • agent analytics, both server side and client side

The robots.txt service is free. The analytics service provides a free tier. I suggest you visit their website for more information.

This Discourse plugin connects to these services; each of them is optional and configurable to a degree.

robots.txt generation

Discourse already provides an option to configure the robots.txt, and this plugin extends it. When enabled, the plugin retrieves a list of agents in different categories (currently only AI categories are supported) and adds any that are missing from the already configured agents. The list is updated daily, so when a new AI scraper is recognized it is automatically added to your robots.txt.

This feature only works if Discourse manages the robots.txt and you have not manually overridden it. The plugin does not change the Blocked crawler user agents setting; it only augments the robots.txt with the new missing agents. So you are still in full control of managing this list.

When you visit your site’s robots.txt you will see a leading comment with the time of the last update and the number of agents returned by Dark Visitors. Agents that were not already configured are added to the end of the list, between Googlebot and the sitemap directive (if configured).
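For illustration only, an augmented robots.txt could end up looking roughly like this. GPTBot is just an example of an AI agent Dark Visitors might return; the exact rules and paths depend on your Discourse configuration and the darkvisitors robots txt settings:

```text
# Augmented by Dark Visitors on 2025-05-07T12:46:00+00:00 with 28 agents
User-agent: Googlebot
Disallow: /admin/

# agents added by the plugin
User-agent: GPTBot
Disallow: /

Sitemap: https://yoursite/sitemap.xml
```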

Agent analytics

Server side and client side analytics can be enabled individually. Each can be enabled for all visitors, or only for unauthenticated visitors.

The server side analytics reports tracked visits to Dark Visitors. It will send the request path, the remote address of the visitor, the User-Agent header, and a few more browser headers.

There are additional settings that control which requests are reported; see the settings below. By default, only requests which Discourse marks to track as views will be reported. The following requests to Discourse will never be reported:

  • Requests to the Admin section
  • Background and API requests
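As a rough sketch of what the server side report amounts to (the plugin itself does this in Ruby; the endpoint URL and JSON field names here are my assumptions based on the Dark Visitors public API, not taken from the plugin source):

```python
# Sketch of a server-side visit report to Dark Visitors.
# Assumed: POST /visits endpoint with a bearer token and a JSON body
# carrying the request path and a few browser headers.
import json
from urllib import request

API_URL = "https://api.darkvisitors.com/visits"  # assumed endpoint

def build_visit_payload(path, headers):
    """Collect the request details to report: path plus a few headers."""
    return {
        "request_path": path,
        "request_method": "GET",
        # Only forward a handful of browser headers, e.g. User-Agent;
        # cookies and other sensitive headers are never included.
        "request_headers": {
            k: v for k, v in headers.items()
            if k in ("User-Agent", "Accept", "Accept-Language", "Referer")
        },
    }

def report_visit(access_token, payload):
    """Send the report; in practice this is fire-and-forget."""
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return request.urlopen(req)
```

The point of the sketch is the shape of the data: the visitor's path and user agent go out, nothing more.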

The client side analytics are handled by adding JavaScript to your page, which calls home to Dark Visitors under certain conditions:

  • The browser appears to be automated, or an AI browser
  • The user came from an AI chat service

All tracked requests count towards the event quota of your payment plan.

Configuration

You need to sign up with Dark Visitors in order to use this plugin. The free tier gives you 1,000,000 events per month. When that limit is reached you will not see any newer events in their analytics, but you can still send new information and keep using the robots.txt service.

After you sign up you must create a project (i.e. a site to track). This will provide you with an access token which is needed for the robots.txt and server side analytics functionality.

When you enable the robots.txt functionality it takes a short while before it is updated. Visit https://yoursite/robots.txt to see if it is working. It should have a comment at the top:

# Augmented by Dark Visitors on 2025-05-07T12:46:00+00:00 with 28 agents
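If you want to check for that comment programmatically (e.g. from a monitoring script), a small sketch that parses it; the only assumption is the comment format shown above:

```python
# Parse the leading comment the plugin writes into robots.txt, e.g.
# "# Augmented by Dark Visitors on 2025-05-07T12:46:00+00:00 with 28 agents"
import re
from datetime import datetime

COMMENT_RE = re.compile(
    r"^# Augmented by Dark Visitors on (?P<when>\S+) with (?P<count>\d+) agents"
)

def parse_augmentation_comment(robots_txt):
    """Return (last update time, agent count), or None if not augmented yet."""
    match = COMMENT_RE.match(robots_txt)
    if not match:
        return None
    return datetime.fromisoformat(match.group("when")), int(match.group("count"))
```

A `None` result simply means the daily update has not run yet, or the feature is disabled.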

When you enable the server side analytics you can test whether it works by requesting a test visit from the Dark Visitors project settings. It can take a few seconds. You should see the result on the Realtime page on Dark Visitors.

Settings

| Name | Description |
| --- | --- |
| darkvisitors enabled | Global flag to enable the whole plugin. |
| darkvisitors access token | The secret access token needed for the robots.txt and server side analytics to communicate with Dark Visitors. You will find this in your Dark Visitors project settings. |
| darkvisitors robots txt enabled | When enabled, the Discourse robots.txt will be augmented with additional agents. |
| darkvisitors robots txt agents | The kinds of agents to add to the robots.txt. |
| darkvisitors robots txt path | The path to deny the agents access to. It is probably best to leave this at / so access to the whole site is rejected. |
| darkvisitors server analytics | Enables server side analytics. I recommend enabling it only for anonymous users. |
| darkvisitors server analytics include | Additional requests to track. You can also track requests to uploaded files, or even 404 Not Found requests. |
| darkvisitors server analytics ignore | Sub-strings in the user agents to ignore (case sensitive). If you use uptime monitoring, I strongly suggest including its identifying user agent in this list. |
| darkvisitors client analytics | Enables client side analytics. This also gives you insights into normal users visiting your forum while coming from an AI chat service. |
| darkvisitors client analytics project key | The (public) project key required for client side analytics. You can find this in your Dark Visitors project settings in the JavaScript Tag section; it is the code after project_key=. |