Dark Visitors

| | |
| --- | --- |
| :information_source: Summary | Integrates the Dark Visitors service with Discourse to keep track of undesired crawlers and scrapers visiting your forum. |
| :hammer_and_wrench: Repository Link | https://github.com/magicball-network/discourse-darkvisitors |
| :open_book: Install Guide | How to install plugins in Discourse |

Features

Dark Visitors is a service that keeps track of agents (crawlers, scrapers, and other kinds of bots) visiting your websites. Its main focus is the analysis of AI agents.

It offers two services:

  • robots.txt generation and monitoring
  • agent analytics, both server side and client side

The robots.txt service is free of charge. The analytics service provides a free tier. I suggest you visit their website for more information.

This Discourse plugin connects to all of these services; each is optional and configurable to a degree.

robots.txt generation

Discourse already provides an option to configure the robots.txt. This plugin extends it. When enabled, the plugin retrieves a list of agents in different categories (currently only AI categories are supported) and adds any that are missing from the already configured agents. The list is updated daily, so when a new AI scraper is recognized it will be added to your robots.txt.

This feature only works if Discourse manages the robots.txt and you have not manually overridden it. The plugin does not change the Blocked crawler user agents setting; it only augments the robots.txt with the missing agents. So you are still in full control of managing this list.

When you visit your site’s robots.txt you will see a leading comment with the time of the last update and the number of agents returned by Dark Visitors. The agents that were not already configured are added to the end of the list; they should appear between Googlebot and the sitemap directive (if configured), as in the sketch below.
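
For illustration, an augmented robots.txt could look roughly like this. GPTBot and CCBot merely stand in for whatever agents Dark Visitors actually returns, and the surrounding entries depend on your own Discourse configuration:

```text
# Augmented by Dark Visitors on 2025-05-07T12:46:00+00:00 with 28 agents
User-agent: Googlebot
Disallow: /admin/

# Agents added by the plugin (example names)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://yoursite/sitemap.xml
```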

Agent analytics

Server side and client side analytics can each be enabled individually, either for all visitors or only for unauthenticated visitors.

The server side analytics reports tracked visits to Dark Visitors. It sends the request path, the remote address of the visitor, the User-Agent header, and a few more browser headers; a sketch of such a report follows the list below.

There are additional settings controlling which requests are reported; see the settings below. By default, only requests which Discourse marks as tracked views are reported. The following requests to Discourse will never be reported:

  • Requests to the Admin section
  • Background and API requests
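
As a minimal sketch of what one such report contains: the endpoint and field names below follow Dark Visitors’ public visits API as I understand it, so treat them as assumptions and verify against their current API documentation. The plugin performs the equivalent of this internally:

```python
# Minimal sketch of a server side analytics report. The endpoint and
# payload fields are assumptions based on Dark Visitors' public API
# documentation; the plugin does the equivalent of this for you.
import requests  # third-party HTTP client: pip install requests

ACCESS_TOKEN = "your-secret-access-token"  # from your project settings

def report_visit(path: str, headers: dict) -> None:
    """Report one tracked request to Dark Visitors."""
    requests.post(
        "https://api.darkvisitors.com/visits",
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Content-Type": "application/json",
        },
        json={
            "request_path": path,
            "request_method": "GET",
            # Only a few headers are forwarded, e.g. the User-Agent.
            "request_headers": {
                k: v for k, v in headers.items()
                if k in ("User-Agent", "Referer", "Accept-Language")
            },
        },
        timeout=5,
    )

# Example: report an anonymous page view by a suspected AI crawler.
report_visit("/t/example-topic/123", {"User-Agent": "GPTBot/1.1"})
```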

The client side analytics are handled by adding JavaScript to your page, which calls home to Dark Visitors under certain conditions (an example of the tag’s shape follows the list):

  • The browser appears to be automated, or an AI browser
  • The user came from an AI chat service
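
For reference, the injected tag has roughly the following shape. The script path here is a placeholder, not the real URL; only the project_key parameter is confirmed by the setting description below, and the plugin adds the tag for you once the project key is configured:

```html
<!-- Hypothetical shape of the Dark Visitors JavaScript tag; the real
     one is shown in your project settings. You never add this yourself:
     the plugin injects it based on the configured project key. -->
<script async src="https://darkvisitors.com/<script-path>?project_key=YOUR_PROJECT_KEY"></script>
```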

All tracked requests count towards the events quota that affects your payment plan.

Configuration

You need to sign up with Dark Visitors in order to use this plugin. The free tier gives you 1,000,000 events per month. When that limit is reached you will not see any newer events in their analytics, but you can still send new information and keep using the robots.txt service.

After you sign up you must create a project (i.e. a site to track). This will provide you with an access token which is needed for the robots.txt and server side analytics functionality.

When you enable the robots.txt functionality it takes a short while before the file is updated. Visit https://yoursite/robots.txt to see if it is working. It should have a comment at the top:

# Augmented by Dark Visitors on 2025-05-07T12:46:00+00:00 with 28 agents

When you enable the server side analytics you can test whether it works by requesting a test visit from the Dark Visitors project settings. It can take a few seconds. You should see the result on the Realtime page on Dark Visitors.

Settings

| Name | Description |
| --- | --- |
| darkvisitors enabled | Global flag to enable the whole plugin |
| darkvisitors access token | The secret access token needed by the robots.txt and server side analytics features to communicate with Dark Visitors. You will find this in your Dark Visitors project under settings. |
| darkvisitors robots txt enabled | When enabled, the Discourse robots.txt will be augmented with additional agents |
| darkvisitors robots txt agents | The kinds of agents to add to the robots.txt |
| darkvisitors robots txt path | The path to deny the agents access to. It is probably best to leave this at / so access to the whole site is rejected. |
| darkvisitors server analytics | Enables server side analytics. I recommend enabling it only for anonymous users. |
| darkvisitors server analytics include | Additional requests to track. You can also track requests to uploaded files, or even 404 Not Found requests. |
| darkvisitors server analytics ignore | Sub-strings in the user agents to ignore (case sensitive). If you use uptime monitoring, I strongly suggest including its identifying user agent (for example, UptimeRobot) in this list. |
| darkvisitors client analytics | Enables client side analytics. This will also give you insight into normal users visiting your forum from an AI chat service. |
| darkvisitors client analytics project key | For client side analytics you must configure the (public) project key. You can find this in your Dark Visitors project settings in the JavaScript Tag section; it is the code after project_key= |
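
For illustration, a minimal setup could look like this. The values are examples, and the option labels for the analytics scope are paraphrased from the admin UI rather than quoted from it:

```text
darkvisitors enabled                 = true
darkvisitors access token            = <secret token from your project settings>
darkvisitors robots txt enabled      = true
darkvisitors robots txt agents       = AI Data Scraper, Undocumented AI Agent
darkvisitors robots txt path         = /
darkvisitors server analytics        = anonymous users only
darkvisitors server analytics ignore = UptimeRobot
darkvisitors client analytics        = disabled
```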

Thanks for this, elmuerte! I’ve set it up and it’s working great.

I see that in the plugin settings, the agent types that can be selected for exclusion via robots.txt are:

  • AI Data Scraper [selected by default]
  • Undocumented AI Agent [selected by default]
  • AI Agent
  • AI Assistant
  • AI Search Crawler

But the complete list of Dark Visitors agent types, per darkvisitors.com, is:
(bold = additional)

Crawlers & Scrapers…

  • AI Assistant
  • AI Data Scraper
  • AI Search Crawler
  • **Archiver**
  • **Developer Helper**
  • **Fetcher**
  • **Intelligence Gatherer**
  • **Scraper**
  • **Search Engine Crawler**
  • **Security Scanner**
  • **SEO Crawler**
  • **Uncategorized Agent**
  • Undocumented AI Agent

AI Agents…

  • AI Agent
  • **Headless Agent**

Not all of these agent types are things one would want to block, but I’d like to include a few like Scraper, AI Data Scraper, SEO Crawler…

Are these additional agent types just newer than your plugin? Could they be added to the current list choices in settings.yml?

Except robots.txt is just a request; a bot either follows it or not. A firewall is the only way to actually stop those.

Yep, I understand that – but since Dark Visitors only works with robots.txt, I’d like to make it work as well as it can.

(I’m actually reading a couple of posts right now where you suggest real blocking with Nginx reverse proxy, but I’m not sure if I need to go that far yet.)

That is a bit hardcore. But Dark Visitors should work with Discourse’s ban list to be useful at some level. Sure, with that you don’t need to manually add e.g. OpenAI or other bots that follow robots.txt.

I contacted Dark Visitors about this on May 3rd this year, and their response was “Not at the moment”. But I see the current documentation lists even more types now.

At this moment, the Dark Visitors API supports the types listed above.

I made sure the setting in Discourse can be extended with additional agent types by just adding them.

After adding a new type and saving the setting, the robots.txt should be updated right away with all the new agents.


OMG, I totally missed the “Search or create” field. My theme has a really low contrast there and it escaped my eyes. Thank you for the clarification!