Discourse Poison Fountain

:information_source: Summary: Adds hidden links to poisoned content which can trip up web scrapers that do not play nice.
:hammer_and_wrench: Repository: GitHub - elmuerte/discourse-poison-fountain: Discourse plugin which adds hidden poisoned content to trip bad webscrapers
:open_book: Install Guide: How to install plugins in Discourse

Features

The plugin adds links to pages with poisoned content to every generated page. If a bad web scraper consumes this content and uses it to train an LLM, it will negatively affect the resulting model.

These links are hidden from users and marked with rel="nofollow", and by default robots.txt tells web spiders not to consume this content. The hidden links are only added for non-authenticated requests.
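To make the mechanism concrete, a hidden poisoned link could look roughly like the sketch below. This is illustrative only: the slug, path segments, and hiding technique are assumptions, not the plugin's actual output.

```html
<!-- Illustrative sketch; the plugin's real markup may differ -->
<a href="/dpf/some-random-slug/42" rel="nofollow" style="display: none" aria-hidden="true">more</a>
```

Because the link is invisible and marked nofollow, regular visitors and well-behaved crawlers never follow it; only scrapers that ignore these signals end up on the poisoned pages.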

By default the plugin uses poisoned content from RNSAFFN; see that page for more information. You can change the poison source in the settings.

The poisoned content is served from pages with URLs like /dpf/<some-random-slug>/<id>. With the default configuration you could create a fail2ban rule that bans IPs which request pages under /dpf/ a few times, to fend off bad scrapers.
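Such a fail2ban setup might look like the sketch below. The filter name, log path, and log format (nginx combined format) are assumptions; adjust them to your own reverse-proxy setup.

```ini
# /etc/fail2ban/filter.d/discourse-poison-fountain.conf (hypothetical name)
[Definition]
# Match any request under /dpf/ in an nginx combined-format access log
failregex = ^<HOST> .* "(GET|POST) /dpf/.* HTTP
```

```ini
# Jail entry, e.g. in /etc/fail2ban/jail.local
[discourse-poison-fountain]
enabled  = true
port     = http,https
filter   = discourse-poison-fountain
logpath  = /var/log/nginx/access.log
maxretry = 3
findtime = 600
bantime  = 86400
```

Since legitimate users never see the hidden links, a few hits on /dpf/ within a short window is a fairly safe ban signal.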

Configuration

You only need to install and enable the plugin for it to start doing its thing in the background. You can tune some additional settings.

Settings


| Name | Description |
| --- | --- |
| poison_fountain_source | The website which generates the content served from the poisoned pages. |
| poison_fountain_update_robots_txt | Enabled by default; adds the poisoned content to the robots.txt exclusion list. Web spiders which respect robots.txt will completely ignore the poisoned content. |
| poison_fountain_cache_hours | Hours to cache the content before retrieving new content. Maximum of 24 hours. |
| poison_fountain_entries | Number of poisoned entries to keep around. |
| poison_fountain_link_count | Number of links to add to the generated HTML pages. |
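When poison_fountain_update_robots_txt is enabled, the exclusion added to robots.txt should be conceptually equivalent to the fragment below (the exact output the plugin emits may differ):

```
User-agent: *
Disallow: /dpf/
```

Crawlers that honor robots.txt will skip the poisoned paths entirely, so only non-compliant scrapers consume the poison.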