| Summary | Add hidden links to content which can poison web scrapers that do not play nice. |
| Repository Link | https://github.com/elmuerte/discourse-poison-fountain |
| Install Guide | How to install plugins in Discourse |
Features
The plugin adds links to pages with poisoned content to every generated page. If a bad web scraper consumes this content and uses it to train an LLM, it will negatively affect the resulting model.
These links are hidden from users and marked with rel="nofollow", and by default the robots.txt will tell web spiders not to consume this content. The hidden links are only added for non-authenticated requests.
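To illustrate why well-behaved spiders never see the poisoned pages, here is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt fragment, the `ExampleBot` user agent, and the `forum.example.com` URLs are assumptions for illustration; the exact rules the plugin writes may differ.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt fragment; the plugin's actual output may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dpf/
"""


def may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("https://forum.example.com/dpf/some-slug/42"))
    print(may_fetch("https://forum.example.com/t/welcome/1"))
```

A crawler that runs this kind of check skips everything under /dpf/, so only scrapers that ignore robots.txt end up consuming the poisoned content.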
The plugin by default uses the poisoned content from RNSAFFN. See that page for some more information. You can change the poison source in the settings.
The poisoned content is served from pages with URLs like /dpf/<some-random-slug>/<id>. With the default configuration, well-behaved spiders never request these pages, so you could create a fail2ban rule that bans IPs which request pages from /dpf/ a few times to fend off bad scrapers.
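The matching logic behind such a rule can be sketched in Python. This is not a fail2ban filter, only an illustration of what it would look for. It assumes the web server writes access logs in the common or combined log format, and the threshold of 3 is a hypothetical stand-in for "a few times".

```python
import re
from collections import Counter

# Assumes common/combined log format:
# IP - - [date] "GET /path HTTP/1.1" status size ...
DPF_REQUEST = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) /dpf/\S* HTTP/[\d.]+"'
)


def offenders(log_lines, threshold=3):
    """Return the sorted IPs that requested /dpf/ pages at least `threshold` times."""
    hits = Counter()
    for line in log_lines:
        match = DPF_REQUEST.match(line)
        if match:
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)
```

A real fail2ban setup would express the same pattern as a failregex and let fail2ban handle the counting window and the ban itself.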
Configuration
You only need to install and enable the plugin for it to start doing its thing in the background. You can tune some additional settings.
Settings
| Name | Description |
|---|---|
| poison_fountain_source | The website which generates the content that will be served from the poisoned pages. |
| poison_fountain_textual_only | Only accept textual content from the poison source. This will prevent serving binary content. |
| poison_fountain_force_plain_text | Always serve the content as text/plain, even if the source said it was something else, like HTML. |
| poison_fountain_update_robots_txt | Enabled by default. This adds the poisoned content to the robots.txt exclusion list. Web spiders which respect robots.txt will completely ignore the poisoned content. |
| poison_fountain_cache_hours | Hours to cache the content before retrieving new content. Maximum of 24 hours. |
| poison_fountain_entries | Number of poisoned entries to keep around. |
| poison_fountain_link_count | Number of links to add to the generated HTML pages. |
This project is not affiliated with RNSAFFN. It provides an integration with their service. This integration can be configured to use another service that works in a similar way.
You should realize that by using this plugin you will be trusting the content that is generated by the configured poison fountain, and that you are forwarding it. By default this plugin will try to make this content “mostly harmless”, serving only textual content as plain text.
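How the two content-related settings could combine is sketched below in Python. This is an illustration under assumptions, not the plugin's actual code: the function name is hypothetical, and treating only `text/*` media types as "textual" is a guess at what poison_fountain_textual_only checks.

```python
def served_content_type(source_type, textual_only=True, force_plain_text=True):
    """Return the Content-Type to serve, or None if the content is rejected.

    `source_type` is the Content-Type header reported by the poison source.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = source_type.split(";", 1)[0].strip().lower()
    # Assumption: "textual" means any text/* media type.
    if textual_only and not media_type.startswith("text/"):
        return None
    return "text/plain" if force_plain_text else media_type
```

With both settings enabled, an HTML response from the source would be served as text/plain, so a browser would not render or execute anything in it, and a binary response would not be served at all.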