Unfortunately there’s no foolproof way to block LLM scrapers if your site’s content is publicly accessible. Many of them will ignore robots.txt and even try to pass as human visitors (using different user agents and IP addresses) to circumvent blocks. Hopefully some sort of legal regulation can put guardrails on the situation, because it appears many people would like a choice about whether their content is used this way!