How to stop scraping scripts from running on my Discourse site

Hello.

I am going to open a Discourse site on my server.

I want to stop scraping scripts from running against my site. I was thinking of using a plugin to block them, but I couldn’t find a suitable one.

I tried scraping the site with a PHP script, and it works.

Please help me stop the scraping script. Any help will be appreciated. Thanks in advance.

Anna.

The easiest thing would be to make your site “login required”.


Thanks so much, @Mittneague for your reply.

Is there any way to stop a scraping script from running against a Discourse site?

Yes, the aforementioned “login required” setting. When any HTTP request is made to the site, a “must login” page will be returned. Only registered members who are logged in will be able to get any further site content.

But I don’t want to use that option on the live site.

Isn’t there a better option?

Stopping scrapers will be an arms race. If they’re dedicated to scraping your content, you won’t be able to stop them.

That said, great first steps are:

  • using login required
  • blocking the scraper’s User-Agent
  • detecting scraping activity and using a tarpit to slow them down or a honeypot to generate irrelevant content for them to pull down and taint their data
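As a rough illustration of the User-Agent and tarpit ideas above, here is a minimal Python sketch. It is not Discourse code or a Discourse plugin API: the blocklist pattern, the function names, and the delay parameters are all assumptions made up for the example, and a real deployment would more likely put this logic in the reverse proxy in front of the site.

```python
import re

# Hypothetical blocklist: substrings often sent by scripted HTTP clients.
# A determined scraper can spoof a browser User-Agent, so this only stops
# the lazy ones.
BLOCKED_AGENT_PATTERN = re.compile(
    r"(curl|wget|python-requests|scrapy|libwww-perl)", re.IGNORECASE
)


def is_blocked_agent(user_agent):
    """Return True when the User-Agent is missing or matches the blocklist."""
    if not user_agent:
        return True
    return BLOCKED_AGENT_PATTERN.search(user_agent) is not None


def tarpit_delay(recent_requests, free_requests=30, step_seconds=0.5, max_seconds=20.0):
    """Return how many seconds to stall a response.

    The delay grows with every request beyond the free allowance and is
    capped at max_seconds. All thresholds are illustrative assumptions.
    """
    excess = max(0, recent_requests - free_requests)
    return min(max_seconds, excess * step_seconds)
```

A request handler would call `is_blocked_agent` first and refuse the request on a match, then sleep for `tarpit_delay(...)` seconds before answering clients that look suspicious.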

What makes this difficult to solve is: how do you differentiate scraping traffic from a normal user?


Thanks, @supermathie for your reply.

I am tracking IP addresses, so I treat each IP as one user.

If a user exceeds the access limit, I want the site to block the content until they pass Google’s reCAPTCHA v2.
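The per-IP limit with a CAPTCHA gate described here could be sketched roughly as below. This is only an illustration in Python under stated assumptions: the class name `IpAccessLimiter`, its methods, and the default limit and window are hypothetical, the counters live in memory, and the actual reCAPTCHA check is left out (the code assumes something else verifies the token and then calls `captcha_passed`).

```python
import time
from collections import defaultdict, deque


class IpAccessLimiter:
    """Sliding-window request counter per IP with a CAPTCHA gate (hypothetical)."""

    def __init__(self, limit=60, window_seconds=60.0):
        self.limit = limit                  # allowed requests per window (assumed value)
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)     # ip -> timestamps of recent requests
        self._challenged = set()            # IPs that must pass the CAPTCHA

    def check(self, ip, now=None):
        """Record one request and return "allow" or "challenge"."""
        now = time.monotonic() if now is None else now
        if ip in self._challenged:
            return "challenge"
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.limit:
            self._challenged.add(ip)
            return "challenge"
        return "allow"

    def captcha_passed(self, ip):
        """Call after the CAPTCHA response has been verified server-side."""
        self._challenged.discard(ip)
        self._hits.pop(ip, None)
```

For example, with `IpAccessLimiter(limit=3, window_seconds=10)` the fourth request from one IP inside ten seconds returns "challenge", and that IP stays challenged until `captcha_passed` is called. Keep in mind that one IP is not always one user: people behind the same NAT or proxy share an address, so a strict limit can challenge legitimate visitors.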