Editing the robots.txt file


(Rahul Dhingra) #1

Continuing the discussion from Excluding user profiles in robots.txt (or allow edit of file):

I need to stop Google from indexing images and some content on my forum. Is it possible to edit robots.txt through the backend or through a terminal (PuTTY)?


#2

Well, you can just ssh into the instance, locate the robots.txt file and edit it, but I’m not sure if it would survive an upgrade.


(Uwe Keim) #3

Any update on this?

I want to exclude the Internet Archive crawler which would require something like

User-agent: ia_archiver
Disallow: /

to be added to the robots.txt file.
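As a quick sanity check of what those two lines would do, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a compliant parser reads the rule; it says nothing about how Discourse generates the file. The topic URL is a hypothetical placeholder, and `Googlebot` is used only as an example of some other user agent.

```python
from urllib.robotparser import RobotFileParser

# The two lines proposed in the post above.
rules = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URL, used only for illustration.
url = "http://your-site-name/t/some-topic/123"

# The Internet Archive crawler is denied everywhere...
blocked = parser.can_fetch("ia_archiver", url)
# ...while an agent with no matching rule is still allowed.
others = parser.can_fetch("Googlebot", url)

print(blocked, others)  # False True
```

So the rule is scoped to that one user agent and leaves every other crawler unaffected.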

Is there an officially supported way to modify the robots.txt file?


(Sam Saffron) #4

There is no officially supported way of doing that, but a PR to add support for something like this would be fine.


(Panteen Pro-V) #5

If I want to stop web crawlers, how can I do this manually, given that the feature is not yet implemented?


(Sam Saffron) #6

Set "allow index in robots txt" to false in the site settings.


(Panteen Pro-V) #7

Do you mean like this?

I have never had that option ticked, and I can still see that the site has been crawled since.


(Sam Saffron) #8

What happens when you go to http://your-site-name/robots.txt?


(Panteen Pro-V) #9

This is what I get:

# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
User-Agent: *
Disallow: /

And these are the stats:

Over both the last 7 days and the last 30 days, the web crawler numbers have been increasing.


(Sam Saffron) #10

So whoever is crawling you is ignoring your robots.txt.
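To see why, here is a minimal sketch that feeds the robots.txt quoted in the previous post to Python's standard `urllib.robotparser`. The `/latest` URL, the helper name `allowed`, and the agent name `SomeOtherBot` are illustrative assumptions, not anything taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# The file as quoted in the previous post.
served = """\
# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
User-Agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL and agent names, for illustration only.
url = "http://your-site-name/latest"
results = {
    agent: allowed(served, agent, url)
    for agent in ("Googlebot", "ia_archiver", "SomeOtherBot")
}

print(results)
```

Every agent comes back as not allowed, so the file itself is doing its job. robots.txt is only advisory, though: it tells well-behaved crawlers what to skip and cannot stop a client that never reads it.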


(Panteen Pro-V) #11

Any advice or solution to stop the crawling completely?


(Matt Palmer) #12

Are you after something Discourse-specific, or general advice on blocking unwanted visitors?


(Panteen Pro-V) #13

Anything, both inside and outside of Discourse. What I want is to stop my forum from being crawled by bots, etc. (ultimately I want to see the Web Crawlers part of the stats remain at 0 all the time).


(Sam Saffron) #14

Well, you can change it so your forum requires users to be logged in if you insist.


(Panteen Pro-V) #15

My forum is used only internally within my company, and it already requires users to log in up front.


(Sam Saffron) #16

So… no drama, nothing can be crawled.