I can send you the raw data for sure; which file is it?
Have been seeing this myself lately – anything I can do to send data along or get it looked at? This thread seems to have died down. Thanks!
Where have you seen it exactly? Can you share a screenshot?
Would a “simple” fix be to just block anonymous users from searching for anything that regexes like a domain?
It looks like you could win by simply banning all searches for `.cn` domains, somehow, but the actual impact is so low here (and what they are doing is so pointlessly ineffective) that it’s hard to justify any engineering effort.
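For anyone who wants to experiment with the "looks like a domain" idea, here is a minimal Ruby sketch of the matching step only. It is not Discourse code: `DOMAIN_LIKE`, `domain_like?` and `blocked_anonymous_search?` are hypothetical names, and where such a check would hook into the search path is left open.

```ruby
# Hypothetical helper, not part of Discourse.
# Matches terms made of dot-separated labels ending in an alphabetic TLD,
# e.g. "example.cn" or "www.foo-bar.com.cn".
DOMAIN_LIKE = /\A(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\z/i

# Returns true when the whole (trimmed) search term looks like a bare domain.
def domain_like?(term)
  DOMAIN_LIKE.match?(term.to_s.strip)
end

# Assumption: only anonymous searches are rejected; logged-in users pass.
def blocked_anonymous_search?(term, logged_in:)
  !logged_in && domain_like?(term)
end
```

A pattern like this will also flag legitimate terms such as file names (`config.yml`), so it trades some false positives for less noise.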
Our site is seeing something similar, but slightly different.
We appear to have some kind of bot performing a daily search on our forum for a few keywords.
Is there a way to block specific search terms?
Or, a way to link a search term to an IP address, then in turn to block that IP?
These, for example, are significantly skewing our search term data analysis:
Capybara is part of discobot. Seeing it is a good sign that discobot is working and people are making it through the new user tutorial.
The others I do not recognize.
I am not sure what the issue is honestly with having some random keywords on this list. A bit weird, sure, but you are the only one who sees it.
Maybe some ability to prune this list would be interesting, e.g. never show me again how many people searched for capybara on my site. But not a high priority methinks.
Sorry to revive this old topic, but I am facing a lot of the same spam.
How would one ban all searches for `.cn` domains?
I’ve already added a filter to the blocked word list, but searches are still coming through.
Not sure what the point of these is, but it’s extremely annoying to see so much spam in the search logs.
We’re still getting all manner of useless stuff appearing in our search results.
I’m guessing it’s a (search) bot of some sort, as it all comes flooding in on one date, then disappears for another month.
I tried adding some keywords to our blocked words list, but that doesn’t seem to stop people searching for those keywords; it only stops them posting those words.
I’m sure I’ve asked this before, but can we turn off search functionality for users who are not logged in?
Or does anyone have any suggestions on how to resolve or work around this? Our search logs are meaningless when they’re full of bots/junk.
Because originally:
But now our search logs are just absolutely useless.
Look at this:
Only one valid search appears in the entire month.
It’s beyond useless.
I genuinely welcome any advice on how to tackle this, or how to prevent non-logged-in users from performing a search.
There is a hidden site setting named `rate_limit_search_anon_global` that can be used to limit this.
Details are in `site_settings.yml`; it’s expecting an integer value:
Thanks @Canapin
I was thinking I’d have to enable it via the rails console like some other hidden settings.
Would you happen to know what that value is measured in? Is it the number of searches allowed per minute or something?
Exactly this
For the record, you can see the unit of time it uses in `search_controller.rb` in the discourse/discourse repository on GitHub (at commit 8222810099de787e844881da42df1702700b9760):
RateLimiter.new(nil, "search-min-anon-global", SiteSetting.rate_limit_search_anon_global, 1.minute).performed!
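To make the semantics of that line concrete, here is a small in-memory sketch of a global "at most N per window" limiter. It is an illustration under assumptions, not Discourse's `RateLimiter` (whose internals may differ): the class `GlobalWindowLimiter`, its `LimitExceeded` error and the injectable `clock` are all hypothetical. It keeps a sliding window of timestamps shared by all callers, which is what "global" implies here.

```ruby
# Hypothetical in-memory sketch; not Discourse's RateLimiter implementation.
class GlobalWindowLimiter
  class LimitExceeded < StandardError; end

  # max            - number of actions allowed per window (the integer setting)
  # window_seconds - window length in seconds (60 for "per minute")
  # clock          - callable returning the current time in seconds (for testing)
  def initialize(max, window_seconds, clock: -> { Time.now.to_f })
    @max = max
    @window = window_seconds
    @clock = clock
    @hits = [] # timestamps of accepted actions, shared by every caller
  end

  # Records one action, or raises LimitExceeded if the window is already full.
  def performed!
    now = @clock.call
    # Forget actions that have fallen out of the window.
    @hits.reject! { |t| t <= now - @window }
    raise LimitExceeded, "limit of #{@max} per #{@window}s reached" if @hits.size >= @max
    @hits << now
    true
  end
end
```

With `GlobalWindowLimiter.new(10, 60)`, the eleventh call to `performed!` within any 60-second span raises, regardless of who made the earlier calls.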
I don’t know how Discourse works, but I always keep a copy of the repo on my computer so I can search the code for terms and grab some info; it’s very helpful.
GitHub search is less effective and often doesn’t return anything.