Effectively logging search queries

sam · June 2, 2017, 8:29pm

For 1.9 we plan to add a search log.

Unfortunately, the “log every search the server makes” approach is incorrect as we perform searches as people are typing which will result in a massively noisy log.

Proposal

Create a new table

term, user_id (nullable), ip_address, created_at, clicked_topic_id (nullable), source_id (either header or fullpage)

Log on server with the following algorithm on search

UPDATE term
SET term = :new_term
        created_at: :now
WHERE created_at < 5.seconds.ago AND
             position(term in :new_term) = 0 AND
             (user_id = :user_id OR ip_address = :ip_address)

term: new_term,
now: Time.zone.no,
ip_address: request.ip 

If update touches zero rows, then insert a **new** search log row

Or, in English

Update existing search log row IF:
- Same user (for anon use ip address, for logged in use user_id)
- Previous search started with the text of current search, eg: previous was “dog” and new is “dog in white”
- Previous search was logged less than 5 seconds ago
On click on search result (in either full page search or header) update the clicked_topic_id, (have search results return log id, then update it based on log id + user match + in last 10 minutes)

Limiting log size

So the log does not grow forever there should be a site setting for maximum rows to store. Default should be about a million.

A weekly job can delete oldest rows or something.

Thoughts? Feedback?

elijah · June 8, 2017, 4:11am

So if I search for something and open three results in three tabs, what happens? Last one “wins”?
The behavior of “multiple clicks for one search, save last only” is better for times when someone searches, clicks, hits back, clicks again, hits back, clicks a final time. New tabs subvert that though.

If I search for a topic, right click to copy URL, and then paste that in as a “you should look here” answer, is that counted as a “click”?
Copying URLs from posts does not increase click count, for reference. But this is a case when the desired search result really should be saved.

codinghorror · July 11, 2017, 2:01am

Hopefully we can make a little progress on this, maybe this week @eviltrout?

eviltrout · July 14, 2017, 6:34pm

I’ve implemented this mostly as specified in the original post:

https://github.com/discourse/discourse/commit/97e211f8372993622698aceb3eeb801a150aa9e5

I also went in and replaced all the mock tests with integration tests and added a separate commit for the clean up job.

eviltrout · July 14, 2017, 7:57pm

Actually I forgot about the click tracking

Sorry it’s been a difficult couple of days for me, I’ll get it in on Monday.

eviltrout · July 17, 2017, 7:52pm

This commit adds click tracking support:

https://github.com/discourse/discourse/commit/cdb3706025b5c7b233ec7a36f3c0ac29d6970254

meglio · December 9, 2017, 7:27am

May someone please explain how the following columns in the current search_logs table work?

search_result_id
search_type
search_result_type

Also, it would be useful if all columns contained detailed description, and those descriptions were shown in the Data Explorer plugin, here:

Mittineague · December 9, 2017, 7:54am

I’m not sure what you mean by “how they work” but as far as for the “descriptions” those are “column definitions” or “schema information” and are recognizable as such for those experienced with database tables.

To gain an insight on how they “work” you can look at the tables corresponding model. For example:
https://github.com/discourse/discourse/blob/master/app/models/search_log.rb#L95-L106

and near the top of that file you’ll see
https://github.com/discourse/discourse/blob/master/app/models/search_log.rb#L6-L20

From there you can search for variable and function names etc to find other files that interact with the table (a good IDE or text editor helps a lot).

Does that help any?

meglio · December 9, 2017, 8:08am

Yeah, sorry for being unclear. By this I just mean “what they mean”.

A description for every column would be useful in database schema though.

codinghorror · December 9, 2017, 8:20am

@techAPJ was the last to work on this, so maybe he can comment.

Mittineague · December 9, 2017, 8:24am

I guess I may be peculiar, but for me the definitions in combination with good descriptive field names are what I’m used to reading when I read schema.

It requires a shift from reading English to reading code, and I have doubts that “translating” code to English would be practical if even possible. That said, it is often easy enough to explain a small piece of code to answer a specific question.

techAPJ · December 9, 2017, 8:58am

I hope the code @Mittineague linked above explains what they mean.

Primary ID of search_result_type entity.

sam · December 28, 2017, 12:56am

This is done and dusted, closing.

Topic		Replies	Views
Search Logs Page Announcements	21	3173	August 30, 2018
Search Logs functionality improvements Feature search	1	552	September 2, 2021
How is Search Terms report produced? Data & reporting dashboard-reports	4	560	July 4, 2023
Reference guide for the Search Logs page Site Management reference , reporting	0	814	September 29, 2022
Search Logs - How to get a complete list for a given date range? Data & reporting sql-query	6	974	August 1, 2023

Effectively logging search queries

Proposal

Limiting log size

Related topics