Dashboard Report - Web Crawler User Agents

This is an SQL version of the Dashboard Report for Web Crawler User Agents.

This Dashboard Report lists the top web crawler user agents, sorted by pageviews, providing insight into which crawlers are most active on a site.

-- date :start_date = 2024-01-06
-- date :end_date = 2024-02-07

  SUM(count) AS pageviews
FROM web_crawler_requests
WHERE date BETWEEN :start_date AND :end_date
GROUP BY user_agent
ORDER BY pageviews DESC

SQL Query Explanation

  • Parameters: The query accepts two parameters, :start_date and :end_date, which define the time range for the report. Both date parameters accept the format of YYYY-MM-DD .
  • SELECT: The query retrieves data from the web_crawler_requests table, which logs requests made by web crawlers to the forum. Each record in this table includes the user_agent of the crawler and a count of pageviews.
  • SUM: The SUM(count) function calculates the total number of pageviews for each user_agent within the specified date range.
  • Filtering: The WHERE clause filters records to include only those that fall within the specified date range, using the date column.
  • Aggregation: The GROUP BY clause groups the results by user_agent, ensuring that the data is summarized for each web crawler individually.
  • Sorting: The ORDER BY clause sorts the results in descending order of pageviews, placing the most active web crawlers at the top of the report.

:discourse: Records from the web_crawler_requests table are automatically deleted after 30 days. Results for this report are only be available for the past 30 days from when the report is run, however, you can still use the data parameters for this query to filter results within last 30 days.

Example Results

user_agent pageviews
Mozilla/5.0 (compatible; SeekportBot; +https://bot.seekport.com) 1406
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) 724
Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot) 533