This is an SQL version of the Dashboard Report for Web Crawler User Agents.
This Dashboard Report lists the top web crawler user agents, sorted by pageviews, providing insight into which crawlers are most active on a site.
--[params]
-- date :start_date = 2024-01-06
-- date :end_date = 2024-02-07
SELECT
user_agent,
SUM(count) AS pageviews
FROM web_crawler_requests
WHERE date BETWEEN :start_date AND :end_date
GROUP BY user_agent
ORDER BY pageviews DESC
SQL Query Explanation
- Parameters: The query accepts two parameters,
:start_date
and:end_date
, which define the time range for the report. Both date parameters accept the format ofYYYY-MM-DD
. - SELECT: The query retrieves data from the
web_crawler_requests
table, which logs requests made by web crawlers to the forum. Each record in this table includes theuser_agent
of the crawler and acount
of pageviews. - SUM: The
SUM(count)
function calculates the total number of pageviews for eachuser_agent
within the specified date range. - Filtering: The
WHERE
clause filters records to include only those that fall within the specified date range, using thedate
column. - Aggregation: The
GROUP BY
clause groups the results byuser_agent
, ensuring that the data is summarized for each web crawler individually. - Sorting: The
ORDER BY
clause sorts the results in descending order ofpageviews
, placing the most active web crawlers at the top of the report.
Records from the
web_crawler_requests
table are automatically deleted after 30 days. Results for this report are only be available for the past 30 days from when the report is run, however, you can still use the data parameters for this query to filter results within last 30 days.
Example Results
user_agent | pageviews |
---|---|
Mozilla/5.0 (compatible; SeekportBot; +https://bot.seekport.com) | 1406 |
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) | 724 |
Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot) | 533 |
… | … |