Using FILTER to summarise data

Overview

In this report, we’ll explore a SQL query that uses the FILTER function to segment user data based on their activity summary preferences. The FILTER function is a powerful tool in SQL that allows for conditional aggregation within a single query, providing a way to apply aggregate functions like COUNT(), SUM(), AVG(), etc., to a subset of data.

The SQL Query Explained

The provided SQL query counts the number of users with different activity summary settings. Here’s a breakdown of each part of the query:

SELECT 
    COUNT(*) FILTER (WHERE uo.email_digests = false) AS "Disabled",
    COUNT(*) FILTER (WHERE uo.email_digests = true) AS "Total Enabled",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND u.last_seen_at > CURRENT_DATE - INTERVAL '180 DAYS') AS "Skipped",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND uo.digest_after_minutes = 30) AS "Half Hourly",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND uo.digest_after_minutes = 60 ) AS "Hourly",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND uo.digest_after_minutes = 1440) AS "Daily",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND uo.digest_after_minutes = 10080) AS "Weekly",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND uo.digest_after_minutes = 43200) AS "Monthly",
    COUNT(*) FILTER (WHERE uo.email_digests = true AND uo.digest_after_minutes = 259200) AS "6 Monthly"
FROM user_options uo
  JOIN users u ON u.id = uo.user_id
WHERE uo.user_id > 0
  • COUNT(*) FILTER (WHERE condition) counts the number of rows that match the specified condition.
  • The AS clause renames the output columns for better readability.
  • The query joins the user_options table with the users table to access user settings and activity data.
  • The WHERE uo.user_id > 0 condition filters out any system users or non-standard entries.

How the FILTER Function Works

The FILTER function is used in conjunction with an aggregate function to count only the rows that meet specific criteria. It’s a more readable and often more efficient alternative to using subqueries or CASE statements for conditional aggregation.

Here’s the general syntax for using FILTER:

AGGREGATE_FUNCTION(column_name) FILTER (WHERE condition) AS alias_name

When to Use the FILTER Function

The FILTER function is particularly useful when you need to perform multiple counts (or other aggregations) with different conditions in a single query. It simplifies the query and avoids the need for multiple subqueries or complex CASE statements.

You might use the FILTER function to segment users by their preferences, posts by their Reactions, or topics by their categories, among other use cases. It’s a versatile tool that can enhance data reporting and analysis.

Conclusion

The FILTER function is a valuable addition to any SQL toolkit. It allows for clear and concise conditional aggregation, making complex data queries more manageable and understandable.

4 Likes