List of emails of users watching a specific category

I have a Data Explorer query that runs this:

SELECT * FROM category_users WHERE category_id = '10'

This gives me a result that looks like this:

How can I also display the users’ email in this output?

(To forestall privacy discussions let me note the following: we use a private Discourse for paying members, who have individually given consent to our use of their personal information for the purposes of delivering services to them. We have systems that don’t interoperate in an automated fashion and we use emails to manually connect the users in two different systems.)

1 Like

You need to join the user_emails table based on the user_id from the category_users table. Give something like this a try:

SELECT
cu.*,
ue.email
FROM category_users cu
JOIN user_emails ue
ON ue.user_id = cu.user_id
WHERE category_id = '10'
AND ue.primary = true
2 Likes

Thank you, Simon, and apologies for taking so long to respond to you!

I just tried your query and it does exactly what I wanted!:pray:

Is there a way to get the same data site-wide, not just for a specific category?

I’m asking for this because we’re planning to restructure our forum to use more granular categories, which makes my plan to create a separate Data Explorer query for each category less tenable.

I figured out how to ask for multiple categories using something like this:

WHERE (category_id = '48') OR (category_id = '66') OR (category_id = '57')

But I’ll have to remember to update the query after changing categories, which I’m very like to forget to do :smiley:

1 Like

You can just remove the category_id = <number> filter altogether, so the query would look something like:

SELECT
    cu.*,
    ue.email
FROM category_users cu
JOIN user_emails ue
  ON ue.user_id = cu.user_id
WHERE ue.primary = true

The Data Explorer will display category names for you, but they won’t show up when you export the results. If that’s a problem for you, you can explicitly add the category name as a column, something like this:

SELECT
    c.name,
    cu.*,
    ue.email
FROM category_users cu
JOIN user_emails ue
  ON ue.user_id = cu.user_id
JOIN categories c
  ON cu.category_id = c.id
WHERE ue.primary = true
ORDER BY c.name
3 Likes

Thanks for the query @simonk!

I don’t understand why you used WHERE ue.primary = true instead of AND ue.primary = true. Does the query always require a WHERE?

Not exactly. It might be more clear if we reformat @simon’s query slightly:

SELECT
    cu.*,
    ue.email
FROM category_users cu
JOIN user_emails ue ON ue.user_id = cu.user_id
WHERE (category_id = '10' AND ue.primary = true)

The category_id and ue.primary conditions are both part of the WHERE clause, joined with AND. If you remove one of the conditions, you remove the AND but you keep the WHERE clause.

Most simple SQL queries follow this form:

SELECT <things_you_want>
FROM <tables>
WHERE <filter_conditions>

You can omit the WHERE clause altogether, but then you’ll get back every row from the tables you specified.

Here’s your original query (reformatted):

SELECT *
FROM category_users
WHERE category_id = '10'
  • SELECT *” means that you want the query to return every column from all the tables involved.

  • FROM category_users” indicates the table that you want to query. The category_users table contains rows looking something like this:

    id category_id user_id notification_level
    1 1 1 3
    2 1 2 3
    3 3 1 3

    category_id and user_id are called foreign keys because they point at a row in another table (in this case, the categories and users tables). So the 3 rows above mean that the user with id 1 is watching categories 1 and 3, and the user with id 2 is watching category 1. The notification_level indicates whether they are Watching, Watching First Post or Tracking.

  • WHERE category_id = '10'” means that you are only interested in rows where the value in the category_id column is 10. Without that line, you would get every row from the category_users table.

@simon gave you a new version that added the user’s email address:

This query made a few changes from your original, for 2 reasons: Email addresses are stored in a different table (the user_emails table), and users can have more than one email address.

  • In the SELECT clause:

    • cu.*” means “all the columns from the cu table”
    • ue.email” means “the email column from the ue table”
  • In the FROM clause:

    • The category_users table now has an alias, “cu”, which saves some typing if you need to refer to it more than once.

    • We’ve JOINed to the user_emails table, and given it the alias ue.

      The user_emails table contains rows like this:

      id user_id email primary
      1 1 alex@example.com true
      2 1 alex@other.example.com false
      3 2 simon@example.com true

      Which means that the user with id 1 has 2 email addresses, alex@example.com (the primary address) and alex@other.example.com (a secondary address). The user with id 2 only has one address.

      When you JOIN 2 tables in SQL, you normally need to tell the database what the join condition is. If you don’t do that, the database doesn’t know which values in each of the tables are meant to match up, so you’ll end up with every possible combination of rows in the 2 tables. If you wrote this query:

      SELECT *
      FROM category_users
      JOIN user_emails
      

      …with the example data above, you’d get 9 rows back: you’d get the first row from category_users three times, once with each of the user_emails rows, then similarly you’d get the second category_users row three times, then finally you’d get the third category_users row three times.

      The join condition normally tells the database which column in the 2 tables represents the same value. In this case, the category_users.user_id column and the user_emails.user_id column both represent the same value. By writing ON ue.user_id = cu.user_id after JOIN user_emails ue, we tell the database to match the user_emails rows up with the appropriate category_users rows.

    • Even with the JOIN condition, we’re still going to get 4 rows back for the user with ID 1, because they are watching 2 categories and they have 2 email addresses - we’ll get a row for each combination. So @simon added an extra condition to the WHERE clause so that the query would only return rows with the user’s primary email address. This condition is in addition to the condition that already existed (restricting the category ID) - in order to be returned rows must have category_id = '10' AND ue.primary = true.

Then, because you didn’t want to restrict your search to a single category, you just needed to remove the category_id filter. You don’t want to remove the whole WHERE clause, because you still want to only return primary email addresses. In other words, your filter condition changed from:

category_id = '10' AND ue.primary = true

to

ue.primary = true

Phew! I hope that all makes sense :nerd_face:

2 Likes

Thank you for the amazingly detailed post, @simonk! I have to admit that SQL is a complete mystery to me, and your explanation was really helpful in starting to understand it. I really appreciate you taking the time to help me out! :pray:

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.