Common Discourse data tips

This is a good example of a couple of Discourse data quirks that can help you get more accurate results when building your queries.

As most posts and topics are ‘soft-deleted’ when you delete them this means they will still appear in the database and can be included in custom report stats, so it can help to filter them out of any results you may want to pull out for user or category statistics, for instance. In a similar fashion, you may also want to exclude Personal Messages from your stats too, as well as the system users.

Here are a couple of tips on how to filter these out:

This SQL query is used to get the top 10 users (excluding system users) who have made the most public posts, excluding any posts, or posts from topics, that have been deleted.

-- Selecting user_id from the posts table and counting the number of posts each user has made
SELECT 
    p.user_id,
    COUNT(p.id) AS "Post Count"
FROM posts p
  -- Left join with the topics table on topic_id
  LEFT JOIN topics t ON t.id = p.topic_id
WHERE 
  -- Exclude private message topics
  t.archetype <> 'private message'
  -- Exclude topics that have been deleted
  AND t.deleted_at ISNULL
  -- Exclude posts that have been deleted
  AND p.deleted_at ISNULL
  -- Exclude whispers, small action posts, and moderator posts
  AND p.post_type = 1
  -- Exclude posts made by system user and discobot (user_id 0 and -1)
  AND p.user_id > 0
-- Group the results by user_id 
GROUP BY 1
-- Order the results by the count of posts in descending order 
ORDER BY 2 DESC
-- Limit the results to the top 10 users with the most posts
LIMIT 10

And here it is without the inline comments:

SELECT 
    p.user_id,
    COUNT(p.id) AS "Post Count"
FROM posts p
  LEFT JOIN topics t ON t.id = p.topic_id
WHERE t.archetype <> 'private message'
  AND t.deleted_at ISNULL
  AND p.deleted_at ISNULL
  AND p.post_type = 1
  AND p.user_id > 0
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

This is just a simple query to demonstrate the examples, but hopefully this helps you get started with the data explorer. :+1: If you have any questions please ask them below. :slight_smile:

7 Likes

Did you mean messages here?

any posts, or posts from[…]

sounds weird to me.

I don’t think so? It’s meant to convey that posts which are in deleted topics aren’t included. Using messages makes it sound like it’s referring to PMs. But I’ll have another look at how to describe it more clearly. :+1:


Another thought -

Including AND p.post_type = 1 may be a good inclusion for this one as well. :thinking:

1 Like

I thought it might be because it filters out PMs too, and

excludes any messages, deleted posts, and posts from deleted topics.

Makes more sense to me than

excluding any posts, or posts from topics, that have been deleted.

Re-reading it now, I do see what the original intention was, so it’s there, just not as clear as I expected it to be I suppose. :person_shrugging:

1 Like