NULL first_visited_at dates in the topic_users table

While investigating the topic users table last month

Mysterious entries in topic_users table:

I noticed quite a few rows had NULL first_visited_at dates even though the last_visited_at date was not null.

Not certain whether or not it might be due to my dev set-up I ran this query to clean up the cruft.

UPDATE topic_users 
SET first_visited_at = last_visited_at - interval '1 hour'
WHERE last_visited_at IS NOT NULL 
AND first_visited_at IS NULL

(* I didn’t think to give a total_msecs_viewed - a query for another day)

Shortly after more of the mysterious NULL first_visited_at fields appeared.

I’ve finally narrowed it down to Category Watching.

i.e. when a member is Watching a Category a row is created in the topic_users table. The first_visited_at date is NULL when the member hasn’t visited the topic yet.

Unfortunately, when the member does get around to visiting the topic the first_visited_at field does not get populated.

These two queries should give much the same result

SELECT COUNT(id) 
FROM topic_users 
WHERE topic_users.first_visited_at IS NULL 
AND topic_users.last_visited_at IS NOT NULL
SELECT COUNT(topic_users.user_id)
FROM topic_users
JOIN topics 
ON topics.id = topic_users.topic_id
WHERE topic_users.first_visited_at IS NULL 
AND topic_users.last_visited_at IS NOT NULL
AND topic_users.notification_level = 3
AND topics.category_id IS NOT NULL
AND topics.category_id IN( SELECT category_users.category_id
    FROM category_users
    WHERE category_users.notification_level = 3 )
AND topic_users.user_id IN( SELECT category_users.user_id
    FROM category_users
    WHERE category_users.notification_level = 3 )
1 Like

Issues around last_visited_at also reported / detailed here:

Specifically on April 7th I concluded the table is not updated correctly.
I’m unsure if there have been any code updates in this area since.

I didn’t report it in the bug category because there is nowhere in Discourse code base where this was an actual “problem”…
… however data is stored in an inconsistent manor for other plugins and custom users reports to be ran against.

1 Like

Not that I can see. I was thinking about [quote=“Mittineague, post:1, topic:45996”]
total_msecs_viewed
[/quote]

But when I checked earlier not a single row had Null or the default 0
The oddest were 5000 msec from the eviltrout seed posts. But they are seed posts so that isn’t too alarming.

Most of my coding experience is JavaScript, PHP and MySQL
Many other languages and databases too, but not to any great extent.

I was wondering if there might be something with Ruby time ↔ Postgres time,
But from what I’ve found online it seems they should play well together.

I’ve also noticed

TopicUser.create(attrs.merge!(user_id: user_id, topic_id: topic_id, first_visited_at: now ,last_visited_at: now))

and thought that maybe merge was discarding any second identical value.

But I have rows where both first_visited_at and last_visited_at had the same value, so unless it happens during a certain path I don’t think that’s it either.

I’ve done some more testing.

It definitely has to do with tracking state.

When a member is either Tracking or Watching a category, a row is inserted into the topic_users table when a topic is created in that category.

A row is not inserted into the topic_users table when a topic is created if a member has that category tracking state set to Mute or Normal.
And the row may not be created depending on other of the members Preference settings, if they post a reply in the topic etc. i.e. one of

created_topic: 1, 
user_changed: 2, 
user_interacted: 3, 
created_post: 4, 
auto_watch: 5, 
auto_watch_category: 6, 
auto_mute_category: 7, 
auto_track_category: 8, 
plugin_changed: 9

When a member visits a topic the logic tests for the presence of a row in the table.
If there is no row an INSERT is done.
If there is a row, an UPDATE

I think the solution would involve checking for the presence of a NULL first_visited_at field in someway somewhere. But I don’t know what would be the best way to go about it yet.

2 Likes

Well, that was fun. To easily test hacked Core code I ended up setting up a Ubuntu VM in my Windows 10.
Then inside the Ubuntu, I set up another VM per

Anyway, this hack to models/topic_user.rb “works”

    UPDATE_TOPIC_USER_SQL = "UPDATE topic_users
                                    SET
                                      last_read_post_number = GREATEST(:post_number, tu.last_read_post_number),
                                      highest_seen_post_number = t.highest_post_number,
                                      total_msecs_viewed = LEAST(tu.total_msecs_viewed + :msecs,86400000),
                                      notification_level =
                                         case when tu.notifications_reason_id is null and (tu.total_msecs_viewed + :msecs) >
                                            coalesce(uo.auto_track_topics_after_msecs,:threshold) and
                                            coalesce(uo.auto_track_topics_after_msecs, :threshold) >= 0 then
                                              :tracking
                                         else
                                            tu.notification_level
                                         end
+                                      , first_visited_at = COALESCE(tu.first_visited_at, tu.last_visited_at, :now)
                                  FROM topic_users tu
                                  join topics t on t.id = tu.topic_id
                                  join users u on u.id = :user_id
                                  join user_options uo on uo.user_id = :user_id
                                  WHERE
                                       tu.topic_id = topic_users.topic_id AND
                                       tu.user_id = topic_users.user_id AND
                                       tu.topic_id = :topic_id AND
                                       tu.user_id = :user_id
                                  RETURNING
                                    topic_users.notification_level, tu.notification_level old_level, tu.last_read_post_number
                                "

The logic is that when a row is INSERTed into the topic_users table because the member has set a category to Watch / Track and a topic is created in that category, various fields will be NULL.
* this is how it now is

With the new line of code in place, when a member gets around to actually visiting the topic and the UPDATE query runs.

  • use the same first_visited_at timestamp if it exists
  • else use the last_visited_at timestamp if it exists
  • else use :now as a fallback

My testing indicates that this works great for topics that were created after the hack was in place.
Either NULLs will be left in place if the member does not visit the topic (as so it should be)
Or the “last” will be either the same or later than the “first” if a member visits the topic.

A possible concern is how to handle topics that were created before the new line of code was there.

The easy way would be to leave it if this data is not all that important - i.e. used elsewhere by other code.
But if having accurate and correct “first” values is important, this alone won’t do.

It would be possible to grab the first post of a member in a topic.
But I don’t feel as though the place for that is in this file running a more complex query all the time.
That is, I think if any “correction” were to be done it should be a one-time query.
More intensive resource use to run through the entire database, but it should hopefully need to be done only once.

4 Likes

I have the feeling that when I first looked into the weird topic id problem I may have opened a can of worms.

I’ve put together some queries that can help find the earliest “known” first_visited_at timestamps.

SELECT topic_users.topic_id
 , topic_users.user_id
 , topic_users.last_visited_at
FROM topic_users
WHERE ( topic_users.first_visited_at IS NULL
      AND topic_users.last_visited_at IS NOT NULL ) 
SELECT topic_users.topic_id
 , topic_users.user_id
 , MIN(posts.created_at)
FROM topic_users
JOIN posts
ON posts.topic_id = topic_users.topic_id 
AND posts.user_id = topic_users.user_id
WHERE topic_users.posted 
 AND ( topic_users.first_visited_at IS NULL
      AND topic_users.last_visited_at IS NOT NULL )   
GROUP BY topic_users.topic_id, topic_users.user_id  
SELECT tu.topic_id
 , tu.user_id
 , MIN(pa.created_at)
FROM ( SELECT topic_users.topic_id
        , topic_users.user_id
      FROM topic_users
      WHERE ( topic_users.first_visited_at IS NULL
            AND topic_users.last_visited_at IS NOT NULL )
      ) AS tu 
JOIN LATERAL ( SELECT posts.id
              FROM posts
              WHERE posts.topic_id = tu.topic_id 
              ) AS pt
ON true JOIN LATERAL ( SELECT post_actions.created_at
              FROM post_actions
              WHERE post_actions.post_id = pt.id 
              AND post_actions.user_id = tu.user_id 
              ) AS pa
ON true
GROUP BY tu.topic_id, tu.user_id

But I’m wondering just how important getting a more accurate first_visited_at timestamp really is.

I’ve noticed that when a member is mentioned or invited, a row in the topic_users table is created.
Both the first_visited_at and last_visited_at fields are given :now even if the member never visited the topic.

3 Likes