Error: integer out of range

This did the trick for post_actions, thanks for the suggestion!

We’re mainly now seeing two failures still:

  1. GET https://devforum.roblox.com/notifications is 404ing for users who have notifications w/ bigInt ids.

  2. Jobs::GrantAnniversaryBadges scheduled job fails

Is it possible that a notifications model/controller/service could still be hardcoded to define the notifications.id field as an integer?

very unlikely … 404 does not sound right, is there anything in /logs that describes the problem?

Sadly I can’t find anything particularly useful in /logs itself, but when I reproduce this locally I see the req come in and then 404, but it only happens when I set my latest notification to have an id that is a bigint (works just fine when its within int32 range)

Poking around, I notice that I get a valid response when I call http://localhost/notifications.json?limit=30:

{
  "notifications": [
    {
      "id": 2148935910,
      "user_id": 2,
      "notification_type": 5,
      "read": true,
      "high_priority": false,
      "created_at": "2023-02-16T00:50:53.135Z",
      "post_number": 2,
      "topic_id": 8,
      "fancy_title": "Welcome to the Lounge",
      "slug": "welcome-to-the-lounge",
      "data": {
        "topic_title": "Welcome to the Lounge",
        "original_post_id": 13,
        "original_post_type": 1,
        "original_username": "blarouche",
        "revision_number": null,
        "display_username": "blarouche"
      }
    }
  ],
  "total_rows_notifications": 1,
  "seen_notification_id": 1001,
  "load_more_notifications": "/notifications?offset=60&username=bjlarouche"
}

However, once I pass the recent query param i.e http://localhost/notifications.json?recent=true&limit=30 and this is what’s called my the notification menu?

{
  "errors": [
    "The requested URL or resource could not be found."
  ],
  "error_type": "not_found"
}

edit: we are thinking it its that is because of the current_user.seen_notification_id is still and integer :thinking:

I think we are now in the clear!

We noticed that we needed to migrate columns in other tables that referenced notification_id in addition to just notifications.id itself. Othwerwise, services/notifications.rb or services/badge_granter.rb would throw errors.


For any other big forum setup who runs into this in the future with notifications, here is what we did…

All together, we had to migrate four columns in four tables:

  1. notifications.id
  2. user.seen_notification_id
  3. user_badges.notification_id
  4. shelved_notifications.notification_id

We initially went about #1 with the ALTER command suggested above, but then as mentioned opted to use ActiveRecord migrations, that way the migration files add up to the schema.

20230215070319_change_notifications_id_to_bigint.rb
# frozen_string_literal: true

class ChangeNotificationsIdToBigint < ActiveRecord::Migration[6.1]
    def change
        change_column :notifications, :id, :bigint
    end
end
20230215070320_change_user_seen_notification_id_to_bigint.rb
# frozen_string_literal: true

class ChangeUserSeenNotificationIdToBigint < ActiveRecord::Migration[6.1]
    def change
        change_column :users, :seen_notification_id, :bigint
    end
end
20230215070321_change_user_badges_notification_id_to_bigint.rb
# frozen_string_literal: true

class ChangeUserBadgesNotificationIdToBigint < ActiveRecord::Migration[6.1]
    def change
        change_column :user_badges, :notification_id, :bigint
    end
end
20230215070322_change_shelved_notifications_notification_id_to_bigint.rb
# frozen_string_literal: true

class ChangeShelvedNotificationsNotificationIdToBigint < ActiveRecord::Migration[6.1]
    def change
        change_column :shelved_notifications, :notification_id, :bigint
    end
end

We have a custom Dockerfile in our setup (we build images so that we can run discourse and sidekiq on separate resouces in Kubernetes) so getting these files copied into /db/migrate as part of our Dockerfile was straightforward.

Then, we just let rake db:migrate handle the rest. Once we did a rolling restart on all of our discourse and sidekiq pods everything was running as expected :crossed_fingers:.

3 Likes

Excellent data, we are going to just bite the bullet and migrate this in core discourse to a bigint. It is an expensive move , but this will certainly pop up again on big forums so we might as well fix it.

Note … the OP is about post_id … overflowing that going to be a lot harder than notifications. 2.1 billion posts certainly going to happen, but the cost of biginting post_id is far more expansive than notification id. We can wait a bit on that time bomb.

4 Likes

Brandon,

I think we may end up going with:

The balancing act here is that I don’t want to “punish” 40k Discourse instances because of the one or two outlier forums that managed to reach the amazing feat of 2.5 billion notifications.

What do you think?

(Note: It will also catch at least one you missed - there is a chat table as well)

3 Likes

This would be great and looks like a smart solution :slight_smile:

Just to loop back here - this has now been merged. :partying_face:

5 Likes

This topic was automatically closed after 4 days. New replies are no longer allowed.