Postgresql locks with high concurrency API key usage

Hello Community,

We are using the Discourse API on a self-hosted installation. Our use case is quite intensive in terms of concurrency, with around 100 API requests per second on average. We are also using PostgreSQL+Patroni+HAProxy to provide a high-availability database setup. Occasionally, our PostgreSQL locks and Patroni restarts the master node.

We have implemented a cron job to check for blocked transactions, and every time this issue occurs, we find the same kind of operations:

 blocked_pid | blocked_user | blocking_pid | blocking_user |                                       blocked_statement                                        |                             current_statement_in_blocking_process                              
-------------+--------------+--------------+---------------+------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------
      297904 | discourse    |       293083 | discourse     | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:38:16.822352' WHERE "api_keys"."id" = 21 | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:48.163449' WHERE "api_keys"."id" = 21
      296718 | discourse    |       293083 | discourse     | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:50.900480' WHERE "api_keys"."id" = 21 | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:48.163449' WHERE "api_keys"."id" = 21
      293101 | discourse    |       293083 | discourse     | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:49.485074' WHERE "api_keys"."id" = 21 | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:48.163449' WHERE "api_keys"."id" = 21

As you can see, many requests are trying to update the last usage time of the same API key.

For now, we have increased the number of API keys that our application is using to reduce the probability of collision. I noticed that there is code in place that checks if the API key was modified in the last minute to avoid an update. However, I assume that since we are using more than one pod to process the requests, this protection is not effective.

I am unsure if we should report this as a bug or if there are any parameters to avoid this kind of collision (either in Discourse or PostgreSQL). Note that for us, the information related to the last usage is relevant, but a resolution of 1 day would be sufficient.

I would also like to know what the preferred approach is to manage PostgreSQL HA with auto recovery.

Thanks

Has anyone encountered a similar issue or can provide some guidance on what to investigate? Could it be possible that Discourse is not designed to handle this volume of requests per second?

Thank you in advance.