Postgresql locks with high concurrency API key usage

Joan_Tomàs_i_Buliart · July 25, 2024, 3:09pm

Hello Community,

We are using the Discourse API on a self-hosted installation. Our use case is quite intensive in terms of concurrency, with around 100 API requests per second on average. We are also using PostgreSQL+Patroni+HAProxy to provide a high-availability database setup. Occasionally, our PostgreSQL locks and Patroni restarts the master node.

We have implemented a cron job to check for blocked transactions, and every time this issue occurs, we find the same kind of operations:

 blocked_pid | blocked_user | blocking_pid | blocking_user |                                       blocked_statement                                        |                             current_statement_in_blocking_process                              
-------------+--------------+--------------+---------------+------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------
      297904 | discourse    |       293083 | discourse     | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:38:16.822352' WHERE "api_keys"."id" = 21 | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:48.163449' WHERE "api_keys"."id" = 21
      296718 | discourse    |       293083 | discourse     | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:50.900480' WHERE "api_keys"."id" = 21 | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:48.163449' WHERE "api_keys"."id" = 21
      293101 | discourse    |       293083 | discourse     | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:49.485074' WHERE "api_keys"."id" = 21 | UPDATE "api_keys" SET "last_used_at" = '2024-07-16 16:34:48.163449' WHERE "api_keys"."id" = 21

As you can see, many requests are trying to update the last usage time of the same API key.

For now, we have increased the number of API keys that our application is using to reduce the probability of collision. I noticed that there is code in place that checks if the API key was modified in the last minute to avoid an update. However, I assume that since we are using more than one pod to process the requests, this protection is not effective.

I am unsure if we should report this as a bug or if there are any parameters to avoid this kind of collision (either in Discourse or PostgreSQL). Note that for us, the information related to the last usage is relevant, but a resolution of 1 day would be sufficient.

I would also like to know what the preferred approach is to manage PostgreSQL HA with auto recovery.

Thanks

Joan_Tomàs_i_Buliart · September 2, 2024, 7:50am

Has anyone encountered a similar issue or can provide some guidance on what to investigate? Could it be possible that Discourse is not designed to handle this volume of requests per second?

Thank you in advance.

Topic		Replies	Views
Load test with Discourse Dev	19	3482	August 15, 2018
Upgrade failed with "duplicate key value violates unique constraint" error Installation	18	2388	March 30, 2020
Users reporting lots of 502 errors when attempting to post due to "max consecutive replies" check Bug	37	2784	January 18, 2019
Primary Postgres database process (postmaster) eating all CPU Installation server-resources	13	3986	May 28, 2020
Server Performance Issues Due to High CPU Usage Support	6	196	February 23, 2025

Postgresql locks with high concurrency API key usage

Related topics