Main Postgres database process (postmaster) eating all CPU

pfaffman · April 2, 2018, 10:12 PM

I’ve got a 2-container install on a DO 8GB droplet that is behaving very strangely.

There is a postmaster (EDIT: now there are two of them) process eating 100% CPU.
Sidekiq is running, but the Dashboard complains that it’s not checking for updates.

There are some logs like

  PG::ConnectionBad (FATAL: remaining connection slots are reserved for non-replication superuser connections ) /var/www/discourse/vendor/bundle/ruby/2.4.0/gems/pg-0.21.0/lib/pg.rb:56:in `initialize'

and

Job exception: FATAL: remaining connection slots are reserved for non-replication superuser connections

The data container has:

  db_shared_buffers: "2GB"
  db_work_mem: "40MB"

There are 4 unicorn workers in the web container (same as # processors).

Plugins:

          - git clone https://github.com/discourse/docker_manager.git
          #- git clone https://github.com/SumatoSoft/discourse-adplugin.git
          #- git clone https://github.com/davidcelis/new_relic-discourse.git
          - git clone https://github.com/discourse/discourse-cakeday.git
          - git clone https://github.com/ekkans/lrqdo-editor-plugin-discourse.git
          #- git clone https://github.com/davidtaylorhq/discourse-whos-online.git
          - git clone https://github.com/pmusaraj/discourse-onesignal.git

Memory:

KiB Mem :  8174936 total,   169976 free,  1288084 used,  6716876 buff/cache
KiB Swap:  2097148 total,  2094304 free,     2844 used.  4369992 avail Mem

mpalmer · April 3, 2018, 3:05 AM

The postgresql connection limit needs to be increased. That will cause the database as a whole to use more memory, but based on the free output you’ve got plenty that could be used if required. I’d double the current value, and review errors and resource consumption.
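
To get a feel for the memory trade-off mentioned above, here is a rough back-of-the-envelope sketch using the values from this topic (2GB shared_buffers, 40MB work_mem, about 8GB of RAM). The formula is only a crude rule of thumb, not how Postgres actually accounts for memory: it assumes every connection runs exactly one work_mem-sized operation at the same time. Idle connections use far less, and a single complex query can use several work_mem allocations. The function name `worst_case_mb` is made up for illustration.

```python
def worst_case_mb(shared_buffers_mb: int, work_mem_mb: int, max_connections: int) -> int:
    """Crude ceiling: shared buffers plus one work_mem allocation per connection."""
    return shared_buffers_mb + work_mem_mb * max_connections


# "KiB Mem : 8174936 total" from the top output in the first post, in MiB.
TOTAL_RAM_MB = 8174936 // 1024

if __name__ == "__main__":
    for connections in (100, 200):
        estimate = worst_case_mb(2048, 40, connections)
        print(f"max_connections={connections}: ~{estimate} MB ceiling "
              f"(machine has {TOTAL_RAM_MB} MB)")
```

The ceiling grows linearly with max_connections, which is why it is worth watching resource consumption after raising the limit rather than assuming the extra connections are free.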

pfaffman · April 3, 2018, 3:36 AM

Uh. Where is that changed?

You mean this?

  db_work_mem: "80MB"

I did that, but I’m still getting a 502 error on the admin dashboard.

The other issue is that this site is using cloudflare with no caching (I’m told). I have included the cloudflare template, but I still suspect something is wrong with cloudflare.

mpalmer · April 3, 2018, 5:57 AM

It’s the max_connections parameter in postgresql.conf. I don’t see a tunable for that in discourse_docker, so I suspect you’ll need to play games with a pups exec stanza to make the edit.

As for Cloudflare, all the cloudflare template does is make it so that IP addresses get fixed after going through Cloudflare proxying. It doesn’t do anything to make Cloudflare cache. You might want to keep that in a separate topic, rather than mix them together in here.

pfaffman · April 3, 2018, 8:33 PM

Not one for playing games when they’re not necessary, I went into the data container, edited postgresql.conf by hand, doubled max_connections (from 100 to 200) and, LO! it seems that all is well.

I don’t understand just why I’ve not encountered this before or why this is the solution here. The database doesn’t seem that big and the traffic doesn’t seem that high.

Edit: I have played the games and won!

If anyone else cares. . . stick this in data.yml in hooks in the after_postgres section. I put it after the -exec section.

    # double max_connections to 200
    - replace:
        filename: "/etc/postgresql/9.5/main/postgresql.conf"
        from: /#?max_connections *=.*/
        to: "max_connections = 200"
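
For anyone who wants to check what that pattern matches before rebuilding, here is a minimal Python sketch that applies the same regular expression to a sample config string. It only illustrates the regex; it is not how pups implements `replace`. The helper name `set_max_connections` and the sample text are made up, and replacing only the first match is an assumption about the default behavior.

```python
import re

# Same pattern as the `from:` value in the stanza above.
MAX_CONNECTIONS_RE = re.compile(r"#?max_connections *=.*")


def set_max_connections(conf_text: str, value: int) -> str:
    """Return conf_text with the first max_connections line rewritten.

    Assumption: only the first match is replaced (count=1).
    """
    return MAX_CONNECTIONS_RE.sub(f"max_connections = {value}", conf_text, count=1)


# Hypothetical excerpt of postgresql.conf, for illustration only.
sample = (
    "port = 5432\n"
    "max_connections = 100\t\t\t# (change requires restart)\n"
    "shared_buffers = 2GB\n"
)

if __name__ == "__main__":
    print(set_max_connections(sample, 200))
```

Note that `.*` runs to the end of the line, so any trailing comment on the original line is dropped, and the optional `#` means a commented-out setting gets uncommented as well.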

markersocial · September 28, 2019, 3:58 PM

Sorry for reviving an old topic.

@pfaffman did this solve the high CPU usage caused by

pfaffman · September 28, 2019, 4:32 PM

The problem seems to have gone away.

I tried giving postgres both more and less memory. Adding swap seems to have helped (which is what prompted the idea of giving pg less memory). One thing I did that may have helped was backing up the database and restoring it. Or maybe it made no difference.

I don’t have a magic bullet, but those are the steps I took.

eboehnisch · May 27, 2020, 4:42 PM

This started happening to me too after installing the update to 2.5.0.beta5. One after another, I see more postmaster processes that eat as much CPU as they can get and sometimes take a few minutes to complete. This slowly burns through all of the server’s AWS credits and makes the whole forum slow or even unusable.

Increasing max_connections had no effect, and neither did rebuilding the app.

Before updating to 2.5.0.beta5 I had never seen this. Do you have any hint where I should look?

RobinTS · May 27, 2020, 10:47 PM

We updated the forum to 2.5.0.beta5 yesterday, and since then it has been slow and unresponsive. There are a few postmaster jobs in top using 90-100% CPU. This causes many parts of the forum to time out and return a 502 to users.

These jobs come and go, but while they are active the forum is pretty much unusable.

codinghorror · May 27, 2020, 10:55 PM

Isn’t this the finishing steps of the PostgreSQL 12 upgrade? I believe there is some internal cleanup it has to do after migrating from PG10 to PG12. Does this persist for a day or more?

RobinTS · May 28, 2020, 12:46 AM

It’s been 13 hours so far.

Also, to confirm: I did go from version 10 to 12 (I know you can optionally stay on 10, so I just wanted to clarify).

I don’t know whether this is related, but going to a user’s summary consistently pushes CPU usage above 90% and always ends in a 502 error. Other sections of the profile seem to work, albeit slowly.

I’ll keep an eye on things throughout the day to see whether they improve, and update this topic as needed.

codinghorror · May 28, 2020, 1:36 AM

Some post-migration cleanup may be needed. If you check the official upgrade topic and read the first post carefully, you’ll find the details and recommended steps: PostgreSQL 12 update

markersocial · May 28, 2020, 6:00 AM

Just a heads up, I ran into the same problem and resolved it by doing the following:

RobinTS · May 28, 2020, 4:20 PM

Thank you @codinghorror and @markersocial for the instructions. It’s been over a day and things seem to be back to normal. I didn’t do anything other than wait.

I’ll keep monitoring and see whether any more 502 errors show up (it may just be that there are fewer users during off-peak hours).

If the problem comes back, I’ll try the steps you mentioned.
