لماذا حجم discourse postgres_data كبير جدًا؟

Today I was evaluating my discourse server and found two weird things.

1- If I understand correctly, the discourse postgres_data folder which is in /var/discourse/shared/standalone folder, is where discourse stores the database. Now this folder for my forum is about 8GB. Which I believe is too big for a humble forum. Can someone explain why its too big?

2- I have another folder named postgres_data_old that is also about 7GB. What is this for?

Also, my server memory was about 4GB. I found its mostly consumed. So I upgraded it to 8GB. Again I think a humble forum shouldn’t need that much of memory.

There was a postgres update recently, there is a old folder in case something went wrong.

If your forum works fine, you can execute this command

cd /var/discourse
./launcher cleanup app

It should clean the old postgres data folder.

For the memory, DIscourse works this way, it uses the most memory it can, you don’t have to worry about that.

If you want it to use less memory, you can change the db_shared_buffers in the app.yml (command : nano containers/app.yml)

what about the big database? Is it something wrong with my forum? I think its way too larger than it should be.

How many posts do you have ?

I’ll let a specialist give you an answer about that. I know that Discourse stores a lot of information to provide relevent statistics and a good search engine I guess. It may not be alarming.

You can run the following commands to see which tables are taking up the most disk space

./launcher enter app
su - postgres
psql discourse
SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 20;

This is the output of my forum for these commands. I think the first 5 rows are consuming too much space. I cant imagine why user_actions should be around 2GB. Or post timing about 1GB. Can you give me an idea what could be wrong?

On the other hand, is there any way to clean unnecessary data? For example maybe I can get rid of most of email_logs. I don’t send too many emails. I don’t know why this is so big.

 public.user_actions     | 1792 MB
 public.email_logs       | 1293 MB
 public.post_timings     | 731 MB
 public.directory_items  | 456 MB
 public.topic_views      | 446 MB
 public.posts            | 380 MB
 public.post_search_data | 298 MB
 public.notifications    | 170 MB
 backup.topic_views      | 156 MB
 public.post_actions     | 155 MB
 public.user_histories   | 134 MB
 backup.directory_items  | 123 MB
 public.user_auth_tokens | 110 MB
 public.users            | 91 MB
 backup.user_auth_tokens | 87 MB
 public.user_visits      | 77 MB
 backup.posts            | 68 MB
 backup.post_timings     | 65 MB
 backup.post_search_data | 64 MB
 public.optimized_images | 63 MB

Did you do an upgrade recently?

I’ve seen a commit about cleaning the email logs table. After the rebuild, it may be lighter

Also, you might check the setting delete email logs after days, it should be safer than deleting it manually

Thank you very much. I found this option and changed it from 90 days to 10 days.

What about user_actions ? what is stored in this table that it has got so big? I have the same question about post_timings and topic_views too. The names indicate that these should be just a bunch of numbers. And shouldn’t take really this much space.

You can run a backup, download it, and analyze it locally. A pg_dump is just a text file that is human readable and will let you check what exactly is in each table.

I followed your suggestion and downloaded and extracted the backup. it was about 2GB. Is it normal that its 1/4 of what discourse reports?

Btw I realized that a huge amount of data is for excessive number of inactive users. Its more than 100k. Is there an automatic way for deleting all these users? they don’t have any post or other things that might break the process.

If there is no automatic way, If I remove them with api call, does it also clear all information related to them from database?

2GB was the size of the compressed backup? Also backups don’t include indexes, and they take a lot of space.

An inactive user without posts or likes is just a single line in the users table. Are you sure all the space is coming from inactive users?

Did your forum came in from an import? Maybe the import created some bad data?

No. Its the size of extracted backup. The size of compressed backup is about 300MB

backup.topic_views

I looks like you did a restore recently and the data before the restore is backup-ed in the backup schema. If you’re certain that you no longer need to recover back to the previous state, you can drop the schema by running the following commands.

./launcher enter app
su - postgres
psql discourse
ALTER SCHEMA "backup" TO "backup-moved";
# Check that you site is still working and up to date
DROP SCHEMA "backup-moved" CASCADE;

فكرت في إحياء هذا الموضوع، حيث أنه يتعلق بنفس المشكلة.

مجلد postgres_data الخاص بـ discourse لدينا بحجم 75 جيجابايت، وهو ما أعتقد أنه كبير جدًا. وفقًا لوحدة التحكم في الإدارة، تبلغ نسخة الاحتياطية حوالي 10.5 جيجابايت، وتستهلك التحميلات حوالي 9.3 جيجابايت.

لقد تحققت من الجداول التي تستهلك مساحة أكبر، وهذه هي النتيجة:

 public.posts                | 51 جيجابايت
 public.post_search_data     | 9769 ميجابايت
 public.post_timings         | 3997 ميجابايت
 public.user_actions         | 2144 ميجابايت
 public.post_custom_fields   | 1039 ميجابايت
 public.topics               | 676 ميجابايت
 public.post_stats           | 663 ميجابايت
 public.post_replies         | 643 ميجابايت
 public.quoted_posts         | 523 ميجابايت
 public.user_visits          | 476 ميجابايت
 public.top_topics           | 403 ميجابايت
 public.user_auth_token_logs | 364 ميجابايت
 public.topic_links          | 353 ميجابايت
 public.topic_users          | 335 ميجابايت
 public.topic_views          | 301 ميجابايت
 public.user_histories       | 220 ميجابايت
 public.users                | 209 ميجابايت
 public.stylesheet_cache     | 194 ميجابايت
 public.directory_items      | 143 ميجابايت
 public.notifications        | 139 ميجابايت

أتساءل عما إذا كان من الطبيعي أن يستهلك جدول public.posts مساحة بهذا الحجم (51 جيجابايت)، فنحن نتحدث عن منتدى يحتوي على 6 ملايين منشور، وهو ما لا أراه أمرًا استثنائيًا.

هل هذا طبيعي؟

إكمال المعلومات أعلاه باستخدام rake db:stats:

table_name | row_estimate | table_size | index_size | total_size

posts | 8847417 | 39 GB | 12 GB | 51 GB
post_search_data | 5880635 | 8377 MB | 1392 MB | 9769 MB
post_timings | 23728606 | 1571 MB | 2430 MB | 4001 MB
user_actions | 5424982 | 488 MB | 1657 MB | 2144 MB
post_custom_fields | 5832468 | 429 MB | 609 MB | 1039 MB

لقد رأيت أمثلة أخرى حيث تؤدي 10 ملايين صف في جدول posts إلى حجم يقارب 15 جيجابايت فقط. لدينا الآن 8 ملايين صف بحجم 39 جيجابايت.

هل هناك طريقة لتحسين ذلك؟

هناك العديد من العوامل المؤثرة هنا. على سبيل المثال، تستهلك المنشورات الطويلة مساحة أكثر من المنشورات القصيرة. وكزميل ناطق بالبرتغالية، أعرف كم يمكن أن تكون لغتنا مفرطة في التفصيل. أرى أن بياناتك قادمة من عملية استيراد. بعض العناصر غير الشائعة في نظام Discourse، مثل المنشورات التي تحتوي على اقتباسات بعمق 5 مستويات، موجودة في موقعك بسبب عملية الاستيراد. كما أن لغتنا تؤثر أيضًا؛ فالحرف ç يستغرق ضعف المساحة التي يستغرقها الحرف s.

أعتقد أننا لا نغير الإعداد الافتراضي لـ PostgreSQL، وأن عمود posts.raw ينتقل إلى TOAST ويتم ضغطه.

شكرًا لك، لقد نجح ذلك. أعتقد أن هذا السطر قد ينقصه RENAME على أي حال. لقد قمت بذلك على النحو التالي:

ALTER SCHEMA "backup" RENAME TO "backup-moved";