Today I was evaluating my discourse server and found two weird things.
1- If I understand correctly, the discourse postgres_data folder which is in /var/discourse/shared/standalone folder, is where discourse stores the database. Now this folder for my forum is about 8GB. Which I believe is too big for a humble forum. Can someone explain why its too big?
2- I have another folder named postgres_data_old that is also about 7GB. What is this for?
Also, my server memory was about 4GB. I found its mostly consumed. So I upgraded it to 8GB. Again I think a humble forum shouldn’t need that much of memory.
I’ll let a specialist give you an answer about that. I know that Discourse stores a lot of information to provide relevent statistics and a good search engine I guess. It may not be alarming.
You can run the following commands to see which tables are taking up the most disk space
./launcher enter app
su - postgres
psql discourse
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
This is the output of my forum for these commands. I think the first 5 rows are consuming too much space. I cant imagine why user_actions should be around 2GB. Or post timing about 1GB. Can you give me an idea what could be wrong?
On the other hand, is there any way to clean unnecessary data? For example maybe I can get rid of most of email_logs. I don’t send too many emails. I don’t know why this is so big.
Thank you very much. I found this option and changed it from 90 days to 10 days.
What about user_actions ? what is stored in this table that it has got so big? I have the same question about post_timings and topic_views too. The names indicate that these should be just a bunch of numbers. And shouldn’t take really this much space.
You can run a backup, download it, and analyze it locally. A pg_dump is just a text file that is human readable and will let you check what exactly is in each table.
I followed your suggestion and downloaded and extracted the backup. it was about 2GB. Is it normal that its 1/4 of what discourse reports?
Btw I realized that a huge amount of data is for excessive number of inactive users. Its more than 100k. Is there an automatic way for deleting all these users? they don’t have any post or other things that might break the process.
If there is no automatic way, If I remove them with api call, does it also clear all information related to them from database?
I looks like you did a restore recently and the data before the restore is backup-ed in the backup schema. If you’re certain that you no longer need to recover back to the previous state, you can drop the schema by running the following commands.
./launcher enter app
su - postgres
psql discourse
ALTER SCHEMA "backup" TO "backup-moved";
# Check that you site is still working and up to date
DROP SCHEMA "backup-moved" CASCADE;
J’ai pensé à raviver ce sujet, car il s’agit du même problème.
Notre dossier postgres_data de Discourse fait 75 Go, ce qui me semble énorme. Selon le panneau d’administration, une sauvegarde fait environ 10,5 Go et les uploads représentent environ 9,3 Go.
J’ai vérifié quelles tables occupaient le plus d’espace et voici ce que j’ai obtenu :
public.posts | 51 Go
public.post_search_data | 9769 Mo
public.post_timings | 3997 Mo
public.user_actions | 2144 Mo
public.post_custom_fields | 1039 Mo
public.topics | 676 Mo
public.post_stats | 663 Mo
public.post_replies | 643 Mo
public.quoted_posts | 523 Mo
public.user_visits | 476 Mo
public.top_topics | 403 Mo
public.user_auth_token_logs | 364 Mo
public.topic_links | 353 Mo
public.topic_users | 335 Mo
public.topic_views | 301 Mo
public.user_histories | 220 Mo
public.users | 209 Mo
public.stylesheet_cache | 194 Mo
public.directory_items | 143 Mo
public.notifications | 139 Mo
Je me demande s’il est normal que la table public.posts prenne autant de place (51 Go). Nous parlons d’un forum avec 6 millions de posts, ce qui ne me semble pas extraordinaire.
posts | 8847417 | 39 Go | 12 Go | 51 Go
post_search_data | 5880635 | 8377 Mo | 1392 Mo | 9769 Mo
post_timings | 23728606 | 1571 Mo | 2430 Mo | 4001 Mo
user_actions | 5424982 | 488 Mo | 1657 Mo | 2144 Mo
post_custom_fields | 5832468 | 429 Mo | 609 Mo | 1039 Mo
J’ai vu d’autres exemples où 10 millions de lignes pour la table posts se traduisent par environ 15 Go. Nous avons maintenant 8 millions de lignes avec une taille de 39 Go.
Il y a beaucoup de facteurs en jeu ici. Par exemple, les longs messages occupent plus d’espace que les courts. En tant que lusophone moi aussi, je sais à quel point notre langue peut être verbeuse. Je vois que vos données proviennent d’une importation. Certains artefacts, comme les messages avec des citations imbriquées sur 5 niveaux, ne sont pas courants dans Discourse mais se retrouvent sur votre site à cause de l’importation. Notre langue compte aussi : un ç occupe le double de l’espace d’un s.
Je pense que nous ne modifions pas le paramètre par défaut de PostgreSQL, et que la colonne posts.raw est gérée par TOAST et compressée.