Our Discourse site has more than 10k users, with about 700 daily active users, and users submit an average of 10k posts per day. Our community also gets more than 160k pageviews per day, including crawlers and anonymous users. Most of our users connect from mobile devices.
We run the community as a standalone setup on a single VPS with 16 CPU cores and 24 GB of RAM, and we have configured app.yml with the following values:
With the settings above, some users have reported that the site loads slowly for them, and sometimes when submitting a post the screen goes blank (with the header still visible). The site also slows down at times during peak hours.
Please let us know whether there is a mistake in the settings or whether we need more resources.
Thank you very much
10k posts per day is pretty high relative to the number of pageviews. I can imagine that you are hitting some resource limits given your configuration, and I suspect the bottleneck will be the database. You could try moving to a multi-container setup, effectively offloading the unicorn workers from the main server.
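To put that ratio in perspective, here is a rough back-of-the-envelope calculation using only the figures quoted in the original post. It assumes load is spread evenly over the day, which it is not (peak hours will be well above these averages), so treat the per-second rates as a lower bound.

```python
# Figures quoted in the original post.
POSTS_PER_DAY = 10_000
PAGEVIEWS_PER_DAY = 160_000
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events_per_day: float) -> float:
    """Average rate per second, assuming an even spread over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


# How many pageviews the site serves for every post that is written.
pageviews_per_post = PAGEVIEWS_PER_DAY / POSTS_PER_DAY

print(f"pageviews per post:  {pageviews_per_post:.0f}")
print(f"average posts/s:     {per_second(POSTS_PER_DAY):.3f}")
print(f"average pageviews/s: {per_second(PAGEVIEWS_PER_DAY):.2f}")
```

Roughly one write for every sixteen reads is a write-heavy workload for a forum, which is why the database is the first suspect.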
It could very well be, but I would first try to scale anything that can scale horizontally, horizontally. It will also give you a much better idea of where your bottleneck is.
Most performance issues can be solved by simply throwing more resources at the problem. The hard part is doing that in a smart way so you can save some bucks (or potentially a lot of money).
Thanks for your expertise and friendly advice; I will definitely read up on how to implement this. Another question I have is what settings should be applied in the app configuration for the specifications I mentioned above (24 CPU cores with 32 GB of RAM).
Are the current settings appropriate or is it better to increase the values?
Hard to say without inspecting the system and seeing what is going on.
Since you said most issues happen when submitting a post, the problem is probably with database writes. I don’t think increasing shared buffers any further will help much, but you could try. I’ve seen it cranked up to over 50% of memory (against all advice), so you could gradually raise it to as much as 12 GB.
If you’re not seeing 502s, then there is no use in increasing UNICORN_WORKERS either.
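As a minimal sketch of what "gradually raise it" could look like, the snippet below lists candidate values to try one at a time for the `db_shared_buffers` setting in app.yml. The 24 GB of RAM and the 12 GB ceiling come from this thread. The 25% starting point is the general starting value suggested in the PostgreSQL documentation, and the 2 GB step size is an arbitrary assumption; the thread does not say what the current value is.

```python
from typing import List


def shared_buffer_steps(total_ram_gb: int,
                        start_fraction: float = 0.25,  # assumed starting point
                        max_fraction: float = 0.50,    # ceiling mentioned in the thread
                        step_gb: int = 2) -> List[str]:  # assumed step size
    """Return candidate shared-buffer sizes, from start to ceiling inclusive."""
    start = int(total_ram_gb * start_fraction)
    ceiling = int(total_ram_gb * max_fraction)
    return [f"{gb}GB" for gb in range(start, ceiling + 1, step_gb)]


# 24 GB of RAM, as in the original post.
for value in shared_buffer_steps(24):
    print(f'db_shared_buffers: "{value}"')
```

Each step requires rebuilding the container, so it is worth watching write latency for a full peak period before moving to the next value.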
You don’t mention using one, so the first thing I would do is add a CDN. This would take a lot of weight off the VPS, as the bigger requests would not touch the server.
In addition to the CDN, I would also go with S3-like storage, which would let you scale storage and VPS resources independently (if your community is upload-heavy).
These recommendations help a lot to reduce load, and the price increase is much lower than that of a bigger VPS.
Thanks @marianord. Unfortunately, we do not use a CDN. The upload rate in our forum is not very high; most of the time, users just discuss various subjects. For example, in the last year we had about 2.8 million posts and 2.7 million likes, but only 25 GB of files were uploaded.
Based on the information I gave, do you think that using a CDN like S3 will reduce our server load?
I disagree with @marianord. I don’t think that a CDN would make a noticeable difference with respect to the load on your server.
These are just static files and they are not heavy to serve at all.
S3 offloads the files and backups to another server that is managed by a cloud provider (a really high-level summary).
A CDN caches the static files of your server (images, JS, CSS) and serves them from multiple servers (PoPs) around the globe to speed up the loading of these assets.
At least that is my experience: you’re reducing the number of requests that reach your server, and therefore reducing the load. It’s a lot easier to serve only 10 JSON requests per user than to serve 100 requests per user.
Maybe this will not solve all the issues @nildarar is facing, but it will relieve the server of the heavier load by removing all the static requests (the cached ones) from the Discourse server.
A request for a static file does not have a large impact on the overall server load. The requests for dynamic content are the heavy ones.
In general, a JSON request is not a static asset that will be cached by a CDN. It is dynamic content that is generated on the fly. Why are you talking about JSON files in a CDN context?
static requests != heavier load.
I am sorry but this is really bad advice.
This is an example from a 6-CPU machine (so CPU adds up to 600%) running Discourse without a CDN or S3.
You can see that nginx is only responsible for 6.7% (so that is roughly 1/100 of the capacity). Only a part of that is used for static assets.
If we were to offload the static assets to S3 and/or a CDN it would reduce the overall server load by less than one percent.
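The arithmetic behind that estimate can be written out as a short sketch. It uses the two numbers from the example above (6.7% CPU for nginx on a 6-CPU machine) and the convention that each core counts as 100%.

```python
def share_of_total_capacity(process_cpu_percent: float, cores: int) -> float:
    """Fraction of the whole machine's CPU capacity used by one process.

    Assumes top-style reporting, where each core contributes 100%.
    """
    return process_cpu_percent / (cores * 100.0)


# nginx at 6.7% on a 6-CPU machine (600% total), as in the example above.
nginx_share = share_of_total_capacity(6.7, cores=6)
print(f"nginx uses {nginx_share:.1%} of total CPU capacity")
```

Since static assets account for only part of that nginx share, the saving from offloading them would be smaller still.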
True, but Discourse has a few exceptions, like stylesheets that are served by Ruby, so having a caching CDN means those requests don’t use up the unicorn processes.
Regarding the OP’s problem, the first thing needed is to have someone knowledgeable do some performance analysis during peak hours and identify the current bottleneck.
Thanks for your guidance. Until a few months ago we used the Cloudflare CDN service and made useful changes to how static content was handled through page rules. After that, I read somewhere that using proxies like Cloudflare drastically reduces the performance of Discourse, so we disabled it.
Yesterday we increased the CPU cores from 16 to 24 and made the following changes in app.yml:
With these changes, our problem was temporarily solved, but I think we should make a fundamental change for the coming months.
So, according to your recommendations, using a CDN to serve static content and splitting Discourse into two separate containers have a higher priority among performance improvements.
This may be older information, but I do recall reading that Discourse prefers a lower number of more powerful CPUs over a higher number of lower-powered ones, even if you update the number of unicorn workers.
@codinghorror, can you confirm if this information is still accurate?
Using something like Prometheus + Grafana can help you collect historical data, rather than only watching it live, and then do some deeper analysis of what is happening.
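As a rough illustration of pulling historical data out of Prometheus, the sketch below builds a request URL for its `/api/v1/query_range` HTTP endpoint. The server address, the timestamps, and the PromQL expression are all assumptions for the example: the query relies on the `node_cpu_seconds_total` metric, which is only available if node_exporter is running on the host, and nothing here is specific to Discourse.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url: str, promql: str,
                          start: int, end: int, step_seconds: int) -> str:
    """Build a Prometheus range-query URL for the given Unix time window."""
    params = urlencode({
        "query": promql,
        "start": start,        # Unix timestamp, seconds
        "end": end,            # Unix timestamp, seconds
        "step": step_seconds,  # resolution of the returned series
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Fraction of CPU time spent non-idle, averaged over all cores.
# Assumes node_exporter is installed and scraped by Prometheus.
CPU_BUSY = '1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))'

end = 1_700_000_000          # hypothetical end of the window
start = end - 24 * 60 * 60   # the 24 hours before it
url = build_range_query_url("http://localhost:9090", CPU_BUSY, start, end, 60)
print(url)
```

Fetching that URL returns a JSON time series that can be lined up against the hours when users report slow loads, which is the kind of peak-hour comparison suggested earlier in the thread.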
Hello again
Following your advice, we installed Prometheus and monitored the community's performance for a while. Please take a look at the reports below and compare them with the values you see on other installations.