Huge network traffic on NAS Storage

hnaseri · October 5, 2022, 3:55pm

I am hosting all of my upload files on a NAS Storage (glusterfs).

Recently I found that there is huge, constant network traffic on the NAS, and traced it down to Discourse requesting optimized images. Is there a job that constantly looks up these images? Why? And how can I turn it off?

hnaseri · October 5, 2022, 11:34pm

Btw, the clean up uploads site setting is disabled on my forum.

sam · October 6, 2022, 10:15am

Possibly the backfill @david added for looking up primary image color.

It will eventually finish and return to a steady state.

We need to walk all the images for the backfill. You may be able to work around it by forcing the color on all images to white or something.

hnaseri · October 6, 2022, 10:43am

As far as I see,

github.com

discourse/discourse/blob/d0243f741eaddf6004b7b740b9a2fc948e33cb08/app/jobs/scheduled/periodical_updates.rb#L51


      
      end

      offset = (SiteSetting.max_new_topics).to_i
      last_new_topic = Topic.order('created_at desc').offset(offset).select(:created_at).first
      if last_new_topic
        SiteSetting.min_new_topics_time = last_new_topic.created_at.to_i
      end

      Category.auto_bump_topic!

      Upload.backfill_dominant_colors!(25)

      nil
    end

  end

end

It works on 25 images every 15 minutes, yes? That should be negligible. I am seeing thousands of files being looked up every minute.
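To put the 25-per-run figure in perspective, the arithmetic can be sketched in shell. The 15-minute interval is the one assumed in the post above, and `backfill_rate` is a hypothetical helper, not part of Discourse.

```shell
# Hypothetical helper: images processed per hour for a given
# batch size and run interval (in minutes). Integer arithmetic only.
backfill_rate() {
  local batch=$1 interval_min=$2
  echo $(( batch * 60 / interval_min ))
}

backfill_rate 25 15   # 25 images every 15 minutes
```

Under that assumption the backfill touches about 100 images per hour, fewer than two per minute, which is far below thousands of lookups per minute.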

Also, looking at the bandwidth from 6 months ago, I see the same behaviour, so I think it must be something else.

However, I’m pretty sure it’s being done by a Discourse job or something similar, because when I stop the Discourse app, the bandwidth goes away. When I stop only the Discourse nginx service, the bandwidth remains.

sam · October 6, 2022, 12:07pm

Have a look in /sidekiq; it should tell you which jobs are running. Be sure to click all the tabs.

hnaseri · October 6, 2022, 12:51pm

No job is running. Are there other jobs that wouldn’t be listed here?

Or maybe there is something in the container that tries to index files?

Falco · October 6, 2022, 3:03pm

All our background logic happens in Sidekiq jobs. If no job is running and you still have high disk I/O, it may be users visiting your website and images being served by nginx?

Do you have a caching CDN fronting static assets?

hnaseri · October 6, 2022, 3:08pm

I tested this previously.

So it’s not because of users visiting the website. If it were, the traffic would have gone away when I stopped nginx.

Falco · October 6, 2022, 3:09pm

You will need to use the Linux inspection tools to see exactly which PIDs and syscalls are involved, then.
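One way to start on the PID side is a minimal sketch like the one below, which assumes a Linux `/proc` filesystem. It lists processes that currently hold an open file descriptor under a directory. `pids_with_open_files` is a hypothetical helper name, and the directory you pass in stands for wherever the uploads are mounted.

```shell
# Sketch (Linux only): print the PIDs that currently hold an open file
# descriptor under a directory, by reading the symlinks in /proc/<pid>/fd.
# Run as root to see other users' processes. "pids_with_open_files" is a
# hypothetical helper name.
pids_with_open_files() {
  local dir fd target pid
  dir=$(readlink -f "$1") || return 1
  for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      "$dir"|"$dir"/*)
        pid=${fd#/proc/}      # strip the leading "/proc/"
        echo "${pid%%/*}"     # keep only the PID component
        ;;
    esac
  done | sort -un
}
```

This only catches files and directories that are held open at the moment of the scan; short-lived metadata lookups will not show up. Paths are most likely reported as the process sees them, so it is probably simplest to run this inside the container if the uploads are mounted there. Once a PID is known, `strace -f -p <PID>` can show the syscalls it is making.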

hnaseri · October 6, 2022, 7:52pm

@Falco @sam I think I found the root cause.

First I restarted the Discourse app so that the constant traffic went away. Then I went to the admin panel, to the section for bulk reports. For a long time, reports have not displayed properly here.

Immediately after the reports time out, I see the jump in network bandwidth, and I see this error in the error logs:


'hijack admin/reports bulk ' is still running after 90 seconds on db default, this process may need to be restarted!

What is going wrong here?

Falco · October 6, 2022, 7:54pm

Is the database in the same NAS storage?

hnaseri · October 6, 2022, 7:54pm

No, the database is on a physical SSD.

Only the uploads folder is on the NAS.

Falco · October 6, 2022, 7:55pm

So there is no correlation between those. Back to

hnaseri · October 6, 2022, 8:00pm

In fact, I think there may be a correlation. In my test environment, it calculates the used space here.

I think calculating the used space on a NAS folder with a lot of files would be very time consuming, and that it is the root cause of the high bandwidth.

Am I right?

Falco · October 6, 2022, 10:30pm

Does running

df -Pk

df -P

du -s

take a significant amount of time on the network share?
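A minimal sketch for comparing the commands above without letting a slow one run indefinitely. It assumes GNU coreutils `timeout`, which exits with status 124 when the limit is hit. `time_with_limit` is a hypothetical helper name and `/path/to/uploads` is a placeholder for the NAS mount.

```shell
# Sketch: run a command with a time limit and print
# "<elapsed_seconds> <exit_status>".
# Assumes GNU coreutils `timeout` (exit status 124 means the limit was hit).
# "time_with_limit" is a hypothetical helper name.
time_with_limit() {
  local limit=$1 start end status=0
  shift
  start=$(date +%s)
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  end=$(date +%s)
  echo "$(( end - start )) $status"
}

# Example (commented out; /path/to/uploads is a placeholder):
# time_with_limit 60 df -P /path/to/uploads
# time_with_limit 60 du -s /path/to/uploads
```

A status of 124 on the `du -s` line, with `df` returning almost immediately, would suggest that the cost is in walking the directory tree and not in querying the filesystem totals.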

hnaseri · October 6, 2022, 10:56pm

These two were instant:

df -Pk

df -P

However, du -s resulted in behavior similar to what I reported above.

It ran for about 5 minutes without finishing, and I had to terminate it manually.

Falco · October 7, 2022, 1:14am

Oh I see. That report result is cached, but I guess it never finishes and can’t be cached because your network share is too slow.

hnaseri · October 7, 2022, 4:03am

So is there anything we can do to prevent this? For example, treat it like S3 uploads, where we don’t calculate disk size?
