Duplicate Uploaded Files

Hi!

We’re on 1.6.0.beta8 and recently experienced a rapid loss of disk space on our local server, as detailed here:

We’ve found that one user’s uploads take up 9GB of disk space, despite that user not uploading that many files (the file limits are left at their defaults etc).

Looking at the upload::original_filename values, it seems we have a repeating set of 14 files duplicated 1729 times, i.e. 24,206 new files. The files in that 14 file set are roughly the right size that, repeated that many times, they would explain the sudden loss of disk space.
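
A minimal sketch of how a count like this could be produced, assuming the upload records are already loaded into memory. The `upload_row` struct, the `duplicate_counts` helper and the sample rows are hypothetical names and data for illustration, not part of Discourse; the only thing assumed about a real `Upload` record is that it responds to `original_filename`, as the console output in this post shows.

```ruby
# Hypothetical stand-in for an Upload record; only the fields used below.
upload_row = Struct.new(:original_filename, :filesize)

# Count how often each original filename occurs and keep only the repeats.
def duplicate_counts(uploads)
  counts = uploads.each_with_object(Hash.new(0)) do |upload, acc|
    acc[upload.original_filename] += 1
  end
  counts.select { |_name, count| count > 1 }
end

# Illustrative data only: two filenames from the post plus one made-up name.
sample = [
  upload_row.new("RWRguide01_zps14rq1hk3.png", 580557),
  upload_row.new("RWRguide08_zps91rnnhqk.png", 272160),
  upload_row.new("RWRguide01_zps14rq1hk3.png", 580557),
  upload_row.new("RWRguide08_zps91rnnhqk.png", 272160),
  upload_row.new("RWRguide01_zps14rq1hk3.png", 580557),
  upload_row.new("unique_example.png", 100),
]

counts = duplicate_counts(sample)
counts.each { |name, count| puts "#{name}: #{count}" }

# Sanity check of the figure quoted above: 14 files repeated 1729 times.
expected_total = 14 * 1729
puts expected_total
```

In a Rails console the same helper could presumably be fed the user’s real records instead of the sample array; that part is an assumption and is not shown here.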

The duplicated original filenames also contain what looks like a hash, for example:

RWRGuide08_zps91rnnhqk.png

Here’s the tail of the output from looping over that user’s uploads, so you can see the pattern and the type of files we are seeing:

print "#{upload.original_filename}, #{upload.filesize}, "

guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 
595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085, guide01_zpsyrgnbijn.png, 322155, RWRguide01_zps14rq1hk3.png, 580557, RWRguide02_zpsopafqq9p.png, 489787, RWRguide03_zpsml0uvds9.png, 595934, RWRguide04_zpsbj5xz2re.png, 710352, RWRguide06_zps5rkrzzew.png, 264144, RWRguide07_zpsarqkybgy.png, 265842, RWRguide08_zps91rnnhqk.png, 272160, RWRguide09_zpslyunlou1.png, 262078, RWRguide10_zpsz2mxpowq.png, 291464, RWRguide11_zpsaj3yptns.png, 548966, RWRguide12_zpswbt1oldj.png, 680332, RWRguide13_zpsiik40wve.png, 307085,
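
As a rough check that this pattern accounts for the missing space, the file sizes from one cycle of the extract can be summed and multiplied by the repeat count. This is a sketch under two assumptions: every cycle has the same sizes as the ones printed here, and the count of 1729 repeats quoted earlier is right. Only 13 distinct filenames are visible in the extract, so the 14th file of the set is not included and the result is a lower bound.

```ruby
# Sizes in bytes of the 13 distinct files visible in one cycle of the extract.
cycle_sizes = [
  322155, 580557, 489787, 595934, 710352, 264144, 265842,
  272160, 262078, 291464, 548966, 680332, 307085,
]

# Repeat count quoted earlier in the post.
repeats = 1729

bytes_per_cycle = cycle_sizes.sum
total_bytes = bytes_per_cycle * repeats
total_gib = total_bytes / (1024.0**3)

puts "bytes per cycle: #{bytes_per_cycle}"
puts "estimated total: #{total_bytes} bytes (#{total_gib.round(2)} GiB)"
```

With these figures one cycle is 5,590,856 bytes and the total is 9,666,590,024 bytes, about 9.0 GiB, which is consistent with the 9GB reported above.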

Here are the details of one of those files:

[3] pry(main)> Upload.find_by_original_filename("RWRguide01_zps14rq1hk3.png")

=> #<Upload:0x007feb02bafe60
 id: 8490,
 user_id: 251,
 original_filename: "RWRguide01_zps14rq1hk3.png",
 filesize: 580557,
 width: 690,
 height: 388,
 url:
  "/uploads/default/original/2X/c/cef57cd38aff4a6288438fe8e2121f40856dbd28.png",
 created_at: Wed, 08 Jun 2016 19:33:25 UTC +00:00,
 updated_at: Wed, 08 Jun 2016 19:33:25 UTC +00:00,
 sha1: "cef57cd38aff4a6288438fe8e2121f40856dbd28",
 origin:
  "http://i1133.photobucket.com/albums/m600/Sryan1991/RWRguide01_zps14rq1hk3.png",
 retain_hours: nil>

Checking with PostUpload.find_by_upload_id(n), it looks like these uploads are orphaned. If they were created on June 08, are they still within the 30 day clean-up window? I won’t change anything for now, as it is probably better to determine the root cause first. We changed over from local_storage to S3_storage on 16 June, so I hope that doesn’t confuse the ‘deleted/orphaned’ clean-up code.
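
The orphan check described here can be sketched in plain Ruby, assuming an upload counts as orphaned when no post_upload row points at its id. The structs, the `orphaned_uploads` and `within_grace_period?` helpers, the referenced upload with id 1 and the value of `now` are all hypothetical and only for illustration; the 30 day window is the figure assumed in this post, not something confirmed from the Discourse source.

```ruby
require "set"

# Hypothetical stand-ins for Upload and PostUpload records.
upload_row = Struct.new(:id, :original_filename, :created_at)
post_upload_row = Struct.new(:post_id, :upload_id)

# Return the uploads that no post_upload row references.
def orphaned_uploads(uploads, post_uploads)
  referenced = post_uploads.map(&:upload_id).to_set
  uploads.reject { |upload| referenced.include?(upload.id) }
end

# True when the upload is younger than the grace period, given in days.
def within_grace_period?(upload, now, days)
  (now - upload.created_at) < days * 24 * 60 * 60
end

uploads = [
  # id and created_at taken from the console output in this post
  upload_row.new(8490, "RWRguide01_zps14rq1hk3.png", Time.utc(2016, 6, 8, 19, 33, 25)),
  # made-up upload that is still used by a post
  upload_row.new(1, "still_used_example.png", Time.utc(2016, 6, 1)),
]
post_uploads = [post_upload_row.new(10, 1)]

now = Time.utc(2016, 6, 20) # hypothetical "current" date for the example
orphans = orphaned_uploads(uploads, post_uploads)

orphans.each do |upload|
  puts "#{upload.id} orphaned, within 30 days: #{within_grace_period?(upload, now, 30)}"
end
```

Building the set of referenced ids once avoids one lookup per upload, which matters when there are tens of thousands of rows to check.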

How do we (a) clean up these files without losing content and (b) prevent this from happening again? The second part matters especially now that we have switched to S3 storage, so the uploads might keep growing somewhere where disk space isn’t an obvious problem.

Thanks for any help.

Sure, if you can give @zogstrip access to your site (via PM?), I am sure he can take a closer look to see what happened.


Sure thing - thanks. I’ll PM him. It’s looking more and more like a crazed ‘download to local image’ loop, but I’m keen to get help :slight_smile:


Here’s the fix. Turns out, the clean up upload job wasn’t working :frowning:

https://github.com/discourse/discourse/commit/e9a293beeb847a65bd39c08aaf919f1a8fef9404
