Images are disappearing from S3!

(Stephen) #6

All of the posts with missing images are at least two or three days old. Do you have any form of object expiration configured on your bucket?

If you followed the S3 guide, the bucket isn’t being used as a cache; it’s where Discourse offloads the original uploads. Any expiration setting means a file gets offloaded and then, entirely independently of Discourse, deleted by a process at Amazon.
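One way to answer the expiration question is to look at the bucket’s lifecycle configuration for rules that have an Expiration action. The sketch below assumes the configuration has already been fetched as a dict shaped like the response of boto3’s S3 client `get_bucket_lifecycle_configuration`; the helper name `expiring_rules` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def expiring_rules(lifecycle_config: Dict[str, Any]) -> List[Tuple[str, str, Optional[int]]]:
    """Return (rule ID, key prefix, expiration days) for each enabled rule
    that has an Expiration action.

    Assumes `lifecycle_config` is shaped like the dict returned by boto3's
    S3 client `get_bucket_lifecycle_configuration` (hypothetical helper).
    """
    found = []
    for rule in lifecycle_config.get("Rules", []):
        if rule.get("Status") != "Enabled" or "Expiration" not in rule:
            continue
        # The prefix may sit in Filter, in Filter.And, or in the legacy top-level field.
        rule_filter = rule.get("Filter", {})
        prefix = (
            rule_filter.get("Prefix")
            or rule_filter.get("And", {}).get("Prefix")
            or rule.get("Prefix", "")
        )
        # Days is None when the rule uses a fixed Date (or only removes delete markers).
        found.append((rule.get("ID", "<unnamed>"), prefix, rule["Expiration"].get("Days")))
    return found


# Usage, assuming boto3 is installed and credentials are configured
# ("my-bucket" is a placeholder):
#   import boto3
#   config = boto3.client("s3").get_bucket_lifecycle_configuration(Bucket="my-bucket")
#   print(expiring_rules(config))
```

A rule whose filter has no prefix (and no tag) applies to the whole bucket, so it would expire the offloaded originals along with everything else. If the bucket has no lifecycle rules at all, boto3 raises a `ClientError` with the code `NoSuchLifecycleConfiguration` instead of returning an empty list.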

Why are you using a CDN and Cloudflare in addition to S3? Is MaxCDN providing the S3 bucket?

(David Kingham) #7

I’ve figured out that this is related to the tombstone: I found the images in a tombstone folder on S3. There was a lifecycle rule in my bucket called purge_tombstone that was moving the images out of the main folder. I have disabled the rule, but I’m wondering if a bug is causing this. @zogstrip @codinghorror, any thoughts?

(Jeff Atwood) #8

Additionally, and critically, were the images in “chat” or “Q&A” topics specific to these plugins you removed?

(David Kingham) #9

No, I wasn’t even using Q&A, and very few people used the chat. I did delete the groups that Babble left behind…

(Stephen) #10

How was the bucket created, and following what process?

(David Kingham) #11

I followed this guide and had no issues for months.

(David Kingham) #12

I was able to move all the images back from the tombstone folder and everything is running smoothly now. The question now becomes why were the images tombstoned in the first place and how do I avoid this in the future?

I found this similar post, but I still can’t put two and two together about what caused this.

(Sam Saffron) #13

Are you on the latest tests-passed?

(David Kingham) #14

Yes, I updated this morning.

(Sam Saffron) #15

Run a global rebake; at the very least, that will stop anything new from breaking.

Is Sidekiq crashing? Did you lose a bunch of jobs? What version were you running prior to the upgrade?

(Schrödinger's Corgi) #16

Hello David,

I have run into this problem twice before; the latest time was two days ago. It does not happen often on my website.

However, I upload my images to the hosting server rather than to S3.

I found that my “bad” writing habits might be causing the problem.

My “bad” writing habit:

I upload the images first, then cut the Markdown links out and paste them into a local editor, without closing the Discourse editor. Afterwards, I paste them back.

Why do I think my writing habits might be the cause?

Because the value I set for the following setting matched exactly how long it took for the images to disappear:
clean orphan uploads grace period hours

But I am not sure this is the cause. :thinking:

When I use only the Discourse editor, everything is fine.

What I did to fix it:

I deleted the topics, then re-uploaded the content.
I rebaked the posts.
I rebuilt the Discourse Docker instance.

The images are still there. Happy ending :sweat_smile:

(David Kingham) #17

Is it as simple as this?

cd /var/discourse
./launcher enter app
rake posts:rebake

I don’t think Sidekiq is crashing. I went to /sidekiq and it seems to be running, but I don’t know much about it. There are no dead jobs and 73 failures, but I don’t know over what time period.

It seems to me that something I did this morning caused the tombstone process to run. Is there any way I can identify what it was?

(Stephen) #18

That’s what we’re all trying to get to: something is happening because something changed.

But depending on the expiration policy, the change may not have happened today. It could have been n days ago, where n is whatever expiration period was set, so it could be 3 hours, 3 days or 3 weeks.

Even personal sites need some kind of change log; at the minimum, I’d recommend a thread in /staff. It’s the quickest way to pin down the source of such problems.
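The point about n is simple arithmetic: if an expiration rule never deletes an object sooner than n after it was written or moved, then whatever put the object on that path happened at least n before the deletion was noticed. A minimal sketch under that assumption (the function name is hypothetical, and the timestamp is illustrative):

```python
from datetime import datetime, timedelta


def latest_possible_change(noticed_missing: datetime, expiration: timedelta) -> datetime:
    """Latest moment the triggering change could have happened, assuming the
    expiration rule never deletes an object sooner than `expiration` after
    the object was written or moved."""
    return noticed_missing - expiration


noticed = datetime(2019, 5, 10, 12, 0)  # illustrative timestamp, not from this thread
for period in (timedelta(hours=3), timedelta(days=3), timedelta(weeks=3)):
    print(period, "->", latest_possible_change(noticed, period))
```

If deletions run later than scheduled, the real change can only be earlier than this bound, never later, which is why a written record of changes is the quickest way to narrow it down.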

(Sam Saffron) #19

@tgxworld and I isolated a condition where uploads were missing associations with posts.

For the last 5 or so years, a deferred Sidekiq job created the association saying that upload X belongs to post Y. This worked OK but has some pretty bad edge cases.

If for some reason the job was severely delayed or lost, the associations were never created. When the associations are missing, a regular job moves the uploads to the tombstone, and a second policy eventually deletes them from the tombstone.
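The two-stage cleanup can be modeled as a small state machine: live uploads without a post association move to the tombstone, and tombstoned uploads are deleted after a retention period. This is a simplified, hypothetical sketch, not Discourse’s actual job code; the `Upload` fields, the grace period and the retention parameter are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Upload:
    # Hypothetical, simplified fields for illustration only.
    created_at: datetime
    has_post_association: bool
    tombstoned_at: Optional[datetime] = None
    location: str = "live"  # "live", "tombstone" or "deleted"


def run_cleanup(upload: Upload, now: datetime,
                orphan_grace: timedelta, tombstone_retention: timedelta) -> str:
    """One pass of a simplified two-stage cleanup; returns the new location."""
    if upload.location == "live":
        # Stage 1: a regular job tombstones uploads that have no post association.
        if not upload.has_post_association and now - upload.created_at > orphan_grace:
            upload.location = "tombstone"
            upload.tombstoned_at = now
    elif upload.location == "tombstone" and upload.tombstoned_at is not None:
        # Stage 2: a second policy eventually deletes from the tombstone.
        if now - upload.tombstoned_at > tombstone_retention:
            upload.location = "deleted"
    return upload.location
```

In this model an upload that is really used by a post but whose association was never created looks exactly like an orphan, so it follows the same path to deletion.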

Our new process creates the association at the moment the post is saved, so this edge case can no longer happen. I recommended a rebake so that, if the job missed a few posts, the associations would still be created.
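To make the difference concrete, here is a toy model (hypothetical, not Discourse code): with the deferred flow, a lost background job leaves a post’s uploads unlinked; the at-save flow links them immediately; and re-processing every post, which is what the rebake is used for here, fills in any links that were missed.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class ToyForum:
    """Toy model of upload-to-post links (hypothetical, not Discourse code)."""

    def __init__(self) -> None:
        self.posts: Dict[int, List[int]] = {}     # post_id -> upload ids it embeds
        self.links: Set[Tuple[int, int]] = set()  # (upload_id, post_id)
        self.queue: Deque[int] = deque()          # pending background jobs

    def _link(self, post_id: int) -> None:
        for upload_id in self.posts[post_id]:
            self.links.add((upload_id, post_id))

    def save_deferred(self, post_id: int, upload_ids: List[int]) -> None:
        # Old flow: the link is created later by a background job.
        self.posts[post_id] = upload_ids
        self.queue.append(post_id)

    def run_jobs(self) -> None:
        while self.queue:
            self._link(self.queue.popleft())

    def save_inline(self, post_id: int, upload_ids: List[int]) -> None:
        # New flow: the link is created as the post is saved.
        self.posts[post_id] = upload_ids
        self._link(post_id)

    def rebake_all(self) -> None:
        # Re-processing every post recreates any links a lost job never made.
        for post_id in self.posts:
            self._link(post_id)


forum = ToyForum()
forum.save_deferred(1, [10])
forum.queue.clear()  # simulate a lost or never-run job
forum.save_inline(2, [20])
forum.rebake_all()
```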

We fixed this issue late last week, so there is a short transition period in place.

(David Kingham) #21

Thanks Sam. I ran the rebake and now I seem to be in worse shape: the images don’t even display a box where the image should be. Example: The Chamber of Wonders - Landscape Gallery - Nature Photographers Network

FYI, I manually moved all the files in the tombstone back to their proper locations rather than doing an rsync. That may have been the fatal flaw that broke the images. What are my options now?

(Sam Saffron) #22

Just copying files out of the tombstone is not enough to recover an image; by that point, the upload record has already been deleted.

How big is this problem on your site? How many posts are impacted? If it’s, say, 5 or fewer, the simplest thing to do is edit each post and re-upload the image from the tombstone.

(Alan Tan) #23

@davidkingham, you can run the following to find out how many uploads are broken.

./launcher enter app
rake uploads:list_broken_posts

(David Kingham) #24

That command didn’t work, but uploads:list_posts_with_broken_images shows that almost 300 of the images are broken now. The files were just copied back into the tombstone, so obviously the issue still remains after running the rebake. How would I properly run the rsync to copy the files back over without breaking them?

(Sam Saffron) #25

300 is a lot! Is there a pattern here? When was the first post with missing images posted? When was the last one?

We are going to work on a rake task that corrects this corruption, as long as the file exists either in the tombstone or in the expected location.
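The lookup step such a task needs, finding the file either at its expected location or in the tombstone, might look roughly like this. It is a hypothetical sketch, not the actual rake task: the function name, the key layout and the "tombstone/" prefix are assumptions. Finding the file is only half of the recovery, since the upload record also has to exist again.

```python
from typing import Iterable, Optional


def locate_original(expected_key: str, existing_keys: Iterable[str],
                    tombstone_prefix: str = "tombstone/") -> Optional[str]:
    """Return the key where the file for `expected_key` currently lives,
    checking the expected location first and then the tombstone, or None
    if it is in neither place. The tombstone prefix is an assumption."""
    keys = set(existing_keys)
    for candidate in (expected_key, tombstone_prefix + expected_key):
        if candidate in keys:
            return candidate
    return None
```

In practice `existing_keys` would come from listing the bucket (for example with a boto3 paginator over `list_objects_v2`); building a set once keeps each lookup cheap when checking hundreds of broken posts.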

(Alan Tan) #26

@davidkingham Can you try out the new rake task?

./launcher enter app
RECOVER_FROM_S3=1 rake uploads:list_posts_with_broken_images