Images are disappearing off of s3!

(David Kingham) #1

I have no idea what is going on but our images are disappearing more and more by the minute. It’s not a CDN issue because I can’t even find the files on s3 anymore. For example: Alpine Bliss - Landscape Gallery - Nature Photographers Network

It started off with one image missing this morning, now at least 10 have disappeared.

I am in panic mode, can anyone help?

(David Kingham) #2

Now up to 17. The only thing I changed this morning was removing the Babble plugin and the question/answer plugin.

(Stephen) #3

Were the images originally uploaded to these posts?

Can you share your versions of everything seen in /admin/upgrade?

(David Kingham) #4

Yes, they were displaying fine yesterday.

(David Kingham) #5

I also purged the cache on my CDN and Cloudflare after talking to MaxCDN about the first image disappearing. Now the problem seems to be multiplying.

(Stephen) #6

All of the posts with missing images are at least two or three days old. Do you have any form of object expiration configured on your bucket?

If you followed the S3 guide, the bucket isn’t being used as a cache; it’s the place Discourse offloads originals to. Any expiration setting means a file gets offloaded and is then deleted by a process at Amazon, entirely separately from Discourse.
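As a rough way to check for this, the sketch below lists the enabled rules in a lifecycle configuration that expire objects. It assumes the configuration has the dictionary shape that boto3's `get_bucket_lifecycle_configuration` returns; the rule names and values in `example_config` are made up for illustration and do not come from this thread.

```python
from __future__ import annotations

from typing import Any


def expiring_rules(lifecycle_config: dict[str, Any]) -> list[tuple[str, str, Any]]:
    """Return (rule id, prefix, days) for every enabled rule that expires objects."""
    found = []
    for rule in lifecycle_config.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue
        expiration = rule.get("Expiration")
        if not expiration:
            continue
        # Older configurations keep the prefix on the rule itself,
        # newer ones nest it under "Filter".
        prefix = rule.get("Filter", {}).get("Prefix", rule.get("Prefix", ""))
        found.append((rule.get("ID", ""), prefix, expiration.get("Days")))
    return found


# Hypothetical configuration, made up for illustration.
example_config = {
    "Rules": [
        {
            "ID": "expire-old-objects",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Expiration": {"Days": 3},
        },
        {
            "ID": "abort-incomplete-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
        {
            "ID": "disabled-rule",
            "Status": "Disabled",
            "Filter": {"Prefix": "logs/"},
            "Expiration": {"Days": 1},
        },
    ]
}

for rule_id, prefix, days in expiring_rules(example_config):
    print(f"rule {rule_id!r} expires objects under {prefix!r} after {days} days")
```

An empty prefix means the rule applies to the whole bucket, which is the case that would silently remove originals.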

Why are you using a cdn and cloudflare in addition to S3? Is maxcdn providing the S3 bucket?

(David Kingham) #7

I’ve figured out this is related to tombstone: I found the images in a tombstone folder on S3. There was a lifecycle rule in my bucket called purge_tombstone that was moving the images out of the main folder. I have disabled the rule, but I’m wondering whether a bug is causing this. @zogstrip @codinghorror any thoughts?

(Jeff Atwood) #8

Additionally, and critically, were the images in “chat” or “Q&A” topics specific to these plugins you removed?

(David Kingham) #9

No, I wasn’t even using Q&A and very few people used the chat. I did delete the groups that Babble left behind…

(Stephen) #10

How was the bucket created and following what process?

(David Kingham) #11

I followed this guide and had no issues for months

(David Kingham) #12

I was able to move all the images back from the tombstone folder and everything is running smoothly now. The question now becomes why were the images tombstoned in the first place and how do I avoid this in the future?
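For anyone repeating this, a dry-run mapping can help avoid mistakes when moving files back by hand. The sketch below only computes source and destination keys and does not touch the bucket. It assumes that tombstoned objects keep their original key with a `tombstone/` prefix in front, which this thread does not confirm; the example keys are hypothetical.

```python
from __future__ import annotations

# Assumption: tombstoned objects keep their original key, prefixed with the
# folder name seen in the thread. Verify this against your own bucket.
TOMBSTONE_PREFIX = "tombstone/"


def restore_key(tombstoned_key: str) -> str | None:
    """Map a key under the tombstone folder back to its original location.

    Returns None for keys that are not under the tombstone prefix, and for
    the bare prefix itself.
    """
    if not tombstoned_key.startswith(TOMBSTONE_PREFIX):
        return None
    restored = tombstoned_key[len(TOMBSTONE_PREFIX):]
    return restored or None


def plan_restore(keys: list[str]) -> list[tuple[str, str]]:
    """Build (source, destination) pairs without touching the bucket."""
    plan = []
    for key in keys:
        destination = restore_key(key)
        if destination is not None:
            plan.append((key, destination))
    return plan


# Hypothetical keys, made up for illustration.
example_keys = [
    "tombstone/original/1X/abc123.jpg",
    "original/1X/def456.jpg",  # not tombstoned, so it is skipped
    "tombstone/",              # the folder marker itself, also skipped
]

for source, destination in plan_restore(example_keys):
    print(f"{source} -> {destination}")
```

Reviewing the printed pairs before copying anything makes it easier to spot a file that would land in the wrong folder.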

I found this similar post, but I still can’t put two and two together for what caused this.

(Sam Saffron) #13

Are you on latest test passed?

(David Kingham) #14

Yes updated this morning

(Sam Saffron) #15

Run a global rebake; at least that will stop any new stuff from breaking.

Is sidekiq crashing, did you lose a bunch of jobs? What version were you running prior to upgrade?

(Schrödinger's Corgi) #16

Hello David,

I have run into this problem twice before. The latest time was two days ago. It does not happen often on my website.

But I upload my images to the hosting server rather than S3.

I found out that my “bad” writing habits might cause problems.

My “bad” writing habit:

I upload images first, then cut the markdown links out into my local editor. I don’t close the Discourse editor, though. Afterwards, I paste them back.

Why do i say that?

“I found out that my “bad” writing habits might cause problems.”

Because the time I set for the following setting exactly matches the time after which they disappeared:
clean orphan uploads grace period hours

But I am not sure whether this is the cause. :thinking:

When I use only the Discourse editor, everything is fine.

What I have done about this:

I deleted the topics, then re-uploaded the content.
I rebaked posts.
I rebuilt the Discourse Docker instance.

The images are still there. Happy ending :sweat_smile:

(David Kingham) #17

Simple as this?

cd /var/discourse
./launcher enter app
rake posts:rebake

I don’t think Sidekiq is crashing. I went to /sidekiq and it seems to be running, but I am ignorant on this. There are no dead jobs and 73 failures, but I don’t know over what timeline.

It seems to me that something I did this morning caused tombstone to run, is there any way I can identify that?

(Stephen) #18

That’s what we’re all trying to get to: something is happening because something changed.

But depending on the expiration policy it may not be a change today, it could be n days ago where n is whatever expiration period was set, so it could be 3 hours, 3 days or 3 weeks.
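To make the arithmetic concrete, here is a minimal sketch that subtracts a candidate expiration period from the time a file was first noticed missing. The timestamp is hypothetical, and the real period is whatever is configured on the bucket.

```python
from datetime import datetime, timedelta


def likely_change_time(noticed_missing_at: datetime, expiration: timedelta) -> datetime:
    """Estimate when the triggering change happened, given the expiration period."""
    return noticed_missing_at - expiration


# Hypothetical timestamp, made up for illustration.
noticed = datetime(2019, 1, 10, 9, 0)

candidates = {
    "3 hours": timedelta(hours=3),
    "3 days": timedelta(days=3),
    "3 weeks": timedelta(weeks=3),
}

for label, period in candidates.items():
    print(f"{label}: look for changes around {likely_change_time(noticed, period)}")
```

This gives a window to search in a change log rather than an exact answer, since the expiration process may not run at the precise moment a file becomes eligible.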

Even personal sites need some kind of schedule of change, I’d recommend a thread in /staff at the minimum. It’s the quickest way to pin down the source of such problems.

(Sam Saffron) #19

@tgxworld and I isolated a condition where uploads were missing associations with posts.

For the last 5 or so years we had a sidekiq deferred job create the association saying that upload X belongs to post Y. This has worked OK but has some pretty bad edge conditions.

If for some reason the job was mega delayed or lost, then the associations are not created. If the associations are missing, there is a regular job that moves the uploads to tombstone, and a second policy that eventually deletes them from tombstone.
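The edge case can be modelled roughly as follows. This is an illustrative sketch, not Discourse's actual job code: the `Upload` class, the function name, and the sample data are all hypothetical, and `grace_period_hours` stands in for the clean orphan uploads grace period hours setting mentioned earlier in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Upload:
    id: int
    created_at: datetime


def orphans_to_tombstone(
    uploads: list[Upload],
    referenced_ids: set[int],
    now: datetime,
    grace_period_hours: int,
) -> list[int]:
    """Return ids of uploads with no post association that are past the grace period."""
    cutoff = now - timedelta(hours=grace_period_hours)
    return [
        upload.id
        for upload in uploads
        if upload.id not in referenced_ids and upload.created_at < cutoff
    ]


# Hypothetical data, made up for illustration.
now = datetime(2019, 1, 10, 12, 0)
uploads = [
    Upload(id=1, created_at=datetime(2019, 1, 5, 12, 0)),   # old, association exists
    Upload(id=2, created_at=datetime(2019, 1, 5, 12, 0)),   # old, association job was lost
    Upload(id=3, created_at=datetime(2019, 1, 10, 11, 0)),  # new, still inside grace period
]
referenced_ids = {1}  # only upload 1 has an upload-to-post association

print(orphans_to_tombstone(uploads, referenced_ids, now, grace_period_hours=48))
```

Upload 2 is used in a post, but because its association was never recorded, the model treats it the same as a genuinely abandoned file.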

Our new process is to create the association just as you save the post. This means this edge case can not happen anymore. I recommended a rebake to make sure if the job missed on a few posts that we would still create the associations.

We fixed this issue late last week, so there is a short transition period in place.

(David Kingham) #21

Thanks Sam, I ran the rebake and now I seem to be in worse shape. Now the images don’t even display a box where the image should be, example: The Chamber of Wonders - Landscape Gallery - Nature Photographers Network

FYI, I manually moved all the files in tombstone to their proper locations rather than doing an rsync, which may have been the fatal flaw that broke the images. What are my options now?