Backups are duplicating and not respecting the number to keep on disk

This is somewhat hard to untangle… Search for directories called optimized and _optimized, then get the size of those directories.

/shared/uploads# find  | grep optimized$
./default/optimized
./default/_optimized
./tombstone/default/optimized
./tombstone/default/_optimized

So there you go: you would save 13 GB or so per backup if we did not back that data up (data that can be regenerated on demand).
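
Leaving those directories out at archive time could look something like the following Python sketch, which uses the standard library's tarfile filter hook. It is not how Discourse actually builds its backups: SKIP_DIRS, skip_optimized and archive_uploads are hypothetical names, and the sketch assumes the optimized thumbnails really can be regenerated after a restore.

```python
import os
import tarfile

# Hypothetical: directory names treated as regenerable thumbnails (from the find output above).
SKIP_DIRS = {"optimized", "_optimized"}


def skip_optimized(member: tarfile.TarInfo):
    """tarfile filter: drop any member whose path has an optimized/_optimized component."""
    if any(part in SKIP_DIRS for part in member.name.split("/")):
        # Returning None excludes the member; for a directory, tarfile then skips its contents too.
        return None
    return member


def archive_uploads(uploads_dir: str, tar_path: str) -> None:
    """Hypothetical helper: write an uncompressed tar of uploads_dir without the thumbnail dirs."""
    arcname = os.path.basename(os.path.normpath(uploads_dir))
    with tarfile.open(tar_path, "w") as tar:
        tar.add(uploads_dir, arcname=arcname, filter=skip_optimized)
```

Filtering at archive time means the thumbnails are never read or compressed, rather than being deleted from disk.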

Sure:

root@forum:/var/discourse/shared/standalone/uploads# find | grep optimized$
./tombstone/default/optimized
./default/optimized
root@forum:/var/discourse/shared/standalone/uploads# du -sh ./tombstone/default/optimized; du -sh ./default/optimized
4.5G    ./tombstone/default/optimized
14G     ./default/optimized
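
The find and du steps can also be done in one pass. This is a minimal Python sketch with hypothetical helpers (optimized_dirs, dir_size) that mirrors find | grep optimized$ and sums apparent file sizes, so its totals will differ slightly from du -sh, which counts disk blocks.

```python
import os


def dir_size(path: str) -> int:
    """Total apparent size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def optimized_dirs(uploads_root: str) -> dict:
    """Map every directory whose name ends in 'optimized' (like grep optimized$) to its size."""
    sizes = {}
    for root, dirs, _files in os.walk(uploads_root):
        for d in list(dirs):  # iterate over a copy so we can prune dirs in place
            if d.endswith("optimized"):
                full = os.path.join(root, d)
                sizes[full] = dir_size(full)
                dirs.remove(d)  # dir_size already walked it; don't descend again
    return sizes
```

Dividing a result by 1024**3 gives GiB, comparable to the G figures above.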

And yes, saving that 13GB would be lovely!

A backup appears to be running right now; I have a gzip process that started at 13:24 ET.

sam 30180 88.2 0.0 13232 2000 ? RN 13:24 0:07 gzip -5 /var/www/discourse/public/backups/default/quarter-to-three-forums-2019-02-07-181213-v20190130013015.tar

Again, I must stress that our backups are currently scheduled for 6:30 AM UTC, which is 1:30 AM ET. And yet they’re running at 13:24 ET, which is 18:24 UTC.

Why? Wish I knew, really do wish I knew.
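
The conversions in that post are plain arithmetic for February, when US Eastern is on standard time (UTC-5). This small sketch hard-codes the EST offset, an assumption that only holds outside daylight saving time; utc_to_est is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: February dates, so US Eastern is on standard time (EST = UTC-5).
EST = timezone(timedelta(hours=-5), "EST")


def utc_to_est(ts: datetime) -> datetime:
    """Convert an aware UTC datetime to the fixed UTC-5 (EST) zone."""
    return ts.astimezone(EST)


scheduled = datetime(2019, 2, 7, 6, 30, tzinfo=timezone.utc)  # configured backup time
observed = datetime(2019, 2, 7, 18, 24, tzinfo=timezone.utc)  # when gzip was seen running

print(utc_to_est(scheduled).strftime("%H:%M %Z"))  # 01:30 EST
print(utc_to_est(observed).strftime("%H:%M %Z"))   # 13:24 EST
```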

Had to kill it to fix notifications again.

From the log mailed to me: the backup started at 18:12 UTC. Why? Good question.

[2019-02-07 18:12:14] [STARTED]
[2019-02-07 18:12:14] 'system' has started the backup!
[2019-02-07 18:12:14] Marking backup as running...

Is there any possibility that Discourse is grabbing the timestamp from inside the Docker container and it’s wrong there?

No.

root@forum-app:/var/www/discourse/public/backups/default# date
Thu Feb 7 18:45:59 UTC 2019

Hmm, let’s see. From my instance in colocation, which uses the default backup time of 3:30:

coding-horror-discussion-2019-02-06-033852-v20190130013015.tar.gz
coding-horror-discussion-2019-01-24-033540-v20190110201340.tar.gz
coding-horror-discussion-2019-01-17-033114-v20190110201340.tar.gz

I read the times encoded in those filenames as:

03:38:52
03:35:40
03:31:14

Seems correct to me? I guess those are the times the backup finished, since it started around 3:30.
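
Pulling the timestamp out of a backup filename can be sketched like this. The filename pattern is inferred from the three examples above, not taken from Discourse’s source; BACKUP_RE and backup_timestamp are hypothetical names, and treating the stamp as UTC assumes the container clock is UTC, as the date output above shows.

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred from the examples: <site>-YYYY-MM-DD-HHMMSS-v<migration>.tar[.gz]
BACKUP_RE = re.compile(r"-(\d{4}-\d{2}-\d{2})-(\d{6})-v\d+\.tar(?:\.gz)?$")


def backup_timestamp(filename: str) -> datetime:
    """Return the timestamp encoded in a backup filename, tagged as UTC."""
    m = BACKUP_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognised backup filename: {filename}")
    stamp = datetime.strptime(m.group(1) + m.group(2), "%Y-%m-%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)  # assumes the container clock is UTC


print(backup_timestamp("coding-horror-discussion-2019-02-06-033852-v20190130013015.tar.gz"))
# 2019-02-06 03:38:52+00:00
```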

Are you absolutely 100% sure that the underlying operating system has UTC as the default timezone?

Yes. The backup I had to kill started at 181213, i.e. 18:12:13 UTC (6:12 PM UTC, 1:12 PM ET).

The host OS is in ET, not UTC. I mentioned that earlier in the thread. The docker container is in UTC, and it does have the correct time set (in UTC).

Does the host OS need to be in UTC for discourse? The install guides don’t say anything about that.

What kind of animal uses a server where the default time zone isn’t UTC?

I always use ET so I don’t have to worry about converting timezones in my head.

If Discourse needs the host OS to be in UTC, that really should be in the install guide.

And when you schedule backups in the UI, it says the time is “in UTC”.

And the Docker container is set to UTC and its time is correct, so I’m not sure why the host OS time zone would matter in the first place. The whole thing runs inside the container, so how would it even know?

My guess is you’ll need to offset that backup time. It’s just plain weird to have a server OS think of time in anything other than UTC, as God intended.

OK, I’ll make a note of when the next backup runs and offset the schedule accordingly. I do suggest you add this to the install guide.

Keep in mind that backups were scheduled for early morning UTC, yet they ran in the early afternoon ET. UTC is only 5 hours ahead of ET, so this isn’t a simple offset that can be calculated beforehand.
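
To put numbers on that, here is a rough sketch comparing the configured 06:30 UTC slot with the 18:12:14 start from the mailed log, assuming both refer to the same day.

```python
from datetime import datetime, timedelta

scheduled_utc = datetime(2019, 2, 7, 6, 30)     # configured backup time (UTC)
started_utc = datetime(2019, 2, 7, 18, 12, 14)  # "[STARTED]" line in the mailed log
et_offset = timedelta(hours=5)                  # EST is UTC-5 in February

gap = started_utc - scheduled_utc
print(gap)               # 11:42:14
print(gap == et_offset)  # False: the drift is not the UTC/ET offset
```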

Why does taking a backup break notifications for the duration? And before, it seemed like backups never finished; they just kept restarting. I’ll take closer note of what happens if backups fail again.

Any updates on this @gerhard? I’d really like to skip the double compression step because it is a) painful and b) pointless.

@wingtip we’re going to add the “skip retina thumbnail images in backups” setting soon, which should help. P.S. Switch your server to UTC already; this is sysadmin 101 stuff, man.

That would be helpful, thanks.

I work in IT, and putting servers in UTC is not a standard, or even a best practice, across the industry. It’s just your app that has a problem, if indeed that is the problem. If it is, I’ll make the change, of course.

Absolutely untrue. I’ve managed online environments globally for around 20 years and UTC has been a must from the outset.

The only places I’ve seen local timezones set are tinpot organisations who deal with customers in their region and can’t think beyond their immediate locality.

I’ll just leave this here

http://yellerapp.com/posts/2015-01-12-the-worst-server-setup-you-can-make.html

Nearly every mature engineering organization runs all their servers on UTC, and you should too.

Exactly. Why would you run a server on a local time zone if you have a global audience? Even for basic support, every user report would mean working out the difference between two time zones rather than just the user’s UTC offset.

Many of our global Fortune 100 customers don’t use UTC. But let’s not derail the topic.

Well, you’re the one complaining that you don’t like random dates and times on your server. If you want predictable dates and times on your server, use UTC.

Otherwise, your backup time can be affected by daylight saving time, and so on.
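
To see the daylight saving effect concretely, this sketch renders the same 06:30 UTC backup slot in Eastern time in February and in July. The EST/EDT offsets are hard-coded for illustration; which one applies on each date is an assumption baked into the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern, chosen by hand per date (assumption, no tz database lookup).
EST = timezone(timedelta(hours=-5), "EST")  # standard time, winter
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight time, summer

winter = datetime(2019, 2, 7, 6, 30, tzinfo=timezone.utc).astimezone(EST)
summer = datetime(2019, 7, 7, 6, 30, tzinfo=timezone.utc).astimezone(EDT)

print(winter.strftime("%H:%M %Z"))  # 01:30 EST
print(summer.strftime("%H:%M %Z"))  # 02:30 EDT
```

Real code would look the offset up with zoneinfo.ZoneInfo("America/New_York") instead of choosing it by hand; the point is that a fixed UTC schedule moves by an hour on the local clock twice a year.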

Absolutely. If that is causing our problem, I’ll make the change and reboot the host to be sure. And not to beat a dead :horse:, but it should be in the install docs!