Restore runs out of disk space during migration because of 70M calendar events

I’ve got a standard install that’s trying to restore a database. It is failing on the migration.


ALTER TABLE
Migrating the database...
EXCEPTION: rake db:migrate
Failed to migrate database.
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled: (StandardError)

PG::DiskFull: ERROR:  could not write to file "base/pgsql_tmp/pgsql_tmp11009.51": No space left on device
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rack-mini-profiler-4.0.1/lib/patches/db/pg/alias_method.rb:109:in `exec'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rack-mini-profiler-4.0.1/lib/patches/db/pg/alias_method.rb:109:in `async_exec'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.4/lib/active_record/connection_adapters/postgresql/database_statements.rb:167:in `perform_query'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.4/lib/active_record/connection_adapters/abstract/database_statements.rb:556:in `block (2 levels) in raw_execute'
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-8.0.4/lib/active_record/connection_adapters/abstract_adapter.rb:1017:in `block in with_raw_connection'

There’s 90GB free on the disk. On the source server, the postgres directory is only 23GB.

How is a 23GB database failing to restore during database migration with 90GB free?

Source server version: 3.5.0.beta5-dev (Commit: b16fb6a60b3f1db475cbb91a51b7d4c734370083 — May 7, 2025)

Destination server version: 2026.2.0-latest (Commit: b39866eb8891648a54764755e2e36eb725bd6c73 — 4 days ago)

23G     /var/discourse/shared/standalone/postgres_data/
-rw-r--r-- 1 discourse discourse  16G Feb 10 21:13 site-2026-02-10-174058-v20250507013646.sql
-rw-r--r-- 1 discourse discourse 2.9G Feb 10 21:11 site-2026-02-10-174058-v20250507013646.sql.gz
root@forum-data:/shared# # after delete
root@forum-data:/shared# du -hs postgres_data/; df -h .
24G     postgres_data/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       154G   87G   68G  56% /shared
....
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       154G  154G  607M 100% /shared
overlay         154G  154G  607M 100% /
91G     postgres_data/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       154G  154G   82M 100% /shared
overlay         154G  154G   82M 100% /
1.1G    postgres_data/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       154G   65G   90G  42% /shared
overlay         154G   65G   90G  42% /

After the database had restored, postgres_data was 24G.

Something in the migration caused postgres_data to balloon to 91G before the disk filled and the restore failed. This is on a clean install with the Discourse versions shown above.

This seems like it must be a bug, so I’m recategorizing. I’ve also seen it when trying to upgrade; I’d thought the solution for that upgrade would be to move to a new server, but now I see that won’t work either.

I guess the next thing to do will be to try to see what query is causing this and/or what table it is. I’m not quite sure how to go about that.


I’m putting this in support. This message is coming straight from the file system: if PG thinks it has no space, it probably has no space because not enough is carved out for it.

Do you have any hints how a 24G database could expand to 90G during database migration?

Not sure… I would probably run ncdu or something and have a look when the situation happens.

Using du and df, I watched the size of postgres_data grow from 25G to 90G and the free space on the disk drop to (near) zero before it failed.

I guess I need to find a way to track which query is running the next time.
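Probably something like this, from a psql session in the data container (a sketch; it just lists what’s active and which tables are biggest, nothing Discourse-specific):

-- What is running right now, and for how long
SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active';

-- Which tables (including their indexes and TOAST data) take the most space
SELECT n.nspname || '.' || c.relname AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;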

I’ve seen it on at least two sites. One with an upgrade and one on a restore. Both had more free space than the size of the database to start.

A bunch of migrations rewrite whole tables, which can temporarily use more space, and PostgreSQL doesn’t aggressively give space back to the system either.
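If you want to put numbers on both effects, the stock statistics views can help (a sketch; the counters are cumulative since the last stats reset):

-- Temp-file spill for this database (sorts/joins that don't fit in work_mem
-- land in base/pgsql_tmp, which is the file named in the error above)
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname = current_database();

-- Dead tuples PostgreSQL is holding for reuse instead of returning to the OS
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;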

Does that forum have AI with embeddings enabled?

I was thinking something like that. . .

That seems like a likely culprit. . . . sigh. No. It doesn’t even have the AI plugin installed.


If you want to go down the rabbit hole, you could restore it manually to a dev environment that is at the May 2025 commit, then go to current commit and run migrations one by one and print the space before and after each — obviously with a script :stuck_out_tongue: .
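The space check the script prints between steps can be as simple as this (a sketch; run against the restored database):

-- Total on-disk size of the database, printed before and after each migration
SELECT pg_size_pretty(pg_database_size(current_database()));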


OK. I’m restoring on a production site. It’s migrating.

Here’s a query that’s been running for 30 minutes:

discourse=# SELECT pid, usename, state, query, query_start
FROM pg_stat_activity
WHERE state = 'active';
 pid |  usename  | state  |                                                    query                                                     |          query_start
-----+-----------+--------+--------------------------------------------------------------------------------------------------------------+-------------------------------
 519 | postgres  | active | SELECT pid, usename, state, query, query_start                                                              +| 2026-02-14 17:58:35.473337+00
     |           |        | FROM pg_stat_activity                                                                                       +|
     |           |        | WHERE state = 'active';                                                                                      |
 308 | discourse | active | DELETE                                                                                                      +| 2026-02-14 17:26:08.12598+00
     |           |        |   FROM calendar_events ce                                                                                   +|
     |           |        | WHERE                                                                                                       +|
     |           |        |   ce.id IN (SELECT DISTINCT(ce3.id) FROM calendar_events ce2                                                +|
     |           |        |             LEFT JOIN calendar_events ce3 ON ce3.user_id = ce2.user_id AND ce3.description = ce2.description+|
     |           |        |             WHERE ce2.start_date >= (ce3.start_date - INTERVAL '1 days')                                    +|
     |           |        |               AND ce2.start_date <= (ce3.start_date + INTERVAL '1 days')                                    +|
     |           |        |               AND ce2.timezone IS NOT NULL                                                                  +|
     |           |        |               AND ce3.timezone IS NULL                                                                      +|
     |           |        |               AND ce3.id != ce2.id                                                                          +|
     |           |        |               AND ce2.post_id IS NULL                                                                       +|
     |           |        |               AND ce3.post_id IS NULL                                                                       +|
     |           |        |   )                                                                                                         +|
     |           |        |                                                                                                              |
(2 rows)

That’s from this:

This shows that the latest migration is later, but I guess that’s because that migration is from a plugin that just got added.

discourse=# SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1;
    version
----------------
 20250507013646
(1 row)

There are a lot of calendar events!!! 69_724_384!

discourse=# select count(*) from calendar_events;
  count
----------
 69724384
(1 row)
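That many rows is a lot of on-disk data for the migration’s self-joining DELETE to chew through; the table’s footprint is easy to check (a sketch):

-- Size of calendar_events including its indexes and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('calendar_events'));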

So I guess that’s the problem.

. . . but they don’t even have discourse_calendar installed on the source site!?

But they DO have that number of rows in calendar_events on the source site…

But there are only 4 post_custom_fields with a calendar event?

So there are a bunch of events that are not connected to a topic or post. Standard holidays like Christmas, new year’s, boxing day.
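Before deleting anything, a quick sanity check that those orphans really are the bulk of the table (a sketch, using the same columns as the fix below):

-- How many events are not attached to any topic or post
SELECT count(*)
FROM calendar_events
WHERE topic_id = 0 AND post_id IS NULL AND post_number IS NULL;

-- What they are (standard holidays, in this case)
SELECT description, count(*)
FROM calendar_events
WHERE topic_id = 0 AND post_id IS NULL AND post_number IS NULL
GROUP BY description
ORDER BY count(*) DESC
LIMIT 10;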

Here’s the fix:

delete from calendar_events where topic_id=0 and post_id is null and post_number is null;

And that left 99 events.

Maybe SiteSetting.delete_expired_event_posts_after got set to 0, and even though they weren’t really using the plugin it just created all of these events?

And yup! SiteSetting.delete_expired_event_posts_after is set to -1.

Oh, but -1 is the default. So maybe that’s the issue?

No. That setting is about event POSTS. I don’t quite know how there got to be 70 million events.

https://github.com/discourse/discourse/blob/main/plugins/discourse-calendar/config/settings.yml#L11-L13