I’m writing this post regarding a postgres table missing chunk error that was failing my daily backups. I managed to sort it out and I want to lay out the method I’ve used in case it can be of use for someone else encountering this issue. By no means is this is a guide or professional instructions on how to solve properly postgres missing chunk errors because I’m no expert and I did it without having much knowledge but it worked out in the end.
The error I was getting during backups:
pg_dump: error: Dumping the contents of table "stylesheet_cache" failed: PQgetResult() failed.
pg_dump: error: Error message from server: ERROR: missing chunk number 0 for toast value 1903804 in pg_toast_22663
pg_dump: error: The command was: COPY public.stylesheet_cache (id, target, digest, content, created_at, updated_at, theme_id, source_map) TO stdout;
So the problem was in the stylesheet_cache table, ERROR: missing chunk number 0 for toast value 1903804 in pg_toast_22663.
I’m using Discourse in Docker so here is what I did
ssh into the server
cd /path/to/discourse
./launcher rebuild app
su postgres -c "psql discourse"
Once I was in postgres I tried to REINDEX the table in hope it will solve the issue.
discourse=# select reltoastrelid::regclass from pg_class where relname = 'stylesheet_cache';
reltoastrelid
-------------------------
pg_toast.pg_toast_18396
(1 row)
REINDEX table stylesheet_cache;
REINDEX table pg_toast.pg_toast_18396;
VACUUM analyze stylesheet_cache;
Tried to do the backup again, still failed, so I had to continue finding the corruption
I computed the total number of rows in my table with:
select count(*) from stylesheet_cache;
I got a count of 2260 rows.
I went to finding the exact row that is corrupted and delete it, somehow I assumed that stylesheet_cache is not something that can really break my whole db. As I mentioned I’m no expert so maybe I took a too big of a leap of faith so be careful when doing this cause you could mess up your db.
I used the LIMIT and OFFSET queries to find the corrupted rows
select * from stylesheet_cache order by id limit 100 offset 0;
I played around with the offset and limit until I got the Missing chunk…’ error and then narrowed down to LIMIT 1 and offset the row that was corrupted. In my case there were 5 rows corrupted.
You could use a sh script to find the row with less effort, I went the manual way which took some time but you could use this:
#!/bin/sh
j=0
while [ $j -lt 2260 ]
do
psql -U postgres -d discourse -c "SELECT * FROM stylesheet_cache LIMIT 1 offset $j" >/dev/null || echo $j
j=$(($j+1))
done
I went to find the id of the corrupted row with the following command:
select id from stylesheet_cache order by id limit 1 offset 450;
id
----
11498
So the id of my corrupted row was 11498
I deleted the row with
delete from stylesheet_cache where id = 11498;
After deleting the corrupted rows I went to REINDEX again
REINDEX table stylesheet_cache;
REINDEX table pg_toast.pg_toast_40948;
VACUUM analyze stylesheet_cache;
And tried to backup again, this time it worked. I guided myself using this postgres recovery instructions: Postgres error: Missing chunk 0 for toast value in pg_toast · GitHub
DISCLAIMER: this is not a expert opinion so proceed with caution if you have similar missing chunk corruptions and trying to fix them.