How I fixed a table from postgres

I’m writing this post regarding a postgres table missing chunk error that was failing my daily backups. I managed to sort it out and I want to lay out the method I’ve used in case it can be of use for someone else encountering this issue. By no means is this is a guide or professional instructions on how to solve properly postgres missing chunk errors because I’m no expert and I did it without having much knowledge but it worked out in the end.

The error I was getting during backups:

 pg_dump: error: Dumping the contents of table "stylesheet_cache" failed: PQgetResult() failed.
 pg_dump: error: Error message from server: ERROR:  missing chunk number 0 for toast value 1903804 in pg_toast_22663
 pg_dump: error: The command was: COPY public.stylesheet_cache (id, target, digest, content, created_at, updated_at, theme_id, source_map) TO stdout;

So the problem was in the stylesheet_cache table, ERROR: missing chunk number 0 for toast value 1903804 in pg_toast_22663.

I’m using Discourse in Docker so here is what I did

ssh into the server
cd /path/to/discourse
./launcher rebuild app
su postgres -c "psql discourse"

Once I was in postgres I tried to REINDEX the table in hope it will solve the issue.

discourse=# select reltoastrelid::regclass from pg_class where relname = 'stylesheet_cache';
      reltoastrelid      
-------------------------
 pg_toast.pg_toast_18396
(1 row)
REINDEX table stylesheet_cache;
REINDEX table pg_toast.pg_toast_18396;
VACUUM analyze stylesheet_cache;

Tried to do the backup again, still failed, so I had to continue finding the corruption

I computed the total number of rows in my table with:

select count(*) from stylesheet_cache;

I got a count of 2260 rows.

I went to finding the exact row that is corrupted and delete it, somehow I assumed that stylesheet_cache is not something that can really break my whole db. As I mentioned I’m no expert so maybe I took a too big of a leap of faith so be careful when doing this cause you could mess up your db.

I used the LIMIT and OFFSET queries to find the corrupted rows

select * from stylesheet_cache order by id limit 100 offset 0;

I played around with the offset and limit until I got the Missing chunk…’ error and then narrowed down to LIMIT 1 and offset the row that was corrupted. In my case there were 5 rows corrupted.

You could use a sh script to find the row with less effort, I went the manual way which took some time but you could use this:

#!/bin/sh
j=0
while [ $j -lt 2260 ]
do
  psql -U postgres -d discourse -c "SELECT * FROM stylesheet_cache LIMIT 1 offset $j" >/dev/null || echo $j
  j=$(($j+1))
done

I went to find the id of the corrupted row with the following command:

select id from stylesheet_cache order by id limit 1 offset 450;

id

----

11498

So the id of my corrupted row was 11498
I deleted the row with

delete from stylesheet_cache where id = 11498;

After deleting the corrupted rows I went to REINDEX again

REINDEX table stylesheet_cache;
REINDEX table pg_toast.pg_toast_40948;
VACUUM analyze stylesheet_cache;

And tried to backup again, this time it worked. I guided myself using this postgres recovery instructions: Postgres error: Missing chunk 0 for toast value in pg_toast · GitHub

DISCLAIMER: this is not a expert opinion so proceed with caution if you have similar missing chunk corruptions and trying to fix them.

3 Likes