Moving a Discourse site to another VPS with rsync

I got this done, and Postgres (and Discourse!) seem to be happy.

I cleaned them out by hand, making the URLs unique with ** patterns as appropriate. It might just be a harmless cache, in which case I could simply have deleted the duplicates, but I didn't want to risk it.

In my case, it was just the one index, so rebuilding all the indices was probably overkill, but honestly I felt better knowing I caught everything.

After a few failed rebuild runs, each of which takes 30 seconds or so and reports only a single problem at a time, this was my SQL magic to get a complete list of problem items in one shot:

discourse=# SELECT topic_id, post_id, url, COUNT(*) FROM topic_links GROUP BY topic_id, post_id, url HAVING COUNT(*) > 1 ORDER BY topic_id, post_id;
 topic_id | post_id |                          url                          | count 
----------+---------+-------------------------------------------------------+-------
    19200 |   88461 | http://hg.libsdl.org/SDL/rev/**533131e24aeb           |     2
    19207 |   88521 | http://hg.libsdl.org/SDL/rev/44a2e00e7c66             |     2
    19255 |   88683 | http://lists.libsdl.org/__listinfo.cgi/sdl-libsdl.org |     2
    19255 |   88683 | http://lists.libsdl.org/**listinfo.cgi/sdl-libsdl.org |     2
    19523 |   90003 | http://twitter.com/Ironcode_Gaming                    |     2
(5 rows)

(These were the 5 problem items remaining at the point I captured this example.)

Then I’d look at each post to see what was there and what to fix up:

SELECT * FROM topic_links WHERE topic_id=19255 AND post_id=88683;

and then fix one of them up:

UPDATE public.topic_links SET url='http://lists.libsdl.org/__listinfo.cgi/**sdl-libsdl.org' WHERE id=275100;
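For anyone facing many more duplicates than I did, the same uniquification could probably be done in bulk with a window function. This is an untested sketch, not what I actually ran; the `#dup-` suffix is just an arbitrary marker I made up (analogous to the ** patterns I inserted by hand), and you should verify on a backup first:

```sql
-- Hypothetical bulk version: append each duplicate row's own id to its url,
-- leaving the lowest-id row in each duplicate group untouched.
UPDATE topic_links t
SET url = t.url || '#dup-' || t.id
WHERE t.id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY topic_id, post_id, url
                              ORDER BY id) AS rn
    FROM topic_links
  ) s
  WHERE s.rn > 1
);
```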

Until I ran out of things to fix up. :slight_smile:

I probably could have done some inner-join magic (or maybe a little Ruby) to get this all in one query, but I'm not an expert and it turned out not to be hours of work to do it manually. It was tedious, though, to be clear. :slight_smile:
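For the record, here's roughly what that one-query version might look like using a window function instead of a join, listing every duplicate row together with its id so the UPDATE targets are visible immediately (a sketch, not something I actually ran):

```sql
-- List each row belonging to a duplicate (topic_id, post_id, url) group,
-- including its id, in a single pass over topic_links.
SELECT id, topic_id, post_id, url
FROM (
  SELECT id, topic_id, post_id, url,
         COUNT(*) OVER (PARTITION BY topic_id, post_id, url) AS dups
  FROM topic_links
) t
WHERE dups > 1
ORDER BY topic_id, post_id, url, id;
```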

Then I ran REINDEX DATABASE discourse; without CONCURRENTLY just to keep it simple, dropped a few leftover ccnew* indices (the invalid indexes that failed REINDEX CONCURRENTLY attempts leave behind) that I had missed earlier, and I was good to go.
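For anyone hunting their own ccnew* leftovers: Postgres marks those orphaned indexes as invalid in the pg_index catalog, so something like this should list them (a sketch; the index name in the DROP line is a placeholder, to be replaced with whatever the query reports):

```sql
-- Find invalid indexes left behind by failed REINDEX CONCURRENTLY runs.
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Then drop each one by the name reported above, e.g.:
-- DROP INDEX IF EXISTS some_index_ccnew;
```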

The site was live the whole time; no downtime.

Whether this was necessary or not, I definitely feel like my data is a little safer now, and I’m not careening towards some unannounced future disaster.

Thanks for nudging me in the right direction to figure this out, @pfaffman!
