Option to use flat/text backing files instead of SQL(Postgres) backend

adrelanos · May 26, 2020, 3:45pm

Currently the primary storage for forum posts, user accounts etc. is the postgres database.

May I suggest please to make the main storage format for forum contents simple text files?

Due to hard to fix database issues (difficult for me, as user), I think the risk is high for losing all forum data because of the seemingly/effectively opaque binary format of SQL databases. Looks like nobody can fix a severely broken database (which won’t be a visible problem as above issue) or too expensive for laymen.

I am sure, there are very good reasons to use databases such as postgres such as performance. However I propose a transparent human readable text format for backups as an emergency last resort in case db backup and restore functionality is corrupt or broken.

You probably don’t need to be convinced how amazing git is, you’re already using that. Forum contents could be stored as sub folders in and lots of text files. Thereby the whole folder could be put under git version control. If any bugs are introduced, it’s much easier to track down what commit caused it.

Since databases (unreliable, complex) are probably still be required, these text files (simple, reliable) could serve as a template to re-create the database “from source”. If storage of new posts inside text files is too slow in realtime, you can make the text file backup option on-demand or when the system is idle. (Delayed write / write cache.)

Public data (public forum posts) would be in a different folder than private user data, hashed passwords. Added advantage would be that the public part (posts) could even be published on a git remote for those who find that useful (archival). User data would stay in a local-only git repository (or custom, remote, private, encrypted git repository).

Stephen · May 26, 2020, 4:00pm

There’s an economy of scale here. Such an engineering change is significant. Were the above possible, the performance implications are such that your additional running costs would likely eclipse the cost of a consultant to fix your database.

Databases exist because they are vastly more efficient than the alternative, flat text files.

The software is free, but that’s as far as it goes. You’re going to far be better off making a short term investment in a marketplace topic than promoting an approach which makes Discourse too costly to run.

RGJ · May 26, 2020, 4:01pm

TL;DR No, you don’t want that. You really don’t.

I understand your need for more simplicity.

But.

In the 1990’s I worked with internet forum (telnet BBS) software that was built on text files. We were continuously craving for more and better functionality. We needed to add “columns” to data. We added a TAB and then added the extra column. We needed to add them on all existing files. Then we wanted to remove (another) piece of data. We wrote an awk script to go through all the files and remove this piece of data. We needed to put the software into maintenance mode in the meanwhile because the software would see text files with a different number of fields. It was hell. We so much craved for a better system but we needed to be able to run the software with minimal resources and thus we only had a file system. We needed… a database.

However, performance is not the only issue. Text files can corrupt as well, for instance if two processes are writing to them at the same time.

There is also something called referential integrity, which guarantees that internal references exist (for instance this post belongs to topic #152787 and user #406).

And then there is transactions, snapshotting, replication, load balancing.

Sure, your database can break down. My car broke down yesterday because the overly complex ABS control software is really vulnerable to small and seemingly unrelated issues. It’s impossible to repair it by myself. I need to fork out a significant amount of cash to get it repaired. But it has so much advantages above walking everywhere. Is it unreliable? No. It still brakes and I was immediately warned by the indicator on the dashboard.

Databases were built for reliability because text files were not.

pfaffman · May 27, 2020, 9:59pm

Richard’s analogy is apt. Keeping up with Discourse’s data in text files would be impossible.

Even adding support for another database would cost on the order of $200,000/year.

You might be better deciding on a budget and asking in marketplace for someone to fix your database. It’s a hard job to bid because it’s not immediately clear whether you have one corrupt index with a few records to fix up or several of them and dozens to fix.

codinghorror · May 28, 2020, 3:26am

The corrupt indexes in PG10 is something we are keeping a close eye on and we’ll absolutely help as we can, in the topic you linked, for everyone’s benefit. It’s my hope that PG12 is more resistant to this problem, and upgrading will make it go away long term.

But I surely feel that “let’s revert to plain text files” is not an appropriate solution to this problem.

dfabulich · May 28, 2020, 4:01am

Postgres offers a plain text backup solution, pg_dump.

pg_dump is a utility for backing up a PostgreSQL database.

Script dumps are plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved.

adrelanos · May 28, 2020, 9:44am

Thank you very much for your consideration, replies! Much appreciated!

Laurent · May 29, 2020, 11:17am

Strictly speaking, a bunch of text files is a database, just not one you’d want to use for anything critical or performant.

Topic		Replies	Views
Multiple backup schedules Feature	15	1047	April 13, 2022
Multiple Daily Backups Support	15	1410	January 13, 2022
Recommended backup process for very large forum? Support	8	1285	July 28, 2019
Export all data of all posts Support	3	3678	September 20, 2021
Best option for DB and file Storage Installation	4	378	January 13, 2024

Option to use flat/text backing files instead of SQL(Postgres) backend

Related topics