Store PDF and DOC files as raw text in the database: where to start?

I want to modify my Discourse installation so that it can also store the content of PDF and DOC files as raw text in the database. I have a basic understanding of the database structure and of how the Discourse code works. Where should I begin modifying the source code?

What problem are you trying to solve? It's hard to imagine that what you describe is a good solution. Here's where to start: Beginner's Guide to Creating Discourse Plugins - Part 1