@sam, I self-host and am wrestling with Tesseract now. It installed without problems, but it’s throwing errors that don’t seem serious enough to fail the job:
```
Error during OCR processing: /var/www/discourse/lib/discourse.rb:139:in `exec': Failed to OCR image with Tesseract
Estimating resolution as 337
```
Even with that error, the PDF shows as indexed in the Persona.
I’m not sure how this affects RAG. I’ll dig deeper over the weekend.
Thank you for responding so quickly.