Ai plugin ocr support

Can support be added to the Discourse ai plugin to add the text in the images to the post (ocr)? Can google lens api (cloud vision) support be added for this?

Example: GitHub - communiteq/discourse-ocr-uploads

4 Likes

Itā€™s in our roadmap to leverage a multi-modal LLM to create image descriptions, which should also provide some level of OCR. But for pure OCR maybe give that plugin a try?

4 Likes

I used this plugin in older versions of Discourse and it worked. But unfortunately it no longer works in the new version of Discourse

1 Like

See AI Image Captioning Feature in Discourse AI Plugin, this is now ready and enabled here on meta.

3 Likes

Thanks for this feature, I will try it @Falco @pmusaraj

2 Likes

I think we should still keep this open, the captioning feature is adjacent to OCR but not exactly OCR.

OCR for example would allow you to take a photo of your notes and then upload and print them exactly. The AI captioning is much more sophisticated but also does not give you that fidelity of printing an entire page of text.

Not sure when we will have time to work on an OCR, but it does feel a bit different.

6 Likes

Now that Anthropic Claude 3 has vision support it does a decent job with ocr jobs, for example

2 Likes

Cries in German

:de: :beer: :leftwards_hand::sob: :rightwards_hand: :pretzel: :hotdog:


On a serious note, I have curiosity about how it would perform on an image like this:

Tesseract gets the following:

MINGW64 ~/Source/Repos/Sut. Driver. Firmware
$ git push
Locking support detected on remote ā€œoriginā€. Consider enabling it with:
$ git config Ifs ā€˜1fs.locksverify true
LFS: Access forbidden. Check your access level.
error: failed to push some refs to
MINGW64 ~/Source/Repos/Sut. Driver. Firmware
$ git push
Locking support detected on remote ā€œoriginā€. Consider enabling it with:
$ git config Ifs. /\fs.locksverify true
Uploading LFS objects: 100% (1/1), 584 KB | 0 B/s, done.
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), 478 bytes | 478.00 KiB/s, done.
Total 5 (delta 1), reused 0 (delta 0), pack-reused 0 Ā»
remote:
remote: To create a merge request for visit:
remote: 1
remote:
To
2c50e5b. . ba25f3e
L MINGN64 ~/Source/Repos /Sut. Driver. Firmware

(Ok Iā€™m surprised at how reasonable this result is. Tesseract often changes line order and glitches letters with these types of examples.)

Sam would it be possible for you to throw that image in to claude and post the result?

Feel free to try here, Claude creative persona here has vision support, just enabled it

https://meta.discourse.org/t/parsing-complex-json-data-in-tris20-code/301329

This is either a skill issue on my side, or Claude is having little trouble :sweat_smile:

Skill issue :sweat_smile: I was using the Forum Helper rather than the Creative personality.

Claude Creative gives us this:

MINGW64 ~/Source/Repos/Sut.Driver.Firmware (kingfisher)
$ git push
Locking support detected on remote ā€œoriginā€. Consider enabling it with:
$ git config lfs.http://tfs.locksverify true
LFS: Access forbidden. Check your access level.
error: failed to push some refs to ā€˜http://tfs.lockeed/tfs/HnC/TEC/SUT/Driver%20Firmware/_git/Sut.Driver.Firmwareā€™

MINGW64 ~/Source/Repos/Sut.Driver.Firmware (kingfisher)
$ git push
Locking support detected on remote ā€œoriginā€. Consider enabling it with:
$ git config lfs.http://tfs.locked/tfs.locksverify true
Uploading LFS objects: 100% (1/1), 584 KB | 0 B/s, done.
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), 478 bytes | 478.00 KiB/s, done.
Total 5 (delta 1), reused 0 (delta 0), pack-reused 0
remote: To create a merge request for kingfisher, visit:
remote: http://tfs.locked/tfs/HRC/TEC/SUT/_git/Sut.Driver.Firmware/pullrequest/new?sourceRef=kingfisher&targetRef=develop
remote:
To http://tfs.locked/tfs/HRC/TEC/SUT/_git/Sut.Driver.Firmware
2c50e5bā€¦ba25f3e kingfisher - kingfisher

MINGW64 ~/Source/Repos/Sut.Driver.Firmware (kingfisher)
$

Interesting result. I feel like the two are roughly on a par with each other in this example but with different faults. Claude has halucinated the URL here, making a contextual guess that we are doing a pull request based on the git issue.

Tesseract doesnā€™t halicunate, and in this case seems to be more correct. The only thing that stands out is 1fs on line 4, and \fs on line 10 instead of lfs.

Claude also makes a similar error throughout, using tfs instead of lfs. Itā€™s surprising because the extrapolation of URL based on context shows it understood the context, but then instead of lfs which is prominent in this problem domain, it created something completely new: tfs

1 Like

What I found absolutely jaw dropping here is that you have an OCR engine you can interact with.

Donā€™t like the kingfisher hallucination ā€¦ fine ā€¦ just ask it not to do that thing.

Really mind bending what you can do with this tech:

(apologies for broken image in the OP, we are fixing it, the image is)

1 Like