I’ve been experimenting a lot recently with platforms that let you chain steps together, and I’ll share one workflow that got me excited: using Optical Character Recognition to detect text in an image and append it to the Discourse post!
My sandbox topic shows how it works: Testing OCR calls - sandbox - notes
I built the first version on Pipedream, and I’ll explain it by walking through the steps. There is a beta feature for sharing workflows on that service, and this one is shared at https://pipedream.com/new?h=tch_3Z6fa9.
While this would make for an in-depth guide to using Pipedream with Discourse, I’m going to stay light on details, as I’ve already moved on from that platform, and will be sharing more about that later.
Steps
The rough outline is: send image URL to Google Cloud Vision API, run the results through ChatGPT, then append the results to the post in Discourse.
Trigger
This provides a webhook to send data to. In Discourse I created a webhook with two specific settings:
Only firing on Post Events means my initial topic won’t trigger the process; this is useful to me so I can plan to use topics as holders for applying external functions (I call them “functional notes”).
Only firing on that tag means I can use tags to control which webhooks fire for each topic; generally I’d only have one “functions” tag, to keep the process logic simple.
The webhook sends a payload with lots of info, and we’ll use the topic and post IDs later in the process.
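As a rough illustration of pulling those two IDs out of the payload, here is a minimal sketch. It assumes the webhook body has the shape `{ post: { id, topic_id, ... } }`; the helper name `pickIds` is hypothetical and not part of the workflow.

```javascript
// Sketch: pick the IDs the later steps need out of a post webhook body.
// Assumes the body looks like { post: { id, topic_id, ... } }.
function pickIds(body) {
  const post = body.post;
  return { postId: post.id, topicId: post.topic_id };
}

// Example with a trimmed-down, made-up payload.
console.log(pickIds({ post: { id: 42, topic_id: 7, cooked: "<p>hi</p>" } }));
```

In Pipedream the same values are reachable without code, through expressions such as `{{steps.trigger.event.body.post.id}}`.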
End based on condition
This is a step to check if an edit reason is included. If so, it stops the workflow.
In the last step I update the post and include an Edit Reason, and this check ensures that I don’t keep updating the post…
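The check described above could be sketched in code roughly like this. It is not the Pipedream step itself, just the same logic as a plain function; the `edit_reason` field name on the post object is an assumption, and `shouldProcess` is a hypothetical helper.

```javascript
// Sketch of the loop guard: should this webhook payload be processed?
// Assumes a body shaped like { post: { id, edit_reason, ... } };
// the `edit_reason` field name is an assumption.
function shouldProcess(body) {
  const post = body && body.post;
  if (!post) return false; // not a post event, nothing to do
  const reason = post.edit_reason;
  // An edit reason means the post has already been edited, so stop here.
  return reason === undefined || reason === null || String(reason).trim() === "";
}

console.log(shouldProcess({ post: { id: 1 } }));
console.log(shouldProcess({ post: { id: 1, edit_reason: "OCR Text Detection" } }));
```

The first call represents a fresh post and the second one a post the workflow has already updated.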
One of the reasons I stopped using Pipedream was that my webhook checks were eating up credits on the service. I don’t think I should have to pay to conditionally process a webhook, hence moving on…
Extract the image URL
I decided that for this test each post would have a single image uploaded to it. This step checks the “cooked” value and uses the following regular expression to grab the URL:
/https?:\/\/[^\s"]+/
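To show what that expression does, here is a small sketch run against a made-up “cooked” value. The sample HTML and the `extractFirstUrl` helper are illustrative assumptions; real cooked HTML from Discourse is usually more involved.

```javascript
// The regular expression from the step above: the first http(s) URL,
// ending at whitespace or a double quote.
const URL_PATTERN = /https?:\/\/[^\s"]+/;

// Hypothetical helper: return the first URL found in the cooked HTML, or null.
function extractFirstUrl(cooked) {
  const match = cooked.match(URL_PATTERN);
  return match ? match[0] : null;
}

// Made-up cooked value with a single image.
const sampleCooked =
  '<p><img src="https://forum.example.com/uploads/receipt.png" alt="receipt"></p>';
console.log(extractFirstUrl(sampleCooked));
// prints "https://forum.example.com/uploads/receipt.png"
```

Note that the pattern returns whichever URL comes first in the HTML, so it relies on the image URL appearing before any other link in the post.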
Google Cloud Vision API call
This is a custom code step on Pipedream. The pre-made components didn’t do what I wanted, and the service also has a code assistant that can write code from a prompt; as these API calls are straightforward, it was easy to produce with this method.
It takes the value of the former step ({{steps.extract_by_regular_expression.$return_value[0].match}}), and this is the code:
import { axios } from "@pipedream/platform";

export default defineComponent({
  props: {
    imageUrl: {
      type: "string",
      label: "Image URL",
      description: "URL of the image to be processed by Google Vision API",
    },
    apiKey: {
      type: "string",
      label: "API Key",
      description: "Your Google Cloud API Key",
      secret: true,
    },
  },
  async run() {
    // The API key is passed as a query parameter.
    const url = `https://vision.googleapis.com/v1/images:annotate?key=${this.apiKey}`;
    // One request: run TEXT_DETECTION on the image at the given URL.
    const body = {
      requests: [
        {
          image: {
            source: {
              imageUri: this.imageUrl,
            },
          },
          features: [
            {
              type: "TEXT_DETECTION",
            },
          ],
        },
      ],
    };
    const config = {
      method: "POST",
      url,
      data: body,
    };
    // The returned value is what later steps read as $return_value.
    const response = await axios(this, config);
    return response;
  },
});
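Later steps read the detected text out of this response. As a hedged sketch of doing that defensively, the function below looks at `responses[0]` and falls back from `fullTextAnnotation.text` to `textAnnotations[0].description`. The helper name `extractDetectedText` is hypothetical, and treating an empty response as "no text found" is an assumption about how the API behaves for images without text.

```javascript
// Sketch: pull the detected text out of an images:annotate response.
function extractDetectedText(visionResponse) {
  const first = visionResponse?.responses?.[0];
  if (!first) return "";
  // A per-image failure is reported inside the response entry.
  if (first.error) {
    throw new Error(`Vision API error: ${first.error.message}`);
  }
  // Both fields hold the full text; prefer fullTextAnnotation when present.
  return (
    first.fullTextAnnotation?.text ??
    first.textAnnotations?.[0]?.description ??
    ""
  );
}

// Made-up response for illustration.
const sampleResponse = {
  responses: [{ fullTextAnnotation: { text: "HELLO\nWORLD" } }],
};
console.log(extractDetectedText(sampleResponse));
```

Returning an empty string lets a later step decide whether to skip the edit instead of appending an empty block.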
ChatGPT for editing
Takes output from the former step ({{steps.google_cloud.$return_value.responses[0].fullTextAnnotation.text}}) and passes it as the User message. For the system message I have:
You are reading the output from a vision api that detected text in an image. Review the message and copyedit it for clarity. Return only the edited text without commentary.
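For readers calling a chat API directly rather than using the pre-made ChatGPT step, the two messages would look roughly like this. This is only a sketch of the message list; the helper name `buildChatMessages` is hypothetical, and model choice and the actual request are left out on purpose.

```javascript
// The system message quoted above.
const SYSTEM_PROMPT =
  "You are reading the output from a vision api that detected text in an image. " +
  "Review the message and copyedit it for clarity. " +
  "Return only the edited text without commentary.";

// Sketch: pair the system prompt with the detected text as the user message.
function buildChatMessages(detectedText) {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: detectedText },
  ];
}

console.log(buildChatMessages("TOTAL 12.50 EUR\nTHANK Y0U"));
```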
Append to the post in Discourse
Another custom code section, as the pre-made Discourse actions in Pipedream only cover a couple of scenarios (create a topic or a post), and I want to append the text to the post.
First, here’s the code:
import { axios } from "@pipedream/platform";

export default defineComponent({
  props: {
    discourse: {
      type: "app",
      app: "discourse",
    },
    postId: {
      type: "string",
      label: "Post ID",
      description: "The ID of the post to append text to",
    },
    text: {
      type: "string",
      label: "Text",
      description: "The text to append to the post",
    },
    editReason: {
      type: "string",
      label: "Edit Reason",
      description: "The reason for editing the post",
      optional: true,
    },
  },
  async run({ steps, $ }) {
    const url = `https://${this.discourse.$auth.domain}/posts/${this.postId}.json`;
    // Fetch the current post so its raw Markdown can be kept.
    const response = await axios($, {
      method: "GET",
      url: url,
      headers: {
        "Api-Username": `${this.discourse.$auth.api_username}`,
        "Api-Key": `${this.discourse.$auth.api_key}`,
      },
    });
    // Append the new text to the existing raw content.
    const updatedText = `${response.raw} ${this.text}`;
    // Update the post, including the edit reason that stops the loop.
    return await axios($, {
      method: "PUT",
      url: url,
      headers: {
        "Api-Username": `${this.discourse.$auth.api_username}`,
        "Api-Key": `${this.discourse.$auth.api_key}`,
      },
      data: {
        post: {
          raw: updatedText,
          edit_reason: this.editReason,
        },
      },
    });
  },
});
The properties for this step are filled in like so:
Post ID
Grabs the post ID from the original payload: {{steps.trigger.event.body.post.id}}
This is used to edit that post directly.
Text
---
<blockquote>
{{steps.chat.$return_value.generated_message.content}}
</blockquote>
[details="Detected text"]
{{steps.google_cloud.$return_value.responses[0].textAnnotations[0].description}}
[/details]
Basically, I want to add a horizontal rule beneath each image, with a blockquote of edited text, and the details to check the raw output.
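The same template can be sketched as a small function, which may be handy when building the text in a code step instead of the form field. `buildAppendedText` is a hypothetical helper; the leading blank line is an assumption added so the horizontal rule sits on its own line, and the spacing may need adjusting to how Discourse renders it.

```javascript
// Sketch: build the text appended to the post from the two OCR results.
function buildAppendedText(editedText, detectedText) {
  return [
    "", // assumption: blank line so "---" is not glued to the post's last line
    "---",
    "<blockquote>",
    editedText,
    "</blockquote>",
    '[details="Detected text"]',
    detectedText,
    "[/details]",
  ].join("\n");
}

console.log(buildAppendedText("Total: 12.50 EUR. Thank you!", "TOTAL 12.50 EUR\nTHANK Y0U"));
```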
Since each post will have one image in it, this works very easily. I wonder how to do it with multiple images at once?
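One possible starting point for the multi-image case, offered as an untested idea and not something this workflow does: collect every image `src` from the cooked HTML and loop the OCR steps over the list. The `extractImageUrls` helper and the sample HTML are hypothetical, and real cooked HTML may include other images (emoji, avatars) that would need filtering.

```javascript
// Sketch: collect the src of every <img> tag in the cooked HTML, without duplicates.
function extractImageUrls(cooked) {
  const pattern = /<img[^>]*\ssrc="(https?:\/\/[^"]+)"/g;
  const urls = Array.from(cooked.matchAll(pattern), (m) => m[1]);
  return [...new Set(urls)];
}

// Made-up cooked value with a link and two images.
const multiCooked =
  '<p><a href="https://example.com/page">link</a></p>' +
  '<p><img src="https://forum.example.com/uploads/a.png" alt="a">' +
  '<img alt="b" src="https://forum.example.com/uploads/b.jpg"></p>';
console.log(extractImageUrls(multiCooked));
```

Each URL could then go through the Vision and ChatGPT steps in turn, with the results joined before the single post update.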
Edit Reason
OCR Text Detection
This is added as the edit reason for the post update, which will prevent a post update loop due to the step at the beginning.
I find it super helpful to always include an edit reason, especially when dealing with external services.
And that’s it! As you can see from my sandbox, it works fairly well!
I have a trip coming up, and I plan on fine-tuning the OpenAI editing to also translate to English if needed; that is one option I would have added to the system prompt for this workflow.