The "Triage posts using AI" script of the "Automation" plugin always includes image data in the request

Priority/Severity:

Medium

Platform:

Discourse b66fca70d0e3d12ef930398289fac5269cd240c7

Description:

The official “Automation” plugin has a “Triage posts using AI” script. This script provides the content of a post to the LLM, then performs configurable triage actions depending on the result.

To keep LLM usage within the credit allocation provided by the Discourse hosting plan (when using the CDCK hosted LLM), or within budget when using a third-party LLM provider, it is important to manage the number of tokens required for each query. The “Triage posts using AI” script provides a “Max Post Tokens” setting that limits how much of the post content is included in the request.

:bug: If a post has attached images, the large volume of image data is always included in the request, even when this causes the request to exceed the post token limit set for the automation via the “Max Post Tokens” setting.

This occurs even when the “Vision enabled” option is disabled in the configuration of the AI persona used by the automation.

This results in uncontrollable token expenditure.

In many use cases, only the text of the post is of interest, so the image data inflates token usage without adding any value.
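The expected behavior can be sketched in a few lines. This is only an illustration in Python, not the plugin's actual code: the function name `strip_image_parts` and the `vision_enabled` parameter are hypothetical, and the message layout is assumed to match the OpenAI-style payload shown in the example in this report.

```python
def strip_image_parts(messages: list[dict], vision_enabled: bool) -> list[dict]:
    """Return messages without image_url parts when vision is disabled.

    Hypothetical helper: assumes OpenAI-style chat messages whose
    "content" is either a string or a list of typed parts.
    """
    if vision_enabled:
        return messages

    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            # Keep every part except the image parts.
            kept = [
                part
                for part in content
                if not (isinstance(part, dict) and part.get("type") == "image_url")
            ]
            # Build a new dict so the caller's message is not mutated.
            message = {**message, "content": kept}
        cleaned.append(message)
    return cleaned
```

With a filter like this applied before the request is built, only the text parts would count against the “Max Post Tokens” budget when “Vision enabled” is off.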

Reproducible steps:

  1. If you don’t already have a suitable custom AI persona, create one.
  2. Disable the “Vision enabled” option in the settings of the persona.
  3. If it isn’t already, enable the built-in “Automation” plugin.
  4. Navigate to the plugin’s “Automations” page (/admin/plugins/automation/automation).
  5. Click the “+ Add automation” button.
    The “Select a script” page will open.
  6. Select the “Triage posts using AI” script.
    The automation configuration page will open.
  7. Configure the automation as required for a simple test. The automation does not need to perform any actions; it only needs to generate LLM queries.
  8. Select the AI persona you created in step 1 from the Script options > Persona menu in the automation configuration.
  9. Set the Script options > Max Post Tokens setting to 20 in the automation configuration.
  10. Click the “Update automation” button at the bottom of the automation configuration page.
    You will be returned to the “Automations” page.
  11. Set the “Enable automation” switch for the newly created automation to the “on” position.
  12. Compose a post or DM that will trigger the automation. Add an image to the post.
  13. Publish the post or message.
  14. Navigate to the administration page of the “Data Explorer” plugin (/admin/plugins/explorer).
  15. Click the “+” button on the page.
    A “Query name…” field will appear.
  16. Type Get AI Audit Logs in the “Query name…” field.
  17. Click the “+ Create New” button.
    A new Data Explorer query will be created.
  18. Click the “Edit” button.
    The query code editing field will open.
  19. Replace the contents of the placeholder query code with the following:
    SELECT request_tokens, raw_request_payload
    FROM ai_api_audit_logs
    WHERE feature_name = 'llm_triage'
    
  20. Click the “Run” button.
    :bug: For the request the automation generated in response to your post, the request_tokens value shown in the query result greatly exceeds what the “System prompt” from the AI persona configuration plus the 20-token post text limit set in the automation configuration would account for.
  21. Click the ●●● button to the right of the value of the raw_request_payload field in the Data Explorer query result.
    A dialog will open that shows the full value of the field.

:bug: The data of the image attached to the post was included in the request data via a messages[*].content[*].image_url key.

Example:

{
  "model": "Qwen/Qwen3-VL-30B-A3B-Instruct",
  "max_tokens": 16384,
  "messages": [
    {
      "role": "system",
      "content": "You will be given a piece of text, and your task is to detect the locale of the text. Your response must be an IETF language tag code."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "title: Test of the automation\nThis is some English language text."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": ""
          }
        }
      ]
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
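To check several logged requests without opening each one by hand, the raw_request_payload values can be exported and scanned with a short script. The following is a minimal sketch, assuming the payload is JSON in the shape shown above; the function name `summarize_image_parts` is made up for this example.

```python
import json


def summarize_image_parts(raw_request_payload: str) -> list[dict]:
    """List every image_url content part found in a logged request payload."""
    payload = json.loads(raw_request_payload)
    found = []
    for index, message in enumerate(payload.get("messages", [])):
        content = message.get("content")
        # Plain-string content cannot carry image parts.
        if not isinstance(content, list):
            continue
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                image = part.get("image_url")
                url = image.get("url", "") if isinstance(image, dict) else ""
                found.append(
                    {
                        "message_index": index,
                        "role": message.get("role"),
                        # Length of the URL string (for example a data URL),
                        # as a rough indicator of how much image data was sent.
                        "url_chars": len(url),
                    }
                )
    return found


if __name__ == "__main__":
    # Placeholder payload for demonstration; the URL value is not real data.
    sample = json.dumps(
        {
            "messages": [
                {"role": "system", "content": "Detect the locale of the text."},
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "This is some English language text."},
                        {"type": "image_url", "image_url": {"url": "placeholder"}},
                    ],
                },
            ]
        }
    )
    for entry in summarize_image_parts(sample):
        print(entry)
```

Any non-empty result for a persona with “Vision enabled” turned off indicates that the request carried image data it should not have.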

Additional context:

In addition to the excessive token usage, I have been encountering poor accuracy in the results from a trivial text analysis task (detecting the human language of the post). The occurrences have all been for posts which have attached images, and the fault does not occur if I manually provide the same post text to the same AI persona via a DM. This makes me suspect the image data included in the request may be the cause of the incorrect results from the automation.


I am experiencing the fault on the forum.arduino.cc forum.