Qwen3-VL-8b 图像识别问题与 Gemma3-27b 混合图文内容

Ivan_Rapekas · 2025 年12 月 11 日 10:55

谁能澄清一下当前理解图像的逻辑吗？

我在 LM Studio 中使用 Qwen3-VL-8b，它使用与 OpenAI 兼容的 API。下面的提示说 Anthropic、Google 和 OpenAI 模型支持图像。Qwen 没戏，对吗？
Qwen3-VL-8b 当模型无法识别图片/文档时出现新的令人困惑的消息。

在 3.6.0.beta2 中：

无论在 vision enabled = true 还是 vision enabled = false 的情况下，AI 机器人都能正确处理图像识别请求，没有任何异常。

在 v2025.12.0-latest 中：新的选项 allowed attachments

现在当 vision enabled = true 时，对话框中返回一个错误：

{“error”:“Invalid ‘content’: ‘content’ objects must have a ‘type’ field that is either ‘text’ or ‘image_url’.”}

Gemma3-27b。关于识别混合文本+图像内容的一些想法。目前的响应只支持文本。当我要求模型提供带有分离图像的 PDF 的 OCR 层中的文本时，它返回

该 URL 处没有任何内容，模型生成了一个虚假的链接。

谢谢！

sam · 2025 年12 月 11 日 11:07

lmstudio 在完成或响应 API 中不支持 PDF。

据我所知，它只支持图像/文本。

Ivan_Rapekas · 2025 年12 月 12 日 07:33

感谢您的回复！我将将其标记为已解决，并在评论中说明它适用于 LM Studio 0.3.x。Studio 团队目前正在开发带有新 REST 的 0.4.0 版本。希望他们在回复中添加 PDF 支持。

话题		回复	浏览量
Ai plugin ocr support Feature ai	11	972	2024 年4 月 2 日
Gemini ai bot to draw picture in chat Support ai	3	208	2025 年3 月 14 日
Exploring blocking file upload while interacting with AI bot Feature ai , ai-bot	0	87	2026 年1 月 11 日
Introduce alt-text for images on chat Feature chat	0	370	2023 年2 月 22 日
How to solve discourse ai : No endpoints found that support tool use. To learn more about provider routing, Support ai	1	639	2025 年10 月 20 日