feat: extra image understanding capability for non-vision model #9583

EurFelux · 2025-08-26T18:13:26Z

What this PR does

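Since OCR will be built in and available on all platforms after #9572, we can start giving non-vision models an image OCR capability for extracting information from images. This PR also supports using an additional vision model for image understanding.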

Fixes #9405
Fixes #9442
Fixes #8394

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Breaking changes

If this PR introduces breaking changes, please describe the changes and the impact on users.

Special notes for your reviewer

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

PR: The PR description is expressive enough and will help future contributors
Code: Write code that humans can understand and Keep it simple
Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
Upgrade: Impact of this change on upgrade flows was considered and addressed if required
Documentation: A user-guide update was considered and is present (link) or not required. You want a user-guide update if it's a user facing feature.

Release note

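Add an image processing method option (OCR or vision model) to settings. Add a default vision model configuration and refactor the model configuration structure. Add a vision model selector to the model settings page. Implement state management and migration logic for the image processing method.

Add an 'off' option to the image processing method so users can turn image processing off. Also update the related type definitions, default settings, and migration logic.

Change state.translateModel to state.visionModel so the vision model is set correctly.

In the ModelSettings component, add default-value handling for the vision model, using useMemo to cache the computed result for better performance.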

sdbabd · 2025-08-27T20:39:51Z

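You will restate, completely and accurately, all the information contained in the image. Strictly follow the processing flow below:

First, receive the image content input:
<image_content>
{{IMAGE_CONTENT}}
</image_content>

Follow these rules during processing:

Preserve all text elements in full, including but not limited to:

Body text
Annotation text
Text captions in charts/diagrams
Handwritten text (must be explicitly marked as handwritten)

Describe visual elements as follows:

Images: mark with [Image description: ...], covering subject, action, and scene
Tables: convert to text format and preserve the row and column structure
Charts: describe the type, axes, data trends, and key values

Preserve the original order of information:

Strictly follow the layout order of the original content (left to right, top to bottom)
Separate distinct content blocks with paragraphs
Preserve emphasis in the original text (such as bold/italic)

Text handling standards:

Keep all punctuation and special characters exactly as they are
Mark blurry text with [Possibly: ...]
Mark unrecognizable text with [Unrecognizable block]

Output the result inside <image> tags, organized as follows:
<image>
[Text content section]
(blank lines separate paragraphs)
[Image description section]
(blank lines separate visual elements)
</image>

Special handling notes:

When layout elements (such as columns or text boxes) appear, mark the start of the region with the "◆" symbol
Transcribe mathematical formulas character by character as in the original, keeping spaces between symbols
Mark the language of multilingual content, for example [English original: ...]

Finally, run a completeness check:

The character count difference between input and output must not exceed ±5%
Ensure all numbers/proper nouns match exactly
Verify that visual element descriptions include the necessary details

</output_format_requirements>

For a vision model, it would be best to let users set a prompt; alternatively, using the prompt above should give better results.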
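To make the suggestion concrete, here is a minimal sketch of how a prompt template like the one above might be filled in and how the tagged part of the model's reply might be pulled back out. Both `fillPrompt` and `extractTagged` are hypothetical helpers, not code from this PR, and the tag name is passed in rather than hard-coded since it depends on the prompt text in use.

```typescript
// Hypothetical helpers for illustration; not taken from the PR.

// Substitute every {{IMAGE_CONTENT}} placeholder in the template.
// split/join inserts the content literally (no regex or "$" handling).
function fillPrompt(template: string, imageContent: string): string {
  return template.split('{{IMAGE_CONTENT}}').join(imageContent)
}

// Return the trimmed text between <tag> and </tag>,
// or null when the reply does not follow the requested format.
function extractTagged(reply: string, tag: string): string | null {
  const open = `<${tag}>`
  const close = `</${tag}>`
  const start = reply.indexOf(open)
  if (start === -1) return null
  const end = reply.indexOf(close, start + open.length)
  if (end === -1) return null
  return reply.slice(start + open.length, end).trim()
}
```

Returning `null` instead of throwing leaves the caller free to fall back to the raw reply when the model ignores the output format.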

EurFelux added 10 commits August 27, 2025 02:03

feat(settings): add image processing method and vision model support

67d9662

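Add an image processing method option (OCR or vision model) to settings. Add a default vision model configuration and refactor the model configuration structure. Add a vision model selector to the model settings page. Implement state management and migration logic for the image processing method.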

feat(i18n): add i18n text for image processing and vision model

6588b35

style(settings): replace the language icon with an eye icon in the vision model settings

f40a177

feat(settings): add an option to turn image processing off

0ef80a3

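Add an 'off' option to the image processing method so users can turn image processing off. Also update the related type definitions, default settings, and migration logic.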
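As a rough illustration of what this commit describes, the sketch below shows a three-valued setting plus a migration step that fills the field in for older persisted state. All names here (`ImageProcessingMethod`, `imageProcessingMethod`, `migrateSettings`) and the choice of `'off'` as the default are assumptions for illustration; the commit message does not state the actual identifiers or the shipped default.

```typescript
// Hypothetical names throughout; the PR's actual identifiers may differ.
type ImageProcessingMethod = 'off' | 'ocr' | 'vision'

// Assumed default; the commit message does not say which value ships.
const DEFAULT_IMAGE_PROCESSING_METHOD: ImageProcessingMethod = 'off'

const VALID_METHODS: readonly string[] = ['off', 'ocr', 'vision']

// Type guard: true only for one of the three known method strings.
function isImageProcessingMethod(value: unknown): value is ImageProcessingMethod {
  return typeof value === 'string' && VALID_METHODS.indexOf(value) !== -1
}

// Migration step: state persisted by an older version has no such field
// (or may hold an unknown value), so set it to the default while leaving
// every other key untouched.
function migrateSettings(persisted: Record<string, unknown>) {
  const current = persisted['imageProcessingMethod']
  const method: ImageProcessingMethod = isImageProcessingMethod(current)
    ? current
    : DEFAULT_IMAGE_PROCESSING_METHOD
  return { ...persisted, imageProcessingMethod: method }
}
```

Validating the persisted value rather than only checking for `undefined` means a stale or hand-edited store also falls back to the default.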

feat(i18n): add "off" translation to the locale files

d4dafe7

test(api): add test support for visionModel

77d6ffd

fix(llm): fix incorrect state assignment in setVisionModel

2f41921

Change state.translateModel to state.visionModel so the vision model is set correctly.
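The bug class fixed here is a copy-paste slip between two sibling setters. A minimal sketch, written as a plain reducer-style function rather than the project's actual store code; the `Model` and `LlmState` shapes are simplified assumptions:

```typescript
// Simplified, hypothetical shapes; the real store has more fields.
interface Model {
  id: string
  name: string
}

interface LlmState {
  translateModel: Model | undefined
  visionModel: Model | undefined
}

// Before the fix, the equivalent of this setter wrote to translateModel,
// silently overwriting the translate model and never setting visionModel.
function setVisionModel(state: LlmState, model: Model): LlmState {
  return { ...state, visionModel: model }
}
```

A test that asserts `translateModel` is unchanged after calling the setter would catch this kind of slip.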

test(clientCompatibilityTypes): add mock data for DEFAULT_MODELS

8ef3b0b

feat(model settings): add default-value handling for the vision model

c62f84f

In the ModelSettings component, add default-value handling for the vision model, using useMemo to cache the computed result for better performance.

feat(types): add a processedResult field to the ImageMessageBlock interface

3aae77c
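One way such a field could be consumed is sketched below. The commit title does not give the field's type or the rest of the interface, so the reduced `ImageMessageBlock` shape, the optional `string` type, and the `imageBlockToText` helper are all assumptions for illustration only.

```typescript
// Reduced, hypothetical shape; only processedResult is named in the commit,
// and its type (optional string) is an assumption.
interface ImageMessageBlock {
  type: 'image'
  url: string
  processedResult?: string // assumed: OCR text or a vision model's description
}

// Hypothetical helper: return the text a non-vision model could receive
// in place of the image, or null when nothing usable was produced.
function imageBlockToText(block: ImageMessageBlock): string | null {
  const text = block.processedResult ? block.processedResult.trim() : ''
  return text.length > 0 ? text : null
}
```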
