
Feature Request: Support for ERNIE-4.5-VL #171

@jakexcosme

Note: This issue was copied from ggml-org#15512

Original Author: @Som-anon
Original Issue Number: ggml-org#15512
Created: 2025-08-22T19:17:24Z


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I would like multimodal support for https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT.

Motivation

I couldn't find any discussion or issue for this, but it's the best open-source model I could find for OCR of handwritten Japanese and Chinese text that actually kind of works.
It's worse than OpenAI's recognition, but on the 3 test images I use to evaluate the OCR capabilities of open-source models it performed OK (and OK is better than everything else I tested).
It's better than:

  • gemma
  • qwen
  • intern
  • lfm2
  • kimi
    (I think I tested mimo, but I can't find my setup or results... so maybe mimo is OK too?)
  • ... and every other open model I could find.

Possible Implementation

https://ernie.baidu.com/blog/publication/ERNIE_Technical_Report.pdf
https://github.com/bigdavidone/ERNIE4_5
vllm-project/vllm#20220
