Replies: 2 comments
- Oh, someone has already opened an issue about this, too: #15518
- Quickly skimming the paper, a few impressions:
  I could be missing something, but it does not feel very relevant for local/edge use cases, where we usually don't have the resources to generate ensembles. But either way, I don't think it requires any changes in
- Hello, first let me clarify that I am no expert on this subject, so I apologize in advance if I get something wrong or accidentally waste your time.
I just saw a post on Twitter by an AI researcher from Meta making these claims:
"Introducing DeepConf: Deep Think with Confidence
🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens.
It also delivers many strong advantages for parallel thinking:
🔥 Performance boost: ~10% accuracy across models & datasets
⚡ Ultra-efficient: Up to 85% fewer tokens generated
🔧 Plug & play: Works with ANY existing model - zero training needed (no hyperparameter tuning as well!)
⭐ Easy to deploy: Just ~50 lines of code in vLLM (see PR below)"
Here is the link to it: https://x.com/jiawzhao/status/1958982524333678877
To me it sounds almost too good to be true, but maybe someone who understands the llama.cpp project better could take a look at it? I would be happy to test an implementation or contribute in any other way that I can, but unfortunately I do not have C++ programming experience.
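For context, the core idea behind DeepConf, as described in the announcement, is to sample many reasoning traces in parallel, score each trace by the model's own token-level confidence (its log-probabilities), discard low-confidence traces, and majority-vote over the survivors. The sketch below is a simplified illustration of that voting scheme, not the paper's exact algorithm: the function names, the plain mean-log-probability score, and the `keep_ratio` parameter are all assumptions made here for clarity (the paper uses more refined windowed/group confidence measures and can also stop low-confidence traces early to save tokens).

```python
# Illustrative sketch (NOT the paper's exact algorithm): confidence-filtered,
# confidence-weighted majority voting over parallel reasoning traces.
# Assumes each trace carries per-token log-probabilities and a final answer.
from collections import defaultdict
from math import exp

def trace_confidence(token_logprobs):
    """Score a trace by the mean log-probability of its tokens."""
    return sum(token_logprobs) / len(token_logprobs)

def deepconf_vote(traces, keep_ratio=0.5):
    """Drop the least-confident traces, then vote, weighting by confidence.

    traces: list of (answer, token_logprobs) pairs.
    keep_ratio: fraction of traces retained after filtering (assumed knob).
    """
    scored = sorted(((trace_confidence(lp), ans) for ans, lp in traces),
                    reverse=True)
    kept = scored[:max(1, int(len(scored) * keep_ratio))]
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += exp(conf)  # weight = geometric-mean token probability
    return max(votes, key=votes.get)

# Toy example: three confident traces answer "42"; one low-confidence
# trace answers "41" and is filtered out before the vote.
traces = [
    ("42", [-0.1, -0.2, -0.1]),
    ("42", [-0.3, -0.1, -0.2]),
    ("41", [-2.5, -3.0, -2.8]),
    ("42", [-0.2, -0.2, -0.3]),
]
print(deepconf_vote(traces))  # -> 42
```

Since the filtering only needs per-token log-probabilities, which sampling backends already expose, this is consistent with the tweet's claim that it is a small, training-free change on top of existing inference code.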