How did SGLang HiCache with Mooncake Backend calculate cache hit ratio? #11672
              
YummuyWang asked this question in Q&A (unanswered)
In SGLang, the cache hit ratio is calculated with this formula:
I think it only counts cache hits on the GPU.
The blog post "SGLang HiCache with Mooncake Backend Benchmark" shows how the cache hit rate changes.
But sglang/benchmark/hicache/bench_multirun.py also reads cached_tokens from meta_info. Does that mean SGLang already supports collecting the cache hit rate for data in CPU memory and on SSD?
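
To make the question concrete, here is a minimal sketch of how a client-side hit rate could be derived from cached_tokens in meta_info. It is not the code from bench_multirun.py. The prompt_tokens field and the helper name cache_hit_rate are assumptions for illustration, and the sketch cannot tell whether cached_tokens includes prefixes restored from CPU memory or SSD, which is exactly what I am asking.

```python
from typing import Any, Iterable, Mapping


def cache_hit_rate(responses: Iterable[Mapping[str, Any]]) -> float:
    """Token-level hit rate: cached prompt tokens / total prompt tokens.

    Hypothetical helper. Assumes each response carries a meta_info dict
    with cached_tokens (named in the question) and prompt_tokens (assumed).
    """
    cached = 0
    prompt = 0
    for resp in responses:
        meta = resp.get("meta_info", {})
        cached += int(meta.get("cached_tokens", 0))
        prompt += int(meta.get("prompt_tokens", 0))
    # Avoid division by zero when no prompt tokens were seen.
    return cached / prompt if prompt else 0.0


# Made-up sample responses, only to show the arithmetic.
sample = [
    {"meta_info": {"prompt_tokens": 100, "cached_tokens": 80}},
    {"meta_info": {"prompt_tokens": 100, "cached_tokens": 20}},
]
print(cache_hit_rate(sample))  # 0.5
```

If cached_tokens does include tokens loaded back from the host or storage tiers, a ratio computed this way would already reflect HiCache hits and not only GPU hits.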