How does SGLang HiCache with Mooncake Backend calculate the cache hit ratio? #11672

YummuyWang · 2025-10-15T11:31:47Z

In SGLang, the cache hit ratio is calculated with this formula:

```python
"cache_hit_rate": (
    0
    if sum(self.performance_metrics["prompt_len"]) == 0
    else sum(self.performance_metrics["cached_tokens"])
    / sum(self.performance_metrics["prompt_len"])
),
```
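
The quoted expression is a token-weighted ratio: total cached prompt tokens divided by total prompt tokens across all requests, not an average of per-request ratios. Below is a minimal standalone sketch, assuming `performance_metrics` is a plain dict of per-request integer lists; the function name `cache_hit_rate` and the sample numbers are hypothetical and only for illustration.

```python
def cache_hit_rate(performance_metrics: dict[str, list[int]]) -> float:
    """Token-weighted hit rate: sum(cached_tokens) / sum(prompt_len)."""
    total_prompt = sum(performance_metrics["prompt_len"])
    # Guard against division by zero, mirroring the `0 if ... == 0` branch.
    if total_prompt == 0:
        return 0.0
    return sum(performance_metrics["cached_tokens"]) / total_prompt


# Hypothetical per-request numbers: three requests with different prompt lengths.
metrics = {"prompt_len": [1000, 200, 800], "cached_tokens": [900, 0, 400]}
print(cache_hit_rate(metrics))  # 0.65
```

Averaging the per-request ratios instead would give (0.9 + 0 + 0.5) / 3 ≈ 0.467, so the weighting matters when prompt lengths vary.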

I think this formula only calculates the cache hit ratio for the GPU cache.
However, the blog post "SGLang HiCache with Mooncake Backend Benchmark" shows how the cache hit rate changes.

But sglang/benchmark/hicache/bench_multirun.py also gets cached_tokens from meta_info:

```python
cached_tokens = (data.get("meta_info") or {}).get(
    "cached_tokens", 0
)
```

Does that mean SGLang already supports collecting a cache_hit_rate that includes hits on data stored in CPU memory and SSD?
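
To make the question concrete: a client-side hit rate can only reflect CPU-memory or SSD hits if the server already counts them in the `cached_tokens` it reports, because the client just sums that field. A minimal sketch of such an aggregation, assuming each response's `meta_info` also carries a `prompt_tokens` field; the function name `aggregate_hit_rate` and the sample responses are hypothetical.

```python
from typing import Any, Iterable


def aggregate_hit_rate(responses: Iterable[dict[str, Any]]) -> float:
    """Token-weighted hit rate over a batch of response JSON objects.

    Assumes (hypothetically) that each response has a ``meta_info`` dict with
    ``prompt_tokens`` and ``cached_tokens``; missing fields count as 0.
    """
    total_prompt = 0
    total_cached = 0
    for data in responses:
        # Same defensive pattern as bench_multirun.py: meta_info may be absent or None.
        meta = data.get("meta_info") or {}
        total_prompt += meta.get("prompt_tokens", 0)
        total_cached += meta.get("cached_tokens", 0)
    return 0.0 if total_prompt == 0 else total_cached / total_prompt


responses = [
    {"meta_info": {"prompt_tokens": 1000, "cached_tokens": 900}},
    {"meta_info": None},  # contributes nothing to either total
    {"meta_info": {"prompt_tokens": 1000, "cached_tokens": 100}},
]
print(aggregate_hit_rate(responses))  # 0.5
```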

Replies: 0 comments
