This repository was archived by the owner on Jun 3, 2025. It is now read-only.


@horheynm commented Jan 5, 2024

Description

Tested two entry points for `deepsparse.benchmark`: one used the internal KV cache and the other used an external KV cache. The goal is for both to always use the internal KV cache.

  1. Python
```python
from deepsparse.benchmark.benchmark_model import benchmark_model

# Benchmark a SparseZoo stub through the Python entry point
stub = "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"
results = benchmark_model(stub)
print(results)
```
  2. CLI
```bash
deepsparse.benchmark "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"
```

Results

  1. Python
```json
{
   "engine":"deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
   "version":"1.7.0.20240104",
   "orig_model_path":"zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
   "model_path":"/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
   "batch_size":1,
   "input_shapes":"None",
   "num_cores":32,
   "scenario":"singlestream",
   "scheduler":"Scheduler.default",
   "seconds_to_run":10,
   "num_streams":1,
   "benchmark_result":{
      "scenario":"singlestream",
      "items_per_sec":0.7033625366578768,
      "seconds_ran":15.6391610680148,
      "iterations":11,
      "median":576.397096272558,
      "mean":1421.7223456044767,
      "std":2497.9850669397615,
      "25.0%":493.93453216180205,
      "50.0%":576.397096272558,
      "75.0%":676.7404270358384,
      "90.0%":1781.627886928618,
      "95.0%":5504.294607555494,
      "99.0%":8482.427984056996,
      "99.9%":9152.507993769847
   },
   "fraction_of_supported_ops":1.0,
   "sequence_length":2048,
   "input_ids_length":1
}
```
  2. CLI
```json
{
   "engine":"deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
   "version":"1.7.0.20240104",
   "orig_model_path":"zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
   "model_path":"/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
   "batch_size":1,
   "input_shapes":null,
   "num_cores":32,
   "scenario":"singlestream",
   "scheduler":"Scheduler.default",
   "seconds_to_run":10,
   "num_streams":1,
   "benchmark_result":{
      "scenario":"singlestream",
      "items_per_sec":1.1406279406751316,
      "seconds_ran":19.287621506955475,
      "iterations":22,
      "median":353.46097755245864,
      "mean":876.6876120670614,
      "std":2260.886819025657,
      "25.0%":286.9633190566674,
      "50.0%":353.46097755245864,
      "75.0%":506.17970793973655,
      "90.0%":652.1910438779744,
      "95.0%":800.4174952395259,
      "99.0%":9027.320535853496,
      "99.9%":10993.794929189637
   },
   "fraction_of_supported_ops":1.0,
   "sequence_length":2048,
   "input_ids_length":1
}
```
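
As a sanity check on the two results, the `benchmark_result` fields look internally consistent: in both runs `items_per_sec` equals `iterations / seconds_ran`, and `mean × iterations` comes out close to `seconds_ran × 1000`, which suggests the latency statistics are in milliseconds (an inference from the numbers, not something the output states). The sketch below is plain Python with no DeepSparse dependency; it copies a subset of the reported fields, checks those relationships, and computes the CLI-to-Python throughput and median-latency ratios. The helper names (`throughput_is_consistent`, `latency_looks_like_ms`, `compare_runs`) are hypothetical and are not part of the DeepSparse API.

```python
import math

# Subset of the "benchmark_result" fields reported above, copied verbatim.
# Assumption: latency fields (median, mean) are in milliseconds; this is
# inferred from the data, not stated by the benchmark output.
PYTHON_RUN = {
    "items_per_sec": 0.7033625366578768,
    "seconds_ran": 15.6391610680148,
    "iterations": 11,
    "median": 576.397096272558,
    "mean": 1421.7223456044767,
}
CLI_RUN = {
    "items_per_sec": 1.1406279406751316,
    "seconds_ran": 19.287621506955475,
    "iterations": 22,
    "median": 353.46097755245864,
    "mean": 876.6876120670614,
}


def throughput_is_consistent(result: dict, rel_tol: float = 1e-6) -> bool:
    """Check that items_per_sec equals iterations / seconds_ran."""
    derived = result["iterations"] / result["seconds_ran"]
    return math.isclose(derived, result["items_per_sec"], rel_tol=rel_tol)


def latency_looks_like_ms(result: dict, rel_tol: float = 1e-3) -> bool:
    """Check that mean latency (assumed ms) times iterations ~= seconds_ran."""
    total_seconds = result["mean"] * result["iterations"] / 1000.0
    return math.isclose(total_seconds, result["seconds_ran"], rel_tol=rel_tol)


def compare_runs(baseline: dict, other: dict) -> dict:
    """Ratios of `other` relative to `baseline` for throughput and median latency."""
    return {
        "throughput_ratio": other["items_per_sec"] / baseline["items_per_sec"],
        "median_latency_ratio": other["median"] / baseline["median"],
    }


if __name__ == "__main__":
    for name, run in (("Python", PYTHON_RUN), ("CLI", CLI_RUN)):
        print(
            f"{name}: throughput consistent={throughput_is_consistent(run)}, "
            f"latency looks like ms={latency_looks_like_ms(run)}"
        )
    ratios = compare_runs(PYTHON_RUN, CLI_RUN)
    print(f"CLI vs Python throughput: {ratios['throughput_ratio']:.2f}x")
    print(f"CLI vs Python median latency: {ratios['median_latency_ratio']:.2f}x")
```

With these numbers, the CLI run reports roughly 1.62x the throughput of the Python run, with a median latency about 0.61x as long.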

@horheynm marked this pull request as ready for review January 5, 2024 16:50
@bfineran previously approved these changes January 5, 2024
@rahul-tuli previously approved these changes January 5, 2024
@horheynm dismissed stale reviews from rahul-tuli and bfineran via a797562 January 5, 2024 17:10
@horheynm force-pushed the bug-benchmark-int-ext-inconsistent-values branch from f68fba7 to a797562 January 5, 2024 17:10
@horheynm force-pushed the bug-benchmark-int-ext-inconsistent-values branch from fabc28d to 93222fe January 5, 2024 17:50
@bfineran merged commit f2530e3 into main January 5, 2024
@bfineran deleted the bug-benchmark-int-ext-inconsistent-values branch January 5, 2024 22:17