@JoanFM JoanFM commented May 13, 2024

Change the names to their v2 versions so that new versions can easily be added later and told apart.

@JoanFM JoanFM marked this pull request as ready for review May 13, 2024 07:41
@mofosyne mofosyne added the "refactoring" (Refactoring) and "Review Complexity : Low" (Trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. UI fix) labels May 13, 2024
github-actions bot commented May 13, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 557 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8385.99ms p(95)=20654.79ms fails=, finish reason: stop=499 truncated=58
  • Prompt processing (pp): avg=87.02tk/s p(95)=363.16tk/s
  • Token generation (tg): avg=46.11tk/s p(95)=49.18tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=refactor-jina-rename commit=fb83012096463c27c89a828036cee2c957a3a8e7
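As a quick sanity check on the summary above, the two finish-reason counts should add up to the reported iteration count. The sketch below only re-uses the figures from this comment; the variable and function names are illustrative and are not part of the benchmark tooling.

```python
# Figures copied from the benchmark summary in this comment.
iterations = 557
finish_reasons = {"stop": 499, "truncated": 58}


def truncated_share(reasons: dict[str, int]) -> float:
    """Return the fraction of finished requests that were truncated."""
    total = sum(reasons.values())
    return reasons["truncated"] / total


total_finished = sum(finish_reasons.values())
share = truncated_share(finish_reasons)

print(f"finished requests: {total_finished} (iterations reported: {iterations})")
print(f"truncated share: {share:.1%}")
```

With these numbers the counts match the 557 iterations, and roughly one request in ten ended by truncation rather than a stop token.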

prompt_tokens_seconds

[Chart: llamacpp:prompt_tokens_seconds over time, llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 557 iterations; x-axis from 1715592586 to 1715593214. Initial samples are 0.0; later samples range from 659.02 to 895.47 and end at 853.41.]

predicted_tokens_seconds

[Chart: llamacpp:predicted_tokens_seconds over time, llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 557 iterations; x-axis from 1715592586 to 1715593214. Initial samples are 0.0; later samples range from 27.02 to 41.22 and end at 30.21.]

Details

kv_cache_usage_ratio

[Chart: llamacpp:kv_cache_usage_ratio over time, llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 557 iterations; x-axis from 1715592586 to 1715593214. Initial samples are 0.0; later samples range from 0.09 to 0.45 and end at 0.17.]

requests_processing

[Chart: llamacpp:requests_processing over time, llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 557 iterations; x-axis from 1715592586 to 1715593214. Initial samples are 0.0; later samples range from 1.0 to 8.0 and end at 6.0.]

@ggerganov ggerganov merged commit 9aa6724 into ggml-org:master May 13, 2024