1.5 KiB
Benchy-Graph
Benchy-Graph is a tool that generates performance dashboards from llama-benchy CSV benchmark data. It visualizes key metrics like throughput, latency, and performance across different phases and concurrency levels for language model inference.
Generating the CSV File
To generate the required CSV file, use llama-benchy, a benchmarking tool for llama.cpp servers.
Example command to generate the CSV:
uvx llama-benchy --base-url http://127.0.0.1:8000/v1 --model Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --concurrency 1 2 4 8 16 32 --pp 128 --tg 128 --format csv
This will produce a CSV file with benchmark results that can be used as input for Benchy-Graph.
Running the App
To generate a performance dashboard image from a CSV file:
- Ensure dependencies are installed:
pip install -r requirement.txt - Run the script:
python app.py <input.csv> <output.png>
Replace <input.csv> with the path to your llama-benchy CSV file and <output.png> with the desired output image path.
Running the Notebook
For an interactive experience, open notebook.ipynb in Jupyter Notebook or JupyterLab and execute the cells. The notebook contains all the necessary code and explanations for generating visualizations.
Example output
This example has been made with mixa3607/ML-gfx906 custom vLLM docker image
