Shortcuts

Profile Request Throughput

In the applications, the length of the user’s input prompt and the size of generated tokens are dynamic. The static inference performance is insufficient to reflect the inference engine’s ability to handle the dynamic characteristics.

Therefore, it is necessary to use real dialogue data to evaluate the dynamic inference capabilities of the inference engine. This article will introduce how to test the dynamic inference performance of LMDeploy on localhost.

The profiling script is profile_throughput.py. Before running it, please install the lmdeploy precompiled package, download the profiling script and the test dataset:

pip install lmdeploy
git clone --depth=1 https://github.com/InternLM/lmdeploy
cd lmdeploy/benchmark
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

Metrics

LMDeploy records the performance metrics like first token latency, token throughput (tokens/s) and request throughput (RPM)

first_token_latency is only reported in the case of streaming inference.

The formula for calculating token throughput is:

$$ TokenThroughput = Number\ of\ generated\ tokens/TotalTime $$

And the formula for calculating request throughput is:

$$ RPM(request\ per\ minute) = Number\ of\ prompts/TotalTime * 60 $$

Total time includes prefill time.

Profile

In this section, we take internlm/internlm-7b as an example to show how to profile the inference engines of LMDeploy.

Profile turbomind engine

python3 profile_throughput.py ./ShareGPT_V3_unfiltered_cleaned_split.json internlm/internlm-7b

Profile pytorch engine

python3 profile_throughput.py ./ShareGPT_V3_unfiltered_cleaned_split.json internlm/internlm-7b  --backend pytorch

For detailed argument specification of profile_throughput.py, such as request concurrency, sampling parameters, k/v cache memory percentage an so on, please run the help command python3 profile_throughput.py -h.

Read the Docs v: latest
Versions
latest
stable
v0.4.1
v0.4.0
v0.3.0
v0.2.6
v0.2.5
v0.2.4
v0.2.3
v0.2.2
v0.2.0
v0.1.0
Downloads
pdf
html
epub
On Read the Docs
Project Home
Builds

Free document hosting provided by Read the Docs.