Supported Models
Models Supported by TurboMind
| Model | Size | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
|---|---|---|---|---|---|
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | Yes | Yes | Yes | Yes |
| Qwen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| Qwen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| Qwen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| Qwen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1 - v1.5 | Yes | Yes | Yes | Yes |
| InternVL2 | 2B - 76B | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |
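"-" means the combination has not been verified yet.
Note
The TurboMind engine does not support window attention. For models that apply window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.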
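The rule in the note can be sketched as a small pre-flight check. This is a minimal illustration, not part of LMDeploy: the helper name `choose_backend` is hypothetical, and it assumes the model directory contains a Hugging Face style `config.json` in which the `use_sliding_window` switch appears as a top-level boolean. Models that express window attention through a different key would need their own check.

```python
import json
import tempfile
from pathlib import Path


def choose_backend(model_dir: str) -> str:
    """Return "pytorch" when the model config enables sliding window
    attention, otherwise "turbomind". Hypothetical helper for illustration."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # `use_sliding_window` is the switch named in the note; treating a
    # missing key as "not enabled" is an assumption of this sketch.
    if config.get("use_sliding_window", False):
        return "pytorch"
    return "turbomind"


# Demo with a throwaway config file standing in for a real model directory.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "config.json").write_text(
        json.dumps({"use_sliding_window": True}), encoding="utf-8"
    )
    print(choose_backend(demo_dir))  # prints: pytorch
```

The returned string is only a label. How it maps onto an engine configuration depends on the installed LMDeploy version, so check that version's documentation before wiring it in.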
Models Supported by PyTorch
| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
|---|---|---|---|---|
| Llama | 7B - 65B | Yes | No | Yes |
| Llama2 | 7B - 70B | Yes | No | Yes |
| Llama3 | 8B, 70B | Yes | No | Yes |
| Llama3.1 | 8B, 70B | Yes | No | - |
| InternLM | 7B - 20B | Yes | No | Yes |
| InternLM2 | 7B - 20B | Yes | No | - |
| InternLM2.5 | 7B | Yes | No | - |
| Baichuan2 | 7B - 13B | Yes | No | Yes |
| ChatGLM2 | 6B | Yes | No | No |
| Falcon | 7B - 180B | Yes | No | No |
| YI | 6B - 34B | Yes | No | No |
| Mistral | 7B | Yes | No | No |
| Mixtral | 8x7B | Yes | No | No |
| Qwen | 1.8B - 72B | Yes | No | No |
| Qwen1.5 | 0.5B - 110B | Yes | No | No |
| Qwen2 | 0.5B - 72B | Yes | No | No |
| Qwen1.5-MoE | A2.7B | Yes | No | No |
| DeepSeek-MoE | 16B | Yes | No | No |
| DeepSeek-V2 | 16B, 236B | Yes | No | No |
| Gemma | 2B - 7B | Yes | No | No |
| Dbrx | 132B | Yes | No | No |
| StarCoder2 | 3B - 15B | Yes | No | No |
| Phi-3-mini | 3.8B | Yes | No | No |
| Phi-3-vision | 4.2B | Yes | No | No |
| CogVLM-Chat | 17B | Yes | No | No |
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B - 26B | Yes | No | No |
| InternVL2 | 1B - 40B | Yes | No | No |
| Gemma2 | 9B - 27B | Yes | No | No |
| GLM4 | 9B | Yes | No | No |
| CodeGeeX4 | 9B | Yes | No | No |