Supported Models

Models supported by TurboMind

| Model | Size | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| --- | --- | --- | --- | --- | --- |
| Llama | 7B - 65B | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer | 7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | Yes | Yes | Yes | Yes |
| Qwen | 1.8B - 72B | Yes | Yes | Yes | Yes |
| Qwen1.5 | 1.8B - 110B | Yes | Yes | Yes | Yes |
| Qwen2 | 1.5B - 72B | Yes | Yes | Yes | Yes |
| Mistral | 7B | Yes | Yes | Yes | No |
| Qwen-VL | 7B | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | Yes | Yes | Yes | Yes |
| Baichuan | 7B | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | Yes | Yes | Yes | No |
| YI | 6B - 34B | Yes | Yes | Yes | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | Yes | Yes | Yes |
| InternVL-Chat | v1.1 - v1.5 | Yes | Yes | Yes | Yes |
| InternVL2 | 2B - 76B | Yes | Yes | Yes | Yes |
| MiniCPM | Llama3-V-2_5 | Yes | Yes | Yes | Yes |
| MiniGeminiLlama | 7B | Yes | No | No | Yes |
| GLM4 | 9B | Yes | Yes | Yes | No |
| CodeGeeX4 | 9B | Yes | Yes | Yes | No |

"-" means the combination has not been verified yet.

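Note

The TurboMind engine does not support window attention. For models that apply window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.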
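The rule in the note can be sketched as a small check on a model's configuration. This is a minimal sketch, assuming the model's Hugging Face-style `config.json` has already been loaded into a dict and exposes the `use_sliding_window` key named in the note. The helper names `uses_window_attention` and `choose_backend` are hypothetical and are not part of LMDeploy.

```python
def uses_window_attention(config: dict) -> bool:
    """Return True when the config turns on the `use_sliding_window` switch.

    Assumption: a missing key is treated as "switch off".
    """
    return bool(config.get("use_sliding_window", False))


def choose_backend(config: dict) -> str:
    """Pick an engine name following the note above.

    Window attention enabled -> 'pytorch' (TurboMind does not support it).
    Otherwise -> 'turbomind'. The returned strings are illustrative labels.
    """
    return "pytorch" if uses_window_attention(config) else "turbomind"


if __name__ == "__main__":
    # Hypothetical config fragments, not taken from any specific checkpoint.
    print(choose_backend({"use_sliding_window": True}))
    print(choose_backend({"use_sliding_window": False}))
```

The returned label can then guide which engine configuration to build, for example a `PytorchEngineConfig` passed as `backend_config` to `lmdeploy.pipeline`. Treat that wiring as an assumption to verify against the LMDeploy version in use.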

Models supported by PyTorch

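| Model | Size | FP16/BF16 | KV INT8 | W8A8 |
| --- | --- | --- | --- | --- |
| Llama | 7B - 65B | Yes | No | Yes |
| Llama2 | 7B - 70B | Yes | No | Yes |
| Llama3 | 8B, 70B | Yes | No | Yes |
| Llama3.1 | 8B, 70B | Yes | No | - |
| InternLM | 7B - 20B | Yes | No | Yes |
| InternLM2 | 7B - 20B | Yes | No | - |
| InternLM2.5 | 7B | Yes | No | - |
| Baichuan2 | 7B - 13B | Yes | No | Yes |
| ChatGLM2 | 6B | Yes | No | No |
| Falcon | 7B - 180B | Yes | No | No |
| YI | 6B - 34B | Yes | No | No |
| Mistral | 7B | Yes | No | No |
| Mixtral | 8x7B | Yes | No | No |
| Qwen | 1.8B - 72B | Yes | No | No |
| Qwen1.5 | 0.5B - 110B | Yes | No | No |
| Qwen2 | 0.5B - 72B | Yes | No | No |
| Qwen1.5-MoE | A2.7B | Yes | No | No |
| DeepSeek-MoE | 16B | Yes | No | No |
| DeepSeek-V2 | 16B, 236B | Yes | No | No |
| Gemma | 2B - 7B | Yes | No | No |
| Dbrx | 132B | Yes | No | No |
| StarCoder2 | 3B - 15B | Yes | No | No |
| Phi-3-mini | 3.8B | Yes | No | No |
| Phi-3-vision | 4.2B | Yes | No | No |
| CogVLM-Chat | 17B | Yes | No | No |
| CogVLM2-Chat | 19B | Yes | No | No |
| LLaVA(1.5,1.6) | 7B - 34B | Yes | No | No |
| InternVL-Chat(v1.5) | 2B - 26B | Yes | No | No |
| InternVL2 | 1B - 40B | Yes | No | No |
| Gemma2 | 9B - 27B | Yes | No | No |
| GLM4 | 9B | Yes | No | No |
| CodeGeeX4 | 9B | Yes | No | No |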