# Supported Models

## Models supported by TurboMind

|         Model         |  Model size  | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| :-------------------: | :----------: | :-------: | :-----: | :-----: | :---: |
|         Llama         |   7B - 65B   |    Yes    |   Yes   |   Yes   |  Yes  |
|        Llama2         |   7B - 70B   |    Yes    |   Yes   |   Yes   |  Yes  |
|        Llama3         |   8B, 70B    |    Yes    |   Yes   |   Yes   |  Yes  |
|       Llama3.1        |   8B, 70B    |    Yes    |   Yes   |   Yes   |  Yes  |
|       InternLM        |   7B - 20B   |    Yes    |   Yes   |   Yes   |  Yes  |
|       InternLM2       |   7B - 20B   |    Yes    |   Yes   |   Yes   |  Yes  |
|      InternLM2.5      |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|  InternLM-XComposer   |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|  InternLM-XComposer2  | 7B, 4khd-7B  |    Yes    |   Yes   |   Yes   |  Yes  |
| InternLM-XComposer2.5 |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|         Qwen          |  1.8B - 72B  |    Yes    |   Yes   |   Yes   |  Yes  |
|        Qwen1.5        | 1.8B - 110B  |    Yes    |   Yes   |   Yes   |  Yes  |
|         Qwen2         |  1.5B - 72B  |    Yes    |   Yes   |   Yes   |  Yes  |
|        Mistral        |      7B      |    Yes    |   Yes   |   Yes   |  No   |
|        Qwen-VL        |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|      DeepSeek-VL      |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|       Baichuan        |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|       Baichuan2       |      7B      |    Yes    |   Yes   |   Yes   |  Yes  |
|      Code Llama       |   7B - 34B   |    Yes    |   Yes   |   Yes   |  No   |
|          YI           |   6B - 34B   |    Yes    |   Yes   |   Yes   |  No   |
|    LLaVA(1.5,1.6)     |   7B - 34B   |    Yes    |   Yes   |   Yes   |  Yes  |
|     InternVL-Chat     | v1.1 - v1.5  |    Yes    |   Yes   |   Yes   |  Yes  |
|       InternVL2       |    2B-76B    |    Yes    |   Yes   |   Yes   |  Yes  |
|        MiniCPM        | Llama3-V-2_5 |    Yes    |   Yes   |   Yes   |  Yes  |
|    MiniGeminiLlama    |      7B      |    Yes    |   No    |   No    |  Yes  |
|         GLM4          |      9B      |    Yes    |   Yes   |   Yes   |  No   |
|       CodeGeeX4       |      9B      |    Yes    |   Yes   |   Yes   |  No   |

"-" means not yet verified.

```{note}
The TurboMind engine does not support window attention. For models that apply window attention and have the corresponding `use_sliding_window` switch enabled, such as Mistral and Qwen1.5, choose the PyTorch engine for inference.
```
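The engine choice described in the note can be sketched as a small check on a model's configuration. This is a minimal illustration, not part of any library: `choose_backend` is a hypothetical helper, and it assumes the configuration has already been loaded into a plain dictionary (for example from the model's `config.json`) in which `use_sliding_window` is the switch named in the note.

```python
def choose_backend(hf_config: dict) -> str:
    """Return the engine name to use for a model configuration.

    Hypothetical helper: it only encodes the rule from the note, namely
    that a model with `use_sliding_window` enabled should run on the
    PyTorch engine because TurboMind does not support window attention.
    """
    # A missing key is treated as "switch is off" (an assumption).
    if hf_config.get("use_sliding_window", False):
        return "pytorch"
    return "turbomind"


if __name__ == "__main__":
    # Illustrative configurations, not taken from real checkpoints.
    with_window = {"use_sliding_window": True, "sliding_window": 4096}
    without_window = {"use_sliding_window": False}

    print(choose_backend(with_window))
    print(choose_backend(without_window))
```

The returned name still has to be mapped to the actual engine configuration of the inference toolkit. In LMDeploy that would typically mean passing a `PytorchEngineConfig` as `backend_config` when creating a pipeline, but check the documentation of the installed version before relying on this.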

## Models supported by PyTorch

|        Model        | Model size  | FP16/BF16 | KV INT8 | W8A8 |
| :-----------------: | :---------: | :-------: | :-----: | :--: |
|        Llama        |  7B - 65B   |    Yes    |   No    | Yes  |
|       Llama2        |  7B - 70B   |    Yes    |   No    | Yes  |
|       Llama3        |   8B, 70B   |    Yes    |   No    | Yes  |
|      Llama3.1       |   8B, 70B   |    Yes    |   No    |  -   |
|      InternLM       |  7B - 20B   |    Yes    |   No    | Yes  |
|      InternLM2      |  7B - 20B   |    Yes    |   No    |  -   |
|     InternLM2.5     |     7B      |    Yes    |   No    |  -   |
|      Baichuan2      |  7B - 13B   |    Yes    |   No    | Yes  |
|      ChatGLM2       |     6B      |    Yes    |   No    |  No  |
|       Falcon        |  7B - 180B  |    Yes    |   No    |  No  |
|         YI          |  6B - 34B   |    Yes    |   No    |  No  |
|       Mistral       |     7B      |    Yes    |   No    |  No  |
|       Mixtral       |    8x7B     |    Yes    |   No    |  No  |
|        Qwen         | 1.8B - 72B  |    Yes    |   No    |  No  |
|       Qwen1.5       | 0.5B - 110B |    Yes    |   No    |  No  |
|        Qwen2        | 0.5B - 72B  |    Yes    |   No    |  No  |
|     Qwen1.5-MoE     |    A2.7B    |    Yes    |   No    |  No  |
|    DeepSeek-MoE     |     16B     |    Yes    |   No    |  No  |
|     DeepSeek-V2     |  16B, 236B  |    Yes    |   No    |  No  |
|        Gemma        |    2B-7B    |    Yes    |   No    |  No  |
|        Dbrx         |    132B     |    Yes    |   No    |  No  |
|     StarCoder2      |   3B-15B    |    Yes    |   No    |  No  |
|     Phi-3-mini      |    3.8B     |    Yes    |   No    |  No  |
|    Phi-3-vision     |    4.2B     |    Yes    |   No    |  No  |
|     CogVLM-Chat     |     17B     |    Yes    |   No    |  No  |
|    CogVLM2-Chat     |     19B     |    Yes    |   No    |  No  |
|   LLaVA(1.5,1.6)    |   7B-34B    |    Yes    |   No    |  No  |
| InternVL-Chat(v1.5) |   2B-26B    |    Yes    |   No    |  No  |
|      InternVL2      |   1B-40B    |    Yes    |   No    |  No  |
|       Gemma2        |   9B-27B    |    Yes    |   No    |  No  |
|        GLM4         |     9B      |    Yes    |   No    |  No  |
|      CodeGeeX4      |     9B      |    Yes    |   No    |  No  |
