
Table of Contents

v0.2.4

Get Started

Get Started

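Benchmark

Static Inference Performance Benchmark
Request Throughput Performance Benchmark
api_server Performance Benchmark
Triton Inference Server Performance Benchmark
How to Evaluate LLMs with OpenCompass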

Model List

Supported Models

Inference

Inference Pipeline
TurboMind Architecture
TurboMind Configuration
lmdeploy.pytorch Architecture

Serving

RESTful API
Request Distribution Server
Creating an Online Hugging Face Demo with LMDeploy

Quantization

INT4 Model Quantization and Deployment
KV Cache Quantization and Test Results
W8A8 LLM Deployment

Advanced Guide

Adding Support for New Models in lmdeploy.pytorch
Long-Context Extrapolation
LMDeploy-QoS: Introduction and Usage


Index

© Copyright 2021-2024, OpenMMLab. Revision 24ea5dc5.

