Table of Contents

v0.2.0

Quick Start

Quick Start

Benchmarks

Static inference performance test method
Request throughput test method
api_server performance test
Triton Inference Server performance test method

Model List

Supported models

Inference

Inference pipeline
TurboMind framework
TurboMind configuration
lmdeploy.pytorch architecture

Serving

Restful API
Request distribution server

Quantization

INT4 model quantization and deployment
KV Cache quantization and test results
W8A8 LLM model deployment

Advanced Guide

Adding new model support to lmdeploy.pytorch
Long-context extrapolation
LMDeploy-QoS introduction and usage

Index

© Copyright 2021-2024, OpenMMLab. Revision b319dce6.
