Table of Contents

v0.2.3

Get Started

Get Started

Benchmark

Static Inference Performance Test Method
Request Throughput Test Method
API Server Performance Test Method
Triton Inference Server Performance Test Method
Evaluate LLMs with OpenCompass

Supported Models

Supported Models

Inference

Inference Pipeline
Architecture of TurboMind
TurboMind Config
Architecture of lmdeploy.pytorch

Serving

Restful API
Request Distributor Server

Quantization

INT4 Weight-only Quantization and Deployment (W4A16)
KV Cache Quantization and Test Results
W8A8 LLM Model Deployment

Advanced Guide

How to support new models in lmdeploy.pytorch
Context length extrapolation
LMDeploy-QoS Introduction and Usage

© Copyright 2021-2024, OpenMMLab. Revision 2831dc24.
