Table of Contents
v0.2.3
Get Started
Benchmark
Static Inference Performance Test Method
Request Throughput Test Method
API Server Performance Test Method
Triton Inference Server Performance Test Method
Evaluate LLMs with OpenCompass
Supported Models
Inference
Inference Pipeline
Architecture of TurboMind
TurboMind Config
Architecture of lmdeploy.pytorch
Serving
RESTful API
Request Distributor Server
Quantization
INT4 Weight-only Quantization and Deployment (W4A16)
KV Cache Quantization and Test Results
W8A8 LLM Model Deployment
Advanced Guide
How to support a new model in lmdeploy.pytorch
Context length extrapolation
LMDeploy-QoS Introduction and Usage