GitHub
InternLM
HomePage
Twitter
Table of Contents
v0.2.4
Get Started
Get Started
Benchmark
Profile Token Latency and Throughput
Profile Request Throughput
Profile API Server
Profile Triton Inference Server
Evaluate LLMs with OpenCompass
Supported Models
Supported Models
Inference
Inference Pipeline
Architecture of TurboMind
TurboMind Config
Architecture of lmdeploy.pytorch
Serving
Restful API
Request Distributor Server
Steps to create a Hugging Face online demo
Quantization
INT4 Weight-only Quantization and Deployment (W4A16)
KV Cache Quantization and Test Results
W8A8 LLM Model Deployment
Advanced Guide
How to support a new model in lmdeploy.pytorch
Context length extrapolation
LMDeploy-QoS Introduction and Usage
Index