Table of Contents

v0.2.4

Get Started

Get Started

Benchmark

Profile Token Latency and Throughput
Profile Request Throughput
Profile API Server
Profile Triton Inference Server
Evaluate LLMs with OpenCompass

Supported Models

Supported Models

Inference

Inference Pipeline
Architecture of TurboMind
TurboMind Config
Architecture of lmdeploy.pytorch

Serving

Restful API
Request Distributor Server
Steps to create a Hugging Face online demo

Quantization

INT4 Weight-only Quantization and Deployment (W4A16)
KV Cache Quantization and Test Results
W8A8 LLM Model Deployment

Advanced Guide

How to support a new model in lmdeploy.pytorch
Context length extrapolation
LMDeploy-QoS Introduction and Usage

© Copyright 2021-2024, OpenMMLab. Revision 24ea5dc5.
