Index

_ | C | G | I | P | S | T

_

  • __init__() (lmdeploy.Pipeline method)

C

  • chat() (lmdeploy.Pipeline method)
  • ChatTemplateConfig (class in lmdeploy)

G

  • GenerationConfig (class in lmdeploy)
  • get_ppl() (lmdeploy.Pipeline method)

I

  • infer() (lmdeploy.Pipeline method)

P

  • Pipeline (class in lmdeploy)
  • pipeline() (in module lmdeploy)
  • PytorchEngineConfig (class in lmdeploy)

S

  • stream_infer() (lmdeploy.Pipeline method)

T

  • TurbomindEngineConfig (class in lmdeploy)

By LMDeploy Authors

© Copyright 2021-2024, OpenMMLab.

Last updated on Feb 04, 2026.