lmdeploy

Get Started

  • Installation
  • Quick Start
  • On Other Platforms
    • Get Started with Huawei Ascend (Atlas 800T A2 & Atlas 300I Duo)

Models

  • Supported Models

Large Language Models (LLMs) Deployment

  • Offline Inference Pipeline
  • OpenAI Compatible Server
  • Tools Calling
  • Reasoning Outputs
  • Serving LoRA
  • WebUI Demo
  • Request Distributor Server

Vision-Language Models (VLMs) Deployment

  • Offline Inference Pipeline
  • OpenAI Compatible Server
  • Vision-Language Models
    • DeepSeek-VL2
    • LLaVA
    • InternVL
    • InternLM-XComposer-2.5
    • CogVLM
    • MiniCPM-V
    • Phi-3 Vision
    • Mllama
    • Qwen2-VL
    • Qwen2.5-VL
    • Molmo
    • Gemma3

Quantization

  • AWQ/GPTQ
  • SmoothQuant
  • INT4/INT8 KV Cache

Benchmark

  • Benchmark
  • Evaluate LLMs with OpenCompass

Advanced Guide

  • Architecture of TurboMind
  • Architecture of lmdeploy.pytorch
  • lmdeploy.pytorch New Model Support
  • Context length extrapolation
  • Customized chat template
  • How to debug Turbomind
  • Structured output
  • PyTorchEngine Multi-Node Deployment Guide
  • PyTorchEngine Profiling

API Reference

  • inference pipeline

Index


C

  • ChatTemplateConfig (class in lmdeploy)
  • client() (in module lmdeploy)
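
A minimal sketch of how the two entries above are typically used, assuming a locally available chat model and an api_server already running on the default port; the model id, template name, and URL are placeholders, not values taken from this index.

```python
from lmdeploy import ChatTemplateConfig, client, pipeline

# ChatTemplateConfig overrides the chat template that would otherwise be
# inferred from the model; 'internlm2' and the model id are illustrative.
pipe = pipeline('internlm/internlm2_5-7b-chat',
                chat_template_config=ChatTemplateConfig(model_name='internlm2'))
print(pipe(['Hello!'])[0].text)

# client() returns a client bound to an already running
# `lmdeploy serve api_server` instance (the URL below is an assumption).
api_client = client('http://0.0.0.0:23333')
print(api_client.available_models)
```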

G

  • GenerationConfig (class in lmdeploy)
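
GenerationConfig bundles the per-request sampling options passed to a pipeline call; a hedged sketch, with a placeholder model id and illustrative values:

```python
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline('internlm/internlm2_5-7b-chat')  # placeholder model id
# Per-request sampling options; all values here are examples, not defaults.
gen_config = GenerationConfig(max_new_tokens=256,
                              temperature=0.7,
                              top_p=0.9,
                              repetition_penalty=1.05)
responses = pipe(['Write a haiku about GPUs.'], gen_config=gen_config)
print(responses[0].text)
```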

P

  • pipeline() (in module lmdeploy)
  • PytorchEngineConfig (class in lmdeploy)
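
pipeline() is the entry point for offline inference, and PytorchEngineConfig routes it to the lmdeploy.pytorch backend; a sketch assuming a single-GPU setup and a placeholder model id:

```python
from lmdeploy import PytorchEngineConfig, pipeline

# tp is the tensor-parallel degree, session_len the maximum context length;
# both values below are illustrative.
backend_config = PytorchEngineConfig(tp=1, session_len=8192)
pipe = pipeline('internlm/internlm2_5-7b-chat', backend_config=backend_config)
print(pipe(['What is tensor parallelism?'])[0].text)
```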

S

  • serve() (in module lmdeploy)
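
serve() launches an OpenAI-compatible api_server from Python and hands back a client bound to it; a sketch assuming the default host/port and a placeholder model id:

```python
from lmdeploy import serve

# serve() starts the api_server in the background and returns a client;
# the port and model id below are assumptions.
api_client = serve('internlm/internlm2_5-7b-chat', server_port=23333)
model_name = api_client.available_models[0]
for item in api_client.chat_completions_v1(
        model=model_name,
        messages=[{'role': 'user', 'content': 'Hello!'}]):
    print(item)
```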

T

  • TurbomindEngineConfig (class in lmdeploy)
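
TurbomindEngineConfig configures the TurboMind backend used by pipeline() and the api_server; the option values below (tensor-parallel degree, KV-cache memory ratio, quantized KV cache) are illustrative, not recommendations:

```python
from lmdeploy import TurbomindEngineConfig, pipeline

engine_config = TurbomindEngineConfig(
    tp=2,                        # tensor-parallel degree
    cache_max_entry_count=0.5,   # fraction of free GPU memory for the KV cache
    quant_policy=8)              # 8 -> 8-bit KV cache, 4 -> 4-bit
pipe = pipeline('internlm/internlm2_5-7b-chat', backend_config=engine_config)
print(pipe(['Summarize TurboMind in one sentence.'])[0].text)
```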

© Copyright 2021-2024, OpenMMLab.

Last updated on Jun 17, 2025.