Index

C | G | P | S | T

C

  • ChatTemplateConfig (class in lmdeploy)
  • client() (in module lmdeploy)

G

  • GenerationConfig (class in lmdeploy)

P

  • pipeline() (in module lmdeploy)
  • PytorchEngineConfig (class in lmdeploy)

S

  • serve() (in module lmdeploy)

T

  • TurbomindEngineConfig (class in lmdeploy)

By LMDeploy Authors

© Copyright 2021-2024, OpenMMLab.

Last updated on Dec 30, 2024.