Vision-Language Models

Examples

  • LLaVA
  • InternVL
    • Installation
    • Offline inference
    • Online serving
  • InternLM-XComposer-2.5
    • Introduction
    • Quick Start
    • LoRA Model
    • Quantization
    • More examples
  • CogVLM
    • Introduction
    • Quick Start
  • MiniCPM-V
    • Installation
    • Offline inference
    • Online serving
  • Phi-3 Vision
    • Introduction
    • Installation
    • Offline inference
    • Online serving
  • Mllama
    • Introduction
    • Installation
    • Offline inference
    • Online serving

By LMDeploy Authors

© Copyright 2021-2024, OpenMMLab.

Last updated on Nov 07, 2024.