Vision-Language Models

Examples

LLaVA
InternVL
InternLM-XComposer-2.5
CogVLM
- Introduction
- Quick Start
MiniCPM-V
Phi-3 Vision
Mllama

By LMDeploy Authors

© Copyright 2021-2024, OpenMMLab.

Last updated on Nov 07, 2024.