Welcome to LMDeploy's tutorials!
====================================

.. figure:: ./_static/image/lmdeploy-logo.svg
  :width: 50%
  :align: center
  :alt: LMDeploy
  :class: no-scaled-link

LMDeploy is a toolkit for compressing, deploying, and serving large language models (LLMs).

LMDeploy has the following core features:

* **Efficient Inference**: LMDeploy delivers up to 1.8x higher request throughput than vLLM by introducing key features such as persistent batch (a.k.a. continuous batching), blocked KV cache, dynamic split&fuse, tensor parallelism, and high-performance CUDA kernels.

* **Effective Quantization**: LMDeploy supports weight-only and k/v quantization, and the 4-bit inference performance is 2.4x higher than FP16. The quantization quality has been confirmed via OpenCompass evaluation.

* **Effortless Distribution Server**: Leveraging the request distribution service, LMDeploy facilitates easy and efficient deployment of multi-model services across multiple machines and cards.

* **Excellent Compatibility**: LMDeploy supports using KV Cache Quant, AWQ, and Automatic Prefix Caching simultaneously.

Documentation
-------------

.. _get_started:
.. toctree::
   :maxdepth: 1
   :caption: Get Started

   get_started/installation.md
   get_started/get_started.md
   get_started/index.rst

.. _supported_models:
.. toctree::
   :maxdepth: 1
   :caption: Models

   supported_models/supported_models.md
   supported_models/reward_models.md

.. _llm_deployment:
.. toctree::
   :maxdepth: 1
   :caption: Large Language Models (LLMs) Deployment

   llm/pipeline.md
   llm/api_server.md
   llm/api_server_tools.md
   llm/api_server_reasoning.md
   llm/api_server_anthropic.md
   llm/api_server_lora.md
   llm/proxy_server.md

.. _vlm_deployment:
.. toctree::
   :maxdepth: 1
   :caption: Vision-Language Models (VLMs) Deployment

   multi_modal/vl_pipeline.md
   multi_modal/api_server_vl.md
   multi_modal/index.rst

.. _quantization:
.. toctree::
   :maxdepth: 1
   :caption: Quantization

   quantization/w4a16.md
   quantization/w8a8.md
   quantization/kv_quant.md
   quantization/llm_compressor.md

.. _benchmark:
.. toctree::
   :maxdepth: 1
   :caption: Benchmark

   benchmark/benchmark.md
   benchmark/evaluate_with_opencompass.md
   benchmark/evaluate_with_vlmevalkit.md

.. toctree::
   :maxdepth: 1
   :caption: Advanced Guide

   inference/turbomind.md
   inference/pytorch.md
   advance/pytorch_new_model.md
   advance/long_context.md
   advance/chat_template.md
   advance/debug_turbomind.md
   advance/structed_output.md
   advance/pytorch_multinodes.md
   advance/pytorch_profiling.md
   advance/metrics.md
   advance/context_parallel.md
   advance/spec_decoding.md
   advance/update_weights.md

.. toctree::
   :maxdepth: 1
   :caption: API Reference

   api/pipeline.rst
   api/openapi.rst
   api/cli.rst

Indices and tables
==================

* :ref:`genindex`
* :ref:`search`
* :ref:`routingtable`