Welcome to LMDeploy's tutorials!

Get Started
- Installation
- Offline batch inference
- Serving
- Quantization
- Useful Tools

Benchmark
- Static Inference Performance Test Method
- Request Throughput Test Method
- API Server Performance Test Method
- Triton Inference Server Performance Test Method
- Evaluate LLMs with OpenCompass

Supported Models
- Supported Models

Inference
- Inference Pipeline
- Architecture of TurboMind
- TurboMind Config
- Architecture of lmdeploy.pytorch

Serving
- Restful API
- Request Distributor Server

Quantization
- INT4 Weight-only Quantization and Deployment (W4A16)
- KV Cache Quantization and Test Results
- W8A8 LLM Model Deployment

Advanced Guide
- How to support new model in lmdeploy.pytorch
- Context length extrapolation
- LMDeploy-QoS Introduce and Usage
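As a quick taste of the Offline batch inference and Inference Pipeline guides listed above, the sketch below uses the `lmdeploy.pipeline` API. The model id and prompts are illustrative placeholders, not a recommendation; see those guides for the authoritative usage and supported options.

```python
# Minimal offline batch inference sketch with the lmdeploy pipeline API.
# The model id below is a placeholder chosen for illustration; any model
# listed under "Supported Models" should work the same way.
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b')

# Passing a list of prompts runs them as one batch and returns one
# response object per prompt.
prompts = [
    'Hi, please introduce yourself.',
    'Explain what KV cache quantization does in one sentence.',
]
responses = pipe(prompts)
for prompt, response in zip(prompts, responses):
    print(prompt)
    print(response.text)
```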