Welcome to LMDeploy's documentation!

You can switch between the Chinese and English documents in the lower-left corner of the layout.

Build
- Build from source
- Build in Docker (recommended)
- Build in localhost (optional)

Chatting with PyTorch
- PyTorch Chat in command line

Serving
- Serving a model
- Serving LLaMA-2
- Serving LLaMA

TurboMind
- Architecture of TurboMind
  - High level overview of TurboMind
  - Persistent Batch
  - KV Cache Manager
  - LLaMa implementation
  - API
- Difference between FasterTransformer and TurboMind
- FAQ

Indices and tables
- Index
- Search Page