Get Started#
LMDeploy offers functionalities such as model quantization, offline batch inference, online serving, etc. Each function can be completed with just a few simple lines of code or commands.
Installation#
Install lmdeploy with pip (python 3.8+) or from source
pip install lmdeploy
The default prebuilt package is compiled on CUDA 12. However, if CUDA 11+ is required, you can install lmdeploy by:
export LMDEPLOY_VERSION=0.5.2
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
Offline batch inference#
import lmdeploy
pipe = lmdeploy.pipeline("internlm/internlm2_5-7b-chat")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
For more information on inference pipeline parameters, please refer to here.
Serving#
LMDeploy offers various serving methods, choosing one that best meet your requirements.
Quantization#
LMDeploy provides the following quantization methods. Please visit the following links for the detailed guide
Useful Tools#
LMDeploy CLI offers the following utilities, helping users experience LLM features conveniently
Inference with Command line Interface#
lmdeploy chat internlm/internlm2_5-7b-chat
Serving with Web UI#
LMDeploy adopts gradio to develop the online demo.
# install dependencies
pip install lmdeploy[serve]
# launch gradio server
lmdeploy serve gradio internlm/internlm2_5-7b-chat