DeepSeek-VL2
Introduction
DeepSeek-VL2 is an advanced series of large Mixture-of-Experts (MoE) vision-language models that significantly improves upon its predecessor, DeepSeek-VL. It demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.
LMDeploy supports deepseek-vl2-tiny, deepseek-vl2-small and deepseek-vl2 with the PyTorch engine.
Quick Start
Install LMDeploy by following the installation guide.
Prepare
To deploy a DeepSeek-VL2 model with LMDeploy, you must install the official GitHub repository and the related third-party libraries, because LMDeploy reuses the image processing functions provided in the official repository.
pip install git+https://github.com/deepseek-ai/DeepSeek-VL2.git --no-deps
pip install attrdict timm 'transformers<4.48.0'
Note that it may fail with transformers>=4.48.0, as reported in this issue.
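Before loading a model, it can help to confirm that the installed transformers version respects the `<4.48.0` bound from the pip command above. The following is a minimal sketch using only the standard library; the helper names `is_supported_transformers` and `check_installed_transformers` are hypothetical and not part of LMDeploy or transformers. It compares only the major and minor components, so pre-releases of 4.48 are conservatively treated as unsupported.

```python
import re
from importlib import metadata

# Exclusive upper bound taken from the pip command above: transformers<4.48.0
MAX_EXCLUSIVE = (4, 48)


def is_supported_transformers(version: str) -> bool:
    """Return True if `version` is below 4.48 (major.minor comparison only)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < MAX_EXCLUSIVE


def check_installed_transformers() -> str:
    """Describe the installed transformers version, if any."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if is_supported_transformers(installed):
        return f"transformers {installed} is below 4.48.0"
    return f"transformers {installed} is >= 4.48.0 and may fail with DeepSeek-VL2"


if __name__ == "__main__":
    print(check_installed_transformers())
```

If the check reports a version at or above 4.48.0, rerunning the second pip command above should bring transformers back into the supported range.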
Offline inference pipeline
The following sample code shows the basic usage of the VLM pipeline. For more examples, please refer to VLM Offline Inference Pipeline.
To construct valid DeepSeek-VL2 prompts with image inputs, users should insert <IMAGE_TOKEN> manually.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

if __name__ == "__main__":
    pipe = pipeline('deepseek-ai/deepseek-vl2-tiny')
    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    response = pipe(('<IMAGE_TOKEN>describe this image', image))
    print(response)
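Because the placeholder has to be inserted by hand, it is easy to forget it on some prompts. A small helper along the following lines can guard against that. This is only a sketch: `ensure_image_token` is a hypothetical name, not an LMDeploy API, and it covers only the single-image case shown above, placing the token at the start of the prompt.

```python
# Placeholder string used in the sample prompt above.
IMAGE_TOKEN = "<IMAGE_TOKEN>"


def ensure_image_token(text: str) -> str:
    """Prepend the image placeholder unless the prompt already contains one."""
    if IMAGE_TOKEN in text:
        return text
    return IMAGE_TOKEN + text


if __name__ == "__main__":
    print(ensure_image_token("describe this image"))
```

The returned string can then be passed as the first element of the tuple given to the pipeline, for example `pipe((ensure_image_token('describe this image'), image))`.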