Integrate LMDeploy /v1/messages with Claude Code#

Claude Code can connect to an Anthropic-compatible gateway by setting ANTHROPIC_BASE_URL. LMDeploy exposes an Anthropic-compatible POST /v1/messages endpoint, so Claude Code can route requests to a local or remote LMDeploy api_server.

The request path is:

Claude Code -> http://<server>:<port>/v1/messages -> LMDeploy api_server

1. Start LMDeploy#

Launch an LMDeploy API server with the model you want Claude Code to use:

lmdeploy serve api_server Qwen/Qwen3.5-35B-A3B --backend pytorch --server-port 23333

For tool calling, launch the server with a tool parser supported by your model:

lmdeploy serve api_server <model> \
  --server-port 23333 \
  --tool-call-parser <parser-name>

2. Verify the Messages Endpoint#

Before configuring Claude Code, check that LMDeploy responds to Anthropic-style requests:

curl http://127.0.0.1:23333/v1/messages \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen3.5-35B-A3B",
    "max_tokens": 128,
    "messages": [
      {"role": "user", "content": "Say hello from LMDeploy"}
    ]
  }'

The model value must match a model name exposed by your LMDeploy server.

3. Configure Claude Code#

Set ANTHROPIC_BASE_URL to the server root, not to /v1. Claude Code appends /v1/messages itself. Add the LMDeploy gateway configuration to ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:23333",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "Qwen/Qwen3.5-35B-A3B",
    "ANTHROPIC_CUSTOM_MODEL_OPTION": "Qwen/Qwen3.5-35B-A3B",
    "ANTHROPIC_CUSTOM_MODEL_OPTION_NAME": "LMDeploy local model",
    "ANTHROPIC_CUSTOM_MODEL_OPTION_DESCRIPTION": "Served by LMDeploy /v1/messages",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "Qwen/Qwen3.5-35B-A3B",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "Qwen/Qwen3.5-35B-A3B",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "Qwen/Qwen3.5-35B-A3B",
    "CLAUDE_CODE_SUBAGENT_MODEL": "Qwen/Qwen3.5-35B-A3B",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}

The model name must exactly match the name exposed by LMDeploy.

Then start Claude Code:

claude --model Qwen/Qwen3.5-35B-A3B

4. Streaming Behavior#

Claude Code commonly uses streaming. LMDeploy’s /v1/messages endpoint supports stream=true and returns Anthropic-style server-sent events:

message_start
content_block_start
content_block_delta
content_block_stop
message_delta
message_stop

Streaming supports text deltas, reasoning deltas when parser configuration enables them, and tool input JSON deltas when tool calling is enabled.

5. Model Discovery Note#

Claude Code may query /v1/models when ANTHROPIC_BASE_URL points to a gateway. LMDeploy’s Anthropic model list endpoint is currently documented as GET /anthropic/v1/models. If Claude Code does not discover your model automatically, use ANTHROPIC_MODEL and ANTHROPIC_CUSTOM_MODEL_OPTION as shown above.