PyTorchEngine Multi-Node Deployment Guide#
To support larger-scale model deployment requirements, PyTorchEngine provides multi-node deployment support. Below are the detailed steps for deploying a tp=16
model across two 8-GPU nodes.
1. Create Docker Containers (Optional)#
To ensure consistency across the cluster environment, it is recommended to use Docker to set up the cluster. Create containers on each node as follows:
docker run -it \
--network host \
-v $MODEL_PATH:$CONTAINER_MODEL_PATH \
openmmlab/lmdeploy:latest
[!IMPORTANT] Ensure that the model is placed in the same directory on all node containers.
2. Set Up the Cluster Using Ray#
2.1 Start the Head Node#
Select one node as the head node and run the following command in its container:
ray start --head --port=$DRIVER_PORT
2.2 Join the Cluster#
On the other nodes, use the following command in their containers to join the cluster created by the head node:
ray start --address=$DRIVER_NODE_ADDR:$DRIVER_PORT
run ray status
on head node to check the cluster.
[!IMPORTANT] Ensure that
DRIVER_NODE_ADDR
is the address of the head node andDRIVER_PORT
matches the port number used during the head node initialization.
3. Use LMDeploy Interfaces#
In the head node’s container, you can use all functionalities of PyTorchEngine as usual.
3.1 Start the Server#
lmdeploy serve api_server \
$CONTAINER_MODEL_PATH \
--backend pytorch \
--tp 16
3.2 Use the Pipeline#
from lmdeploy import pipeline, PytorchEngineConfig
if __name__ == '__main__':
model_path = '/path/to/model'
backend_config = PytorchEngineConfig(tp=16)
with pipeline(model_path, backend_config=backend_config) as pipe:
outputs = pipe('Hakuna Matata')
[!NOTE] PyTorchEngine will automatically choose the appropriate launch method (single-node/multi-node) based on the
tp
parameter and the number of devices available in the cluster. If you want to enforce the use of the Ray cluster, you can configuredistributed_executor_backend='ray'
inPytorchEngineConfig
or use the environment variableLMDEPLOY_EXECUTOR_BACKEND=ray
.
By following the steps above, you can successfully deploy PyTorchEngine in a multi-node environment and leverage the Ray cluster for distributed computing.
[!WARNING] To achieve better performance, we recommend users to configure a higher-quality network environment (such as using InfiniBand) to improve engine efficiency.