How to Run Distributed Llama on 🧠 GPU

Distributed Llama can run on GPU devices via the Vulkan API. This article describes how to build and run the project on a GPU.

Before you start here, please check how to build and run Distributed Llama on CPU.

To run on GPU, please follow these steps:

  1. Install Vulkan SDK for your platform.
  1. Build Distributed Llama with GPU support:
DLLAMA_VULKAN=1 make dllama
DLLAMA_VULKAN=1 make dllama-api
  1. Now the dllama and dllama-api binaries support an argument related to GPU usage.
--gpu-index <index>   Use GPU device with given index (use `0` for first device)
  1. You can run the root node or a worker node on GPU by specifying the --gpu-index argument. The Vulkan backend requires a single thread, so you should also set --nthreads 1.
./dllama inference ... --nthreads 1 --gpu-index 0 
./dllama chat      ... --nthreads 1 --gpu-index 0 
./dllama worker    ... --nthreads 1 --gpu-index 0 
./dllama-api       ... --nthreads 1 --gpu-index 0
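
Putting the steps above together, a full build-and-run session might look like the sketch below. The model and tokenizer paths are placeholders for illustration only; substitute the files you prepared when following the CPU guide, and note that the exact `--model`/`--tokenizer`/`--prompt` arguments are assumed to match that guide.

```sh
# Assumes the Vulkan SDK is already installed for your platform.

# Build both binaries with Vulkan support enabled.
DLLAMA_VULKAN=1 make dllama
DLLAMA_VULKAN=1 make dllama-api

# Run inference on the first GPU device (index 0).
# The Vulkan backend needs a single thread, hence --nthreads 1.
# Paths below are hypothetical placeholders.
./dllama inference \
  --model models/my_model/dllama_model.m \
  --tokenizer models/my_model/dllama_tokenizer.t \
  --prompt "Hello" \
  --nthreads 1 --gpu-index 0
```

If your machine has several GPUs, changing `--gpu-index` selects which device the node uses; each node (root or worker) picks its own device independently.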