January 10, 2025

How to Install and Run DeepSeek-V3 Model Locally on GPU or CPU

In this tutorial, we explain how to install and run a (quantized) version of DeepSeek-V3 on a local computer by using the llama.cpp program. DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model that, according to its developers, outperforms other LLMs such as ChatGPT and Llama. The YouTube tutorial is given below.

– We will install and run a quantized version of DeepSeek-V3 on a local computer.

Prerequisites:

– 200 GB of disk space for the smallest model, and more than 400 GB of disk space for the larger models.

– A significant amount of RAM. In our case, we have 48 GB of RAM, and the model inference is relatively slow. The inference speed can probably be improved by adding more RAM.

– A decent GPU. We performed our tests on an NVIDIA RTX 3090 GPU with 24 GB of VRAM. A better GPU will definitely increase the inference speed. After some tests, we realized that the GPU resources are not fully used. This can be improved by building llama.cpp from source, which will be explored in future tutorials. GPU utilization can be monitored as shown below.
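GPU utilization during inference can be checked with NVIDIA's nvidia-smi utility, which ships with the GPU driver. The command below refreshes the report every second (the -l flag sets the refresh interval in seconds):

nvidia-smi -l 1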

How to Install and Run DeepSeek-V3 Locally

The first step is to install the C++ compiler. For that purpose, go to the Microsoft Visual Studio C++ website and download Microsoft Visual Studio Community Edition (during installation, select the "Desktop development with C++" workload):

https://visualstudio.microsoft.com/vs/features/cplusplus

The second step is to install the CUDA toolkit:

https://developer.nvidia.com/cuda-toolkit
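After the installation completes, you can verify that the CUDA compiler is available by opening a Command Prompt and typing:

nvcc --version

This should print the version of the installed CUDA toolkit.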

The next step is to download and install llama.cpp. To download llama.cpp, go to this website

https://github.com/ggerganov/llama.cpp/releases

and download the precompiled Windows CUDA build. The file should be in the form llama-***-bin-win-cuda-cu***-x64.zip, where *** stands for the release and CUDA version numbers. Download this file and unzip it. Then, copy all the extracted files to a new folder. In our case, the name of the new folder is test12.
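The archive can also be extracted from the command line. Here is a minimal PowerShell sketch, assuming the zip file is in the current folder and the target folder is C:\test12 (replace the *** placeholders with the actual file name you downloaded):

Expand-Archive -Path .\llama-***-bin-win-cuda-cu***-x64.zip -DestinationPath C:\test12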

Then, download the DeepSeek-V3 model files. For that purpose, go to the Hugging Face website

https://huggingface.co/unsloth/DeepSeek-V3-GGUF

and click on the desired model and download all 5 of its files. In our case, we select the Q2_K_XS model and download its 5 GGUF files.

While downloading all 5 files, make sure to save them in the folder in which the llama.cpp files are extracted. In our case, the name of that folder is test12. Alternatively, the files can be downloaded from the command line, as shown below.
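As an alternative to clicking through the web interface, the model files can be fetched with the huggingface-cli tool. This is a minimal sketch, assuming Python and pip are already installed; the --include pattern selects only the Q2_K_XS files, and --local-dir points to the folder with the llama.cpp files:

pip install huggingface_hub
huggingface-cli download unsloth/DeepSeek-V3-GGUF --include "*Q2_K_XS*" --local-dir C:\test12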

After the model is downloaded, we can run it. Open a Command Prompt and navigate to the folder in which the llama.cpp and model files are saved. In that folder, type the following:

llama-cli --model DeepSeek-V3-Q2_K_XS-00001-of-00005.gguf --cache-type-k q5_0 --threads 16 --prompt "<|User|>How to solve a quadratic equation?<|Assistant|>"

Here, the question with proper formatting is

"<|User|>How to solve a quadratic equation?<|Assistant|>"

This should load the model and start generating a response. Note that it is enough to point llama-cli at the first file of the split model; the remaining parts are loaded automatically. The --threads parameter sets the number of CPU threads, and --cache-type-k q5_0 quantizes the key cache to reduce its memory usage.
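Since the GPU resources are not fully used, one option worth trying is the --n-gpu-layers (or -ngl) flag of llama-cli, which offloads the given number of model layers to VRAM. The value below is only an illustration; how many layers fit depends on the quantization and the available VRAM:

llama-cli --model DeepSeek-V3-Q2_K_XS-00001-of-00005.gguf --cache-type-k q5_0 --threads 16 --n-gpu-layers 4 --prompt "<|User|>How to solve a quadratic equation?<|Assistant|>"

If the model fails to load with an out-of-memory error, reduce the number of offloaded layers.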