January 9, 2025

Install and Run Microsoft’s Official Phi 4 LLM Locally in Python on Windows

In this tutorial, we explain how to install and run Microsoft’s Phi 4 LLM locally in Python. The YouTube tutorial is given below.

Why Phi 4? Fierce competition in the field of LLMs keeps producing better models. Phi 4 is Microsoft’s newest small LLM, specializing in complex reasoning. It “only” has 14B parameters, and as such, it can be executed locally on “lower-end” hardware.
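To put the hardware requirements in perspective, here is a rough back-of-the-envelope estimate of the memory needed just to store the model weights. It is only a sketch: the 14B figure is the one quoted above, while the bytes-per-parameter values are assumptions about the storage format (2 bytes for 16-bit weights, 0.5 bytes for a hypothetical 4-bit quantized copy). Running the model needs additional memory on top of the weights.

```python
# Rough estimate of the memory needed to store the model weights.
N_PARAMS = 14e9  # parameter count quoted in this tutorial

def weight_memory_gb(n_params, bytes_per_param):
    """Return the weight storage size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(N_PARAMS, 2))    # assumed 16-bit weights
print(weight_memory_gb(N_PARAMS, 0.5))  # assumed 4-bit quantization
```

Under these assumptions, the 16-bit weights alone take about 28 GB.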

Note that in our previous tutorial given below, we explained how to run an unofficial release of Phi4 locally by using the Ollama framework.

Install and run Official Phi4 Locally in Python

The first step is to install Microsoft Visual Studio C++. Go to the website

https://visualstudio.microsoft.com/vs/features/cplusplus

and download and install Microsoft Visual Studio C++. Then, download and install CUDA Toolkit

https://developer.nvidia.com/cuda-toolkit

Then, make sure that the developer tools are activated in Windows 11. To do that, click on Start and search for “Use developer features”.

Then, activate the developer tools by clicking on “Developer Mode” as shown in the figure below.

The next step is to create a workspace folder and a Python virtual environment. To do that, open a Command Prompt as administrator and type

cd\
mkdir phi4
cd phi4
python -m venv env1
env1\Scripts\activate.bat

As a result, the Python virtual environment is created and activated.
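If you want to double-check that the environment is really active, one possible way is to start Python from the same Command Prompt and run the short script below. This is a minimal sketch, not part of the original setup; the helper name in_virtual_environment is our own.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points to the environment folder, while
    # sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

When the environment is active, the interpreter path should point inside the env1 folder.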

Then, we need to install the necessary libraries:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install huggingface-hub
pip install transformers
pip install accelerate
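Once these commands finish, a quick sanity check can confirm that the packages are visible to the interpreter and that PyTorch can see the GPU. The script below is a minimal sketch; the helper name missing_packages is our own, and the list uses the import names of the packages (huggingface_hub with an underscore).

```python
import importlib.util

# Import names of the packages installed above (note the underscore in
# huggingface_hub, which differs from the pip package name).
REQUIRED = ["torch", "torchvision", "torchaudio",
            "huggingface_hub", "transformers", "accelerate"]

def missing_packages(names):
    """Return the names that the current interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")

if "torch" not in missing:
    import torch
    # True only if PyTorch was built with CUDA support and a GPU is visible.
    print("CUDA available:", torch.cuda.is_available())
```

If the last line prints False, the script later in this tutorial that loads the model on "cuda" will most likely fail, so it is worth revisiting the CUDA Toolkit and PyTorch installation first.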

First, we install PyTorch, and then we install the huggingface-hub, transformers, and accelerate libraries. Note that the pip command for installing PyTorch (the first pip command) is generated by going to the official PyTorch website:

https://pytorch.org/get-started/locally

and by using the selection table to generate the command. The next step is to download the model files from the official Hugging Face website:

https://huggingface.co/microsoft/phi-4/tree/main

To do that, open your favorite Python editor, and type and execute the following code:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="microsoft/phi-4",
                  local_dir="C:\\phi4")

This code will download all the files and folders from the remote repository and store them in the local directory. When executing this Python file, make sure that it is run with the Python interpreter from the previously created virtual environment.
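Before moving on, it can be useful to check that the download is complete. The sketch below assumes that the repository stores its configuration in config.json and its weights in .safetensors files; the helper name summarize_model_dir is our own and not part of any library.

```python
from pathlib import Path

def summarize_model_dir(model_dir):
    """Report whether a config file and weight files are present."""
    root = Path(model_dir)
    # Assumption: the weights are stored as *.safetensors files.
    weights = sorted(p.name for p in root.glob("*.safetensors"))
    return {
        "has_config": (root / "config.json").is_file(),
        "weight_files": weights,
    }

print(summarize_model_dir("C:\\phi4"))
```

An empty list of weight files, or a missing config file, suggests that the download was interrupted and should be repeated.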

Once everything is downloaded, create a new Python script that will test the Phi4 model. The script is given below.

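import transformers

# Local folder that holds the downloaded model files
model_id = "C:\\phi4"

# Create a text-generation pipeline and load the model onto the GPU
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": "auto"},  # take the data type from the model files
    device_map="cuda",
)

# Chat-style prompt: a system message followed by the user's question
messages = [
    {"role": "system", "content": "You are a funny teacher trying to make lectures as interesting as possible and you give real-life examples"},
    {"role": "user", "content": "How to explain gravity to high-school students?"},
]

# Generate up to 128 new tokens and print the last message (the model's reply)
outputs = pipeline(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1])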

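This code creates a model pipeline, loads the model, defines the system prompt and the question, runs the model, and prints the output. The answer is given below. It is cut off mid-sentence because max_new_tokens=128 limits the length of the generated text.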

{'role': 'assistant', 'content': 'Alright, class, gather around! Today, we\'re diving into the mysterious and mind-bending world of gravity. Now, I know what you\'re thinking: "Gravity? Isn\'t that just why we don\'t float away into space?" Well, yes, but there\'s so much more to it! Let\'s break it down with some real-life examples that\'ll make your heads spin—figuratively, of course, because gravity keeps them attached to your bodies!\n\n### 1. **Gravity: The Cosmic Glue**\n\nImagine you\'re at a party, and there\'s this one person who\'s so magnetic that everyone just gravitates'}
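The printed result is a Python dictionary with the keys role and content, so the escaped quotes and \n characters are shown literally. To display only the readable text, you can extract the content field. The sketch below uses a small hand-written stand-in for the pipeline output so that it runs without loading the model; the helper name assistant_text and the example data are our own.

```python
def assistant_text(outputs):
    """Return only the text of the last message in the first generated result."""
    last_message = outputs[0]["generated_text"][-1]
    return last_message["content"]

# Small stand-in for the pipeline output, shaped like the result printed above.
example_outputs = [{
    "generated_text": [
        {"role": "user", "content": "How to explain gravity to high-school students?"},
        {"role": "assistant", "content": "Alright, class, gather around!"},
    ]
}]

print(assistant_text(example_outputs))  # prints: Alright, class, gather around!
```

In the test script, the same helper can be applied to the real result by calling print(assistant_text(outputs)).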