October 4, 2024

How to Install and Run Llama 3.2 1B and 3B Large Language Models (LLMs) on Raspberry Pi 5 and Linux Ubuntu

In this Large Language Model (LLM) and machine learning tutorial, we explain how to run the Llama 3.2 1B and 3B LLMs on a Raspberry Pi in Linux Ubuntu. In this tutorial, we use a Raspberry Pi 5. We also created a tutorial on how to run Llama 3.2 models on the Raspberry Pi 4. The Raspberry Pi 5 is much faster than the Raspberry Pi 4, so we suggest that everyone use the Raspberry Pi 5. The YouTube tutorial is given below.

Llama 3.2 1B and 3B are the newest models in the Llama family of large language models. They are highly compressed and optimized models whose purpose is to run on edge devices with limited memory and computational resources. The Raspberry Pi 5 can itself be seen as an edge device (in some sense).

Our Raspberry Pi 5 computational platform is shown below.

We have a Raspberry Pi 5 with 8 GB of RAM. We have installed an NVMe SSD with a capacity of around 500 GB. By installing an NVMe SSD, we have significantly increased the read and write speeds of our Raspberry Pi 5. We have created a separate video tutorial on how to install an NVMe SSD in a Raspberry Pi. Besides an NVMe SSD, you can also use an external USB SSD. Our suggestion is to avoid micro-SD cards since they are really slow. A tutorial on how to install an NVMe SSD is given here:

Before we start with explanations, we need to emphasize the following:

In this tutorial, we are using Linux Ubuntu 24.04. However, you can also use any other supported version of Linux Ubuntu. We created a separate video tutorial explaining how to install Linux Ubuntu 24.04 on a Raspberry Pi:

Install Ollama and Llama 3.2 1B and 3B models on Raspberry Pi 5

First, make sure that your system is up-to-date. For that purpose, open a terminal and type

sudo apt update && sudo apt upgrade

Next, in parallel to the current terminal, open another terminal and, in that terminal, start a program called top. In the terminal, type

top

This program is used to monitor computational resources (CPU and memory usage).

We will install Llama 3.2 by first installing Ollama. Ollama is a framework and an API for running different large language models. It provides an easy-to-use interface for running different models. It also has a Python library that is very simple to use. First, we will install Ollama, and then we will install the Llama 3.2 models.
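As a preview of what that API looks like, here is a minimal Python sketch that sends a single prompt to Ollama's REST interface. It assumes the default server address 127.0.0.1:11434 and Ollama's documented /api/generate endpoint; the helper function names are ours, and the call will only succeed once Ollama and a model are installed as described below.

```python
# Minimal sketch of talking to Ollama's REST API from Python.
# Assumes the default endpoint http://127.0.0.1:11434 and a pulled llama3.2:1b model.
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default generate endpoint


def build_request(prompt, model="llama3.2:1b"):
    """Build the JSON body for a single, non-streaming generation call."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="llama3.2:1b"):
    """Send the prompt to a locally running Ollama server and return the reply text."""
    body = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (only works after Ollama and the model are installed):
# print(generate("Why is the sky blue?"))
```

The `stream: False` field asks the server for one complete JSON reply instead of a stream of partial tokens, which keeps the sketch simple.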

We have to make sure that our computer allows inbound connections on port 11434 (the port used by the Ollama server). To do that, open a terminal and type

sudo ufw allow 11434/tcp

To install Ollama in Linux Ubuntu, open a terminal and type

curl -fsSL https://ollama.com/install.sh | sh

It will take some time to download and install Ollama. To verify the installation, open a web browser and enter this address:

127.0.0.1:11434

You should see a message saying that Ollama is running. If you do not see this message, then you need to start Ollama manually. You can do that by opening a terminal and typing

ollama serve

This should start Ollama if Ollama was not running. Let us now open a terminal and type:

ollama

and

ollama list

If you see a response, it means that Ollama can be executed from a terminal window. The next step is to download the Llama 3.2 models.
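Before pulling the models, you can also confirm that the Ollama server is reachable from a script rather than a browser. The short Python check below simply tests whether anything is accepting TCP connections on the default Ollama address (127.0.0.1:11434 is assumed; adjust if you changed it):

```python
# Quick programmatic check that the Ollama server is reachable.
# Assumes the default address 127.0.0.1:11434.
import socket


def ollama_reachable(host="127.0.0.1", port=11434, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Ollama reachable:", ollama_reachable())
```

If this prints False, run ollama serve in a terminal as described above and try again.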

To download the 1B and 3B models, in the terminal type this:

ollama pull llama3.2:1b
ollama pull llama3.2:3b

It is going to take 5-10 minutes to download the models. After the models are downloaded, we can execute them by typing this (to run the 1B model)

ollama run llama3.2:1b

or this (to run the 3B model)

ollama run llama3.2:3b

After that, you will see a model prompt and you can start asking questions. To exit the model, type this

/bye
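The interactive prompt is convenient for experimenting, but you can also hold a multi-turn conversation with the model from a script. The sketch below uses Ollama's documented /api/chat endpoint (the default address 127.0.0.1:11434 is assumed, and the helper names are ours); unlike /api/generate, it accepts the whole conversation history as a list of messages.

```python
# Sketch of a multi-turn chat against Ollama's /api/chat endpoint.
# Assumes the default server address and a model pulled as shown above.
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:11434/api/chat"


def make_messages(history, user_prompt):
    """Append the new user turn to the running conversation history."""
    return history + [{"role": "user", "content": user_prompt}]


def chat(history, user_prompt, model="llama3.2:3b"):
    """Send the conversation to a local Ollama server; return the assistant message."""
    body = json.dumps({
        "model": model,
        "messages": make_messages(history, user_prompt),
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]

# Example (requires a running server and a pulled model):
# reply = chat([], "What is a Raspberry Pi?")
# print(reply["content"])
```

To continue the conversation, append the returned assistant message to the history list and call chat again with the next user prompt.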

To list all the models installed on your computer, type this

ollama list
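If you want to use that list from a script, a small parser like the one below can extract the model names. This is a hedged sketch: the tabular column layout (NAME, ID, SIZE, MODIFIED) is assumed from typical ollama list output and may differ across Ollama versions.

```python
# Hedged sketch: parse the tabular output of `ollama list` into model names.
# The column layout (NAME, ID, SIZE, MODIFIED) is assumed from typical output.
def parse_ollama_list(output):
    """Return the model names from `ollama list` text output."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    # Skip the header row; the first whitespace-separated field is the name.
    return [ln.split()[0] for ln in lines[1:]]


# Illustrative sample output (the IDs and sizes below are made up):
sample = """NAME           ID            SIZE    MODIFIED
llama3.2:1b    baf6a787fdff  1.3 GB  2 minutes ago
llama3.2:3b    a80c4f17acd5  2.0 GB  5 minutes ago
"""
print(parse_ollama_list(sample))
```

In a real script you would capture the command's output, for example with Python's subprocess.run(["ollama", "list"], capture_output=True, text=True), and pass its stdout to the parser.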