- In this tutorial, we explain how to install and run the Llama 3.3 70B Large Language Model (LLM) locally on Linux Ubuntu. To install Llama 3.3, we will use Ollama, one of the simplest command-line tools and frameworks for running LLMs locally. Ollama is easy to install, and it lets us run different LLMs from the command line.
- Background information: Llama 3.3 is a very powerful LLM that can run on a local computer with “modest” hardware. Its performance is similar to that of the much larger Llama 3.1 model with 405B parameters. This makes Llama 3.3 one of the most powerful LLMs that can run on a local computer without an expensive GPU. The benefits of running LLMs locally are privacy, low cost (only electricity), easy integration into your applications, and complete control over the LLM's behavior.
- Prerequisites: We were able to run Llama 3.3 on a computer with an NVIDIA RTX 3090 GPU, 64 GB of RAM, and an Intel i9 processor. Inference on this machine is slow, but a more powerful GPU, such as an RTX 4090 or 5090, should speed it up. You will need 40-50 GB of disk space to download the model.
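Before downloading, it can help to check that the machine meets these requirements. Below is a minimal Bash sketch that is not part of the original tutorial. The function names (`free_gb`, `has_free_space`, `total_ram_gb`, `gpu_info`) are hypothetical. The 50 GB default mirrors the upper end of the estimate above. Where Ollama stores models depends on how it was installed, so pass the directory you care about.

```shell
#!/usr/bin/env bash
# Pre-flight checks before pulling a large model (hypothetical helper names).

# Print whole GiB free on the filesystem that holds directory $1 (default: /).
free_gb() {
  local dir="${1:-/}"
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks.
  # Column 4 of the second line is the available space.
  df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed if the filesystem holding $1 has at least $2 GiB free (default: 50).
has_free_space() {
  local dir="${1:-/}" required="${2:-50}" avail
  avail=$(free_gb "$dir")
  if [ -n "$avail" ] && [ "$avail" -ge "$required" ]; then
    echo "OK: ${avail} GiB free on ${dir} (need ${required} GiB)"
    return 0
  fi
  echo "Not enough space on ${dir}: ${avail:-unknown} GiB free, need ${required} GiB" >&2
  return 1
}

# Print total RAM in whole GiB, read from /proc/meminfo (Linux only).
total_ram_gb() {
  awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Print GPU name and total memory if the NVIDIA driver tools are installed.
gpu_info() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  else
    echo "nvidia-smi not found (is the NVIDIA driver installed?)" >&2
    return 1
  fi
}
```

After sourcing the file, run `has_free_space / 50`, `total_ram_gb`, and `gpu_info` for a quick summary. The 50 GB threshold is only a rough guide, so adjust it to the model you plan to pull.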
The YouTube tutorial is given below.
Installation Instructions
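The first step is to install Ollama. Before that, make sure that the firewall allows inbound connections on port 11434, the port Ollama listens on. By default, Ollama only accepts connections from localhost, so local use does not depend on this rule. It matters only if you later configure Ollama to accept connections from other machines. To add the rule, open a Linux Ubuntu terminal and type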
sudo ufw allow 11434/tcp
Then, update the system packages and install curl:
sudo apt update && sudo apt upgrade
sudo apt install curl
curl --version
Then, to install Ollama, type this:
curl -fsSL https://ollama.com/install.sh | sh
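To confirm the installation from the terminal, a small check like the one below can help. This is a sketch and `check_ollama_install` is a hypothetical name. It assumes the install script registered a systemd service named `ollama`, which it normally does on systemd-based systems such as Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the ollama binary and its systemd service.
check_ollama_install() {
  # command -v succeeds only if "ollama" is found on PATH.
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama is not on PATH; re-run the install script" >&2
    return 1
  fi
  ollama --version
  # Assumption: the service is named "ollama" (as created by install.sh).
  if command -v systemctl >/dev/null 2>&1; then
    if systemctl is-active --quiet ollama; then
      echo "systemd service 'ollama' is active"
    else
      echo "systemd service 'ollama' is not active; try: sudo systemctl start ollama" >&2
    fi
  fi
}
```

Call `check_ollama_install` right after the install script finishes. It returns a non-zero status only when the `ollama` binary cannot be found.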
Once Ollama is installed, we can verify that it is running by opening a web browser and typing the following in the address bar:
localhost:11434
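The same check can also be run from the terminal, which is handy on a headless server or in a script. This Bash sketch is not from the original tutorial, and `ollama_is_up` and `wait_for_ollama` are hypothetical names. It relies only on the server answering a plain GET request on port 11434 with the text “Ollama is running”.

```shell
#!/usr/bin/env bash
# Succeed if the server at $1 (default http://localhost:11434) reports it is running.
ollama_is_up() {
  local url="${1:-http://localhost:11434}"
  # -s: no progress output, -f: fail on HTTP errors, --max-time: bound each attempt.
  [ "$(curl -sf --max-time 2 "$url" 2>/dev/null)" = "Ollama is running" ]
}

# Poll once per second until the server answers or $2 attempts (default 30) are used up.
# The limit counts sleep intervals, so wall-clock time can be longer if curl waits.
wait_for_ollama() {
  local url="${1:-http://localhost:11434}" timeout="${2:-30}" waited=0
  until ollama_is_up "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Ollama did not respond at $url after ${timeout} attempts" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Ollama is running at $url"
}
```

For example, `wait_for_ollama http://localhost:11434 60` is useful right after `sudo systemctl restart ollama`, when the server may need a few seconds before it accepts requests.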
If Ollama is running, you should see the message “Ollama is running”. Then, to download the model, type this:
ollama pull llama3.3
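Once the pull finishes, `ollama list` shows the models stored locally. The helper below is a hypothetical sketch that searches that output for a given model. It assumes the first line of `ollama list` is a header and that the first column holds the model name with its tag (for example `llama3.3:latest`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ollama list` output on stdin and succeed
# if model $1 is present. A bare name matches its ":latest" tag.
model_listed() {
  local want=$1
  # Append the default tag when none is given, e.g. llama3.3 -> llama3.3:latest.
  case $want in
    *:*) ;;
    *) want="$want:latest" ;;
  esac
  # Skip the header row, compare the first column, exit 0 only on a match.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

Usage: `ollama list | model_listed llama3.3 && echo "llama3.3 is downloaded"`.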
After the model is downloaded, run it by typing:
ollama run llama3.3
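Besides the interactive prompt, the running Ollama server exposes a REST API on the same port, which is how the model can be integrated into your own applications. Below is a minimal Bash sketch that sends one prompt to the `/api/generate` endpoint with `"stream": false`, so the reply arrives as a single JSON object. The function names are hypothetical. The hand-written `json_escape` only handles backslashes, double quotes, newlines, carriage returns, and tabs.

```shell
#!/usr/bin/env bash
# Minimal sketch: send one prompt to Ollama's REST API with curl.

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}     # \ -> \\ (must run first)
  s=${s//\"/\\\"}     # " -> \"
  s=${s//$'\n'/\\n}   # newline -> \n
  s=${s//$'\r'/\\r}   # carriage return -> \r
  s=${s//$'\t'/\\t}   # tab -> \t
  printf '%s' "$s"
}

# Build the JSON body for POST /api/generate; "stream": false requests one reply object.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST the prompt to the server (default http://localhost:11434) and print the raw JSON reply.
ollama_generate() {
  local model=$1 prompt=$2 base=${3:-http://localhost:11434}
  curl -sS "$base/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$model" "$prompt")"
}
```

With the server running and `jq` installed, `ollama_generate llama3.3 "Why is the sky blue?" | jq -r .response` prints only the generated text. Building the JSON by hand keeps the sketch dependency-free. If `jq` is available, `jq -n --arg p "$prompt" '{model: "llama3.3", prompt: $p, stream: false}'` is a more robust way to escape arbitrary input.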