Ollama AI as a Network Service // Public Notes

Enabling Seamless Integration with Ollama as a Network Service

Ollama is an open-source tool for running models such as Llama locally, and its later versions offer a unique advantage: an OpenAI-compatible API, the same kind of API that LocalAI provides. This means that Ollama can be plugged into many LLM (Large Language Model) applications, such as Nextcloud Assistant or the Continue plugin for VSCode and IntelliJ.

For my own installation, I opted for a Docker-based approach. Ollama still runs as root, but only inside an isolated container, which limits its access to the host. This approach also simplifies future updates and changes without affecting the host OS.

However, to use GPU acceleration from inside Docker (in this case with an Nvidia card), Docker must be configured to pass the GPU through to containers. This is crucial for performance when a model is offloaded onto the GPU.

Installation of Nvidia Container Toolkit

Source Link: Nvidia Container Toolkit Installation

Installation with Apt

If you are using a Debian or Ubuntu-based system, you can install the Nvidia Container Toolkit with the apt package manager. Run the following commands in your terminal:

Step 1: Configure the repository

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update

Step 2: Install the NVIDIA Container Toolkit packages

sudo apt-get install -y nvidia-container-toolkit

Step 3: Configure Docker to use the Nvidia runtime

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
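
Before moving on to the test container, it can be worth confirming that the nvidia runtime was actually registered. The following is a minimal sketch, assuming nvidia-ctk wrote its configuration to the default /etc/docker/daemon.json. The helper name has_nvidia_runtime is made up for this example.

```shell
# Hypothetical helper: succeeds if the given Docker daemon config
# mentions a runtime called "nvidia".
# Assumption: the config lives at /etc/docker/daemon.json unless a path is passed.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

# Usage:
#   has_nvidia_runtime && echo "nvidia runtime configured"
```

The Runtimes line in the output of docker info should list nvidia as well once Docker has been restarted.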

Step 4: Test it with a Sample Container

Run the following Docker image:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

If you see output similar to this:

Sun Sep 15 03:02:26 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P8              17W / 250W |   1256MiB / 11264MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

then you have successfully installed the Nvidia Container Toolkit on your system.

Start and Test the Ollama Docker Container

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
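
The container may need a moment before the API answers, so a script that starts Ollama and immediately talks to it can fail. The following is a minimal sketch of a readiness check, assuming curl is installed on the host and the port mapping from the command above. The function name wait_for_ollama and its defaults are made up for this example.

```shell
# Hypothetical helper: poll the Ollama HTTP port until it answers.
# $1 = base URL (assumed default: http://localhost:11434)
# $2 = maximum number of attempts, one per second (assumed default: 30)
wait_for_ollama() {
    url="${1:-http://localhost:11434}"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors, -s keeps it quiet
        if curl -fs "$url/" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for_ollama && echo "Ollama is up"
```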

Load and Run an Ollama Model Using docker exec

To load a model and run it as a test, use the following command:

docker exec -it ollama ollama run llama3.1
>>> hello world
Hello world back to you! Is there something I can help you with, or would you like to chat?

>>> /bye

Once you run this command, you are at the prompt of the Ollama model. Type a message and press Enter to chat with it, and the model prints its reply. To close the session, type /bye and press Enter.
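
ollama run downloads the model on first use, which can take a while. If you prefer to fetch models ahead of time without opening a chat session, ollama pull can be called the same way. The following is a small sketch, assuming the container is named ollama as above. The function name pull_models is made up for this example.

```shell
# Hypothetical helper: pull one or more models into the running container.
# Assumption: the container is named "ollama", as in the docker run command above.
pull_models() {
    for model in "$@"; do
        # Stop at the first model that fails to download
        docker exec ollama ollama pull "$model" || return 1
    done
}

# Usage:
#   pull_models llama3.1 starcoder2:3b
```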

Bonus: Seamless integration on the Host System

If you plan to use Ollama frequently from your host terminal, you can create an alias for the Docker exec command. This way, you can run Ollama as if it were installed locally without using Docker.

To do this, open your ~/.bash_aliases file and add the following line:

alias ollama="docker exec -it ollama ollama"

Now you can run Ollama through the alias as if it were installed locally:

$ ollama list
NAME           	ID          	SIZE  	MODIFIED     
starcoder2:3b  	9f4ae0aff61e	1.7 GB	21 hours ago	
llama3.1:latest	42182419e950	4.7 GB	21 hours ago

The command lists the models that are installed in the Ollama container.

Congratulations! You can now also test Ollama over the network using the curl command.
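
One caveat with the alias: docker exec -it asks for a terminal, so it tends to fail with "the input device is not a TTY" when the output is redirected or the command runs from a script. The following is a sketch of a shell function that could be used instead of the alias (not in addition to it), for example in ~/.bashrc. It assumes the container is named ollama, and it only requests a TTY when one is available.

```shell
# Hypothetical replacement for the alias: a shell function that only asks
# for a TTY (-t) when both stdin and stdout are terminals.
# Assumption: the container is named "ollama", as in the docker run command above.
ollama() {
    if [ -t 0 ] && [ -t 1 ]; then
        docker exec -it ollama ollama "$@"
    else
        docker exec -i ollama ollama "$@"
    fi
}

# Usage:
#   ollama list
#   ollama run llama3.1 "Tell a joke." > joke.txt
```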

Test Ollama with a curl REST API call over the network

The following command sends an OpenAI-style chat completion request to the Ollama model running on your host machine.

curl http://IpOrDomainOfYourHost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Tell a joke."}],
    "max_tokens": 100,
    "temperature": 0.7
}'

It should return a response like this:

{"id":"chatcmpl-302","object":"chat.completion","created":1726370508,"model":"llama3.1","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":31,"total_tokens":45}}
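
For repeated tests, the request above can be wrapped in a small helper. The following is a rough sketch, not a complete client. The function names chat_payload and chat_request are made up for this example, the prompt is assumed to be a single line, and only backslashes and double quotes are escaped.

```shell
# Hypothetical helper: build a chat completion request body.
# Assumption: the prompt is a single line; only backslashes and double
# quotes are escaped, which is not a complete JSON encoder.
chat_payload() {
    model="$1"
    prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' \
        "$model" "$prompt"
}

# Hypothetical helper: POST the body to the chat completions endpoint.
# $1 = base URL, $2 = model, $3 = prompt
chat_request() {
    chat_payload "$2" "$3" |
        curl -s "$1/v1/chat/completions" \
            -H "Content-Type: application/json" \
            -d @-
}

# Usage:
#   chat_request http://IpOrDomainOfYourHost:11434 llama3.1 "Tell a joke."
```

Reading the body from stdin with -d @- avoids nesting shell quotes inside the JSON.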

Next Steps

From here, many OpenAI API integrations are available to you.

For example, Nextcloud Assistant, which integrates OpenAI-compatible backends such as LocalAI into the Nextcloud ecosystem.
Or the Continue plugin for the VSCode and IntelliJ IDEs, which gives you an assistant during your coding workflow.
Or the many chatbot integrations that give you a chatbot in your browser.

Most of these integrations only require the URL of an instance that provides an OpenAI-compatible (LocalAI-style) API, which you now have!

From here you can also explore the OpenAI API Docs and learn how to build your own integrations.

Follow up

I wrote a follow-up post on setting up the integration with the VSCode IDE here: 🖇Using Ollama with VSCode