Ollama AI as a Network Service
Enabling Seamless Integration with Ollama as a Network Service
Ollama is an open-source tool for running large language models such as Llama on your own hardware. Its later versions offer a particular advantage: an OpenAI-compatible API, the same style of API that LocalAI provides. This means Ollama can be plugged into various LLM (Large Language Model) integration applications, such as Nextcloud Assistant or the Continue plugin for VSCode and IntelliJ.
For my own installation, I opted for a Docker-based approach. It keeps Ollama isolated from the host (the process runs as root, but only inside the container), and it simplifies future updates and changes without affecting the host OS.
However, to use GPU acceleration from inside Docker (in this case with an NVIDIA card), the Docker installation must be configured to pass the GPU through to containers. This is crucial for performance, because the model is offloaded onto the GPU.
Installation of Nvidia Container Toolkit
Source Link: Nvidia Container Toolkit Installation
Installation with Apt
If you are using a Debian- or Ubuntu-based system, you can install the Nvidia Container Toolkit with the apt package manager. Run the following commands in your terminal:
Step 1: Configure the repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
Step 2: Install the NVIDIA Container Toolkit packages
sudo apt-get install -y nvidia-container-toolkit
Step 3: Configure Docker to use the NVIDIA container runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
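To confirm that step 3 took effect before running a container, you can check that an nvidia runtime entry was written to Docker's daemon configuration. This is a minimal sketch, assuming the default config path /etc/docker/daemon.json; the function name has_nvidia_runtime is made up for illustration.

```shell
# Hypothetical helper: succeeds if the given Docker daemon config mentions
# an "nvidia" runtime entry. Defaults to the usual path on Debian/Ubuntu.
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

# Usage:
#   has_nvidia_runtime && echo "nvidia runtime configured"
```

The output of docker info also lists the registered runtimes, which is a quick manual alternative.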
Step 4: Test it with a Sample Container
Run the following sample container:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
If you see output similar to this:
Sun Sep 15 03:02:26 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P8              17W / 250W |   1256MiB / 11264MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
then you have successfully installed the Nvidia Container Toolkit on your system.
Start and Test the Ollama Docker Container
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
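The container needs a moment to start before the API answers. The sketch below polls the server until it responds. It assumes curl is installed and that Ollama is reachable on port 11434 as published by the command above; the function name wait_for_ollama and its arguments are hypothetical.

```shell
# Hypothetical helper: poll an Ollama base URL until it answers or we give up.
# Args: base URL (default http://localhost:11434), number of attempts
#       (default 30), delay between attempts in seconds (default 1).
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: no progress output, -o /dev/null: drop body
    if curl -fs -o /dev/null "$url/"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_ollama && echo "Ollama is up"
```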
Load and Run an Ollama Model Using docker exec
To load a model and run it as a test, use the following command.
docker exec -it ollama ollama run llama3.1
>>> hello world
Hello world back to you! Is there something I can help you with, or would you like to chat?
>>> /bye
Once you run this command, you are at the prompt of the Ollama model. Start chatting by typing a message and pressing Enter.
The model then prints its response. To close the session, type /bye and press Enter.
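For scripted checks the interactive prompt is inconvenient. ollama run also accepts the prompt as an argument and exits after answering, so a one-shot wrapper could look like the sketch below. The function name ask_ollama is made up; the container name ollama and the model llama3.1 are the ones used above.

```shell
# Hypothetical helper: send a single prompt to a model inside the container.
# No -t flag is passed, so it also works from scripts without a terminal.
ask_ollama() {
  local model="$1"
  shift
  # Remaining arguments are joined into one prompt string.
  docker exec ollama ollama run "$model" "$*"
}

# Usage:
#   ask_ollama llama3.1 "Say hello in one short sentence."
```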
Bonus: Seamless integration on the Host System
If you plan to use Ollama frequently from your host terminal, you can create an alias for the Docker exec command. This way, you can run Ollama as if it were installed locally without using Docker.
To do this, open your ~/.bash_aliases file and add the following line:
alias ollama="docker exec -it ollama ollama"
After reloading your shell, you can call Ollama through the alias as if it were installed locally:
$ ollama list
NAME ID SIZE MODIFIED
starcoder2:3b 9f4ae0aff61e 1.7 GB 21 hours ago
llama3.1:latest 42182419e950 4.7 GB 21 hours ago
The command shows the installed models on the host machine.
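One caveat with the alias: docker exec -it requests a TTY, so it fails with "the input device is not a TTY" when called from a script, a pipe, or cron. A shell function can add -t only when standard input is a terminal. This is a sketch of that alternative, not part of the original setup.

```shell
# Alternative to the alias: allocate a TTY only when stdin is a terminal.
ollama() {
  if [ -t 0 ]; then
    docker exec -it ollama ollama "$@"
  else
    docker exec -i ollama ollama "$@"
  fi
}
```

If you try this, remove the alias first (unalias ollama), since an alias and a function with the same name conflict in an interactive shell.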
Congratulations, you can now also test Ollama over the network using curl.
Test Ollama with a curl REST API Call over the Network
The following command sends an OpenAI-style chat completion request to the Ollama server running on your host machine. Replace IpOrDomainOfYourHost with the IP address or domain name of your host.
curl http://IpOrDomainOfYourHost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Tell a joke."}],
"max_tokens": 100,
"temperature": 0.7
}'
It should return a response like this:
{"id":"chatcmpl-302","object":"chat.completion","created":1726370508,"model":"llama3.1","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":31,"total_tokens":45}}
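The response is a single JSON object, and the model's reply sits in choices[0].message.content. To pull out just that text on the command line, a filter like the following could be used. It assumes jq is installed, which the setup above does not otherwise require; extract_reply is a hypothetical name.

```shell
# Hypothetical helper: read a chat completion response on stdin and print
# only the assistant's reply text. Requires jq.
extract_reply() {
  jq -r '.choices[0].message.content'
}

# Usage (same endpoint as above, piped through the filter):
#   curl -s http://IpOrDomainOfYourHost:11434/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Tell a joke."}]}' \
#     | extract_reply
```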
Next Steps
From here, many integrations built for the OpenAI API are available to you.
- For example Nextcloud Assistant, which integrates LocalAI into the Nextcloud ecosystem.
- Or the Continue plugin for the VSCode and IntelliJ IDEs, which gives you an assistant during your coding workflow.
- Or many chatbot integrations, which give you a chatbot in your browser.
Most of these integrations only require the URL of an instance that provides an OpenAI-compatible (LocalAI-style) API, which you now have!
From here you can also explore the OpenAI API docs and learn how to build your own integrations.
Follow up
I made a follow-up post on setting up the integration with the VSCode IDE here: Using Ollama with VSCode