Ollama runs open language models right on your own server: your data never goes to the cloud, and you don’t need API keys or subscriptions. You install the engine, download a model, and chat with it through the console or over a local API from your own apps.
Commands are current as of writing. Before installing, check the official site ollama.com and the repository github.com/ollama/ollama, since the syntax and the model list change over time.

A reality check on hardware

An LLM loves memory and is slow at crunching numbers on a CPU. Without a graphics card, go for modest models:
Model size | Where it runs | What to expect
1B–3B (e.g. llama3.2, gemma3:1b) | CPU + a few GB of RAM | it answers, but not instantly
7B–8B | CPU + lots of RAM, noticeably slower | tolerable for experiments
13B and up | you really need a GPU | not an option on a regular VPS
The main constraint is RAM. The model loads into RAM in full; if there isn’t enough, the OOM killer terminates the process. Choose a model sized to your server and keep some memory in reserve. On a small plan, a swap file helps, but swap is slow: it’s a safety net, not a replacement for RAM.
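Before choosing a model, it helps to see how much memory is actually free. The following is a minimal sketch, assuming a Linux server that exposes /proc/meminfo. The helper names mem_available_mb and has_memory_for are hypothetical, and the 3072 MB threshold is only an illustration, not an official requirement of any model.

```shell
#!/bin/sh
# Hypothetical helper: print available memory in MB.
# Reads the MemAvailable field (reported in kB) from a meminfo-style file;
# defaults to /proc/meminfo when no file is given.
mem_available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed if at least $1 MB is available.
# An optional second argument names the meminfo-style file to read.
has_memory_for() {
    [ "$(mem_available_mb "${2:-/proc/meminfo}")" -ge "$1" ]
}

# Illustrative threshold only: adjust it to the model you plan to run.
if has_memory_for 3072; then
    echo "Enough free memory for a small model"
else
    echo "Low on memory: pick a smaller model or add swap"
fi
```

Running free -h gives the same picture at a glance; the helper form is convenient when the check is part of a setup script.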

Installation

1. Install Ollama

The official script installs Ollama and brings up a background service:
curl -fsSL https://ollama.com/install.sh | sh

2. Run a model

Download and start a conversation right away with one command:
ollama run llama3.2
Without a tag, the command pulls the 3B variant (~2 GB). On small servers (2–4 GB of RAM), grab the lightweight variant explicitly:
ollama run llama3.2:1b
The :1b and :3b tags set the model size. The first run downloads the model (from hundreds of MB to several GB); after that it’s loaded from disk. To exit the conversation, type /bye.

Managing models

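Download a model without starting a conversation:
ollama pull gemma3
The list of available models and their sizes is at ollama.com/library.
Show the models already downloaded to the server:
ollama list
Remove a model you no longer need to free up disk space:
ollama rm gemma3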
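In a setup script it can be handy to pull a model only when it is missing. A minimal sketch, assuming that ollama list prints a header row followed by one row per model, with name:tag in the first column. The helper has_model is hypothetical and not part of Ollama.

```shell
#!/bin/sh
# Hypothetical helper: reads `ollama list` output on stdin and succeeds
# if the requested model is present. A name without a tag is treated
# as name:latest, which is what a bare `ollama pull name` fetches.
has_model() {
    case "$1" in
        *:*) want="$1" ;;
        *)   want="$1:latest" ;;
    esac
    # Skip the header row (NR > 1) and compare the first column exactly.
    awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Pull gemma3 only if it is not on disk yet (runs only where ollama exists).
if command -v ollama >/dev/null 2>&1; then
    ollama list | has_model gemma3 || ollama pull gemma3
fi
```

The exact match on the first column is deliberate: having gemma3:1b on disk does not count as having gemma3.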

API for your apps

After installation, Ollama serves a local API at http://localhost:11434. This is how your code talks to it:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false
}'
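When the prompt comes from a variable, the JSON body has to be built safely. The sketch below wraps the request above in two small functions. The names json_escape, chat_payload and ask are hypothetical, and the escaping covers only backslashes and double quotes, so it assumes single-line prompts.

```shell
#!/bin/sh
# Hypothetical helper: escape backslashes and double quotes for JSON.
# Newlines and other control characters are NOT handled (single-line input only).
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a request body for /api/chat.
# $1 is the model name, $2 is the user message.
chat_payload() {
    printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "stream": false}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Hypothetical helper: send the body to the local API; curl reads it from stdin.
ask() {
    chat_payload "$1" "$2" | curl -s http://localhost:11434/api/chat -d @-
}

# Usage (needs a running Ollama and a downloaded model):
#   ask llama3.2 "Hello"
# With "stream": false the reply is a single JSON object and the answer text
# sits in message.content. If jq is installed, it can be extracted with:
#   ask llama3.2 "Hello" | jq -r '.message.content'
```

For anything beyond quick scripts, a JSON library in your application's language is the safer way to build the body.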
Want to chat through a browser instead of the console? Install the Open WebUI web interface in a container. It’s a handy chat on top of Ollama. For Docker setup, see the Docker on your server guide.
Don’t expose port 11434 to the internet “as is”: an open API lets anyone run models at the expense of your server’s resources. Keep it reachable locally only (that’s the default), or close it off with a firewall and let traffic in only through a secured proxy with authentication.
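To confirm that the API is bound to the loopback address only, you can inspect the listening sockets. A minimal sketch, assuming a Linux server with the ss utility and its usual column layout (state first, local address:port fourth). The helper port_is_local_only is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -tln` output on stdin and succeeds if every
# listener on the given port is bound to a loopback address (127.x or [::1]).
# It also succeeds when nothing listens on that port at all.
port_is_local_only() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            if ($4 !~ /^(127\.|\[::1\])/) exposed = 1
        }
        END { exit exposed + 0 }
    '
}

# Check the Ollama port (runs only where ss is available).
if command -v ss >/dev/null 2>&1; then
    if ss -tln | port_is_local_only 11434; then
        echo "Port 11434 is not listening on a public address"
    else
        echo "Warning: port 11434 is listening on a non-loopback address"
    fi
fi
```

This check only shows where the service is bound; it does not replace a firewall rule or an authenticated proxy.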
Lumi handles the server and network; software setup is on you. For network or port issues, write to @lumisup_robot.

Where to next

Swap

How to add swap when there isn’t enough RAM for the model.

Docker

Run Open WebUI for Ollama via Docker.