Commands are current as of writing. Before installing, check the official site ollama.com and the repository github.com/ollama/ollama — the syntax and the model list change.
The reality about hardware
An LLM loves memory and is slow at crunching numbers on a CPU. Without a graphics card, go for modest models:

| Model size | Where it runs | What to expect |
|---|---|---|
| 1B–3B (e.g. llama3.2, gemma3:1b) | CPU + a few GB of RAM | it answers, but not instantly |
| 7B–8B | CPU + lots of RAM, noticeably slower | tolerable for experiments |
| 13B and up | you really need a GPU | not an option on a regular VPS |
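The sizes in the table follow from simple arithmetic: the weights take roughly the parameter count times the bytes stored per weight. Here is a rough sketch of that estimate. It assumes 4-bit quantized weights, which the text above does not state, and `estimate_weights_gb` is a hypothetical helper name. Real memory use is higher, because the context cache and runtime overhead are not counted.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Rough size of the model weights alone, in GB (10^9 bytes).

    Assumption: every weight is stored at `bits_per_weight` bits.
    Context cache and runtime overhead are NOT included.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Model sizes taken from the table above.
    for size in (1, 3, 8, 13):
        print(f"{size}B parameters: about {estimate_weights_gb(size):.1f} GB of weights")
```

Under these assumptions a 3B model needs about 1.5 GB for weights alone, which is in the same ballpark as the ~2 GB download for llama3.2.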
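Installation
At the time of writing, the official Linux install script is run with `curl -fsSL https://ollama.com/install.sh | sh`. Confirm the current command on ollama.com before running it.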
Run a model
Download and start a conversation right away with one command: `ollama run llama3.2`. Without a tag, the command pulls the 3B variant (~2 GB). On small servers (2–4 GB of RAM), grab the lightweight variant explicitly: `ollama run llama3.2:1b`.
The `:1b` and `:3b` tags set the model size. The first run downloads the model (from hundreds of MB to several GB); after that it is loaded from disk. To exit the conversation, type `/bye`.
Managing models
Download a model ahead of time: `ollama pull llama3.2:1b`
List downloaded models: `ollama list`
Remove a model and free up disk: `ollama rm llama3.2:1b`
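The list of downloaded models can also be read from a script through Ollama's local HTTP API, which listens at http://localhost:11434 by default. This is a minimal sketch that queries the `/api/tags` endpoint; `summarize_models` and `list_models` are hypothetical helper names, and it assumes an Ollama server is already running on the same machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address


def summarize_models(payload: dict) -> list[str]:
    """Turn an /api/tags response into 'name  size' lines."""
    lines = []
    for model in payload.get("models", []):
        size_gb = model.get("size", 0) / 1e9  # the API reports size in bytes
        lines.append(f"{model['name']}  {size_gb:.1f} GB")
    return lines


def list_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Fetch the downloaded models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return summarize_models(json.load(resp))
```

Calling `print("\n".join(list_models()))` prints one line per downloaded model, which is handy for checking how much disk the models take before removing any.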
API for your apps
After installation, Ollama serves a local HTTP API at http://localhost:11434, and your code talks to it by sending requests to that address.
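As an illustration, a minimal sketch of calling the `/api/generate` endpoint from Python using only the standard library. The helper names `build_generate_request` and `generate` are hypothetical, and the model tag `llama3.2:1b` is an assumption: use whichever model you have already downloaded. Setting `"stream": False` asks for a single JSON reply instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address


def build_generate_request(prompt: str, model: str = "llama3.2:1b",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Prepare a POST request for /api/generate without sending it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2:1b") -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(prompt, model)
    # Generous timeout: CPU-only inference can take a while.
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]
```

With the server running, `print(generate("Explain swap in one sentence."))` prints the model's reply. Building the request separately from sending it keeps the payload easy to inspect and test without a live server.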
Where to next
Swap
How to add swap when there isn’t enough RAM for the model.
Docker
Run Open WebUI for Ollama via Docker.