# Your own LLM (Ollama)

> Run a neural net on your server without a GPU

Ollama runs open language models right on your own server: your data never goes to the cloud, and you don't need API keys or subscriptions. You install the engine, download a model, and chat with it through the console or over a local API from your own apps.

Commands are current as of writing. Before installing, check the official site [ollama.com](https://ollama.com) and the repository [github.com/ollama/ollama](https://github.com/ollama/ollama), because the syntax and the model list change.

## The reality about hardware

An LLM loves memory and is slow at crunching numbers on a CPU. Without a graphics card, go for modest models:

| Model size                           | Where it runs                        | What to expect                 |
| ------------------------------------ | ------------------------------------ | ------------------------------ |
| 1B–3B (e.g. `llama3.2`, `gemma3:1b`) | CPU + a few GB of RAM                | it answers, but not instantly  |
| 7B–8B                                | CPU + lots of RAM, noticeably slower | tolerable for experiments      |
| 13B and up                           | you really need a GPU                | not an option on a regular VPS |

The main limiter is RAM. The model loads into RAM in full; if there isn't enough, the OOM-killer terminates the process. Choose a model sized to your server and keep some memory in reserve.

On a small plan, a [swap file](/en/vps/swap) helps, but swap is slow: it's a safety net, not a replacement for RAM.

## Installation

The official script installs Ollama and brings up a background service:

```bash theme={"system"}
curl -fsSL https://ollama.com/install.sh | sh
```

Download a model and start a conversation right away with one command:

```bash theme={"system"}
ollama run llama3.2
```

Without a tag, the command pulls the 3B variant (about 2 GB).
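
Since RAM is the main limiter, it can help to check how much the server has before picking a tag. The following is a minimal sketch, assuming a Linux server with `/proc/meminfo`. The function names `suggest_tag` and `total_ram_mib` are hypothetical, and the 4096 MiB cut-off is an illustrative assumption, not an Ollama rule.

```bash
#!/usr/bin/env bash

# Hypothetical helper: suggest a llama3.2 tag from total RAM in MiB.
# The 4096 MiB threshold is an assumption chosen for illustration.
suggest_tag() {
  local ram_mib=$1
  if [ "$ram_mib" -le 4096 ]; then
    echo "llama3.2:1b"
  else
    echo "llama3.2:3b"
  fi
}

# Total RAM in MiB, read from /proc/meminfo (Linux only).
total_ram_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Print a suggestion only when the memory info is readable.
if [ -r /proc/meminfo ]; then
  echo "Suggested: ollama run $(suggest_tag "$(total_ram_mib)")"
fi
```

The script only prints a suggestion; it leaves room to pick a smaller model if other services on the server also need memory.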

On small servers (2–4 GB of RAM), grab the lightweight variant explicitly:

```bash theme={"system"}
ollama run llama3.2:1b
```

The `:1b` and `:3b` tags set the model size. The first run downloads the model (from hundreds of MB to several GB); after that it's loaded from disk. To exit the conversation, type `/bye`.

## Managing models

Download a model without starting a conversation:

```bash theme={"system"}
ollama pull gemma3
```

The list of available models and their sizes is at [ollama.com/library](https://ollama.com/library).

Show the models already on disk:

```bash theme={"system"}
ollama list
```

Delete a model you no longer need:

```bash theme={"system"}
ollama rm gemma3
```

## API for your apps

After installation, Ollama serves a local API at `http://localhost:11434`. This is how your code talks to it:

```bash theme={"system"}
# "stream": false returns one JSON object instead of a stream of chunks
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false
}'
```

Want to chat through a browser instead of the console? Install the [Open WebUI](https://github.com/open-webui/open-webui) web interface in a container. It's a handy chat on top of Ollama. Docker setup is covered in the [Docker on your server](/en/vps/docker) guide.

Don't expose port `11434` to the internet as is: an open API lets anyone use up your server's resources. Keep it reachable locally only (that's the default), or close it off with a [firewall](/en/vps/firewall) and let traffic in only through a secured proxy with authentication.

Lumi handles the server and network; software setup is on you. For network or port issues, write to [@lumisup\_robot](https://t.me/lumisup_robot).

## Where to next

- How to add swap when there isn't enough RAM for the model.
- Run Open WebUI for Ollama via Docker.
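
To call the `/api/chat` endpoint from the API section in your own scripts, the request can be wrapped in a small shell helper. This is a minimal sketch, not an official client: `chat_payload` and `ask` are hypothetical names. The escaping covers only backslashes and double quotes in the prompt, so prompts with newlines or control characters need a real JSON tool.

```bash
#!/usr/bin/env bash

# Hypothetical helper: build the JSON body for /api/chat.
# Only backslashes and double quotes in the prompt are escaped;
# the model name is inserted as-is.
chat_payload() {
  local model=$1 prompt=$2
  prompt=${prompt//\\/\\\\}   # escape backslashes first
  prompt=${prompt//\"/\\\"}   # then double quotes
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}' \
    "$model" "$prompt"
}

# Hypothetical helper: send one prompt to the local Ollama API
# and print the raw JSON response.
ask() {
  curl -s http://localhost:11434/api/chat -d "$(chat_payload "$1" "$2")"
}
```

With the Ollama service running, `ask llama3.2 'Hello'` prints the raw JSON response. With `"stream": false`, the reply text sits in the `message.content` field.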