# Your own LLM (Ollama)

> Run a neural net on your server without a GPU

Ollama runs open language models right on your own server: your data never goes to the cloud, and you don't need API keys or subscriptions. You install the engine, download a model, and chat with it through the console or over a local API from your own apps.

<Note>
  Commands are current as of writing. Before installing, check the official site [ollama.com](https://ollama.com) and the repository [github.com/ollama/ollama](https://github.com/ollama/ollama), because the syntax and the model list change.
</Note>

## Hardware reality check

LLMs are memory-hungry, and inference on a CPU is slow. Without a graphics card, stick to small models:

| Model size                           | Where it runs                        | What to expect                 |
| ------------------------------------ | ------------------------------------ | ------------------------------ |
| 1B–3B (e.g. `llama3.2`, `gemma3:1b`) | CPU + a few GB of RAM                | it answers, but not instantly  |
| 7B–8B                                | CPU + lots of RAM, noticeably slower | tolerable for experiments      |
| 13B and up                           | you really need a GPU                | not an option on a regular VPS |

<Warning>
  RAM is the main constraint. The model loads into RAM in full; if there isn't enough, the OOM-killer terminates the process. Choose a model sized to your server and keep some memory in reserve. On a small plan, a [swap file](/en/vps/swap) helps, but swap is slow: it's a safety net, not a replacement for RAM.
</Warning>
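
Before choosing a model, it helps to compare its size with the memory that is actually free. The sketch below assumes a Linux server with `/proc/meminfo`; the helper names `available_mb` and `fits_in_ram` and the 1024 MB default reserve are illustrative choices, not anything Ollama defines. The download size is only a rough lower bound, since a running model needs extra memory for its context.

```bash
#!/usr/bin/env bash
# Sketch: will a model of a given size fit into the memory that is free right now?

# Print available memory in MB from /proc/meminfo (Linux only); prints 0 if the file is unreadable.
available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null || echo 0
}

# fits_in_ram MODEL_MB AVAILABLE_MB [RESERVE_MB]
# Succeeds when the model plus a reserve fits into the available memory.
# The 1024 MB default reserve is an illustrative assumption.
fits_in_ram() {
  local model_mb=$1 available=${2:-0} reserve=${3:-1024}
  [ $((model_mb + reserve)) -le "$available" ]
}

# Example: the default llama3.2 download is about 2 GB (~2048 MB).
if fits_in_ram 2048 "$(available_mb)"; then
  echo "llama3.2 should fit"
else
  echo "not enough free RAM, consider llama3.2:1b"
fi
```

If the check fails, a smaller tag or a bigger plan is a safer choice than relying on swap.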

## Installation

<Steps>
  <Step title="Install Ollama">
    The official script installs Ollama and brings up a background service:

    ```bash theme={"system"}
    curl -fsSL https://ollama.com/install.sh | sh
    ```
  </Step>

  <Step title="Run a model">
    Download and start a conversation right away with one command:

    ```bash theme={"system"}
    ollama run llama3.2
    ```

    Without a tag, the command pulls the 3B variant (\~2 GB). On small servers (2–4 GB of RAM), grab the lightweight variant explicitly:

    ```bash theme={"system"}
    ollama run llama3.2:1b
    ```

    The `:1b` and `:3b` tags select the model size. The first run downloads the model (from hundreds of MB to several GB); after that it's loaded from disk. To exit the conversation, type `/bye`.
  </Step>
</Steps>
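
A quick way to confirm that the steps above worked is to check the CLI and the background service. This is a minimal sketch that assumes a systemd-based server where the install script registered a service named `ollama`; the `check_install` helper is a hypothetical name used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: confirm that the CLI is installed and the background service is up.

check_install() {
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama: not found in PATH"
    return 1
  fi
  ollama --version
  # Assumption: a systemd host where the service is named "ollama".
  if command -v systemctl >/dev/null 2>&1; then
    echo "service: $(systemctl is-active ollama 2>/dev/null)"
  fi
}

check_install || echo "install check failed"
```

If the service is reported as inactive, `systemctl status ollama` usually shows why.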

## Managing models

<AccordionGroup>
  <Accordion title="Download a model ahead of time">
    ```bash theme={"system"}
    ollama pull gemma3
    ```

    The list of available models and their sizes is at [ollama.com/library](https://ollama.com/library).
  </Accordion>

  <Accordion title="List downloaded models">
    ```bash theme={"system"}
    ollama list
    ```
  </Accordion>

  <Accordion title="Remove a model and free up disk">
    ```bash theme={"system"}
    ollama rm gemma3
    ```
  </Accordion>
</AccordionGroup>
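
The commands above combine naturally into a small script that pulls only what is missing. The sketch below assumes that `ollama list` prints a header row followed by one model per line, with the tagged name in the first column; `with_tag`, `has_model`, and `pull_missing` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: pull only the models that are not on disk yet.

# with_tag NAME: add ":latest" when NAME has no tag, since the list shows tagged names.
with_tag() {
  case $1 in
    *:*) echo "$1" ;;
    *)   echo "$1:latest" ;;
  esac
}

# has_model NAME: read `ollama list` output on stdin, succeed if NAME is listed.
has_model() {
  local want
  want=$(with_tag "$1")
  # Skip the header row and compare the first column exactly.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# pull_missing NAME...: pull each model unless it is already downloaded.
pull_missing() {
  local name
  for name in "$@"; do
    if ollama list | has_model "$name"; then
      echo "$name: already downloaded"
    else
      ollama pull "$name"
    fi
  done
}
```

For example, `pull_missing llama3.2:1b gemma3` would download only the models that are not listed yet.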

## API for your apps

After installation, Ollama serves a local API at `http://localhost:11434`. This is how your code talks to it:

```bash theme={"system"}
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false
}'
```
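
In a script, the request above is easier to reuse as a small helper that prints only the reply text. This is a sketch under a few assumptions: `jq` is installed, the non-streaming response carries the answer in `message.content`, and the helper names (`json_escape`, `chat_payload`, `ask`) and the `OLLAMA_URL` variable are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: a tiny wrapper around /api/chat. Requires curl and jq.

# Assumption: OLLAMA_URL is our own variable name, defaulting to the local API.
OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# Escape backslashes and double quotes for use inside a JSON string.
# Newlines and other control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL PROMPT: print the JSON request body for a single user message.
# The model name is inserted as-is, so it must not contain quotes.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}' \
    "$1" "$(json_escape "$2")"
}

# ask MODEL PROMPT: send the request and print only the reply text.
ask() {
  curl -s "$OLLAMA_URL/api/chat" -d "$(chat_payload "$1" "$2")" | jq -r '.message.content'
}
```

With a model already downloaded, `ask llama3.2 "Hello"` prints the reply as plain text. On a CPU-only server the call can take a while, so avoid short timeouts in the calling code.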

<Tip>
  Want to chat through a browser instead of the console? Install the [Open WebUI](https://github.com/open-webui/open-webui) web interface in a container. It's a handy chat on top of Ollama. Docker setup is covered in the [Docker on your server](/en/vps/docker) guide.
</Tip>

<Warning>
  Don't expose port `11434` to the internet "as is": an open API lets anyone run models on your server and use up its resources. Keep it reachable locally only (that's the default), or close it off with a [firewall](/en/vps/firewall) and let traffic in only through a secured proxy with authentication.
</Warning>
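
To see which address the API is actually bound to, you can inspect the listening sockets. The sketch below assumes the `ss` utility from iproute2 is available; `is_loopback` and `check_ollama_port` are hypothetical helper names. It only reports the bind address and says nothing about firewall rules.

```bash
#!/usr/bin/env bash
# Sketch: report whether port 11434 listens on loopback only.

# is_loopback ADDR:PORT: succeed when the address part is a loopback address.
is_loopback() {
  case $1 in
    127.*|'[::1]':*|localhost:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print one line per listening socket on port 11434 (no output if none is found).
check_ollama_port() {
  local addr
  # Column 4 of `ss -ltnH` is the local address and port.
  ss -ltnH 2>/dev/null | awk '$4 ~ /:11434$/ { print $4 }' | while read -r addr; do
    if is_loopback "$addr"; then
      echo "$addr: local only"
    else
      echo "$addr: listening on a non-loopback address, restrict access"
    fi
  done
}

check_ollama_port
```

An address such as `0.0.0.0:11434` or `*:11434` means the port accepts connections on every interface and should be protected as described above.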

Lumi handles the server and network; software setup is on you. For network or port issues, write to [@lumisup\_robot](https://t.me/lumisup_robot).

## Where to next

<CardGroup cols={2}>
  <Card title="Swap" icon="hard-drive" href="/en/vps/swap">
    How to add swap when there isn't enough RAM for the model.
  </Card>

  <Card title="Docker" icon="box" href="/en/vps/docker">
    Run Open WebUI for Ollama via Docker.
  </Card>
</CardGroup>
