Ollama Server

Updated Jun 15, 2026 ·

Overview

Ollama can be used from the terminal, but it can also be used from code.

When Ollama is running, it starts a local API server that your Python scripts can send requests to.

Run local models from code
Use custom Ollama models
Send prompts through an API
Use the OpenAI SDK with Ollama

This makes Ollama useful for building local AI tools and small automation scripts.

Lab: Prerequisites

See the codefiles here: Github

Clone the repository and navigate to the directory.

git clone https://github.com/joseeden/llm-engineering-sandbox.git
cd llm-engineering-sandbox/openllm-ollama/basics-api

Project structure:

llm-engineering-sandbox/openllm-ollama/basics-api
|

Create a virtual environment and activate it.

python -m venv ~/venv
source ~/venv/bin/activate  

If using Powershell on Windows, use:

python -m venv "$HOME\venv"
& "$HOME\venv\Scripts\Activate.ps1"

Install dependencies:
```
pip install -r requirements.txt
```
Create a .env file in the root of the project.
```
cp .env.example .env
```
Populate your .env file with the appropriate values.
```
OLLAMA_BASE_URL=http://localhost:11434
MODEL_NAME=<ENTER_LOCAL_MODEL_NAME>
```
To get the model name that you currently have, run:
```
ollama list
```

Start the Ollama server

Ollama usually starts its server automatically when the app is running.

The default API address is:

http://localhost:11434

If the server is not running, start it manually:

ollama serve

Output:

Listening on 127.0.0.1:11434

This keeps the Ollama server running in the terminal.

You can also start the server by running any model:

ollama run gemma4:e4b

After that, Ollama is ready to receive API requests.

To verify, you can try running cURL:

curl -s http://localhost:11434/api/tags | jq

Output:

{
  "models": [
    {
      "name": "qwen-custom-2:latest",
      "model": "qwen-custom-2:latest",
      "modified_at": "2026-06-13T21:27:32.762128769+08:00",
      .... 

Sending Prompts to the Ollama API

Ollama has its own API. You can send a prompt to a model using Python.

See codefiles here: Github

Run the script:

python simple-chat.py

Output:

The exact response may be different because model output can vary.

Use a Custom Ollama Model

You can also call your own custom model from Python. This is useful when your custom model already has a system prompt or parameters saved.

For example, if you created a model called service-agent:

$ ollama list

NAME                    ID              SIZE      MODIFIED          
qwen-custom-2:latest    2bbf5a10ab12    1.3 GB    58 minutes ago       
qwen-custom:latest      1a807b3eee59    968 MB    About an hour ago    
service-agent:latest    20a396a79699    9.6 GB    2 hours ago          
gemma3:4b               a2af6cc3eb7f    3.3 GB    5 hours ago          
gemma4:e4b              c6eb396dbd59    9.6 GB    5 hours ago          
minimax-m3:cloud        d03a959f45c0    -         6 hours ago  

You can store the model name in an environment variable and use it in your code.

See codefiles here: Github

export MODEL_NAME="qwen-custom"

Run the script:

python using-custom-model.py

Structured JSON Output

Ollama can return structured JSON if you provide a schema.

This is useful when you want to use the model response in another script or application.

See codefiles here: Github

Run the sample script:

python structured-pokemon-names.py

Structured output is helpful when the response needs to follow a fixed format.

Use the OpenAI SDK with Ollama

Ollama also supports an OpenAI-compatible API.

This means you can use the OpenAI Python SDK, but point it to your local Ollama server.

First, install the OpenAI package:

pip install openai

See codefiles here: Github

Run the script:

python using-openai-sdk.py

Output:

Ollama is an easy-to-use command-line tool that allows developers and users to easily download, run, and manage various open-source large language models (LLMs) on their local machines without needing cloud services or complex setup.

Note: The API key value does not matter because the model is running locally.

Send Images to a Vision Model

Some Ollama models can understand images.

For example, a vision model can describe an image when the image is converted to Base64.

Sample image:

See codefiles here: Github

Run the script:

python image-parser.py

The exact response depends on the image and the model used.

Overview​

Lab: Prerequisites​

Start the Ollama server​

Sending Prompts to the Ollama API​

Use a Custom Ollama Model​

Structured JSON Output​

Use the OpenAI SDK with Ollama​

Send Images to a Vision Model​

Overview

Lab: Prerequisites

Start the Ollama server

Sending Prompts to the Ollama API

Use a Custom Ollama Model

Structured JSON Output

Use the OpenAI SDK with Ollama

Send Images to a Vision Model