Building Applications

Updated Jun 15, 2026 ·

Overview

Local models can be used in two main ways:

Chat interface
API access

With a chat interface, you interact with the model manually. With API access, your application sends requests to the model and receives responses automatically.

Common use cases include:

Building applications
Automating workflows
Generating content
Processing user input

Using an API makes it possible to integrate local AI models into your own tools and services.

info

See code files here: Github

LM Studio API Server

LM Studio includes a local API server that allows applications to communicate with models running on your machine.

Runs locally
Exposes API endpoints
Accepts HTTP requests
Does not require external AI services

Before applications can access a model, the LM Studio API server must be running.

Open the Developer section in LM Studio and start the server.

Once started, LM Studio displays:

Item	Description
Server status	Shows whether the LM Studio API server is running or stopped.
API endpoints	Shows the URLs that applications use to communicate with the server.
Request logs	Displays incoming API requests and helps with troubleshooting.
Server settings	Provides configuration options such as port settings and model loading behavior.

Because the server runs locally, it is usually only accessible from your own machine unless you manually expose it to other systems.

This local server acts as the bridge between your applications and the model.

Useful Server Settings

LM Studio provides several server settings you can adjust.

Port configuration
Just-in-Time model loading (On-demand)
Automatic model unloading

Just-in-Time Model Loading (On-demand)

If enabled, LM Studio automatically loads a model when a request is received for a model that is not currently loaded.

Less manual work
No need to preload models

Trade-off:

The first request may take longer because the model must be loaded first

Automatic Model Unloading

LM Studio can automatically unload models that have not been used for a period of time.

Frees memory
Reduces resource usage

This helps manage system resources when working with multiple models.

Accessing Local Models

There are two common ways to access a local model.

Method	Description	Advantages
HTTP requests	Send requests directly to the API.	Gives full control and works with any programming language.
OpenAI SDK	Use an OpenAI-compatible client library.	Easier to use because it provides helper functions and a familiar interface.

Before testing each method, make sure you can connect to the LM Studio API server by sending a simple request using a tool like curl or Postman.

curl http://127.0.0.1:1234/v1/models

If you are using WSL in Windows, please see the Troubleshooting: Using WSL. section below.

If you are using Windows, run from PowerShell:

curl http://localhost:1234/v1/models

Lab: Prerequisites

See the codefiles here: Github

Clone the repository and navigate to the lm-studio/api-server directory.

git clone https://github.com/joseeden/llm-engineering-sandbox.git
cd llm-engineering-sandbox/openllm-lm-studio/api-server

Project structure:

llm-engineering-sandbox/openllm-lm-studio/api-server
|
├── README.md
├── chatbot.py
├── image-parser.py
├── images
│   └── taal.jpg
├── requirements.txt
├── using-http-request.py
├── using-openai-sdk.py
├── wsl-chatbot.py
├── wsl-image-parser.py
├── wsl-using-http-request.py
└── wsl-using-openai-sdk.py

Create a virtual environment and activate it.

python -m venv ~/venv
source ~/venv/bin/activate  

If using Powershell on Windows, use:

python -m venv "$HOME\venv"
& "$HOME\venv\Scripts\Activate.ps1"

Install dependencies:
```
pip install -r requirements.txt
```
Create a .env file in the root of the project.
```
cp .env.example .env
```

Populate your .env file with the appropriate values.

LM_STUDIO_BASE_URL=<ENTER_LM_STUDIO_API_URL>
MODEL_NAME=<ENTER_LOCAL_MODEL_NAME>

## Example (if using localhost):
# LM_STUDIO_BASE_URL=http://localhost:1234
# MODEL_NAME=google/gemma-4-e4b

If you're using WSL in Windows, get your Windows host IP and use it instead of localhost:

cat /etc/resolv.conf | grep nameserver | awk '{print $2}'

Then use it in your .env file:

LM_STUDIO_BASE_URL=http://<ENTER_WINDOWS_HOST_IP>:1234

Direct HTTP Requests

One of the simplest ways to communicate with a local model is by sending an HTTP request.

Create a Python script (e.g., app.py) that sends a request to the LM Studio API server. You can create this file anywhere on your system since now we're doing everything outside of LM Studio.

See the script here: Github

The code does the following:

Connects to the LM Studio API server
Sends a prompt
Receives a response
Prints the returned JSON

Note If you are using WSL in Windows, please see the Troubleshooting: Using WSL. section below.

Run the script:

python using-http-request.py

Output:

An LLM (Large Language Model) is a sophisticated type of artificial intelligence trained on immense datasets of text and code, allowing it to understand and generate humanremarkable fluency. It functions by identifying complex statistical patterns within this vast data, predicting the most logically and grammatically probable sequence of wo to any given prompt or query. This ability enables it to perform a wide range of tasks, including answering questions, summarizing documents, translating languages, and give text that mimics human communication.

The exact response depends on the model currently loaded in LM Studio.

This approach works because LM Studio exposes a standard HTTP API that applications can call.

Using the OpenAI SDK

LM Studio provides an API that is compatible with the OpenAI API format.

Because of this, applications can use the OpenAI SDK while still communicating with a local model.

Same API structure
Uses local models
No OpenAI API required

The main difference is that you must point the SDK to the LM Studio server instead of the real OpenAI service.

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"
)

Note If you are using WSL in Windows, please see the Troubleshooting: Using WSL. section below.

See the script here: Github

Run the script:

python using-openai-sdk.py

Output:

An LLM is a type of artificial intelligence model trained on massive amounts of text data to understand and generate human language. It operates by identifying complex statistical patterns within this data, allowing it to predict the most plausible sequence of words in response to any prompt. Consequently, these models can perform sophisticated tasks like writing articles, summarizing documents, answering questions, and even generating code.

The exact response depends on the model currently loaded in LM Studio.

Examples using OpenAI SDK

Chatbot

You can build a simple chatbot that uses a local model to respond to user input.

See the script here: Github

Run the script:

python chatbot.py

Image Parser

In this example, we have a script that sends an image to a local model and asks it to describe the image in detail. It does this by:

Reading an image file
Encoding it as a base64 data URL
Sending it to the model with a prompt
Printing the model's response

See the script here: Github

Sample image:

Run the script:

python image-parser.py

Troubleshooting

Using WSL

If you are using WSL in Windows, you may encounter issues connecting to the LM Studio API server.

Make sure to enable the local network access option in LM Studio settings.

Try running cURL from WSL:

curl http://$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):1234/v1/models

If you can connect successfully, you can proceed with running the scripts.

For reference, I've created scripts for both direct HTTP requests and using the OpenAI SDK that are compatible with WSL.

See the script here: Github

├── wsl-chatbot.py
├── wsl-image-parser.py
├── wsl-using-http-request.py
└── wsl-using-openai-sdk.py

Overview​

LM Studio API Server​

Useful Server Settings​

Just-in-Time Model Loading (On-demand)​

Automatic Model Unloading​

Accessing Local Models​

Lab: Prerequisites​

Direct HTTP Requests​

Using the OpenAI SDK​

Examples using OpenAI SDK​

Chatbot​

Image Parser​

Troubleshooting​

Using WSL​

Overview

LM Studio API Server

Useful Server Settings

Just-in-Time Model Loading (On-demand)

Automatic Model Unloading

Accessing Local Models

Lab: Prerequisites

Direct HTTP Requests

Using the OpenAI SDK

Examples using OpenAI SDK

Chatbot

Image Parser

Troubleshooting

Using WSL