
Ollama Python usage summary

Ollama provides a Python SDK that allows developers to interact with locally running models from a Python environment.

Ollama's Python SDK makes it easy to integrate natural language processing tasks into Python projects, performing operations such as text generation, dialogue generation, and model management without manually invoking the command line.

Install the Python SDK

First, you need to install Ollama's Python SDK.

You can install it with pip:

pip install ollama

Make sure Python is installed in your environment and that your network environment can reach the local Ollama service.

Start the local service

Before using the Python SDK, make sure that the Ollama local service is started.

You can use the command line tool to start it:

ollama serve

Once the local service is running, the Python SDK communicates with it to perform tasks such as model inference.
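As a quick sanity check (a minimal sketch, assuming the service listens on the default http://localhost:11434), you can ask the SDK for the locally installed models; if this call succeeds, the SDK can reach the service:

import ollama

# Raises an error if the local Ollama service is not reachable
models = ollama.list()
print(models)  # shows the locally installed models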

Use Ollama's Python SDK for inference

Once the SDK is installed and the local service is started, you can interact with Ollama through Python code.

First, import chat and ChatResponse from the ollama library:

from ollama import chat
from ollama import ChatResponse

Through the Python SDK, you can send a request to a specified model to generate text or hold a conversation:

from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='deepseek-coder', messages=[
  {
    'role': 'user',
    'content': 'Who are you?',
  },
])

# Print the response content
print(response['message']['content'])

# Or access the fields of the response object directly
print(response.message.content)

Running the above code produces output similar to the following:

I am a programming intelligent assistant developed by China's DeepSeek company called DeepCoder. I can help you with questions and tasks related to computer science. If you have any topics about this or need to study or query information in a certain field, please feel free to ask questions!

The Ollama SDK also supports streaming responses; developers can enable streaming by setting stream=True when sending a request.

from ollama import chat

stream = chat(
    model='deepseek-coder',
    messages=[{'role': 'user', 'content': 'Who are you?  '}],
    stream=True,
)

# Print the response content piece by piece
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

Customize the client

Developers can also create custom clients to further control request configuration, such as setting custom headers or specifying the URL of the local service.

Create a custom client

Through the Client class, you can customize request settings (such as request headers or the service URL) and send requests.

from ollama import Client

client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'}
)

response = client.chat(model='deepseek-coder', messages=[
    {
        'role': 'user',
        'content': 'Who are you?',
    },
])
print(response['message']['content'])

Output:

I am a programming intelligent assistant developed by China's DeepSeek company called DeepCoder. I mainly use it to answer computer science-related questions and help solve less clear or difficult places related to these topics. If you have any questions about Python, JavaScript or other computer science fields, please ask me a question!

Asynchronous client

To execute requests asynchronously, developers can use the AsyncClient class, which is suitable for scenarios that require concurrency.

import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Who are you?'}
    response = await AsyncClient().chat(model='deepseek-coder', messages=[message])
    print(response['message']['content'])

asyncio.run(chat())

Output:

I am a programming intelligent assistant developed by China's DeepSeek company called "DeepCoder". I am an AI model that specifically answers computer science-related questions, which can help users answer questions about machine learning, artificial intelligence and other fields. I cannot provide services for other non-technical issues or requests, such as sentiment analysis or daily conversations.

Asynchronous clients support the same functionality as the synchronous client; the only difference is that requests are executed asynchronously, which can improve performance, especially in high-concurrency scenarios.
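For example, the following is a minimal sketch (assuming the deepseek-coder model from the earlier examples is installed) that sends several prompts concurrently with asyncio.gather, so the requests overlap instead of running one after another:

import asyncio
from ollama import AsyncClient

async def ask(client: AsyncClient, prompt: str) -> str:
    response = await client.chat(
        model='deepseek-coder',
        messages=[{'role': 'user', 'content': prompt}],
    )
    return response['message']['content']

async def main():
    client = AsyncClient()
    prompts = ['Who are you?', 'What is a linked list?', 'Explain recursion briefly.']
    # Run all requests concurrently and collect the answers in order
    answers = await asyncio.gather(*(ask(client, p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(f'{prompt}\n{answer}\n')

asyncio.run(main())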

Asynchronous streaming response

If you need to process streaming responses asynchronously, set stream=True; the call then returns an asynchronous generator that yields the response piece by piece.

import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Who are you?'}
    async for part in await AsyncClient().chat(model='deepseek-coder', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

The response is returned asynchronously part by part, and each part can be processed as it arrives.

Common API methods

The Ollama Python SDK provides a number of commonly used module-level API methods for operating on and managing models (the snippets below assume import ollama).

1. chat method

Have a conversation with a model: send user messages and get the model's response:

ollama.chat(model='llama3.2', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

2. generate method

Used for text generation tasks. Similar to the chat method, but it only requires a prompt parameter:

ollama.generate(model='llama3.2', prompt='Why is the sky blue?')

3. list method

List all available models:

ollama.list()

4. show method

Display detailed information about a specified model:

ollama.show('llama3.2')

5. create method

Create a new model from an existing model:

ollama.create(model='example', from_='llama3.2', system="You are Mario from Super Mario Bros.")

6. copy method

Copy a model under another name:

ollama.copy('llama3.2', 'user/llama3.2')

7. delete method

Delete the specified model:

ollama.delete('llama3.2')

8. pull method

Pull a model from the remote repository:

ollama.pull('llama3.2')

9. push method

Push a local model to a remote repository:

ollama.push('user/llama3.2')

10. embed method

Generate text embeddings:

ollama.embed(model='llama3.2', input='The sky is blue because of rayleigh scattering')

11. ps method

View a list of running models:

ollama.ps()
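As an illustration of how these methods fit together, here is a minimal sketch (assuming the llama3.2 model is available in the remote repository) that pulls a model, inspects it, and checks which models are installed and running:

import ollama

# Pull the model from the remote repository
ollama.pull('llama3.2')

# Show detailed information about the pulled model
print(ollama.show('llama3.2'))

# List the locally installed models and the models currently loaded in memory
print(ollama.list())
print(ollama.ps())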

Error handling

The Ollama SDK raises an error when a request fails or when there is a problem with a streaming response.

Developers can use a try-except statement to catch these errors and handle them as needed.

Example

import ollama

model = 'does-not-yet-exist'

try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        ollama.pull(model)

In the example above, if the model does-not-yet-exist is not available locally, a ResponseError is raised; after catching it, the developer can pull the model or handle the error in some other way.

This concludes this summary of Ollama Python usage. For more on working with Ollama from Python, see the related articles on this site.