Ollama provides a Python SDK that lets developers interact with locally running models from a Python environment.
The SDK makes it easy to integrate natural language processing tasks into Python projects, covering operations such as text generation, dialogue, and model management without calling the command line by hand.
Install Python SDK
First, install Ollama's Python SDK with pip:
pip install ollama
Make sure Python is installed and that your environment can reach the local Ollama service.
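To confirm the installation, a quick import check is enough; if the import below succeeds, the SDK is available in your environment.

# Quick sanity check: the import succeeds only if the SDK is installed
import ollama
print('ollama SDK is installed')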
Start local services
Before using the Python SDK, make sure the local Ollama service is running. You can start it from the command line:
ollama serve
Once the local service is running, the Python SDK communicates with it to perform tasks such as model inference.
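A simple way to verify that the SDK can reach the service is to call a lightweight method such as list(); the minimal check below just reports a connection failure if ollama serve is not running.

import ollama

try:
    models = ollama.list()
    print('Ollama service is reachable; installed models:', len(models['models']))
except Exception as exc:
    # Typically a connection error when the local service is not running
    print('Could not reach the Ollama service:', exc)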
Use Ollama's Python SDK for inference
Once the SDK is installed and the local service is started, you can interact with Ollama through Python code.
First, import chat and ChatResponse from the ollama library:
from ollama import chat
from ollama import ChatResponse
Through the Python SDK, you can send a request to a specified model to generate text or hold a conversation:
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='deepseek-coder', messages=[
    {
        'role': 'user',
        'content': 'Who are you?',
    },
])

# Print the response content
print(response['message']['content'])

# Or access the field on the response object directly
# print(response.message.content)
Execute the above code and the output is:
I am a programming intelligent assistant developed by China's DeepSeek company called DeepCoder. I can help you with questions and tasks related to computer science. If you have any topics about this or need to study or query information in a certain field, please feel free to ask questions!
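The chat call does not keep conversation state between requests, so for a multi-turn dialogue you maintain the messages list yourself and append each reply before the next turn. A minimal sketch, reusing the deepseek-coder model with an illustrative follow-up question:

from ollama import chat

messages = [{'role': 'user', 'content': 'Who are you?'}]
first = chat(model='deepseek-coder', messages=messages)
print(first['message']['content'])

# Append the assistant's reply and the next user turn so the model sees the full history
messages.append({'role': 'assistant', 'content': first['message']['content']})
messages.append({'role': 'user', 'content': 'Can you write a hello world in Python?'})

second = chat(model='deepseek-coder', messages=messages)
print(second['message']['content'])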
The Ollama SDK also supports streaming responses; enable streaming by setting stream=True when sending a request.
from ollama import chat

stream = chat(
    model='deepseek-coder',
    messages=[{'role': 'user', 'content': 'Who are you?'}],
    stream=True,
)

# Print the response content piece by piece
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
Customize the client
Developers can also create custom clients to further control request configuration, such as setting custom headers or specifying the URL of the local service.
Create a custom client
Through the Client class, you can customize request settings (such as request headers or the service URL) and send requests.
from ollama import Client

client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'}
)

response = client.chat(model='deepseek-coder', messages=[
    {
        'role': 'user',
        'content': 'Who are you?',
    },
])
print(response['message']['content'])
Output:
I am a programming intelligent assistant developed by China's DeepSeek company called DeepCoder. I mainly use it to answer computer science-related questions and help solve less clear or difficult places related to these topics. If you have any questions about Python, JavaScript or other computer science fields, please ask me a question!
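Extra keyword arguments given to Client are passed through to the underlying HTTP client, so settings such as the request timeout can also be adjusted; the timeout value below is only an illustrative assumption for slow local models.

from ollama import Client

# The timeout (in seconds) is forwarded to the underlying HTTP client
client = Client(host='http://localhost:11434', timeout=120)

response = client.chat(model='deepseek-coder', messages=[
    {'role': 'user', 'content': 'Explain recursion in one sentence.'},
])
print(response['message']['content'])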
Asynchronous client
To execute requests asynchronously, use the AsyncClient class, which is suited to scenarios that require concurrency.
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Who are you?'}
    response = await AsyncClient().chat(model='deepseek-coder', messages=[message])
    print(response['message']['content'])

asyncio.run(chat())
Output:
I am a programming intelligent assistant developed by China's DeepSeek company called "DeepCoder". I am an AI model that specifically answers computer science-related questions, which can help users answer questions about machine learning, artificial intelligence and other fields. I cannot provide services for other non-technical issues or requests, such as sentiment analysis or daily conversations.
An asynchronous client supports the same functionality as the synchronous one; the only difference is that requests are executed asynchronously, which can improve performance, especially in high-concurrency scenarios.
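For example, several independent prompts can be dispatched concurrently with asyncio.gather. The sketch below reuses the deepseek-coder model from the earlier examples; the helper function and prompts are illustrative only.

import asyncio
from ollama import AsyncClient

async def ask(client: AsyncClient, prompt: str) -> str:
    # One chat request; awaiting here lets other requests run in the meantime
    response = await client.chat(model='deepseek-coder',
                                 messages=[{'role': 'user', 'content': prompt}])
    return response['message']['content']

async def main():
    client = AsyncClient()
    prompts = ['Who are you?', 'What is a list comprehension?', 'Explain recursion in one sentence.']
    # Dispatch all requests at once instead of waiting for each to finish in turn
    answers = await asyncio.gather(*(ask(client, p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(prompt, '->', answer)

asyncio.run(main())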
Asynchronous streaming response
To process streaming responses asynchronously, set stream=True; the call then returns an asynchronous generator.
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Who are you?'}
    async for part in await AsyncClient().chat(model='deepseek-coder', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
The response is returned asynchronously piece by piece, and each piece can be processed as it arrives.
Common API methods
The Ollama Python SDK provides some commonly used API methods for manipulating and managing models.
1. chat method
Have a conversation with the model: send a user message and get the model's response:
ollama.chat(model='llama3.2', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
2. generate method
Used for text generation tasks. Similar to the chat method, but it only requires a prompt parameter:
ollama.generate(model='llama3.2', prompt='Why is the sky blue?')
3. list method
List all available models:
ollama.list()
4. show method
Display detailed information about the specified model:
ollama.show('llama3.2')
5. create method
Create a new model from an existing model:
ollama.create(model='example', from_='llama3.2', system="You are Mario from Super Mario Bros.")
6. copy method
Copy a model to a new name:
ollama.copy('llama3.2', 'user/llama3.2')
7. delete method
Delete the specified model:
ollama.delete('llama3.2')
8. pull method
Pull a model from the remote repository:
ollama.pull('llama3.2')
9. push method
Push a local model to a remote repository:
ollama.push('user/llama3.2')
10. embed method
Generate text embeddings:
ollama.embed(model='llama3.2', input='The sky is blue because of rayleigh scattering')
11. ps method
View a list of running models:
ollama.ps()
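As a small end-to-end sketch, several of these methods can be combined; the example below pulls a model, embeds two sentences in one call, and computes their cosine similarity. The embeddings field layout is taken from the embed response, and the sentences are illustrative.

import math
import ollama

# Download the model if it is not already available locally
ollama.pull('llama3.2')

# One embed call can take a list of inputs and returns one vector per input
result = ollama.embed(model='llama3.2',
                      input=['The sky is blue because of rayleigh scattering',
                             'Why is the sky blue?'])
a, b = result['embeddings']

# Cosine similarity between the two embedding vectors
dot = sum(x * y for x, y in zip(a, b))
norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
print('cosine similarity:', dot / norm)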
Error handling
The Ollama SDK raises an error when a request fails or when something goes wrong in a streaming response.
Developers can catch these errors with a try-except statement and handle them as needed.
Example
import ollama

model = 'does-not-yet-exist'

try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        ollama.pull(model)
In the example above, if the model does-not-yet-exist does not exist, a ResponseError is raised; after catching it, the developer can choose to pull the model or handle the error in some other way.
This concludes the overview of using Ollama's Python SDK.