
A detailed guide to calling local large language models in Python with the Ollama library

Preface

ollama is a Python library for calling local large language models (LLMs). It provides a simple and efficient API so that developers can easily interact with models running on a local Ollama server. Below is a detailed introduction to using the ollama library in Python.

1. Install Ollama

Before using the library, make sure the ollama package is installed. You can install it with the following command:

pip install ollama

If you have not installed Python's package manager pip, refer to the official documentation to install it first. Note that the pip package only installs the Python client: you also need the Ollama runtime itself (available from ollama.com) running locally, with at least one model pulled.
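Once both are in place, pull at least one model. You can do this from the command line with ollama pull, or directly from Python, as in the minimal sketch below (the model name "llama3" is only an example; use any model tag available in the Ollama registry):

import ollama

# Download a model to the local Ollama server.
# "llama3" is an example tag; replace it with the model you actually want.
ollama.pull("llama3")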

2. Ollama's main functions

ollama provides a simple way to interact with local large language models (such as the Llama family or other models), mainly by calling a model through its API to generate text, answer questions, and so on.
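For example, before generating anything you can check which models the local server already has. The snippet below is a minimal sketch and assumes the Ollama server is running:

import ollama

# Ask the local Ollama server which models are currently available.
models = ollama.list()
print(models)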

3. Basic examples of using Ollama

The following shows the basic usage of ollama.

3.1 Import library

In your Python script, first import ollama:

import ollama

3.2 Calling the model using Ollama

The core function of Ollama is to call local models for inference and generation. You can call the model in the following ways:

Text generation example

Here is a simple example of generating text:

import ollama

# Call the local large language model through Ollama
response = ollama.generate(
    model="llama",  # The name of the model to use (it must already be available locally)
    prompt="Hello, please briefly introduce the characteristics of Python."
)

# Print the generated result
print(response)

Parsing the model output

The returned response is a dictionary-like object; the generated text is in its "response" field. You can process it further, for example by formatting the output or writing it to a file.
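Continuing from the response above, here is a minimal sketch of extracting the text and saving it (the file name output.txt is just an example):

# Pull the generated text out of the result.
text = response["response"]
print(text)

# For example, store it in a file for later use.
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)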

3.3 Set custom parameters

When calling the model, you can pass custom parameters to adjust its behavior, such as the maximum generation length or the sampling temperature. In the ollama library these generation controls are passed through an options dictionary.

Supported parameters

Here are some common parameters:

  • model: specifies the name of the model to use (such as "llama").
  • prompt: the input prompt.
  • options: a dictionary of generation settings. Common keys include temperature (controls the randomness of the output, typically between 0 and 1) and num_predict (limits the maximum number of tokens generated, the equivalent of max_tokens in other APIs).

Example: Custom Parameters

response = ollama.generate(
    model="llama",
    prompt="Write me a poem about spring.",
    options={
        "temperature": 0.7,  # Randomness of the generation
        "num_predict": 100   # Limit on the maximum number of tokens generated
    }
)

print(response)

3.4 Using a custom model

If you have built a custom model locally (for example by registering it with Ollama through a Modelfile and the ollama create command), or have pulled another model, you can use it by passing its name.

response = ollama.generate(
    model="my-custom-model",  # Example name of a model registered locally with `ollama create`
    prompt="How to learn machine learning?"
)

print(response)

4. Integrated streaming generation

In some scenarios, you may want to receive the model's output gradually instead of waiting for the entire generation to complete. This is achieved through streaming, by passing stream=True to the call.

for chunk in ollama.generate(
    model="llama",
    prompt="Step by step to generate an article about artificial intelligence.",
    stream=True
):
    print(chunk["response"], end="")

In streaming generation, the model returns partial results as they are produced, and you can process them in real time, for example by accumulating the chunks into a full string while displaying them, as sketched below.
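A minimal sketch of that pattern, under the same assumptions as above (a running Ollama server and a locally available model named "llama"):

import ollama

full_text = ""

# Each chunk carries a partial "response"; display it immediately and keep the whole text.
for chunk in ollama.generate(
    model="llama",
    prompt="Step by step to generate an article about artificial intelligence.",
    stream=True
):
    part = chunk["response"]
    full_text += part
    print(part, end="", flush=True)

print()  # final newline
print(f"Total characters generated: {len(full_text)}")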

5. Error handling

When calling the model, you may encounter errors (such as an incorrect model name, a request timeout, and so on). These errors can be handled by catching exceptions.

try:
    response = ollama.generate(
        model="llama",
        prompt="Please explain what a big language model is."
    )
    print(response)
except Exception as e:
    print(f"An error occurred: {e}")

6. Advanced Usage: Integrate with other tools

ollama can be combined with other tools (such as Flask or FastAPI) to build your own AI applications.

Example: Build a simple Flask service

The following code shows how to use Flask to build a simple web application that calls Ollama to generate text:

from flask import Flask, request, jsonify
import ollama

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.get_json()
    prompt = data.get("prompt", "")
    try:
        # Call Ollama
        response = ollama.generate(
            model="llama",
            prompt=prompt,
            options={"num_predict": 100}  # Limit the length of the reply
        )
        return jsonify({"response": response["response"]})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


if __name__ == '__main__':
    app.run(debug=True)

Use Postman or another tool to send a POST request to the /generate endpoint:

{
    "prompt": "What are the main advantages of Python?"
}

The return result will be the model-generated answer.
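You can also exercise the endpoint directly from Python with the requests package; the sketch below assumes the service is running at Flask's default address, http://127.0.0.1:5000:

import requests

# Send a test prompt to the local Flask service and print the JSON reply.
resp = requests.post(
    "http://127.0.0.1:5000/generate",
    json={"prompt": "What are the main advantages of Python?"},
)
print(resp.json())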

7. Things to note

  • Model compatibility: make sure the locally installed model is in a format supported by ollama.
  • Hardware requirements: large language models usually need significant hardware resources (especially GPU support). When calling a local model, make sure your environment can meet the computing demands.
  • Version updates: check for new ollama releases regularly to get the latest features and optimizations.

8. Reference Documents

For more detailed usage and configuration options, please refer to the official ollama documentation or related resources.

  • Official documentation: search for the official ollama resources.
  • Community support: help is available on GitHub and in the developer community.

Summary

This concludes the article on calling local large language models from Python with the Ollama library. For more on using the Ollama library from Python, please search for my previous articles or continue browsing the related articles below. I hope you will continue to support me!