SoFunction
Updated on 2025-04-13

Python calls AnythingLLM API to implement stream output

Streaming is a common requirement when calling the AnythingLLM API using Python, especially when dealing with long text or real-time interactive scenarios. Streaming output allows you to receive response content step by step, rather than waiting for the entire response to complete before processing.

Here are detailed steps and code examples of how to implement streaming output:

1. The basic principle of streaming output

Streaming output allows the client to receive response data step by step, rather than receiving a full response at once. This is useful when dealing with long text output from large models, reducing latency and improving the user experience.

  • In HTTP requests, streaming output is usually enabled by setting stream=True.
  • In AnythingLLM's API calls, you need to make sure the endpoint supports streaming responses (usually implemented via Content-Type: text/event-stream or a similar mechanism).
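The read loop for consuming such a response can be sketched as a small generator. This is a minimal illustration, not part of the AnythingLLM API itself; `response` is any object exposing requests' iter_lines() interface, such as the result of requests.post(..., stream=True):

```python
def read_stream(response):
    """Yield decoded, non-empty lines from a streaming HTTP response.

    `response` is any object with an iter_lines() method, e.g. the
    return value of requests.post(..., stream=True).
    """
    for chunk in response.iter_lines():
        if chunk:  # skip empty keep-alive lines
            yield chunk.decode("utf-8")
```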

2. Code implementation

Here is a complete Python example showing how to implement streaming output with the requests library:

import requests

def ask_anythingllm(question, workspace_name, api_key):
    # Note: the API also has a non-stream chat endpoint, which returns the full answer at once
    url = f"http://ip:port/api/v1/workspace/{workspace_name}/stream-chat"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "accept": "text/event-stream"
    }
    data = {
        "message": question,
        "mode": "query",       # optional: "chat" or "query" mode
        "max_tokens": 1024,    # control the length of the generated text
        "stream": True
    }
    with requests.post(url, headers=headers, json=data, stream=True) as response:
        if response.status_code == 200:
            # Read the streaming response line by line
            for chunk in response.iter_lines():
                if chunk:  # make sure the data block is not empty
                    print(chunk.decode("utf-8"))

# Sample call
api_key = "WQ59GRH-1JC4M3R-MS2NN3X-VBQCY7H"  # replace with your own API key
workspace = "8ceb3fb1-4e75-40fe-87db-570d5a689113"  # replace with your workspace slug
question = "What does the Three Character Classic talk about? Summarize it in 50 words."
ask_anythingllm(question, workspace, api_key)

3. How to get the workspace_name

You can use the following request to get the list of workspace_name values; of course, you can also find them directly in AnythingLLM.

import requests

headers = {
    "Authorization": "Bearer WQ59GRH-1JC4M3R-MS2NN3X-VBQCY7H",  # replace with your API key
    "accept": "application/json"
}

response = requests.get("http://ip:3001/api/v1/workspaces", headers=headers)
if response.status_code == 200:
    print("List of existing workspaces:", response.json())
else:
    print(f"Failed to obtain, status code: {response.status_code}, error message: {response.text}")

4. Key points description

The stream=True parameter:

  • Setting stream=True in requests.post() lets you read the response content incrementally, rather than receiving the full response at once.
  • Use the response.iter_lines() method to read the streaming response line by line.

Response header settings:

  • Make sure the request header contains accept: text/event-stream, which tells the server that you expect a streaming response.
  • The server needs to support streaming responses (e.g., return Content-Type: text/event-stream).

Line by line processing response:

  • Streaming responses are usually returned line by line; each line may be a complete event or a fragment of data.
  • You can parse or process each line as needed.
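If the server emits standard server-sent events, each meaningful line starts with a "data:" prefix followed by a JSON payload. Here is a hedged sketch of a per-line parser; the exact payload fields (e.g. textResponse) depend on your AnythingLLM version and are assumptions here:

```python
import json

def parse_sse_line(line):
    """Parse one line of an SSE stream.

    Returns the decoded JSON payload of a 'data:' line, or None for
    blank keep-alives, comment lines, and end-of-stream markers.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # SSE comment (": ...") or unrelated line
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None  # empty or end-of-stream marker
    return json.loads(payload)
```

You would call this on each line produced by response.iter_lines() and accumulate the text field you care about.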

5. Debugging and precautions

  • Check server support: Make sure that AnythingLLM's API supports streaming response. If the server does not support it, you may need to contact the developer or check the documentation.
  • Error handling: In practical applications, it is recommended to add more detailed error handling logic, such as retry mechanism or timeout handling.
  • Performance optimization: Streaming output reduces latency, but also increases the complexity of network interaction. Ensure the network environment is stable to avoid interruptions.

6. Extended application

Streaming output is ideal for the following scenarios:

  • Real-time interactive question and answer system.
  • Handle long text generation tasks (such as articles, stories, etc.).
  • Provide step-by-step user feedback to enhance user experience.

With the method above, you can easily implement streaming output when calling the AnythingLLM API from Python.

This is the end of this article about using Python to call the AnythingLLM API with streaming output. For more on calling the AnythingLLM API from Python, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!