Steps to connect to and operate Elasticsearch in Python

Introduction

Elasticsearch is a powerful distributed search engine, widely used for log analysis, real-time search, and big data analytics. It supports fast full-text retrieval, large-scale data storage, and real-time data analysis. Elastic provides an official Python client library for interacting with Elasticsearch.

This article will introduce in detail how to connect and operate Elasticsearch using Python, including installing clients, basic operations (such as creating indexes, adding data, querying data, etc.), and advanced applications (such as aggregation queries, index mapping, etc.).

1. Environment preparation

1.1 Install Elasticsearch

Before you start, make sure that Elasticsearch is installed and running. If it is not installed yet, you can install it in one of the following ways:

Install Elasticsearch using Docker:

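docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.0
docker run --name elasticsearch -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.0

Elasticsearch will then listen on localhost:9200. The discovery.type=single-node setting starts a single-node cluster, which is suitable for local development.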

Use the official installation package:

You can also download and install it from the official Elasticsearch website.

1.2 Installing the Python Elasticsearch client

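Install the Elasticsearch Python client, elasticsearch, which is the official library for interacting with Elasticsearch. Because the examples below target Elasticsearch 7.10.0 and use the 7.x call style, pin the client to the 7.x series:

pip install "elasticsearch>=7,<8"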

2. Connect to Elasticsearch

2.1 Connect to the local Elasticsearch service

from elasticsearch import Elasticsearch

# Connect to the local Elasticsearch instance
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Check whether the connection is successful
if es.ping():
    print("The connection was successful!")
else:
    print("Connection failed!")
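A freshly started Elasticsearch container can take several seconds before it answers requests, so a single ping right after startup may fail. The following is a minimal sketch of a retry loop, not part of the client library: wait_until_ready is a hypothetical helper that accepts any zero-argument callable, so with a real client you would pass es.ping. The demonstration uses a stand-in callable so that it runs without a server.

```python
import time
from typing import Callable


def wait_until_ready(ping: Callable[[], bool], retries: int = 5, delay: float = 1.0) -> bool:
    """Call ping() until it returns True or the retries run out."""
    for attempt in range(1, retries + 1):
        if ping():
            return True
        if attempt < retries:
            time.sleep(delay)  # Wait before the next attempt
    return False


# Stand-in for es.ping that fails twice before succeeding (demonstration only).
_answers = iter([False, False, True])
ready = wait_until_ready(lambda: next(_answers), retries=5, delay=0.01)
print(ready)
```

With a real client, the call would look like wait_until_ready(es.ping, retries=10, delay=2.0).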

2.2 Connecting to the remote Elasticsearch service

If your Elasticsearch service is on a remote server, you can modify the connection configuration:

es = Elasticsearch([{'host': 'Remote IP Address', 'port': 9200}])

# Check the connection
if es.ping():
    print("The connection was successful!")
else:
    print("Connection failed!")
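Rather than hard-coding the remote address, the connection settings can be read from the environment. This is a sketch under assumptions: the variable names ES_HOST and ES_PORT and the address 10.0.0.5 are hypothetical, chosen only for illustration. The helper returns the same list-of-dicts structure passed to Elasticsearch(...) above.

```python
import os


def hosts_from_env(environ=os.environ):
    """Build the hosts list from ES_HOST / ES_PORT (hypothetical variable names)."""
    host = environ.get("ES_HOST", "localhost")
    port = int(environ.get("ES_PORT", "9200"))
    return [{"host": host, "port": port}]


# Demonstration with an explicit mapping instead of the real environment.
hosts = hosts_from_env({"ES_HOST": "10.0.0.5", "ES_PORT": "9201"})
print(hosts)
```

The result can then be passed to the client as Elasticsearch(hosts_from_env()).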

3. Create indexes and mappings

In Elasticsearch, all data is stored in an index (Index), and the index has its own structure. Mapping is the definition of fields in the index.

3.1 Creating an index

# Create an index
index_name = "my_index"
response = es.indices.create(index=index_name, ignore=400)  # ignore=400 suppresses the error returned when the index already exists
print(response)

3.2 Create an index with a mapping

If you want to define the field type when creating an index, you can specify the mapping. Here is an example containing a mapping:

mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"},
            "timestamp": {"type": "date"}
        }
    }
}

response = es.indices.create(index="my_index_with_mapping", body=mapping, ignore=400)
print(response)
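When several indexes share the same simple shape, writing the nested mapping dictionary by hand becomes repetitive. The sketch below uses a hypothetical helper, build_mapping, that expands a flat field-to-type dictionary into the mappings/properties structure shown above. It only covers fields that need a type and nothing else.

```python
def build_mapping(field_types):
    """Expand {"field": "type"} into an Elasticsearch mappings body."""
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return {"mappings": {"properties": properties}}


mapping = build_mapping({"name": "text", "age": "integer", "timestamp": "date"})
print(mapping)
```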

4. Add data to Elasticsearch

Data is added to Elasticsearch through the index operation, which inserts the data as a document.

4.1 Single data insertion

document = {
    "name": "John Doe",
    "age": 29,
    "timestamp": "2024-12-24T10:00:00"
}

# Insert the document into the index
response = es.index(index="my_index", document=document)
print(response)
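In practice the timestamp is usually generated rather than typed by hand. The following sketch assumes a hypothetical helper, make_document, that produces an ISO 8601 timestamp in UTC; ISO 8601 strings of this form are expected to be accepted by a date field with the default format, but verify this against your own mapping.

```python
from datetime import datetime, timezone


def make_document(name, age, when=None):
    """Build a document with an ISO 8601 UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    return {
        "name": name,
        "age": age,
        "timestamp": when.isoformat(timespec="seconds"),
    }


# A fixed time is passed here so that the output is reproducible.
doc = make_document("John Doe", 29, datetime(2024, 12, 24, 10, 0, 0, tzinfo=timezone.utc))
print(doc["timestamp"])  # 2024-12-24T10:00:00+00:00
```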

4.2 Batch insertion of data

If you want to insert multiple documents at once, you can use the bulk API.

from elasticsearch.helpers import bulk

# Insert data in batches
actions = [
    {
        "_op_type": "index",  # Operation type, can be index, update, delete
        "_index": "my_index",
        "_source": {
            "name": "Alice",
            "age": 30,
            "timestamp": "2024-12-24T12:00:00"
        }
    },
    {
        "_op_type": "index",
        "_index": "my_index",
        "_source": {
            "name": "Bob",
            "age": 35,
            "timestamp": "2024-12-24T12:05:00"
        }
    }
]

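# Perform the batch insertion; bulk() returns the success count and a list of errors.
# raise_on_error=False collects failures in the list instead of raising an exception.
success, errors = bulk(es, actions, raise_on_error=False)
print(f"Successfully inserted {success} documents, {len(errors)} failed")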
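For larger data sets, building the whole actions list by hand does not scale. Since the bulk helper accepts any iterable of actions, a generator can produce them lazily. The sketch below uses a hypothetical generate_actions function and sample records; it only builds the action dictionaries and does not contact a server.

```python
def generate_actions(index_name, documents):
    """Yield one bulk 'index' action per document."""
    for doc in documents:
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_source": doc,
        }


people = [
    {"name": "Alice", "age": 30, "timestamp": "2024-12-24T12:00:00"},
    {"name": "Bob", "age": 35, "timestamp": "2024-12-24T12:05:00"},
]

# list() is used here only to inspect the result; bulk() can consume the generator directly.
actions = list(generate_actions("my_index", people))
print(len(actions))
```

With a connected client, the generator can be passed straight to the helper, for example bulk(es, generate_actions("my_index", people)).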

5. Query data

Elasticsearch provides powerful query functions, including basic matching queries, boolean queries, range queries, etc.

5.1 Basic Query

Through the search API, you can perform simple queries. For example, query all documents in the my_index index.

response = es.search(index="my_index", body={
    "query": {
        "match_all": {}  # Query all documents
    }
})
print(response)
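Printing the raw response shows a lot of metadata; the documents themselves sit under hits.hits, each with its original fields in _source. The sketch below works on a hand-written sample that mimics the shape of a 7.x search response (the values are illustrative, not real output), and extract_sources is a hypothetical helper.

```python
# Illustrative sample shaped like a search response; not real server output.
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "my_index", "_id": "a1", "_score": 1.0,
             "_source": {"name": "Alice", "age": 30}},
            {"_index": "my_index", "_id": "b2", "_score": 1.0,
             "_source": {"name": "Bob", "age": 35}},
        ],
    },
}


def extract_sources(response):
    """Return the _source of every hit in a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


total = sample_response["hits"]["total"]["value"]
documents = extract_sources(sample_response)
print(total, documents)
```

Note that a search returns only the first 10 hits by default; pass a larger size in the request body if you need more.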

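5.2 Match query

A match query performs full-text matching on an analyzed field. For a truly exact match, use a term query on a keyword field instead.

response = es.search(index="my_index", body={
    "query": {
        "match": {
            "name": "John Doe"  # Find documents whose name field matches "John Doe"
        }
    }
})
print(response)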

5.3 Boolean query

A boolean query combines multiple conditions into one complex query.

response = es.search(index="my_index", body={
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "Alice"}},
                {"range": {"age": {"gte": 25}}}
            ],
            "filter": [
                {"term": {"timestamp": "2024-12-24T12:00:00"}}
            ]
        }
    }
})
print(response)
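In a bool query, must clauses contribute to the relevance score, while filter clauses only include or exclude documents. When the conditions vary at runtime, a small builder keeps the nesting manageable. This is a sketch: build_bool_query is a hypothetical helper and covers only the must and filter clauses used above.

```python
def build_bool_query(must=None, filters=None):
    """Assemble a bool query body, omitting empty clause lists."""
    bool_clause = {}
    if must:
        bool_clause["must"] = list(must)
    if filters:
        bool_clause["filter"] = list(filters)
    return {"query": {"bool": bool_clause}}


body = build_bool_query(
    must=[{"match": {"name": "Alice"}}, {"range": {"age": {"gte": 25}}}],
    filters=[{"term": {"timestamp": "2024-12-24T12:00:00"}}],
)
print(body)
```

The resulting dictionary can be passed as the body argument of the search call shown above.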

5.4 Range query

With a range query, you can select documents whose field value falls within a given range, such as users aged 30 or older.

response = es.search(index="my_index", body={
    "query": {
        "range": {
            "age": {
                "gte": 30
            }
        }
    }
})
print(response)
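A range query accepts the bounds gt, gte, lt, and lte, and any combination of them may be given. The sketch below uses a hypothetical helper, build_range_query, that rejects misspelled bound names before the request is sent.

```python
def build_range_query(field, **bounds):
    """Build a range query body; bounds may be gt, gte, lt, lte."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError(f"Unsupported range bounds: {sorted(unknown)}")
    return {"query": {"range": {field: bounds}}}


# Users aged 30 or older, but younger than 40.
body = build_range_query("age", gte=30, lt=40)
print(body)
```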

6. Update and delete data

6.1 Update data

To update a document, use the update operation; only the specified fields are updated.

document_id = "1"  # Assume this is the ID of the document we want to update
update_doc = {
    "doc": {
        "age": 31
    }
}

response = es.update(index="my_index", id=document_id, body=update_doc)
print(response)
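A partial update fails if the document does not exist. The update API also supports a doc_as_upsert flag in the request body, which inserts the given fields as a new document in that case. The sketch below builds both variants with a hypothetical helper, build_partial_update; it does not send anything to the server.

```python
def build_partial_update(fields, upsert=False):
    """Build an update body; with upsert=True a missing document is created."""
    body = {"doc": dict(fields)}
    if upsert:
        body["doc_as_upsert"] = True
    return body


update_doc = build_partial_update({"age": 31})
upsert_doc = build_partial_update({"age": 31}, upsert=True)
print(update_doc)
print(upsert_doc)
```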

6.2 Delete data

Use the delete operation to delete a document.

document_id = "1"  # Assume this is the ID of the document we want to delete
response = es.delete(index="my_index", id=document_id)
print(response)

7. Aggregation Query

Elasticsearch supports powerful aggregation functions and can be used for data analysis, such as counting the average, maximum, minimum, etc. of a certain field.

7.1 Aggregation query example

response = es.search(index="my_index", body={
    "size": 0,  # Do not return documents, only the aggregation results
    "aggs": {
        "average_age": {
            "avg": {
                "field": "age"
            }
        },
        "age_range": {
            "range": {
                "field": "age",
                "ranges": [
                    {"to": 30},
                    {"from": 30, "to": 40},
                    {"from": 40}
                ]
            }
        }
    }
})

# Print the aggregation results
print(response['aggregations'])
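The aggregation results are keyed by the names chosen in the request: a metric such as avg returns a single value, and a range aggregation returns a list of buckets with document counts. The sketch below reads both from a hand-written sample shaped like the aggregations part of a response (the numbers are illustrative, not real output); summarize_age_aggs is a hypothetical helper.

```python
# Illustrative sample shaped like response['aggregations']; not real server output.
sample_aggregations = {
    "average_age": {"value": 31.5},
    "age_range": {
        "buckets": [
            {"key": "*-30.0", "to": 30.0, "doc_count": 1},
            {"key": "30.0-40.0", "from": 30.0, "to": 40.0, "doc_count": 2},
            {"key": "40.0-*", "from": 40.0, "doc_count": 0},
        ]
    },
}


def summarize_age_aggs(aggregations):
    """Return the average age and a {bucket key: document count} mapping."""
    average = aggregations["average_age"]["value"]
    counts = {b["key"]: b["doc_count"] for b in aggregations["age_range"]["buckets"]}
    return average, counts


average, counts = summarize_age_aggs(sample_aggregations)
print(average, counts)
```

When no documents match, the avg value is returned as None, so check for that before doing arithmetic on it.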

8. Delete the index

If an index is no longer needed, it can be deleted.

response = es.indices.delete(index="my_index", ignore=[400, 404])
print(response)

9. Advanced Applications

9.1 Index alias

In Elasticsearch, an alias is a name that points to one or more indexes. It can simplify queries and lets you switch to a new index without changing application code.

# Create an index alias
response = es.indices.put_alias(index="my_index", name="my_index_alias")
print(response)

# Query with the alias
response = es.search(index="my_index_alias", body={
    "query": {
        "match_all": {}
    }
})
print(response)
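The "upgrade without changing application code" case relies on moving the alias from the old index to the new one in a single request, so that the alias never points at nothing. The sketch below builds the request body for the aliases API; build_alias_swap is a hypothetical helper and my_index_v2 is a hypothetical name for the new index.

```python
def build_alias_swap(alias, old_index, new_index):
    """Build an aliases request that moves an alias in one atomic step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap_body = build_alias_swap("my_index_alias", "my_index", "my_index_v2")
print(swap_body)
```

With the 7.x client, such a body would be sent with es.indices.update_aliases(body=swap_body).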

9.2 Index template

Index templates are used to automatically apply settings for newly created indexes (such as mappings, number of shards, etc.).

template = {
    "index_patterns": ["log-*"],  # Match all indexes whose names start with log-
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "log_level": {"type": "keyword"}
        }
    }
}

response = es.indices.put_template(name="log_template", body=template)
print(response)
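To get a feel for which index names a pattern such as log-* would cover, the wildcard can be approximated locally. This is only an approximation: it uses Python's fnmatch module, which also understands ? and [...] in ways Elasticsearch index patterns may not, so treat it as an illustration of the * wildcard rather than a reimplementation of the server's matching. matches_template is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_template(index_name, index_patterns):
    """Return True if the index name matches any of the wildcard patterns."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


print(matches_template("log-2024.12.24", ["log-*"]))  # True
print(matches_template("logs-2024", ["log-*"]))       # False
```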

Summary

This article covered how to connect to and operate Elasticsearch using Python, including basic operations (such as creating indexes, adding data, and querying data) and some advanced features (such as aggregation queries, index templates, and aliases). Elasticsearch is a powerful tool for quickly processing and analyzing large-scale data. Hopefully this guide will be helpful in your actual development work!
