Introduction
Elasticsearch is a powerful distributed search engine, widely used for log analysis, real-time search and big data analysis. It supports fast text retrieval, large-scale data storage and real-time data analysis. Elastic provides an official Python client library that makes it easy to interact with Elasticsearch from Python.
This article will introduce in detail how to connect and operate Elasticsearch using Python, including installing clients, basic operations (such as creating indexes, adding data, querying data, etc.), and advanced applications (such as aggregation queries, index mapping, etc.).
1. Environment preparation
1.1 Install Elasticsearch
Before you start, make sure that Elasticsearch is installed and running. If it is not installed yet, you can install it in one of the following ways:
Install Elasticsearch using Docker:
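docker pull elasticsearch:7.10.0
docker run --name elasticsearch -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.10.0
Elasticsearch will then be available on localhost:9200. The discovery.type=single-node setting lets a single development node start without forming a cluster.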
Use the official installation package:
You can also download and install it from the official Elasticsearch website.
1.2 Installing the Python Elasticsearch client
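Install elasticsearch, the official Python client library for interacting with Elasticsearch. The examples in this article use the 7.x client API (for example the ignore and body parameters), so it is safest to pin the client to the 7.x series to match the 7.10.0 server:
pip install "elasticsearch>=7,<8"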
2. Connect to Elasticsearch
2.1 Connect to the local Elasticsearch service
from elasticsearch import Elasticsearch

# Connect to the local Elasticsearch instance
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Check whether the connection succeeded
if es.ping():
    print("The connection was successful!")
else:
    print("Connection failed!")
2.2 Connecting to the remote Elasticsearch service
If your Elasticsearch service is on a remote server, you can modify the connection configuration:
# Replace 'Remote IP Address' with the address of your server
es = Elasticsearch([{'host': 'Remote IP Address', 'port': 9200}])

# Check the connection
if es.ping():
    print("The connection was successful!")
else:
    print("Connection failed!")
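A freshly started server can take a few seconds before it answers, so a single ping may fail even though nothing is wrong. The following is a minimal sketch of a retry loop around the connection check. The helper name wait_until_ready and its retries and delay parameters are hypothetical and not part of the client library; it only assumes that the function passed in returns True or False, as es.ping() does.

```python
import time
from typing import Callable


def wait_until_ready(ping: Callable[[], bool], retries: int = 5, delay: float = 1.0) -> bool:
    """Call ping() up to `retries` times, sleeping `delay` seconds between failed attempts.

    Hypothetical helper: returns True as soon as ping() succeeds, False otherwise.
    """
    for attempt in range(retries):
        if ping():
            return True
        # Do not sleep after the last failed attempt
        if attempt < retries - 1:
            time.sleep(delay)
    return False


# Usage with the client created above (requires a running server):
# if wait_until_ready(es.ping, retries=10, delay=2.0):
#     print("Elasticsearch is ready")
```

Passing the ping function in as an argument keeps the helper independent of the client, so it can be exercised without a server.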
3. Create indexes and mappings
In Elasticsearch, all data is stored in an index, and each index has its own structure. A mapping defines the fields of an index and their types.
3.1 Creating an index
# Create an index
index_name = "my_index"
# ignore=400 suppresses the error raised when the index already exists
response = es.indices.create(index=index_name, ignore=400)
print(response)
3.2 Create an index with a mapping
If you want to define the field type when creating an index, you can specify the mapping. Here is an example containing a mapping:
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"},
            "timestamp": {"type": "date"}
        }
    }
}

# Create the index with the mapping; ignore=400 suppresses "index already exists"
response = es.indices.create(index="my_index_with_mapping", body=mapping, ignore=400)
print(response)
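When several indexes need similar mappings, the nested dictionary can be generated from a flat description of the fields. This is a small sketch under that assumption; build_mapping is a hypothetical helper name, and it only covers fields that need nothing beyond a type.

```python
def build_mapping(fields: dict) -> dict:
    """Turn {"field_name": "field_type"} into an index-creation body.

    Hypothetical helper: handles only the plain {"type": ...} case.
    """
    properties = {name: {"type": field_type} for name, field_type in fields.items()}
    return {"mappings": {"properties": properties}}


body = build_mapping({"name": "text", "age": "integer", "timestamp": "date"})
print(body)

# Usage with the client (requires a running server):
# es.indices.create(index="my_index_with_mapping", body=body, ignore=400)
```

The result is the same structure as the hand-written mapping in this section.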
4. Add data to Elasticsearch
Data is added to Elasticsearch with the index operation; each record is inserted as a document.
4.1 Single data insertion
document = {
    "name": "John Doe",
    "age": 29,
    "timestamp": "2024-12-24T10:00:00"
}

# Insert the document into the index
response = es.index(index="my_index", document=document)
print(response)
4.2 Batch insertion of data
If you want to insert multiple documents in one request, you can use the bulk API.
from elasticsearch.helpers import bulk

# Documents to insert in one batch
actions = [
    {
        "_op_type": "index",  # Operation type: index, create, update or delete
        "_index": "my_index",
        "_source": {
            "name": "Alice",
            "age": 30,
            "timestamp": "2024-12-24T12:00:00"
        }
    },
    {
        "_op_type": "index",
        "_index": "my_index",
        "_source": {
            "name": "Bob",
            "age": 35,
            "timestamp": "2024-12-24T12:05:00"
        }
    }
]

# Perform the batch insertion; bulk() returns the success count and a list of errors
success, errors = bulk(es, actions)
print(f"Successfully inserted {success} documents, {len(errors)} failed")
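For larger batches, writing every action by hand is impractical. Since the bulk helper accepts any iterable of actions, they can be produced lazily by a generator. The sketch below assumes the documents are plain dictionaries; generate_actions is a hypothetical name, and the action keys are the same ones used in the example above.

```python
from typing import Iterable, Iterator


def generate_actions(index_name: str, docs: Iterable[dict]) -> Iterator[dict]:
    """Yield one bulk "index" action per document (hypothetical helper)."""
    for doc in docs:
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_source": doc,
        }


docs = [
    {"name": "Alice", "age": 30, "timestamp": "2024-12-24T12:00:00"},
    {"name": "Bob", "age": 35, "timestamp": "2024-12-24T12:05:00"},
]
actions = list(generate_actions("my_index", docs))
print(len(actions))

# Usage with the client (requires a running server):
# from elasticsearch.helpers import bulk
# success, errors = bulk(es, generate_actions("my_index", docs))
```

Passing the generator directly to bulk avoids building the whole action list in memory.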
5. Query data
Elasticsearch provides powerful query functions, including basic matching queries, boolean queries, range queries, etc.
5.1 Basic Query
With the search API, you can perform simple queries. For example, to retrieve all documents in the my_index index:
response = es.search(index="my_index", body={
    "query": {
        "match_all": {}  # Match all documents
    }
})
print(response)
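The search response is a nested dictionary, and the matching documents sit under hits.hits, each with its original content in _source. The sketch below pulls out those parts. The helper names are hypothetical, and sample_response is hand-written to mirror the shape of an Elasticsearch 7.x response; it is not output captured from a real cluster.

```python
def extract_sources(response: dict) -> list:
    """Return the _source of every hit in a search response (hypothetical helper)."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def total_hits(response: dict) -> int:
    """Return the total number of matches; in 7.x, hits.total is an object with a value."""
    return response["hits"]["total"]["value"]


# Hand-written sample in the shape of a 7.x search response (illustrative only)
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "my_index", "_id": "1", "_score": 1.0,
             "_source": {"name": "Alice", "age": 30}},
            {"_index": "my_index", "_id": "2", "_score": 1.0,
             "_source": {"name": "Bob", "age": 35}},
        ],
    }
}

print(total_hits(sample_response))
for source in extract_sources(sample_response):
    print(source["name"], source["age"])
```

By default a search returns only the first 10 hits, so the number of extracted documents can be smaller than the total.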
5.2 Match query
response = es.search(index="my_index", body={
    "query": {
        "match": {
            # Full-text match on the name field; on a text field the value
            # is analyzed, so documents containing "john" or "doe" match
            "name": "John Doe"
        }
    }
})
print(response)
5.3 Boolean query
A boolean query combines multiple conditions into one query.
response = es.search(index="my_index", body={
    "query": {
        "bool": {
            # must clauses have to match and contribute to the relevance score
            "must": [
                {"match": {"name": "Alice"}},
                {"range": {"age": {"gte": 25}}}
            ],
            # filter clauses have to match but do not affect the score
            "filter": [
                {"term": {"timestamp": "2024-12-24T12:00:00"}}
            ]
        }
    }
})
print(response)
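Boolean queries are often assembled at runtime, for example when some conditions are optional. As a sketch of that idea, the hypothetical helper build_bool_query below creates the same body structure from lists of clauses and leaves out any section that is empty.

```python
def build_bool_query(must=None, filters=None) -> dict:
    """Build a search body with a bool query (hypothetical helper).

    must:    clauses that have to match and contribute to the score
    filters: clauses that have to match but are not scored
    """
    bool_clause = {}
    if must:
        bool_clause["must"] = list(must)
    if filters:
        bool_clause["filter"] = list(filters)
    return {"query": {"bool": bool_clause}}


body = build_bool_query(
    must=[{"match": {"name": "Alice"}}, {"range": {"age": {"gte": 25}}}],
    filters=[{"term": {"timestamp": "2024-12-24T12:00:00"}}],
)
print(body)

# Usage with the client (requires a running server):
# response = es.search(index="my_index", body=body)
```

Conditions that only narrow the result set are usually better placed in the filter section, since they are not scored.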
5.4 Range query
With a range query, you can select documents whose field value falls within a given range, such as users aged 30 or older.
response = es.search(index="my_index", body={
    "query": {
        "range": {
            "age": {
                "gte": 30  # Greater than or equal to 30
            }
        }
    }
})
print(response)
6. Update and delete data
6.1 Update data
To update a document, you can use the update operation, which changes only the specified fields.
document_id = "1"  # Assumed ID of the document to update
update_doc = {
    "doc": {
        "age": 31  # Only this field is changed
    }
}
response = es.update(index="my_index", id=document_id, body=update_doc)
print(response)
6.2 Delete data
Use the delete operation to remove a document.
document_id = "1"  # Assumed ID of the document to delete
response = es.delete(index="my_index", id=document_id)
print(response)
7. Aggregation Query
Elasticsearch supports aggregations for data analysis, such as computing the average, maximum or minimum of a field, or counting documents per value range.
7.1 Aggregation query example
response = es.search(index="my_index", body={
    "size": 0,  # Do not return documents, only the aggregation results
    "aggs": {
        "average_age": {
            "avg": {
                "field": "age"
            }
        },
        "age_range": {
            "range": {
                "field": "age",
                # "from" is inclusive, "to" is exclusive
                "ranges": [
                    {"to": 30},
                    {"from": 30, "to": 40},
                    {"from": 40}
                ]
            }
        }
    }
})

# Print the aggregation results
print(response['aggregations'])
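The aggregation results are keyed by the names chosen in the request: an avg aggregation returns a single value, and a range aggregation returns a list of buckets, each with a key and a doc_count. The sketch below reads both. summarize_aggregations is a hypothetical helper, and sample_aggregations is hand-written illustrative data in the shape of a real response, not measured output.

```python
def summarize_aggregations(aggregations: dict):
    """Return (average age, {bucket key: document count}) from the results above.

    Hypothetical helper; expects the average_age and age_range aggregations.
    """
    average = aggregations["average_age"]["value"]
    counts = {
        bucket["key"]: bucket["doc_count"]
        for bucket in aggregations["age_range"]["buckets"]
    }
    return average, counts


# Hand-written sample in the shape of response['aggregations'] (illustrative only)
sample_aggregations = {
    "average_age": {"value": 32.5},
    "age_range": {
        "buckets": [
            {"key": "*-30.0", "to": 30.0, "doc_count": 0},
            {"key": "30.0-40.0", "from": 30.0, "to": 40.0, "doc_count": 2},
            {"key": "40.0-*", "from": 40.0, "doc_count": 0},
        ]
    },
}

average, counts = summarize_aggregations(sample_aggregations)
print(average)
print(counts)
```

With a real cluster, the same function can be called as summarize_aggregations(response['aggregations']).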
8. Delete the index
If an index is no longer needed, it can be deleted.
# ignore=[400, 404] suppresses errors such as "index not found"
response = es.indices.delete(index="my_index", ignore=[400, 404])
print(response)
9. Advanced Applications
9.1 Index alias
In Elasticsearch, an alias is a name that points to one or more indexes. Aliases simplify queries and let you switch to a new index without changing application code.
# Create an index alias
response = es.indices.put_alias(index="my_index", name="my_index_alias")
print(response)

# Query through the alias
response = es.search(index="my_index_alias", body={
    "query": {
        "match_all": {}
    }
})
print(response)
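To move an alias from an old index to a new one, the remove and add steps can be sent together in one aliases request, which Elasticsearch applies atomically. The sketch below only builds the request body. build_alias_swap and the index names my_index_v1 and my_index_v2 are hypothetical and used for illustration.

```python
def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build an aliases request body that moves `alias` from old_index to new_index.

    Hypothetical helper; both actions are applied together by Elasticsearch.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = build_alias_swap("my_index_alias", "my_index_v1", "my_index_v2")
print(body)

# Usage with the client (requires a running server and both indexes):
# response = es.indices.update_aliases(body=body)
```

Applications that query through my_index_alias keep working during the switch, because the alias never points to nothing.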
9.2 Index template
Index templates are used to automatically apply settings for newly created indexes (such as mappings, number of shards, etc.).
template = {
    "index_patterns": ["log-*"],  # Match all indexes whose names start with log-
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "log_level": {"type": "keyword"}
        }
    }
}

# Register the template; it applies to matching indexes created afterwards
response = es.indices.put_template(name="log_template", body=template)
print(response)
Summary
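To get a feel for which index names a pattern such as log-* would cover, the wildcard matching can be approximated locally. This sketch uses Python's fnmatchcase as a stand-in. It is only an approximation of how Elasticsearch matches index patterns (fnmatch also understands ? and [...], for example), and matching_indexes is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def matching_indexes(index_patterns, index_names):
    """Return the names that match at least one wildcard pattern.

    Hypothetical helper; a local approximation, not Elasticsearch's own matcher.
    """
    return [
        name
        for name in index_names
        if any(fnmatchcase(name, pattern) for pattern in index_patterns)
    ]


names = ["log-2024.12.24", "log-app", "logs-app", "my_index"]
print(matching_indexes(["log-*"], names))  # ['log-2024.12.24', 'log-app']
```

Note that logs-app is not covered, because the pattern requires the name to start with exactly log-.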
This article has shown how to connect to and operate Elasticsearch using Python, including basic operations (such as creating indexes, adding data and querying data) and some advanced features (such as aggregation queries, index templates and aliases). Elasticsearch is a powerful tool that can help you process and analyze large-scale data quickly. I hope this guide is helpful in your development work!