
Detailed explanation of Elasticsearch usage in Python, with examples

Installing the dependency

pip install elasticsearch
# Douban mirror
pip install -i https://pypi.douban.com/simple/ elasticsearch

Connecting to Elasticsearch

There are several ways to connect to Elasticsearch:

from elasticsearch import Elasticsearch

# es = Elasticsearch()                    # Default connection to the local elasticsearch
# es = Elasticsearch(['127.0.0.1:9200'])  # Connect to the local 9200 port
es = Elasticsearch(
    ["192.168.1.10", "192.168.1.11", "192.168.1.12"],  # Connect to a cluster; pass the IP addresses of the nodes as a list
    sniff_on_start=True,            # Sniff the cluster before the first request
    sniff_on_connection_fail=True,  # Refresh the node list when a node is unresponsive
    sniff_timeout=60                # Set the sniffing timeout
)

Configuring ignored response status codes

es = Elasticsearch(['127.0.0.1:9200'], ignore=400)              # Ignore the returned 400 status code
es = Elasticsearch(['127.0.0.1:9200'], ignore=[400, 405, 502])  # Ignore multiple status codes, passed as a list

Example

from elasticsearch import Elasticsearch

es = Elasticsearch()    # Default connection to the local elasticsearch
print(es.index(index='py2', doc_type='doc', id=1, body={'name': "Open", "age": 18}))
print(es.get(index='py2', doc_type='doc', id=1))

The first print creates the py2 index and inserts a document; the second print queries that document.


Elasticsearch operations in Python

Elasticsearch operations in Python cover the following areas:

  • Result filtering: filtering the returned results, mainly to trim the response content.
  • ElasticSearch (es for short): operates directly on the Elasticsearch object and handles simple index information. All of the areas below are accessed through the es object.
  • Indices: detailed index operations, such as creating custom mappings.
  • Cluster: cluster-related operations.
  • Nodes: node-related operations.
  • Cat API: an alternative query interface; most APIs return JSON, while cat provides concise, tabular results.
  • Snapshot: snapshot-related operations. A snapshot is a backup taken from a running Elasticsearch cluster. We can snapshot a single index or an entire cluster and store it in a repository on a shared file system, and plugins add support for remote repositories on S3, HDFS, Azure, Google Cloud Storage, and more.
  • Task Management API: this API is new and should still be considered a beta feature; it may change in ways that are not backward compatible.

Result filtering

print(es.search(index='py2', filter_path=['hits.total', 'hits.hits._source']))   # doc_type can be omitted
print(es.search(index='w2', doc_type='doc'))                                      # doc_type can be specified
print(es.search(index='w2', doc_type='doc', filter_path=['hits.total']))

The filter_path parameter is used to reduce the response returned by Elasticsearch, for example returning only the hits.total and hits.hits._source content.
In addition, the filter_path parameter also supports the * wildcard to match field names, any field, or parts of a field:

print(es.search(index='py2', filter_path=['hits.*']))
print(es.search(index='py2', filter_path=['hits.hits._*']))
print(es.search(index='py2', filter_path=['hits.total*']))                   # Return only the total of the response data
print(es.search(index='w2', doc_type='doc', filter_path=['hits.hits._*']))   # The optional doc_type can be added

ElasticSearch (the es object)

es.index, adds or updates a document in the specified index. If the index does not exist, it is created first and then the add or update is performed.

# print(es.index(index='w2', doc_type='doc', id='4', body={"name": "Cocoa", "age": 18}))  # Normal
# print(es.index(index='w2', doc_type='doc', id=5, body={"name": "Kakashi", "age": 22}))  # Normal
# print(es.index(index='w2', id=6, body={"name": "Naruto", "age": 22}))  # Raises TypeError: index() missing 1 required positional argument: 'doc_type'
print(es.index(index='w2', doc_type='doc', body={"name": "Naruto", "age": 22}))  # The id can be omitted; a default id is generated

es.get, queries the specified document in the index.

print(es.get(index='w2', doc_type='doc', id=5))   # Normal
# print(es.get(index='w2', doc_type='doc'))       # TypeError: get() missing 1 required positional argument: 'id'
# print(es.get(index='w2', id=5))                 # TypeError: get() missing 1 required positional argument: 'doc_type'

es.search, executes a search query and returns the hits matching it. This is the most frequently used method and can be combined with complex query conditions. Its main parameters are:

  • index: a comma-separated list of index names to search; use _all or an empty string to perform the operation on all indices.
  • doc_type: a comma-separated list of document types to search; leave it empty to perform the operation on all types.
  • body: the search definition, using the Query DSL (Domain Specific Language query expressions).
  • _source: true or false to control whether the _source field is returned, or a list of fields to return.
  • _source_exclude: a list of fields to exclude from the returned _source field.
  • _source_include: a list of fields to extract from the _source field and return; similar to _source.
print(es.search(index='py3', doc_type='doc', body={"query": {"match": {"age": 20}}}))                           # General query
print(es.search(index='py3', doc_type='doc', body={"query": {"match": {"age": 19}}}, _source=['name', 'age']))  # Result field filtering
print(es.search(index='py3', doc_type='doc', body={"query": {"match": {"age": 19}}}, _source_exclude=['age']))
print(es.search(index='py3', doc_type='doc', body={"query": {"match": {"age": 19}}}, _source_include=['age']))

es.get_source, gets the source of a document by index, type and id; in effect it directly returns the desired dictionary.

print(es.get_source(index='py3', doc_type='doc', id='1'))  # {'name': 'Wang Wu', 'age': 19}

es.count, executes a query and returns the number of matches. For example, count the documents whose age is 18.

body = {
    "query": {
        "match": {
            "age": 18
        }
    }
}
print(es.count(index='py2', doc_type='doc', body=body))           # {'count': 1, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
print(es.count(index='py2', doc_type='doc', body=body)['count'])  # 1
print(es.count(index='w2'))                  # {'count': 6, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
print(es.count(index='w2', doc_type='doc'))  # {'count': 6, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}

es.create, creates the index (if it does not exist) and adds a new document. It can only create new documents; executing it again with the same id raises an error.

print(es.create(index='py3', doc_type='doc', id='1', body={"name": 'Wang Wu', "age": 20}))
print(es.get(index='py3', doc_type='doc', id='3'))

Internally, es.create calls es.index, so it is equivalent to:

print(es.index(index='py3', doc_type='doc', id='4', body={"name": "pock", "age": 21}))

But personally, I find es.index easier to use!

es.delete, deletes the specified document, for example the document whose id is 4. It cannot delete an index; to delete an index you still need es.indices.delete (see below).

print(es.delete(index='py3', doc_type='doc', id='4'))

es.delete_by_query, deletes all documents that match the query. Its parameters:

  • index: a comma-separated list of index names to search; use _all or an empty string to perform the operation on all indices.
  • doc_type: a comma-separated list of document types to search; leave it empty to perform the operation on all types.
  • body: the search definition, using the Query DSL.
print(es.delete_by_query(index='py3', doc_type='doc', body={"query": {"match":{"age": 20}}}))

es.exists, checks whether the specified document exists in Elasticsearch and returns a boolean value.

print(es.exists(index='py3', doc_type='doc', id='1'))

es.info, gets basic information about the current cluster.

print(es.info())

es.ping, returns True if the cluster is up, otherwise False.

print(es.ping())

Indices (the es.indices object)

es.indices.create, creates an index in Elasticsearch; this is the most commonly used index operation. For example, create a mappings definition in strict mode with four fields, where the title field uses the ik_max_word analyzer, and apply it to the py4 index. This is also the usual way to create a custom index.

body = {
    "mappings": {
        "doc": {
            "dynamic": "strict",
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "url": {
                    "type": "text"
                },
                "action_type": {
                    "type": "text"
                },
                "content": {
                    "type": "text"
                }
            }
        }
    }
}
es.indices.create(index='py4', body=body)

es.indices.analyze, returns the analysis (word segmentation) result.

print(es.indices.analyze(body={'analyzer': "ik_max_word", "text": "Peter and Julie were elected 'Model Couple of the Year': Brad Pitt and Angelina Jolie"}))

es.indices.delete, deletes an index from Elasticsearch.

print(es.indices.delete(index='py4'))
print(es.indices.delete(index='w3'))    # {'acknowledged': True}

es.indices.put_alias, creates an alias for one or more indices. The alias can then be used to query several indices at once. Its parameters:

  • index: a comma-separated list of index names the alias should point to (wildcards supported); use _all to perform the operation on all indices.
  • name: the name of the alias to create or update.
  • body: settings for the alias, such as routing or filters.
print(es.indices.put_alias(index='py4', name='py4_alias'))            # Create an alias for a single index
print(es.indices.put_alias(index=['py3', 'py2'], name='py23_alias'))  # Create the same alias for multiple indexes, for joint search

es.indices.delete_alias, deletes one or more aliases.

print(es.indices.delete_alias(index='py4', name='py4_alias'))            # Delete the alias of a single index
print(es.indices.delete_alias(index=['py3', 'py2'], name='py23_alias'))  # Delete the alias shared by multiple indexes

es.indices.get_mapping, retrieves the mapping definition of an index or index/type.

print(es.indices.get_mapping(index='py4'))

es.indices.get_settings, retrieves the settings of one or more (or all) indices.

print(es.indices.get_settings(index='py4'))

es.indices.get, retrieves information about one or more indices.

print(es.indices.get(index='py2'))              # Query whether the specified index exists
print(es.indices.get(index=['py2', 'py3']))

es.indices.get_alias, retrieves one or more aliases.

print(es.indices.get_alias(index='py2'))
print(es.indices.get_alias(index=['py2', 'py3']))

es.indices.get_field_mapping, retrieves mapping information for specific fields.

print(es.indices.get_field_mapping(fields='url', index='py4', doc_type='doc'))
print(es.indices.get_field_mapping(fields=['url', 'title'], index='py4', doc_type='doc'))

es.indices.delete_alias, deletes a specific alias.
es.indices.exists, returns a boolean indicating whether the given index exists.
es.indices.exists_type, checks whether a type exists in one or more indices.
es.indices.refresh, explicitly refreshes one or more indices.
es.indices.get_field_mapping, retrieves the mapping of a specific field.
es.indices.get_template, retrieves an index template by name.
es.indices.open, opens a closed index to make it available for search.
es.indices.close, closes an index to remove its overhead from the cluster; a closed index is blocked for read/write operations.
es.indices.clear_cache, clears all caches or specific caches associated with one or more indices.
es.indices.put_alias, creates an alias for a specific index or indices.
es.indices.get_upgrade, monitors how well one or more indices are upgraded.
es.indices.put_mapping, registers a specific mapping definition for a specific type.
es.indices.put_settings, changes index-level settings in real time.
es.indices.put_template, creates an index template that will automatically be applied to newly created indices.
es.indices.rollover, the rollover API switches an alias to a new index when the existing index is considered too large or too old. It accepts a single alias name and a list of conditions; the alias must point to a single index only. If the index satisfies the specified conditions, a new index is created and the alias is switched to point to it (see the sketch after this list).
es.indices.segments, provides low-level information about the Lucene segments of an index (shard level).
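As an illustration, here is a minimal sketch that exercises a few of the calls listed above (exists, refresh, put_settings and rollover). The index name, alias name, settings values and rollover conditions are assumptions chosen for the example, not values from the article.

from elasticsearch import Elasticsearch

es = Elasticsearch()    # assumes a local node, as in the earlier examples

# Only touch the index if it actually exists (the 'py4' index is assumed)
if es.indices.exists(index='py4'):
    # Make recently indexed documents visible to search
    print(es.indices.refresh(index='py4'))
    # Change an index-level setting in real time (the replica count is only an example)
    print(es.indices.put_settings(index='py4', body={"index": {"number_of_replicas": 1}}))

# Roll an alias over to a new index once it is too old or too large.
# The alias must point to exactly one index; the conditions below are examples.
# print(es.indices.rollover(alias='py4_alias',
#                           body={"conditions": {"max_age": "7d", "max_docs": 10000}}))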

Cluster (cluster related)

  • es.cluster.get_settings, gets the cluster settings.
print(es.cluster.get_settings())

es.cluster.health, gets a very simple status report on the health of the cluster.

print(es.cluster.health())

es.cluster.state, gets the comprehensive state information of the entire cluster.

print(es.cluster.state())

es.cluster.stats, returns statistics about the current nodes of the cluster.

print(es.cluster.stats())

Node (node related)

es.nodes.info, returns information about the nodes in the cluster.

print(es.nodes.info())                             # Return all nodes
print(es.nodes.info(node_id='node1'))              # Specify a single node
print(es.nodes.info(node_id=['node1', 'node2']))   # Specify multiple nodes as a list

es.nodes.stats, gets statistics for the nodes in the cluster.

print(es.nodes.stats())
print(es.nodes.stats(node_id='node1'))
print(es.nodes.stats(node_id=['node1', 'node2']))

es.nodes.hot_threads, gets the hot threads of the specified node(s).

print(es.nodes.hot_threads())
print(es.nodes.hot_threads(node_id='node1'))
print(es.nodes.hot_threads(node_id=['node1', 'node2']))

es.nodes.usage, gets feature usage information for the nodes in the cluster.

print(es.nodes.usage())
print(es.nodes.usage(node_id='node1'))
print(es.nodes.usage(node_id=['node1', 'node2']))

Cat (an alternative query method)

  • es.cat.aliases, returns alias information. Parameters:
    • name: a comma-separated list of aliases to return.
    • format: a short version of the Accept header, such as json or yaml.
print(es.cat.aliases(name='py23_alias'))
print(es.cat.aliases(name='py23_alias', format='json'))

es.cat.allocation, returns shard allocation across the nodes.

print(es.cat.allocation())
print(es.cat.allocation(node_id=['node1']))
print(es.cat.allocation(node_id=['node1', 'node2'], format='json'))

es.cat.count, provides quick access to the document count of the entire cluster or of an individual index.

print(es.cat.count())
print(es.cat.count(index='py3'))
print(es.cat.count(index='py3', format='json'))

es.cat.fielddata, shows the fielddata currently loaded on each node. Some data is kept in memory for query efficiency, and fielddata controls which data is held there; this call reports which data is in memory, its size, and other details.

print(es.cat.fielddata())
print(es.cat.fielddata(format='json', bytes='b'))

bytes: the unit used to display byte values; valid options are 'b', 'k', 'kb', 'm', 'mb', 'g', 'gb', 't', 'tb', 'p', 'pb'.
format: a short version of the Accept header, such as json or yaml.

es.cat.health, a concise summary of the cluster health, filtered from the cluster health API.

print(es.cat.health())
print(es.cat.health(format='json'))

es.cat.help, returns the help information for the cat APIs.

print(es.cat.help())

es.cat.indices, returns index information; you can also use it to find out how many indices there are in the cluster.

print(es.cat.indices())
print(es.cat.indices(index='py3'))
print(es.cat.indices(index='py3', format='json'))
print(len(es.cat.indices(format='json')))  # Count how many indices are in the cluster

es.cat.master, returns the master node's IP, bound IP and node name.

print(es.cat.master())
print(es.cat.master(format='json'))

es.cat.nodeattrs, returns the custom attributes of the nodes.

print(es.cat.nodeattrs())
print(es.cat.nodeattrs(format='json'))

es.cat.nodes, returns the node topology. This information is often useful when looking at the whole cluster, especially a large one: how many eligible nodes do I have?

print(es.cat.nodes())
print(es.cat.nodes(format='json'))

es.cat.plugins, returns the plugin information of each node.

print(es.cat.plugins())
print(es.cat.plugins(format='json'))

es.cat.segments, returns Lucene segment information for each index.

print(es.cat.segments())
print(es.cat.segments(index='py3'))
print(es.cat.segments(index='py3', format='json'))

es.cat.shards, returns information about which node holds which shards.

print(es.cat.shards())
print(es.cat.shards(index='py3'))
print(es.cat.shards(index='py3', format='json'))

es.cat.thread_pool, gets thread pool information.

print(es.cat.thread_pool())

Snapshot (snapshot related)

  • es.snapshot.create, creates a snapshot in a repository (see the sketch after this list). Parameters:
    • repository: the name of the repository.
    • snapshot: the snapshot name.
    • body: the snapshot definition.
  • es.snapshot.delete, deletes a snapshot from a repository.
  • es.snapshot.create_repository, registers a shared file system repository.
  • es.snapshot.delete_repository, deletes a shared file system repository.
  • es.snapshot.get, retrieves information about snapshots.
  • es.snapshot.get_repository, returns information about registered repositories.
  • es.snapshot.restore, restores a snapshot.
  • es.snapshot.status, returns information about all currently running snapshots; by specifying a repository name you can restrict the results to a specific repository.
  • es.snapshot.verify_repository, returns the list of nodes that successfully verified the repository; if verification fails, an error message is returned.
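As a rough illustration of how these calls fit together, here is a minimal sketch that registers a file system repository, takes a snapshot of one index and inspects it. The repository path, repository name, snapshot name and index name are assumptions chosen for the example, and the path must also be whitelisted via path.repo in elasticsearch.yml.

from elasticsearch import Elasticsearch

es = Elasticsearch()    # assumes a local node, as in the earlier examples

# Register a shared file system repository (the location is an example value)
print(es.snapshot.create_repository(
    repository='my_backup',
    body={"type": "fs", "settings": {"location": "/mnt/es_backups"}}
))

# Take a snapshot of the py3 index and wait for it to finish
print(es.snapshot.create(
    repository='my_backup',
    snapshot='snapshot_1',
    body={"indices": "py3"},
    wait_for_completion=True
))

# Inspect the snapshot
print(es.snapshot.get(repository='my_backup', snapshot='snapshot_1'))

# Restoring fails if the py3 index is still open; close or delete it first
# print(es.snapshot.restore(repository='my_backup', snapshot='snapshot_1'))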

Task (task-related)

  • es.tasks.get, retrieves information about a specific task.
  • es.tasks.cancel, cancels a task.
  • es.tasks.list, lists tasks (see the sketch below).
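A minimal sketch of these task calls, with the caveat from the overview that the Task Management API is still considered beta; the task id shown is a placeholder, not a real value.

from elasticsearch import Elasticsearch

es = Elasticsearch()    # assumes a local node, as in the earlier examples

# List all tasks currently running in the cluster
print(es.tasks.list())

# Narrow the listing down, e.g. to delete-by-query tasks, with details per task
print(es.tasks.list(actions='*byquery', detailed=True))

# Retrieve or cancel a specific task; 'node_id:12345' is a placeholder task id
# print(es.tasks.get(task_id='node_id:12345'))
# print(es.tasks.cancel(task_id='node_id:12345'))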

This concludes the article on using Elasticsearch with Python. For more on the topic, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!