What is Mapping
As before, let's start with the basic concept: what is a mapping? I gave a brief example in the previous section; do you still remember it? Mapping is an important concept in ES. If you don't understand it, the material that follows will be hard to learn.
In a relational database such as MySQL, creating a table means declaring its fields and describing them: field names, types, and so on. The same is true in ES. When we discussed the index earlier, I mentioned that you also need to declare the fields of a document, including properties and rules such as the field type, whether the field is analyzed (tokenized), and whether it is indexed. The mapping, today's topic, provides this function.
ES provides two ways to declare the fields of the documents in a type: dynamic mapping and static mapping.
Here is a detailed explanation.
Mapping Properties
A mapping property can be understood as an attribute that each field in a type has. Here are some commonly used ones:
type: The field type. Commonly used types include text, integer, and so on.
index: Default true. Whether an inverted index is built for the field. When false, the field cannot be searched, but it still supports aggregation analysis.
enabled: Default true. Whether the field gets an inverted index and doc values. When false, the field can be neither searched nor aggregated, but this saves memory space. This setting applies to object fields.
store: Default false. Whether the field value is stored separately. If the fields you need to retrieve are only a small part of the document, storing them can reduce I/O. This storage is independent of the _source storage.
doc_values: Default true. Speeds up sorting, aggregation, and script access on the field, at the cost of disk space.
fields: The multi-field feature. It gives a field several sub-fields of different types, so the same value can be indexed in several different ways.
norms: Whether scoring is supported. If the field is only used for filtering and aggregation analysis and never needs to be scored, set it to false. The default is true when type is text and false when type is keyword.
analyzer: Specifies the analyzer used at index time and at search time. If search_analyzer is also specified, search_analyzer takes precedence when searching.
search_analyzer: Specifies the analyzer used at search time. It has the highest priority during search.
fielddata: Default false. Enables sorting, aggregation, and script access on text fields. It is an expensive operation and should be avoided as much as possible.
index_options: Controls what information is recorded in the inverted index (postings list) for use in search and highlighting:
- docs: index only the document number;
- freqs: index the document number and term frequency;
- positions: index the document number, term frequency, and term position (ordinal);
- offsets: index the document number, term frequency, term position (ordinal), and character offsets (start and end positions).
By default, analyzed string fields use positions and all other fields use docs.
Note that index_options is a setting specific to Elasticsearch. Proximity and phrase queries need at least positions, while offsets additionally allow highlighting with the postings highlighter. The more information is recorded, the more storage space is used.
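As a rough sketch of how these properties fit together, the Python snippet below builds a static mapping as a plain dictionary and serializes it to the JSON body one would send when creating an index. The field names (title, status, internal_note) are made up for illustration, and the type name my_type is borrowed from the token_count example later in this article; none of this comes from a real index.

```python
import json

# Hypothetical mapping; every field name here is an assumption for illustration.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                # Full-text field with an extra exact-match sub-field.
                "title": {
                    "type": "text",
                    "analyzer": "standard",
                    "search_analyzer": "standard",
                    "index_options": "offsets",
                    "fields": {"raw": {"type": "keyword"}},
                },
                # Exact-value field used for filtering and aggregation only.
                "status": {
                    "type": "keyword",
                    "norms": False,
                    "doc_values": True,
                },
                # Kept in the document but not searchable.
                "internal_note": {
                    "type": "text",
                    "index": False,
                },
            }
        }
    }
}

# Python's False becomes JSON false when serialized.
body = json.dumps(mapping, indent=2)
print(body)
```

The title.raw sub-field is an instance of the fields property: the same value is indexed once as analyzed text and once as an exact keyword.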
Detailed explanation of field type (type)
Main core data types:
String type
- text
Analyzed (tokenized). Fields of type text can be found through full-text search, so any field that needs full-text search should use the text type. The content of a text field is analyzed: before the inverted index is built, the analyzer splits the string into terms. Text fields are not used for sorting and are rarely used for aggregation (the significant terms aggregation is an exception).
- keyword
Not analyzed. Fields of type keyword can only be found by their exact value. This type suits structured content and is usually used for filtering, sorting, and aggregation.
***PS: Since Elasticsearch 5.0, the string field type is no longer supported and is replaced by text or keyword. If string is still used, a warning is given.***
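The difference between the two string types can be sketched in a few lines of Python. The analyze function below is only a crude stand-in for a real analyzer (it lowercases and splits on non-word characters), and requiring every query term to be present is a simplification, so treat this as an illustration of the idea rather than of what Elasticsearch actually does.

```python
import re

def analyze(value):
    """Crude stand-in for an analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]

def text_matches(field_value, query):
    # A text field is searched term by term after analysis.
    terms = set(analyze(field_value))
    return all(token in terms for token in analyze(query))

def keyword_matches(field_value, query):
    # A keyword field matches only on the exact, unanalyzed value.
    return field_value == query

title = "Quick Brown Fox"
print(text_matches(title, "brown"))               # True
print(keyword_matches(title, "brown"))            # False
print(keyword_matches(title, "Quick Brown Fox"))  # True
```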
Number Type
byte: -128 to 127
short: -32768 to 32767
integer: -2^31 to 2^31-1
long: -2^63 to 2^63-1
double: 64-bit double-precision IEEE 754 floating point
float: 32-bit single-precision IEEE 754 floating point
half_float: 16-bit half-precision IEEE 754 floating point
scaled_float: a floating point number stored as a long, scaled by a fixed scaling factor
Note that for numeric fields you should pick the smallest type that meets the requirement. A person's age, for example, has at most three digits, so short is more than enough. The smaller the field, the more efficient indexing and searching are.
For floating point numbers, prefer the scaled_float type. It uses a scaling factor to turn a floating point number into a long. For example, a price only needs to be accurate to the cent: with a price of 57.34 and a scaling factor of 100, the stored value is 5734. All APIs still treat price as a floating point number, but underneath Elasticsearch stores an integer, because integers compress better than floating point numbers.
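The price example can be worked through in Python. This is only a sketch of the arithmetic, assuming the value is multiplied by the scaling factor and rounded to the nearest integer; the helper names to_scaled and from_scaled are invented for illustration.

```python
def to_scaled(value, scaling_factor):
    # Multiply by the scaling factor and round to the nearest integer.
    return round(value * scaling_factor)

def from_scaled(stored, scaling_factor):
    # Divide the stored integer by the scaling factor to recover the float.
    return stored / scaling_factor

# A mapping fragment for the price field, as it might appear under "properties".
price_mapping = {"price": {"type": "scaled_float", "scaling_factor": 100}}

factor = price_mapping["price"]["scaling_factor"]
stored = to_scaled(57.34, factor)
print(stored)                       # 5734
print(from_scaled(stored, factor))  # 57.34
```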
Date Type
- date
Dates in ES can take the following forms:
- A formatted date string, such as "2015-01-01" or "2015/01/01 12:10:30"
- A millisecond timestamp (milliseconds since the epoch)
- A second timestamp (seconds since the epoch)
Note that ES internally converts dates to UTC (Coordinated Universal Time) and stores them as millisecond timestamps, because numeric values are faster to store and process than strings.
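To make the internal representation concrete, the sketch below converts the two example date strings into millisecond timestamps. It assumes the strings are already in UTC and only understands the two patterns shown above; the function name to_epoch_millis and the FORMATS list are illustrative, not part of any Elasticsearch API.

```python
from datetime import datetime, timezone

# The two string patterns used as examples in this article.
FORMATS = ["%Y-%m-%d", "%Y/%m/%d %H:%M:%S"]

def to_epoch_millis(value):
    """Convert a date string (assumed to be UTC) to milliseconds since the epoch."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next pattern
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError("unsupported date format: " + value)

print(to_epoch_millis("2015-01-01"))           # 1420070400000
print(to_epoch_millis("2015/01/01 12:10:30"))  # 1420114230000
```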
Boolean type
- boolean
A field of boolean type accepts the values true and false.
***Before Elasticsearch 5.4, strings and numbers that could be interpreted as true or false were also accepted. Since 5.4, only true, false, "true", and "false" are accepted.***
Binary type
- binary
A binary field holds a base64-encoded string. By default it is not stored separately and is not searchable.
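As a small illustration, this is how raw bytes could be prepared for a binary field from Python. The field name blob is hypothetical.

```python
import base64
import json

raw = b"hello"

# A binary field expects the bytes as a base64-encoded string.
encoded = base64.b64encode(raw).decode("ascii")
doc = {"blob": encoded}
print(json.dumps(doc))  # {"blob": "aGVsbG8="}

# Decoding the stored string gives back the original bytes.
decoded = base64.b64decode(doc["blob"])
print(decoded == raw)  # True
```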
Range Type
The range types and their bounds:
- integer_range: -2^31 to 2^31-1
- long_range: -2^63 to 2^63-1
- float_range: 32-bit IEEE 754
- double_range: 64-bit IEEE 754
- date_range: dates stored as 64-bit integer milliseconds
Composite data type
You may not have heard of these. They fall into three kinds: array type, object type, and nested type.
Array type
ES has no dedicated array type. By default, any field can contain zero or more values, but all values in an array must be of the same type.
- Integer array: [1,3]
- Nested array: [1,[2,3]], equivalent to [1,2,3]
- Object array: [{"name": "lili", "age": "18"}, {"name": "liming", "age": "20"} ]
When data is added dynamically, the type of the first value in the array determines the type of the whole array. Mixed-type arrays are not supported, for example [1,"abc"].
Arrays can contain null values. An empty array [] is treated as a missing field.
Arrays need no configuration in advance; they are supported by default.
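The array rules above can be mimicked with a small Python sketch. It is a simplification that uses Python types as a stand-in for field types, and the function names are invented for illustration.

```python
def flatten(values):
    """Flatten nested arrays, so [1, [2, 3]] becomes [1, 2, 3]."""
    result = []
    for value in values:
        if isinstance(value, list):
            result.extend(flatten(value))
        else:
            result.append(value)
    return result

def normalize_array(values):
    """Return the flattened array, or None when the field counts as missing."""
    flat = flatten(values)
    if not flat:
        return None  # an empty array is treated as a missing field
    # null values are allowed; all other values must share one type.
    kinds = {type(value) for value in flat if value is not None}
    if len(kinds) > 1:
        raise TypeError("mixed-type arrays are not supported")
    return flat

print(normalize_array([1, [2, 3]]))  # [1, 2, 3]
print(normalize_array([]))           # None
print(normalize_array([1, None]))    # [1, None]
```

Calling normalize_array([1, "abc"]) raises TypeError, mirroring the mixed-type restriction.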
Object Type
The object type is easy to understand: it is a JSON object. Note that ES flattens nested JSON objects. What does that mean?
For example, take the following data:
{ "name":"lili", "friend":{ "name":"xiaohong" } }
When stored, it becomes:
{ "name":"lili", "friend.name":"xiaohong" }
Nested Types
The nested type is a special object type. Because ES flattens object fields, the association between the fields of each object is lost when the stored value is an array of objects.
{ "name":"lili", "friends":[ { "name":"xiaohong", "age": 18 }, { "name":"xiaoming", "age": 20 } ] }
After processing:
{ "name":"lili", "friends.name":["xiaohong","xiaoming"], "friends.age":[18, 20] }
As you can see, the association between each name and its age is gone.
If you need to index an array of objects and avoid this problem, use the nested type instead of the object type. The nested type keeps each object in the array independent: it indexes each object as a separate hidden document, so each nested object can be queried independently.
To prevent an excessive number of nested fields, the number that can be defined per index is limited to 50.
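The lost association is easier to see in code. The sketch below flattens the example document in the way described above and then compares a query against the flattened form with a query that checks each object separately. Both the flattening function and the two match helpers are simplified illustrations, not Elasticsearch internals.

```python
def flatten_object(doc, prefix=""):
    """Flatten nested objects into dotted field names, collecting values in lists."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        # Treat a single value or object like a one-element array.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten_object(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat

def flat_match(flat, name, age):
    # Object-type behavior: each field is checked against all values at once.
    return name in flat["friends.name"] and age in flat["friends.age"]

def nested_match(doc, name, age):
    # Nested-type behavior: both conditions must hold within the same object.
    return any(f["name"] == name and f["age"] == age for f in doc["friends"])

doc = {
    "name": "lili",
    "friends": [
        {"name": "xiaohong", "age": 18},
        {"name": "xiaoming", "age": 20},
    ],
}
flat = flatten_object(doc)
print(flat["friends.name"], flat["friends.age"])  # ['xiaohong', 'xiaoming'] [18, 20]
print(flat_match(flat, "xiaohong", 20))           # True, although no such friend exists
print(nested_match(doc, "xiaohong", 20))          # False
```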
Geographic data types
The geographic types are the geographic coordinate type (geo_point) and the geographic shape type (geo_shape).
Geographic coordinates (geo_point) type
The geo_point type stores the latitude and longitude of a location and can be used in the following scenarios:
- Find locations within a certain range.
- Aggregate documents by location or by distance from a center point.
- Factor distance into a document's relevance score.
- Sort documents by distance.
Storage formats:
- Latitude and longitude as a JSON object: {"lat":41.12,"lon":-71.34}
- Latitude and longitude as a string, in "lat,lon" order: "41.12,-71.34"
- Geohash: "u1269qu5dcgp"
- Longitude and latitude as an array, in [lon, lat] order: [-71.34,41.12]
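Because the formats differ in coordinate order, it can help to normalize them in client code. The sketch below converts the object, string, and array forms into a (lat, lon) tuple. It relies on the string form being "lat,lon" and the array form being [lon, lat]; geohash strings are deliberately not handled, and the function name to_lat_lon is invented for illustration.

```python
def to_lat_lon(value):
    """Normalize a geo_point value to a (lat, lon) tuple."""
    if isinstance(value, dict):
        # Object form: {"lat": ..., "lon": ...}
        return (float(value["lat"]), float(value["lon"]))
    if isinstance(value, str):
        if "," not in value:
            raise ValueError("geohash strings are not handled in this sketch")
        # String form: "lat,lon"
        lat, lon = value.split(",")
        return (float(lat), float(lon))
    if isinstance(value, (list, tuple)):
        # Array form: [lon, lat], the reverse of the string form
        lon, lat = value
        return (float(lat), float(lon))
    raise TypeError("unsupported geo_point value")

print(to_lat_lon({"lat": 41.12, "lon": -71.34}))  # (41.12, -71.34)
print(to_lat_lon("41.12,-71.34"))                 # (41.12, -71.34)
print(to_lat_lon([-71.34, 41.12]))                # (41.12, -71.34)
```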
Geographic shape (geo_shape) type
I won't go into this type because it is not commonly used. It is stored as GeoJSON, for example:
{ "type":"Point", "coordinates":[ 100, 0 ] }
Special types
IP type (ip)
Fields of type ip store IPv4 or IPv6 addresses, such as "192.168.1.1". Queries on them can also use CIDR notation, such as "192.168.0.0/16".
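What it means for an address to fall inside a CIDR block can be checked with Python's standard ipaddress module. This only illustrates the matching logic on the client side; the helper name in_network is made up.

```python
import ipaddress

def in_network(address, cidr):
    """Return True if the address lies inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

print(in_network("192.168.1.1", "192.168.0.0/16"))  # True
print(in_network("10.0.0.1", "192.168.0.0/16"))     # False
```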
Token count type (token_count)
token_count counts the number of tokens produced when a text value is analyzed; it is essentially an integer field. In the mapping below, name is a text field, and its sub-field length, of type token_count, counts the tokens produced by the standard analyzer:
{ "mappings": { "my_type": { "properties": { "name": { "type": "text", "fields": { "length": { "type": "token_count", "analyzer": "standard" } } } } } } }
Conclusion
This section covered the concept of mapping in ES, its basic properties, and the core data types. There is a fair amount of theory here, so take time to digest it. It is easy to forget if you only read it; practicing will leave a deeper impression. Choosing the right types helps solve problems better and saves server cost. Some topics remain, and next I will explain what dynamic mapping is. For more about Elasticsearch mapping, see my other related articles!