ElasticSearch Learning: ES Mapping with Practical Examples

What is Mapping

As before, let's start with the basic concept: what is mapping? I gave you a brief example in the previous section; do you still remember it? Mapping is a fairly important concept in ES. If you don't understand it, the rest of this series will be hard to follow.

In a relational database such as MySQL, you declare the fields when you create a table and describe each one, for example its name and type.

ES works the same way. When we covered indexes earlier, I mentioned that a document's fields also need to be declared. That declaration covers a series of properties and rules, such as the field type, whether the field is analyzed (tokenized) and whether it is indexed. Mapping is what provides this capability.

ES offers two ways to declare the fields of the documents in a type: dynamic mapping and static (explicit) mapping.
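Here is a minimal Python sketch of static mapping. It only builds the JSON body you might send when creating an index. It assumes the typeless `mappings` layout of Elasticsearch 7 and later; 6.x puts a type name such as `my_type` between `mappings` and `properties`. The field names are hypothetical, and nothing here talks to a cluster.

```python
import json


def build_explicit_mapping() -> dict:
    """Return a request body that declares every field up front (static mapping)."""
    return {
        "mappings": {
            # "strict" makes ES reject documents that contain undeclared fields,
            # so only the fields declared below are accepted.
            "dynamic": "strict",
            "properties": {
                "name": {"type": "text"},
                "age": {"type": "short"},
                "created_at": {"type": "date"},
            },
        }
    }


if __name__ == "__main__":
    # Body for a request such as: PUT /user_index  (index name is hypothetical)
    print(json.dumps(build_explicit_mapping(), indent=2))
```

With dynamic mapping you would skip this step and let ES infer the field types from the documents it receives.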

Let's go through them in detail.

Mapping Properties

A mapping property can be understood as an attribute that every field in a type has. Commonly used properties include:

type: the field type; common ones are text, integer, etc.

index: default true. Controls whether an inverted index is built for the field. If false, the field cannot be searched, but it can still be used in aggregations (through doc_values).

enabled: default true. Applies to object fields (and the top-level mapping). If false, ES does not parse the field at all. It gets neither an inverted index nor doc_values, so it cannot be searched or aggregated. It is still kept in _source, and this saves space.

store: default false. Controls whether the field value is stored separately. Suppose you only need to retrieve a few small fields from a large document. Storing those fields lets ES return them without reading the whole _source, which reduces IO. This storage is independent of the _source storage.

doc_values: default true (text fields do not support it). This is a column-oriented on-disk structure that makes sorting, aggregations and script access efficient, at the cost of extra disk space.

fields: the multi-field feature. It gives one field several sub-fields of different types, so the same value can be indexed in several different ways.

norms: controls whether the normalization factors used for relevance scoring are stored. When type is text the default is true; when type is keyword the default is false. If a field is only used for filtering and aggregations and never needs scoring, set it to false to save space.

analyzer: specifies the analyzer used at index time and at search time. If search_analyzer is also specified, search_analyzer takes precedence when searching.

search_analyzer: specifies the analyzer used at search time; it has the highest priority during search.

fielddata: default false. Enables sorting, aggregations and script access on text fields by loading data into heap memory. This is an expensive operation, so avoid it whenever possible.

index_options: controls what information is added to the inverted index (postings list), which is used for search and highlighting:

docs: index only the document number (doc ID);
freqs: index the document number and term frequency;
positions: index the document number, term frequency and term positions (ordinal);
offsets: index the document number, term frequency, term positions (ordinal) and term offsets (start and end character positions).

By default, analyzed string (text) fields use positions, and other fields use docs.

Also note that index_options is set per field in the Elasticsearch mapping. Phrase queries require at least positions, and the postings highlighter requires offsets. The more information that is recorded, the more storage space it takes.
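The properties above can be combined in a single mapping. The sketch below is an illustrative, hypothetical mapping that uses each property once. It assumes the Elasticsearch 7+ typeless layout, and the field names are made up. It is a Python dict that serialises to the JSON request body. It is not meant as a recommended production mapping.

```python
import json


def build_property_demo_mapping() -> dict:
    """Hypothetical mapping that uses each field property described above once."""
    return {
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "standard",        # index-time analyzer (also search-time by default)
                    "search_analyzer": "simple",   # takes precedence at search time
                    "index_options": "offsets",    # docs + freqs + positions + offsets
                    "fields": {
                        # multi-field: the same value is also indexed as one exact term
                        "raw": {"type": "keyword"}
                    },
                },
                "category": {
                    "type": "text",
                    "norms": False,                # filtering only, no scoring factors stored
                },
                "view_count": {
                    "type": "integer",
                    "index": False,                # no inverted index; doc_values still allow aggregations
                },
                "summary": {
                    "type": "text",
                    "store": True,                 # stored separately from _source
                },
                "session_id": {
                    "type": "keyword",
                    "doc_values": False,           # no sorting/aggregations, saves disk space
                },
                "raw_payload": {
                    "type": "object",
                    "enabled": False,              # kept in _source, but not parsed or indexed
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_property_demo_mapping(), indent=2))
```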

Field types (type) in detail

Core data types:

String type

text

Text is analyzed (tokenized). Fields of type text can be found through full-text search, so any field that needs full-text search should use the text type. The content of a text field is analyzed: before the inverted index is built, the analyzer splits the string into individual terms. Text fields are not used for sorting and are rarely used for aggregations (the Terms Aggregation is the exception, and it requires fielddata to be enabled).

keyword

Keyword is not analyzed. Fields of type keyword can only be found by their exact value. The type suits structured content such as IDs, email addresses or status codes, and is typically used for filtering, sorting and aggregations.

***PS: Newer versions of Elasticsearch no longer support the string field type; it has been replaced by text and keyword. If string is still used, a warning is given.***
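The difference between text and keyword comes down to which terms end up in the inverted index. The Python sketch below imitates that with a deliberately simplified analyzer that splits on word characters and lowercases. This is an assumption made for illustration, not the real standard analyzer. The standard analyzer follows Unicode segmentation rules and, for example, splits CJK text differently.

```python
import re
from typing import List


def simple_analyze(value: str) -> List[str]:
    """Simplified stand-in for an analyzer: split on word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_terms(value: str) -> List[str]:
    """A keyword field indexes the whole value as one exact term."""
    return [value]


def text_matches(field_value: str, query: str) -> bool:
    """Full-text style match: true if any analyzed query term is among the field's terms."""
    field_terms = set(simple_analyze(field_value))
    return any(term in field_terms for term in simple_analyze(query))


def keyword_matches(field_value: str, query: str) -> bool:
    """Exact-value match against the single keyword term."""
    return query in keyword_terms(field_value)
```

For the value "Quick Brown Fox", a search for "fox" finds it as text but not as keyword. Only the full string "Quick Brown Fox" matches the keyword version.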

Numeric types

byte: value range -128 to 127

short: value range -32768 to 32767

integer: value range -2^31 to 2^31-1

long: value range -2^63 to 2^63-1

double: 64-bit double-precision IEEE 754 floating point

float: 32-bit single-precision IEEE 754 floating point

half_float: 16-bit half-precision IEEE 754 floating point

scaled_float: a floating-point number stored as a long, scaled by a fixed factor

For numeric fields, choose the type with the smallest range that still meets your needs. A person's age, for example, has at most three digits, so short is perfectly adequate. The smaller the field type, the more efficient indexing and searching are.
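To illustrate picking the smallest type, the hypothetical helper below walks the integer types from narrowest to widest, using the value ranges listed above. It returns the first type that can hold the whole range.

```python
# Value ranges of the ES integer types, as listed above.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Hypothetical helper: narrowest integer type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit long")
```

For a three-digit age, `smallest_integer_type(0, 999)` picks short, matching the advice above.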

For floating-point numbers, prefer the scaled_float type. scaled_float uses a scaling factor to turn a floating-point number into a long. Take a price that only needs to be accurate to the cent: with a scaling factor of 100, a price of 57.34 is stored as 5734. All APIs still treat price as a floating-point number, but underneath Elasticsearch stores an integer, because compressed integers take less storage space than compressed floating-point numbers.
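The price example can be sketched in a few lines of Python. This is a simplification: it assumes values are multiplied by the scaling factor and rounded to the nearest long at index time. `PRICE_MAPPING` shows the corresponding mapping fragment with the `scaling_factor` parameter; the field name is illustrative.

```python
SCALING_FACTOR = 100  # price is accurate to the cent, as in the example above


def to_stored_long(value: float, scaling_factor: int = SCALING_FACTOR) -> int:
    """Mimic index time: multiply by the scaling factor, then round to the nearest long."""
    return round(value * scaling_factor)


def from_stored_long(stored: int, scaling_factor: int = SCALING_FACTOR) -> float:
    """Mimic what the APIs expose: the stored long divided back into a float."""
    return stored / scaling_factor


# Mapping fragment for the price field.
PRICE_MAPPING = {
    "properties": {
        "price": {"type": "scaled_float", "scaling_factor": SCALING_FACTOR}
    }
}
```

`to_stored_long(57.34)` gives 5734, and `from_stored_long(5734)` gives 57.34 back. Any precision beyond two decimals is lost by the rounding, which is why the factor should match the precision you actually need.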

Date Type

date

In ES, a date can be given in any of the following forms:

A formatted date string, such as "2015-01-01" or "2015/01/01 12:10:30"
A timestamp in milliseconds
A timestamp in seconds

Note that ES converts dates to UTC (Coordinated Universal Time) internally and stores them as millisecond timestamps, because numbers are faster to store and process than strings.
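The sketch below imitates this normalisation to a UTC millisecond timestamp in Python. It assumes that strings without a time zone are UTC. It also treats bare integers as epoch milliseconds; in ES, second-based timestamps need the `epoch_second` format instead. `DATE_MAPPING` is an illustrative fragment in which `||` separates alternative formats, and the field name is hypothetical.

```python
from datetime import datetime, timezone

# The two string layouts from the examples above, in Python strptime syntax.
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d %H:%M:%S"]


def to_utc_millis(value) -> int:
    """Normalise a date value to UTC epoch milliseconds (strings without a zone are taken as UTC)."""
    if isinstance(value, int):
        return value  # assumed to already be epoch milliseconds
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next format
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unsupported date format: {value!r}")


# Mapping fragment accepting the same inputs.
DATE_MAPPING = {
    "properties": {
        "created_at": {
            "type": "date",
            "format": "yyyy-MM-dd||yyyy/MM/dd HH:mm:ss||epoch_millis",
        }
    }
}
```

With these assumptions, "2015-01-01" and the millisecond timestamp 1420070400000 normalise to the same stored value.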

Boolean type

boolean

If a field is of boolean type, its acceptable values are true and false.

***Before Elasticsearch 5.4, strings and numbers that could be interpreted as true or false were accepted. Since 5.4, only true, false, "true" and "false" are accepted.***

Binary type

binary

binary data is a Base64-encoded string. By default it is not stored separately and is not searchable.

Range Type

range

integer_range: -2^31 to 2^31-1
long_range: -2^63 to 2^63-1
float_range: 32-bit IEEE 754
double_range: 64-bit IEEE 754
date_range: 64-bit integer, milliseconds since the epoch
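Here is a sketch of a range field. It contains the mapping fragment and a document that supplies bounds with gte/lte (gt/lt are also accepted). It also has a hypothetical helper, `contains`, that roughly mirrors what a term query against a range field matches: documents whose range contains the value. The field names are made up.

```python
RANGE_MAPPING = {
    "mappings": {
        "properties": {
            "age_range": {"type": "integer_range"},
            "valid_period": {"type": "date_range", "format": "yyyy-MM-dd"},
        }
    }
}

# A document gives each range field its bounds.
RANGE_DOC = {
    "age_range": {"gte": 18, "lte": 30},
    "valid_period": {"gte": "2015-01-01", "lt": "2016-01-01"},
}


def contains(range_value: dict, x) -> bool:
    """Hypothetical helper: does a gte/gt/lte/lt range contain x?"""
    if "gte" in range_value and x < range_value["gte"]:
        return False
    if "gt" in range_value and x <= range_value["gt"]:
        return False
    if "lte" in range_value and x > range_value["lte"]:
        return False
    if "lt" in range_value and x >= range_value["lt"]:
        return False
    return True
```

Comparing ISO date strings directly works in this sketch only because "yyyy-MM-dd" strings sort in date order. ES itself compares the underlying millisecond values.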

Composite data types

You may not have heard of these. They fall into three main kinds: array type, object type and nested type.

Array type

ES has no dedicated array type. By default any field can contain zero or more values, but all values in an array must be of the same type.

Integer array: [1,3]
Nested array: [1,[2,3]], equivalent to [1,2,3]
Object array: [{"name": "lili", "age": "18"}, {"name": "liming", "age": "20"} ]

When data is added dynamically, the type of the first value in the array determines the type of the whole array. Mixed-type arrays such as [1, "abc"] are not supported. Arrays may contain null values, and an empty array [] is treated as a missing field.

Using arrays in a document requires no configuration in advance; they are supported by default.
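The array rules above can be sketched as two small Python helpers. `flatten` shows why [1,[2,3]] behaves like [1,2,3]. `infer_array_type` is a hypothetical approximation of dynamic mapping and is stricter than ES. ES may coerce a value like "2" into a number, whereas this sketch rejects any mix of types.

```python
def flatten(values):
    """Nested arrays are flattened: [1, [2, 3]] is indexed like [1, 2, 3]."""
    flat = []
    for v in values:
        if isinstance(v, list):
            flat.extend(flatten(v))
        else:
            flat.append(v)
    return flat


def infer_array_type(values):
    """Hypothetical sketch of dynamic mapping for an array field.

    Returns None for an empty or all-null array (treated as a missing field);
    otherwise the Python type of the first non-null value. Raises TypeError
    if any later value has a different type.
    """
    flat = [v for v in flatten(values) if v is not None]
    if not flat:
        return None
    first_type = type(flat[0])
    for v in flat[1:]:
        if type(v) is not first_type:
            raise TypeError(f"mixed array types: {first_type.__name__} and {type(v).__name__}")
    return first_type
```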

Object Type

The object type is easy to understand: it is simply a JSON object. Note that ES flattens nested JSON objects. What does that mean?

Take the following data as an example:

{
  "name":"lili",
  "friend":{
    "name":"xiaohong"
  }
}

When stored, it becomes:

{
  "name":"lili",
  "friend.name":"xiaohong"
}
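The flattening can be sketched as a recursive walk that joins keys with dots. This illustrates the idea and is not ES's internal code. `flatten_object` is a hypothetical helper; values that share a path across an array of objects are collected into one list.

```python
def flatten_object(doc, prefix=""):
    """Sketch of object flattening: inner keys become dotted paths."""
    flat = {}

    def add(path, value):
        # A second value for the same path turns the entry into a list.
        if path in flat:
            existing = flat[path]
            flat[path] = (existing if isinstance(existing, list) else [existing]) + [value]
        else:
            flat[path] = value

    def walk(value, path):
        if isinstance(value, dict):
            for key, inner in value.items():
                walk(inner, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:  # array elements all share the same path
                walk(item, path)
        else:
            add(path, value)

    walk(doc, prefix)
    return flat
```

Running it on the `friends` array from the nested-type example reproduces the lost association: all names end up in one list and all ages in another.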

Nested Types

The nested type is a special object type. Because ES flattens object-type fields, when the stored value is an array of objects, the association between the fields inside each object is lost. For example:

{
  "name":"lili",
  "friends":[
    {
      "name":"xiaohong",
      "age": 18
    },
    {
      "name":"xiaoming",
      "age": 20
    }
  ]
}

After flattening:

{
  "name":"lili",
  "friends.name":["xiaohong","xiaoming"],
  "friends.age":[18, 20]
}

As you can see, the link between each friend's name and age is gone: the document can no longer tell which age belongs to which name.

If you need to index an array of objects without this problem, use the nested type instead of the object type. The nested type keeps each object in the array independent. It indexes each object as a separate hidden document, which means each nested object can be queried independently.

To prevent an excessive number of nested fields, the number of distinct nested fields per index is limited to 50 by default (the index.mapping.nested_fields.limit setting).
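To see why the lost association matters, the sketch below compares a query for a friend named xiaohong aged 20 under both models. Checked against the flattened value lists (object), the query wrongly matches. Checked per array element (nested), it does not. The mapping and query dicts use the nested field type and nested query syntax. The per-element check is a simplified stand-in for how ES evaluates the hidden documents.

```python
FRIENDS_DOC = {
    "name": "lili",
    "friends": [
        {"name": "xiaohong", "age": 18},
        {"name": "xiaoming", "age": 20},
    ],
}

NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "friends": {
                "type": "nested",
                "properties": {"name": {"type": "keyword"}, "age": {"type": "integer"}},
            },
        }
    }
}

# Nested query: both conditions must hold inside the same friends element.
NESTED_QUERY = {
    "query": {
        "nested": {
            "path": "friends",
            "query": {"bool": {"must": [
                {"term": {"friends.name": "xiaohong"}},
                {"term": {"friends.age": 20}},
            ]}},
        }
    }
}


def matches_flattened(doc, name, age):
    """object type: each condition is checked against its own flattened value list."""
    names = [f["name"] for f in doc["friends"]]
    ages = [f["age"] for f in doc["friends"]]
    return name in names and age in ages


def matches_nested(doc, name, age):
    """nested type: both conditions must hold within one array element."""
    return any(f["name"] == name and f["age"] == age for f in doc["friends"])
```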

Geographic data types

Geo-related types include the geo-point type (geo_point) and the geo-shape type (geo_shape).

Geo-point type (geo_point)

The geo_point type stores the latitude and longitude of a location and can be used to:

Find locations within a given area.
Aggregate documents by location or by distance from a center point.
Factor distance into a document's relevance score.
Sort documents by distance.

Supported input formats:
Latitude/longitude as a JSON object: {"lat":41.12,"lon":-71.34}
Latitude/longitude as a string (latitude first): "41.12,-71.34"
A geohash string: "u1269qu5dcgp"
An array, in [lon, lat] order as in GeoJSON: [-71.34, 41.12]
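The input forms can be normalised as in the Python sketch below, using a hypothetical helper `to_lat_lon`. It handles the object, string and array forms and deliberately leaves out geohash decoding. The main thing it shows is the order: the string form is "lat,lon", while the array form is [lon, lat], following GeoJSON.

```python
# Mapping fragment for a geo_point field (field name is illustrative).
GEO_MAPPING = {"properties": {"location": {"type": "geo_point"}}}


def to_lat_lon(value):
    """Normalise geo_point input forms to a (lat, lon) tuple (geohash not supported here)."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        if "," not in value:
            raise ValueError("geohash decoding is not implemented in this sketch")
        lat, lon = value.split(",")  # string form is "lat,lon"
        return float(lat), float(lon)
    if isinstance(value, (list, tuple)):
        lon, lat = value  # array form is [lon, lat], GeoJSON order
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point value: {value!r}")
```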

Geo-shape type (geo_shape)

I won't go into detail on this one, since it is rarely used. Its values are stored in GeoJSON format, for example:

{
 "type":"Point",
 "coordinates":[
     100,
     0
 ]
}

Special types

IP type (ip)

Fields of type ip store IPv4 or IPv6 addresses, such as "192.168.1.1". CIDR notation such as "192.168.0.0/16" can be used when querying an ip field.
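The sketch pairs an ip mapping fragment and a term query that uses CIDR notation with a local check. The check relies on Python's standard `ipaddress` module and roughly mirrors which addresses such a query matches. The field name `client_ip` is hypothetical.

```python
import ipaddress

IP_MAPPING = {"properties": {"client_ip": {"type": "ip"}}}

# A term query on an ip field can take a CIDR block.
CIDR_QUERY = {"query": {"term": {"client_ip": "192.168.0.0/16"}}}


def in_cidr(address: str, cidr: str) -> bool:
    """Local check: does the address fall inside the CIDR block?"""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)
```

For example, `in_cidr("192.168.1.1", "192.168.0.0/16")` is true, while an address like 10.0.0.1 falls outside the block.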

Token count type (token_count)

token_count counts the number of terms produced when a text value is analyzed; it is essentially an integer field. In the example below, name is mapped as text, and a sub-field named length is added to count its terms after analysis. The sub-field has type token_count and uses the standard analyzer. The mapping is as follows:

{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
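A token_count sub-field simply stores how many terms the analyzer produced. The sketch below approximates that count with a simplified word-character split. This is an assumption: the standard analyzer may count text with heavy punctuation or CJK characters differently. It also shows a range query on the name.length sub-field from the mapping above.

```python
import re


def approx_token_count(text: str) -> int:
    """Approximate the value a token_count sub-field would store (simplified tokenizer)."""
    return len(re.findall(r"\w+", text))


# Range query on the token_count sub-field: names made of at least two terms.
LENGTH_QUERY = {"query": {"range": {"name.length": {"gte": 2}}}}
```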

Conclusion

This section covered the concept of mapping in ES and its basic properties, and introduced the core data types. There is a fair amount of theory here, so take time to digest it. What you only read is easy to forget, and it sticks much better once you practice it. Choosing the right types helps us solve problems better and save on server costs. Some topics are left over from this section; next time I will explain what dynamic mapping is. For more information about ElasticSearch ES Mapping, please follow my other related articles!