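First, a word on the relationship between template and mapping. By default, the mapping comes from the template; if a mapping has been set explicitly, that custom mapping is used instead.
A mapping is essentially a description of fields: for example, that a field is a float, that a field needs to be analyzed (tokenized), that a field is of type date, or whether it is searchable at all.
A template, as the name suggests, is a template, and that is exactly what it does. It applies to indices whose names match a pattern and can give them an alias; in other words, with the wildcard xiaorui*, the template covers xiaorui_v2_2013, xiaorui_v2_2014, …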
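To make the wildcard idea concrete, here is a small Python sketch that checks which index names a template pattern such as xiaorui* would cover. It only mimics the matching locally with the standard fnmatch module, which approximates Elasticsearch's simple '*' wildcard. template_applies is a hypothetical helper and is not part of any Elasticsearch client.

```python
from fnmatch import fnmatchcase

# "xiaorui*" is the template pattern used later in this post.
TEMPLATE_PATTERN = "xiaorui*"


def template_applies(index_name: str, pattern: str = TEMPLATE_PATTERN) -> bool:
    """Approximate Elasticsearch's '*' wildcard match of a template pattern against an index name.

    fnmatchcase also treats '?' and '[...]' specially, which Elasticsearch's simple
    wildcard does not, so this is only faithful for patterns that use plain '*'.
    """
    return fnmatchcase(index_name, pattern)


for name in ["xiaorui_v2_2013", "xiaorui_v2_2014", "logstash-2015.07.01"]:
    print(name, template_applies(name))
```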
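By default, Elasticsearch analyzes every string field. Put simply, queries such as query_string and match can then hit any of them.
But analyzing every field also costs performance and disk space, so we only enable analysis on the specific fields that need it.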
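A minimal sketch of "analyze only selected fields": a hypothetical helper that builds a mapping properties dict in which only the chosen fields are tokenized and the rest are exact-match (not_analyzed). It uses the 1.x-era "string" syntax this post uses throughout; Elasticsearch 5.x and later replaced "string" with "text" and "keyword". The ik analyzer assumes the third-party ik plugin is installed. The field names are borrowed from the mapping later in the post.

```python
import json


def build_properties(analyzed_fields, exact_fields, analyzer="ik"):
    """Build a 1.x-style mapping 'properties' dict (hypothetical helper).

    Only the fields in analyzed_fields are tokenized with `analyzer`;
    the fields in exact_fields are kept as single, untokenized values.
    """
    props = {}
    for field in analyzed_fields:
        props[field] = {"type": "string", "analyzer": analyzer}
    for field in exact_fields:
        props[field] = {"type": "string", "index": "not_analyzed"}
    return {"properties": props}


print(json.dumps(build_properties(["prod_name", "text"], ["brand", "store"]), indent=2))
```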
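This post isn't very rigorous, so criticism is welcome. It will also be updated later; please check the original URL for updates.
An aside: I read quite a few articles and saw many people struggling to modify mappings online, on the fly. In the end, though, everyone falls back on the plainest approach: clear the data, change the mapping, then reload the data. That is also why, in the way we use Elasticsearch, we never treat it as our core database. As my company's part-time chief DBA (yours truly), I always like to compare Elasticsearch with MySQL. If only Elasticsearch had mature tooling, permission management, and something like Percona's pt-online-schema-change for changing fields online… Dream on.
Before modifying the mapping, clear out the data. Otherwise, even after you PUT a new mapping, the existing data will not be corrected, and data written afterwards will follow the old field type. Back when I used ELK for log collection, I built each log line into a JSON document. If a field in the very first document was not built as a float, the only way to get that field displayed as float was to delete the whole index. Using Elasticsearch in our business now, it works the same way.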
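Because dynamic mapping fixes a field's type from the first value it sees, one defensive habit is to cast numeric fields before indexing, so that even the first document already carries a floating-point number. The sketch below is a hypothetical helper. The sample document is made-up illustration data, and the helper only helps new indices; it cannot change a field that is already mapped.

```python
import json


def normalize_doc(doc, float_fields):
    """Return a copy of doc with the given fields cast to float (hypothetical helper).

    Casting before indexing means even the very first document sends a JSON
    number with a decimal point, so dynamic mapping sees a floating-point value
    instead of a string or an integer.
    """
    out = dict(doc)
    for field in float_fields:
        if out.get(field) is not None:
            out[field] = float(out[field])
    return out


log_line = {"product_price": "12", "brand": "acme"}  # made-up sample log document
print(json.dumps(normalize_doc(log_line, ["product_price"])))
```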
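Delete the data. Be careful not to delete the wrong thing: it's best to put the type after the index, so that only that type is deleted rather than the whole index!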
http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec DELETE
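To enforce the "always include the type" rule, the request can be built through a small guard function. This is a sketch using only the standard urllib. It builds the request without sending it, and delete_type_request is a hypothetical helper. Deleting a type by URL applies to the 1.x-era clusters this post targets; later Elasticsearch versions removed the ability to delete a type.

```python
from urllib.request import Request

ES_HOST = "http://es.xiaorui.cc:9111"  # host used in this post


def delete_type_request(index, doc_type, host=ES_HOST):
    """Build (but do not send) a DELETE for one type inside an index.

    Refuses to build the request when the type is missing, because
    DELETE {host}/{index} would drop the whole index instead.
    """
    if not index or not doc_type:
        raise ValueError("need both index and type; refusing to delete a whole index")
    return Request(f"{host}/{index}/{doc_type}", method="DELETE")


req = delete_type_request("xiaorui_v2_201507", "ec")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```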
Check your mapping. Since the data has been cleared, you should see an empty object… {}
http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec/_mapping GET
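Next, change the prod_name field in our mapping to be analyzed. You can specify the analyzer; here we use the ik analyzer. Note that the method is PUT.
In the request body, properties corresponds to the contents of the mapping. Because we want analysis and highlighting on this field, we set the analyzer and enable term_vector: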
"prod_name": {
  "type": "string",
  "term_vector": "with_positions_offsets",
  "norms": {
    "enabled": false
  },
  "analyzer": "ik",
  "include_in_all": false
}
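The reason for term_vector with_positions_offsets is highlighting. With positions and offsets stored, Elasticsearch can use its fast vector highlighter (fvh) on that field. Below is a sketch of a search body that matches on prod_name and asks for highlighted fragments. The body would be sent to something like {host}/xiaorui_v2_201507/ec/_search. highlight_query is a hypothetical helper, and the query text is a made-up example.

```python
import json


def highlight_query(text, field="prod_name"):
    """Search body: full-text match on `field`, plus highlighted fragments of that field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            # "fvh" (fast vector highlighter) relies on term_vector: with_positions_offsets
            "fields": {field: {"type": "fvh"}},
        },
    }


print(json.dumps(highlight_query("手机"), ensure_ascii=False, indent=2))
```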
Update the mapping:
http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec/_mapping PUT
not_analyzed means the field is not analyzed: its value is indexed as a single token, without tokenization.
{
  "properties": {
    "brand": { "type": "string", "index": "not_analyzed" },
    "cdate": { "type": "date", "format": "dateOptionalTime" },
    "distr_pan": {
      "type": "nested",
      "properties": {
        "k": { "type": "string", "index": "not_analyzed", "doc_values": true },
        "v": { "type": "float", "doc_values": true }
      }
    },
    "distr_segment": {
      "type": "nested",
      "properties": {
        "k": { "type": "string", "index": "not_analyzed", "doc_values": true },
        "v": { "type": "long", "doc_values": true }
      }
    },
    "id": { "type": "long" },
    "idate": { "type": "date", "format": "dateOptionalTime" },
    "keyword": { "type": "string", "index": "not_analyzed" },
    "platform": { "type": "long" },
    "prod_name": {
      "type": "string",
      "term_vector": "with_positions_offsets",
      "norms": { "enabled": false },
      "analyzer": "ik",
      "include_in_all": false
    },
    "product_id": { "type": "string", "index": "not_analyzed" },
    "product_price": { "type": "double" },
    "query": {
      "properties": {
        "query_string": {
          "properties": {
            "fields": { "type": "string", "index": "not_analyzed" },
            "query": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    },
    "score": { "type": "long" },
    "source": {
      "properties": {
        "name": { "type": "string", "index": "not_analyzed", "doc_values": true },
        "url": { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    },
    "store": { "type": "string", "index": "not_analyzed" },
    "text": {
      "type": "string",
      "term_vector": "with_positions_offsets",
      "norms": { "enabled": false },
      "analyzer": "ik",
      "include_in_all": false
    }
  }
}
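The same PUT can be scripted. The sketch below uses only the standard library. It builds, but does not send, a PUT to {host}/{index}/{type}/_mapping with a JSON body. Only two of the fields above are included for brevity, and put_mapping_request is a hypothetical helper rather than an Elasticsearch client API.

```python
import json
from urllib.request import Request


def put_mapping_request(host, index, doc_type, properties):
    """Build (but do not send) PUT {host}/{index}/{doc_type}/_mapping with a JSON body."""
    body = json.dumps({"properties": properties}).encode("utf-8")
    return Request(
        f"{host}/{index}/{doc_type}/_mapping",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Two fields copied from the mapping above; the full body works the same way.
props = {
    "brand": {"type": "string", "index": "not_analyzed"},
    "prod_name": {
        "type": "string",
        "term_vector": "with_positions_offsets",
        "norms": {"enabled": False},
        "analyzer": "ik",
        "include_in_all": False,
    },
}
req = put_mapping_request("http://es.xiaorui.cc:9111", "xiaorui_v2_201507", "ec", props)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```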
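Next we need to change the template as well. The steps above only ensure that the existing index uses the mapping configuration we specified. But what about new indices? That is what a template is for.
We also need to specify the index alias in the template.
Some people don't see the point: a single /index/type is enough, so why design aliases for indices? The answer is easier data management, better performance, and less query load on the machines. Splitting data into smaller indices behind one alias lets you query or delete a single time slice without touching the rest.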
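The template in this post attaches the alias to new indices automatically. For indices that already exist, an alias can be added through the _aliases API, whose body is a list of actions. The sketch below only builds that body, which would be POSTed to {host}/_aliases. alias_actions is a hypothetical helper, and the index names follow the naming scheme described in this post.

```python
import json


def alias_actions(indices, alias):
    """Body for POST /_aliases that points one alias at several indices."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


body = alias_actions(["xiaorui_v2_201401", "xiaorui_v2_201402"], "xiaorui")
print(json.dumps(body, indent=2))
```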
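Most aliasing schemes today split indices by time, e.g. xiaorui_v2_201401, xiaorui_v2_201402, xiaorui_v2_201511. Each month gets its own index, and we configure 30 shards for each monthly index; in our setup the speedup is very noticeable.
ELK's logstash, by default, splits logs into one index per day…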
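Time-based index names are easy to generate. The sketch below produces the monthly names used in this post and logstash-style daily names (logstash-YYYY.MM.dd). It also lists the monthly indices that cover a date range; they could be comma-joined into a multi-index URL such as {host}/a,b,c/_search. All three functions are hypothetical helpers.

```python
from datetime import date


def monthly_index(prefix, day):
    """Monthly index name, e.g. xiaorui_v2 + July 2015 -> xiaorui_v2_201507."""
    return f"{prefix}_{day:%Y%m}"


def daily_logstash_index(day):
    """Daily index name in logstash's default style: logstash-YYYY.MM.dd."""
    return f"logstash-{day:%Y.%m.%d}"


def months_between(start, end):
    """Yield the first day of every month from start's month through end's month, inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield date(y, m, 1)
        m += 1
        if m == 13:
            y, m = y + 1, 1


print(monthly_index("xiaorui_v2", date(2015, 7, 20)))
print(daily_logstash_index(date(2015, 7, 1)))
print(",".join(monthly_index("xiaorui_v2", d)
               for d in months_between(date(2015, 11, 1), date(2016, 1, 31))))
```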
http://es.xiaorui.cc:9111/_template/xiaorui PUT
{
  "order": 0,
  "template": "xiaorui*",
  "settings": {
    "index.replication": "async",
    "index.number_of_replicas": "1",
    "index.number_of_shards": "30"
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "text": {
            "mapping": {
              "indexAnalyzer": "ik",
              "searchAnalyzer": "ik",
              "include_in_all": false,
              "omit_norms": true,
              "store": "no",
              "norms": { "enabled": false },
              "type": "string",
              "term_vector": "with_positions_offsets"
            },
            "path_match": "text"
          }
        },
        {
          "prod_name": {
            "mapping": {
              "indexAnalyzer": "ik",
              "searchAnalyzer": "ik",
              "include_in_all": false,
              "omit_norms": true,
              "store": "no",
              "norms": { "enabled": false },
              "type": "string",
              "term_vector": "with_positions_offsets"
            },
            "path_match": "prod_name"
          }
        },
        {
          "date": {
            "mapping": { "format": "dateOptionalTime", "type": "date" },
            "match": "(created_at|.*date)",
            "match_pattern": "regex"
          }
        },
        {
          "source": {
            "mapping": {
              "properties": {
                "name": { "index": "not_analyzed", "doc_values": true, "type": "string" },
                "url": { "index": "not_analyzed", "doc_values": true, "type": "string" }
              }
            },
            "path_match": "source"
          }
        },
        {
          "distr_pan": {
            "mapping": {
              "properties": {
                "v": { "doc_values": true, "type": "float" },
                "k": { "index": "not_analyzed", "doc_values": true, "type": "string" }
              },
              "type": "nested"
            },
            "match": "distr_pan"
          }
        },
        {
          "distr_artificial_pan": {
            "mapping": {
              "properties": {
                "v": { "doc_values": true, "type": "float" },
                "k": { "index": "not_analyzed", "doc_values": true, "type": "string" }
              },
              "type": "nested"
            },
            "match": "distr_artificial_pan"
          }
        },
        {
          "distr_segment": {
            "mapping": {
              "properties": {
                "v": { "doc_values": true, "type": "long" },
                "k": { "index": "not_analyzed", "doc_values": true, "type": "string" }
              },
              "type": "nested"
            },
            "match": "distr_segment"
          }
        },
        {
          "buzz_comment": {
            "mapping": {
              "properties": {
                "v": { "index": "not_analyzed", "doc_values": true, "type": "string" },
                "k": { "format": "dateOptionalTime", "type": "date" }
              },
              "type": "nested"
            },
            "match": "buzz_comment"
          }
        },
        {
          "doc_values_string": {
            "mapping": { "index": "not_analyzed", "omit_norms": true, "doc_values": true, "type": "string" },
            "match_pattern": "regex",
            "path_match": "(source.name|user.gender|user.region.*)"
          }
        },
        {
          "doc_values_long": {
            "mapping": { "doc_values": true, "type": "long" },
            "match_pattern": "regex",
            "path_match": "(flash|acount)"
          }
        },
        {
          "other": {
            "mapping": { "index": "not_analyzed", "omit_norms": true, "type": "string" },
            "match": "*",
            "match_mapping_type": "string"
          }
        }
      ],
      "_all": { "enabled": false }
    }
  },
  "aliases": {
    "xiaorui": {}
  }
}
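Dynamic templates are checked in order, and the first one that matches a new field wins. The Python sketch below is a simplified local model for predicting which rule in the template above a field would hit. It handles match, path_match, match_pattern "regex" (a whole-value match) and match_mapping_type, but ignores unmatch/path_unmatch. Its wildcard matching via fnmatch only approximates Elasticsearch's. RULES is a hand-copied subset of the template, and first_matching_rule is a hypothetical helper. A result of None means the field falls through to default dynamic mapping.

```python
import re
from fnmatch import fnmatchcase

# A subset of the dynamic_templates from the template above, kept in the same order.
RULES = [
    {"name": "text", "path_match": "text"},
    {"name": "prod_name", "path_match": "prod_name"},
    {"name": "date", "match": "(created_at|.*date)", "match_pattern": "regex"},
    {"name": "doc_values_long", "path_match": "(flash|acount)", "match_pattern": "regex"},
    {"name": "other", "match": "*", "match_mapping_type": "string"},
]


def _matches(pattern, value, regex):
    # match_pattern "regex": the whole value must match; otherwise a simple wildcard match.
    if regex:
        return re.fullmatch(pattern, value) is not None
    return fnmatchcase(value, pattern)


def first_matching_rule(path, detected_type, rules=RULES):
    """Return the name of the first rule that matches, or None (first match wins)."""
    field = path.rsplit(".", 1)[-1]  # "match" checks the field name, "path_match" the full path
    for rule in rules:
        regex = rule.get("match_pattern") == "regex"
        if "match" in rule and not _matches(rule["match"], field, regex):
            continue
        if "path_match" in rule and not _matches(rule["path_match"], path, regex):
            continue
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        return rule["name"]
    return None


for path, kind in [("cdate", "string"), ("brand", "string"), ("acount", "long"), ("score", "long")]:
    print(path, kind, "->", first_matching_rule(path, kind))
```

In this model the date rule's regex also catches fields such as cdate and idate, because ".*date" matches any name ending in "date".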
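I remain skeptical about modifying large volumes of existing data online in Elasticsearch. We went through a lot of documentation and finally found a reply from the Elasticsearch team in a GitHub issue: they have no good solution for modifying mappings online either… So, for now, don't try to modify mappings online; re-importing the data is the reliable way…
I opened an issue about modifying mappings and templates online in Elasticsearch, https://github.com/elastic/elasticsearch/issues/15524 , and the reply left me rather embarrassed…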
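For the re-import itself, a common route is the _bulk API. Its body is newline-delimited JSON: one action line followed by one source line per document, ending with a newline. The sketch below builds such a body for POST {host}/_bulk using the 1.x-era _type field. bulk_body is a hypothetical helper. Reusing the document's "id" field as _id is an assumption, and the sample document is made up.

```python
import json


def bulk_body(index, doc_type, docs, id_field="id"):
    """Build a newline-delimited _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type}}
        if id_field in doc:
            action["index"]["_id"] = doc[id_field]  # reuse the document's own id, if present
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


docs = [{"id": 1, "brand": "acme", "product_price": 12.0}]  # made-up sample document
print(bulk_body("xiaorui_v2_201507", "ec", docs), end="")
```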
END…