How to modify mapping and template in Elasticsearch


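First, a word on how template and mapping relate: by default an index inherits its mapping from the matching template, but if you define a mapping explicitly, your custom mapping is used.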

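A mapping is essentially a description of the fields: for example, that one field is a float, another needs to be analyzed (tokenized), another is a date, and whether a field is searchable.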

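A template is, as the name suggests, a template, and it does what a template does. It applies to every index whose name matches a wildcard pattern and can attach an alias to those indices, so that xiaorui* covers xiaorui_v2_2013, xiaorui_v2_2014, and so on.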
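To make the wildcard idea concrete, here is a minimal sketch in Python. It only approximates how a template pattern selects indices: it uses the standard library's fnmatchcase, which also treats ? and [ ] as special characters, whereas the pattern in this post only uses *. The third index name is a made-up example for contrast.

```python
from fnmatch import fnmatchcase


def indices_matching(pattern, index_names):
    """Return the index names that a template pattern such as 'xiaorui*' would cover."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# The first two names come from the post; the third is a hypothetical non-matching index.
names = ["xiaorui_v2_2013", "xiaorui_v2_2014", "logstash-2015.12.17"]
matched = indices_matching("xiaorui*", names)
print(matched)
```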

By default Elasticsearch analyzes every string field. Put simply, query_string and match queries can hit any of them.

But analyzing every field costs performance and disk space, so we only enable analysis on specific fields.
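As a rough illustration of "analyze only what you need", the sketch below builds a mapping body in Python where a chosen list of string fields is analyzed and the rest are stored as exact values. The helper name build_properties is hypothetical; the field names and the "string" / "not_analyzed" / "ik" settings are taken from the mapping shown later in this post, which targets the older Elasticsearch versions in use at the time.

```python
import json


def build_properties(analyzed_fields, exact_fields, analyzer="ik"):
    """Build a mapping body: only analyzed_fields are tokenized, exact_fields are not."""
    properties = {}
    for name in exact_fields:
        # Exact-value string: searchable as a whole, never tokenized.
        properties[name] = {"type": "string", "index": "not_analyzed"}
    for name in analyzed_fields:
        # Full-text string: tokenized with the given analyzer.
        properties[name] = {"type": "string", "analyzer": analyzer}
    return {"properties": properties}


mapping = build_properties(["prod_name", "text"], ["brand", "keyword"])
print(json.dumps(mapping, indent=2, sort_keys=True))
```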


This article is not very rigorous, so criticism is welcome. It may also be updated later; please check the original address for updates.

http://xiaorui.cc/2015/12/17/elasticsearch%E5%A6%82%E4%BD%95%E4%BF%AE%E6%94%B9mapping%E5%92%8Ctemplate%E7%9A%84%E6%96%B9%E6%B3%95/

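As an aside: I read quite a few articles, and many people have tinkered with changing a mapping dynamically online, but in the end they all fall back to the plainest approach: clear the data, change the mapping, and load the data again. This is also why, in our use of Elasticsearch, we never treat it as the core data store. As my company's so-called part-time chief DBA (that would be me), I always like to compare Elasticsearch with MySQL. If only Elasticsearch had mature tooling, permission management, and something like Percona's pt-online-schema-change for altering fields online... dream on.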

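Before changing a mapping, clear out all the data. Otherwise, even if you PUT a new mapping, the existing data is not corrected, and data written afterwards follows the storage type of the old data. When I used ELK for log collection, I built each log line into a JSON document. If a field in the first document was not built as a float, the only way to get it displayed as a float was to wipe that index's data. Using Elasticsearch in our business today, it is the same story.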

Delete the data. Be careful not to delete the wrong thing: it is best to put the type after the index!

http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec  DELETE

Check your mapping. Since the data has been cleared, you should see an empty dictionary: {}

http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec/_mapping   GET
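The two calls above (DELETE on the type, then GET on its _mapping) can be sketched with Python's standard library as follows. This is only an illustration: the helper name type_request is made up, and the requests are built but deliberately not sent, since deleting data is destructive.

```python
import urllib.request

BASE = "http://es.xiaorui.cc:9111"  # host and port as used in this post


def type_request(index, doc_type, method, suffix=""):
    """Build (but do not send) an HTTP request against /index/type."""
    url = "{}/{}/{}{}".format(BASE, index, doc_type, suffix)
    return urllib.request.Request(url, method=method)


delete_req = type_request("xiaorui_v2_201507", "ec", "DELETE")
get_mapping_req = type_request("xiaorui_v2_201507", "ec", "GET", "/_mapping")

# To actually run a request you would call urllib.request.urlopen(delete_req).
print(delete_req.get_method(), delete_req.full_url)
print(get_mapping_req.get_method(), get_mapping_req.full_url)
```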

Next we change the prod_name field in the mapping to analyzed mode. You can specify your analyzer; here it is the ik analyzer. Note that the method is PUT.

The properties in the request body correspond to the contents of the mapping. To support analysis and highlighting, set the analyzer and enable term_vector.

"prod_name": {
	"type": "string",
	"term_vector": "with_positions_offsets",
	"norms": {
		"enabled": false
	},
	"analyzer": "ik",
	"include_in_all": false
},

Update the mapping

http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec/_mapping   PUT

not_analyzed means the field is not analyzed (not tokenized).


{
	"properties": {
		"brand": {
			"type": "string",
			"index": "not_analyzed"
		},
		"cdate": {
			"type": "date",
			"format": "dateOptionalTime"
		},
		"distr_pan": {
			"type": "nested",
			"properties": {
				"k": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				},
				"v": {
					"type": "float",
					"doc_values": true
				}
			}
		},
		"distr_segment": {
			"type": "nested",
			"properties": {
				"k": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				},
				"v": {
					"type": "long",
					"doc_values": true
				}
			}
		},
		"id": {
			"type": "long"
		},
		"idate": {
			"type": "date",
			"format": "dateOptionalTime"
		},
		"keyword": {
			"type": "string",
			"index": "not_analyzed"
		},
		"platform": {
			"type": "long"
		},
		"prod_name": {
			"type": "string",
			"term_vector": "with_positions_offsets",
			"norms": {
				"enabled": false
			},
			"analyzer": "ik",
			"include_in_all": false
		},
		"product_id": {
			"type": "string",
			"index": "not_analyzed"
		},
		"product_price": {
			"type": "double"
		},
		"query": {
			"properties": {
				"query_string": {
					"properties": {
						"fields": {
							"type": "string",
							"index": "not_analyzed"
						},
						"query": {
							"type": "string",
							"index": "not_analyzed"
						}
					}
				}
			}
		},
		"score": {
			"type": "long"
		},
		"source": {
			"properties": {
				"name": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				},
				"url": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				}
			}
		},
		"store": {
			"type": "string",
			"index": "not_analyzed"
		},
		"text": {
			"type": "string",
			"term_vector": "with_positions_offsets",
			"norms": {
				"enabled": false
			},
			"analyzer": "ik",
			"include_in_all": false
		}
	}

}
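For reference, sending a body like the one above with PUT could look roughly like this in Python. It is a sketch under assumptions: put_mapping_request is a hypothetical helper, only two fields from the full mapping are included to keep it short, and the request is built without being sent.

```python
import json
import urllib.request


def put_mapping_request(base_url, index, doc_type, mapping):
    """Build (but do not send) a PUT /index/type/_mapping request with a JSON body."""
    url = "{}/{}/{}/_mapping".format(base_url, index, doc_type)
    body = json.dumps(mapping).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A shortened version of the mapping above: one exact field, one analyzed field.
mapping = {
    "properties": {
        "brand": {"type": "string", "index": "not_analyzed"},
        "prod_name": {
            "type": "string",
            "term_vector": "with_positions_offsets",
            "norms": {"enabled": False},
            "analyzer": "ik",
            "include_in_all": False,
        },
    }
}

req = put_mapping_request("http://es.xiaorui.cc:9111", "xiaorui_v2_201507", "ec", mapping)
# To actually apply it you would call urllib.request.urlopen(req).
print(req.get_method(), req.full_url)
```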

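Next we also need to change the template. The steps above make sure the existing index uses the mapping we specified, but what about new indices? For those you can define a template.
We need to specify the index alias in the template.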

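Some people do not see the point: a single /index/type would do, so why design an alias over indices? For better data management, better performance, and less machine load from queries.
Most setups today split the indices behind an alias by time, for example xiaorui_v2_201401, xiaorui_v2_201402, xiaorui_v2_201511. Each month is one index, and we configure 30 shards for each monthly index; the speedup is noticeable.
The default ELK Logstash configuration splits logs into one index per day.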
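A small sketch of how such time-based index names can be generated on the writing side. The helper names are hypothetical; the monthly format follows the xiaorui_v2_YYYYMM names above, and the daily format follows Logstash's default logstash-YYYY.MM.dd pattern.

```python
from datetime import date


def monthly_index(prefix, day):
    """One index per month, e.g. xiaorui_v2_201511."""
    return "{}_{:04d}{:02d}".format(prefix, day.year, day.month)


def daily_index(prefix, day):
    """One index per day, in the style of Logstash's default naming."""
    return "{}-{:04d}.{:02d}.{:02d}".format(prefix, day.year, day.month, day.day)


print(monthly_index("xiaorui_v2", date(2015, 11, 3)))
print(daily_index("logstash", date(2015, 12, 17)))
```

Because every generated monthly name starts with xiaorui, each new index is picked up by a template whose pattern is xiaorui*.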

http://es.xiaorui.cc:9111/_template/xiaorui   PUT


{
	"order": 0,
	"template": "xiaorui*",
	"settings": {
		"index.replication": "async",
		"index.number_of_replicas": "1",
		"index.number_of_shards": "30"
	},
	"mappings": {
		"_default_": {
			"dynamic_templates": [{
				"text": {
					"mapping": {
						"indexAnalyzer": "ik",
						"searchAnalyzer": "ik",
						"include_in_all": false,
						"omit_norms": true,
						"store": "no",
						"norms": {
							"enabled": false
						},
						"type": "string",
						"term_vector": "with_positions_offsets"
					},
					"path_match": "text"
				}
			}, {
				"prod_name": {
					"mapping": {
						"indexAnalyzer": "ik",
						"searchAnalyzer": "ik",
						"include_in_all": false,
						"omit_norms": true,
						"store": "no",
						"norms": {
							"enabled": false
						},
						"type": "string",
						"term_vector": "with_positions_offsets"
					},
					"path_match": "prod_name"
				}
			}, {
				"date": {
					"mapping": {
						"format": "dateOptionalTime",
						"type": "date"
					},
					"match": "(created_at|.*date)",
					"match_pattern": "regex"
				}
			}, {
				"source": {
					"mapping": {
						"properties": {
							"name": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							},
							"url": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						}
					},
					"path_match": "source"
				}
			}, {
				"distr_pan": {
					"mapping": {
						"properties": {
							"v": {
								"doc_values": true,
								"type": "float"
							},
							"k": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						},
						"type": "nested"
					},
					"match": "distr_pan"
				}
			}, {
				"distr_artificial_pan": {
					"mapping": {
						"properties": {
							"v": {
								"doc_values": true,
								"type": "float"
							},
							"k": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						},
						"type": "nested"
					},
					"match": "distr_artificial_pan"
				}
			}, {
				"distr_segment": {
					"mapping": {
						"properties": {
							"v": {
								"doc_values": true,
								"type": "long"
							},
							"k": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						},
						"type": "nested"
					},
					"match": "distr_segment"
				}
			}, {
				"buzz_comment": {
					"mapping": {
						"properties": {
							"v": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							},
							"k": {
								"format": "dateOptionalTime",
								"type": "date"
							}
						},
						"type": "nested"
					},
					"match": "buzz_comment"
				}
			}, {
				"doc_values_string": {
					"mapping": {
						"index": "not_analyzed",
						"omit_norms": true,
						"doc_values": true,
						"type": "string"
					},
					"match_pattern": "regex",
					"path_match": "(source.name|user.gender|user.region.*)"
				}
			}, {
				"doc_values_long": {
					"mapping": {
						"doc_values": true,
						"type": "long"
					},
					"match_pattern": "regex",
					"path_match": "(flash|acount)"
				}
			}, {
				"other": {
					"mapping": {
						"index": "not_analyzed",
						"omit_norms": true,
						"type": "string"
					},
					"match": "*",
					"match_mapping_type": "string"
				}
			}],
			"_all": {
				"enabled": false
			}
		}
	},
	"aliases": {
		"xiaorui": {}
	}
}
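Dynamic templates are tried in the order they are listed, and the first one that matches a new field wins. The sketch below simulates that for a few of the rules above. It is a simplification, not Elasticsearch's own logic: it only handles top-level fields holding string values, it uses fnmatchcase for the wildcard rules, and it assumes a regex rule must match the whole field name.

```python
import re
from fnmatch import fnmatchcase

# A subset of the rules above, in template order: (rule name, pattern, is_regex).
RULES = [
    ("text", "text", False),
    ("prod_name", "prod_name", False),
    ("date", "(created_at|.*date)", True),
    ("other", "*", False),  # catch-all for remaining string fields
]


def first_matching_rule(field_name):
    """Return the name of the first rule whose pattern matches the field name."""
    for rule_name, pattern, is_regex in RULES:
        if is_regex:
            matched = re.fullmatch(pattern, field_name) is not None
        else:
            matched = fnmatchcase(field_name, pattern)
        if matched:
            return rule_name
    return None


for field in ["prod_name", "cdate", "created_at", "brand"]:
    print(field, "->", first_matching_rule(field))
```

Note that the catch-all rule has to stay last; if it came first, it would shadow every rule after it.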

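I am skeptical about modifying large volumes of existing Elasticsearch data directly online. We went through a lot of documentation and finally saw the Elasticsearch team's reply in a GitHub issue. They do not have a good solution for changing a mapping online either. So for now, do not attempt to change a mapping online; re-importing the data is the reliable way.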
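For the re-import step, the usual route is the _bulk endpoint, whose body is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. Here is a minimal sketch of building that body in Python. The helper name and the sample documents are made up, and it assumes every document carries an id field to use as _id.

```python
import json


def bulk_index_body(index, doc_type, docs):
    """Build a _bulk request body: an action line, then the document, per doc."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents, only for illustration.
docs = [
    {"id": 1, "prod_name": "sample product A", "brand": "brand_a"},
    {"id": 2, "prod_name": "sample product B", "brand": "brand_b"},
]
body = bulk_index_body("xiaorui_v2_201507", "ec", docs)
print(body)
```

The resulting string would be POSTed to /_bulk after the new mapping is in place.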


I opened an issue about changing mapping and template online in Elasticsearch, https://github.com/elastic/elasticsearch/issues/15524 , and the reply left me rather embarrassed.


Original post: http://xiaorui.cc/?p=2493

END…



2 Responses

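  1. 情感唯美, July 1, 2016 / 3:47 PM

    I saved a template_1.json file, but newly created indices did not pick up the mapping from the template.

  2. 情感唯美, July 1, 2016 / 3:46 PM

    How should templates saved under config/templates/ be named and applied?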
