How to modify mapping and template in Elasticsearch

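First, a word on how template and mapping relate. By default, an index inherits its mapping from the template that matches it; if a mapping is set explicitly, the custom mapping takes precedence.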

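A mapping is essentially a description of the fields: for example, that one field is a float, that another needs to be analyzed (tokenized), that another is a date, and whether a field is searchable.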

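A template is, as the name suggests, a template, and it does what a template does: it applies to every index whose name matches a wildcard pattern, and it can give those indices an alias. In other words, xiaorui* covers xiaorui_v2_2013, xiaorui_v2_2014 and so on.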
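To make the wildcard idea concrete, here is a minimal sketch in Python. It only approximates the matching with the standard library's fnmatch (which also understands ? and [], unlike a plain * pattern), and the logstash index name is a made-up example, not something from my cluster.

```python
from fnmatch import fnmatchcase


def matching_indices(pattern, index_names):
    """Return the index names that a template pattern such as "xiaorui*" would cover."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# The last name is hypothetical and only there to show a non-match.
indices = ["xiaorui_v2_2013", "xiaorui_v2_2014", "logstash-2015.12.17"]
covered = matching_indices("xiaorui*", indices)
print(covered)
```

Any index created later with a name starting with xiaorui would fall under the same pattern.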

By default, Elasticsearch analyzes every string field. Put simply, a query_string or match query can hit any of them.

Analyzing every field, however, costs performance and disk space, so we enable analysis only on specific fields.
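As a rough illustration of "analyze only what you need", the sketch below builds a properties dictionary in the legacy string syntax used throughout this post. The helper name build_properties is my own invention for the example, and the field lists are just a small sample.

```python
import json


def build_properties(analyzed_fields, exact_fields, analyzer="ik"):
    """Build a `properties` dict: analyze only the listed fields, keep the rest as exact values."""
    properties = {}
    for name in analyzed_fields:
        # Full-text fields get an analyzer so match/query_string can hit individual terms.
        properties[name] = {"type": "string", "analyzer": analyzer}
    for name in exact_fields:
        # Everything else is indexed as a single untokenized value.
        properties[name] = {"type": "string", "index": "not_analyzed"}
    return properties


props = build_properties(["prod_name", "text"], ["brand", "store"])
print(json.dumps(props, indent=2, sort_keys=True))
```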

This article is not very rigorous, so criticism is welcome. It may also be updated later; please check the original address for updates.

http://xiaorui.cc/2015/12/17/elasticsearch%E5%A6%82%E4%BD%95%E4%BF%AE%E6%94%B9mapping%E5%92%8Ctemplate%E7%9A%84%E6%96%B9%E6%B3%95/

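A side note: I searched through quite a few articles, and plenty of people are tinkering with modifying a mapping online, on the fly. In the end, though, we went with the plainest approach: clear the data, modify the mapping, then reload the data. This is also why, in our use cases, we never treat Elasticsearch as the core data store. As my company's so-called part-time chief DBA (that's me), I always like to compare Elasticsearch with MySQL. If only Elasticsearch had mature tooling, permission management, and something like Percona's pt-online-schema-change for changing fields online... dream on.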

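Before modifying a mapping, clear out all the data. Otherwise, even if you PUT the new mapping, the existing data will not be corrected, and data indexed afterwards will follow the storage type of the old data. Back when I used ELK for log collection, I turned each log line into a JSON document. If a field in the first document was not built as a float, the only way to get it displayed properly as a float was to wipe that index's data. The same holds now that we use Elasticsearch in our business systems.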
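One way to avoid the "first document decides the type" trap when no explicit mapping exists is to coerce numeric fields before serializing. This is only a sketch: coerce_floats is a hypothetical helper, and the sample document is invented for the example.

```python
import json


def coerce_floats(doc, float_fields):
    """Return a copy of doc with the listed fields converted to float."""
    fixed = dict(doc)
    for field in float_fields:
        if field in fixed and fixed[field] is not None:
            fixed[field] = float(fixed[field])
    return fixed


# An integer 20 would serialize as `20`; after coercion it serializes as `20.0`,
# so the value looks like a floating-point number from the very first document.
first_doc = coerce_floats({"id": 1, "product_price": 20}, ["product_price"])
payload = json.dumps(first_doc)
print(payload)
```

An explicit mapping, as shown later in this post, is still the more dependable fix; the coercion just keeps the producer side consistent.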

Delete the data. Be careful not to delete the wrong thing: it is best to put the type after the index!

http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec  DELETE

Check your mapping. Since the data has been cleared, you should see an empty dictionary: {}

http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec/_mapping   GET
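The two calls above can be prepared from Python with nothing but the standard library. The sketch below only builds the requests and does not send them; build_request is a helper name I made up, and the commented-out lines show how you might actually fire them.

```python
import urllib.request

BASE_URL = "http://es.xiaorui.cc:9111"  # host and port as used in this post


def build_request(method, path):
    """Prepare (but do not send) an HTTP request against the cluster."""
    return urllib.request.Request(BASE_URL + path, method=method)


delete_req = build_request("DELETE", "/xiaorui_v2_201507/ec")
mapping_req = build_request("GET", "/xiaorui_v2_201507/ec/_mapping")

# Sending is left commented out on purpose: the DELETE wipes data, so
# double-check the index and type in the path before running it.
# urllib.request.urlopen(delete_req)
# print(urllib.request.urlopen(mapping_req).read().decode("utf-8"))
```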

Next, change the prod_name field in the mapping to analyzed mode. You can specify your own analyzer; here I use the ik analyzer. Note that the method is PUT.

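The properties in the request body correspond to the contents of the mapping. Since we want both analysis and highlighting, we set an analyzer and enable term_vector.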

"prod_name": {
	"type": "string",
	"term_vector": "with_positions_offsets",
	"norms": {
		"enabled": false
	},
	"analyzer": "ik",
	"include_in_all": false
},

Update the mapping

http://es.xiaorui.cc:9111/xiaorui_v2_201507/ec/_mapping PUT

not_analyzed means the field is not analyzed (tokenized).


{
	"properties": {
		"brand": {
			"type": "string",
			"index": "not_analyzed"
		},
		"cdate": {
			"type": "date",
			"format": "dateOptionalTime"
		},
		"distr_pan": {
			"type": "nested",
			"properties": {
				"k": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				},
				"v": {
					"type": "float",
					"doc_values": true
				}
			}
		},
		"distr_segment": {
			"type": "nested",
			"properties": {
				"k": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				},
				"v": {
					"type": "long",
					"doc_values": true
				}
			}
		},
		"id": {
			"type": "long"
		},
		"idate": {
			"type": "date",
			"format": "dateOptionalTime"
		},
		"keyword": {
			"type": "string",
			"index": "not_analyzed"
		},
		"platform": {
			"type": "long"
		},
		"prod_name": {
			"type": "string",
			"term_vector": "with_positions_offsets",
			"norms": {
				"enabled": false
			},
			"analyzer": "ik",
			"include_in_all": false
		},
		"product_id": {
			"type": "string",
			"index": "not_analyzed"
		},
		"product_price": {
			"type": "double"
		},
		"query": {
			"properties": {
				"query_string": {
					"properties": {
						"fields": {
							"type": "string",
							"index": "not_analyzed"
						},
						"query": {
							"type": "string",
							"index": "not_analyzed"
						}
					}
				}
			}
		},
		"score": {
			"type": "long"
		},
		"source": {
			"properties": {
				"name": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				},
				"url": {
					"type": "string",
					"index": "not_analyzed",
					"doc_values": true
				}
			}
		},
		"store": {
			"type": "string",
			"index": "not_analyzed"
		},
		"text": {
			"type": "string",
			"term_vector": "with_positions_offsets",
			"norms": {
				"enabled": false
			},
			"analyzer": "ik",
			"include_in_all": false
		}
	}

}
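For completeness, here is a hedged sketch of how the mapping update could be prepared from Python. It uses only two of the fields from the body above to stay short, the helper name build_put_mapping is my own, and the request is built but not sent.

```python
import json
import urllib.request


def build_put_mapping(base_url, index, doc_type, properties):
    """Prepare (but do not send) a PUT to /<index>/<type>/_mapping."""
    url = "{}/{}/{}/_mapping".format(base_url, index, doc_type)
    body = json.dumps({"properties": properties}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A trimmed-down version of the mapping shown above.
properties = {
    "brand": {"type": "string", "index": "not_analyzed"},
    "prod_name": {
        "type": "string",
        "term_vector": "with_positions_offsets",
        "norms": {"enabled": False},
        "analyzer": "ik",
        "include_in_all": False,
    },
}
put_req = build_put_mapping("http://es.xiaorui.cc:9111", "xiaorui_v2_201507", "ec", properties)

# To send it: urllib.request.urlopen(put_req)
```

Building the body from a Python dict and letting json.dumps serialize it avoids the quoting mistakes that creep in when JSON is pasted by hand.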

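We also need to change the template. The steps above make sure the existing index uses the mapping we specified, but what about new indices? For those, we can define a template.
We also need to specify the index alias in the template.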

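Some people don't see the point: /index/type is enough, so why give the index an alias? For better data management, for better performance, and to reduce the machine load that queries cause.
These days most aliased indices are split by time, for example xiaorui_v2_201401, xiaorui_v2_201402, xiaorui_v2_201511. Each month gets its own index, and we configure 30 shards for each monthly index; the speed-up is very noticeable.
In ELK, Logstash's default configuration splits logs into one index per day.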
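The time-based naming is easy to reproduce on the writer side. The sketch below derives a monthly index name in the style used here, plus a daily name in the logstash-YYYY.MM.dd style; both function names are hypothetical helpers for this example.

```python
from datetime import date


def monthly_index_name(day, prefix="xiaorui_v2_"):
    """Name of the monthly index a document dated `day` belongs to."""
    return prefix + day.strftime("%Y%m")


def daily_index_name(day, prefix="logstash-"):
    """Name of a per-day index in the logstash-YYYY.MM.dd style."""
    return prefix + day.strftime("%Y.%m.%d")


monthly = monthly_index_name(date(2015, 11, 3))
daily = daily_index_name(date(2015, 11, 3))
print(monthly, daily)
```

Because every generated name starts with the same prefix, a single wildcard template and a single alias can cover all of them.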

http://es.xiaorui.cc:9111/_template/xiaorui PUT


{
	"order": 0,
	"template": "xiaorui*",
	"settings": {
		"index.replication": "async",
		"index.number_of_replicas": "1",
		"index.number_of_shards": "30"
	},
	"mappings": {
		"_default_": {
			"dynamic_templates": [{
				"text": {
					"mapping": {
						"indexAnalyzer": "ik",
						"searchAnalyzer": "ik",
						"include_in_all": false,
						"omit_norms": true,
						"store": "no",
						"norms": {
							"enabled": false
						},
						"type": "string",
						"term_vector": "with_positions_offsets"
					},
					"path_match": "text"
				}
			}, {
				"prod_name": {
					"mapping": {
						"indexAnalyzer": "ik",
						"searchAnalyzer": "ik",
						"include_in_all": false,
						"omit_norms": true,
						"store": "no",
						"norms": {
							"enabled": false
						},
						"type": "string",
						"term_vector": "with_positions_offsets"
					},
					"path_match": "prod_name"
				}
			}, {
				"date": {
					"mapping": {
						"format": "dateOptionalTime",
						"type": "date"
					},
					"match": "(created_at|.*date)",
					"match_pattern": "regex"
				}
			}, {
				"source": {
					"mapping": {
						"properties": {
							"name": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							},
							"url": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						}
					},
					"path_match": "source"
				}
			}, {
				"distr_pan": {
					"mapping": {
						"properties": {
							"v": {
								"doc_values": true,
								"type": "float"
							},
							"k": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						},
						"type": "nested"
					},
					"match": "distr_pan"
				}
			}, {
				"distr_artificial_pan": {
					"mapping": {
						"properties": {
							"v": {
								"doc_values": true,
								"type": "float"
							},
							"k": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						},
						"type": "nested"
					},
					"match": "distr_artificial_pan"
				}
			}, {
				"distr_segment": {
					"mapping": {
						"properties": {
							"v": {
								"doc_values": true,
								"type": "long"
							},
							"k": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							}
						},
						"type": "nested"
					},
					"match": "distr_segment"
				}
			}, {
				"buzz_comment": {
					"mapping": {
						"properties": {
							"v": {
								"index": "not_analyzed",
								"doc_values": true,
								"type": "string"
							},
							"k": {
								"format": "dateOptionalTime",
								"type": "date"
							}
						},
						"type": "nested"
					},
					"match": "buzz_comment"
				}
			}, {
				"doc_values_string": {
					"mapping": {
						"index": "not_analyzed",
						"omit_norms": true,
						"doc_values": true,
						"type": "string"
					},
					"match_pattern": "regex",
					"path_match": "(source.name|user.gender|user.region.*)"
				}
			}, {
				"doc_values_long": {
					"mapping": {
						"doc_values": true,
						"type": "long"
					},
					"match_pattern": "regex",
					"path_match": "(flash|acount)"
				}
			}, {
				"other": {
					"mapping": {
						"index": "not_analyzed",
						"omit_norms": true,
						"type": "string"
					},
					"match": "*",
					"match_mapping_type": "string"
				}
			}],
			"_all": {
				"enabled": false
			}
		}
	},
	"aliases": {
		"xiaorui": {}
	}
}
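A sketch of how the template above might be registered from Python follows. The body is heavily trimmed (no dynamic_templates) to keep the example readable, build_put_template is a helper name I made up, and the request is prepared but not sent.

```python
import json
import urllib.request


def build_put_template(base_url, name, template_body):
    """Prepare (but do not send) a PUT to /_template/<name>."""
    url = "{}/_template/{}".format(base_url, name)
    data = json.dumps(template_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Trimmed version of the template above: pattern, settings, and alias only.
template_body = {
    "order": 0,
    "template": "xiaorui*",
    "settings": {
        "index.number_of_replicas": "1",
        "index.number_of_shards": "30",
    },
    "mappings": {"_default_": {"_all": {"enabled": False}}},
    "aliases": {"xiaorui": {}},
}
template_req = build_put_template("http://es.xiaorui.cc:9111", "xiaorui", template_body)

# To send it: urllib.request.urlopen(template_req)
```

A template only affects indices created after it is registered, which is why the existing index still needed the explicit mapping update earlier.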

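I am skeptical about modifying large volumes of existing data in Elasticsearch directly online. We went through a lot of documentation and finally found a reply from the Elasticsearch team in a GitHub issue. They don't have a good way to modify a mapping online either. So for now, don't try to modify mappings online; reloading the data is more reliable.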

I opened an issue about modifying mapping and template online in Elasticsearch, https://github.com/elastic/elasticsearch/issues/15524, and the reply left me rather embarrassed.
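Since reloading is the route I ended up taking, here is a minimal sketch of preparing the reload as a bulk request body. It assumes the documents come from some primary store (MySQL in our case); the two sample documents are invented, build_bulk_body is a hypothetical helper, and the _type field reflects the Elasticsearch versions of this post's era.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample rows standing in for data re-read from the primary store.
docs = [
    {"id": 1, "prod_name": "phone", "product_price": 1999.0},
    {"id": 2, "prod_name": "laptop", "product_price": 5999.0},
]
bulk_body = build_bulk_body("xiaorui_v2_201507", "ec", docs)
print(bulk_body)
```

The resulting string would be POSTed to the _bulk endpoint after the new mapping is in place, so every document is indexed under the corrected field types.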

Original post: http://xiaorui.cc/?p=2493

END…
