Docker ulimit causing Elasticsearch "Too many open files" errors


Not many companies in China use Elasticsearch heavily on the business side; mine is one of them. Most use the ELK stack for log collection and display. A while back we hit a problem caused by the maximum open-file limit inside a Docker container being set too low, even though we had already configured /etc/security/limits.conf on the Linux host. The problem and its fix are described below.


This article is not written very rigorously, so feel free to criticize. It may also be updated later; please check the original URL for updates.

http://xiaorui.cc/2016/01/05/docker-ulimit%E5%BC%95%E8%B5%B7elasticsearch-too-many-open-files%E6%8A%A5%E9%94%99/


An e-commerce export program of ours fired an alert. After ruling out the nginx TCP proxy, I checked the cluster's state through the Elasticsearch health API. Looking at the Elasticsearch logs, the error message was fortunately quite explicit: Too many open files. Anyone who has worked on high-concurrency services will recognize this error immediately; it is caused by a misconfigured ulimit.

[10:58:24,165][WARN ][cluster.action.shard     ] [node1] received shard failed for [index9][2], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index9][2] failed recovery]; nested: EngineCreationFailureException[[index9][2] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index9/2/index/segments_1 (Too many open files)]; ]] 
[10:58:24,166][WARN ][cluster.action.shard     ] [node1] received shard failed for [index15][0], node[node_hash2], [P], s[INITIALIZING], reason [Failed to create shard, message [IndexShardCreationException[[index15][0] failed to create shard]; nested: IOException[directory '/data/elasticsearch/whatever/nodes/0/indices/index15/0/index' exists and is a directory, but cannot be listed: list() returned null]; ]] 
[10:58:24,195][WARN ][cluster.action.shard     ] [node1] received shard failed for [index16][3], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index16][3] failed recovery]; nested: EngineCreationFailureException[[index16][3] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index16/3/index/segments_1 (Too many open files)]; ]] 
[10:58:24,196][WARN ][cluster.action.shard     ] [node1] received shard failed for [index17][0], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index17][0] failed recovery]; nested: EngineCreationFailureException[[index17][0] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index17/0/index/segments_1 (Too many open files)]; ]] 
[10:58:24,198][WARN ][cluster.action.shard     ] [node1] received shard failed for [index21][4], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index21][4] failed recovery]; nested: EngineCreationFailureException[[index21][4] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /data/elasticsearch/whatever/nodes/0/indices/index21/4/index/write.lock]; ]] 
The nodes API confirms what file-descriptor limit the Elasticsearch process actually sees.
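A quick way to check, assuming Elasticsearch listens on localhost:9200 (adjust host and port to your setup): the cluster health endpoint shows the overall state, and in the 1.x series the nodes process endpoint reports max_file_descriptors per node.

# cluster state (green/yellow/red)
curl -s 'http://localhost:9200/_cluster/health?pretty'

# per-node process info, including max_file_descriptors
curl -s 'http://localhost:9200/_nodes/process?pretty'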

Our Elasticsearch runs inside Docker; I have written before about why we chose to run Elasticsearch in Docker. The Elasticsearch data directory, of course, lives on the host filesystem.

Our Linux hosts normally come with the sysctl kernel parameters and ulimit already tuned. If yours are not, configure the host's ulimit first using the steps below.
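On the kernel side, the relevant knob is the system-wide file handle cap, fs.file-max; the value below is only an illustrative assumption, so size it for your own workload:

# check the current system-wide limit
sysctl fs.file-max

# raise it persistently, then reload
echo 'fs.file-max = 2000000' >> /etc/sysctl.conf
sysctl -p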

First, run ulimit -Hn and ulimit -Sn in your current shell to see the hard and soft limits on the number of files the current user may open.
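For example (the values shown are the typical defaults on many distros, an assumption; yours may differ):

$ ulimit -Hn    # hard limit on open files
4096
$ ulimit -Sn    # soft limit on open files
1024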

If they are too small, raise them. Open /etc/security/limits.conf (the user that starts Elasticsearch is elasticsearch) and add:
elasticsearch soft nofile 32000
elasticsearch hard nofile 32000

Log in again as the elasticsearch user and run ulimit -Hn and ulimit -Sn to verify that the settings took effect; if they did, restart Elasticsearch.
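A minimal way to verify without a full re-login, assuming pam_limits is enabled for su (it is on most distros; if the new values do not show up, check for "session required pam_limits.so" in /etc/pam.d/su):

su - elasticsearch -c 'ulimit -Hn; ulimit -Sn'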

Now for the real issue:

Doing only that has no effect: by default Docker does not inherit the host's ulimit configuration, which is a big pitfall. Searching around, I found many people who put haproxy or nginx inside Docker and then hit all sorts of problems because of a misconfigured ulimit.

Our Docker environment here is version 1.6. Docker 1.6 offers two ways to configure a container's ulimit.

Most dockerized services do not need very large nofile or nproc values, but the default of 1024 is too low to live with. You can raise the ulimit globally as shown below. Do not set the global ulimit excessively high, though: some programs are written strangely and will devour resources, and a modest global ulimit still acts as a resource cap to some extent.


1) A global default ulimit, set on the Docker daemon (note the flag is repeated, one limit per occurrence):
docker -d --default-ulimit nproc=1024:2048
docker -d --default-ulimit nofile=20480:40960 --default-ulimit nproc=1024:2048
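These flags have to survive daemon restarts. On an Ubuntu-style Docker 1.6 install this usually means /etc/default/docker (an assumption about your init setup; systemd hosts would use a unit drop-in instead):

# /etc/default/docker
DOCKER_OPTS="--default-ulimit nofile=20480:40960 --default-ulimit nproc=1024:2048"

# then restart the daemon
service docker restart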

2) Or configure the ulimit per container, for one particular service (again, one --ulimit flag per limit):
docker run -d --ulimit nofile=20480:40960 --ulimit nproc=1024:2048 xxx
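To check that the limit actually lands inside the container, a quick sketch (busybox is used here only as a throwaway test image):

docker run --rm --ulimit nofile=20480:40960 busybox sh -c 'ulimit -n'
# should print 20480 (the soft limit)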

One more aside: I have spent the last couple of days working on a dockerized Elasticsearch cluster setup (Dockerfile)... I will share it once it is cleaned up.


