Docker ulimit causing Elasticsearch "Too many open files" errors


Not many companies in China use Elasticsearch heavily on the business side; mine is one of them. Most use the ELK stack for log collection and display. A while back we hit a problem caused by the maximum open-file limit inside a Docker container being set too low, even though we had already configured /etc/security/limits.conf on the Linux host. The problem and its fix are described below.


This article is not written very rigorously, so feel free to criticize. It may also be updated later; please check the original URL for updates.

http://xiaorui.cc/2016/01/05/docker-ulimit%E5%BC%95%E8%B5%B7elasticsearch-too-many-open-files%E6%8A%A5%E9%94%99/


An e-commerce export program of ours fired an alert. After ruling out the nginx TCP proxy, I checked the cluster's state through the Elasticsearch health API. Looking at the Elasticsearch logs, the error message was fortunately quite explicit: Too many open files. Anyone who has worked on high-concurrency services will recognize this error immediately; it is caused by a misconfigured ulimit.

[10:58:24,165][WARN ][cluster.action.shard     ] [node1] received shard failed for [index9][2], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index9][2] failed recovery]; nested: EngineCreationFailureException[[index9][2] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index9/2/index/segments_1 (Too many open files)]; ]] 
[10:58:24,166][WARN ][cluster.action.shard     ] [node1] received shard failed for [index15][0], node[node_hash2], [P], s[INITIALIZING], reason [Failed to create shard, message [IndexShardCreationException[[index15][0] failed to create shard]; nested: IOException[directory '/data/elasticsearch/whatever/nodes/0/indices/index15/0/index' exists and is a directory, but cannot be listed: list() returned null]; ]] 
[10:58:24,195][WARN ][cluster.action.shard     ] [node1] received shard failed for [index16][3], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index16][3] failed recovery]; nested: EngineCreationFailureException[[index16][3] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index16/3/index/segments_1 (Too many open files)]; ]] 
[10:58:24,196][WARN ][cluster.action.shard     ] [node1] received shard failed for [index17][0], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index17][0] failed recovery]; nested: EngineCreationFailureException[[index17][0] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index17/0/index/segments_1 (Too many open files)]; ]] 
[10:58:24,198][WARN ][cluster.action.shard     ] [node1] received shard failed for [index21][4], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index21][4] failed recovery]; nested: EngineCreationFailureException[[index21][4] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /data/elasticsearch/whatever/nodes/0/indices/index21/4/index/write.lock]; ]] 
The nodes API confirms what file-descriptor limit the Elasticsearch process actually sees.
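A quick way to check, assuming Elasticsearch listens on localhost:9200 (adjust host and port to your setup): the cluster health endpoint shows the overall state, and in the 1.x series the nodes process endpoint reports max_file_descriptors per node.

# cluster state (green/yellow/red)
curl -s 'http://localhost:9200/_cluster/health?pretty'

# per-node process info, including max_file_descriptors
curl -s 'http://localhost:9200/_nodes/process?pretty'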

Our Elasticsearch runs inside Docker; I have written before about why we chose to run Elasticsearch in Docker. The Elasticsearch data directory, of course, lives on the host filesystem.

Our Linux hosts normally come with the sysctl kernel parameters and ulimit already tuned. If yours are not, configure the host's ulimit first using the steps below.
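On the kernel side, the relevant knob is the system-wide file handle cap, fs.file-max; the value below is only an illustrative assumption, so size it for your own workload:

# check the current system-wide limit
sysctl fs.file-max

# raise it persistently, then reload
echo 'fs.file-max = 2000000' >> /etc/sysctl.conf
sysctl -p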

First, run ulimit -Hn and ulimit -Sn in your current shell to see the hard and soft limits on the number of files the current user may open.
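For example (the values shown are the typical defaults on many distros, an assumption; yours may differ):

$ ulimit -Hn    # hard limit on open files
4096
$ ulimit -Sn    # soft limit on open files
1024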

If they are too small, raise them. Open /etc/security/limits.conf (the user that starts Elasticsearch is elasticsearch) and add:
elasticsearch soft nofile 32000
elasticsearch hard nofile 32000

Log in again as the elasticsearch user and run ulimit -Hn and ulimit -Sn to verify that the settings took effect; if they did, restart Elasticsearch.
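A minimal way to verify without a full re-login, assuming pam_limits is enabled for su (it is on most distros; if the new values do not show up, check for "session required pam_limits.so" in /etc/pam.d/su):

su - elasticsearch -c 'ulimit -Hn; ulimit -Sn'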

Now for the real issue:

Doing only that has no effect: by default Docker does not inherit the host's ulimit configuration, which is a big pitfall. Searching around, I found many people who put haproxy or nginx inside Docker and then hit all sorts of problems because of a misconfigured ulimit.

Our Docker environment here is version 1.6. Docker 1.6 offers two ways to configure a container's ulimit.

Most dockerized services do not need very large nofile or nproc values, but the default of 1024 is too low to live with. You can raise the ulimit globally as shown below. Do not set the global ulimit excessively high, though: some programs are written strangely and will devour resources, and a modest global ulimit still acts as a resource cap to some extent.


1) A global default ulimit, set on the Docker daemon (note the flag is repeated, one limit per occurrence):
docker -d --default-ulimit nproc=1024:2048
docker -d --default-ulimit nofile=20480:40960 --default-ulimit nproc=1024:2048
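These flags have to survive daemon restarts. On an Ubuntu-style Docker 1.6 install this usually means /etc/default/docker (an assumption about your init setup; systemd hosts would use a unit drop-in instead):

# /etc/default/docker
DOCKER_OPTS="--default-ulimit nofile=20480:40960 --default-ulimit nproc=1024:2048"

# then restart the daemon
service docker restart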

2) Or configure the ulimit per container, for one particular service (again, one --ulimit flag per limit):
docker run -d --ulimit nofile=20480:40960 --ulimit nproc=1024:2048 xxx
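To check that the limit actually lands inside the container, a quick sketch (busybox is used here only as a throwaway test image):

docker run --rm --ulimit nofile=20480:40960 busybox sh -c 'ulimit -n'
# should print 20480 (the soft limit)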

One more aside: I have spent the last couple of days working on a dockerized Elasticsearch cluster setup (Dockerfile)... I will share it once it is cleaned up.


