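As soon as I got to work today, Zhu Wei told me that a flood of alert emails had come in about the logstash redis queue backing up. The backlog was sizable: it had already piled up to 1,000,000 entries. This was a real headache. I had actually run into this problem before; at the time I had just done a redis upgrade and assumed that was the cause, so I fixed it by restarting the logstash server. Later it happened again, that is, logstash stopped working. So today I'm going to track this problem down.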
[ruifengyun@bj-log-1 ~]$ redis-cli -c llen key_count
(integer) 2250942
[ruifengyun@bj-log-1 ~]$ redis-cli -c llen key_count
(integer) 2250983
[ruifengyun@bj-log-1 ~]$ redis-cli -c llen key_count
(integer) 2251359
[ruifengyun@bj-log-1 ~]$ redis-cli -c llen key_count
(integer) 2251281
[ruifengyun@bj-log-1 ~]$ redis-cli -c llen key_count
(integer) 2251312
The queue length keeps growing, yet the logstash processes are still alive.
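To make "keeps growing" concrete, here is a minimal sketch that turns a pasted `redis-cli llen` transcript into per-sample deltas. The helper names `parse_llen_samples` and `summarize_growth` are made up for illustration. The interval between the samples was not recorded, so no per-second rate is computed.

```python
import re

def parse_llen_samples(transcript):
    """Extract the integers from redis-cli replies of the form '(integer) N'."""
    return [int(n) for n in re.findall(r"\(integer\)\s+(\d+)", transcript)]

def summarize_growth(samples):
    """Return the delta between consecutive samples and the net change overall."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    net = samples[-1] - samples[0] if samples else 0
    return deltas, net

# The five replies from the transcript above.
transcript = """
(integer) 2250942
(integer) 2250983
(integer) 2251359
(integer) 2251281
(integer) 2251312
"""

samples = parse_llen_samples(transcript)
deltas, net = summarize_growth(samples)
print(deltas, net)  # [41, 376, -78, 31] 370
```

The single negative delta hints that something may still be popping entries off the list, only more slowly than producers push them. That is a guess from five samples, not a confirmed finding.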
[ruifengyun@bj-log-1 ~]$ ps uax|grep logstash|grep -v grep
503 5905 53.3 0.9 4472976 935488 pts/1 Sl Apr24 3606:37 /usr/java/jdk1.8.0_25/bin/java -Xmx500m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -jar /opt/logstash-1.4.2/vendor/jar/jruby-complete-1.7.11.jar -I/opt/logstash-1.4.2/lib /opt/logstash-1.4.2/lib/logstash/runner.rb agent -f /opt/logstash-1.4.2/logstash.conf
503 6020 51.1 0.9 4276360 931504 pts/1 Sl Apr24 3458:11 /usr/java/jdk1.8.0_25/bin/java -Xmx500m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -jar /opt/logstash-1.4.2/vendor/jar/jruby-complete-1.7.11.jar -I/opt/logstash-1.4.2/lib /opt/logstash-1.4.2/lib/logstash/runner.rb agent -f /opt/logstash-1.4.2/logstash.conf
[ruifengyun@bj-log-1 ~]$
Let's look at the state of the logstash process, using strace to trace its system calls.
[ruifengyun@bj-log-1 ~]$ sudo strace -p 5905
Process 5905 attached - interrupt to quit
futex(0x7fe1219599d0, FUTEX_WAIT, 5914, NULL
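One caveat about this trace: `strace -p PID` without `-f` attaches only to the thread whose ID equals the PID. A lone `futex(..., FUTEX_WAIT, ...)` is what a thread parked waiting on another thread looks like, so it may say little about what the JVM's worker threads are doing. Running `strace -f -p 5905` follows every thread. As a lighter first look, the sketch below lists the state of each thread from `/proc`. The function name `thread_states` and its `proc_root` parameter are hypothetical, and it assumes a Linux `/proc` layout.

```python
import os

def thread_states(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to the 'State:' value from its status file."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = {}
    for tid in sorted(os.listdir(task_dir), key=int):
        status_path = os.path.join(task_dir, tid, "status")
        try:
            with open(status_path, encoding="utf-8") as f:
                for line in f:
                    if line.startswith("State:"):
                        states[int(tid)] = line.split(":", 1)[1].strip()
                        break
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listdir() and the read; skip it.
            continue
    return states

# Demo on the current process; only runs where a Linux-style /proc exists.
if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    for tid, state in thread_states(os.getpid()).items():
        print(tid, state)
```

Calling `thread_states(5905)` on the log server would show whether all threads are sleeping or whether some are still running, which narrows down where to point strace next.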
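With lsof I saw a large number of connections to elasticsearch in the CLOSE_WAIT state. I checked the system's sysctl.conf, and the TCP tuning had already been applied there, but the problem persisted.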
java 5905 ruifengyun 3594u IPv6 2236112621 0t0 TCP 192.168.1.50:40662->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3595u IPv6 2236150475 0t0 TCP 192.168.1.50:40667->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3596u IPv6 2236192556 0t0 TCP 192.168.1.50:40673->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3597u IPv6 2236236259 0t0 TCP 192.168.1.50:40680->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3598u IPv6 2236277898 0t0 TCP 192.168.1.50:40685->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3599u IPv6 2236314998 0t0 TCP 192.168.1.50:40690->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3600u IPv6 2236355853 0t0 TCP 192.168.1.50:40698->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3601u IPv6 2236394084 0t0 TCP 192.168.1.50:40702->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3602u IPv6 2236439308 0t0 TCP 192.168.1.50:40710->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3603u IPv6 2236481496 0t0 TCP 192.168.1.50:40717->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3604u IPv6 2236520014 0t0 TCP 192.168.1.50:40722->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3605u IPv6 2236564971 0t0 TCP 192.168.1.50:40728->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3606u IPv6 2236585984 0t0 TCP 192.168.1.50:40735->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3607u IPv6 2236604549 0t0 TCP 192.168.1.50:40743->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3608u IPv6 2236642216 0t0 TCP 192.168.1.50:40759->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3609u IPv6 2236681436 0t0 TCP 192.168.1.50:40772->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3610u IPv6 2236723744 0t0 TCP 192.168.1.50:40789->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3611u IPv6 2236623466 0t0 TCP 192.168.1.50:40751->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3612u IPv6 2236742961 0t0 TCP 192.168.1.50:40797->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3613u IPv6 2236662022 0t0 TCP 192.168.1.50:40764->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3614u IPv6 2236761054 0t0 TCP 192.168.1.50:40805->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3615u IPv6 2236702037 0t0 TCP 192.168.1.50:40780->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3616u IPv6 2236779867 0t0 TCP 192.168.1.50:40810->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3617u IPv6 2236798527 0t0 TCP 192.168.1.50:40818->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3618u IPv6 2236818794 0t0 TCP 192.168.1.50:40826->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3619u IPv6 2236840105 0t0 TCP 192.168.1.50:40835->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3620u IPv6 2236861229 0t0 TCP 192.168.1.50:csccfirewall->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3621u IPv6 2236881917 0t0 TCP 192.168.1.50:40852->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3622u IPv6 2236938623 0t0 TCP 192.168.1.50:40873->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3623u IPv6 2236955560 0t0 TCP 192.168.1.50:40881->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3624u IPv6 2236902415 0t0 TCP 192.168.1.50:40860->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3625u IPv6 2236973880 0t0 TCP 192.168.1.50:40890->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3626u IPv6 2236921262 0t0 TCP 192.168.1.50:40865->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3627u IPv6 2237010942 0t0 TCP 192.168.1.50:40906->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3628u IPv6 2237030179 0t0 TCP 192.168.1.50:40915->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3629u IPv6 2236992291 0t0 TCP 192.168.1.50:40898->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3630u IPv6 2237049196 0t0 TCP 192.168.1.50:40920->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3631u IPv6 2237066370 0t0 TCP 192.168.1.50:40929->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3632u IPv6 2237101082 0t0 TCP 192.168.1.50:40946->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3633u IPv6 2237138192 0t0 TCP 192.168.1.50:40962->192.168.1.103:cslistener (ESTABLISHED)
java 5905 ruifengyun 3634r FIFO 0,8 0t0 2237149520 pipe
java 5905 ruifengyun 3635u IPv6 2237085095 0t0 TCP 192.168.1.50:40937->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3636w FIFO 0,8 0t0 2237149520 pipe
java 5905 ruifengyun 3637u IPv6 2237119814 0t0 TCP 192.168.1.50:40954->192.168.1.103:cslistener (CLOSE_WAIT)
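In this excerpt, 41 of the 42 TCP sockets are in CLOSE_WAIT and only one is ESTABLISHED (`cslistener` is the /etc/services name lsof prints for port 9000). Counting by eye does not scale to thousands of descriptors, so here is a small sketch that tallies lsof output by remote endpoint and state. The name `count_tcp_states` and the regular expression are my own, written against the line format shown above.

```python
import re
from collections import Counter

# Matches the trailing "local->remote (STATE)" part of an lsof TCP line.
TCP_LINE = re.compile(r"TCP\s+(\S+)->(\S+)\s+\((\w+)\)")

def count_tcp_states(lsof_text):
    """Count (remote endpoint, TCP state) pairs; non-TCP lines are ignored."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = TCP_LINE.search(line)
        if match:
            _local, remote, state = match.groups()
            counts[(remote, state)] += 1
    return counts

# Four lines taken from the listing above (two CLOSE_WAIT, one ESTABLISHED, one pipe).
sample = """\
java 5905 ruifengyun 3632u IPv6 2237101082 0t0 TCP 192.168.1.50:40946->192.168.1.103:cslistener (CLOSE_WAIT)
java 5905 ruifengyun 3633u IPv6 2237138192 0t0 TCP 192.168.1.50:40962->192.168.1.103:cslistener (ESTABLISHED)
java 5905 ruifengyun 3634r FIFO 0,8 0t0 2237149520 pipe
java 5905 ruifengyun 3635u IPv6 2237085095 0t0 TCP 192.168.1.50:40937->192.168.1.103:cslistener (CLOSE_WAIT)
"""

for (remote, state), n in count_tcp_states(sample).most_common():
    print(remote, state, n)
```

Feeding it the full output of `lsof -p 5905` at intervals would show whether the CLOSE_WAIT count climbs steadily, which is what a descriptor leak would look like.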
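Later I enabled keepalive on the nginx side, which did improve the CLOSE_WAIT situation somewhat. But the problem still shows up from time to time, which is really painful!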
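It helps to recall what CLOSE_WAIT means: the remote end has closed the connection and the local process has not yet called close() on its socket. The kernel does not time that state out by itself, so sysctl TCP tuning would not be expected to clear it. Keepalive only reduces how often the remote end hangs up. The following is a self-contained loopback sketch of that sequence, not logstash's actual code; `peer_close_demo` is a made-up name.

```python
import socket

def peer_close_demo():
    """Reproduce the event that puts a client socket into CLOSE_WAIT, then exit it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.create_connection(server.getsockname())
    conn, _addr = server.accept()

    conn.close()                       # the "server" side closes first and sends FIN
    data = client.recv(1024)           # returns b"" once the peer's FIN has arrived
    # From here until client.close(), the client socket sits in CLOSE_WAIT.
    # A client that never reaches close() leaks one descriptor per connection.
    client.close()                     # the only step that ends CLOSE_WAIT
    server.close()
    return data

print(peer_close_demo())  # b''
```

If the java process in the lsof listing keeps sockets open after the other end has hung up, that would be consistent with the pile of CLOSE_WAIT entries. This is an inference from the symptoms, not something confirmed in the logstash source.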
How about trying scribe?
Add a file buffer.
OK, fine.
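To illustrate the file buffer suggestion, here is a minimal sketch of the idea: spool events to a local file while the downstream is unavailable, then replay them in order once it recovers. The class `FileBuffer`, its methods, and the JSON-lines format are all assumptions made for the sake of the example. It is not how scribe or logstash implement buffering.

```python
import json
import os

class FileBuffer:
    """Append-only spool file: events are stored as JSON lines and drained in order."""

    def __init__(self, path):
        self.path = path

    def put(self, event):
        """Append one event to the spool file."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def drain(self, send):
        """Call send(event) for each spooled event, stopping at the first failure.

        Events that were not sent are written back, so they are retried next time.
        Returns the number of events sent.
        """
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        sent = 0
        for event in events:
            try:
                send(event)
            except Exception:
                break                  # downstream is failing; keep the rest spooled
            sent += 1
        with open(self.path, "w", encoding="utf-8") as f:
            for event in events[sent:]:
                f.write(json.dumps(event) + "\n")
        return sent
```

A shipper would call `put()` whenever a push fails and `drain()` periodically with its normal send function. The sketch ignores concurrency and crash safety during the rewrite, both of which a real buffer would need.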