Fixing the TransportError exception in Python elasticsearch

The usual small talk first: today's smog has finally cleared, it's very windy, and I'm feeling a bit run-down. Time to start exercising.

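I got a WeChat alert that data was arriving late in elasticsearch. The logs showed that the consumer process had failed, yet the process states reported by ps aux f looked normal. The process tree confirms that 41577 is the main process and that all the others were spawned by 41577. Time to bring out the big gun: strace.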

[ruifengyun@wx-buzz-monitor01 shop_scripts]$ ps aux f|grep tran
503      41577  0.1  0.0 316868 31332 pts/9    S+    2015  17:09  |   \_ bulk_transfer
503      41583  0.0  0.0 312232 21888 pts/9    Sl+   2015   6:22  |       \_ bulk_transfer
503      41584  0.0  0.0 312232 21884 pts/9    Sl+   2015   6:19  |       \_ bulk_transfer
503      41585  0.0  0.0 312232 21884 pts/9    Sl+   2015   6:13  |       \_ bulk_transfer
503      41586  0.0  0.0 312232 21852 pts/9    Sl+   2015   6:22  |       \_ bulk_transfer
503      41587  0.0  0.0 312236 21900 pts/9    Sl+   2015   6:24  |       \_ bulk_transfer
503      41588  0.0  0.0 312236 21884 pts/9    Sl+   2015   6:18  |       \_ bulk_transfer
503      41589  0.0  0.0 312236 21888 pts/9    Sl+   2015   6:26  |       \_ bulk_transfer
503      41590  0.0  0.0 312236 21884 pts/9    Sl+   2015   6:20  |       \_ bulk_transfer
503      41591  0.0  0.0 312236 21888 pts/9    Sl+   2015   6:23  |       \_ bulk_transfer
503      41592  0.0  0.0 312240 21892 pts/9    Sl+   2015   6:18  |       \_ bulk_transfer
503      41593  0.0  0.0 312240 21892 pts/9    Sl+   2015   6:21  |       \_ bulk_transfer
503      41594  0.0  0.0 312108 21824 pts/9    Sl+   2015   6:16  |       \_ bulk_transfer
503      41595  0.0  0.0 312240 21896 pts/9    Sl+   2015   6:22  |       \_ bulk_transfer
503      41596  0.0  0.0 312240 21900 pts/9    Sl+   2015   6:18  |       \_ bulk_transfer
503      41597  0.0  0.0 312244 21856 pts/9    Sl+   2015   6:27  |       \_ bulk_transfer

This post isn't very rigorous, so criticism is welcome. It may also be updated later; check the original address below for updates.

http://xiaorui.cc/2016/01/06/%E8%A7%A3%E5%86%B3python-elasticsearch%E7%9A%84transporterror%E5%BC%82%E5%B8%B8%E9%97%AE%E9%A2%98/

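strace -p on the main process shows that it is waiting on child process 41591. That bulk_transfer child, in turn, just loops on select and futex(FUTEX_WAIT_BITSET) calls that time out, instead of doing any work. My code also has a rule that pauses work once the internal queue reaches 2k entries. So why does the queue stay full, and what is the consumer process doing?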

[ruifengyun@wx-buzz-monitor01 shop_scripts]$ strace -p 41577
Process 41577 attached - interrupt to quit
wait4(41591, ^C <unfinished ...>
Process 41577 detached

[ruifengyun@wx-buzz-monitor01 shop_scripts]$ strace -p 41591
Process 41591 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 398485}) = 0 (Timeout)
futex(0x7f9408cff000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1452049478, 527346000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
futex(0x7f9408cff000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1452049479, 38034000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
futex(0x7f9408cff000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1452049479, 548685000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
futex(0x7f9408cff000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1452049480, 59378000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
select(0, NULL, NULL, NULL, {0, 500000}^C <unfinished ...>
Process 41591 detached
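
The pause-when-full rule mentioned above can be sketched with a bounded queue. This is only a minimal sketch under assumptions, not the actual bulk_transfer code: the names MAX_PENDING, producer, consumer, and run_pipeline are hypothetical, it uses threads rather than the forked worker processes in the ps output, and it is written for Python 3 (the queue module), whereas the traceback in this post comes from Python 2.7. The limit of 2000 is taken from the "2k" rule. What it shows is that a bounded put() blocks the producer as soon as the consumers stop draining, so a stalled consumer makes the whole pipeline look idle from the outside.

```python
import queue
import threading

MAX_PENDING = 2000  # assumed limit, taken from the "2k" rule in the text


def producer(q, items):
    """Push items; put() blocks while the queue is full (back-pressure)."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: tells the consumer there is nothing more


def consumer(q, sink):
    """Drain the queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        sink.append(item)


def run_pipeline(items, max_pending=MAX_PENDING):
    """Run one producer and one consumer over a bounded queue."""
    q = queue.Queue(maxsize=max_pending)
    sink = []
    worker = threading.Thread(target=consumer, args=(q, sink))
    worker.start()
    producer(q, items)
    worker.join()
    return sink
```

If the consumer thread were stuck (for example, inside a failing bulk call), producer would block in q.put() once max_pending items were waiting, which matches a queue that stays full.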

Below is the detailed program log. TransportError, yet again. I hit this same problem last time, and the fix back then was simply to exit and let supervisord manage the process.

2016-01-06 05:21:28,111 - mylogger - INFO - pack cost 120 q_task queue len 0
2016-01-06 05:21:28,112 - mylogger - INFO - q_res queue len 1
Traceback (most recent call last):
  File "bulk_transfer.py", line 163, in <module>
    handle_request()
  File "bulk_transfer.py", line 149, in handle_request
    es.bulk(Q_RES)
  File "/home/ruifengyun/shop_master/shop_scripts/utils.py", line 67, in bulk
    data = helpers.bulk(self.es_conn, actions, stats_only=False, chunk_size=csize)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 176, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 118, in streaming_bulk
    raise e
elasticsearch.exceptions.TransportError: TransportError(503, u'ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]')

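The TransportError(503, u'ClusterBlockException...') exception is generally caused by a problem in the elasticsearch cluster itself; here the "no master" block means the cluster currently has no elected master node. When the cluster is in trouble, every client sees this error.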

Note that you can't simply swallow this TransportError with try/except; it's best to create a new elasticsearch connection after the exception.


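Last time I only caught the TransportError, but even after elasticsearch recovered, writes still failed. Stepping through with pdb suggested that the connection was in a bad state. I opened an issue upstream but got no useful answer. I don't know whether this is a bug in elasticsearch-py.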

import time

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import TransportError

es = Elasticsearch()


def search_with_reconnect():
    global es
    while True:
        try:
            return es.search(index="test-index", body={"query": {"match_all": {}}})
        except TransportError:
            # Give the cluster time to recover, then build a fresh client
            # instead of reusing the old connection.
            time.sleep(5)
            es = Elasticsearch()
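
The reconnect-after-failure loop above retries forever. As a hedged variation, here is a small helper that caps the number of attempts and rebuilds the client between tries. It is a sketch, not part of elasticsearch-py: call_with_reconnect and all of its parameters are hypothetical names, and the client factory and exception class are passed in so the helper itself depends on nothing but the standard library.

```python
import time


def call_with_reconnect(make_client, operation, retry_on,
                        max_attempts=5, delay=5.0, sleep=time.sleep):
    """Run operation(client); on a retryable error, wait and rebuild the client.

    make_client: zero-argument factory that returns a new client
                 (assumption: e.g. the Elasticsearch class).
    operation:   callable that takes the client and returns a result.
    retry_on:    exception class or tuple of classes to retry on
                 (assumption: e.g. TransportError).
    """
    client = make_client()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(client)
        except retry_on:
            if attempt == max_attempts:
                raise  # give up; a supervisor can restart the process
            sleep(delay)
            client = make_client()  # fresh connection, never the old one
```

With elasticsearch-py this might be called as call_with_reconnect(Elasticsearch, lambda es: es.search(index="test-index", body={"query": {"match_all": {}}}), TransportError). Re-raising after the last attempt keeps the earlier "exit and let supervisord restart it" approach available as a fallback.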

Below are other common exceptions in the elasticsearch Python API.

class elasticsearch.ConnectionError(TransportError)
Error raised when there was an exception while talking to ES. Original exception from the underlying Connection implementation is available as .info.

class elasticsearch.ConnectionTimeout(ConnectionError)
A network timeout. Doesn’t cause a node retry by default.

class elasticsearch.SSLError(ConnectionError)
Error raised when encountering SSL errors.

class elasticsearch.NotFoundError(TransportError)
Exception representing a 404 status code.

class elasticsearch.ConflictError(TransportError)
Exception representing a 409 status code.

class elasticsearch.RequestError(TransportError)
Exception representing a 400 status code.
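
Not every exception in this list deserves a retry: a 400, 404, or 409 will fail the same way no matter how often the request is resent, while a 503 like the one in the log is usually transient. The sketch below encodes one possible policy. It rests on assumptions: should_retry and RETRYABLE_STATUS are hypothetical names rather than library constants, the exception is assumed to expose a status_code attribute (as TransportError does in the 7.x and earlier clients), and connection-level errors are assumed to carry a non-numeric status such as "N/A".

```python
# Assumed policy, not a library constant: gateway and availability errors.
RETRYABLE_STATUS = frozenset([502, 503, 504])


def should_retry(exc):
    """Return True if the error looks transient (cluster or network trouble)."""
    status = getattr(exc, "status_code", None)
    if status is None:
        # Not a transport-style error at all; do not retry blindly.
        return False
    if isinstance(status, int):
        # HTTP status from the cluster: retry only the transient ones.
        return status in RETRYABLE_STATUS
    # Non-numeric status: assumed to be a connection-level failure.
    return True
```

Under this policy the 503 "no master" error is retried, while a RequestError (400) surfaces immediately so that the bad request can be fixed.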
