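Preface:
People in Python tech chat groups often ask me how to analyze all sorts of strange problems in a Python service: threads that stop working, deadlocks, a process that hangs with no sign of life, no log output, tasks not being processed, and so on. I expect most of you have run into these at work as well. Here I will show how to analyze such problems quickly.
The original post is at http://xiaorui.cc/?p=5219
Solving the problem:
Usually we can print the Python function call stack to analyze the problem. Let's run the example below: when the process receives the USR1 signal, it prints the Python stack of every thread to the terminal. The key call is traceback.print_stack(sys._current_frames()[thread.ident]); sys._current_frames() returns a mapping from each thread's identifier to the topmost stack frame currently active in that thread.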
# coding: utf-8
# http://xiaorui.cc
# http://github.com/rfyiamcool
import os
import signal
import sys
import threading
import time
import traceback

RUNNING = True
threads = []


def monitoring(tid, itemId=None, threshold=None):
    # Worker loop: print a heartbeat once per second until RUNNING is cleared.
    while RUNNING:
        print("PID=", os.getpid(), ";id=", tid)
        time.sleep(1)
    print("Thread stopped:", tid)


def handler(signum, frame):
    # Ask all worker threads to stop.
    global RUNNING
    print("Signal is received:" + str(signum))
    RUNNING = False


def print_stack(signum, frame):
    # Print the Python-level call stack of every live thread.
    print("\n*** STACKTRACE - START ***\n")
    for th in threading.enumerate():
        print(th)
        traceback.print_stack(sys._current_frames()[th.ident])
        print("\n")
    print("\n*** STACKTRACE - END ***\n")


if __name__ == '__main__':
    signal.signal(signal.SIGUSR1, print_stack)
    signal.signal(signal.SIGUSR2, handler)
    signal.signal(signal.SIGALRM, handler)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGQUIT, handler)

    print("Starting all threads...")
    for th in range(10):
        thread = threading.Thread(target=monitoring, args=(th,),
                                  kwargs={'itemId': '1', 'threshold': 60})
        thread.start()
        threads.append(thread)

    # Join with a timeout and loop until every thread has exited.
    c = set()
    while 1:
        if len(c) == len(threads):
            break
        for thread in threads:
            if thread.is_alive():
                thread.join(timeout=1)
            else:
                c.add(thread)
    print("All threads stopped.")
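To trigger the dump, send SIGUSR1 to the running process, for instance with `kill -USR1 <pid>` from a shell. The sketch below does the same thing from Python by signalling its own process; the `on_usr1` handler and the `dumps` list are names made up for this illustration, and the handler collects formatted stacks instead of printing them so the result can be inspected. It assumes a Unix system and that the script runs in the main thread.

```python
import os
import signal
import sys
import threading
import time
import traceback

dumps = []  # (thread name, formatted stack) pairs, kept only for inspection


def on_usr1(signum, frame):
    # Same idea as print_stack in the example, but formatting into strings.
    frames = sys._current_frames()
    for th in threading.enumerate():
        stack = frames.get(th.ident)
        if stack is not None:
            dumps.append((th.name, "".join(traceback.format_stack(stack))))


signal.signal(signal.SIGUSR1, on_usr1)

# Equivalent to running `kill -USR1 <pid>` from a shell (Unix only).
os.kill(os.getpid(), signal.SIGUSR1)
time.sleep(0.1)  # give the interpreter a moment to run the handler

for name, stack in dumps:
    print(name)
    print(stack)
```

In a real service the signal would normally come from outside, using the PID that the process prints or that `ps` reports.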
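The figure below shows the printed result:
Once you have the function call stacks, you can analyze all kinds of problems, such as the deadlocks mentioned above or a queue that is not being consumed. To find out why a queue is not being consumed, for example, print the stacks of the consumer threads and see where they are stuck.
Some languages, C for example, let you print call stacks with the system pstack tool. But pstack is not suitable for Python, Java, or Golang. Taking Python as the example, the call stacks that pstack prints consist entirely of Python runtime (interpreter) code. They are indeed the real function call stacks of the underlying pthread threads, but we do not care about a stack that low-level; what we want to see is Python's own abstract frame stack. pstack still shows some basic information, such as thread startup, sleep, and function call and return activity.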
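As an illustration of the consumer case, here is a minimal sketch in which a consumer thread hangs on a lock and the stack dump shows where it is stuck. The names `dump_thread_stacks` and `blocked_consumer` are hypothetical, chosen for this example only.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return {thread name: formatted Python stack} for every live thread."""
    frames = sys._current_frames()
    stacks = {}
    for th in threading.enumerate():
        frame = frames.get(th.ident)
        if frame is not None:
            stacks[th.name] = "".join(traceback.format_stack(frame))
    return stacks


def blocked_consumer(lock, started):
    started.set()   # tell the main thread we are running
    with lock:      # blocks here: the main thread holds the lock
        pass


lock = threading.Lock()
started = threading.Event()
lock.acquire()
worker = threading.Thread(target=blocked_consumer, args=(lock, started),
                          name="consumer-1", daemon=True)
worker.start()
started.wait()      # make sure the worker is inside blocked_consumer

stacks = dump_thread_stacks()
print(stacks["consumer-1"])

lock.release()      # let the worker finish
worker.join()
```

The printed stack for consumer-1 ends inside blocked_consumer, which is the kind of evidence that points at a lock nobody releases.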
# xiaorui.cc
Thread 5 (Thread 0x7fcfbd7fb700 (LWP 22143)):
#0 0x00000032544e1623 in select () from /lib64/libc.so.6
#1 0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2 0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3 0x00000000004aa877 in PyEval_EvalCodeEx ()
#4 0x000000000050de58 in function_call ()
#5 0x0000000000419947 in PyObject_Call ()
#6 0x00000000004a7630 in PyEval_EvalFrameEx ()
#7 0x00000000004a924f in PyEval_EvalFrameEx ()
#8 0x00000000004a924f in PyEval_EvalFrameEx ()
#9 0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fcfbcdfa700 (LWP 22144)):
#0 0x00000032544e1623 in select () from /lib64/libc.so.6
#1 0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2 0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3 0x00000000004aa877 in PyEval_EvalCodeEx ()
#4 0x000000000050de58 in function_call ()
#5 0x0000000000419947 in PyObject_Call ()
#6 0x00000000004a7630 in PyEval_EvalFrameEx ()
#7 0x00000000004a924f in PyEval_EvalFrameEx ()
#8 0x00000000004a924f in PyEval_EvalFrameEx ()
#9 0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fcfa3fff700 (LWP 22145)):
#0 0x00000032544e1623 in select () from /lib64/libc.so.6
#1 0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2 0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3 0x00000000004aa877 in PyEval_EvalCodeEx ()
#4 0x000000000050de58 in function_call ()
#5 0x0000000000419947 in PyObject_Call ()
#6 0x00000000004a7630 in PyEval_EvalFrameEx ()
#7 0x00000000004a924f in PyEval_EvalFrameEx ()
#8 0x00000000004a924f in PyEval_EvalFrameEx ()
#9 0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fcfa35fe700 (LWP 22146)):
#0 0x00000032544e1623 in select () from /lib64/libc.so.6
#1 0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2 0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3 0x00000000004aa877 in PyEval_EvalCodeEx ()
#4 0x000000000050de58 in function_call ()
#5 0x0000000000419947 in PyObject_Call ()
#6 0x00000000004a7630 in PyEval_EvalFrameEx ()
#7 0x00000000004a924f in PyEval_EvalFrameEx ()
#8 0x00000000004a924f in PyEval_EvalFrameEx ()
#9 0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fcfcbf78700 (LWP 22136)):
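For contrast with the native frames above, the standard library's faulthandler module can also dump the Python-level frames of all threads. This is a different mechanism from the traceback-based handler in the earlier example, shown here only as a sketch; the `sleeper` worker and the temporary file are assumptions made for the illustration.

```python
import faulthandler
import tempfile
import threading


def sleeper(ready, stop):
    # Hypothetical worker that just waits, similar to monitoring() above.
    ready.set()
    stop.wait()


ready = threading.Event()
stop = threading.Event()
worker = threading.Thread(target=sleeper, args=(ready, stop), daemon=True)
worker.start()
ready.wait()  # make sure the worker is inside sleeper()

# faulthandler writes to a file descriptor, so dump into a temporary file.
with tempfile.TemporaryFile(mode="w+") as out:
    faulthandler.dump_traceback(file=out, all_threads=True)
    out.seek(0)
    report = out.read()

stop.set()
worker.join()
print(report)
```

On Unix, `faulthandler.register(signal.SIGUSR1, all_threads=True)` installs the same dump as a signal handler, so it can be triggered with `kill -USR1 <pid>`.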
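The above covers printing stacks for multiple threads. Multiple processes work the same way; each process simply prints its own. With coroutines, however, the method above no longer works. For gevent, for example, you need gc.get_objects to get all objects, then filter for greenlets and read gr_frame to get each greenlet's call stack.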
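import gc
import traceback

import greenlet


def get_traceback():
    # Walk every object the garbage collector tracks and keep the greenlets.
    for obj in gc.get_objects():
        if isinstance(obj, greenlet.greenlet):
            stack_list = traceback.format_list(
                traceback.extract_stack(obj.gr_frame)
            )
            print('greenlet {}:'.format(obj))
            print(''.join(stack_list))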
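To analyze asyncio, use asyncio.Task.all_tasks to get every task and its stack frames. On Python 3.7 and later the equivalent is asyncio.all_tasks; the Task.all_tasks classmethod was removed in Python 3.9.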
# xiaorui.cc
import asyncio
import traceback


async def get_traceback(loop):
    # asyncio.Task.all_tasks was removed in Python 3.9; asyncio.all_tasks replaces it.
    for task in asyncio.all_tasks(loop):
        stack_list = []
        for stack in task.get_stack():
            stack_list.extend(
                traceback.format_list(traceback.extract_stack(stack))
            )
        print('asyncio task {}:'.format(task))
        print(''.join(stack_list))
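A runnable sketch of the same idea, assuming Python 3.8 or later (for named tasks); `format_task_stacks` and `slow_consumer` are hypothetical names used only here. It starts one task that appears stuck, collects the stacks of all tasks, and prints the one for that task.

```python
import asyncio
import traceback


def format_task_stacks(loop=None):
    """Return {task name: formatted stack} for every unfinished task."""
    result = {}
    for task in asyncio.all_tasks(loop):
        lines = []
        for frame in task.get_stack():
            lines.extend(traceback.format_list(traceback.extract_stack(frame)))
        result[task.get_name()] = "".join(lines)
    return result


async def slow_consumer():
    # Hypothetical coroutine that appears stuck.
    await asyncio.sleep(3600)


async def main():
    task = asyncio.create_task(slow_consumer(), name="consumer")
    await asyncio.sleep(0)  # let the task start and suspend
    stacks = format_task_stacks()
    task.cancel()
    return stacks


stacks = asyncio.run(main())
print(stacks["consumer"])
```

For a suspended task, the stack shows the coroutine frame where it is waiting, here the await inside slow_consumer.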
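Summary:
This approach is not limited to any particular language or task framework; it can be used to solve such problems everywhere, and the details differ only slightly.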