Printing Python thread stacks to analyze the current execution context

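Preface:

People in the Python tech groups often ask me how to diagnose the strange problems that show up in Python services: threads that stop working, deadlocks, processes that hang with no sign of life, logs that stop being written, tasks that are never processed, and so on. I expect most of you have run into the same problems at work. Here is a quick way to analyze them.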

The original article is at http://xiaorui.cc/?p=5219

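Solving the problem:

Usually we can print Python's function call stack to analyze the problem. Let's run the example below: when the process receives the USR1 signal, it prints the Python stack of every thread to the terminal. The key call is traceback.print_stack(sys._current_frames()[thread.ident]); sys._current_frames() returns a dict that maps each thread's identifier to the frame currently executing in that thread.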

#coding: utf-8
# http://xiaorui.cc
# http://github.com/rfyiamcool

import threading, signal, time, os
import sys
import traceback


RUNNING = True
threads = []


def monitoring(tid, itemId=None, threshold=None):
    global RUNNING
    while(RUNNING):
        print ("PID=", os.getpid(), ";id=", tid)
        time.sleep(1)
    print ("Thread stopped:", tid)


def handler(signum, frame):
    print ("Signal is received:" + str(signum))
    global RUNNING
    RUNNING=False


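def print_stack(signum, frame):
    # Dump the Python stack of every live thread.
    print("\n*** STACKTRACE - START ***\n")
    frames = sys._current_frames()
    for th in threading.enumerate():
        print(th)
        # A thread may start or exit around the snapshot, so look it up safely.
        th_frame = frames.get(th.ident)
        if th_frame is not None:
            traceback.print_stack(th_frame)
        print("\n")
    print("\n*** STACKTRACE - END ***\n")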


if __name__ == '__main__':
    signal.signal(signal.SIGUSR1, print_stack)
    signal.signal(signal.SIGUSR2, handler)
    signal.signal(signal.SIGALRM, handler)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGQUIT, handler)


    print ("Starting all threads...")
    for th in range(10):
        thread = threading.Thread(target=monitoring, args=(th,), kwargs={'itemId':'1', 'threshold':60})
        thread.start()
        threads.append(thread)

    c = set()
    while 1:
        if len(c) == len(threads):
            break

        for thread in threads:
            if thread.is_alive():
                thread.join(timeout=1)
            else:
                c.add(thread)

    print ("All threads stopped.")

The figure below shows the printed output:

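With the call stack in hand you can analyze all kinds of problems, such as the deadlocks mentioned above or a queue that is not being consumed. For example, to find out why a queue is not being consumed, print the consumer threads' stacks and see where they are stuck.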
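
As a rough illustration of the queue case, the sketch below parks a consumer thread on an empty queue.Queue and then formats that thread's stack. The helper names (format_thread_stack, consumer, demo) are made up for this example. The point is only that the dump ends inside Queue.get, which tells you the consumer is waiting for work rather than stuck in its own logic.

```python
import queue
import sys
import threading
import time
import traceback


def format_thread_stack(thread):
    """Return the current Python stack of one thread as a string."""
    frame = sys._current_frames().get(thread.ident)
    if frame is None:  # not started yet, or already finished
        return ""
    return "".join(traceback.format_stack(frame))


def consumer(tasks):
    item = tasks.get()  # blocks here while the queue stays empty
    print("consumed", item)


def demo():
    tasks = queue.Queue()
    worker = threading.Thread(target=consumer, args=(tasks,), daemon=True)
    worker.start()

    # Poll briefly until the worker is actually parked inside Queue.get().
    deadline = time.monotonic() + 2.0
    stack = format_thread_stack(worker)
    while "in get" not in stack and time.monotonic() < deadline:
        time.sleep(0.01)
        stack = format_thread_stack(worker)

    tasks.put("wake-up")  # release the consumer so the demo exits cleanly
    worker.join(timeout=2.0)
    return stack


if __name__ == "__main__":
    print(demo())
```

If the last frames instead pointed at a lock acquire or a network call, that would suggest a deadlock or a slow dependency rather than an empty queue.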

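In some languages, such as C, you can print the call stack with the system tool pstack. But pstack is not a good fit for Python, Java, or Golang. Taking Python as an example, the stack that pstack prints consists entirely of Python runtime (interpreter) functions. These are the real function call stacks of the pthread threads, but that low level is not what we care about; we want to see Python's own frame stack. That said, pstack does reveal some basic information, such as thread startup, sleep, and function call and return activity.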

Thread 5 (Thread 0x7fcfbd7fb700 (LWP 22143)):
#0  0x00000032544e1623 in select () from /lib64/libc.so.6
#1  0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2  0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3  0x00000000004aa877 in PyEval_EvalCodeEx ()
#4  0x000000000050de58 in function_call ()
#5  0x0000000000419947 in PyObject_Call ()
#6  0x00000000004a7630 in PyEval_EvalFrameEx ()
#7  0x00000000004a924f in PyEval_EvalFrameEx ()
#8  0x00000000004a924f in PyEval_EvalFrameEx ()
#9  0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fcfbcdfa700 (LWP 22144)):
#0  0x00000032544e1623 in select () from /lib64/libc.so.6
#1  0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2  0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3  0x00000000004aa877 in PyEval_EvalCodeEx ()
#4  0x000000000050de58 in function_call ()
#5  0x0000000000419947 in PyObject_Call ()
#6  0x00000000004a7630 in PyEval_EvalFrameEx ()
#7  0x00000000004a924f in PyEval_EvalFrameEx ()
#8  0x00000000004a924f in PyEval_EvalFrameEx ()
#9  0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fcfa3fff700 (LWP 22145)):
#0  0x00000032544e1623 in select () from /lib64/libc.so.6
#1  0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2  0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3  0x00000000004aa877 in PyEval_EvalCodeEx ()
#4  0x000000000050de58 in function_call ()
#5  0x0000000000419947 in PyObject_Call ()
#6  0x00000000004a7630 in PyEval_EvalFrameEx ()
#7  0x00000000004a924f in PyEval_EvalFrameEx ()
#8  0x00000000004a924f in PyEval_EvalFrameEx ()
#9  0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fcfa35fe700 (LWP 22146)):
#0  0x00000032544e1623 in select () from /lib64/libc.so.6
#1  0x00007fcfc5507479 in time_sleep () from /usr/local/lib/python2.7/lib-dynload/time.so
#2  0x00000000004a8e6d in PyEval_EvalFrameEx ()
#3  0x00000000004aa877 in PyEval_EvalCodeEx ()
#4  0x000000000050de58 in function_call ()
#5  0x0000000000419947 in PyObject_Call ()
#6  0x00000000004a7630 in PyEval_EvalFrameEx ()
#7  0x00000000004a924f in PyEval_EvalFrameEx ()
#8  0x00000000004a924f in PyEval_EvalFrameEx ()
#9  0x00000000004aa877 in PyEval_EvalCodeEx ()
#10 0x000000000050dd5e in function_call ()
#11 0x0000000000419947 in PyObject_Call ()
#12 0x00000000004224ef in instancemethod_call ()
#13 0x0000000000419947 in PyObject_Call ()
#14 0x00000000004a2633 in PyEval_CallObjectWithKeywords ()
#15 0x00000000004e26a2 in t_bootstrap ()
#16 0x0000003254807aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032544e8bcd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fcfcbf78700 (LWP 22136)):

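The above covers printing stacks for multiple threads. Multiple processes work the same way: each process prints its own. If you use coroutines, however, the method above does not apply. With gevent, for example, you need gc.get_objects() to obtain all objects, filter for greenlet instances, and read each greenlet's gr_frame to get its call stack.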

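import gc
import traceback

import greenlet


def get_traceback():
    for obj in gc.get_objects():
        # gr_frame is None for a running, unstarted, or dead greenlet
        if isinstance(obj, greenlet.greenlet) and obj.gr_frame is not None:
            stack_list = traceback.format_list(
                traceback.extract_stack(obj.gr_frame)
            )
            print('greenlet {}:'.format(obj))
            print(''.join(stack_list))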

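To analyze asyncio, use asyncio.all_tasks() to get every task and its stack frames. (Older code calls asyncio.Task.all_tasks(), which was deprecated in Python 3.7 and removed in Python 3.9.)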

import asyncio
import traceback


async def get_traceback(loop):
    # asyncio.all_tasks() replaces asyncio.Task.all_tasks(), removed in Python 3.9
    for task in asyncio.all_tasks(loop):
        stack_list = []
        for stack in task.get_stack():
            stack_list.extend(
                traceback.format_list(traceback.extract_stack(stack))
            )
        print('asyncio task {}:'.format(task))
        print(''.join(stack_list))
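
To see the asyncio approach end to end, here is a small self-contained sketch for Python 3.8 or later. The worker coroutine, the task name "stuck-worker", and the format_task_stacks helper are assumptions for this example. It returns the report as a string instead of printing it, which makes it easier to log or inspect.

```python
import asyncio
import traceback


def format_task_stacks():
    """Format the stack of every unfinished task on the running event loop."""
    chunks = []
    for task in asyncio.all_tasks():
        lines = []
        for frame in task.get_stack():
            lines.extend(traceback.format_list(traceback.extract_stack(frame)))
        chunks.append(
            "asyncio task {}:\n{}".format(task.get_name(), "".join(lines))
        )
    return "\n".join(chunks)


async def worker():
    await asyncio.sleep(3600)  # stays suspended on this line


async def main():
    task = asyncio.create_task(worker(), name="stuck-worker")
    await asyncio.sleep(0)  # yield once so the worker starts and suspends
    report = format_task_stacks()
    task.cancel()
    # Collect the cancellation so no "task was destroyed" warning is emitted.
    await asyncio.gather(task, return_exceptions=True)
    return report


if __name__ == "__main__":
    print(asyncio.run(main()))
```

For a suspended coroutine, get_stack() returns only the frame where it is waiting, so the report for the worker points directly at its await line.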

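Summary:

This approach is not tied to a particular language or task framework; the same idea, with minor differences, can be used to troubleshoot any of them.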

