Puzzling over why ps and top report different CPU usage

So annoying: one big blunder cost me several hours. Let me say up front that this was my own fault. The issue is that on Linux, ps aux and top report different CPU figures!

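This post is a bit messy, so feel free to flame away! It is also being updated continuously, so please check the original for the latest version: http://xiaorui.cc/?p=4470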

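Here is the backstory. I wrote a backend service, and for some reason, when there were lots of tasks, its CPU usage crept up slowly instead of jumping quickly. Once all the tasks had been consumed, CPU usage crept back down slowly instead of dropping right away.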

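To track this down, I kept going back over the backend processing logic. Generally speaking, if a flood of tasks has arrived and CPU usage does not climb quickly, you can roughly assume something is wrong with the consumer architecture on the server side: the cause might be enqueue throughput, dequeue throughput, processing speed, and so on. I added logging to rule out starved processes and used metrics to analyze how long the service's function calls took. Finally, by pure chance, I opened top. I was stunned, and then it all clicked.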

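ps aux computes CPU usage from accumulated times (CPU time used divided by how long the process has existed), while top shows a near-real-time figure! I used to know this perfectly well; I even remember explaining it to colleagues.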
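To make the difference concrete, here is a small simulation. It is only a sketch: the workload (90 idle seconds followed by 10 fully busy seconds) and the function name lifetime_and_interval_cpu are made up for illustration and are not taken from my service. It shows why a lifetime average creeps up slowly while an interval-based figure jumps immediately.

```python
def lifetime_and_interval_cpu(busy_per_second):
    """For each elapsed second, return (lifetime average %, last-second %).

    busy_per_second: fraction of each second the process spent on the CPU.
    """
    results = []
    total_busy = 0.0
    for elapsed, busy in enumerate(busy_per_second, start=1):
        total_busy += busy
        # ps-style: CPU time averaged over the whole lifetime so far
        lifetime_pct = 100.0 * total_busy / elapsed
        # top-style: only the most recent interval counts
        interval_pct = 100.0 * busy
        results.append((round(lifetime_pct, 1), round(interval_pct, 1)))
    return results


# Hypothetical workload: idle for 90 seconds, then fully busy for 10 seconds.
samples = lifetime_and_interval_cpu([0.0] * 90 + [1.0] * 10)
print(samples[89])  # last idle second
print(samples[90])  # first busy second
print(samples[99])  # after 10 busy seconds
```

After ten seconds at full load, the lifetime average has only reached 10%, while the per-interval figure has been at 100% the whole time. That is the same slow ramp I was seeing in ps.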

The screenshot below shows the real-time monitoring data from top.

And below is the computed data from ps aux f.

So I want to get to the bottom of this and look at how ps actually computes its CPU and memory usage figures.

From man ps:

CPU usage is currently expressed as the percentage of time spent running
during the entire lifetime of a process. This is not ideal, and it does not
conform to the standards that ps otherwise conforms to. CPU usage is
unlikely to add up to exactly 100%.

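uptime is how long the system has been running since boot, which you can get with the uptime command. pu_time is the total CPU time the process has used. ps_time is the time at which the process started, measured in seconds from boot.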

uptime  = total time system has been running.
ps_time = process start time measured in seconds from boot.
pu_time = total time process has been using the CPU.

seconds   = uptime - ps_time

cpu_usage = pu_time * 1000 / seconds

print: cpu_usage / 10 "." cpu_usage % 10

Example:

uptime  = 344,545
ps_time = 322,462
pu_time =   3,383

seconds   = 344,545 - 322,462 = 22,083
cpu_usage = 3,383 * 1,000 / 22,083 = 153

print: 153 / 10 "." 153 % 10 => 15.3
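The pseudocode above translates almost line for line into Python. This is a minimal sketch of the formula only, not the actual ps source; the function name ps_cpu_percent and the guard for a non-positive lifetime are my own additions.

```python
def ps_cpu_percent(uptime, ps_time, pu_time):
    """Return the %CPU string computed as in the pseudocode (inputs in seconds)."""
    seconds = uptime - ps_time  # how long the process has existed
    if seconds <= 0:
        # Assumption: report 0.0 for a process that has only just started.
        return "0.0"
    cpu_usage = pu_time * 1000 // seconds  # integer tenths of a percent
    return f"{cpu_usage // 10}.{cpu_usage % 10}"


# Same numbers as the worked example above.
print(ps_cpu_percent(344545, 322462, 3383))  # 15.3
```

Because the denominator is the whole lifetime of the process, a long-running service needs a long burst of work before this number moves noticeably.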

From man top:

%CPU — CPU usage
The task’s share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. In a true SMP environment, if ‘Irix
mode’ is Off, top will operate in ‘Solaris mode’ where a task’s cpu usage will be divided by the total number of CPUs. You toggle ‘Irix/Solaris’ modes
with the ‘I’ interactive command.

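top gets its data from /proc/<pid>/stat. The difference from ps is that top recomputes the value at every screen refresh (every 3 seconds by default in procps top, adjustable with -d), using only the CPU time consumed since the previous refresh. That is why top can show near-real-time figures.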
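The interval-based calculation can be sketched in Python as follows. This is a rough approximation of the idea, not how top is actually implemented: the function names are hypothetical, sample_pid only works on Linux, and the result is in the "Irix mode" sense (not divided by the number of CPUs).

```python
import os
import time


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so only split the part after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15 of the line
    return utime + stime


def interval_cpu_percent(ticks_before, ticks_after, interval_seconds, ticks_per_second):
    """CPU usage over one sampling interval, as a percentage of one CPU."""
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * cpu_seconds / interval_seconds


def sample_pid(pid, interval_seconds=1.0):
    """Linux only: sample /proc/<pid>/stat twice and return the interval %CPU."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks_from_stat(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = cpu_ticks_from_stat(f.read())
    return interval_cpu_percent(before, after, interval_seconds, ticks_per_second)
```

On a Linux box, calling sample_pid(os.getpid()) gives a figure comparable to what top shows for that process over the same interval, whereas ps keeps reporting the lifetime average.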

END.
