Analyzing OpenResty Redis keepalive (persistent connection) problems

Preface:

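To avoid getting flamed: the OpenResty "pitfalls" in this post are really the result of misusing the resty libraries, or of our own business logic. Most of my high-traffic endpoints are APIs built on OpenResty, and this time their throughput simply would not go up. By repeatedly adding log statements and tracing system calls with strace, I found that the OpenResty Redis client was not using persistent connections at all, let alone a connection pool... Below is how I tracked down the keepalive problem.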

I previously wrote a post about a memory leak, if you are interested: http://xiaorui.cc/?p=4784

resty.redis

Redis command monitoring (MONITOR output). Every request arrives from a new client port and sends AUTH again:

1503707472.401523 [0 127.0.0.1:57076] "auth" "123123"
1503707472.439407 [0 127.0.0.1:57076] "hexists" "sync_pla
"
1503707473.057015 [0 127.0.0.1:57082] "auth" "123123"
1503707473.094836 [0 127.0.0.1:57082] "hexists" "sync_pla
"
1503707473.694647 [0 127.0.0.1:57086] "auth" "123123"
1503707473.732524 [0 127.0.0.1:57086] "hexists" "sync_pla
"
1503707474.308147 [0 127.0.0.1:57092] "auth" "123123"
1503707474.346172 [0 127.0.0.1:57092] "hexists" "sync_pla
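To put a number on this churn, here is a small plain-Lua sketch that counts AUTH commands and distinct client ports in captured MONITOR lines. The connection_churn helper is hypothetical (not part of any library), and it assumes IPv4 clients and the line format shown above. Keep MONITOR captures short on a busy server, since MONITOR itself costs Redis throughput.

```lua
-- Hypothetical helper: count AUTH commands and distinct client ports in
-- `redis-cli monitor` output (IPv4 clients only).
local function connection_churn(lines)
  local auths, seen, nports = 0, {}, 0
  for _, line in ipairs(lines) do
    -- <timestamp> [<db> <ip>:<port>] "<command>" ...
    local port, cmd = line:match('^%S+ %[%d+ [^%]]-:(%d+)%] "(%a+)"')
    if port then
      if not seen[port] then
        seen[port] = true
        nports = nports + 1
      end
      if cmd:lower() == "auth" then
        auths = auths + 1
      end
    end
  end
  return auths, nports
end

local sample = {
  '1503707472.401523 [0 127.0.0.1:57076] "auth" "123123"',
  '1503707473.057015 [0 127.0.0.1:57082] "auth" "123123"',
  '1503707473.694647 [0 127.0.0.1:57086] "auth" "123123"',
}
print(connection_churn(sample))
```

With working keepalive, the AUTH count should stay roughly at the number of pooled connections instead of growing with the number of requests.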

Tracing the system calls of an OpenResty worker process with strace shows connect() to port 6379 being called over and over:

connect(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)

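System call summary (strace -c). The 22 connect "errors" are just EINPROGRESS from non-blocking connects; the telling part is that socket() and connect() are each called 22 times, i.e. a new TCP connection keeps being created: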

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.54    0.001000           6       174           epoll_wait
 24.22    0.000421          19        22           writev
  8.75    0.000152           7        22        22 connect
  6.44    0.000112           3        44           close
  1.50    0.000026           0        88           sendto
  1.09    0.000019           0       133           write
  0.46    0.000008           0       175           gettimeofday
  0.00    0.000000           0        22           ioctl
  0.00    0.000000           0        22           socket
  0.00    0.000000           0       174        20 recvfrom

None of the OpenResty workers held a persistent connection to port 6379. lsof on one of them:

openresty 2976 nobody    3w   REG              202,1  4148486  1338754 /usr/local/openresty/nginx/logs/error.log
openresty 2976 nobody    6u  IPv4           39140146      0t0      TCP *:88 (LISTEN)
openresty 2976 nobody    7w   REG              202,1   313582  1338761 /usr/local/openresty/nginx/logs/access.log
openresty 2976 nobody    8u  IPv4           40715012      0t0      TCP xxxxx:42815->xxxxx:80 (ESTABLISHED)
openresty 2976 nobody    9u  unix 0xffff880108fb63c0      0t0 40681448 socket
openresty 2976 nobody   10u   REG                0,9        0     3919 [eventpoll]
openresty 2976 nobody   11u   REG                0,9        0     3919 [eventfd]
openresty 2976 nobody   12u  IPv4           40681475      0t0      UDP xxxx:15451->xxxx:53
openresty 2976 nobody   13u  IPv4           40715040      0t0      TCP xxxx:21521->xxxx:80 (ESTABLISHED)


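Following the lua-resty-redis documentation, our code did in fact configure keepalive. In reality, though, when a coroutine finished with a connection, it never put it back into the connection pool. That's right: the connection was never returned. Since it never went back, other requests could not find an idle connection and had to open new ones. We were calling redis_conn:set_keepalive, just not in the right place: set_keepalive has to run before every return, including early returns. The pool is what connect() draws from: each time you call connect(), the cosocket first looks for an idle connection to the same host and port in the current worker's pool, and opens a new TCP connection only if none is available.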
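To make the mechanism concrete, here is a toy model in plain Lua. It is NOT the real ngx_lua cosocket pool; Pool, handle and their fields are hypothetical names. It only shows why a single early return that skips set_keepalive forces every later request to open a new connection.

```lua
-- Toy model of a per-worker keepalive pool (NOT the real ngx_lua implementation).
local Pool = {}
Pool.__index = Pool

function Pool.new()
  return setmetatable({ idle = {}, created = 0 }, Pool)
end

-- Mimics connect(): reuse an idle connection if there is one, else open a new one.
function Pool:connect()
  local conn = table.remove(self.idle)
  if conn then
    conn.reused = conn.reused + 1
    return conn
  end
  self.created = self.created + 1
  return { id = self.created, reused = 0 }
end

-- Mimics set_keepalive(): put the connection back for later requests.
function Pool:set_keepalive(conn)
  self.idle[#self.idle + 1] = conn
end

-- A handler that takes the early "success" return.
-- keep_on_early_return = false reproduces our bug.
local function handle(pool, keep_on_early_return)
  local conn = pool:connect()
  -- ... run the command; suppose it succeeded, so we return early
  if keep_on_early_return then
    pool:set_keepalive(conn)
  end
  return "ok"
end

local buggy, fixed = Pool.new(), Pool.new()
for _ = 1, 100 do
  handle(buggy, false)
  handle(fixed, true)
end
print("buggy pool opened " .. buggy.created .. " connections")
print("fixed pool opened " .. fixed.created .. " connections")
```

With the leaking early return, 100 requests open 100 connections; when the connection is returned on that path too, all 100 requests share one.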

Usage of set_keepalive:

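syntax: ok, err = red:set_keepalive(max_idle_timeout, pool_size)

max_idle_timeout is in milliseconds, and pool_size is the maximum number of idle connections kept per Nginx worker process. After the call, the red object is in the closed state and cannot be used for further commands until connect() is called again.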

The mistake, illustrated (the marked line is the call we had left out):

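-- xiaorui.cc
local ok, err = red:set("dog", "an animal")
if not err then
    ngx.say("ok")
    red:set_keepalive(600000, 500)  -- this was missing: the early return skipped the call below
    return
end

red:set_keepalive(600000, 500)  -- 10-minute idle timeout, up to 500 idle connections per worker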
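One way to stop relying on remembering set_keepalive before every return is to wrap the work in a helper that releases the connection on exit. The sketch below is a hypothetical pattern, not part of lua-resty-redis: with_redis takes a connect function and a work function, calls set_keepalive when the work succeeds, and conservatively closes the connection on any error, since a connection that hit a timeout may still have unread data on it.

```lua
-- Hypothetical helper: run fn(red), then release the connection exactly once,
-- whichever branch inside fn returned.
--   connect   : function returning a connected client, or nil, err
--   fn        : function(red) returning res, err
--   idle_ms   : max idle time in the pool, in milliseconds
--   pool_size : max idle connections per worker
local function with_redis(connect, fn, idle_ms, pool_size)
  local red, err = connect()
  if not red then
    return nil, err
  end

  -- pcall so that a Lua error inside fn cannot skip the cleanup below
  local ok, res, ferr = pcall(fn, red)

  if ok and not ferr then
    -- success: hand the healthy connection back to this worker's pool
    -- (return value ignored for brevity; log it in real code)
    red:set_keepalive(idle_ms or 60000, pool_size or 100)
    return res
  end

  -- any error: the connection state is unknown, so close instead of pooling it
  red:close()
  if not ok then
    return nil, res  -- res holds the Lua error message from pcall
  end
  return nil, ferr
end

-- Usage inside OpenResty (not run here):
-- local redis = require "resty.redis"
-- local value, err = with_redis(function()
--   local red = redis:new()
--   red:set_timeout(1000)
--   local ok, cerr = red:connect("127.0.0.1", 6379)
--   if not ok then return nil, cerr end
--   return red
-- end, function(red)
--   return red:get("dog")
-- end, 600000, 500)
```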

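Also, redis_conn:get_reused_times() tells you how many times the current connection has been reused from the pool. If it returns 0, the connection is brand new, and that is the only time you need to run AUTH and SELECT db on it. Without this check you will call redis_conn:auth and select on every request, wasting two extra network round trips...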
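The check can be folded into a small helper that also covers SELECT. This is a sketch assuming a client that, like resty.redis, exposes get_reused_times(), auth() and select(); init_if_new, password and db are hypothetical names. It also assumes no code ever switches databases on a pooled connection, because a reused connection keeps whatever database was last selected on it.

```lua
-- Hypothetical helper: run AUTH / SELECT only on a brand-new connection.
local function init_if_new(red, password, db)
  local count, err = red:get_reused_times()
  if not count then
    return nil, err
  end
  if count > 0 then
    -- came from the pool: AUTH/SELECT were already done when it was created
    return true
  end
  if password and password ~= "" then
    local ok, aerr = red:auth(password)
    if not ok then
      return nil, aerr
    end
  end
  if db and db ~= 0 then  -- database 0 is the default, nothing to select
    local ok, serr = red:select(db)
    if not ok then
      return nil, serr
    end
  end
  return true
end
```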

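I took a quick look at how get_reused_times works: in resty.redis it simply calls the cosocket method getreusedtimes(), and the reuse counter itself is kept per connection by the ngx_lua cosocket pool of each worker, not in a lua_shared_dict.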

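-- xiaorui.cc
local redis = require "resty.redis"

-- Get a Redis connection.
-- `config` and `utils` are our own project modules.
-- :rtype redis_conn, err  (redis_conn is nil on failure)
local function get_redis_connect()
    -- create a Redis client object
    local redis_conn = redis:new()

    -- Redis connection settings
    local redis_ip   = config["redis"]["conn"]["host"]  -- IP address
    local redis_port = config["redis"]["conn"]["port"]  -- port
    local redis_auth = config["redis"]["conn"]["auth"]  -- password

    -- connect; an idle connection from this worker's pool is reused if one exists
    local ok, err = redis_conn:connect(redis_ip, redis_port)
    if not ok then
        utils.error(string.format("Redis connect failed: %s", err))
        return nil, err
    end

    -- no password configured: skip authentication
    if redis_auth ~= "" then
        -- a connection taken from the pool always has a non-zero reuse count,
        -- so only brand-new connections need AUTH
        local count
        count, err = redis_conn:get_reused_times()
        if count == 0 then
            ok, err = redis_conn:auth(redis_auth)
            if not ok then
                utils.error(string.format("Redis auth failed: %s", err))
                redis_conn:close()  -- don't leave an unauthenticated socket open
                return nil, err
            end
        end
    end
    return redis_conn, nil
end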

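resty.http

In OpenResty the handiest HTTP client is lua-resty-http, which pools connections by default. Our business logic, however, has to call a third-party API in turn, and when that API takes too long to respond, the OpenResty HTTP client on the calling side ends up opening a large number of new connections: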

openresty 16499 nobody  291u  IPv4           40783314      0t0      TCP localhost:35450->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  292u  IPv4           40783317      0t0      TCP localhost:39414->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  294u  IPv4           40783319      0t0      TCP localhost:35454->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  295u  IPv4           40783322      0t0      TCP localhost:39418->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  297u  IPv4           40783324      0t0      TCP localhost:35458->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  298u  IPv4           40783327      0t0      TCP localhost:39422->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  300u  IPv4           40783329      0t0      TCP localhost:35462->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  301u  IPv4           40783332      0t0      TCP localhost:39426->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  303u  IPv4           40783334      0t0      TCP localhost:35466->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  304u  IPv4           40783337      0t0      TCP localhost:39430->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  305u  IPv4           40783338      0t0      TCP localhost:35470->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  306u  IPv4           40783341      0t0      TCP localhost:39434->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  307u  IPv4           40783342      0t0      TCP localhost:39436->localhost:8001 (ESTABLISHED)
openresty 16499 nobody  308u  IPv4           40783343      0t0      TCP localhost:39438->localhost:8001 (ESTABLISHED)

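There are plenty of established connections, yet the upstream has not sent a single byte of response data back on any of them. The client keeps holding each connection while it waits, so other requests cannot get an idle one. Enlarging the pool does not help either: the local source ports toward a single destination are capped at 65535 in theory, and in practice at the size of the ephemeral port range (net.ipv4.ip_local_port_range). Meanwhile strace shows the worker just sitting in epoll_wait: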

gettimeofday({1503711999, 568024}, NULL) = 0
write(4, "2017/08/26 09:46:39 [info] 16499"..., 85) = 85
epoll_wait(10,

How to fix it? For my use case, adding a timeout that aborts the request is enough.
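A minimal sketch of that timeout using lua-resty-http's set_timeouts() and the keepalive options of request_uri(). The module is passed in as a parameter only so the function can be exercised outside OpenResty (in Nginx you would pass require "resty.http"); fetch and the concrete timeout and pool values are illustrative assumptions, not tuned numbers.

```lua
-- Call a slow upstream with explicit timeouts, so a stuck request is aborted
-- instead of holding a connection indefinitely.
-- `http` is the resty.http module, injected so the sketch can be tested.
local function fetch(http, url)
  local httpc = http.new()
  -- connect, send and read timeouts, in milliseconds
  httpc:set_timeouts(1000, 1000, 3000)
  local res, err = httpc:request_uri(url, {
    method = "GET",
    keepalive_timeout = 60000,  -- max idle time in the pool (ms)
    keepalive_pool = 50,        -- max idle connections per worker
  })
  if not res then
    return nil, err  -- e.g. "timeout" when the upstream is too slow
  end
  return res.status, res.body
end
```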

Conditionals

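I also have to mention a Lua conditional pitfall that gets me every time: I have my "aha" moment, and later make the same mistake again... In a Lua condition, 0 and the empty string are both true, because only nil and false count as false. In the snippet below, uri_batch_str starts out as "", so the old check "not uri_batch_str" is always false, the else branch always runs, and the batch string ends up with a leading comma.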
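A quick plain-Lua check of the rule, plus the explicit test to use instead; truthy and is_blank are hypothetical helper names.

```lua
-- In Lua only nil and false are falsy: 0, "" and "0" are all truthy.
local function truthy(v)
  if v then
    return true
  end
  return false
end

-- Say what you mean: test for an empty string explicitly.
local function is_blank(s)
  return s == nil or s == ""
end

local s = ""
print(not s)        -- false: "" is truthy, so this was the old, wrong check
print(is_blank(s))  -- true
```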

-- refresh patch
if task_type == "refresh" then
    local uri_batch_str = ""
    for _, urls in pairs(tasks["task_body"]["url_list"]) do
        -- extract the domain name from this URL
        local domain_name = utils.get_domain_name(urls["url"])
        local uri = vars.request_uri

        -- if not uri_batch_str then   -- old code: always false, since "" is truthy
        if uri_batch_str == "" then    -- fixed code
            uri_batch_str = ngx.md5(domain_name..uri)
        else
            uri_batch_str = uri_batch_str..","..ngx.md5(domain_name..uri)
        end
    end
    utils.info("refresh params is "..uri_batch_str)

    local p2p_refresh_url  = config["api"]["p2p_refresh_api"]..uri_batch_str
    local status, body = utils.requests(p2p_refresh_url, 'GET', "")

    if status then
        local js = json.decode(body)
        if js["state"]["code"] ~= 0 then
            -- 0: success, -1: missing parameters, -2: execution failed
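An alternative that sidesteps the first-item check entirely is to collect the hashes in a table and join them once with table.concat, which is also the idiomatic way to build a string in a loop. This is a sketch: build_batch is a hypothetical name, it takes a plain list of domain names, and the hash argument stands in for ngx.md5 so it runs outside OpenResty.

```lua
-- Hypothetical helper: hash domain..uri for every domain and join with commas.
-- `hash` stands in for ngx.md5 so this runs outside OpenResty.
local function build_batch(domains, uri, hash)
  local parts = {}
  for i, domain in ipairs(domains) do
    parts[i] = hash(domain .. uri)
  end
  -- one join at the end: no leading comma, no "is this the first item?" check
  return table.concat(parts, ",")
end

-- Usage with an identity "hash" so the result is readable:
print(build_batch({"a.example", "b.example"}, "/index.html", function(s) return s end))
-- a.example/index.html,b.example/index.html
```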

END.

Original blog address, once again: xiaorui.cc