Tech talk: a distributed market-data push system (golang)

Preface:

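The market-data push system delivers the real-time data an exchange produces to web frontends, mobile clients, and quant programs. To make it fast and highly available we did a lot of tuning, covering the architecture, the middleware, and golang's own mechanisms. The production push cluster stably supports nearly 350,000 clients.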

This article is still being updated and revised; please see the original at http://xiaorui.cc/?p=6250

Content:

The push system is written in golang. Why golang? It is simple, practical, and handles high concurrency well.

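Architecturally, the system is split into a push gateway and a push business service. The gateway handles authentication and protocol encoding; the business service maintains subscription relationships and the caching logic.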

The slides for this talk are on github for anyone interested: https://github.com/rfyiamcool/share_ppt/blob/master/push_cluster.pdf

github sometimes fails to render large files; if that happens, download the PDF directly from http://xiaorui.cc/static/push_cluster.pdf

Screenshots:

Below are some of the slides.

Slide text (for search):


Distributed market-data push system
rfyiamcool
xiaorui.cc
github.com/rfyiamcool

1. Architecture overview
2. Performance optimization
3. Pitfalls
4. Summary

1. Architecture design

Push cluster architecture

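Components
push-router: service registration, discovery, and scheduling
push-gateway: authentication and protocol translation; supports websocket, grpc, grpc-web, kcp
push-server: business logic; maintains subscription relationships and caches
push-control: cluster management
nats-bus: message bus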

Technology choices
golang
mq: nats-stream
proto: grpc, json, envoy -> grpc-web, protobuf, websocket, pb over tcp, pb over kcp
cache: redis
database: scylla

Technology choices by client type
grpc: internal quant programs
grpc-web: web frontend
websocket: external quant API
mobile: pb over tcp, pb over kcp …

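grpc
Advantages
supports many languages
built on http2, so compatibility is good
supports bidi (full-duplex) communication
[diagram: protocol stack, top to bottom: protobuf, HTTP2, TLS, TCP]

protobuf
high-performance serialization
compression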

Service registration and discovery
[diagram: push-gateway and push-server instances send register/hb/stats to push-router; the Client and push-gateway query push-router for discovery]

grpc-web in the frontend
[diagram: chrome talks to push-gateway instances, which talk to push-server instances; labels: unary, server side stream, sub/unsub/getHistory]
😅 grpc bidi mode is not supported

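Multi-level cache
push-server loads its initial cache data at startup
in-process cache: entries expire after 24h; queries look up by timestamp in the cache
< 15d: served from redis
> 15d: served from scylla
[diagram: push-server instances in front of a redis tier and a scylla tier]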

2. Performance optimization

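Push optimization

When the number of subscribers on a topic exceeds a certain order of magnitude:
triggering pushes concurrently is faster
a goroutine pool effectively reduces the cost of stack growth
splitting subscribers into chunks reduces goroutine-pool scheduling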

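Push optimization
kline subscription changes
users subscribe and unsubscribe, and go online and offline
publishing a message requires iterating over the subscribed clients
kline is a two-level nested map, so frequent changes cause lock contention
kline map
ticker_btc:usdt
  client1
  client2
  …
ticker_btc:eth
  client1
  client3

the nested map was split into 1024 map segments
lock granularity is pushed down to the topic level where possible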
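A minimal sketch of the segmented structure described above: topics are hashed into 1024 shards, each with its own lock and its own topic -> clients map, so changes to one topic rarely contend with another. The type and method names (`SubTable`, `Subscribe`, and so on) and the choice of FNV hashing are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 1024

// shard guards a slice of the topic space with its own lock.
type shard struct {
	mu     sync.RWMutex
	topics map[string]map[string]struct{} // topic -> set of client IDs
}

// SubTable is a hypothetical subscription table split into shardCount maps.
type SubTable struct {
	shards [shardCount]shard
}

func NewSubTable() *SubTable {
	t := &SubTable{}
	for i := range t.shards {
		t.shards[i].topics = make(map[string]map[string]struct{})
	}
	return t
}

// shardFor hashes the topic name to pick a shard.
func (t *SubTable) shardFor(topic string) *shard {
	h := fnv.New32a()
	h.Write([]byte(topic))
	return &t.shards[h.Sum32()%shardCount]
}

func (t *SubTable) Subscribe(topic, clientID string) {
	s := t.shardFor(topic)
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.topics[topic] == nil {
		s.topics[topic] = make(map[string]struct{})
	}
	s.topics[topic][clientID] = struct{}{}
}

func (t *SubTable) Unsubscribe(topic, clientID string) {
	s := t.shardFor(topic)
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.topics[topic], clientID)
	if len(s.topics[topic]) == 0 {
		delete(s.topics, topic)
	}
}

// Subscribers returns a snapshot, so publishing can iterate without the lock.
func (t *SubTable) Subscribers(topic string) []string {
	s := t.shardFor(topic)
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.topics[topic]))
	for id := range s.topics[topic] {
		out = append(out, id)
	}
	return out
}

func main() {
	tbl := NewSubTable()
	tbl.Subscribe("ticker_btc:usdt", "client1")
	tbl.Subscribe("ticker_btc:usdt", "client2")
	tbl.Unsubscribe("ticker_btc:usdt", "client1")
	fmt.Println(tbl.Subscribers("ticker_btc:usdt"))
}
```

Two topics can still land in the same shard, so this only approximates topic-level locking; with 1024 shards such collisions should be uncommon for a typical set of trading pairs.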

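grpc connection pool
Why use a connection pool?
multiplexing streams over one connection causes lock contention

benchmark (grpc bidi)
1 client, 100 goroutines: 80k qps
1 client, 300 goroutines: 50k qps
10 clients, 300 goroutines: 150k qps
50 clients, 300 goroutines: 300k qps

[diagram: push-gateway connects to several push-server instances through a grpc connection pool; labels: x 10]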
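The idea behind the benchmark can be sketched as a round-robin pool that spreads callers over several connections instead of funnelling every stream through one. The sketch is deliberately independent of the grpc library: `Pool`, `NewPool`, `Pick`, and `fakeConn` are hypothetical, and in a real gateway the pooled type would presumably be a grpc client connection.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Pool hands out connections round-robin. T is whatever connection type
// the caller pools; nothing here depends on grpc itself.
type Pool[T any] struct {
	conns []T
	next  uint64
}

// NewPool dials size connections up front using the supplied dial func.
func NewPool[T any](size int, dial func(i int) (T, error)) (*Pool[T], error) {
	if size < 1 {
		return nil, errors.New("pool size must be at least 1")
	}
	p := &Pool[T]{conns: make([]T, 0, size)}
	for i := 0; i < size; i++ {
		c, err := dial(i)
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, c)
	}
	return p, nil
}

// Pick returns the next connection. The atomic counter keeps selection
// lock-free, so callers do not contend on the pool itself.
func (p *Pool[T]) Pick() T {
	n := atomic.AddUint64(&p.next, 1)
	return p.conns[(n-1)%uint64(len(p.conns))]
}

// fakeConn is a stand-in for a real client connection.
type fakeConn struct{ id int }

func main() {
	pool, err := NewPool(10, func(i int) (*fakeConn, error) {
		return &fakeConn{id: i}, nil
	})
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		fmt.Println("using connection", pool.Pick().id)
	}
}
```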

Reduce system calls
[diagram: the sender collects queued messages (msg-1, msg-2, msg-3, then msg-4, msg-5) and passes them to socket.read/write in batches instead of one call per message]
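One way to get this batching is a send loop that writes into a buffer and flushes only once the queue is momentarily empty. This is a sketch under assumptions: `sendLoop` and `countingWriter` are hypothetical names, the 4096-byte buffer is arbitrary, and `countingWriter` replaces a real socket so the number of underlying writes can be observed.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingWriter stands in for a socket and counts Write calls, which
// would each be a write syscall on a real connection.
type countingWriter struct {
	buf   bytes.Buffer
	calls int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	return w.buf.Write(p)
}

// sendLoop drains queue into a buffered writer and flushes only when no
// more messages are immediately available, so a burst of messages costs
// one underlying write (or a few, if the burst exceeds the buffer).
func sendLoop(dst io.Writer, queue <-chan []byte) error {
	bw := bufio.NewWriterSize(dst, 4096)
	for msg := range queue {
		if _, err := bw.Write(msg); err != nil {
			return err
		}
	drain:
		for {
			select {
			case more, ok := <-queue:
				if !ok {
					break drain // queue closed; flush what we have
				}
				if _, err := bw.Write(more); err != nil {
					return err
				}
			default:
				break drain // nothing else queued right now
			}
		}
		if err := bw.Flush(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	queue := make(chan []byte, 16)
	for i := 1; i <= 5; i++ {
		queue <- []byte(fmt.Sprintf("msg-%d;", i))
	}
	close(queue)

	w := &countingWriter{}
	if err := sendLoop(w, queue); err != nil {
		panic(err)
	}
	fmt.Printf("%d messages sent in %d write call(s)\n", 5, w.calls)
}
```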

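Heartbeat timer optimization
upgrade golang to 1.10.3 or later
the runtime was changed to use P timers (one per P)
severe lock contention
the business tolerates low timer precision

Implement a custom time wheel
spread the locks across the slots
store timer tasks in a map
trade precision for less lock contention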

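Timer optimization
[diagram: a Ticker scans the wheel tick by tick; due entries go to a callback queue, where callers run each callBack]
minimize lock conflicts as far as possible !!!

struct timerEntry {
map
cas
}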
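A minimal time wheel along these lines might look as follows: one lock and one task map per slot, and a driver that advances one slot per tick and collects the due callbacks. All names (`Wheel`, `Add`, `Cancel`, `Advance`, `Handle`) are hypothetical, delays longer than one revolution are rejected to keep the sketch short, and `Advance` is assumed to be called from a single goroutine (for example one driven by a `time.Ticker`).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// slot holds the tasks due at one tick, guarded by its own lock.
type slot struct {
	mu    sync.Mutex
	tasks map[uint64]func()
}

type Wheel struct {
	slots  []slot
	pos    int64 // tick counter; advanced only by the driver goroutine
	nextID uint64
}

// Handle identifies a scheduled task so it can be cancelled.
type Handle struct {
	slot int
	id   uint64
}

func NewWheel(n int) *Wheel {
	if n < 2 {
		n = 2
	}
	w := &Wheel{slots: make([]slot, n)}
	for i := range w.slots {
		w.slots[i].tasks = make(map[uint64]func())
	}
	return w
}

// Add schedules fn to become due after the given number of ticks.
func (w *Wheel) Add(ticks int, fn func()) (Handle, error) {
	n := int64(len(w.slots))
	if ticks < 1 || int64(ticks) >= n {
		return Handle{}, errors.New("delay must be between 1 and len(slots)-1 ticks")
	}
	id := atomic.AddUint64(&w.nextID, 1)
	for {
		pos := atomic.LoadInt64(&w.pos)
		idx := int((pos + int64(ticks)) % n)
		s := &w.slots[idx]
		s.mu.Lock()
		if atomic.LoadInt64(&w.pos) != pos {
			s.mu.Unlock() // the wheel moved while we waited; recompute
			continue
		}
		s.tasks[id] = fn
		s.mu.Unlock()
		return Handle{slot: idx, id: id}, nil
	}
}

// Cancel removes a task if it has not fired yet.
func (w *Wheel) Cancel(h Handle) {
	s := &w.slots[h.slot]
	s.mu.Lock()
	delete(s.tasks, h.id)
	s.mu.Unlock()
}

// Advance moves one tick forward and returns the callbacks now due.
// Callbacks run outside the slot lock, e.g. via a callback queue.
func (w *Wheel) Advance() []func() {
	n := int64(len(w.slots))
	s := &w.slots[atomic.AddInt64(&w.pos, 1)%n]
	s.mu.Lock()
	due := s.tasks
	s.tasks = make(map[uint64]func())
	s.mu.Unlock()
	out := make([]func(), 0, len(due))
	for _, fn := range due {
		out = append(out, fn)
	}
	return out
}

func main() {
	w := NewWheel(60) // e.g. 60 slots of one second each
	w.Add(2, func() { fmt.Println("heartbeat timeout: client1") })
	for tick := 1; tick <= 3; tick++ {
		for _, cb := range w.Advance() {
			cb()
		}
	}
}
```

Precision is limited to one tick, which matches the trade-off stated in the slides: heartbeats rarely need better than that, and in exchange each operation only touches one slot's lock.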

Cache optimization
[diagram: a single cache guarded by one Rwlock is split into several cache segments, each with its own Rwlock]
optimized into segmented locks !

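Broadcast thundering herd
an order only needs to be pushed to the buyer and seller involved
push-server gets woken up frequently
[diagram, first half: the producer publishes msg1;msg2 to a share-topic on nats-bus, and both push-server-1 and push-server-2 receive msg1;msg2]
[diagram, second half: the producer publishes msg1;msg2; push-server-1 receives only msg1 and push-server-2 only msg2, each through a per-server push topic (both labelled push-topic1 on the slide)]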

Increase throughput
merge concurrent single requests into one pipeline when accessing redis
reduce system-call overhead
[diagram: goroutines in event wait hand their requests to a Chan; a pipeline stage drains the Chan]

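Problems caused by logging
the number of golang threads grows
disk io on the host sometimes spikes, which blocks log writes
this in turn triggers detection by runtime sysmon
because the syscall blocks for a long time, a new thread is started and bound to the P
the thread count does not go back down
disk io blocks the business goroutines, so latency rises
[diagram: loggers -> writer -> pipeline flush]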

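golang in docker
golang sets the number of Ps to the number of cpu cores by default
inside docker, cpuinfo shows the host's configuration
more Ps means more runtime overhead
configure P dynamically from docker's cpu-quota
[diagram: a container on a host with cpu 64 core ends up with many Ps]

http://github.com/uber-go/automaxprocs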
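The arithmetic behind the quota-based setting can be sketched without any library. This is an assumption-laden illustration, not how automaxprocs is implemented: `procsFromQuota` is a hypothetical helper, the quota and period are taken to be the cgroup v1 values `cpu.cfs_quota_us` and `cpu.cfs_period_us`, and reading those files is left out.

```go
package main

import (
	"fmt"
	"runtime"
)

// procsFromQuota converts a CFS quota and period (both in microseconds)
// into a P count. A non-positive quota means "no limit", in which case
// the host CPU count is kept. The result is rounded down, at least 1.
func procsFromQuota(quotaUs, periodUs int64, hostCPUs int) int {
	if quotaUs <= 0 || periodUs <= 0 {
		return hostCPUs
	}
	n := int(quotaUs / periodUs)
	if n < 1 {
		n = 1
	}
	if n > hostCPUs {
		n = hostCPUs
	}
	return n
}

func main() {
	// Example values: a 64-core host, container limited to 4 CPUs
	// (quota 400000us per 100000us period).
	n := procsFromQuota(400000, 100000, 64)
	prev := runtime.GOMAXPROCS(n)
	fmt.Printf("GOMAXPROCS changed from %d to %d\n", prev, n)
}
```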

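panic: send on closed channel
Current state
each client has a read goroutine, a write goroutine, and a chan

Problem
when a user closes the connection and leaves, how do we clean up ?
Approach
do not close the channel proactively
cancel a context to signal shutdown
unbind and delete the topic -> client mapping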
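The approach above can be sketched like this: the outbound channel is never closed, shutdown is signalled by cancelling a context, and both the sender and the write loop select on that context. `Client`, `Push`, and `writeLoop` are hypothetical names; unbinding the client from the topic table is omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Client owns an outbound queue that is never closed. Once nothing
// references the client any more, the channel is garbage collected.
type Client struct {
	ctx    context.Context
	cancel context.CancelFunc
	out    chan []byte
}

func NewClient(buf int) *Client {
	ctx, cancel := context.WithCancel(context.Background())
	return &Client{ctx: ctx, cancel: cancel, out: make(chan []byte, buf)}
}

// Push queues msg for the client. It returns false once the client is
// closed, and can never panic because out is never closed.
func (c *Client) Push(msg []byte) bool {
	select {
	case <-c.ctx.Done():
		return false // already closed; do not queue
	default:
	}
	select {
	case c.out <- msg:
		return true
	case <-c.ctx.Done():
		return false // closed while waiting for queue space
	}
}

// Close signals shutdown. It is safe to call more than once.
func (c *Client) Close() { c.cancel() }

// writeLoop is the per-client write goroutine; it exits on Close.
func (c *Client) writeLoop(write func([]byte)) {
	for {
		select {
		case <-c.ctx.Done():
			return
		case msg := <-c.out:
			write(msg)
		}
	}
}

func main() {
	c := NewClient(16)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.writeLoop(func(msg []byte) { fmt.Printf("write %s\n", msg) })
	}()

	c.Push([]byte("tick-1"))
	c.Close() // the user disconnects
	wg.Wait()
	fmt.Println("push after close accepted:", c.Push([]byte("tick-2")))
}
```

Messages still queued at the moment of `Close` may be dropped, which is usually acceptable for a client that has already disconnected.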

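High availability
if push-router crash ?
multiple routers are load-balanced by envoy-ingress
if push-gateway crash?
push-router learns its health status
clients get an available gateway from push-router

if push-server crash ?
the gateway picks the best push-server from the router
and re-sends the market-data subscription requests
user order subscriptions are re-sent starting from the last ack id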

Kernel tuning

enable the bbr congestion control algorithm
little visible effect within China
clear improvement overseas

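Assorted optimizations
always watch for lock contention
use sync.Pool to cache frequently allocated heap objects
reuse bytes.Buffer
reduce system calls
optimize defer calls

use pipelines to improve efficiency at every hop
goroutine pool
control concurrency
smooth out latency spikes
reduce morestack calls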
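As an illustration of the `sync.Pool` and `bytes.Buffer` items, the sketch below encodes frames into pooled buffers instead of allocating a new one per message. The `topic|payload\n` frame layout and the `encodeFrame` helper are made up for the example; only the pooling pattern is the point.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool keeps bytes.Buffer values alive between messages so that the
// hot path does not allocate a fresh buffer for every frame.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeFrame builds "topic|payload\n" in a pooled buffer and passes the
// bytes to write. write must not keep the slice after it returns, because
// the buffer goes back to the pool and will be overwritten.
func encodeFrame(topic string, payload []byte, write func([]byte)) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold the previous frame
	buf.WriteString(topic)
	buf.WriteByte('|')
	buf.Write(payload)
	buf.WriteByte('\n')
	write(buf.Bytes())
	bufPool.Put(buf)
}

func main() {
	encodeFrame("ticker_btc:usdt", []byte("42"), func(b []byte) {
		fmt.Printf("%q\n", b)
	})
}
```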

3. Pitfalls

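Big pitfalls
grpc-web
does not support bidi mode
needs envoy for protocol translation
the goroutine-per-connection model creates too many goroutines
kcp did not perform as well as hoped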

Small pitfalls
runtime gFree & allgs
runtime overhead
too many exited goroutines
too many sleeping goroutines
gc, sysmon, deadlock check …

netpoll vs raw epoll
[diagram: on one side, netpoll in golang runtime, with many go boxes each calling conn.read; on the other side, raw epoll, with go and conn.write]

Q&A

Once more, the original blog address is xiaorui.cc