Tech talk: "How the Raft distributed consensus algorithm works"

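Lately I've talked with colleagues about many of the algorithms and protocols commonly used in distributed systems, and distributed consistency came up along the way. Of course, our understanding of consistency was only at the "read an introduction" level. Ha. Later I put real effort into learning the Raft consensus protocol, and now that I have some takeaways, I'm sharing them with my colleagues.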

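First, a quick word on what Raft is and what it's for. Distributed storage systems usually keep multiple replicas of their data. This improves availability, and having several copies can also help performance. The price of multiple replicas is one of the core problems of distributed storage: keeping the replicas' data consistent. That is exactly the job of the Raft consensus protocol. Even when some replicas are down, the system can keep serving requests as long as Raft's rules are satisfied, that is, as long as a majority of nodes are still up and can talk to each other.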
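Whether the cluster can still serve comes down to majority arithmetic, the N/2 + 1 that recurs throughout the slides. A minimal sketch in Python (the function names `quorum_size` and `max_failures` are illustrative, not taken from any Raft library):

```python
def quorum_size(n: int) -> int:
    """Smallest majority of an n-node cluster: floor(n / 2) + 1."""
    return n // 2 + 1


def max_failures(n: int) -> int:
    """How many nodes may be down while a majority can still be reached."""
    return n - quorum_size(n)


# Print quorum size and fault tolerance for a few cluster sizes.
for n in (3, 4, 5, 7):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {max_failures(n)} failure(s)")
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any extra failure, which is why Raft clusters are usually sized with an odd number of nodes.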

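Raft is a relatively easy-to-understand consensus protocol. I once put a lot of effort into learning Paxos, and the result... I think you all know how that goes: I didn't get it. Learning Paxos was somewhat painful; some Chinese-language documents are vague, and the English-language ones go very deep. So far I only have a shallow grasp of leader election, log replication, and partition tolerance in the normal case; there are still quite a few points in how Paxos handles failures that I don't really understand.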

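Raft is a great thing. InfluxDB, which I used before, and etcd and Consul, which I use now, all rely on Raft to keep their data consistent. To prepare this Raft talk I forced myself through the English-language Raft documentation and got rather hooked. I usually don't record videos of my talks, so I try to make the slides as detailed as possible; anyone with a little Raft background should be able to follow them smoothly.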

PDF:

http://static.xiaorui.cc/raft_design.pdf

slideshare.net:


Raft from rfyiamcool

SlideShare can only be reached through a VPN from mainland China, so here is also a text summary of the Raft talk.

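1. How the Raft distributed consensus algorithm works - 峰云就她了 - xiaorui.cc
2. Introduction: What is a consensus protocol? What are Raft's characteristics? Raft vs. Paxos? What are Raft's components and how do they work? Various tricky Raft scenarios? How do you implement Raft?
3. Single-node setup (client → server): is there a data consistency problem?
4. Multi-node setup (node 1, node 2, node 3): how do we keep the data consistent?
5. Roles: Follower, Candidate, Leader
6. Keywords: timer; Term (a slice of time); Term ID; N/2 + 1 (majority); Heartbeats
7. Keywords: to be elected Leader, a node must present its Term ID and LogIndex. A Leader never deletes its own log entries. Clients attach their own IDs to commands to help Raft stay idempotent. If an entry is committed, every entry before it is committed too.
8. Keywords: if two nodes hold an entry with the same Term and index, we consider their data consistent up to that entry. There is at most one Leader per Term. Each Follower votes for at most one candidate per Term.
9. Keywords (per-server state): currentTerm: the latest term the server has seen (initialized to 0, increases monotonically); votedFor: the ID of the candidate that received this server's vote in the current term; log[]: log entries (a state-machine command plus its Term ID); commitIndex: the highest log index known to be committed; nextIndex[]: for each follower, the index of the next log entry to send to it.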
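10. RequestVote RPC. Arguments: term: the candidate's term; candidateId: the candidate's ID; lastLogIndex: index of the candidate's last log entry; lastLogTerm: term of the candidate's last log entry. Results: term: the voter's current term, so the candidate can update itself; voteGranted: true or false.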
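11. Simplest election: "Vote for me!" "Vote for me!" → "OK!" "OK!"
12. Simple election (C-1, F-1, F-2): C-1 (timer 155, Term 2) asks both followers to vote for it, and they answer NO (timers 170 and 183, Term 3). Condition: the Candidate's term ID is smaller than the Followers'. The Followers are unaffected and their timers keep running. C-1 now knows the situation, deliberately lets its vote time out, and waits for someone else's election.
13. Simple election with two candidates (C-1, C-2): C-1 sends RequestVote(term=2) and receives voteGranted=true, term=2. C-2 has the same term ID and also sends RequestVote(term=2), but gets NO (term does not match), so C-2 waits for its timeout!
14. Hard election (1): "Vote for me" → "OK!"; "Vote for me" → term mismatch. Term conflict: nobody reaches n/2 + 1. "OK!" Everyone ends up on the same term ID!
15. Election summary. Process: when the timer fires, a follower increments current_term_id by 1, switches to the candidate state, and sends RequestVote RPCs. Outcomes: it wins the election, someone else is elected, or nobody wins and a new election starts.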
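16. Client: works with the Leader. The Leader returns a response only once it has committed the entry! Assign a unique ID to every command; the Leader stores the latest ID together with its response.
17. Client processing: at this point the command is only a log entry (not yet committed)! Each node's log: 1 Hello, 2 Raft.
18. Log replication: default heartbeat interval 50 ms; default heartbeat timeout 300 ms. Log entries are committed along with the heartbeats; an entry succeeds once at least n/2 + 1 nodes have stored it.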
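19. AppendEntries (log) RPC. Arguments: term: the leader's term; leaderId: the leader's ID, so followers can redirect client requests; prevLogIndex: index of the log entry immediately preceding the new ones; prevLogTerm: term of the prevLogIndex entry; entries[]: the log entries to store (empty for a heartbeat; several may be sent at once for efficiency); leaderCommit: the leader's commit index. Results: term: the current term, so the leader can update itself; success: true if the follower contained an entry matching prevLogIndex and prevLogTerm.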
20. Log replication (1): the Leader sends Heartbeat & AppendEntries carrying "1 Hello"; every node now holds "1 Hello", but only as a log entry!
21. Log replication (2): the Followers reply OK!; every node holds "1 Hello"; the Leader commits!
22. Log replication (3) (Le_1, F_1, F_2): the next heartbeat carries the commit for "1 Hello"; the Followers commit!
23. Common tricky problems
24. What if a node's reply times out? (Le_1, F_1, F_2, all holding "1 Hello"): the heartbeat & commit to F_2 times out!!! How does F_2 stay consistent? The Leader retries!
25. Leader crash (Le_1, F_1, F_2, F_3, all holding "1 Hello"): the Followers acked the log entry; the Leader committed locally but crashed before telling the Followers to commit! Is "Hello" still there?
26. Follower crash (Le_1, F_1, F_2, F_3): the cluster holds "1 Hello, 2 Raft". After F_3 crashes and restarts, how does it catch up? Via prevLogIndex, until F_3 again holds "1 Hello, 2 Raft" like the others.
27. Network Partition
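28. Normal operation (Le_1, F_1 to F_4): Heartbeat & commit; all five nodes hold "1 Hello".
29. Network partition (Le_1, F_1 to F_4, all holding "1 Hello"): on the other side of the partition a node sends Request Vote and gets Vote Granted. How can two nodes make up a quorum?!
30. The new cluster works normally: Heartbeat & log entry & commit. The majority side holds "1 Hello, 2 Ying"; the old Leader's side holds "1 Hello, 2 Tim". How can two nodes make up a quorum?!
31. The network recovers: Le_1 and the new leader Le_2 both send Heartbeat & Append Log Entries. Once the network is healed they contend for leadership, and Le_1's term is smaller than Le_2's!
32. Consistency: Le_2 drives Heartbeat & log entry & commit, and every node (Le_2, F_1, F_2, F_4, F_5) ends up with "1 Hello, 2 Ying".
33. Split-brain conflict: if a partition meets the quorum and produces N entries, how does it stay consistent with the new cluster: overwrite or merge? What about nodes that had not received the commit before the partition? Timer check.
34. Preventing split brain: unicast to designated nodes; specify the quorum size explicitly (it must be updated on every node add/remove); increase timeouts and retries; a single unified client entry point, but…; monitor for split brain by checking whether every node reports the same leader.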
35. Complex consistency (each cell is one log entry showing its term ID; columns are log indexes, rows are hosts):
    index  1   2   3   4   5   6   7   8   9   10
    S1     44  44  55  66  77  80  89  90
    S2     44  44  55  66  77  80  89
    S3     44  44  55  66  77
    S4     44  44  55  70  70  85  85
    S5     44  44  55  70  70  85
36. Log compaction: S1's log (indexes 1 to 10) holds terms 44 44 55 66 77 80 89 90. Snapshot: last included index: 6; last included term: 80; state-machine state: x ← 0, y ← 9. All committed!!!
37. Further study: animated demo: https://ongardie.github.io/raft-talk-archive/2015/buildstuff/raftscope-replay/ ; paper: http://en.youscribe.com/catalogue/tous/professional-resources/it-systems/raft-in-search-of-an-understandable-consensus-algorithm-2088704 ; Google …
38. Q & A
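Points 10 to 15 of the summary describe how a node answers a RequestVote. Below is a minimal single-process sketch in Python of the receiver side, following the voting rules from the Raft paper; the class and method names are hypothetical, each log entry is reduced to its term, and timers, persistence and networking are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    log_terms: List[int] = field(default_factory=list)  # term of each entry, log index 1 first
    role: str = "follower"

    def last_log_index(self) -> int:
        return len(self.log_terms)  # 0 means the log is empty

    def last_log_term(self) -> int:
        return self.log_terms[-1] if self.log_terms else 0

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_index: int, last_log_term: int) -> Tuple[int, bool]:
        # A candidate from an older term is refused outright (point 12).
        if term < self.current_term:
            return self.current_term, False
        # A newer term: adopt it, forget any earlier vote, fall back to follower.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.role = "follower"
        # The candidate's log must be at least as up-to-date as ours:
        # compare the terms of the last entries first, then the log lengths.
        up_to_date = (last_log_term, last_log_index) >= (self.last_log_term(), self.last_log_index())
        # One vote per term: grant only if we have not voted yet, or voted for this same candidate.
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False


f1 = Node("F-1", current_term=3)
print(f1.handle_request_vote(2, "C-1", 0, 0))  # (3, False): candidate's term is stale
f2 = Node("F-2", current_term=1)
print(f2.handle_request_vote(2, "C-1", 0, 0))  # (2, True)
print(f2.handle_request_vote(2, "C-2", 0, 0))  # (2, False): already voted for C-1 in term 2
f4 = Node("F-4", log_terms=[1, 2])
print(f4.handle_request_vote(3, "C-3", 5, 1))  # (3, False): candidate's last log term 1 < 2
```

The log comparison is what backs the keyword in point 7: a candidate whose log is behind cannot collect a majority, so an entry committed on a majority survives a change of leader.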

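Points 19, 26 and 30 to 32 rely on the prevLogIndex/prevLogTerm consistency check: a follower rejects entries that don't line up with its log, the Leader backs off and retries, and conflicting uncommitted entries are overwritten rather than merged. A minimal Python sketch of the follower side plus a naive leader back-off loop; the names `Follower`, `handle_append_entries` and `replicate` are hypothetical, the log is a plain list of (term, command) tuples, and the terms in the examples (1 for Le_1, 2 for Le_2) are assumptions.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, str]  # (term, command)


class Follower:
    def __init__(self, current_term: int = 0, log: Optional[List[Entry]] = None):
        self.current_term = current_term
        self.log: List[Entry] = list(log or [])  # log index i lives at self.log[i - 1]
        self.commit_index = 0

    def handle_append_entries(self, term: int, prev_log_index: int, prev_log_term: int,
                              entries: List[Entry], leader_commit: int) -> Tuple[int, bool]:
        # Reject a leader from an older term.
        if term < self.current_term:
            return self.current_term, False
        self.current_term = term
        # Consistency check: we must already hold prev_log_index with prev_log_term.
        if prev_log_index > len(self.log):
            return self.current_term, False
        if prev_log_index > 0 and self.log[prev_log_index - 1][0] != prev_log_term:
            return self.current_term, False
        for offset, entry in enumerate(entries):
            index = prev_log_index + 1 + offset
            if index <= len(self.log):
                if self.log[index - 1][0] == entry[0]:
                    continue  # already have this entry
                del self.log[index - 1:]  # conflict: drop it and everything after it
            self.log.append(entry)
        # Commit no further than the leader has, nor past the last entry this call covered.
        if leader_commit > self.commit_index:
            self.commit_index = min(leader_commit, prev_log_index + len(entries))
        return self.current_term, True


def replicate(leader_term: int, leader_log: List[Entry], leader_commit: int, f: Follower) -> None:
    """Naive leader loop: start at the end of the log and step nextIndex back until accepted."""
    next_index = len(leader_log) + 1
    while True:
        prev = next_index - 1
        prev_term = leader_log[prev - 1][0] if prev > 0 else 0
        term, ok = f.handle_append_entries(leader_term, prev, prev_term,
                                           leader_log[prev:], leader_commit)
        if ok or term > leader_term:  # accepted, or we are a stale leader and must step down
            return
        next_index -= 1


# Point 26: F_3 restarts holding only "1 Hello" and catches up.
f3 = Follower(current_term=1, log=[(1, "Hello")])
replicate(1, [(1, "Hello"), (1, "Raft")], 2, f3)
print(f3.log, f3.commit_index)  # [(1, 'Hello'), (1, 'Raft')] 2

# Points 30 to 32: the uncommitted "2 Tim" from the old term is overwritten by "2 Ying".
old_side = Follower(current_term=1, log=[(1, "Hello"), (1, "Tim")])
replicate(2, [(1, "Hello"), (2, "Ying")], 2, old_side)
print(old_side.log)  # [(1, 'Hello'), (2, 'Ying')]
```

Stepping nextIndex back one entry per round trip is the simplest strategy; real implementations usually add optimizations to skip back faster, but the safety argument is the same.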
END.

Once again, the original blog address: xiaorui.cc

巷子口 July 12, 2017 / 7:04 PM

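Hi, blogger,
Hello! I've been using Consul recently. Since Raft is a bit hard to understand and there are few Chinese blog posts about it, I hope you'll still write a detailed post or record a video for the benefit of us all!
Thanks!!! [thumbs up]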

厉害了我的哥 November 17, 2016 / 7:52 AM

This is impressive!!!

琴 July 10, 2016 / 8:04 AM

You give talks pretty often! Is the technical atmosphere at your company really that strong?

- 峰云就她了 July 13, 2016 / 10:02 PM
  
  Yes. The company is hiring Python engineers; give it a try if you're interested.
  
  - 找工作 July 26, 2016 / 12:12 AM
    
    Is there a job description?
    
刘 July 10, 2016 / 12:13 AM

What project is your blog built with? It looks great!

wwek July 9, 2016 / 9:14 PM

Very nice
