Preface:
Benchmarking the performance of Go's locks sounds like a boring exercise, so let me explain the motivation. A couple of days ago a friend asked me: if you need thread-safe reads and writes on a slice, should you use a read-write lock or a mutex, and which one performs better?
Benchmark
In what scenarios does RWMutex actually beat Mutex? In my view, when the code between Lock and Unlock does no IO and no complex computation, a plain mutex is more efficient than a read-write lock. The community has several rwlock implementations, and most of them are built from two mutexes plus a reader counter, roughly as sketched below.
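Here is a minimal sketch of that classic design (the type and field names are illustrative, not taken from any particular community implementation): the first reader takes the writer lock on behalf of all readers, and the last reader releases it.

import "sync"

// simpleRWLock: a hypothetical rwlock built from two mutexes
// and a reader counter, the common community design.
type simpleRWLock struct {
    counterLock sync.Mutex // protects the readers counter
    writerLock  sync.Mutex // held by writers, and by readers as a group
    readers     int
}

func (l *simpleRWLock) RLock() {
    l.counterLock.Lock()
    l.readers++
    if l.readers == 1 {
        l.writerLock.Lock() // first reader blocks all writers
    }
    l.counterLock.Unlock()
}

func (l *simpleRWLock) RUnlock() {
    l.counterLock.Lock()
    l.readers--
    if l.readers == 0 {
        l.writerLock.Unlock() // last reader lets writers in again
    }
    l.counterLock.Unlock()
}

func (l *simpleRWLock) Lock()   { l.writerLock.Lock() }
func (l *simpleRWLock) Unlock() { l.writerLock.Unlock() }

Note that every RLock and RUnlock must take counterLock, which is exactly why a design like this costs more than a single mutex when the critical section is trivial.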
I have compared lock against rwlock in C++. With a simple assignment inside the critical section, the benchmark matched my prediction: the plain mutex beats the read-write lock. When the critical section is an empty IO read/write, rwlock beats lock, which is also what we expected.
But when the critical section is a map lookup, rwlock beats lock as well. On reflection that makes sense: a map is a relatively expensive data structure. Looking up a key means computing a hash code, using it to locate the right bucket in the array, and then walking the bucket's chain to find the matching key.
Simple assignment:
1. raw_lock: 1.832699s
2. raw_rwlock: 3.620338s
IO operation:
1. simple_lock: 14.058138s
2. simple_rwlock: 9.445691s
map lookup:
1. lock: 2.925601s
2. rwlock: 0.320296s
With the C++ comparison out of the way, let's benchmark Go's sync.RWMutex against sync.Mutex. Without further ado, here is the test code.
package main

// benchmark: sync.RWMutex vs sync.Mutex guarding a shared map
// github.com/rfyiamcool/golib

import (
    "fmt"
    "sync"
    "time"
)

var (
    num  = 1000 * 10 // operations per goroutine
    gnum = 1000      // concurrent goroutines
)

func main() {
    fmt.Println("only read")
    testRwmutexReadOnly()
    testMutexReadOnly()

    fmt.Println("write and read")
    testRwmutexWriteRead()
    testMutexWriteRead()

    fmt.Println("write only")
    testRwmutexWriteOnly()
    testMutexWriteOnly()
}

func testRwmutexReadOnly() {
    var w = &sync.WaitGroup{}
    var rwmutexTmp = newRwmutex()
    w.Add(gnum)
    t1 := time.Now()
    for i := 0; i < gnum; i++ {
        go func() {
            defer w.Done()
            for in := 0; in < num; in++ {
                rwmutexTmp.get(in)
            }
        }()
    }
    w.Wait()
    fmt.Println("testRwmutexReadOnly cost:", time.Since(t1).String())
}

func testRwmutexWriteOnly() {
    var w = &sync.WaitGroup{}
    var rwmutexTmp = newRwmutex()
    w.Add(gnum)
    t1 := time.Now()
    for i := 0; i < gnum; i++ {
        go func() {
            defer w.Done()
            for in := 0; in < num; in++ {
                rwmutexTmp.set(in, in)
            }
        }()
    }
    w.Wait()
    fmt.Println("testRwmutexWriteOnly cost:", time.Since(t1).String())
}

func testRwmutexWriteRead() {
    var w = &sync.WaitGroup{}
    var rwmutexTmp = newRwmutex()
    w.Add(gnum)
    t1 := time.Now()
    for i := 0; i < gnum; i++ {
        if i%2 == 0 {
            // even goroutines read ...
            go func() {
                defer w.Done()
                for in := 0; in < num; in++ {
                    rwmutexTmp.get(in)
                }
            }()
        } else {
            // ... odd goroutines write
            go func() {
                defer w.Done()
                for in := 0; in < num; in++ {
                    rwmutexTmp.set(in, in)
                }
            }()
        }
    }
    w.Wait()
    fmt.Println("testRwmutexWriteRead cost:", time.Since(t1).String())
}

func testMutexReadOnly() {
    var w = &sync.WaitGroup{}
    var mutexTmp = newMutex()
    w.Add(gnum)
    t1 := time.Now()
    for i := 0; i < gnum; i++ {
        go func() {
            defer w.Done()
            for in := 0; in < num; in++ {
                mutexTmp.get(in)
            }
        }()
    }
    w.Wait()
    fmt.Println("testMutexReadOnly cost:", time.Since(t1).String())
}

func testMutexWriteOnly() {
    var w = &sync.WaitGroup{}
    var mutexTmp = newMutex()
    w.Add(gnum)
    t1 := time.Now()
    for i := 0; i < gnum; i++ {
        go func() {
            defer w.Done()
            for in := 0; in < num; in++ {
                mutexTmp.set(in, in)
            }
        }()
    }
    w.Wait()
    fmt.Println("testMutexWriteOnly cost:", time.Since(t1).String())
}

func testMutexWriteRead() {
    var w = &sync.WaitGroup{}
    var mutexTmp = newMutex()
    w.Add(gnum)
    t1 := time.Now()
    for i := 0; i < gnum; i++ {
        if i%2 == 0 {
            go func() {
                defer w.Done()
                for in := 0; in < num; in++ {
                    mutexTmp.get(in)
                }
            }()
        } else {
            go func() {
                defer w.Done()
                for in := 0; in < num; in++ {
                    mutexTmp.set(in, in)
                }
            }()
        }
    }
    w.Wait()
    fmt.Println("testMutexWriteRead cost:", time.Since(t1).String())
}

func newRwmutex() *rwmutex {
    var t = &rwmutex{}
    t.mu = &sync.RWMutex{}
    t.ipmap = make(map[int]int, 100)
    for i := 0; i < 100; i++ {
        t.ipmap[i] = 0
    }
    return t
}

type rwmutex struct {
    mu    *sync.RWMutex
    ipmap map[int]int
}

func (t *rwmutex) get(i int) int {
    t.mu.RLock()
    defer t.mu.RUnlock()
    return t.ipmap[i] // keys >= 100 simply miss and return the zero value
}

func (t *rwmutex) set(k, v int) {
    t.mu.Lock()
    defer t.mu.Unlock()
    k = k % 100 // keep the map bounded to 100 keys
    t.ipmap[k] = v
}

func newMutex() *mutex {
    var t = &mutex{}
    t.mu = &sync.Mutex{}
    t.ipmap = make(map[int]int, 100)
    for i := 0; i < 100; i++ {
        t.ipmap[i] = 0
    }
    return t
}

type mutex struct {
    mu    *sync.Mutex
    ipmap map[int]int
}

func (t *mutex) get(i int) int {
    t.mu.Lock()
    defer t.mu.Unlock()
    return t.ipmap[i]
}

func (t *mutex) set(k, v int) {
    t.mu.Lock()
    defer t.mu.Unlock()
    k = k % 100
    t.ipmap[k] = v
}
Results:
We tested mutex and rwmutex across many goroutines in read-only, write-only, and mixed read-write scenarios. It appears that only in the write-only scenario does mutex edge out rwmutex.
only read
testRwmutexReadOnly cost: 465.546765ms
testMutexReadOnly cost: 2.146494288s
write and read
testRwmutexWriteRead cost: 1.80217194s
testMutexWriteRead cost: 2.322097403s
write only
testRwmutexWriteOnly cost: 2.836979159s
testMutexWriteOnly cost: 2.490377869s
Now replace the map reads and writes with increments and reads of a global counter (a minimal sketch follows the numbers below). The results are much the same as above: only in the write-only scenario is mutex slightly faster than rwlock.
only read
testRwmutexReadOnly cost: 10.583448ms
testMutexReadOnly cost: 10.908006ms
write and read
testRwmutexWriteRead cost: 12.405655ms
testMutexWriteRead cost: 14.471428ms
write only
testRwmutexWriteOnly cost: 13.763028ms
testMutexWriteOnly cost: 13.112282ms
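For reference, a minimal sketch of that counter variant; the original post does not show this code, so the names here are illustrative and it assumes the same test harness as above:

// rwCounter: same locking as before, but the critical section
// is a single integer operation instead of a map access.
type rwCounter struct {
    mu  sync.RWMutex
    cnt int
}

func (c *rwCounter) get() int {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.cnt
}

func (c *rwCounter) incr() {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.cnt++
}

The mutex version is identical except that get takes mu.Lock/mu.Unlock instead of RLock/RUnlock.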
sync.RWMutex source code
Let's dig into the implementation of Go's sync.RWMutex. Its struct also contains a writer mutex, reader/writer semaphores, and reader counters, much like the community designs. The biggest difference is that the reader count is maintained with atomic instructions, whereas the community implementations update their counter while holding a mutex.
type RWMutex struct {
    w           Mutex  // held if there are pending writers
    writerSem   uint32 // semaphore for writers to wait for completing readers
    readerSem   uint32 // semaphore for readers to wait for completing writers
    readerCount int32  // number of pending readers
    readerWait  int32  // number of departing readers
}
Taking the read lock: RLock simply bumps the reader counter with an atomic add. If the result is negative, a writer is pending (Lock drives the counter negative, as we'll see below), so the reader blocks on readerSem.
func (rw *RWMutex) RLock() {
    if race.Enabled {
        _ = rw.w.state
        race.Disable()
    }
    if atomic.AddInt32(&rw.readerCount, 1) < 0 {
        // A writer is pending, wait for it.
        runtime_Semacquire(&rw.readerSem)
    }
    if race.Enabled {
        race.Enable()
        race.Acquire(unsafe.Pointer(&rw.readerSem))
    }
}
Releasing the read lock likewise updates the counter atomically. A negative result means a writer is pending, and the last reader to leave releases writerSem to wake that writer.
func (rw *RWMutex) RUnlock() {
    if race.Enabled {
        _ = rw.w.state
        race.ReleaseMerge(unsafe.Pointer(&rw.writerSem))
        race.Disable()
    }
    if r := atomic.AddInt32(&rw.readerCount, -1); r < 0 {
        if r+1 == 0 || r+1 == -rwmutexMaxReaders {
            race.Enable()
            throw("sync: RUnlock of unlocked RWMutex")
        }
        // A writer is pending.
        if atomic.AddInt32(&rw.readerWait, -1) == 0 {
            // The last reader unblocks the writer.
            runtime_Semrelease(&rw.writerSem, false)
        }
    }
    if race.Enabled {
        race.Enable()
    }
}
Taking the write lock: Lock first grabs the inner mutex to serialize with other writers, then announces itself by driving readerCount negative, and waits for the active readers to drain. Unlock does the reverse: it restores readerCount, wakes every reader blocked on readerSem, and finally releases the inner mutex so the next writer can proceed.
func (rw *RWMutex) Lock() {
    if race.Enabled {
        _ = rw.w.state
        race.Disable()
    }
    // First, resolve competition with other writers.
    rw.w.Lock()
    // Announce to readers there is a pending writer.
    r := atomic.AddInt32(&rw.readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders
    // Wait for active readers.
    if r != 0 && atomic.AddInt32(&rw.readerWait, r) != 0 {
        runtime_Semacquire(&rw.writerSem)
    }
    if race.Enabled {
        race.Enable()
        race.Acquire(unsafe.Pointer(&rw.readerSem))
        race.Acquire(unsafe.Pointer(&rw.writerSem))
    }
}

func (rw *RWMutex) Unlock() {
    if race.Enabled {
        _ = rw.w.state
        race.Release(unsafe.Pointer(&rw.readerSem))
        race.Release(unsafe.Pointer(&rw.writerSem))
        race.Disable()
    }
    // Announce to readers there is no active writer.
    r := atomic.AddInt32(&rw.readerCount, rwmutexMaxReaders)
    if r >= rwmutexMaxReaders {
        race.Enable()
        throw("sync: Unlock of unlocked RWMutex")
    }
    // Unblock blocked readers, if any.
    for i := 0; i < int(r); i++ {
        runtime_Semrelease(&rw.readerSem, false)
    }
    // Allow other writers to proceed.
    rw.w.Unlock()
    if race.Enabled {
        race.Enable()
    }
}
Summary:
There is not much to conclude; lock contention is a perennial problem in high-concurrency systems. For the map + mutex pattern above, we can switch to sync.Map, available since Go 1.9. Under read-heavy, write-light workloads sync.Map performs far better than sync.RWMutex + map; a minimal usage sketch follows.
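This uses only the standard-library sync.Map API; the wrapper functions are illustrative:

var m sync.Map

func store(k, v int) {
    m.Store(k, v) // writes may take sync.Map's internal mutex
}

func lookup(k int) (int, bool) {
    // Reads are usually served from the lock-free read-only view.
    v, ok := m.Load(k)
    if !ok {
        return 0, false
    }
    return v.(int), true
}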
If you read through sync.Map's implementation, you will find its write performance is not great: reads can be served lock-free from a copy-on-write read-only view, but writes still take a lock. We can borrow the segment-locking idea from Java's ConcurrentHashMap to spread the contention across shards, as sketched below.
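A sketch of that segment-locking idea in Go; shardedMap and shardCount are made-up names, and it assumes non-negative integer keys:

const shardCount = 32

// shard: one bucket of the map with its own rwlock, so writers
// on different shards never contend with each other.
type shard struct {
    sync.RWMutex
    m map[int]int
}

type shardedMap [shardCount]shard

func newShardedMap() *shardedMap {
    s := &shardedMap{}
    for i := range s {
        s[i].m = make(map[int]int)
    }
    return s
}

func (s *shardedMap) set(k, v int) {
    sh := &s[k%shardCount] // pick the shard for this key
    sh.Lock()
    sh.m[k] = v
    sh.Unlock()
}

func (s *shardedMap) get(k int) int {
    sh := &s[k%shardCount]
    sh.RLock()
    defer sh.RUnlock()
    return sh.m[k]
}

Each key maps to one of 32 shards, so write contention drops roughly by the shard count, at the cost of a little extra indexing work per operation.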
Besides segment locking, lock contention can also be relieved with optimistic locking built on atomic CAS instructions.
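A minimal optimistic-locking sketch using CAS from sync/atomic: read the old value, compute the new one, and retry if another goroutine got there first.

import "sync/atomic"

// optimisticIncr updates the counter without a mutex: the CAS only
// succeeds if no other goroutine changed addr since we loaded it.
func optimisticIncr(addr *int32) {
    for {
        old := atomic.LoadInt32(addr)
        if atomic.CompareAndSwapInt32(addr, old, old+1) {
            return // no race, the update is published
        }
        // CAS failed: someone else won the race; reload and retry.
    }
}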