Description
We found that gh-ost always holds its lock for about 1 second or longer during the cut-over phase, regardless of whether the database is busy.
Log:
Done waiting for events up to lock; duration=975.482481ms
This indicates that the write lock on the table being migrated is held for approximately 1 second.
Our investigation showed that the long lock hold comes from a time.Sleep() call.
Source code:
func (this *Migrator) executeWriteFuncs() error {
	...
	for {
		select {
		case eventStruct := <-this.applyEventsQueue:
			{
				if err := this.onApplyEventStruct(eventStruct); err != nil {
					return err
				}
			}
		default:
			{
				select {
				case copyRowsFunc := <-this.copyRowsQueue:
					{
						...
					}
				default:
					{
						// Hmmmmm... nothing in the queue; no events, but also no row copy.
						// This is possible upon load. Let's just sleep it over.
						this.migrationContext.Log.Debugf("Getting nothing in the write queue. Sleeping...")
						time.Sleep(time.Second)
					}
				}
			}
		}
	}
}
Because of time.Sleep(time.Second), the waitForEventsUpToLock function in the cut-over phase cannot promptly receive the notification that binlog processing is complete, so the lock is held for too long.
That notification travels through this.applyEventsQueue and is only triggered when this.onApplyEventStruct() runs. If the loop is already sleeping when the event is queued, the event is not handled until the sleep ends, which can take up to a full second.
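To illustrate the effect outside of gh-ost, here is a minimal, self-contained sketch. It is not gh-ost code: the names handlingDelay, compareDelays, events and handled are hypothetical, events only stands in for applyEventsQueue, and the idle sleep is shortened to 200ms so the demo finishes quickly. It measures how long a queued event waits before a consumer loop picks it up, once with a sleep-when-idle loop like the one above and once with a plain blocking receive used purely as a point of comparison.

```go
package main

import (
	"fmt"
	"time"
)

// handlingDelay reports how long one event sits in the queue before the
// consumer handles it. With idleSleep > 0 the consumer polls and sleeps
// when the queue is empty (the pattern from the issue). With idleSleep == 0
// it blocks on the channel instead. All names here are hypothetical.
func handlingDelay(idleSleep time.Duration) time.Duration {
	events := make(chan time.Time, 1)      // stand-in for applyEventsQueue
	handled := make(chan time.Duration, 1) // carries the measured delay back

	go func() {
		for {
			if idleSleep == 0 {
				sent := <-events // blocking receive: wakes as soon as an event arrives
				handled <- time.Since(sent)
				return
			}
			select {
			case sent := <-events:
				handled <- time.Since(sent)
				return
			default:
				// Nothing queued: sleep, and stay unresponsive for the whole sleep.
				time.Sleep(idleSleep)
			}
		}
	}()

	// Give the consumer time to go idle before queueing the event.
	time.Sleep(20 * time.Millisecond)
	events <- time.Now()
	return <-handled
}

// compareDelays runs both variants and returns their measured delays.
func compareDelays() (sleepy, blocking time.Duration) {
	sleepy = handlingDelay(200 * time.Millisecond)
	blocking = handlingDelay(0)
	return sleepy, blocking
}

func main() {
	sleepy, blocking := compareDelays()
	fmt.Println("sleep when idle: ", sleepy)
	fmt.Println("blocking receive:", blocking)
}
```

In this sketch the sleep-when-idle variant should report a delay close to the remaining sleep time, while the blocking variant should react almost immediately. Exact numbers depend on the machine and scheduler.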
I removed the time.Sleep call and ran gh-ost again to measure the lock time:
Done waiting for events up to lock; duration=9.305327ms
Thanks