Optimize the lock time in the cutOver phase #1630

@dongwenpeng

We found that gh-ost always holds its lock for about 1 second, or even longer, during the cutOver phase, regardless of whether the database is busy.

Log:

Done waiting for events up to lock; duration=975.482481ms

This indicates that the write lock on the table being altered is held for approximately 1 second.

Investigation showed that the lock is held this long because of a time.Sleep() call.

Source code:

func (this *Migrator) executeWriteFuncs() error {
	...
	for {
		select {
		case eventStruct := <-this.applyEventsQueue:
			{
				if err := this.onApplyEventStruct(eventStruct); err != nil {
					return err
				}
			}
		default:
			{
				select {
				case copyRowsFunc := <-this.copyRowsQueue:
					{
						...
					}
				default:
					{
						// Hmmmmm... nothing in the queue; no events, but also no row copy.
						// This is possible upon load. Let's just sleep it over.
						this.migrationContext.Log.Debugf("Getting nothing in the write queue. Sleeping...")
						time.Sleep(time.Second)
					}
				}
			}
		}
	}
}

Because of time.Sleep(time.Second), the waitForEventsUpToLock function in the cutOver phase cannot receive the binlog processing completion notification promptly, so the lock is held far longer than necessary.

The completion notification is delivered through this.applyEventsQueue and only fires when this.onApplyEventStruct() executes. If the loop is sleeping when the notification is queued, it is not handled until the sleep ends.

I tried removing time.Sleep and ran gh-ost again to measure the lock time:

Done waiting for events up to lock; duration=9.305327ms
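
The effect can be reproduced outside gh-ost with a small standalone sketch. This is not gh-ost code and not a proposed patch: drainSleepy, drainBlocking and latencyMillis are hypothetical names, the queues carry plain func() values instead of gh-ost's real event structs, and the idle period is shortened to 400ms so the demo runs quickly. It compares the quoted loop shape with one possible alternative that blocks on both queues with a timeout instead of sleeping, while still checking the events queue first.

```go
package main

import (
	"fmt"
	"time"
)

// drainSleepy mimics the loop shape quoted above: when both queues are
// empty it sleeps for the full idle period before polling again.
func drainSleepy(events <-chan func(), copyRows <-chan func(), idle time.Duration) {
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return // events queue closed: stop the loop
			}
			ev()
		default:
			select {
			case cp := <-copyRows:
				cp()
			default:
				time.Sleep(idle)
			}
		}
	}
}

// drainBlocking keeps events first in priority, but when idle it blocks
// on both queues (with a timeout) so new work wakes it immediately.
func drainBlocking(events <-chan func(), copyRows <-chan func(), idle time.Duration) {
	for {
		// Non-blocking check so a pending event always wins over row copy.
		select {
		case ev, ok := <-events:
			if !ok {
				return
			}
			ev()
			continue
		default:
		}
		timer := time.NewTimer(idle)
		select {
		case ev, ok := <-events:
			timer.Stop()
			if !ok {
				return
			}
			ev()
		case cp := <-copyRows:
			timer.Stop()
			cp()
		case <-timer.C:
			// Nothing arrived within the idle period; loop again.
		}
	}
}

// latencyMillis lets the loop go idle, enqueues one event, and returns how
// long (in milliseconds) it took for that event to be handled.
func latencyMillis(drain func(<-chan func(), <-chan func(), time.Duration)) int64 {
	events := make(chan func(), 1)
	copyRows := make(chan func())
	done := make(chan struct{})
	go drain(events, copyRows, 400*time.Millisecond)
	time.Sleep(50 * time.Millisecond) // give the loop time to become idle
	start := time.Now()
	events <- func() { close(done) }
	<-done
	close(events) // lets the drain goroutine exit
	return time.Since(start).Milliseconds()
}

func main() {
	fmt.Println("sleepy loop latency (ms):  ", latencyMillis(drainSleepy))
	fmt.Println("blocking loop latency (ms):", latencyMillis(drainBlocking))
}
```

In the sleepy variant the event waits out the remainder of the idle sleep, which is the same pattern as the roughly 1 second lock time in the log above. The blocking variant handles it almost immediately without busy-looping when both queues are empty.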

Thanks
