
I'm developing a slow query log parser package in Go, together with a slow query log replayer. For the replayer, I have the following piece of code (with comments added for readability):

for {
    // method from my package that returns a Query object, containing header values
    // and the query itself
    q := p.GetNext()
    if q == (query.Query{}) {
        break
    }
    db.logger.Tracef("query: %s", q.Query)

    // we send the SQL query to a chan that is read by workers.
    // workers just execute the query on the database, that's all.
    // results from the db query are handled in another part of the code,
    // so they don't really matter here
    queries <- q.Query

    // We need a reference time
    if firstPass {
        firstPass = false
        previousDate = q.Time
        continue
    }

    // Time field contains the Time: field value from the query header
    now := q.Time
    sleeping := now.Sub(previousDate)
    db.logger.Tracef("next sleeping time: %s", sleeping)
    time.Sleep(sleeping) // Here is my issue. For MariaDB the value is < 0, so no sleep is done

    // For MariaDB, when there are multiple queries in a short amount of
    // time, the Time field is not repeated, so we do not have to update
    // the previous date.
    if now != (time.Time{}) {
        previousDate = now
    }
}

I ran into an interesting issue: in MariaDB slow query logs, if 2 (or more) queries are close to each other, there is no Time: field in the header, which reduces the number of time.Sleep(sleeping) calls in the previous code snippet. With MySQL-style slow query logs, however, there is always a Time: field in the query header, which means the sleep is done for every query (even for sleep durations in the µs range).
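To illustrate the difference (timestamps invented), a MySQL-style slow log repeats the Time: header for every query, roughly like this:

```
# Time: 2021-04-07T10:49:00.123456Z
# Query_time: 0.000134  Lock_time: 0.000045 Rows_sent: 1  Rows_examined: 0
SELECT 1;
# Time: 2021-04-07T10:49:00.123900Z
# Query_time: 0.000120  Lock_time: 0.000040 Rows_sent: 1  Rows_examined: 0
SELECT 2;
```

whereas a MariaDB-style log omits the Time: line for subsequent queries logged in the same second, so the parser returns a zero q.Time for them.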

I noticed a huge replay time difference between MariaDB and MySQL logs: the MariaDB replay duration was quite close to real time (the time difference between the first and the last query of the log file), but the MySQL replay time was considerably higher than real time. After playing with pprof, I noticed that the problem came from time.Sleep, in particular from runtime.futex, which is CPU-time consuming.

I did some benchmarking, and the duration results correlate with the number of time.Sleep calls performed (which is higher with MySQL than with MariaDB).

So instead of doing all the time.Sleep calls in a single goroutine, I'm looking for a way to execute them in parallel without altering the effective timing, but I can't figure out how to do this.

hacb
    seems to me you are misreading the output of pprof. What else do you expect it to show ? –  Apr 07 '21 at 10:49
  • the numerous syscalls invoked by time.Sleep are a known issue: https://github.com/golang/go/issues/25471 – hacb Apr 07 '21 at 11:57
  • What is your goal? To execute the same queries at the same pace? Or to test how much faster the set of queries _could_ be run? Or to compare hardware configurations? Or to test a change in indexes/etc? Or...? (Frankly, I don't understand why you are trying to simulate the inter-query delays.) – Rick James Apr 07 '21 at 14:57
  • the idea is to analyze the behavior of a database, by replaying clients queries. This way, I can compare different cloud provider performance (for example Cloud SQL) with a production dataset, and see if it is faster/slower than my current setup or another cloud provider. The package I wrote translates the slow query logs into Query objects, and the replayer replays those queries on a database. Sorry if my explanation isn't very clear ! – hacb Apr 07 '21 at 16:25

1 Answer


The solution I came up with is the following:

type job struct {
    query string
    idle  time.Time
}

...
    var reference time.Time
    start := time.Now()
    for {
        q := p.GetNext()
        if q == (query.Query{}) {
            s.Stop()
            break
        }
        db.logger.Tracef("query: %s", q.Query)

        r.queries++
        s.Suffix = " queries replayed: " + strconv.Itoa(r.queries)

        // We need a reference time
        if firstPass {
            firstPass = false
            reference = q.Time
        }

        var j job
        delta := q.Time.Sub(reference)
        j.idle = start.Add(delta)
        j.query = q.Query
        db.logger.Tracef("scheduled at: %s", j.idle)
        jobs <- j
    }

...

func (db database) worker(jobs chan job, errors chan error, wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        j, ok := <-jobs
        if !ok {
            db.logger.Trace("channel closed, worker exiting")
            return
        }
        sleep := time.Until(j.idle)
        if sleep > 0 {
            time.Sleep(sleep)
        }
        rows, err := db.drv.Query(j.query)
        if err != nil {
            errors <- err
            db.logger.Debugf("failed to execute query:\n%s\nerror: %s", j.query, err)
        }
        if rows != nil {
            rows.Close()
        }
    }
}

Explanation:

We save the program's start time in a variable (here start). Next, we set a reference time (reference), which is the very first timestamp of the slow query log file. It never changes.

Then, for each new query, we calculate the duration between reference and the current query's timestamp q.Time. Let's store it in delta.

We add delta to start and we get a timestamp on our own timeline (not in the past, as in the slow query log file). We send this timestamp to the worker along with the query, in a new struct we created called job.

When the worker receives the job via the channel, it calculates how long to wait until it can execute the query. If that duration is <= 0, it executes the query immediately; otherwise it sleeps first.

hacb
  • 175
  • 2
  • 10
  • For those interested, the package will be at https://github.com/devops-works/slowql. It is not public yet, as it is under development. Thanks to those who took time to check my problem ! – hacb Apr 07 '21 at 14:31