I'm using Go routines to send queries to PostgreSQL master and slave nodes in parallel. The first host that returns a valid result wins. Error cases are outside the scope of this question.
The caller is the only one that cares about the contents of a *sql.Rows
object, so intentionally my function doesn't do any operations on those. I use buffered channels to retrieve return objects from the Go routines, so there should be no Go routine leak. Garbage collection should take care of the rest.
There is a problem I haven't taught about properly: the Rows objects that remain behind in the channel are never closed. When I call this function from a (read only) transaction, tx.Rollback()
returns an error for every instance of non-closed Rows
object: "unexpected command tag SELECT"
.
This function is called from higher level objects:
func multiQuery(ctx context.Context, xs []executor, query string, args ...interface{}) (*sql.Rows, error) {
rc := make(chan *sql.Rows, len(xs))
ec := make(chan error, len(xs))
for _, x := range xs {
go func(x executor) {
rows, err := x.QueryContext(ctx, query, args...)
switch { // Make sure only one of them is returned
case err != nil:
ec <- err
case rows != nil:
rc <- rows
}
}(x)
}
var me MultiError
for i := 0; i < len(xs); i++ {
select {
case err := <-ec:
me.append(err)
case rows := <-rc: // Return on the first success
return rows, nil
}
}
return nil, me.check()
}
Executors can be *sql.DB
, *sql.Tx
or anything that complies with the interface:
type executor interface {
ExecContext(ctx context.Context, query string, args ...interface{}) (sql.Result, error)
QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error)
QueryRowContext(ctx context.Context, query string, args ...interface{}) *sql.Row
}
Rollback logic:
func (mtx MultiTx) Rollback() error {
ec := make(chan error, len(mtx))
for _, tx := range mtx {
go func(tx *Tx) {
err := tx.Rollback()
ec <- err
}(tx)
}
var me MultiError
for i := 0; i < len(mtx); i++ {
if err := <-ec; err != nil {
me.append(err)
}
}
return me.check()
}
MultiTx
is a collection of open transactions on multiple nodes. It is a higher level object that calls multiQuery
What would be the best approach to "clean up" unused rows? Options I'm thinking about not doing:
- Cancel the context: I believe it will work inconsistently, multiple queries might already have returned by the time
cancel()
is called - Create a deferred Go routine which continues to drain the channels and close the rows objects: If a DB node is slow to respond,
Rollback()
is still called beforerows.Close()
- Use a
sync.WaitGroup
somewhere in the MultiTx type, maybe in combination with (2): This can cause Rollback to hang if one of the nodes is unresponsive. Also, I wouldn't be sure how I would implement that. - Ignore the Rollback errors: Ignoring errors never sounds like a good idea, they are there for a reason.
What would be the recommended way of approaching this?
Edit:
As suggested by @Peter, I've tried canceling the context, but it seems this also invalidates all the returned Rows from the query. On rows.Scan
I'm getting context canceled
error at the higher level caller.
This is what I've done so far:
func multiQuery(ctx context.Context, xs []executor, query string, args ...interface{}) (*sql.Rows, error) {
ctx, cancel := context.WithCancel(ctx)
defer cancel()
rc := make(chan *sql.Rows, len(xs))
ec := make(chan error, len(xs))
for _, x := range xs {
go func(x executor) {
rows, err := x.QueryContext(ctx, query, args...)
switch { // Make sure only one of them is returned
case err != nil:
ec <- err
case rows != nil:
rc <- rows
cancel() // Cancel on success
}
}(x)
}
var (
me MultiError
rows *sql.Rows
)
for i := 0; i < len(xs); i++ {
select {
case err := <-ec:
me.append(err)
case r := <-rc:
if rows == nil { // Only use the first rows
rows = r
} else {
r.Close() // Cleanup remaining rows, if there are any
}
}
}
if rows != nil {
return rows, nil
}
return nil, me.check()
}
Edit 2:
@Adrian mentioned:
we can't see the code that's actually using any of this.
This code is reused by type methods. First there is the transaction type. The issues in this question are appearing on the Rollback()
method above.
// MultiTx holds a slice of open transactions to multiple nodes.
// All methods on this type run their sql.Tx variant in one Go routine per Node.
type MultiTx []*Tx
// QueryContext runs sql.Tx.QueryContext on the tranactions in separate Go routines.
// The first non-error result is returned immediately
// and errors from the other Nodes will be ignored.
//
// If all nodes respond with the same error, that exact error is returned as-is.
// If there is a variety of errors, they will be embedded in a MultiError return.
//
// Implements boil.ContextExecutor.
func (mtx MultiTx) QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
return multiQuery(ctx, mtx2Exec(mtx), query, args...)
}
Then there is:
// MultiNode holds a slice of Nodes.
// All methods on this type run their sql.DB variant in one Go routine per Node.
type MultiNode []*Node
// QueryContext runs sql.DB.QueryContext on the Nodes in separate Go routines.
// The first non-error result is returned immediately
// and errors from the other Nodes will be ignored.
//
// If all nodes respond with the same error, that exact error is returned as-is.
// If there is a variety of errors, they will be embedded in a MultiError return.
//
// Implements boil.ContextExecutor.
func (mn MultiNode) QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
return multiQuery(ctx, nodes2Exec(mn), query, args...)
}
These methods the public wrappers around the multiQuery()
function. Now I realize that just sending the *Rows
into a buffered channel to die, is actually a memory leak. In the transaction cases it becomes clear, as Rollback()
starts to complain. But in the non-transaction variant, the *Rows
inside the channel will never be garbage collected, as the driver might hold reference to it until rows.Close()
is called.
I've written this package to by used by an ORM, sqlboiler. My higher level logic passes a MultiTX
object to the ORM. From that point, I don't have any explicit control over the returned Rows
. A simplistic approach would be that my higher level code cancels the context before Rollback()
, but I don't like that:
- It gives a non-intuitive API. This (idiomatic) approach would break:
ctx, cancel = context.WithCancel(context.Background())
defer cancel()
tx, _ := db.BeginTx(ctx)
defer tx.Rollback()
- The ORM's interfaces also specify the regular, non-context aware
Query()
variants, which in my package's case will run againstcontext.Background()
.
I'm starting to worry that this broken by design... Anyway, I will start by implementing a Go routine that will drain the channel and close the *Rows
. After that I will see if I can implement some reasonable waiting / cancellation mechanism that won't affect the returned *Rows