
I am generating JSON data (approx. 5000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in Go. The problem is that not all of the data gets pushed: only a random number of records (between 2000 and 3000) end up being inserted.

The code is:

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/denisenkom/go-mssqldb"
    "gopkg.in/couchbase/gocb.v1"
)

func main() {
    var (
        ID       string
        JSONData string
    )

    var items []gocb.BulkOp

    cluster, err := gocb.Connect("couchbase://localhost")
    if err != nil {
        log.Fatal(err)
    }
    bucket, err := cluster.OpenBucket("example", "")
    if err != nil {
        log.Fatal(err)
    }

    condb, err := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")
    if err != nil {
        log.Fatal(err)
    }

    // Get approx. 5000 records from SQL Server in JSON format
    rows, err := condb.Query("Select id, JSONData From User")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    // Queue one upsert operation per row
    for rows.Next() {
        if err := rows.Scan(&ID, &JSONData); err != nil {
            log.Fatal(err)
        }
        items = append(items, &gocb.UpsertOp{Key: ID, Value: JSONData})
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }

    // Bulk load the JSON into Couchbase
    if err := bucket.Do(items); err != nil {
        fmt.Println("ERROR PERFORMING BULK INSERT:", err)
    }

    _ = bucket.Close()
}

Please tell me where I went wrong here.

FYI, the ID and JSONData columns in the SQL query contain valid keys and JSON strings. Also, any advice on improving the way it's coded would be appreciated.

  • How are you verifying the 2,000-3,000? From the Couchbase web UI? One recommendation (mostly as a safety measure) would be to use insert() rather than upsert, just in case the id is somehow not unique. – Matt Ingenthron Jul 14 '17 at 17:23
  • @MattIngenthron I am actually invoking a stored procedure that populates the User table with a unique document id for each record, and I tried using the Insert operation too. Even if I flush the bucket and insert a fresh batch of data, I get the same result. And yes, I'm observing the number of inserts from the Couchbase web UI. – Krishan Jangid Jul 17 '17 at 05:41

2 Answers

1

I had missed checking the Err field of the InsertOp type. When I did, I found that the client's operation queue overflows when the batch exceeds its capacity, and the message 'queue overflowed' shows up when you print that field:

// After bucket.Do, inspect the per-operation Err field on each item
for i := range items {
    fmt.Println(items[i].(*gocb.InsertOp).Err)
}

A screenshot of the error message is attached here: Err.png

Is there any workaround for this limitation apart from splitting the data into a number of batches and performing multiple bulk inserts?
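For reference, a minimal sketch of that batching approach, reusing the bucket and items variables from the question (the batch size of 1000 and the printed messages are illustrative assumptions, not tuned or prescribed values):

batchSize := 1000
for start := 0; start < len(items); start += batchSize {
    end := start + batchSize
    if end > len(items) {
        end = len(items)
    }
    batch := items[start:end]

    // Run one bulk operation per batch
    if err := bucket.Do(batch); err != nil {
        fmt.Println("ERROR PERFORMING BULK INSERT:", err)
    }

    // Check the per-operation Err field; *gocb.UpsertOp matches the
    // question's code, use *gocb.InsertOp if Insert operations were queued.
    for _, op := range batch {
        if e := op.(*gocb.UpsertOp).Err; e != nil {
            fmt.Println("op", op.(*gocb.UpsertOp).Key, "failed:", e)
        }
    }
}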

0

Why not try using a number of goroutines and a channel to synchronize them? Create a channel of items that need to be inserted, then start 16 or more goroutines which read from the channel, perform the insert, and continue. The most obvious bottleneck for a strictly serial inserter is the network round-trip; if you can have many goroutines performing inserts at once, you will vastly improve performance.
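A rough sketch of that pattern, assuming the bucket from the question, the standard log and sync packages, and a hypothetical kvPair struct for the scanned rows (all names and the worker count are illustrative, not taken from the original code):

// kvPair holds one scanned row (hypothetical helper type).
type kvPair struct {
    ID       string
    JSONData string
}

// insertWorkers drains the docs channel with a pool of goroutines, each
// issuing individual Insert calls, and returns once the channel is closed
// and every worker has finished.
func insertWorkers(bucket *gocb.Bucket, docs <-chan kvPair, workers int) {
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for doc := range docs {
                // Insert (not Upsert) so duplicate keys surface as errors.
                if _, err := bucket.Insert(doc.ID, doc.JSONData, 0); err != nil {
                    log.Println("insert failed for", doc.ID, ":", err)
                }
            }
        }()
    }
    wg.Wait()
}

On the producer side, the rows.Next() loop would send each scanned row into docs and close the channel once scanning finishes; the worker count (16 or more, as suggested above) can then be tuned against the cluster.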

P.S. The issue with the bulk insert not inserting every document is a strange one; I am going to take a look into this. As @ingenthr mentioned above, though, is it possible that you are doing upserts and have multiple operations for the same keys?

From an earlier revision of this answer: Are you getting any error outputs from the bulk insert?

Brett L
  • No error. As I said, when I execute it, only part of the data is uploaded. I tried another alternative by putting an INSERT N1QL query in a loop, and it works (partially), but I get extra double quotes and backslashes in the JSON. Plus, using a loop would be inefficient, I guess, as I need this utility to insert 5000 documents per second. – Krishan Jangid Jul 14 '17 at 04:53
  • Also, I know that Couchbase can handle far more insertions than this, but I can't figure out what's causing this program to insert only some of the documents at a time. – Krishan Jangid Jul 14 '17 at 13:58
  • Goroutines seem to be the best possible solution for this problem right now, and thanks for the advice, but the issue is still bugging me a lot. As I mentioned in the reply to @ingenthr, I'm pretty sure there are no documents in the bucket with the same ids as the data I am trying to insert. I had to split the data into 3 batches and do 3 bulk insert operations to get the complete data, but speed is a big issue here (plus the number of documents to be inserted has also increased). I wish I could insert as many documents as I want in a single bulk insert operation. – Krishan Jangid Jul 17 '17 at 06:11
  • Also, I ran a loop through the items array and verified that all records are being appended successfully. If there's another way to do the bulk insert (like another similar package or something), please let me know. – Krishan Jangid Jul 17 '17 at 06:16