
The target table in Kudu is huge. I have the following Scala code, and I would like to check whether a given row exists in Kudu. These four columns make up the primary key of the Kudu table, but when I define an upper bound I seem to get all the rows.

How do I select a particular row in Kudu? Here I expect only one row to be returned.

val table2: KuduTable = kuduClient.openTable("event-sets")
val eventColumns: util.List[String] = List(
  OccurrenceSchema.SetId.name,
  OccurrenceSchema.Period.name,
  OccurrenceSchema.Event.name,
  OccurrenceSchema.Date.name).asJava

val end: PartialRow = table2.getSchema.newPartialRow()
end.addInt(OccurrenceSchema.Period.name, 1476)
end.addInt(OccurrenceSchema.SetId.name, 82)
end.addInt(OccurrenceSchema.Event.name, 3195167)
end.addLong(OccurrenceSchema.Date.name, 1367922840000L)

val kuduScanner: KuduScanner = kuduClient.newScannerBuilder(table2)
  .setProjectedColumnNames(eventColumns)
  .lowerBound(end)
  .exclusiveUpperBound(end)
  .build()

assert(kuduScanner.hasMoreRows)
while (kuduScanner.hasMoreRows) {
  val resultIterator: RowResultIterator = kuduScanner.nextRows()
  while (resultIterator.hasNext) {
    val result: RowResult = resultIterator.next()
    assert(result != null)
    logger.info(" : SetId Value -- " + result.getInt(OccurrenceSchema.SetId.name))
    logger.info(" : Period Value -- " + result.getInt(OccurrenceSchema.Period.name))
    logger.info(" : Event Value -- " + result.getInt(OccurrenceSchema.Event.name))
    logger.info(" : Date Value -- " + result.getLong(OccurrenceSchema.Date.name))
  }
}
tk421
user3897533

1 Answer


From my understanding, you are looking for exactly one record in your table. Using a scanner with bounds and/or a limit didn't work for me either. Instead, I solved the problem by defining a KuduPredicate for each primary key column. My solution is below.

import org.apache.kudu.client.{KuduPredicate, KuduScanner, RowResult, RowResultIterator}
import org.apache.kudu.client.KuduScanner.KuduScannerBuilder

val builder: KuduScannerBuilder = kuduClient.newScannerBuilder(table2)
// define the columns you want to select
builder.setProjectedColumnNames(eventColumns)

// newComparisonPredicate takes a ColumnSchema, so look each column up in the table schema
val schema = table2.getSchema

// add predicates to select a record by primary key
val pkPeriod: KuduPredicate = KuduPredicate.newComparisonPredicate(schema.getColumn(OccurrenceSchema.Period.name), KuduPredicate.ComparisonOp.EQUAL, 1476L)
builder.addPredicate(pkPeriod)
val pkSetId: KuduPredicate = KuduPredicate.newComparisonPredicate(schema.getColumn(OccurrenceSchema.SetId.name), KuduPredicate.ComparisonOp.EQUAL, 82L)
builder.addPredicate(pkSetId)
val pkEvent: KuduPredicate = KuduPredicate.newComparisonPredicate(schema.getColumn(OccurrenceSchema.Event.name), KuduPredicate.ComparisonOp.EQUAL, 3195167L)
builder.addPredicate(pkEvent)
val pkDate: KuduPredicate = KuduPredicate.newComparisonPredicate(schema.getColumn(OccurrenceSchema.Date.name), KuduPredicate.ComparisonOp.EQUAL, 1367922840000L)
builder.addPredicate(pkDate)

val kuduScanner: KuduScanner = builder.build()

while (kuduScanner.hasMoreRows) {
  val resultIterator: RowResultIterator = kuduScanner.nextRows()
  while (resultIterator.hasNext) {
    val result: RowResult = resultIterator.next()

    // do whatever you have to do with the selected record
    logger.info(" : SetId Value -- " + result.getInt(OccurrenceSchema.SetId.name))
  }
}

I'm new to Kudu, so I'm not sure whether this solution is the most efficient one. At least it returns the expected result.

My original code was written and tested in Java. I ported it to Scala by hand, but I haven't tested the Scala version yet!
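If all you need is a yes/no existence check, the predicate approach above can be wrapped in a small helper. The following is only a sketch under some assumptions: `rowExists` is a hypothetical name, every key column is assumed to be an integer type (so the `long` overload of `newComparisonPredicate` applies), and the scanner is capped with `limit(1)` so it can stop as soon as one match is found. I have not run it against a live cluster.

```scala
import org.apache.kudu.client.{KuduClient, KuduPredicate, KuduTable}
import org.apache.kudu.client.KuduPredicate.ComparisonOp
import scala.collection.JavaConverters._

// Hypothetical helper: returns true if at least one row matches every
// (column name -> value) equality in `key`.
// Assumption: all key columns are integer-typed (INT8..INT64).
def rowExists(client: KuduClient, table: KuduTable, key: Map[String, Long]): Boolean = {
  val schema = table.getSchema
  val builder = client.newScannerBuilder(table)

  // Project only the key columns and stop after the first matching row.
  builder.setProjectedColumnNames(key.keys.toList.asJava)
  builder.limit(1L)

  // One equality predicate per primary key column.
  key.foreach { case (column, value) =>
    builder.addPredicate(
      KuduPredicate.newComparisonPredicate(schema.getColumn(column), ComparisonOp.EQUAL, value))
  }

  val scanner = builder.build()
  try {
    var found = false
    // A batch can be empty, so keep reading until a row shows up or the scan ends.
    while (!found && scanner.hasMoreRows) {
      found = scanner.nextRows().getNumRows > 0
    }
    found
  } finally scanner.close()
}
```

With the names from the question, and assuming `OccurrenceSchema.*.name` returns the column name as a `String`, a call could look like `rowExists(kuduClient, table2, Map(OccurrenceSchema.SetId.name -> 82L, OccurrenceSchema.Period.name -> 1476L, OccurrenceSchema.Event.name -> 3195167L, OccurrenceSchema.Date.name -> 1367922840000L))`.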

Olaf H