Hello, I am facing a problem with a Hive GenericUDF that I register as a temporary function: when I call it, its evaluate method is invoked twice. The code is given below.

I created a GenericUDF with the following code:

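import org.apache.hadoop.hive.ql.exec.{UDFArgumentException, UDFArgumentLengthException}
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{PrimitiveObjectInspectorFactory, StringObjectInspector}

class GenUDF extends GenericUDF {
  var queryOI: StringObjectInspector = null
  var argumentsOI: Array[ObjectInspector] = null

  // Called when the function is set up; returns the inspector for the result type.
  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    /*if (arguments.length == 0) {
      throw new UDFArgumentLengthException("At least one argument must be specified")
    }
    if (!(arguments(0).isInstanceOf[StringObjectInspector])) {
      throw new UDFArgumentException("First argument must be a string")
    }
    queryOI = arguments(0).asInstanceOf[StringObjectInspector]
    argumentsOI = arguments*/
    println("inside initializeweeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee")
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  // Called to compute the result; this is the print statement that shows up twice.
  override def evaluate(arguments: Array[GenericUDF.DeferredObject]): Object = {
    println("inside generic UDF::::::::::::::::::::::((((((((((((((((((((((((FDDDDDDDDDDDDD:")
    4.toString
  }

  // Name shown for this function in query plans.
  override def getDisplayString(children: Array[String]): String = {
    println("inside displayssssssssssssssssssssssssssssssss")
    "udft"
  }
}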

I register it with the following statement:

 hiveContext.sql("CREATE TEMPORARY FUNCTION udft AS 'functions.GenUDF'")

and call the function with the following command:

select udft()

When I do this, the print statement in the body of evaluate is executed twice.
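One way to narrow down where the two calls come from is to print the calling thread and a few stack frames instead of a fixed string. The helper below is only a sketch: CallTrace and describe are hypothetical names, not part of Hive or Spark, and the number of frames skipped assumes the usual JVM layout of a stack trace.

```scala
// Hypothetical debugging helper; not part of Hive or Spark.
object CallTrace {
  // Describes the current thread and the first few frames that led to the caller.
  def describe(depth: Int = 5): String = {
    val thread = Thread.currentThread()
    // Assumption: the first two frames are typically Thread.getStackTrace
    // and describe itself, so they are skipped.
    val frames = thread.getStackTrace
      .drop(2)
      .take(depth)
      .map(f => s"${f.getClassName}.${f.getMethodName}")
    s"thread=${thread.getName} via ${frames.mkString(" <- ")}"
  }
}
```

Calling println(CallTrace.describe()) inside evaluate would show whether both invocations run on the same thread and whether they are reached through the same code path.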

Sandeep Purohit
  • Are you running your code with --master local[2]? – eliasah Apr 11 '16 at 08:37
  • Yes, I set the master to local[2], so it uses 2 threads. Can you please explain why the UDF is executed twice? – Sandeep Purohit Apr 11 '16 at 08:49
  • It's not executed twice; it's just being distributed across 2 cores, so each core executes it once per partition. It's one of the basics of distributed computing. – eliasah Apr 11 '16 at 08:53
  • Cool, thanks. I will set the configuration to 1 core and then check. – Sandeep Purohit Apr 11 '16 at 09:38
  • But I have one concern: if my UDF contains an insert statement, it will insert that record twice. – Sandeep Purohit Apr 11 '16 at 09:40
  • Why do you want to use one core? What's the point of using a distributed computing framework if you want to run on a single thread or a single core? – eliasah Apr 11 '16 at 09:48
  • I want to run it in a distributed manner, but my UDF contains an insert query, so when it executes it inserts twice. I also set the configuration to local[1] and it still prints twice, and when I run with local[4] it also prints twice. So I think it does not depend on the number of cores. – Sandeep Purohit Apr 11 '16 at 10:01
  • I've never seen a UDF used to perform inserts. – eliasah Apr 11 '16 at 10:04
  • But I have a use case like this, because I create my UDF at run time and the UDF can contain anything. – Sandeep Purohit Apr 11 '16 at 10:06
  • @eliasah Actually, it can be called an arbitrary number of times. – zero323 Apr 11 '16 at 11:58
  • @zero323 So is there any solution to stop this? If my function contains an insert statement, it will be called twice or three times. – Sandeep Purohit Apr 12 '16 at 05:25
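Since the last comments suggest the number of calls cannot be relied on, one common way to tolerate repeated calls is to make the side effect idempotent, so that a second call with the same key changes nothing. The following is a minimal sketch of that idea only: IdempotentSink and insertOnce are hypothetical names, and the in-memory map merely stands in for the real target table. On a cluster, the deduplication would have to live in the external store itself (for example via a unique key), because an in-JVM map is not shared between executors.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the external table the UDF writes to.
object IdempotentSink {
  private val rows = new ConcurrentHashMap[String, String]()

  // Writes the row only if the key is not present yet.
  // Returns true for the first write of a key and false for any repeat.
  def insertOnce(key: String, value: String): Boolean =
    rows.putIfAbsent(key, value) == null

  // Number of distinct rows stored so far.
  def size: Int = rows.size()
}
```

With this shape, evaluate could call IdempotentSink.insertOnce with a key derived from its input; calling it two or three times for the same input would still leave a single row.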