1

I have a problem with using instr() function in Spark. The definition of function looks like below: instr(Column str, String substring)

I want to use instr in the same way as it is in Impala like:

instr(Column str, String substring, Int [position]) - return index position

In spark we option to give only 2 parameters, but i need to use 3rd parameter with int value basically (-1)

Col has value like

SNNNN NNNNN NNSNN SNSNS NNNNS

Expected code:- instr("ColName", "S", -1) Expected result :- 1 0 3 5 5

sathya
  • 1,982
  • 1
  • 20
  • 37
HEMANT PATEL
  • 77
  • 1
  • 11
  • 2
    But if you use `instr("ColName", "S", -1)` in impala it will return `29`. I guess you want to get index of first `S` occurence in each part delimited by space? If so, expected output should be 1,0,3,1,5 instead of 1,0,3,5,5 ? – chlebek Jul 15 '20 at 09:29
  • No, i want output as 1,0,3,5,5 only. in impala if we use in same way it is giving the output as i have mentioned – HEMANT PATEL Jul 15 '20 at 10:03
  • Check now, I have added udf. which will give you expected result i.e 1,0,3,5,5 – Srinivas Jul 15 '20 at 10:32

2 Answers2

3

If you wanted to go with -ve position number, substring_index + length might be helpful as below-

 val frame = Seq("SNNNN","NNNNN","NNSNN","SNSNS","NNNNS").toDF("values")
    frame.withColumn("x", length($"values") - length(substring_index($"values", "S", -1)))
      .show(false)
    /**
      * +------+---+
      * |values|x  |
      * +------+---+
      * |SNNNN |1  |
      * |NNNNN |0  |
      * |NNSNN |3  |
      * |SNSNS |5  |
      * |NNNNS |5  |
      * +------+---+
      */
Som
  • 6,193
  • 1
  • 11
  • 22
2

Solution using UDF in spark:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val df = sc.parallelize(Seq("SNNNN","NNNNN","NNSNN","SNSNS","NNNNS")).toDF("values")
val findIndexUDF = udf( (x: String) => (x.lastIndexOf("S")+1))
df.withColumn("indexval", findIndexUDF($"values")).show()
/*+------+--------+
|values|indexval|
+------+--------+
| SNNNN|       1|
| NNNNN|       0|
| NNSNN|       3|
| SNSNS|       5|
| NNNNS|       5|
+------+--------+*/
sathya
  • 1,982
  • 1
  • 20
  • 37