
I am trying to run the substring function on a column (CompleteLine), using a variable (StringStartPoint) for the start position.

I tried a few options, shown below, but both fail for different reasons. How can I easily use a variable inside the select function?

StringStartPoint=10

df2 = df1.select(f.substring(f.col("CompleteLine"),StringStartPoint,f.col("StringLength"))).alias('MySubString')

TypeError: Column is not iterable. The 3rd parameter is not being accepted as a value.

df2 = df1.select(f.expr("substring(col(CompleteLine),StringStartPoint,col(StringLength))").alias('MySubString'))

AnalysisException: Cannot resolve StringStartPoint given input column. The 2nd parameter is being treated as a DataFrame column.
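The second attempt fails because the string given to `f.expr()` is parsed as Spark SQL on its own: the Python name `StringStartPoint` is never substituted, so Spark looks for a column with that name. A minimal sketch of one workaround is to write the variable's value into the SQL text before calling `f.expr()`. The helper `build_substring_expr` is hypothetical, and the sketch assumes the start point is a plain Python `int` and that the column names need no quoting.

```python
def build_substring_expr(text_col: str, start: int, length_col: str) -> str:
    """Build a Spark SQL substring() call with a Python int inlined as the start.

    Hypothetical helper: the int is written into the SQL text as a literal,
    while text_col and length_col stay as column references for Spark to resolve.
    """
    # Only accept real ints so arbitrary text cannot be spliced into the SQL.
    if not isinstance(start, int) or isinstance(start, bool):
        raise TypeError("start must be a Python int")
    return f"substring({text_col}, {start}, {length_col})"


StringStartPoint = 10
sql_text = build_substring_expr("CompleteLine", StringStartPoint, "StringLength")
print(sql_text)  # substring(CompleteLine, 10, StringLength)
```

With `from pyspark.sql import functions as f`, the generated string would then be used as `df1.select(f.expr(sql_text).alias('MySubString'))`. The length stays a column reference (`StringLength`), which Spark SQL's `substring` accepts as an integer expression.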
Asked by iamaj (edited by Mohana B C)
  • Does this answer your question? [use length function in substring in spark](https://stackoverflow.com/questions/46353360/use-length-function-in-substring-in-spark) – notNull Sep 03 '21 at 20:07

1 Answer


The 3rd parameter of substring() must be an integer, not a Column.

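Pass the length of the column as the 3rd argument. Because f.length(...) also returns a Column, call Column.substr() instead: it accepts Columns for both the start position and the length (both arguments must be the same type), so wrap the variable in f.lit().

from pyspark.sql import functions as f

StringStartPoint=10

df2 = df1.select(f.col("CompleteLine").substr(f.lit(StringStartPoint), f.length(f.col("CompleteLine"))).alias('MySubString'))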
Answered by Mohana B C
  • StringLength is an integer-type field in df1; it has already been computed with the length function. – iamaj Sep 03 '21 at 20:29
  • The type of the value you are passing is `Column`, even though the column's data type is `int`. Do you need the substring from the starting point to the end of the main string? If so, there is no need to pass the length at all if you use this: `df.select(expr('substring(CompleteLine, 10)'))` – Mohana B C Sep 03 '21 at 20:48
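The comment above relies on Spark SQL's `substring(str, pos[, len])` using 1-based positions and running to the end of the string when `len` is omitted. The plain-Python function below is a sketch that models those semantics, including negative positions counting back from the end, so the behaviour can be checked without a Spark cluster. It is an illustration based on the examples in Spark's SQL function documentation, not Spark's own code, and the name `spark_substring` is hypothetical.

```python
from typing import Optional


def spark_substring(s: str, pos: int, length: Optional[int] = None) -> str:
    """Model of Spark SQL substring(str, pos[, len]) on a plain Python string.

    Sketch for illustration only: positions are 1-based, a negative pos counts
    back from the end, pos 0 is treated like 1, and an omitted length means
    "until the end of the string".
    """
    n = len(s)
    if pos > 0:
        start = pos - 1      # 1-based position -> 0-based index
    elif pos < 0:
        start = n + pos      # count back from the end of the string
    else:
        start = 0            # pos 0 behaves like pos 1
    end = n if length is None else start + length
    start = max(start, 0)
    if start >= end:
        return ""            # zero or negative length yields an empty string
    return s[start:end]      # Python slicing clamps end to len(s)


print(spark_substring("Spark SQL", 5))     # k SQL
print(spark_substring("Spark SQL", -3))    # SQL
print(spark_substring("Spark SQL", 5, 1))  # k
```

For a positive start, passing the full string length as `len` (as the answer does with the length of CompleteLine) returns the same text as omitting `len`, e.g. `spark_substring(line, 10, len(line)) == spark_substring(line, 10)`.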