
I need to extract part of the text from each row of a string column in PySpark. I have tried almost every option I could find, but without success.

Submission ID
Dx.CS1.2023-21-01
DX.RE1.2023-12-01
DX.G1.2021-23-01
DX.G2.2022-10-12

Desired output:

ID
CS1
RE1
G1
G2
Biplab1985

1 Answer


The requirements are not entirely clear, so I assume you need to split on several separators (spaces and dots, as in the sample data) and take the second element. Assuming the column is named col, one solution is:

from pyspark.sql import functions as F
df = df.select(F.split('col', ' |\\.')[1].alias('col'))
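To sanity-check the pattern without a Spark session, the same regular expression can be tried with Python's built-in re module. This is only a rough sketch: the helper name second_token is hypothetical, and it assumes every value follows the prefix.ID.date layout shown in the question.

```python
import re

# Sample values copied from the question.
submission_ids = [
    "Dx.CS1.2023-21-01",
    "DX.RE1.2023-12-01",
    "DX.G1.2021-23-01",
    "DX.G2.2022-10-12",
]

# Same pattern as in the PySpark call: split on a space or a literal dot.
pattern = r" |\."


def second_token(value):
    """Return the element at index 1 after splitting, or None if there is none."""
    parts = re.split(pattern, value)
    return parts[1] if len(parts) > 1 else None


ids = [second_token(s) for s in submission_ids]
print(ids)
```

If a value contains no separator at all, the helper returns None rather than raising an error, so malformed rows are easy to spot.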
过过招