I need to extract part of a string from each row of a column in PySpark. I have tried almost every option I could find, but with no luck.
Submission ID
Dx.CS1.2023-21-01
DX.RE1.2023-12-01
DX.G1.2021-23-01
DX.G2.2022-10-12
Desired output:
ID
CS1
RE1
G1
G2
The requirements are not entirely clear, so I will assume you need to split on multiple separators (such as spaces and dots in the sample data) and take the second element. Assuming the original column is named col, the solution is as follows:
from pyspark.sql import functions as F
df = df.select(F.split('col', ' |\\.')[1].alias('col'))
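
To see what the split pattern does before running it on a DataFrame, here is a minimal sketch that applies the same regular expression in plain Python with the standard re module. The helper name second_token and the samples list are hypothetical, introduced only for illustration; the values are copied from the question. It mirrors the "split, then take index 1" logic, but it is not a substitute for the Spark call.

```python
import re

# Sample values copied from the question.
samples = [
    "Dx.CS1.2023-21-01",
    "DX.RE1.2023-12-01",
    "DX.G1.2021-23-01",
    "DX.G2.2022-10-12",
]

# Same pattern as in the Spark call: split on a space or a literal dot.
pattern = r" |\."


def second_token(value):
    """Return the element at index 1 after splitting, or None if there is none."""
    parts = re.split(pattern, value)
    return parts[1] if len(parts) > 1 else None


ids = [second_token(s) for s in samples]
print(ids)  # ['CS1', 'RE1', 'G1', 'G2']
```

The dot has to be escaped because an unescaped . in a regex matches any character, which would split the string at every position.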