
The column names of my Spark DataFrame df are: A_x1, A_x2, B_x1, B_x2, C_x1, C_x2.

How do I create three new Spark DataFrames from df, one per column-name prefix? The output should look like this:

  • a DataFrame named A_ containing the columns A_x1 and A_x2,
  • a DataFrame named B_ containing the columns B_x1 and B_x2,
  • a DataFrame named C_ containing the columns C_x1 and C_x2.

Thank you!

mck
Nele

1 Answer


You can use colRegex to select the columns whose names match a regular expression; note that the pattern must be wrapped in backticks:

# colRegex returns a Column that expands to every column whose name matches the regex
A_ = df.select(df.colRegex('`A_.*`'))
B_ = df.select(df.colRegex('`B_.*`'))
C_ = df.select(df.colRegex('`C_.*`'))
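If you don't want to hard-code the prefixes, a minimal sketch is to derive them from df.columns and build a dict of DataFrames instead of separate variables. This swaps colRegex for plain string handling and assumes the prefix is everything up to and including the first "_" (so a column named A_B_x1 would land under "A_"). The helper names group_columns_by_prefix and split_by_prefix are hypothetical, not part of PySpark; the only Spark API used is DataFrame.columns and DataFrame.select(*cols).

```python
from collections import defaultdict


def group_columns_by_prefix(columns, sep="_"):
    """Map each prefix (text up to and including the first `sep`) to its column names.

    Hypothetical helper; columns without `sep` are skipped.
    """
    groups = defaultdict(list)
    for name in columns:
        head, found, _ = name.partition(sep)
        if found:  # ignore columns that have no prefix separator
            groups[head + sep].append(name)
    return dict(groups)


def split_by_prefix(df, sep="_"):
    """Return {prefix: DataFrame}, one df.select(...) per prefix (hypothetical helper)."""
    return {
        prefix: df.select(*cols)
        for prefix, cols in group_columns_by_prefix(df.columns, sep).items()
    }


# Usage, assuming `df` is the DataFrame from the question:
# dfs = split_by_prefix(df)
# dfs["A_"].show()  # contains A_x1 and A_x2
```

Keeping the results in a dict avoids creating variable names dynamically, and works the same whether there are three prefixes or thirty.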
mck