
Using PySpark, I have reached a point where I can no longer move forward. A table passes me the names of certain fields separated by a hyphen (-), and the number of these fields is variable. I need a way to read (and concatenate with each other) the values of those fields in a predetermined table.

Assuming that the field names are in a "columnsnames" variable and the table (DataFrame) is called df, how can I solve this problem?

# Split the hyphen-separated string of field names into a list
columnsnames = columnsnames1.split("-")
# Load the target table
df = spark.read.parquet(path_table + table_name)

EDIT: I need to read the values of the columns in columnsnames. I tried

for c in columnsnames:
F.col(c)

but it didn't work

  • Does this answer your question? [Concat multiple columns of a dataframe using pyspark](https://stackoverflow.com/questions/54921359/concat-multiple-columns-of-a-dataframe-using-pyspark) – blackbishop Nov 12 '21 at 11:38

1 Answer


You can use concat after unpacking the list of columnsnames with *. (F.col(c) on its own only builds a column expression; nothing happens until it is passed to a transformation such as select.)

import pyspark.sql.functions as F

# Example: the field names arrive as a hyphen-separated string
columnsnames = "s-d-f".split("-")

df = spark.createDataFrame([('abcd', '123', '456')], ['s', 'd', 'f'])

# Unpack the list of Column objects into concat
df.select(F.concat(*[F.col(colname) for colname in columnsnames])).show()

Output

+---------------+
|concat(s, d, f)|
+---------------+
|     abcd123456|
+---------------+
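
Note that concat returns null for a row if any of the input columns is null. If the values should be joined with a separator (for example the same hyphen), concat_ws is an alternative that also skips nulls. A minimal sketch, reusing the df and columnsnames from above ("joined" is just an illustrative alias):

# concat_ws joins the columns with the given separator and
# ignores null values instead of nulling the whole result
df.select(F.concat_ws("-", *[F.col(c) for c in columnsnames]).alias("joined")).show()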