I have two dataframes, one of which has just a single row. I would like to transform each column of the first one using the corresponding value in that single row. How do I do this? Here's what I want to achieve:

import polars as pl

df1 = pl.DataFrame({'c1': [2,4,6],'c2': [20,40,60],'c3': [10,20,30]})
df2 = pl.DataFrame({'c1': [2],'c2': [20],'c3': [10]})
# Divide each column of df1 by the matching single value in df2
df = df1.select([
    pl.col('c1')/df2['c1'],
    pl.col('c2')/df2['c2'],
    pl.col('c3')/df2['c3'],
])

Now imagine I have hundreds of columns. The code above doesn't scale. What is the best way to do this? Thanks!

ste_kwr
  • Perhaps this operation has been simplified since I last checked, but you can use `.lazy()` + `.with_context()` - https://stackoverflow.com/a/74835365 – jqurious Jun 12 '23 at 15:14
  • @jqurious seems overkill - he only has 1 row in the second data frame, which means that thing is actually a mapping (let's call it `m`), in which case `df1.select([pl.col(c)/m[c] for c in df1.columns])` would be just fine. – Radu Jun 12 '23 at 15:19
  • Radu's solution works as is, without mapping to a dict. The good thing about it is that I can modify only a subset of columns if I like, by using `with_columns` and providing the list of columns to modify. – ste_kwr Jun 12 '23 at 16:01

1 Answer

If df2 is guaranteed to have a single row AND the column names in df1 and df2 will always match, then you can do:

df1.select(pl.col(x)/df2[x] for x in df1.columns)

If df2 has more than a single row, or if a column name in df1 doesn't exist in df2, then this will error out.

Dean MacGregor