I want to add columns containing polynomials to a dataframe (DF).

Background: I need to use polynomials in a glmnet setting. I cannot call poly() directly in the glmnet() estimation command. I get an error, likely because my “Xtrain” data contain factors. My workaround is to slice my Xtrain DF in two pieces, one containing all factors (for which no transformation is needed) and one containing the rest, viz. the numeric columns.
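
The split described above might look roughly like this. This is only a sketch: the Xtrain below is a made-up stand-in (two numeric columns and one factor), and the names is_num, X_num and X_fac are hypothetical, not taken from the real data.

```r
# Hypothetical stand-in for Xtrain: two numeric columns and one factor
Xtrain <- data.frame(
  x = 1:10,
  y = 11:20,
  g = factor(rep(c("a", "b"), 5))
)

# TRUE for numeric (including integer) columns, FALSE for factors
is_num <- vapply(Xtrain, is.numeric, logical(1))

X_num <- Xtrain[is_num]   # numeric columns: candidates for polynomial terms
X_fac <- Xtrain[!is_num]  # factor columns: left untouched
```

Once the polynomial columns have been added to X_num, the two pieces can be put back together with cbind(X_num, X_fac).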

Now I want to add columns with polynomials to my numeric DF. Here is a minimal example of my problem.

# Some data
x <- 1:10
y <- 11:20
df <- data.frame(x, y)

# Looks like this
    x  y
1   1 11
2   2 12
3   3 13

# Now I generate polys
lapply(df, function(i) poly(i, 2, raw = TRUE)[, 1:2])

However, I cannot figure out how to "cbind" the results. What I want in the end is a DF containing x, x^2, y, and y^2. Order does not matter. Ideally I would also have column labels that identify the polys, for instance like this:

    x x2  y  y2
1   1  1 11 121
2   2  4 12 144
3   3  9 13 169

Thank you... Cheers!

Peter

3 Answers

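We can use do.call to bind the list elements column-wise (note that the result is a matrix, not a data frame):

do.call(cbind, lapply(df, function(i) poly(i, 2, raw = TRUE)[, 1:2]))

If we just need the squares: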

cbind(df, as.matrix(df)^2)
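
Neither call above labels the new columns by variable, so here is a small sketch of how the do.call result could be named afterwards. The "_1"/"_2" suffix scheme and the names polys and res are my own assumptions, not something the question prescribes.

```r
df <- data.frame(x = 1:10, y = 11:20)

# One two-column matrix (degree 1 and degree 2) per input column
polys <- lapply(df, function(col) poly(col, 2, raw = TRUE)[, 1:2])

# Bind the matrices side by side; this yields a matrix, not a data frame
res <- do.call(cbind, polys)

# Label each column as <name>_<degree>: x_1, x_2, y_1, y_2
colnames(res) <- paste(
  rep(names(df), each = 2),
  rep(1:2, times = length(df)),
  sep = "_"
)

res <- as.data.frame(res)
```

Converting back with as.data.frame at the end keeps the labels and gives a DF again.
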
akrun
  • akrun would you by chance have an answer for this one: https://stackoverflow.com/questions/53033225/r-data-table-row-based-conditions-split-apply-combine – gaut Oct 29 '18 at 19:50

Another option is

as.data.frame(lapply(df, function(i) poly(i, 2, raw = TRUE)[, 1:2]))
#   x.1 x.2 y.1 y.2
#1    1   1  11 121
#2    2   4  12 144
#3    3   9  13 169
# ...

As mentioned by @gpier and @akrun already, you might use ^ instead of poly:

n <- 2
df[paste(names(df), n, sep = "_")] <- df^n
df
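
If more than one power is needed, the same idea could be looped over several degrees. This is just a sketch: the choice of degrees 2 and 3 and the helper name base_cols are assumptions for illustration.

```r
df <- data.frame(x = 1:10, y = 11:20)

# Remember the original columns so later powers are not taken of new columns
base_cols <- names(df)

for (n in 2:3) {
  # Adds x_2, y_2 on the first pass and x_3, y_3 on the second
  df[paste(base_cols, n, sep = "_")] <- df[base_cols]^n
}

names(df)
```

Fixing base_cols up front matters: using names(df) inside the loop would also raise the freshly added x_2 and y_2 to the third power.
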
markus

poly is not the right function if you only need squares. Try:

cbind(df, lapply(df, function(x) x^2))

    x  y   x   y
1   1 11   1 121
2   2 12   4 144
3   3 13   9 169
4   4 14  16 196
5   5 15  25 225
6   6 16  36 256
7   7 17  49 289
8   8 18  64 324
9   9 19  81 361
10 10 20 100 400

EDIT: Indeed, you don't even need lapply; you could just use cbind(df, df^2).

gaut