7

I have a data.frame with 3 cols: date, rate, price. I want to add columns that come from a matrix, after rate and before price.

df = tibble('date' = c('01/01/2000', '02/01/2000', '03/01/2000'),
        'rate' = c(7.50, 6.50, 5.54),
        'price' = c(92, 94, 96))

I computed the lags of rate using a function that outputs a matrix:

rate_Lags = matrix(data = c(NA, 7.50, 5.54, NA, NA, 7.50), ncol=2, dimnames=list(c(), c('rate_tMinus1', 'rate_tMinus2'))

I want to insert those lags after rate (and before price) using names indexing rather than column order.

The add_column function from tibble package (Adding a column between two columns in a data.frame) does not work because it only accepts an atomic vector (hence if I have 10 lags I will have to call add_column 10 times). I could use apply in my rate_Lags matrix. Then, however, I lose the dimnames from my rate_Lags matrix.

Using number indexing (subsetting) (https://stat.ethz.ch/pipermail/r-help/2011-August/285534.html) could work if I knew the position of a specific column name (any function that retrieves the position of a column name?).

Is there any simple way of inserting a bunch of columns in a specific position in a data frame/tibble object?

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66

3 Answers3

3

You may be overlooking the following

library(dplyr)
I <- which(names(df) == "rate")
if (I == ncol(df)) {
    cbind(df, rate_Lags)
} else {
    cbind(select(df, 1:I), rate_Lags, select(df, (I+1):ncol(df))) 
}

#     date rate rate_tMinus1 rate_tMinus2 price
# 1 0.0005 7.50           NA           NA    92
# 2 0.0010 6.50         7.50           NA    94
# 3 0.0015 5.54         5.54          7.5    96
CPak
  • 13,260
  • 3
  • 30
  • 48
  • I have edited the question. There was a bug in the definition of `df`: the OP wrote `01/01/2000`, etc, which are sequences of divisions giving `0.0005`, etc, when it should obviously be dates. – Rui Barradas Jun 19 '18 at 03:42
1

Maybe this is not very elegant, but you only call the function once and I believe it's more or less general purpose.

fun <- function(DF, M){
  nms_DF <- colnames(DF)
  nms_M <- colnames(M)
  inx <- which(sapply(nms_DF, function(x) length(grep(x, nms_M)) > 0))
  cbind(DF[seq_len(inx)], M, DF[ seq_along(nms_DF)[-seq_len(inx)] ])
}

fun(df, rate_Lags)
#        date rate rate_tMinus1 rate_tMinus2 price
#1 01/01/2000 7.50           NA           NA    92
#2 02/01/2000 6.50         7.50           NA    94
#3 03/01/2000 5.54         5.54          7.5    96
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • I think you can further simplify this logic - `idx <- seq.int(1L,match("rate",names(df)))` and `cbind(df[idx], rate_Lags, df[-(idx)])` for instance. – thelatemail Jun 19 '18 at 04:14
  • @thelatemail Yes, you're right, but like it is I don't need to know the name of the column to match. That's why I thought of `grep`. It's more complicated but a `sapply` loop over a `colnames` attribute shouldn't take much time. – Rui Barradas Jun 19 '18 at 04:59
1

We could unclass the dataset to a list and then use append to insert 'rate_Lags' at specific locations, reconvert the list to data.frame

i1 <- match('rate', names(df))
data.frame(append(unclass(df), as.data.frame(rate_Lags), after = i1))
#        date rate rate_tMinus1 rate_tMinus2 price
#1 01/01/2000 7.50           NA           NA    92
#2 02/01/2000 6.50         7.50           NA    94
#3 03/01/2000 5.54         5.54          7.5    96

Or with tidyverse

library(tidyverse)
rate_Lags %>%
       as_tibble %>%
       append(unclass(df), ., after = i1) %>% 
       bind_cols
# A tibble: 3 x 5
#  date        rate rate_tMinus1 rate_tMinus2 price
#  <chr>      <dbl>        <dbl>        <dbl> <dbl>
#1 01/01/2000  7.5         NA            NA      92
#2 02/01/2000  6.5          7.5          NA      94
#3 03/01/2000  5.54         5.54          7.5    96
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Easy solution and it doesn't require a library. The only drawback is that I don't know the position in which to insert the column (in this case `after = 2` ). This could be solved with the use of `which(names(df) = 'rate')`. – Emiliano A. Carlevaro Jun 19 '18 at 05:04
  • @EmilianoA.Carlevaro or use `match('rate', names(df))` – akrun Jun 19 '18 at 05:04
  • There's also an `add_column()` function in `dplyr` – meriops Jun 19 '18 at 06:41
  • @meriops Yes, it is there, but it would be to add to a data.frame and the OP have several columns to add instead of a single one – akrun Jun 19 '18 at 06:42
  • 1
    oops, I missed that, sorry. Then a quick-and-dirty way could be to extract "date" into df1 and "rate" and "price" into df2, and then column-bind df1, the columns to insert, and df2... – meriops Jun 20 '18 at 07:29